Re: [pcre-dev] cannot find regex for selectively replacing h…

Author: Sheri
Date:  
To: pcre-dev
Subject: Re: [pcre-dev] cannot find regex for selectively replacing html
Roman Blöth wrote:
> Hello folks,
>
>
> I'm desperately trying to find the correct regex for automatically
> surrounding every <img>-tag with a hyperlink using PHP's preg_replace
> (which uses PCRE).
>
> The catch is not to find the <img ...>-occurrences and surround them
> with a hyperlink, the problem is NOT TO do it, when the <img>-tag
> ALREADY IS surrounded by a hyperlink.
>
> The pattern I use is the following:
>
>     /(<img.*?[^>]*>)/i

>
> This gives me a fine backreference to every single <img>-tag in my
> string. But this pattern SHOULD NOT match, when right in front of the
> "<img ...>" comes a ">" or else when right behind the <img ...> comes a "<".
>
> I have tried to use patterns like e.g. /(<img.*?[^>]*>)[^<]/i, but this
> doesn't help anything -- after hours of trial and error I have decided to
> give this list a try.
>
>
> Any help/comment really appreciated!
> Regards, Roman.
>

You need to look at lookahead and lookbehind assertions. They will let
you spell out what must or must not come before and after matches. If
it's really as straightforward as not ">" before and not "<" after, it
might look like this:

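/(?<!>)(<img\b[^>]*>)(?!<)/i

That should give you the same matches you had before, but only when the tag is not immediately preceded by ">" or immediately followed by "<". Note that I dropped the ".*?" from your pattern: it is redundant next to [^>]*, and once the lookahead is added it lets the match backtrack past the first ">" until it finds an ending that is not followed by "<". The \b only keeps "<img" from matching longer tag names. The pattern works equally well without the parentheses around the main match; you can then reference the whole match with $0 instead of $1.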
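
For illustration, here is a minimal PHP sketch of the lookaround idea. The sample string and the "view.php" link target are made up, and the pattern uses [^>]* alone for the tag body so that a match cannot run past the first ">". Treat it as a starting point under those assumptions, not a general HTML solution.

```php
<?php
// Hypothetical input: the first image is already linked, the second is not.
$html = '<a href="big1.png"><img src="one.png"></a> and <img src="two.png"> here';

// (?<!>) : the tag must not be immediately preceded by ">"
// (?!<)  : the tag must not be immediately followed by "<"
$pattern = '/(?<!>)(<img\b[^>]*>)(?!<)/i';

// "view.php" is a placeholder link target, not taken from the original question.
$result = preg_replace($pattern, '<a href="view.php">$1</a>', $html);

echo $result, "\n";
```

Only the second image should end up wrapped. Note that whitespace between the existing <a ...> and the <img ...> would defeat these assertions, since they only look at the single adjacent character.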

Lookbehind assertions must be fixed length. Where you would need a quantifier there, you can use \K instead of a lookbehind: it resets the start of the match, so whatever was matched before it is left out of the result. \K is a relatively recent feature; your PHP would have to be built against PCRE 7.2 or later.
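
A small sketch of \K, again with a made-up sample string and a placeholder "view.php" target. It assumes you want to wrap only images that follow a <p> and any amount of whitespace, which a fixed-length lookbehind cannot express. Keep in mind that \K gives a "must be preceded by" condition, not a negative one.

```php
<?php
// Hypothetical input: only the first image follows "<p>" plus whitespace.
$html = '<p>   <img src="one.png"></p> <img src="two.png">';

// "<p>\s*" must match first, but \K drops it from the reported match,
// so $0 holds only the <img ...> tag.
$pattern = '/<p>\s*\K<img\b[^>]*>/i';

// "view.php" is a placeholder link target.
$result = preg_replace($pattern, '<a href="view.php">$0</a>', $html);

echo $result, "\n";
```

The "<p>" and the spaces stay in place because they are not part of the match that gets replaced.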

Regards,
Sheri