Author: Ze'ev Atlas
Date:
To: pcre-dev@exim.org
Subject: Re: [pcre-dev] Ignoring a whole set of unicode characters

Philip wrote:
> Final thought: It might be easier in the DFA matching function, because
> that moves along the subject character by character, without
> backtracking.

Thank you, Philip.

You are probably right that the DFA matching function is a better place to add such an option, because there its performance would be no worse (and perhaps better) than writing (assuming conceptual POSIX classes)

    if ($a=~ /\b([[:desired language consonant:](?:[:desired language mark:]*)]+)\b/) {...}

to identify a 'valid' word in the desired language. However, the pattern above (I did not test it, so it might have some bugs) would not remove the undesired characters from the captured result.

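A minimal sketch of that last point, under stated assumptions: Hebrew stands in for the desired language, the code point range U+05D0..U+05EA approximates the "consonant" class, and `\p{Mn}` (nonspacing marks) approximates the "mark" class. None of these choices come from the thread. The pattern matches a pointed word, but the capture still carries the marks.

```perl
use strict;
use warnings;

# Assumption: Hebrew is the "desired language".
# "shalom" with vowel points: 4 letters and 3 nonspacing marks.
my $text = "\x{5E9}\x{5B8}\x{5C1}\x{5DC}\x{5D5}\x{5B9}\x{5DD}";

# Hypothetical stand-ins for the conceptual classes in the message:
#   consonant -> [\x{5D0}-\x{5EA}]  (alef through tav)
#   mark      -> \p{Mn}             (nonspacing marks)
my $captured = '';
if ($text =~ /\b((?:[\x{5D0}-\x{5EA}]\p{Mn}*)+)\b/) {
    $captured = $1;
}

# Count the marks that survived inside the captured word.
my $marks_in_capture = () = $captured =~ /\p{Mn}/g;

printf "captured length: %d, marks still present: %d\n",
    length($captured), $marks_in_capture;
```

The word is recognized as valid, yet the marks remain in `$1`, which is why a separate removal step is still needed.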
At the end of the day, you are also right that I could do something like

    $a=~s/[:desired language mark:]//;

prior to the match, so I guess I should go with that type of solution, at least until the wishlist is fulfilled :)

Ze'ev Atlas
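
The strip-then-match approach can be sketched as follows. It is an illustration rather than a tested recipe: it assumes Hebrew as the desired language, uses `\p{Mn}` as a stand-in for the mark class and U+05D0..U+05EA for the letters, and adds the `/g` modifier so that every mark is removed, not only the first.

```perl
use strict;
use warnings;

# Assumption: Hebrew is the "desired language".
# "shalom" with vowel points: 4 letters and 3 nonspacing marks.
my $text = "\x{5E9}\x{5B8}\x{5C1}\x{5DC}\x{5D5}\x{5B9}\x{5DD}";

# Work on a copy so the original subject is preserved,
# then delete every nonspacing mark (/g removes all of them).
(my $bare = $text) =~ s/\p{Mn}//g;

# With the marks gone, a plain letter class is enough.
my ($word) = $bare =~ /\b([\x{5D0}-\x{5EA}]+)\b/;

printf "original length: %d, word length: %d\n",
    length($text), length($word);
```

Working on a copy costs one extra string, but offsets reported by the match then refer to the stripped text, not the original subject.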