Re: [pcre-dev] PCRE with UTF-8

Author: Juergen Leising
Date:  
To: pcre-dev
Subject: Re: [pcre-dev] PCRE with UTF-8
On Thu, Jun 12, 2008 at 07:03:15PM +0530, Manohar S wrote:
> hi,
> I am Manohar. I am facing a problem with Utf-8 pattern matching. I have a
> text "select * from account where a = 'ਠਡਢಉಉವಷಡಢಣತಥವಷಡಢಣತಥ';"
> With PCRE_UTF8 and ".*" pattern, I am able to match the whole string. But is
> there any other way to match Utf-8 characters other than ".*" since I want
> to extract out *only* the utf-8 characters and nothing else. If i use ".* "
> it will match everything till '\n', is there any alternative to match all
> UTF-8 characters?


Hello Manohar,

/(\p{Gurmukhi}|\p{Kannada})+/

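seems to work in UTF-8 mode (PCRE_UTF8), provided PCRE was built with
Unicode property support; \p{...} is not available otherwise. Or, for
example, with pcregrep: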

echo "abcਠਡਢಉಉವಷಡಢಣತಥವಷಡಢಣತಥ';defg" | pcregrep -u --color=always "[\p{Gurmukhi}\p{Kannada}]+"
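If a regex engine without \p{...} script properties is all that is at hand, the same idea can be approximated with explicit code point ranges. The following is only a sketch in Python, whose built-in re module has no \p{...} support. It assumes the Unicode blocks U+0A00-U+0A7F (Gurmukhi) and U+0C80-U+0CFF (Kannada) are an acceptable stand-in for the script properties; blocks and scripts are not identical, so this is a swapped-in technique, not what PCRE does internally. The names INDIC_RUN and extract_runs are made up for illustration.

```python
import re

# Assumption: Unicode *block* ranges stand in for the script properties
# \p{Gurmukhi} and \p{Kannada}, which Python's built-in re does not support.
#   U+0A00-U+0A7F  Gurmukhi block
#   U+0C80-U+0CFF  Kannada block
INDIC_RUN = re.compile(r"[\u0A00-\u0A7F\u0C80-\u0CFF]+")


def extract_runs(text):
    """Return every maximal run of Gurmukhi/Kannada-block characters."""
    return INDIC_RUN.findall(text)


# The text from the original question.
query = "select * from account where a = 'ਠਡਢಉಉವಷಡಢಣತಥವಷಡಢಣತಥ';"
runs = extract_runs(query)

# Prints a list holding only the quoted non-ASCII value, without the SQL around it.
print(runs)
```

Because the character class lists only the wanted ranges, the match stops at the first character outside them, which avoids the ".*" problem of swallowing everything up to the newline.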

Bye, bye

Juergen