[pcre-dev] Getting PCRE to accept *any* valid letter

Author: Patrick Questembert
Date:
To: pcre-dev
Subject: [pcre-dev] Getting PCRE to accept *any* valid letter

Hi guys,

I am using the PCRE library in the iPhone and Android versions of the ScanBizCards
business card reader. Many thanks for a vital product!

We support 22 scanning languages, so we need PCRE to accept letters in all
of these languages (French, German, Turkish, etc.). The primary needs are:
- \w to treat these letters as word characters
- (?i) caseless matching to recognize the uppercase/lowercase correspondence
  of these letters

Neither works in the default configuration. When I started digging into the
issue, I found only rather complex ways of building "tables" that define
which characters count as letters in various locales.

I may be missing something, but given needs as modest as the ones above, is
there no way to tell PCRE to accept ANY letter in ANY language? After all, if
I am parsing a sentence in a French book and a Greek character happens to
appear on the page, why should I not treat it as a letter? I don't believe
accepting all letters would have a significant performance cost: testing for
a letter is a simple matter of checking whether the character falls within
known letter ranges of the Unicode table.
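The range test described above might look like the following minimal C sketch. It assumes the text has already been decoded from UTF-8 into Unicode code points. The name `isLetter` comes from this message, but the body and the `kLetterRanges` table are hypothetical and cover only a handful of scripts, not the full Unicode letter category.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration: inclusive code point ranges of letters.
   This is a small subset, NOT the complete Unicode letter category. */
typedef struct { uint32_t lo, hi; } Range;

static const Range kLetterRanges[] = {
    {0x0041, 0x005A}, /* A-Z */
    {0x0061, 0x007A}, /* a-z */
    {0x00C0, 0x00D6}, /* Latin-1 letters before the multiplication sign */
    {0x00D8, 0x00F6}, /* Latin-1 letters between multiplication and division signs */
    {0x00F8, 0x00FF}, /* Latin-1 letters after the division sign */
    {0x0100, 0x017F}, /* Latin Extended-A (includes Turkish letters) */
    {0x0391, 0x03A1}, /* Greek capitals Alpha..Rho */
    {0x03A3, 0x03A9}, /* Greek capitals Sigma..Omega (0x03A2 is unassigned) */
    {0x03B1, 0x03C9}, /* Greek small alpha..omega */
    {0x0400, 0x045F}, /* basic Cyrillic letters */
};

/* Returns true if code point cp falls inside one of the listed ranges. */
static bool isLetter(uint32_t cp) {
    size_t n = sizeof kLetterRanges / sizeof kLetterRanges[0];
    for (size_t i = 0; i < n; i++) {
        if (cp >= kLetterRanges[i].lo && cp <= kLetterRanges[i].hi)
            return true;
    }
    return false;
}
```

Because the table is sorted, a binary search would be the natural replacement for the linear scan once the table grows to cover all Unicode letters.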

I already have simple functions elsewhere in my code (isLetter, isLower,
isUpper, toLower, toUpper) that do precisely that. If PCRE has no such
generic mode, could you advise which source file(s) I would need to modify
to plug in my functions and obtain the desired generic letter handling?
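The case functions mentioned above could be sketched along these lines, again assuming decoded code points. The names `toLower`, `toUpper`, `isUpper` and `isLower` come from this message; their bodies, and the helper `caselessEqual`, are hypothetical. They handle only scripts whose two cases sit a fixed distance apart (ASCII, Latin-1, Greek, basic Cyrillic). Special cases such as the Turkish dotted/dotless i, which need locale-aware rules, are deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch: case mapping only for ranges where the two cases
   differ by a fixed offset. Turkish dotted/dotless i is NOT handled. */
static uint32_t toLower(uint32_t cp) {
    if ((cp >= 0x0041 && cp <= 0x005A) ||                 /* A-Z */
        (cp >= 0x00C0 && cp <= 0x00DE && cp != 0x00D7) || /* Latin-1, skip multiplication sign */
        (cp >= 0x0391 && cp <= 0x03A9 && cp != 0x03A2) || /* Greek capitals */
        (cp >= 0x0410 && cp <= 0x042F))                   /* Cyrillic A..YA */
        return cp + 0x20;
    if (cp >= 0x0400 && cp <= 0x040F)                     /* Cyrillic IE-grave..DZHE */
        return cp + 0x50;
    return cp;                                            /* no mapping known */
}

static uint32_t toUpper(uint32_t cp) {
    if (cp == 0x03C2)                                     /* final sigma -> capital sigma */
        return 0x03A3;
    if ((cp >= 0x0061 && cp <= 0x007A) ||
        (cp >= 0x00E0 && cp <= 0x00FE && cp != 0x00F7) || /* skip division sign */
        (cp >= 0x03B1 && cp <= 0x03C9) ||
        (cp >= 0x0430 && cp <= 0x044F))
        return cp - 0x20;
    if (cp >= 0x0450 && cp <= 0x045F)
        return cp - 0x50;
    return cp;       /* includes sharp s (0xDF) and y-diaeresis (0xFF) */
}

/* In this sketch a letter counts as upper/lowercase only if it is mapped. */
static bool isUpper(uint32_t cp) { return toLower(cp) != cp; }
static bool isLower(uint32_t cp) { return toUpper(cp) != cp; }

/* Hypothetical caseless comparison: mapping to uppercase and then back to
   lowercase also folds the two Greek small sigma forms together. */
static bool caselessEqual(uint32_t a, uint32_t b) {
    return toLower(toUpper(a)) == toLower(toUpper(b));
}
```

Deriving `isUpper`/`isLower` from the mappings keeps the four functions consistent, at the cost of reporting unmapped lowercase letters such as the German sharp s as not lowercase.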

Thanks,
Patrick
