On Mon, 1 Oct 2012, Zoltán Herczeg wrote:
> PCRE has a nice feature: you can change options by passing
> special control strings. E.g. /(*UTF8)a/ makes the pattern a UTF-8
> pattern. I am sure most people are not aware of this feature. Its side
> effect can be used for denial-of-service attacks, since the UTF
> validity checks are not affected by the recursion limit checks. So the
> pattern above can slow down a web service that runs patterns on ASCII
> input when the input buffer is huge. My problem is that these option
> changes cannot be prevented by the calling software, and I think most
> developers are unaware of them (since this is just an extension). I
> know it is useful in certain cases, but I feel it may be exploited by
> malicious software.
>
> I do not have any solution for this issue at the moment; I am just
> curious what you think. Is this a real risk or not?

The complete list of options that can be changed within a pattern is:

  (?i)             caseless
  (?J)             allow duplicate names
  (?m)             multiline
  (?s)             single line (dotall)
  (?U)             default ungreedy (lazy)
  (?x)             extended (ignore white space)
  (*UTF)           PCRE_UTF8/16
  (*UCP)           PCRE_UCP
  (*NO_START_OPT)  PCRE_NO_START_OPTIMIZE
  (*CR)            )
  (*LF)            )
  (*CRLF)          ) select newline style
  (*ANY)           )
  (*ANYCRLF)       )
  (*BSR_ANYCRLF)   \R matches any of CR, LF, CRLF
  (*BSR_UNICODE)   \R matches any Unicode newline

The (*...) ones were added because users who could not alter their
application's code wanted access to the options.

If people are worried about this, we could provide a facility to compile
PCRE with the (*...) features disabled. Alternatively, we could provide
pcre_compile() and/or pcre_exec() options to disable them.
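
Until something like that exists, an application could refuse such patterns itself before calling pcre_compile(). The following is a minimal sketch under a few assumptions. First, the caller only wants to block the (*...) items, not the (?...) settings. Second, has_leading_option_item() is a hypothetical helper and not part of the PCRE API. Third, its name table is copied from the list above plus the (*UTF8) and (*UTF16) spellings, so later releases may add names it does not know. Finally, it assumes PCRE honours these items only in a run at the very start of the pattern, so it checks just the first one.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Names of the option-setting (*...) items, taken from the list in this
   message plus the (*UTF8)/(*UTF16) spellings. Not an official table. */
static const char *const option_items[] = {
    "UTF8", "UTF16", "UTF", "UCP", "NO_START_OPT",
    "CR", "LF", "CRLF", "ANY", "ANYCRLF",
    "BSR_ANYCRLF", "BSR_UNICODE"
};

/* Hypothetical helper: returns true if the pattern starts with an
   option-setting (*NAME) item. Assumes such items are only honoured in a
   run at the very start of the pattern, so checking the first one is
   enough; verbs such as (*FAIL) return false. */
bool has_leading_option_item(const char *pattern)
{
    if (pattern[0] != '(' || pattern[1] != '*')
        return false;

    const char *name = pattern + 2;
    const char *close = strchr(name, ')');
    if (close == NULL)
        return false;               /* unterminated: leave it to pcre_compile() */

    size_t len = (size_t)(close - name);
    for (size_t i = 0; i < sizeof option_items / sizeof option_items[0]; i++) {
        if (strlen(option_items[i]) == len &&
            memcmp(name, option_items[i], len) == 0)
            return true;            /* exact name match, e.g. "UTF8" */
    }
    return false;
}
```

For example, has_leading_option_item("(*UTF8)a") returns true, while "(?i)abc" and "(*FAIL)" return false. Rejecting only known names, rather than anything starting with "(*", keeps leading backtracking verbs usable.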
Philip
--
Philip Hazel