Hi,
I recently got a notification that PCRE does not support \u in PCRE_JAVASCRIPT_COMPAT mode.
I have checked the latest standard here:
http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-262.pdf
And it says:
15.10.1 Patterns
CharacterEscape ::
[...]
HexEscapeSequence
UnicodeEscapeSequence
[...]
Later, in 15.10.2.10 CharacterEscape:
The production CharacterEscape :: HexEscapeSequence evaluates by evaluating the CV of the HexEscapeSequence (see 7.8.4) and returning its character result.
The production CharacterEscape :: UnicodeEscapeSequence evaluates by evaluating the CV of the UnicodeEscapeSequence (see 7.8.4) and returning its character result.
7.8.4 String Literals
[...]
HexEscapeSequence ::
x HexDigit HexDigit
UnicodeEscapeSequence ::
u HexDigit HexDigit HexDigit HexDigit
[...]
Thus in PCRE_JAVASCRIPT_COMPAT mode a \x escape must be followed by two hexadecimal digits and is evaluated as a byte, while \u must be followed by four hexadecimal digits and is evaluated as an unsigned short.
I have checked what happens if \u is not followed by four hex digits:
/b\u0041x/ matches "bAx"
/b\u041x/ matches "bu041x"
/b\x41x/ matches "bAx"
/b\x1x/ matches "bx1x"
So \u is simply treated as a literal u, and \x as a literal x, when not followed by enough hex digits.
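To make the behaviour concrete, here is a small Python sketch of the rule as I read it. It is not PCRE code, and the names decode_escape and literal_text are made up for illustration. It only handles patterns made of literal characters plus \x and \u, which is enough to reproduce the four cases listed.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")

# Number of hex digits each escape letter requires (7.8.4).
ESCAPE_WIDTH = {"x": 2, "u": 4}


def decode_escape(pattern, pos):
    """Decode the \\x or \\u escape whose backslash is at pattern[pos].

    Returns (character, position of the first unconsumed character).
    """
    kind = pattern[pos + 1]
    width = ESCAPE_WIDTH[kind]
    digits = pattern[pos + 2 : pos + 2 + width]
    # Check the digits explicitly instead of relying on int(), which
    # would also accept forms such as "+1" or "1_0".
    if len(digits) == width and all(c in HEX_DIGITS for c in digits):
        return chr(int(digits, 16)), pos + 2 + width
    # Too few hex digits: the escape letter stands for itself and the
    # following characters are read as ordinary literals.
    return kind, pos + 2


def literal_text(pattern):
    """Return the text matched by a pattern of literals, \\x and \\u."""
    out = []
    pos = 0
    while pos < len(pattern):
        is_escape = (
            pattern[pos] == "\\"
            and pos + 1 < len(pattern)
            and pattern[pos + 1] in ESCAPE_WIDTH
        )
        if is_escape:
            ch, pos = decode_escape(pattern, pos)
            out.append(ch)
        else:
            # Any other character (including other escapes, which this
            # sketch does not interpret) is copied through unchanged.
            out.append(pattern[pos])
            pos += 1
    return "".join(out)


if __name__ == "__main__":
    for pat in (r"b\u0041x", r"b\u041x", r"b\x41x", r"b\x1x"):
        print("/%s/ matches %r" % (pat, literal_text(pat)))
```

The fallback branch in decode_escape is the part that matters for this request: the escape is only taken as a character code when the full number of hex digits is present.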
Philip, could we follow the standard here?
Regards,
Zoltan