[pcre-dev] Detecting starting code units

Top Page
Delete this message
Author: ND
Date:  
To: Pcre-dev
Subject: [pcre-dev] Detecting starting code units
Good day!


PCRE try to detect starting code units in attempt to apply a start
optimization.
As we can see from next two examples, it detects starting code units for
"[^ab]", but don't doing this for "[^a]". I think it looks a bit curiously.
May be "[^a]" can use the same algorithm as "[^ab]"?


PCRE2 version 10.33 2019-04-16
/[^a]/info
Capture group count = 0
Subject length lower bound = 1
x
0: x

/[^ab]/info
Capture group count = 0
Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0a
\x0b \x0c \x0d \x0e \x0f \x10 \x11 \x12 \x13 \x14 \x15 \x16 \x17 \x18
\x19
\x1a \x1b \x1c \x1d \x1e \x1f \x20 ! " # $ % & ' ( ) * + , - . / 0 1 2 3
4
5 6 7 8 9 : ; < = > ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y
Z [ \ ] ^ _ ` c d e f g h i j k l m n o p q r s t u v w x y z { | } ~
\x7f
\x80 \x81 \x82 \x83 \x84 \x85 \x86 \x87 \x88 \x89 \x8a \x8b \x8c \x8d
\x8e
\x8f \x90 \x91 \x92 \x93 \x94 \x95 \x96 \x97 \x98 \x99 \x9a \x9b \x9c
\x9d
\x9e \x9f \xa0 \xa1 \xa2 \xa3 \xa4 \xa5 \xa6 \xa7 \xa8 \xa9 \xaa \xab
\xac
\xad \xae \xaf \xb0 \xb1 \xb2 \xb3 \xb4 \xb5 \xb6 \xb7 \xb8 \xb9 \xba
\xbb
\xbc \xbd \xbe \xbf \xc0 \xc1 \xc2 \xc3 \xc4 \xc5 \xc6 \xc7 \xc8 \xc9
\xca
\xcb \xcc \xcd \xce \xcf \xd0 \xd1 \xd2 \xd3 \xd4 \xd5 \xd6 \xd7 \xd8
\xd9
\xda \xdb \xdc \xdd \xde \xdf \xe0 \xe1 \xe2 \xe3 \xe4 \xe5 \xe6 \xe7
\xe8
\xe9 \xea \xeb \xec \xed \xee \xef \xf0 \xf1 \xf2 \xf3 \xf4 \xf5 \xf6
\xf7
\xf8 \xf9 \xfa \xfb \xfc \xfd \xfe \xff
Subject length lower bound = 1
x
0: x



Thanks