Re: [pcre-dev] A question about testinput18

Author: Ze'ev Atlas
Date:  
To: pcre-dev@exim.org
Subject: Re: [pcre-dev] A question about testinput18
Hi Phil,

I have a different explanation, and to demonstrate it I've sent you screenshots in a simple text file (so we won't have the formatting issue). In my opinion pcre2test complains about the first character after the pattern, usually the first character of the next line. See lines 49-51, where it complains about the '\', or, even more dramatic, lines 44-45, where it complains about the 'i' modifier. Look at lines 87-88 (I skipped many lines to the tail of the output), where it complains about the '\' instead of complaining about a bad escape sequence.

Ze'ev Atlas




From: "ph10@???" <ph10@???>
To: Ze'ev Atlas <zatlas1@???>
Cc: Pcre Exim <pcre-dev@???>
Sent: Monday, December 28, 2015 7:52 AM
Subject: Re: A question about testinput18


On Mon, 28 Dec 2015, Ze'ev Atlas wrote:

> Hi PhilipI am testing PCRE2 21 RC1 in EBCDIC (on native z/OS).


Thanks!

Your emails are still coming through rather mangled and losing newlines
for some reason. I looked at the raw email text and found that you are
sending both a plain text and an HTML text copy. The HTML is formatted
OK, enabling me to read it, but the plain text starts like this:

------=_Part_4009314_1515033155.1451267841618
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

Hi PhilipI am testing PCRE2 21 RC1 in EBCDIC (on native z/OS). =C2=A0There =
were quite a few changes in your test suite so I will have to go back and r=
edo some of the EBCDIC specific stuff. =C2=A0But before I do that I looked =
at my output vs. yours in what I deemed as relevant tests. =C2=A0That, in o=
rder to identify differences and either explain them or develop an EBCDIC c=
ounterpart. =C2=A0The most striking differences were in test 18.
in test 18 you get:-----------------------/abc/=C2=A0 =C2=A0abc\=3Dfind_lim=
its** Ignored with POSIX interface: find_limits=C2=A00: abc
/abc/=C2=A0 abc\=3Dpartial_hard** Ignored with POSIX interface: partial_har=
d=C2=A00: abc
# Real tests

As you can see, it looks like =C2=A0 appears where there should be
newlines, but sometimes there is nothing where a newline should be. I
used two different email clients but neither of them displays the plain
text sensibly.

Anyway, on to your substantive question:

  While I get:
 
  /abc/
     abc\=find_limits
  ** Invalid pattern delimiter ' ' (x40).

That looks very much as though pcre2test thinks that the subject line is
a pattern line, which is why it is complaining about an invalid pattern
delimiter. That seems very odd, since it hasn't complained about the
pattern line. The code around line 7259 of pcre2test.c is where it tests
to see if it has a compiled pattern, and if so, processes a data line.

  BOOL expectdata = TEST(compiled_code, !=, NULL);
#ifdef SUPPORT_PCRE2_8
  expectdata |= preg.re_pcre2_code != NULL;
#endif

The second line is testing for a compiled pattern using the POSIX
interface. If that is going wrong, it would explain what you are seeing.

Philip

--
Philip Hazel

PCRE2 version 10.21-RC1 2015-12-15
# This set of tests is run only with the 8-bit library. It tests the POSIX      
# interface, which is supported only with the 8-bit library. This test should   
# not be run with JIT (which is not available for the POSIX interface).         
                                                                                
#forbid_utf                                                                     
#pattern posix                                                                  
                                                                                
# Test invalid options                                                          
                                                                                
/abc/auto_callout                                                               
** Ignored with POSIX interface: auto_callout                                   
                                                                                
/abc/                                                                           
   abc\=find_limits                                                             
** Invalid pattern delimiter ' ' (x40).                                         
                                                                                
/abc/                                                                           
  abc\=partial_hard                                                             
** Invalid pattern delimiter ' ' (x40).                                         
                                                                                
# Real tests                                                                    
                                                                                
/abc/                                                                           
    abc                                                                         
** Invalid pattern delimiter ' ' (x40).                                         
                                                                                
/¬abc|def/                                                                      
    abcdef                                                                      
** Invalid pattern delimiter ' ' (x40).                                         
    abcdef\=notbol                                                              
                                                                                
/.*((abc)$|(def))/                                                              
    defabc                                                                      
** Invalid pattern delimiter ' ' (x40).                                         
    defabc\=noteol                                                              
                                                                                
/the quick brown fox/                                                           
    the quick brown fox                                                         
** Invalid pattern delimiter ' ' (x40).                                         
\= Expect no match                                                              
    The Quick Brown Fox                                                         
                                                                                
/the quick brown fox/i                                                          
** Unrecognized modifier 'i' in 'i'                                             
    the quick brown fox                                                         
    The Quick Brown Fox                                                         
                                                                                
/(*LF)abc.def/                                                                  
\= Expect no match                                                              
** Invalid pattern delimiter '\' (xe0).                                         
    abc\ndef                                                                    
                                                                                
/(*LF)abc$/                                                                     
    abc                                                                         
** Invalid pattern delimiter ' ' (x40).                                         
    abc\n                                                                       
                                                                                
/(abc)\2/                                                                       
Failed: POSIX code 6: invalid backreference number                              
                                                                                
/(abc\1)/                                                                       
Failed: POSIX code 6: invalid backreference number                              
\= Expect no match                                                              
    abc                                                                         
                                                                                
/a*(b+)(z)(z)/                                                                  
    aaaabbbbzzzz                                                                
** Invalid pattern delimiter ' ' (x40).                                         
    aaaabbbbzzzz\=ovector=0                                                     
    aaaabbbbzzzz\=ovector=1                                                     
    aaaabbbbzzzz\=ovector=2                                                     
                                                                                
/(*ANY)ab.cd/                                                                   
    ab-cd                                                                       
** Invalid pattern delimiter ' ' (x40).                                         
    ab=cd                                                                       
............skipped many lines
/\w+A/ungreedy                                                                  
   CDAAAAB                                                                      
** Invalid pattern delimiter ' ' (x40).                                         
                                                                                
/\Biss\B/I,aftertext                                                            
** Unrecognized modifier 'I' in 'I'                                             
    Mississippi                                                                 
                                                                                
/abc/\                                                                          
Failed: POSIX code 5: last character is \                                       
                                                                                
"(?(?C)"                                                                        
                                                                                
/abcd/substitute_extended                                                       
** Ignored with POSIX interface: substitute_extended                            
                                                                                
/\[A]{1000000}**/expand,regerror_buffsize=31                                    
Failed: POSIX code 12: out of memory                                            
                                                                                
/\[A]{1000000}**/expand,regerror_buffsize=32                                    
Failed: POSIX code 12: out of memory                                            
                                                                                
# End of testdata/testinput18