add pcre to in tree libs

git-svn-id: http://svn.freeswitch.org/svn/freeswitch/trunk@3732 d0543943-73ff-0310-b7d9-9358b9ac24b2
This commit is contained in:
Michael Jerris
2006-12-19 19:54:26 +00:00
parent f82b80b57c
commit 9da5d7e90f
166 changed files with 125793 additions and 0 deletions

View File

@@ -0,0 +1,128 @@
<html>
<head>
<title>PCRE specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>Perl-compatible Regular Expressions (PCRE)</h1>
<p>
The HTML documentation for PCRE comprises the following pages:
</p>
<table>
<tr><td><a href="pcre.html">pcre</a></td>
<td>&nbsp;&nbsp;Introductory page</td></tr>
<tr><td><a href="pcreapi.html">pcreapi</a></td>
<td>&nbsp;&nbsp;PCRE's native API</td></tr>
<tr><td><a href="pcrebuild.html">pcrebuild</a></td>
<td>&nbsp;&nbsp;Options for building PCRE</td></tr>
<tr><td><a href="pcrecallout.html">pcrecallout</a></td>
<td>&nbsp;&nbsp;The <i>callout</i> facility</td></tr>
<tr><td><a href="pcrecompat.html">pcrecompat</a></td>
<td>&nbsp;&nbsp;Compability with Perl</td></tr>
<tr><td><a href="pcrecpp.html">pcrecpp</a></td>
<td>&nbsp;&nbsp;The C++ wrapper for the PCRE library</td></tr>
<tr><td><a href="pcregrep.html">pcregrep</a></td>
<td>&nbsp;&nbsp;The <b>pcregrep</b> command</td></tr>
<tr><td><a href="pcrematching.html">pcrematching</a></td>
<td>&nbsp;&nbsp;Discussion of the two matching algorithms</td></tr>
<tr><td><a href="pcrepartial.html">pcrepartial</a></td>
<td>&nbsp;&nbsp;Using PCRE for partial matching</td></tr>
<tr><td><a href="pcrepattern.html">pcrepattern</a></td>
<td>&nbsp;&nbsp;Specification of the regular expressions supported by PCRE</td></tr>
<tr><td><a href="pcreperform.html">pcreperform</a></td>
<td>&nbsp;&nbsp;Some comments on performance</td></tr>
<tr><td><a href="pcreposix.html">pcreposix</a></td>
<td>&nbsp;&nbsp;The POSIX API to the PCRE library</td></tr>
<tr><td><a href="pcreprecompile.html">pcreprecompile</a></td>
<td>&nbsp;&nbsp;How to save and re-use compiled patterns</td></tr>
<tr><td><a href="pcresample.html">pcresample</a></td>
<td>&nbsp;&nbsp;Description of the sample program</td></tr>
<tr><td><a href="pcrestack.html">pcrestack</a></td>
<td>&nbsp;&nbsp;Discussion of PCRE's stack usage</td></tr>
<tr><td><a href="pcretest.html">pcretest</a></td>
<td>&nbsp;&nbsp;The <b>pcretest</b> command for testing PCRE</td></tr>
</table>
<p>
There are also individual pages that summarize the interface for each function
in the library:
</p>
<table>
<tr><td><a href="pcre_compile.html">pcre_compile</a></td>
<td>&nbsp;&nbsp;Compile a regular expression</td></tr>
<tr><td><a href="pcre_compile2.html">pcre_compile2</a></td>
<td>&nbsp;&nbsp;Compile a regular expression (alternate interface)</td></tr>
<tr><td><a href="pcre_config.html">pcre_config</a></td>
<td>&nbsp;&nbsp;Show build-time configuration options</td></tr>
<tr><td><a href="pcre_copy_named_substring.html">pcre_copy_named_substring</a></td>
<td>&nbsp;&nbsp;Extract named substring into given buffer</td></tr>
<tr><td><a href="pcre_copy_substring.html">pcre_copy_substring</a></td>
<td>&nbsp;&nbsp;Extract numbered substring into given buffer</td></tr>
<tr><td><a href="pcre_dfa_exec.html">pcre_dfa_exec</a></td>
<td>&nbsp;&nbsp;Match a compiled pattern to a subject string
(DFA algorithm; <i>not</i> Perl compatible)</td></tr>
<tr><td><a href="pcre_exec.html">pcre_exec</a></td>
<td>&nbsp;&nbsp;Match a compiled pattern to a subject string
(Perl compatible)</td></tr>
<tr><td><a href="pcre_free_substring.html">pcre_free_substring</a></td>
<td>&nbsp;&nbsp;Free extracted substring</td></tr>
<tr><td><a href="pcre_free_substring_list.html">pcre_free_substring_list</a></td>
<td>&nbsp;&nbsp;Free list of extracted substrings</td></tr>
<tr><td><a href="pcre_fullinfo.html">pcre_fullinfo</a></td>
<td>&nbsp;&nbsp;Extract information about a pattern</td></tr>
<tr><td><a href="pcre_get_named_substring.html">pcre_get_named_substring</a></td>
<td>&nbsp;&nbsp;Extract named substring into new memory</td></tr>
<tr><td><a href="pcre_get_stringnumber.html">pcre_get_stringnumber</a></td>
<td>&nbsp;&nbsp;Convert captured string name to number</td></tr>
<tr><td><a href="pcre_get_substring.html">pcre_get_substring</a></td>
<td>&nbsp;&nbsp;Extract numbered substring into new memory</td></tr>
<tr><td><a href="pcre_get_substring_list.html">pcre_get_substring_list</a></td>
<td>&nbsp;&nbsp;Extract all substrings into new memory</td></tr>
<tr><td><a href="pcre_info.html">pcre_info</a></td>
<td>&nbsp;&nbsp;Obsolete information extraction function</td></tr>
<tr><td><a href="pcre_maketables.html">pcre_maketables</a></td>
<td>&nbsp;&nbsp;Build character tables in current locale</td></tr>
<tr><td><a href="pcre_refcount.html">pcre_refcount</a></td>
<td>&nbsp;&nbsp;Maintain reference count in compiled pattern</td></tr>
<tr><td><a href="pcre_study.html">pcre_study</a></td>
<td>&nbsp;&nbsp;Study a compiled pattern</td></tr>
<tr><td><a href="pcre_version.html">pcre_version</a></td>
<td>&nbsp;&nbsp;Return PCRE version and release date</td></tr>
</table>
</html>

View File

@@ -0,0 +1,252 @@
<html>
<head>
<title>pcre specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcre man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<ul>
<li><a name="TOC1" href="#SEC1">INTRODUCTION</a>
<li><a name="TOC2" href="#SEC2">USER DOCUMENTATION</a>
<li><a name="TOC3" href="#SEC3">LIMITATIONS</a>
<li><a name="TOC4" href="#SEC4">UTF-8 AND UNICODE PROPERTY SUPPORT</a>
<li><a name="TOC5" href="#SEC5">AUTHOR</a>
</ul>
<br><a name="SEC1" href="#TOC1">INTRODUCTION</a><br>
<P>
The PCRE library is a set of functions that implement regular expression
pattern matching using the same syntax and semantics as Perl, with just a few
differences. The current implementation of PCRE (release 6.x) corresponds
approximately with Perl 5.8, including support for UTF-8 encoded strings and
Unicode general category properties. However, this support has to be explicitly
enabled; it is not the default.
</P>
<P>
In addition to the Perl-compatible matching function, PCRE also contains an
alternative matching function that matches the same compiled patterns in a
different way. In certain circumstances, the alternative function has some
advantages. For a discussion of the two matching algorithms, see the
<a href="pcrematching.html"><b>pcrematching</b></a>
page.
</P>
<P>
PCRE is written in C and released as a C library. A number of people have
written wrappers and interfaces of various kinds. In particular, Google Inc.
have provided a comprehensive C++ wrapper. This is now included as part of the
PCRE distribution. The
<a href="pcrecpp.html"><b>pcrecpp</b></a>
page has details of this interface. Other people's contributions can be found
in the <i>Contrib</i> directory at the primary FTP site, which is:
<a href="ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre">ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre</a>
</P>
<P>
Details of exactly which Perl regular expression features are and are not
supported by PCRE are given in separate documents. See the
<a href="pcrepattern.html"><b>pcrepattern</b></a>
and
<a href="pcrecompat.html"><b>pcrecompat</b></a>
pages.
</P>
<P>
Some features of PCRE can be included, excluded, or changed when the library is
built. The
<a href="pcre_config.html"><b>pcre_config()</b></a>
function makes it possible for a client to discover which features are
available. The features themselves are described in the
<a href="pcrebuild.html"><b>pcrebuild</b></a>
page. Documentation about building PCRE for various operating systems can be
found in the <b>README</b> file in the source distribution.
</P>
<P>
The library contains a number of undocumented internal functions and data
tables that are used by more than one of the exported external functions, but
which are not intended for use by external callers. Their names all begin with
"_pcre_", which hopefully will not provoke any name clashes. In some
environments, it is possible to control which external symbols are exported
when a shared library is built, and in these cases the undocumented symbols are
not exported.
</P>
<br><a name="SEC2" href="#TOC1">USER DOCUMENTATION</a><br>
<P>
The user documentation for PCRE comprises a number of different sections. In
the "man" format, each of these is a separate "man page". In the HTML format,
each is a separate page, linked from the index page. In the plain text format,
all the sections are concatenated, for ease of searching. The sections are as
follows:
<pre>
pcre this document
pcreapi details of PCRE's native C API
pcrebuild options for building PCRE
pcrecallout details of the callout feature
pcrecompat discussion of Perl compatibility
pcrecpp details of the C++ wrapper
pcregrep description of the <b>pcregrep</b> command
pcrematching discussion of the two matching algorithms
pcrepartial details of the partial matching facility
pcrepattern syntax and semantics of supported regular expressions
pcreperform discussion of performance issues
pcreposix the POSIX-compatible C API
pcreprecompile details of saving and re-using precompiled patterns
pcresample discussion of the sample program
pcrestack discussion of stack usage
pcretest description of the <b>pcretest</b> testing command
</pre>
In addition, in the "man" and HTML formats, there is a short page for each
C library function, listing its arguments and results.
</P>
<br><a name="SEC3" href="#TOC1">LIMITATIONS</a><br>
<P>
There are some size limitations in PCRE but it is hoped that they will never in
practice be relevant.
</P>
<P>
The maximum length of a compiled pattern is 65539 (sic) bytes if PCRE is
compiled with the default internal linkage size of 2. If you want to process
regular expressions that are truly enormous, you can compile PCRE with an
internal linkage size of 3 or 4 (see the <b>README</b> file in the source
distribution and the
<a href="pcrebuild.html"><b>pcrebuild</b></a>
documentation for details). In these cases the limit is substantially larger.
However, the speed of execution will be slower.
</P>
<P>
All values in repeating quantifiers must be less than 65536. The maximum
compiled length of subpattern with an explicit repeat count is 30000 bytes. The
maximum number of capturing subpatterns is 65535.
</P>
<P>
There is no limit to the number of non-capturing subpatterns, but the maximum
depth of nesting of all kinds of parenthesized subpattern, including capturing
subpatterns, assertions, and other types of subpattern, is 200.
</P>
<P>
The maximum length of name for a named subpattern is 32, and the maximum number
of named subpatterns is 10000.
</P>
<P>
The maximum length of a subject string is the largest positive number that an
integer variable can hold. However, when using the traditional matching
function, PCRE uses recursion to handle subpatterns and indefinite repetition.
This means that the available stack space may limit the size of a subject
string that can be processed by certain patterns. For a discussion of stack
issues, see the
<a href="pcrestack.html"><b>pcrestack</b></a>
documentation.
<a name="utf8support"></a></P>
<br><a name="SEC4" href="#TOC1">UTF-8 AND UNICODE PROPERTY SUPPORT</a><br>
<P>
From release 3.3, PCRE has had some support for character strings encoded in
the UTF-8 format. For release 4.0 this was greatly extended to cover most
common requirements, and in release 5.0 additional support for Unicode general
category properties was added.
</P>
<P>
In order process UTF-8 strings, you must build PCRE to include UTF-8 support in
the code, and, in addition, you must call
<a href="pcre_compile.html"><b>pcre_compile()</b></a>
with the PCRE_UTF8 option flag. When you do this, both the pattern and any
subject strings that are matched against it are treated as UTF-8 strings
instead of just strings of bytes.
</P>
<P>
If you compile PCRE with UTF-8 support, but do not use it at run time, the
library will be a bit bigger, but the additional run time overhead is limited
to testing the PCRE_UTF8 flag in several places, so should not be very large.
</P>
<P>
If PCRE is built with Unicode character property support (which implies UTF-8
support), the escape sequences \p{..}, \P{..}, and \X are supported.
The available properties that can be tested are limited to the general
category properties such as Lu for an upper case letter or Nd for a decimal
number, the Unicode script names such as Arabic or Han, and the derived
properties Any and L&. A full list is given in the
<a href="pcrepattern.html"><b>pcrepattern</b></a>
documentation. Only the short names for properties are supported. For example,
\p{L} matches a letter. Its Perl synonym, \p{Letter}, is not supported.
Furthermore, in Perl, many properties may optionally be prefixed by "Is", for
compatibility with Perl 5.6. PCRE does not support this.
</P>
<P>
The following comments apply when PCRE is running in UTF-8 mode:
</P>
<P>
1. When you set the PCRE_UTF8 flag, the strings passed as patterns and subjects
are checked for validity on entry to the relevant functions. If an invalid
UTF-8 string is passed, an error return is given. In some situations, you may
already know that your strings are valid, and therefore want to skip these
checks in order to improve performance. If you set the PCRE_NO_UTF8_CHECK flag
at compile time or at run time, PCRE assumes that the pattern or subject it
is given (respectively) contains only valid UTF-8 codes. In this case, it does
not diagnose an invalid UTF-8 string. If you pass an invalid UTF-8 string to
PCRE when PCRE_NO_UTF8_CHECK is set, the results are undefined. Your program
may crash.
</P>
<P>
2. An unbraced hexadecimal escape sequence (such as \xb3) matches a two-byte
UTF-8 character if the value is greater than 127.
</P>
<P>
3. Octal numbers up to \777 are recognized, and match two-byte UTF-8
characters for values greater than \177.
</P>
<P>
4. Repeat quantifiers apply to complete UTF-8 characters, not to individual
bytes, for example: \x{100}{3}.
</P>
<P>
5. The dot metacharacter matches one UTF-8 character instead of a single byte.
</P>
<P>
6. The escape sequence \C can be used to match a single byte in UTF-8 mode,
but its use can lead to some strange effects. This facility is not available in
the alternative matching function, <b>pcre_dfa_exec()</b>.
</P>
<P>
7. The character escapes \b, \B, \d, \D, \s, \S, \w, and \W correctly
test characters of any code value, but the characters that PCRE recognizes as
digits, spaces, or word characters remain the same set as before, all with
values less than 256. This remains true even when PCRE includes Unicode
property support, because to do otherwise would slow down PCRE in many common
cases. If you really want to test for a wider sense of, say, "digit", you
must use Unicode property tests such as \p{Nd}.
</P>
<P>
8. Similarly, characters that match the POSIX named character classes are all
low-valued characters.
</P>
<P>
9. Case-insensitive matching applies only to characters whose values are less
than 128, unless PCRE is built with Unicode property support. Even when Unicode
property support is available, PCRE still uses its own character tables when
checking the case of low-valued characters, so as not to degrade performance.
The Unicode property information is used only for characters with higher
values. Even when Unicode property support is available, PCRE supports
case-insensitive matching only when there is a one-to-one mapping between a
letter's cases. There are a small number of many-to-one mappings in Unicode;
these are not supported by PCRE.
</P>
<br><a name="SEC5" href="#TOC1">AUTHOR</a><br>
<P>
Philip Hazel
<br>
University Computing Service,
<br>
Cambridge CB2 3QG, England.
</P>
<P>
Putting an actual email address here seems to have been a spam magnet, so I've
taken it away. If you want to email me, use my initial and surname, separated
by a dot, at the domain ucs.cam.ac.uk.
Last updated: 05 June 2006
<br>
Copyright &copy; 1997-2006 University of Cambridge.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>

View File

@@ -0,0 +1,80 @@
<html>
<head>
<title>pcre_compile specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcre_compile man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<br><b>
SYNOPSIS
</b><br>
<P>
<b>#include &#60;pcre.h&#62;</b>
</P>
<P>
<b>pcre *pcre_compile(const char *<i>pattern</i>, int <i>options</i>,</b>
<b>const char **<i>errptr</i>, int *<i>erroffset</i>,</b>
<b>const unsigned char *<i>tableptr</i>);</b>
</P>
<br><b>
DESCRIPTION
</b><br>
<P>
This function compiles a regular expression into an internal form. Its
arguments are:
<pre>
<i>pattern</i> A zero-terminated string containing the
regular expression to be compiled
<i>options</i> Zero or more option bits
<i>errptr</i> Where to put an error message
<i>erroffset</i> Offset in pattern where error was found
<i>tableptr</i> Pointer to character tables, or NULL to
use the built-in default
</pre>
The option bits are:
<pre>
PCRE_ANCHORED Force pattern anchoring
PCRE_AUTO_CALLOUT Compile automatic callouts
PCRE_CASELESS Do caseless matching
PCRE_DOLLAR_ENDONLY $ not to match newline at end
PCRE_DOTALL . matches anything including NL
PCRE_DUPNAMES Allow duplicate names for subpatterns
PCRE_EXTENDED Ignore whitespace and # comments
PCRE_EXTRA PCRE extra features
(not much use currently)
PCRE_FIRSTLINE Force matching to be before newline
PCRE_MULTILINE ^ and $ match newlines within data
PCRE_NEWLINE_CR Set CR as the newline sequence
PCRE_NEWLINE_CRLF Set CRLF as the newline sequence
PCRE_NEWLINE_LF Set LF as the newline sequence
PCRE_NO_AUTO_CAPTURE Disable numbered capturing paren-
theses (named ones available)
PCRE_UNGREEDY Invert greediness of quantifiers
PCRE_UTF8 Run in UTF-8 mode
PCRE_NO_UTF8_CHECK Do not check the pattern for UTF-8
validity (only relevant if
PCRE_UTF8 is set)
</pre>
PCRE must be built with UTF-8 support in order to use PCRE_UTF8 and
PCRE_NO_UTF8_CHECK.
</P>
<P>
The yield of the function is a pointer to a private data structure that
contains the compiled pattern, or NULL if an error was detected.
</P>
<P>
There is a complete description of the PCRE native API in the
<a href="pcreapi.html"><b>pcreapi</b></a>
page and a description of the POSIX API in the
<a href="pcreposix.html"><b>pcreposix</b></a>
page.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>

View File

@@ -0,0 +1,85 @@
<html>
<head>
<title>pcre_compile2 specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcre_compile2 man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<br><b>
SYNOPSIS
</b><br>
<P>
<b>#include &#60;pcre.h&#62;</b>
</P>
<P>
<b>pcre *pcre_compile2(const char *<i>pattern</i>, int <i>options</i>,</b>
<b>int *<i>errorcodeptr</i>,</b>
<b>const char **<i>errptr</i>, int *<i>erroffset</i>,</b>
<b>const unsigned char *<i>tableptr</i>);</b>
</P>
<br><b>
DESCRIPTION
</b><br>
<P>
This function compiles a regular expression into an internal form. It is the
same as <b>pcre_compile()</b>, except for the addition of the <i>errorcodeptr</i>
argument. The arguments are:
</P>
<P>
<pre>
<i>pattern</i> A zero-terminated string containing the
regular expression to be compiled
<i>options</i> Zero or more option bits
<i>errorcodeptr</i> Where to put an error code
<i>errptr</i> Where to put an error message
<i>erroffset</i> Offset in pattern where error was found
<i>tableptr</i> Pointer to character tables, or NULL to
use the built-in default
</pre>
The option bits are:
<pre>
PCRE_ANCHORED Force pattern anchoring
PCRE_AUTO_CALLOUT Compile automatic callouts
PCRE_CASELESS Do caseless matching
PCRE_DOLLAR_ENDONLY $ not to match newline at end
PCRE_DOTALL . matches anything including NL
PCRE_DUPNAMES Allow duplicate names for subpatterns
PCRE_EXTENDED Ignore whitespace and # comments
PCRE_EXTRA PCRE extra features
(not much use currently)
PCRE_FIRSTLINE Force matching to be before newline
PCRE_MULTILINE ^ and $ match newlines within data
PCRE_NEWLINE_CR Set CR as the newline sequence
PCRE_NEWLINE_CRLF Set CRLF as the newline sequence
PCRE_NEWLINE_LF Set LF as the newline sequence
PCRE_NO_AUTO_CAPTURE Disable numbered capturing paren-
theses (named ones available)
PCRE_UNGREEDY Invert greediness of quantifiers
PCRE_UTF8 Run in UTF-8 mode
PCRE_NO_UTF8_CHECK Do not check the pattern for UTF-8
validity (only relevant if
PCRE_UTF8 is set)
</pre>
PCRE must be built with UTF-8 support in order to use PCRE_UTF8 and
PCRE_NO_UTF8_CHECK.
</P>
<P>
The yield of the function is a pointer to a private data structure that
contains the compiled pattern, or NULL if an error was detected.
</P>
<P>
There is a complete description of the PCRE native API in the
<a href="pcreapi.html"><b>pcreapi</b></a>
page and a description of the POSIX API in the
<a href="pcreposix.html"><b>pcreposix</b></a>
page.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>

View File

@@ -0,0 +1,62 @@
<html>
<head>
<title>pcre_config specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcre_config man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<br><b>
SYNOPSIS
</b><br>
<P>
<b>#include &#60;pcre.h&#62;</b>
</P>
<P>
<b>int pcre_config(int <i>what</i>, void *<i>where</i>);</b>
</P>
<br><b>
DESCRIPTION
</b><br>
<P>
This function makes it possible for a client program to find out which optional
features are available in the version of the PCRE library it is using. Its
arguments are as follows:
<pre>
<i>what</i> A code specifying what information is required
<i>where</i> Points to where to put the data
</pre>
The available codes are:
<pre>
PCRE_CONFIG_LINK_SIZE Internal link size: 2, 3, or 4
PCRE_CONFIG_MATCH_LIMIT Internal resource limit
PCRE_CONFIG_MATCH_LIMIT_RECURSION
Internal recursion depth limit
PCRE_CONFIG_NEWLINE Value of the newline sequence
PCRE_CONFIG_POSIX_MALLOC_THRESHOLD
Threshold of return slots, above
which <b>malloc()</b> is used by
the POSIX API
PCRE_CONFIG_STACKRECURSE Recursion implementation (1=stack 0=heap)
PCRE_CONFIG_UTF8 Availability of UTF-8 support (1=yes 0=no)
PCRE_CONFIG_UNICODE_PROPERTIES
Availability of Unicode property support
(1=yes 0=no)
</pre>
The function yields 0 on success or PCRE_ERROR_BADOPTION otherwise.
</P>
<P>
There is a complete description of the PCRE native API in the
<a href="pcreapi.html"><b>pcreapi</b></a>
page and a description of the POSIX API in the
<a href="pcreposix.html"><b>pcreposix</b></a>
page.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>

View File

@@ -0,0 +1,53 @@
<html>
<head>
<title>pcre_copy_named_substring specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcre_copy_named_substring man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<br><b>
SYNOPSIS
</b><br>
<P>
<b>#include &#60;pcre.h&#62;</b>
</P>
<P>
<b>int pcre_copy_named_substring(const pcre *<i>code</i>,</b>
<b>const char *<i>subject</i>, int *<i>ovector</i>,</b>
<b>int <i>stringcount</i>, const char *<i>stringname</i>,</b>
<b>char *<i>buffer</i>, int <i>buffersize</i>);</b>
</P>
<br><b>
DESCRIPTION
</b><br>
<P>
This is a convenience function for extracting a captured substring, identified
by name, into a given buffer. The arguments are:
<pre>
<i>code</i> Pattern that was successfully matched
<i>subject</i> Subject that has been successfully matched
<i>ovector</i> Offset vector that <b>pcre_exec()</b> used
<i>stringcount</i> Value returned by <b>pcre_exec()</b>
<i>stringname</i> Name of the required substring
<i>buffer</i> Buffer to receive the string
<i>buffersize</i> Size of buffer
</pre>
The yield is the length of the substring, PCRE_ERROR_NOMEMORY if the buffer was
too small, or PCRE_ERROR_NOSUBSTRING if the string name is invalid.
</P>
<P>
There is a complete description of the PCRE native API in the
<a href="pcreapi.html"><b>pcreapi</b></a>
page and a description of the POSIX API in the
<a href="pcreposix.html"><b>pcreposix</b></a>
page.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>

View File

@@ -0,0 +1,51 @@
<html>
<head>
<title>pcre_copy_substring specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcre_copy_substring man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<br><b>
SYNOPSIS
</b><br>
<P>
<b>#include &#60;pcre.h&#62;</b>
</P>
<P>
<b>int pcre_copy_substring(const char *<i>subject</i>, int *<i>ovector</i>,</b>
<b>int <i>stringcount</i>, int <i>stringnumber</i>, char *<i>buffer</i>,</b>
<b>int <i>buffersize</i>);</b>
</P>
<br><b>
DESCRIPTION
</b><br>
<P>
This is a convenience function for extracting a captured substring into a given
buffer. The arguments are:
<pre>
<i>subject</i> Subject that has been successfully matched
<i>ovector</i> Offset vector that <b>pcre_exec()</b> used
<i>stringcount</i> Value returned by <b>pcre_exec()</b>
<i>stringnumber</i> Number of the required substring
<i>buffer</i> Buffer to receive the string
<i>buffersize</i> Size of buffer
</pre>
The yield is the legnth of the string, PCRE_ERROR_NOMEMORY if the buffer was
too small, or PCRE_ERROR_NOSUBSTRING if the string number is invalid.
</P>
<P>
There is a complete description of the PCRE native API in the
<a href="pcreapi.html"><b>pcreapi</b></a>
page and a description of the POSIX API in the
<a href="pcreposix.html"><b>pcreposix</b></a>
page.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>

View File

@@ -0,0 +1,93 @@
<html>
<head>
<title>pcre_dfa_exec specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcre_dfa_exec man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<br><b>
SYNOPSIS
</b><br>
<P>
<b>#include &#60;pcre.h&#62;</b>
</P>
<P>
<b>int pcre_dfa_exec(const pcre *<i>code</i>, const pcre_extra *<i>extra</i>,</b>
<b>const char *<i>subject</i>, int <i>length</i>, int <i>startoffset</i>,</b>
<b>int <i>options</i>, int *<i>ovector</i>, int <i>ovecsize</i>,</b>
<b>int *<i>workspace</i>, int <i>wscount</i>);</b>
</P>
<br><b>
DESCRIPTION
</b><br>
<P>
This function matches a compiled regular expression against a given subject
string, using a DFA matching algorithm (<i>not</i> Perl-compatible). Note that
the main, Perl-compatible, matching function is <b>pcre_exec()</b>. The
arguments for this function are:
<pre>
<i>code</i> Points to the compiled pattern
<i>extra</i> Points to an associated <b>pcre_extra</b> structure,
or is NULL
<i>subject</i> Points to the subject string
<i>length</i> Length of the subject string, in bytes
<i>startoffset</i> Offset in bytes in the subject at which to
start matching
<i>options</i> Option bits
<i>ovector</i> Points to a vector of ints for result offsets
<i>ovecsize</i> Number of elements in the vector
<i>workspace</i> Points to a vector of ints used as working space
<i>wscount</i> Number of elements in the vector
</pre>
The options are:
<pre>
PCRE_ANCHORED Match only at the first position
PCRE_NEWLINE_CR Set CR as the newline sequence
PCRE_NEWLINE_CRLF Set CRLF as the newline sequence
PCRE_NEWLINE_LF Set LF as the newline sequence
PCRE_NOTBOL Subject is not the beginning of a line
PCRE_NOTEOL Subject is not the end of a line
PCRE_NOTEMPTY An empty string is not a valid match
PCRE_NO_UTF8_CHECK Do not check the subject for UTF-8
validity (only relevant if PCRE_UTF8
was set at compile time)
PCRE_PARTIAL Return PCRE_ERROR_PARTIAL for a partial match
PCRE_DFA_SHORTEST Return only the shortest match
PCRE_DFA_RESTART This is a restart after a partial match
</pre>
There are restrictions on what may appear in a pattern when matching using the
DFA algorithm is requested. Details are given in the
<a href="pcrematching.html"><b>pcrematching</b></a>
documentation.
</P>
<P>
A <b>pcre_extra</b> structure contains the following fields:
<pre>
<i>flags</i> Bits indicating which fields are set
<i>study_data</i> Opaque data from <b>pcre_study()</b>
<i>match_limit</i> Limit on internal resource use
<i>match_limit_recursion</i> Limit on internal recursion depth
<i>callout_data</i> Opaque data passed back to callouts
<i>tables</i> Points to character tables or is NULL
</pre>
The flag bits are PCRE_EXTRA_STUDY_DATA, PCRE_EXTRA_MATCH_LIMIT,
PCRE_EXTRA_MATCH_LIMIT_RECURSION, PCRE_EXTRA_CALLOUT_DATA, and
PCRE_EXTRA_TABLES. For DFA matching, the <i>match_limit</i> and
<i>match_limit_recursion</i> fields are not used, and must not be set.
</P>
<P>
There is a complete description of the PCRE native API in the
<a href="pcreapi.html"><b>pcreapi</b></a>
page and a description of the POSIX API in the
<a href="pcreposix.html"><b>pcreposix</b></a>
page.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>

View File

@@ -0,0 +1,84 @@
<html>
<head>
<title>pcre_exec specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcre_exec man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<br><b>
SYNOPSIS
</b><br>
<P>
<b>#include &#60;pcre.h&#62;</b>
</P>
<P>
<b>int pcre_exec(const pcre *<i>code</i>, const pcre_extra *<i>extra</i>,</b>
<b>const char *<i>subject</i>, int <i>length</i>, int <i>startoffset</i>,</b>
<b>int <i>options</i>, int *<i>ovector</i>, int <i>ovecsize</i>);</b>
</P>
<br><b>
DESCRIPTION
</b><br>
<P>
This function matches a compiled regular expression against a given subject
string, using a matching algorithm that is similar to Perl's. It returns
offsets to captured substrings. Its arguments are:
<pre>
<i>code</i> Points to the compiled pattern
<i>extra</i> Points to an associated <b>pcre_extra</b> structure,
or is NULL
<i>subject</i> Points to the subject string
<i>length</i> Length of the subject string, in bytes
<i>startoffset</i> Offset in bytes in the subject at which to
start matching
<i>options</i> Option bits
<i>ovector</i> Points to a vector of ints for result offsets
<i>ovecsize</i> Number of elements in the vector (a multiple of 3)
</pre>
The options are:
<pre>
PCRE_ANCHORED Match only at the first position
PCRE_NEWLINE_CR Set CR as the newline sequence
PCRE_NEWLINE_CRLF Set CRLF as the newline sequence
PCRE_NEWLINE_LF Set LF as the newline sequence
PCRE_NOTBOL Subject is not the beginning of a line
PCRE_NOTEOL Subject is not the end of a line
PCRE_NOTEMPTY An empty string is not a valid match
PCRE_NO_UTF8_CHECK Do not check the subject for UTF-8
validity (only relevant if PCRE_UTF8
was set at compile time)
PCRE_PARTIAL Return PCRE_ERROR_PARTIAL for a partial match
</pre>
There are restrictions on what may appear in a pattern when partial matching is
requested.
</P>
<P>
A <b>pcre_extra</b> structure contains the following fields:
<pre>
<i>flags</i> Bits indicating which fields are set
<i>study_data</i> Opaque data from <b>pcre_study()</b>
<i>match_limit</i> Limit on internal resource use
<i>match_limit_recursion</i> Limit on internal recursion depth
<i>callout_data</i> Opaque data passed back to callouts
<i>tables</i> Points to character tables or is NULL
</pre>
The flag bits are PCRE_EXTRA_STUDY_DATA, PCRE_EXTRA_MATCH_LIMIT,
PCRE_EXTRA_MATCH_LIMIT_RECURSION, PCRE_EXTRA_CALLOUT_DATA, and
PCRE_EXTRA_TABLES.
</P>
<P>
There is a complete description of the PCRE native API in the
<a href="pcreapi.html"><b>pcreapi</b></a>
page and a description of the POSIX API in the
<a href="pcreposix.html"><b>pcreposix</b></a>
page.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>

View File

@@ -0,0 +1,40 @@
<html>
<head>
<title>pcre_free_substring specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcre_free_substring man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<br><b>
SYNOPSIS
</b><br>
<P>
<b>#include &#60;pcre.h&#62;</b>
</P>
<P>
<b>void pcre_free_substring(const char *<i>stringptr</i>);</b>
</P>
<br><b>
DESCRIPTION
</b><br>
<P>
This is a convenience function for freeing the store obtained by a previous
call to <b>pcre_get_substring()</b> or <b>pcre_get_named_substring()</b>. Its
only argument is a pointer to the string.
</P>
<P>
There is a complete description of the PCRE native API in the
<a href="pcreapi.html"><b>pcreapi</b></a>
page and a description of the POSIX API in the
<a href="pcreposix.html"><b>pcreposix</b></a>
page.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>

View File

@@ -0,0 +1,40 @@
<html>
<head>
<title>pcre_free_substring_list specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcre_free_substring_list man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<br><b>
SYNOPSIS
</b><br>
<P>
<b>#include &#60;pcre.h&#62;</b>
</P>
<P>
<b>void pcre_free_substring_list(const char **<i>stringptr</i>);</b>
</P>
<br><b>
DESCRIPTION
</b><br>
<P>
This is a convenience function for freeing the store obtained by a previous
call to <b>pcre_get_substring_list()</b>. Its only argument is a pointer to the
list of string pointers.
</P>
<P>
There is a complete description of the PCRE native API in the
<a href="pcreapi.html"><b>pcreapi</b></a>
page and a description of the POSIX API in the
<a href="pcreposix.html"><b>pcreposix</b></a>
page.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>

View File

@@ -0,0 +1,71 @@
<html>
<head>
<title>pcre_fullinfo specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcre_fullinfo man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<br><b>
SYNOPSIS
</b><br>
<P>
<b>#include &#60;pcre.h&#62;</b>
</P>
<P>
<b>int pcre_fullinfo(const pcre *<i>code</i>, const pcre_extra *<i>extra</i>,</b>
<b>int <i>what</i>, void *<i>where</i>);</b>
</P>
<br><b>
DESCRIPTION
</b><br>
<P>
This function returns information about a compiled pattern. Its arguments are:
<pre>
<i>code</i> Compiled regular expression
<i>extra</i> Result of <b>pcre_study()</b> or NULL
<i>what</i> What information is required
<i>where</i> Where to put the information
</pre>
The following information is available:
<pre>
PCRE_INFO_BACKREFMAX Number of highest back reference
PCRE_INFO_CAPTURECOUNT Number of capturing subpatterns
PCRE_INFO_DEFAULT_TABLES Pointer to default tables
PCRE_INFO_FIRSTBYTE Fixed first byte for a match, or
-1 for start of string
or after newline, or
-2 otherwise
PCRE_INFO_FIRSTTABLE Table of first bytes
(after studying)
PCRE_INFO_LASTLITERAL Literal last byte required
PCRE_INFO_NAMECOUNT Number of named subpatterns
PCRE_INFO_NAMEENTRYSIZE Size of name table entry
PCRE_INFO_NAMETABLE Pointer to name table
PCRE_INFO_OPTIONS Options used for compilation
PCRE_INFO_SIZE Size of compiled pattern
PCRE_INFO_STUDYSIZE Size of study data
</pre>
The yield of the function is zero on success or:
<pre>
PCRE_ERROR_NULL the argument <i>code</i> was NULL
the argument <i>where</i> was NULL
PCRE_ERROR_BADMAGIC the "magic number" was not found
PCRE_ERROR_BADOPTION the value of <i>what</i> was invalid
</PRE>
</P>
<P>
There is a complete description of the PCRE native API in the
<a href="pcreapi.html"><b>pcreapi</b></a>
page and a description of the POSIX API in the
<a href="pcreposix.html"><b>pcreposix</b></a>
page.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>

View File

@@ -0,0 +1,54 @@
<html>
<head>
<title>pcre_get_named_substring specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcre_get_named_substring man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<br><b>
SYNOPSIS
</b><br>
<P>
<b>#include &#60;pcre.h&#62;</b>
</P>
<P>
<b>int pcre_get_named_substring(const pcre *<i>code</i>,</b>
<b>const char *<i>subject</i>, int *<i>ovector</i>,</b>
<b>int <i>stringcount</i>, const char *<i>stringname</i>,</b>
<b>const char **<i>stringptr</i>);</b>
</P>
<br><b>
DESCRIPTION
</b><br>
<P>
This is a convenience function for extracting a captured substring by name. The
arguments are:
<pre>
<i>code</i> Compiled pattern
<i>subject</i> Subject that has been successfully matched
<i>ovector</i> Offset vector that <b>pcre_exec()</b> used
<i>stringcount</i> Value returned by <b>pcre_exec()</b>
<i>stringname</i> Name of the required substring
<i>stringptr</i> Where to put the string pointer
</pre>
The memory in which the substring is placed is obtained by calling
<b>pcre_malloc()</b>. The yield of the function is the length of the extracted
substring, PCRE_ERROR_NOMEMORY if sufficient memory could not be obtained, or
PCRE_ERROR_NOSUBSTRING if the string name is invalid.
</P>
<P>
There is a complete description of the PCRE native API in the
<a href="pcreapi.html"><b>pcreapi</b></a>
page and a description of the POSIX API in the
<a href="pcreposix.html"><b>pcreposix</b></a>
page.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>

View File

@@ -0,0 +1,46 @@
<html>
<head>
<title>pcre_get_stringnumber specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcre_get_stringnumber man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<br><b>
SYNOPSIS
</b><br>
<P>
<b>#include &#60;pcre.h&#62;</b>
</P>
<P>
<b>int pcre_get_stringnumber(const pcre *<i>code</i>,</b>
<b>const char *<i>name</i>);</b>
</P>
<br><b>
DESCRIPTION
</b><br>
<P>
This convenience function finds the number of a named substring capturing
parenthesis in a compiled pattern. Its arguments are:
<pre>
<i>code</i> Compiled regular expression
<i>name</i> Name whose number is required
</pre>
The yield of the function is the number of the parenthesis if the name is
found, or PCRE_ERROR_NOSUBSTRING otherwise.
</P>
<P>
There is a complete description of the PCRE native API in the
<a href="pcreapi.html"><b>pcreapi</b></a>
page and a description of the POSIX API in the
<a href="pcreposix.html"><b>pcreposix</b></a>
page.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>

View File

@@ -0,0 +1,52 @@
<html>
<head>
<title>pcre_get_stringtable_entries specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcre_get_stringtable_entries man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<br><b>
SYNOPSIS
</b><br>
<P>
<b>#include &#60;pcre.h&#62;</b>
</P>
<P>
<b>int pcre_get_stringtable_entries(const pcre *<i>code</i>,</b>
<b>const char *<i>name</i>, char **<i>first</i>, char **<i>last</i>);</b>
</P>
<br><b>
DESCRIPTION
</b><br>
<P>
This convenience function finds, for a compiled pattern, the first and last
entries for a given name in the table that translates capturing parenthesis
names into numbers. When names are required to be unique (PCRE_DUPNAMES is
<i>not</i> set), it is usually easier to use <b>pcre_get_stringnumber()</b>
instead.
<pre>
<i>code</i> Compiled regular expression
<i>name</i> Name whose entries required
<i>first</i> Where to return a pointer to the first entry
<i>last</i> Where to return a pointer to the last entry
</pre>
The yield of the function is the length of each entry, or
PCRE_ERROR_NOSUBSTRING if none are found.
</P>
<P>
There is a complete description of the PCRE native API, including the format of
the table entries, in the
<a href="pcreapi.html"><b>pcreapi</b></a>
page and a description of the POSIX API in the
<a href="pcreposix.html"><b>pcreposix</b></a>
page.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>

View File

@@ -0,0 +1,52 @@
<html>
<head>
<title>pcre_get_substring specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcre_get_substring man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<br><b>
SYNOPSIS
</b><br>
<P>
<b>#include &#60;pcre.h&#62;</b>
</P>
<P>
<b>int pcre_get_substring(const char *<i>subject</i>, int *<i>ovector</i>,</b>
<b>int <i>stringcount</i>, int <i>stringnumber</i>,</b>
<b>const char **<i>stringptr</i>);</b>
</P>
<br><b>
DESCRIPTION
</b><br>
<P>
This is a convenience function for extracting a captured substring. The
arguments are:
<pre>
<i>subject</i> Subject that has been successfully matched
<i>ovector</i> Offset vector that <b>pcre_exec()</b> used
<i>stringcount</i> Value returned by <b>pcre_exec()</b>
<i>stringnumber</i> Number of the required substring
<i>stringptr</i> Where to put the string pointer
</pre>
The memory in which the substring is placed is obtained by calling
<b>pcre_malloc()</b>. The yield of the function is the length of the substring,
PCRE_ERROR_NOMEMORY if sufficient memory could not be obtained, or
PCRE_ERROR_NOSUBSTRING if the string number is invalid.
</P>
<P>
There is a complete description of the PCRE native API in the
<a href="pcreapi.html"><b>pcreapi</b></a>
page and a description of the POSIX API in the
<a href="pcreposix.html"><b>pcreposix</b></a>
page.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>

View File

@@ -0,0 +1,51 @@
<html>
<head>
<title>pcre_get_substring_list specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcre_get_substring_list man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<br><b>
SYNOPSIS
</b><br>
<P>
<b>#include &#60;pcre.h&#62;</b>
</P>
<P>
<b>int pcre_get_substring_list(const char *<i>subject</i>,</b>
<b>int *<i>ovector</i>, int <i>stringcount</i>, const char ***<i>listptr</i>);</b>
</P>
<br><b>
DESCRIPTION
</b><br>
<P>
This is a convenience function for extracting a list of all the captured
substrings. The arguments are:
<pre>
<i>subject</i> Subject that has been successfully matched
<i>ovector</i> Offset vector that <b>pcre_exec</b> used
<i>stringcount</i> Value returned by <b>pcre_exec</b>
<i>listptr</i> Where to put a pointer to the list
</pre>
The memory in which the substrings and the list are placed is obtained by
calling <b>pcre_malloc()</b>. A pointer to a list of pointers is put in
the variable whose address is in <i>listptr</i>. The list is terminated by a
NULL pointer. The yield of the function is zero on success or
PCRE_ERROR_NOMEMORY if sufficient memory could not be obtained.
</P>
<P>
There is a complete description of the PCRE native API in the
<a href="pcreapi.html"><b>pcreapi</b></a>
page and a description of the POSIX API in the
<a href="pcreposix.html"><b>pcreposix</b></a>
page.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>

View File

@@ -0,0 +1,39 @@
<html>
<head>
<title>pcre_info specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcre_info man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<br><b>
SYNOPSIS
</b><br>
<P>
<b>#include &#60;pcre.h&#62;</b>
</P>
<P>
<b>int pcre_info(const pcre *<i>code</i>, int *<i>optptr</i>, int</b>
<b>*<i>firstcharptr</i>);</b>
</P>
<br><b>
DESCRIPTION
</b><br>
<P>
This function is obsolete. You should be using <b>pcre_fullinfo()</b> instead.
</P>
<P>
There is a complete description of the PCRE native API in the
<a href="pcreapi.html"><b>pcreapi</b></a>
page and a description of the POSIX API in the
<a href="pcreposix.html"><b>pcreposix</b></a>
page.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>

View File

@@ -0,0 +1,42 @@
<html>
<head>
<title>pcre_maketables specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcre_maketables man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<br><b>
SYNOPSIS
</b><br>
<P>
<b>#include &#60;pcre.h&#62;</b>
</P>
<P>
<b>const unsigned char *pcre_maketables(void);</b>
</P>
<br><b>
DESCRIPTION
</b><br>
<P>
This function builds a set of character tables for character values less than
256. These can be passed to <b>pcre_compile()</b> to override PCRE's internal,
built-in tables (which were made by <b>pcre_maketables()</b> when PCRE was
compiled). You might want to do this if you are using a non-standard locale.
The function yields a pointer to the tables.
</P>
<P>
There is a complete description of the PCRE native API in the
<a href="pcreapi.html"><b>pcreapi</b></a>
page and a description of the POSIX API in the
<a href="pcreposix.html"><b>pcreposix</b></a>
page.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>

View File

@@ -0,0 +1,45 @@
<html>
<head>
<title>pcre_refcount specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcre_refcount man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<br><b>
SYNOPSIS
</b><br>
<P>
<b>#include &#60;pcre.h&#62;</b>
</P>
<P>
<b>int pcre_refcount(pcre *<i>code</i>, int <i>adjust</i>);</b>
</P>
<br><b>
DESCRIPTION
</b><br>
<P>
This function is used to maintain a reference count inside a data block that
contains a compiled pattern. Its arguments are:
<pre>
<i>code</i> Compiled regular expression
<i>adjust</i> Adjustment to reference value
</pre>
The yield of the function is the adjusted reference value, which is constrained
to lie between 0 and 65535.
</P>
<P>
There is a complete description of the PCRE native API in the
<a href="pcreapi.html"><b>pcreapi</b></a>
page and a description of the POSIX API in the
<a href="pcreposix.html"><b>pcreposix</b></a>
page.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>

View File

@@ -0,0 +1,56 @@
<html>
<head>
<title>pcre_study specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcre_study man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<br><b>
SYNOPSIS
</b><br>
<P>
<b>#include &#60;pcre.h&#62;</b>
</P>
<P>
<b>pcre_extra *pcre_study(const pcre *<i>code</i>, int <i>options</i>,</b>
<b>const char **<i>errptr</i>);</b>
</P>
<br><b>
DESCRIPTION
</b><br>
<P>
This function studies a compiled pattern, to see if additional information can
be extracted that might speed up matching. Its arguments are:
<pre>
<i>code</i> A compiled regular expression
<i>options</i> Options for <b>pcre_study()</b>
<i>errptr</i> Where to put an error message
</pre>
If the function succeeds, it returns a value that can be passed to
<b>pcre_exec()</b> via its <i>extra</i> argument.
</P>
<P>
If the function returns NULL, either it could not find any additional
information, or there was an error. You can tell the difference by looking at
the error value. It is NULL in first case.
</P>
<P>
There are currently no options defined; the value of the second argument should
always be zero.
</P>
<P>
There is a complete description of the PCRE native API in the
<a href="pcreapi.html"><b>pcreapi</b></a>
page and a description of the POSIX API in the
<a href="pcreposix.html"><b>pcreposix</b></a>
page.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>

View File

@@ -0,0 +1,39 @@
<html>
<head>
<title>pcre_version specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcre_version man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<br><b>
SYNOPSIS
</b><br>
<P>
<b>#include &#60;pcre.h&#62;</b>
</P>
<P>
<b>char *pcre_version(void);</b>
</P>
<br><b>
DESCRIPTION
</b><br>
<P>
This function returns a character string that gives the version number of the
PCRE library and the date of its release.
</P>
<P>
There is a complete description of the PCRE native API in the
<a href="pcreapi.html"><b>pcreapi</b></a>
page and a description of the POSIX API in the
<a href="pcreposix.html"><b>pcreposix</b></a>
page.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,225 @@
<html>
<head>
<title>pcrebuild specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcrebuild man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<ul>
<li><a name="TOC1" href="#SEC1">PCRE BUILD-TIME OPTIONS</a>
<li><a name="TOC2" href="#SEC2">C++ SUPPORT</a>
<li><a name="TOC3" href="#SEC3">UTF-8 SUPPORT</a>
<li><a name="TOC4" href="#SEC4">UNICODE CHARACTER PROPERTY SUPPORT</a>
<li><a name="TOC5" href="#SEC5">CODE VALUE OF NEWLINE</a>
<li><a name="TOC6" href="#SEC6">BUILDING SHARED AND STATIC LIBRARIES</a>
<li><a name="TOC7" href="#SEC7">POSIX MALLOC USAGE</a>
<li><a name="TOC8" href="#SEC8">HANDLING VERY LARGE PATTERNS</a>
<li><a name="TOC9" href="#SEC9">AVOIDING EXCESSIVE STACK USAGE</a>
<li><a name="TOC10" href="#SEC10">LIMITING PCRE RESOURCE USAGE</a>
<li><a name="TOC11" href="#SEC11">USING EBCDIC CODE</a>
</ul>
<br><a name="SEC1" href="#TOC1">PCRE BUILD-TIME OPTIONS</a><br>
<P>
This document describes the optional features of PCRE that can be selected when
the library is compiled. They are all selected, or deselected, by providing
options to the <b>configure</b> script that is run before the <b>make</b>
command. The complete list of options for <b>configure</b> (which includes the
standard ones such as the selection of the installation directory) can be
obtained by running
<pre>
./configure --help
</pre>
The following sections describe certain options whose names begin with --enable
or --disable. These settings specify changes to the defaults for the
<b>configure</b> command. Because of the way that <b>configure</b> works,
--enable and --disable always come in pairs, so the complementary option always
exists as well, but as it specifies the default, it is not described.
</P>
<br><a name="SEC2" href="#TOC1">C++ SUPPORT</a><br>
<P>
By default, the <b>configure</b> script will search for a C++ compiler and C++
header files. If it finds them, it automatically builds the C++ wrapper library
for PCRE. You can disable this by adding
<pre>
--disable-cpp
</pre>
to the <b>configure</b> command.
</P>
<br><a name="SEC3" href="#TOC1">UTF-8 SUPPORT</a><br>
<P>
To build PCRE with support for UTF-8 character strings, add
<pre>
--enable-utf8
</pre>
to the <b>configure</b> command. Of itself, this does not make PCRE treat
strings as UTF-8. As well as compiling PCRE with this option, you also have
have to set the PCRE_UTF8 option when you call the <b>pcre_compile()</b>
function.
</P>
<br><a name="SEC4" href="#TOC1">UNICODE CHARACTER PROPERTY SUPPORT</a><br>
<P>
UTF-8 support allows PCRE to process character values greater than 255 in the
strings that it handles. On its own, however, it does not provide any
facilities for accessing the properties of such characters. If you want to be
able to use the pattern escapes \P, \p, and \X, which refer to Unicode
character properties, you must add
<pre>
--enable-unicode-properties
</pre>
to the <b>configure</b> command. This implies UTF-8 support, even if you have
not explicitly requested it.
</P>
<P>
Including Unicode property support adds around 90K of tables to the PCRE
library, approximately doubling its size. Only the general category properties
such as <i>Lu</i> and <i>Nd</i> are supported. Details are given in the
<a href="pcrepattern.html"><b>pcrepattern</b></a>
documentation.
</P>
<br><a name="SEC5" href="#TOC1">CODE VALUE OF NEWLINE</a><br>
<P>
By default, PCRE interprets character 10 (linefeed, LF) as indicating the end
of a line. This is the normal newline character on Unix-like systems. You can
compile PCRE to use character 13 (carriage return, CR) instead, by adding
<pre>
--enable-newline-is-cr
</pre>
to the <b>configure</b> command. There is also a --enable-newline-is-lf option,
which explicitly specifies linefeed as the newline character.
<br>
<br>
Alternatively, you can specify that line endings are to be indicated by the two
character sequence CRLF. If you want this, add
<pre>
--enable-newline-is-crlf
</pre>
to the <b>configure</b> command. Whatever line ending convention is selected
when PCRE is built can be overridden when the library functions are called. At
build time it is conventional to use the standard for your operating system.
</P>
<br><a name="SEC6" href="#TOC1">BUILDING SHARED AND STATIC LIBRARIES</a><br>
<P>
The PCRE building process uses <b>libtool</b> to build both shared and static
Unix libraries by default. You can suppress one of these by adding one of
<pre>
--disable-shared
--disable-static
</pre>
to the <b>configure</b> command, as required.
</P>
<br><a name="SEC7" href="#TOC1">POSIX MALLOC USAGE</a><br>
<P>
When PCRE is called through the POSIX interface (see the
<a href="pcreposix.html"><b>pcreposix</b></a>
documentation), additional working storage is required for holding the pointers
to capturing substrings, because PCRE requires three integers per substring,
whereas the POSIX interface provides only two. If the number of expected
substrings is small, the wrapper function uses space on the stack, because this
is faster than using <b>malloc()</b> for each call. The default threshold above
which the stack is no longer used is 10; it can be changed by adding a setting
such as
<pre>
--with-posix-malloc-threshold=20
</pre>
to the <b>configure</b> command.
</P>
<br><a name="SEC8" href="#TOC1">HANDLING VERY LARGE PATTERNS</a><br>
<P>
Within a compiled pattern, offset values are used to point from one part to
another (for example, from an opening parenthesis to an alternation
metacharacter). By default, two-byte values are used for these offsets, leading
to a maximum size for a compiled pattern of around 64K. This is sufficient to
handle all but the most gigantic patterns. Nevertheless, some people do want to
process enormous patterns, so it is possible to compile PCRE to use three-byte
or four-byte offsets by adding a setting such as
<pre>
--with-link-size=3
</pre>
to the <b>configure</b> command. The value given must be 2, 3, or 4. Using
longer offsets slows down the operation of PCRE because it has to load
additional bytes when handling them.
</P>
<P>
If you build PCRE with an increased link size, test 2 (and test 5 if you are
using UTF-8) will fail. Part of the output of these tests is a representation
of the compiled pattern, and this changes with the link size.
</P>
<br><a name="SEC9" href="#TOC1">AVOIDING EXCESSIVE STACK USAGE</a><br>
<P>
When matching with the <b>pcre_exec()</b> function, PCRE implements backtracking
by making recursive calls to an internal function called <b>match()</b>. In
environments where the size of the stack is limited, this can severely limit
PCRE's operation. (The Unix environment does not usually suffer from this
problem, but it may sometimes be necessary to increase the maximum stack size.
There is a discussion in the
<a href="pcrestack.html"><b>pcrestack</b></a>
documentation.) An alternative approach to recursion that uses memory from the
heap to remember data, instead of using recursive function calls, has been
implemented to work round the problem of limited stack size. If you want to
build a version of PCRE that works this way, add
<pre>
--disable-stack-for-recursion
</pre>
to the <b>configure</b> command. With this configuration, PCRE will use the
<b>pcre_stack_malloc</b> and <b>pcre_stack_free</b> variables to call memory
management functions. Separate functions are provided because the usage is very
predictable: the block sizes requested are always the same, and the blocks are
always freed in reverse order. A calling program might be able to implement
optimized functions that perform better than the standard <b>malloc()</b> and
<b>free()</b> functions. PCRE runs noticeably more slowly when built in this
way. This option affects only the <b>pcre_exec()</b> function; it is not
relevant for the the <b>pcre_dfa_exec()</b> function.
</P>
<br><a name="SEC10" href="#TOC1">LIMITING PCRE RESOURCE USAGE</a><br>
<P>
Internally, PCRE has a function called <b>match()</b>, which it calls repeatedly
(sometimes recursively) when matching a pattern with the <b>pcre_exec()</b>
function. By controlling the maximum number of times this function may be
called during a single matching operation, a limit can be placed on the
resources used by a single call to <b>pcre_exec()</b>. The limit can be changed
at run time, as described in the
<a href="pcreapi.html"><b>pcreapi</b></a>
documentation. The default is 10 million, but this can be changed by adding a
setting such as
<pre>
--with-match-limit=500000
</pre>
to the <b>configure</b> command. This setting has no effect on the
<b>pcre_dfa_exec()</b> matching function.
</P>
<P>
In some environments it is desirable to limit the depth of recursive calls of
<b>match()</b> more strictly than the total number of calls, in order to
restrict the maximum amount of stack (or heap, if --disable-stack-for-recursion
is specified) that is used. A second limit controls this; it defaults to the
value that is set for --with-match-limit, which imposes no additional
constraints. However, you can set a lower limit by adding, for example,
<pre>
--with-match-limit-recursion=10000
</pre>
to the <b>configure</b> command. This value can also be overridden at run time.
</P>
<br><a name="SEC11" href="#TOC1">USING EBCDIC CODE</a><br>
<P>
PCRE assumes by default that it will run in an environment where the character
code is ASCII (or Unicode, which is a superset of ASCII). PCRE can, however, be
compiled to run in an EBCDIC environment by adding
<pre>
--enable-ebcdic
</pre>
to the <b>configure</b> command.
</P>
<P>
Last updated: 06 June 2006
<br>
Copyright &copy; 1997-2006 University of Cambridge.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>

View File

@@ -0,0 +1,186 @@
<html>
<head>
<title>pcrecallout specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcrecallout man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<ul>
<li><a name="TOC1" href="#SEC1">PCRE CALLOUTS</a>
<li><a name="TOC2" href="#SEC2">MISSING CALLOUTS</a>
<li><a name="TOC3" href="#SEC3">THE CALLOUT INTERFACE</a>
<li><a name="TOC4" href="#SEC4">RETURN VALUES</a>
</ul>
<br><a name="SEC1" href="#TOC1">PCRE CALLOUTS</a><br>
<P>
<b>int (*pcre_callout)(pcre_callout_block *);</b>
</P>
<P>
PCRE provides a feature called "callout", which is a means of temporarily
passing control to the caller of PCRE in the middle of pattern matching. The
caller of PCRE provides an external function by putting its entry point in the
global variable <i>pcre_callout</i>. By default, this variable contains NULL,
which disables all calling out.
</P>
<P>
Within a regular expression, (?C) indicates the points at which the external
function is to be called. Different callout points can be identified by putting
a number less than 256 after the letter C. The default value is zero.
For example, this pattern has two callout points:
<pre>
(?C1)\deabc(?C2)def
</pre>
If the PCRE_AUTO_CALLOUT option bit is set when <b>pcre_compile()</b> is called,
PCRE automatically inserts callouts, all with number 255, before each item in
the pattern. For example, if PCRE_AUTO_CALLOUT is used with the pattern
<pre>
A(\d{2}|--)
</pre>
it is processed as if it were
<br>
<br>
(?C255)A(?C255)((?C255)\d{2}(?C255)|(?C255)-(?C255)-(?C255))(?C255)
<br>
<br>
Notice that there is a callout before and after each parenthesis and
alternation bar. Automatic callouts can be used for tracking the progress of
pattern matching. The
<a href="pcretest.html"><b>pcretest</b></a>
command has an option that sets automatic callouts; when it is used, the output
indicates how the pattern is matched. This is useful information when you are
trying to optimize the performance of a particular pattern.
</P>
<br><a name="SEC2" href="#TOC1">MISSING CALLOUTS</a><br>
<P>
You should be aware that, because of optimizations in the way PCRE matches
patterns, callouts sometimes do not happen. For example, if the pattern is
<pre>
ab(?C4)cd
</pre>
PCRE knows that any matching string must contain the letter "d". If the subject
string is "abyz", the lack of "d" means that matching doesn't ever start, and
the callout is never reached. However, with "abyd", though the result is still
no match, the callout is obeyed.
</P>
<br><a name="SEC3" href="#TOC1">THE CALLOUT INTERFACE</a><br>
<P>
During matching, when PCRE reaches a callout point, the external function
defined by <i>pcre_callout</i> is called (if it is set). This applies to both
the <b>pcre_exec()</b> and the <b>pcre_dfa_exec()</b> matching functions. The
only argument to the callout function is a pointer to a <b>pcre_callout</b>
block. This structure contains the following fields:
<pre>
int <i>version</i>;
int <i>callout_number</i>;
int *<i>offset_vector</i>;
const char *<i>subject</i>;
int <i>subject_length</i>;
int <i>start_match</i>;
int <i>current_position</i>;
int <i>capture_top</i>;
int <i>capture_last</i>;
void *<i>callout_data</i>;
int <i>pattern_position</i>;
int <i>next_item_length</i>;
</pre>
The <i>version</i> field is an integer containing the version number of the
block format. The initial version was 0; the current version is 1. The version
number will change again in future if additional fields are added, but the
intention is never to remove any of the existing fields.
</P>
<P>
The <i>callout_number</i> field contains the number of the callout, as compiled
into the pattern (that is, the number after ?C for manual callouts, and 255 for
automatically generated callouts).
</P>
<P>
The <i>offset_vector</i> field is a pointer to the vector of offsets that was
passed by the caller to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>. When
<b>pcre_exec()</b> is used, the contents can be inspected in order to extract
substrings that have been matched so far, in the same way as for extracting
substrings after a match has completed. For <b>pcre_dfa_exec()</b> this field is
not useful.
</P>
<P>
The <i>subject</i> and <i>subject_length</i> fields contain copies of the values
that were passed to <b>pcre_exec()</b>.
</P>
<P>
The <i>start_match</i> field contains the offset within the subject at which the
current match attempt started. If the pattern is not anchored, the callout
function may be called several times from the same point in the pattern for
different starting points in the subject.
</P>
<P>
The <i>current_position</i> field contains the offset within the subject of the
current match pointer.
</P>
<P>
When the <b>pcre_exec()</b> function is used, the <i>capture_top</i> field
contains one more than the number of the highest numbered captured substring so
far. If no substrings have been captured, the value of <i>capture_top</i> is
one. This is always the case when <b>pcre_dfa_exec()</b> is used, because it
does not support captured substrings.
</P>
<P>
The <i>capture_last</i> field contains the number of the most recently captured
substring. If no substrings have been captured, its value is -1. This is always
the case when <b>pcre_dfa_exec()</b> is used.
</P>
<P>
The <i>callout_data</i> field contains a value that is passed to
<b>pcre_exec()</b> or <b>pcre_dfa_exec()</b> specifically so that it can be
passed back in callouts. It is passed in the <i>pcre_callout</i> field of the
<b>pcre_extra</b> data structure. If no such data was passed, the value of
<i>callout_data</i> in a <b>pcre_callout</b> block is NULL. There is a
description of the <b>pcre_extra</b> structure in the
<a href="pcreapi.html"><b>pcreapi</b></a>
documentation.
</P>
<P>
The <i>pattern_position</i> field is present from version 1 of the
<i>pcre_callout</i> structure. It contains the offset to the next item to be
matched in the pattern string.
</P>
<P>
The <i>next_item_length</i> field is present from version 1 of the
<i>pcre_callout</i> structure. It contains the length of the next item to be
matched in the pattern string. When the callout immediately precedes an
alternation bar, a closing parenthesis, or the end of the pattern, the length
is zero. When the callout precedes an opening parenthesis, the length is that
of the entire subpattern.
</P>
<P>
The <i>pattern_position</i> and <i>next_item_length</i> fields are intended to
help in distinguishing between different automatic callouts, which all have the
same callout number. However, they are set for all callouts.
</P>
<br><a name="SEC4" href="#TOC1">RETURN VALUES</a><br>
<P>
The external callout function returns an integer to PCRE. If the value is zero,
matching proceeds as normal. If the value is greater than zero, matching fails
at the current point, but the testing of other matching possibilities goes
ahead, just as if a lookahead assertion had failed. If the value is less than
zero, the match is abandoned, and <b>pcre_exec()</b> (or <b>pcre_dfa_exec()</b>)
returns the negative value.
</P>
<P>
Negative values should normally be chosen from the set of PCRE_ERROR_xxx
values. In particular, PCRE_ERROR_NOMATCH forces a standard "no match" failure.
The error number PCRE_ERROR_CALLOUT is reserved for use by callout functions;
it will never be used by PCRE itself.
</P>
<P>
Last updated: 28 February 2005
<br>
Copyright &copy; 1997-2005 University of Cambridge.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>

View File

@@ -0,0 +1,156 @@
<html>
<head>
<title>pcrecompat specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcrecompat man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<br><b>
DIFFERENCES BETWEEN PCRE AND PERL
</b><br>
<P>
This document describes the differences in the ways that PCRE and Perl handle
regular expressions. The differences described here are with respect to Perl
5.8.
</P>
<P>
1. PCRE has only a subset of Perl's UTF-8 and Unicode support. Details of what
it does have are given in the
<a href="pcre.html#utf8support">section on UTF-8 support</a>
in the main
<a href="pcre.html"><b>pcre</b></a>
page.
</P>
<P>
2. PCRE does not allow repeat quantifiers on lookahead assertions. Perl permits
them, but they do not mean what you might think. For example, (?!a){3} does
not assert that the next three characters are not "a". It just asserts that the
next character is not "a" three times.
</P>
<P>
3. Capturing subpatterns that occur inside negative lookahead assertions are
counted, but their entries in the offsets vector are never set. Perl sets its
numerical variables from any such patterns that are matched before the
assertion fails to match something (thereby succeeding), but only if the
negative lookahead assertion contains just one branch.
</P>
<P>
4. Though binary zero characters are supported in the subject string, they are
not allowed in a pattern string because it is passed as a normal C string,
terminated by zero. The escape sequence \0 can be used in the pattern to
represent a binary zero.
</P>
<P>
5. The following Perl escape sequences are not supported: \l, \u, \L,
\U, and \N. In fact these are implemented by Perl's general string-handling
and are not part of its pattern matching engine. If any of these are
encountered by PCRE, an error is generated.
</P>
<P>
6. The Perl escape sequences \p, \P, and \X are supported only if PCRE is
built with Unicode character property support. The properties that can be
tested with \p and \P are limited to the general category properties such as
Lu and Nd, script names such as Greek or Han, and the derived properties Any
and L&.
</P>
<P>
7. PCRE does support the \Q...\E escape for quoting substrings. Characters in
between are treated as literals. This is slightly different from Perl in that $
and @ are also handled as literals inside the quotes. In Perl, they cause
variable interpolation (but of course PCRE does not have variables). Note the
following examples:
<pre>
Pattern PCRE matches Perl matches
\Qabc$xyz\E abc$xyz abc followed by the contents of $xyz
\Qabc\$xyz\E abc\$xyz abc\$xyz
\Qabc\E\$\Qxyz\E abc$xyz abc$xyz
</pre>
The \Q...\E sequence is recognized both inside and outside character classes.
</P>
<P>
8. Fairly obviously, PCRE does not support the (?{code}) and (?p{code})
constructions. However, there is support for recursive patterns using the
non-Perl items (?R), (?number), and (?P&#62;name). Also, the PCRE "callout" feature
allows an external function to be called during pattern matching. See the
<a href="pcrecallout.html"><b>pcrecallout</b></a>
documentation for details.
</P>
<P>
9. There are some differences that are concerned with the settings of captured
strings when part of a pattern is repeated. For example, matching "aba" against
the pattern /^(a(b)?)+$/ in Perl leaves $2 unset, but in PCRE it is set to "b".
</P>
<P>
10. PCRE provides some extensions to the Perl regular expression facilities:
<br>
<br>
(a) Although lookbehind assertions must match fixed length strings, each
alternative branch of a lookbehind assertion can match a different length of
string. Perl requires them all to have the same length.
<br>
<br>
(b) If PCRE_DOLLAR_ENDONLY is set and PCRE_MULTILINE is not set, the $
meta-character matches only at the very end of the string.
<br>
<br>
(c) If PCRE_EXTRA is set, a backslash followed by a letter with no special
meaning is faulted. Otherwise, like Perl, the backslash is ignored. (Perl can
be made to issue a warning.)
<br>
<br>
(d) If PCRE_UNGREEDY is set, the greediness of the repetition quantifiers is
inverted, that is, by default they are not greedy, but if followed by a
question mark they are.
<br>
<br>
(e) PCRE_ANCHORED can be used at matching time to force a pattern to be tried
only at the first matching position in the subject string.
<br>
<br>
(f) The PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, and PCRE_NO_AUTO_CAPTURE
options for <b>pcre_exec()</b> have no Perl equivalents.
<br>
<br>
(g) The (?R), (?number), and (?P&#62;name) constructs allows for recursive pattern
matching (Perl can do this using the (?p{code}) construct, which PCRE cannot
support.)
<br>
<br>
(h) PCRE supports named capturing substrings, using the Python syntax.
<br>
<br>
(i) PCRE supports the possessive quantifier "++" syntax, taken from Sun's Java
package.
<br>
<br>
(j) The (R) condition, for testing recursion, is a PCRE extension.
<br>
<br>
(k) The callout facility is PCRE-specific.
<br>
<br>
(l) The partial matching facility is PCRE-specific.
<br>
<br>
(m) Patterns compiled by PCRE can be saved and re-used at a later time, even on
different hosts that have the other endianness.
<br>
<br>
(n) The alternative matching function (<b>pcre_dfa_exec()</b>) matches in a
different way and is not Perl-compatible.
</P>
<P>
Last updated: 06 June 2006
<br>
Copyright &copy; 1997-2006 University of Cambridge.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>

View File

@@ -0,0 +1,337 @@
<html>
<head>
<title>pcrecpp specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcrecpp man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<ul>
<li><a name="TOC1" href="#SEC1">SYNOPSIS OF C++ WRAPPER</a>
<li><a name="TOC2" href="#SEC2">DESCRIPTION</a>
<li><a name="TOC3" href="#SEC3">MATCHING INTERFACE</a>
<li><a name="TOC4" href="#SEC4">PARTIAL MATCHES</a>
<li><a name="TOC5" href="#SEC5">UTF-8 AND THE MATCHING INTERFACE</a>
<li><a name="TOC6" href="#SEC6">PASSING MODIFIERS TO THE REGULAR EXPRESSION ENGINE</a>
<li><a name="TOC7" href="#SEC7">SCANNING TEXT INCREMENTALLY</a>
<li><a name="TOC8" href="#SEC8">PARSING HEX/OCTAL/C-RADIX NUMBERS</a>
<li><a name="TOC9" href="#SEC9">REPLACING PARTS OF STRINGS</a>
<li><a name="TOC10" href="#SEC10">AUTHOR</a>
</ul>
<br><a name="SEC1" href="#TOC1">SYNOPSIS OF C++ WRAPPER</a><br>
<P>
<b>#include &#60;pcrecpp.h&#62;</b>
</P>
<P>
</P>
<br><a name="SEC2" href="#TOC1">DESCRIPTION</a><br>
<P>
The C++ wrapper for PCRE was provided by Google Inc. Some additional
functionality was added by Giuseppe Maxia. This brief man page was constructed
from the notes in the <i>pcrecpp.h</i> file, which should be consulted for
further details.
</P>
<br><a name="SEC3" href="#TOC1">MATCHING INTERFACE</a><br>
<P>
The "FullMatch" operation checks that supplied text matches a supplied pattern
exactly. If pointer arguments are supplied, it copies matched sub-strings that
match sub-patterns into them.
<pre>
Example: successful match
pcrecpp::RE re("h.*o");
re.FullMatch("hello");
Example: unsuccessful match (requires full match):
pcrecpp::RE re("e");
!re.FullMatch("hello");
Example: creating a temporary RE object:
pcrecpp::RE("h.*o").FullMatch("hello");
</pre>
You can pass in a "const char*" or a "string" for "text". The examples below
tend to use a const char*. You can, as in the different examples above, store
the RE object explicitly in a variable or use a temporary RE object. The
examples below use one mode or the other arbitrarily. Either could correctly be
used for any of these examples.
</P>
<P>
You must supply extra pointer arguments to extract matched subpieces.
<pre>
Example: extracts "ruby" into "s" and 1234 into "i"
int i;
string s;
pcrecpp::RE re("(\\w+):(\\d+)");
re.FullMatch("ruby:1234", &s, &i);
Example: does not try to extract any extra sub-patterns
re.FullMatch("ruby:1234", &s);
Example: does not try to extract into NULL
re.FullMatch("ruby:1234", NULL, &i);
Example: integer overflow causes failure
!re.FullMatch("ruby:1234567891234", NULL, &i);
Example: fails because there aren't enough sub-patterns:
!pcrecpp::RE("\\w+:\\d+").FullMatch("ruby:1234", &s);
Example: fails because string cannot be stored in integer
!pcrecpp::RE("(.*)").FullMatch("ruby", &i);
</pre>
The provided pointer arguments can be pointers to any scalar numeric
type, or one of:
<pre>
string (matched piece is copied to string)
StringPiece (StringPiece is mutated to point to matched piece)
T (where "bool T::ParseFrom(const char*, int)" exists)
NULL (the corresponding matched sub-pattern is not copied)
</pre>
The function returns true iff all of the following conditions are satisfied:
<pre>
a. "text" matches "pattern" exactly;
b. The number of matched sub-patterns is &#62;= number of supplied
pointers;
c. The "i"th argument has a suitable type for holding the
string captured as the "i"th sub-pattern. If you pass in
NULL for the "i"th argument, or pass fewer arguments than
number of sub-patterns, "i"th captured sub-pattern is
ignored.
</pre>
The matching interface supports at most 16 arguments per call.
If you need more, consider using the more general interface
<b>pcrecpp::RE::DoMatch</b>. See <b>pcrecpp.h</b> for the signature for
<b>DoMatch</b>.
</P>
<br><a name="SEC4" href="#TOC1">PARTIAL MATCHES</a><br>
<P>
You can use the "PartialMatch" operation when you want the pattern
to match any substring of the text.
<pre>
Example: simple search for a string:
pcrecpp::RE("ell").PartialMatch("hello");
Example: find first number in a string:
int number;
pcrecpp::RE re("(\\d+)");
re.PartialMatch("x*100 + 20", &number);
assert(number == 100);
</PRE>
</P>
<br><a name="SEC5" href="#TOC1">UTF-8 AND THE MATCHING INTERFACE</a><br>
<P>
By default, pattern and text are plain text, one byte per character. The UTF8
flag, passed to the constructor, causes both pattern and string to be treated
as UTF-8 text, still a byte stream but potentially multiple bytes per
character. In practice, the text is likelier to be UTF-8 than the pattern, but
the match returned may depend on the UTF8 flag, so always use it when matching
UTF8 text. For example, "." will match one byte normally but with UTF8 set may
match up to three bytes of a multi-byte character.
<pre>
Example:
pcrecpp::RE_Options options;
options.set_utf8();
pcrecpp::RE re(utf8_pattern, options);
re.FullMatch(utf8_string);
Example: using the convenience function UTF8():
pcrecpp::RE re(utf8_pattern, pcrecpp::UTF8());
re.FullMatch(utf8_string);
</pre>
NOTE: The UTF8 flag is ignored if pcre was not configured with the
<pre>
--enable-utf8 flag.
</PRE>
</P>
<br><a name="SEC6" href="#TOC1">PASSING MODIFIERS TO THE REGULAR EXPRESSION ENGINE</a><br>
<P>
PCRE defines some modifiers to change the behavior of the regular expression
engine. The C++ wrapper defines an auxiliary class, RE_Options, as a vehicle to
pass such modifiers to a RE class. Currently, the following modifiers are
supported:
<pre>
modifier description Perl corresponding
PCRE_CASELESS case insensitive match /i
PCRE_MULTILINE multiple lines match /m
PCRE_DOTALL dot matches newlines /s
PCRE_DOLLAR_ENDONLY $ matches only at end N/A
PCRE_EXTRA strict escape parsing N/A
PCRE_EXTENDED ignore whitespaces /x
PCRE_UTF8 handles UTF8 chars built-in
PCRE_UNGREEDY reverses * and *? N/A
PCRE_NO_AUTO_CAPTURE disables capturing parens N/A (*)
</pre>
(*) Both Perl and PCRE allow non capturing parentheses by means of the
"?:" modifier within the pattern itself. e.g. (?:ab|cd) does not
capture, while (ab|cd) does.
</P>
<P>
For a full account on how each modifier works, please check the
PCRE API reference page.
</P>
<P>
For each modifier, there are two member functions whose name is made
out of the modifier in lowercase, without the "PCRE_" prefix. For
instance, PCRE_CASELESS is handled by
<pre>
bool caseless()
</pre>
which returns true if the modifier is set, and
<pre>
RE_Options & set_caseless(bool)
</pre>
which sets or unsets the modifier. Moreover, PCRE_EXTRA_MATCH_LIMIT can be
accessed through the <b>set_match_limit()</b> and <b>match_limit()</b> member
functions. Setting <i>match_limit</i> to a non-zero value will limit the
execution of pcre to keep it from doing bad things like blowing the stack or
taking an eternity to return a result. A value of 5000 is good enough to stop
stack blowup in a 2MB thread stack. Setting <i>match_limit</i> to zero disables
match limiting. Alternatively, you can call <b>match_limit_recursion()</b>
which uses PCRE_EXTRA_MATCH_LIMIT_RECURSION to limit how much PCRE
recurses. <b>match_limit()</b> limits the number of matches PCRE does;
<b>match_limit_recursion()</b> limits the depth of internal recursion, and
therefore the amount of stack that is used.
</P>
<P>
Normally, to pass one or more modifiers to a RE class, you declare
a <i>RE_Options</i> object, set the appropriate options, and pass this
object to a RE constructor. Example:
<pre>
RE_options opt;
opt.set_caseless(true);
if (RE("HELLO", opt).PartialMatch("hello world")) ...
</pre>
RE_options has two constructors. The default constructor takes no arguments and
creates a set of flags that are off by default. The optional parameter
<i>option_flags</i> is to facilitate transfer of legacy code from C programs.
This lets you do
<pre>
RE(pattern,
RE_Options(PCRE_CASELESS|PCRE_MULTILINE)).PartialMatch(str);
</pre>
However, new code is better off doing
<pre>
RE(pattern,
RE_Options().set_caseless(true).set_multiline(true))
.PartialMatch(str);
</pre>
If you are going to pass one of the most used modifiers, there are some
convenience functions that return a RE_Options class with the
appropriate modifier already set: <b>CASELESS()</b>, <b>UTF8()</b>,
<b>MULTILINE()</b>, <b>DOTALL</b>(), and <b>EXTENDED()</b>.
</P>
<P>
If you need to set several options at once, and you don't want to go through
the pains of declaring a RE_Options object and setting several options, there
is a parallel method that give you such ability on the fly. You can concatenate
several <b>set_xxxxx()</b> member functions, since each of them returns a
reference to its class object. For example, to pass PCRE_CASELESS,
PCRE_EXTENDED, and PCRE_MULTILINE to a RE with one statement, you may write:
<pre>
RE(" ^ xyz \\s+ .* blah$",
RE_Options()
.set_caseless(true)
.set_extended(true)
.set_multiline(true)).PartialMatch(sometext);
</PRE>
</P>
<br><a name="SEC7" href="#TOC1">SCANNING TEXT INCREMENTALLY</a><br>
<P>
The "Consume" operation may be useful if you want to repeatedly
match regular expressions at the front of a string and skip over
them as they match. This requires use of the "StringPiece" type,
which represents a sub-range of a real string. Like RE, StringPiece
is defined in the pcrecpp namespace.
<pre>
Example: read lines of the form "var = value" from a string.
string contents = ...; // Fill string somehow
pcrecpp::StringPiece input(contents); // Wrap in a StringPiece
</PRE>
</P>
<P>
<pre>
string var;
int value;
pcrecpp::RE re("(\\w+) = (\\d+)\n");
while (re.Consume(&input, &var, &value)) {
...;
}
</pre>
Each successful call to "Consume" will set "var/value", and also
advance "input" so it points past the matched text.
</P>
<P>
The "FindAndConsume" operation is similar to "Consume" but does not
anchor your match at the beginning of the string. For example, you
could extract all words from a string by repeatedly calling
<pre>
pcrecpp::RE("(\\w+)").FindAndConsume(&input, &word)
</PRE>
</P>
<br><a name="SEC8" href="#TOC1">PARSING HEX/OCTAL/C-RADIX NUMBERS</a><br>
<P>
By default, if you pass a pointer to a numeric value, the
corresponding text is interpreted as a base-10 number. You can
instead wrap the pointer with a call to one of the operators Hex(),
Octal(), or CRadix() to interpret the text in another base. The
CRadix operator interprets C-style "0" (base-8) and "0x" (base-16)
prefixes, but defaults to base-10.
<pre>
Example:
int a, b, c, d;
pcrecpp::RE re("(.*) (.*) (.*) (.*)");
re.FullMatch("100 40 0100 0x40",
pcrecpp::Octal(&a), pcrecpp::Hex(&b),
pcrecpp::CRadix(&c), pcrecpp::CRadix(&d));
</pre>
will leave 64 in a, b, c, and d.
</P>
<br><a name="SEC9" href="#TOC1">REPLACING PARTS OF STRINGS</a><br>
<P>
You can replace the first match of "pattern" in "str" with "rewrite".
Within "rewrite", backslash-escaped digits (\1 to \9) can be
used to insert text matching corresponding parenthesized group
from the pattern. \0 in "rewrite" refers to the entire matching
text. For example:
<pre>
string s = "yabba dabba doo";
pcrecpp::RE("b+").Replace("d", &s);
</pre>
will leave "s" containing "yada dabba doo". The result is true if the pattern
matches and a replacement occurs, false otherwise.
</P>
<P>
<b>GlobalReplace</b> is like <b>Replace</b> except that it replaces all
occurrences of the pattern in the string with the rewrite. Replacements are
not subject to re-matching. For example:
<pre>
string s = "yabba dabba doo";
pcrecpp::RE("b+").GlobalReplace("d", &s);
</pre>
will leave "s" containing "yada dada doo". It returns the number of
replacements made.
</P>
<P>
<b>Extract</b> is like <b>Replace</b>, except that if the pattern matches,
"rewrite" is copied into "out" (an additional argument) with substitutions.
The non-matching portions of "text" are ignored. Returns true iff a match
occurred and the extraction happened successfully; if no match occurs, the
string is left unaffected.
</P>
<br><a name="SEC10" href="#TOC1">AUTHOR</a><br>
<P>
The C++ wrapper was contributed by Google Inc.
<br>
Copyright &copy; 2005 Google Inc.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>

View File

@@ -0,0 +1,424 @@
<html>
<head>
<title>pcregrep specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcregrep man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<ul>
<li><a name="TOC1" href="#SEC1">SYNOPSIS</a>
<li><a name="TOC2" href="#SEC2">DESCRIPTION</a>
<li><a name="TOC3" href="#SEC3">OPTIONS</a>
<li><a name="TOC4" href="#SEC4">ENVIRONMENT VARIABLES</a>
<li><a name="TOC5" href="#SEC5">NEWLINES</a>
<li><a name="TOC6" href="#SEC6">OPTIONS COMPATIBILITY</a>
<li><a name="TOC7" href="#SEC7">OPTIONS WITH DATA</a>
<li><a name="TOC8" href="#SEC8">MATCHING ERRORS</a>
<li><a name="TOC9" href="#SEC9">DIAGNOSTICS</a>
<li><a name="TOC10" href="#SEC10">AUTHOR</a>
</ul>
<br><a name="SEC1" href="#TOC1">SYNOPSIS</a><br>
<P>
<b>pcregrep [options] [long options] [pattern] [path1 path2 ...]</b>
</P>
<br><a name="SEC2" href="#TOC1">DESCRIPTION</a><br>
<P>
<b>pcregrep</b> searches files for character patterns, in the same way as other
grep commands do, but it uses the PCRE regular expression library to support
patterns that are compatible with the regular expressions of Perl 5. See
<a href="pcrepattern.html"><b>pcrepattern</b></a>
for a full description of syntax and semantics of the regular expressions that
PCRE supports.
</P>
<P>
Patterns, whether supplied on the command line or in a separate file, are given
without delimiters. For example:
<pre>
pcregrep Thursday /etc/motd
</pre>
If you attempt to use delimiters (for example, by surrounding a pattern with
slashes, as is common in Perl scripts), they are interpreted as part of the
pattern. Quotes can of course be used on the command line because they are
interpreted by the shell, and indeed they are required if a pattern contains
white space or shell metacharacters.
</P>
<P>
The first argument that follows any option settings is treated as the single
pattern to be matched when neither <b>-e</b> nor <b>-f</b> is present.
Conversely, when one or both of these options are used to specify patterns, all
arguments are treated as path names. At least one of <b>-e</b>, <b>-f</b>, or an
argument pattern must be provided.
</P>
<P>
If no files are specified, <b>pcregrep</b> reads the standard input. The
standard input can also be referenced by a name consisting of a single hyphen.
For example:
<pre>
pcregrep some-pattern /file1 - /file3
</pre>
By default, each line that matches the pattern is copied to the standard
output, and if there is more than one file, the file name is output at the
start of each line. However, there are options that can change how
<b>pcregrep</b> behaves. In particular, the <b>-M</b> option makes it possible to
search for patterns that span line boundaries. What defines a line boundary is
controlled by the <b>-N</b> (<b>--newline</b>) option.
</P>
<P>
Patterns are limited to 8K or BUFSIZ characters, whichever is the greater.
BUFSIZ is defined in <b>&#60;stdio.h&#62;</b>.
</P>
<P>
If the <b>LC_ALL</b> or <b>LC_CTYPE</b> environment variable is set,
<b>pcregrep</b> uses the value to set a locale when calling the PCRE library.
The <b>--locale</b> option can be used to override this.
</P>
<br><a name="SEC3" href="#TOC1">OPTIONS</a><br>
<P>
<b>--</b>
This terminate the list of options. It is useful if the next item on the
command line starts with a hyphen but is not an option. This allows for the
processing of patterns and filenames that start with hyphens.
</P>
<P>
<b>-A</b> <i>number</i>, <b>--after-context=</b><i>number</i>
Output <i>number</i> lines of context after each matching line. If filenames
and/or line numbers are being output, a hyphen separator is used instead of a
colon for the context lines. A line containing "--" is output between each
group of lines, unless they are in fact contiguous in the input file. The value
of <i>number</i> is expected to be relatively small. However, <b>pcregrep</b>
guarantees to have up to 8K of following text available for context output.
</P>
<P>
<b>-B</b> <i>number</i>, <b>--before-context=</b><i>number</i>
Output <i>number</i> lines of context before each matching line. If filenames
and/or line numbers are being output, a hyphen separator is used instead of a
colon for the context lines. A line containing "--" is output between each
group of lines, unless they are in fact contiguous in the input file. The value
of <i>number</i> is expected to be relatively small. However, <b>pcregrep</b>
guarantees to have up to 8K of preceding text available for context output.
</P>
<P>
<b>-C</b> <i>number</i>, <b>--context=</b><i>number</i>
Output <i>number</i> lines of context both before and after each matching line.
This is equivalent to setting both <b>-A</b> and <b>-B</b> to the same value.
</P>
<P>
<b>-c</b>, <b>--count</b>
Do not output individual lines; instead just output a count of the number of
lines that would otherwise have been output. If several files are given, a
count is output for each of them. In this mode, the <b>-A</b>, <b>-B</b>, and
<b>-C</b> options are ignored.
</P>
<P>
<b>--colour</b>, <b>--color</b>
If this option is given without any data, it is equivalent to "--colour=auto".
If data is required, it must be given in the same shell item, separated by an
equals sign.
</P>
<P>
<b>--colour=</b><i>value</i>, <b>--color=</b><i>value</i>
This option specifies under what circumstances the part of a line that matched
a pattern should be coloured in the output. The value may be "never" (the
default), "always", or "auto". In the latter case, colouring happens only if
the standard output is connected to a terminal. The colour can be specified by
setting the environment variable PCREGREP_COLOUR or PCREGREP_COLOR. The value
of this variable should be a string of two numbers, separated by a semicolon.
They are copied directly into the control string for setting colour on a
terminal, so it is your responsibility to ensure that they make sense. If
neither of the environment variables is set, the default is "1;31", which gives
red.
</P>
<P>
<b>-D</b> <i>action</i>, <b>--devices=</b><i>action</i>
If an input path is not a regular file or a directory, "action" specifies how
it is to be processed. Valid values are "read" (the default) or "skip"
(silently skip the path).
</P>
<P>
<b>-d</b> <i>action</i>, <b>--directories=</b><i>action</i>
If an input path is a directory, "action" specifies how it is to be processed.
Valid values are "read" (the default), "recurse" (equivalent to the <b>-r</b>
option), or "skip" (silently skip the path). In the default case, directories
are read as if they were ordinary files. In some operating systems the effect
of reading a directory like this is an immediate end-of-file.
</P>
<P>
<b>-e</b> <i>pattern</i>, <b>--regex=</b><i>pattern</i>,
<b>--regexp=</b><i>pattern</i> Specify a pattern to be matched. This option can
be used multiple times in order to specify several patterns. It can also be
used as a way of specifying a single pattern that starts with a hyphen. When
<b>-e</b> is used, no argument pattern is taken from the command line; all
arguments are treated as file names. There is an overall maximum of 100
patterns. They are applied to each line in the order in which they are defined
until one matches (or fails to match if <b>-v</b> is used). If <b>-f</b> is used
with <b>-e</b>, the command line patterns are matched first, followed by the
patterns from the file, independent of the order in which these options are
specified. Note that multiple use of <b>-e</b> is not the same as a single
pattern with alternatives. For example, X|Y finds the first character in a line
that is X or Y, whereas if the two patterns are given separately,
<b>pcregrep</b> finds X if it is present, even if it follows Y in the line. It
finds Y only if there is no X in the line. This really matters only if you are
using <b>-o</b> to show the portion of the line that matched.
</P>
<P>
<b>--exclude</b>=<i>pattern</i>
When <b>pcregrep</b> is searching the files in a directory as a consequence of
the <b>-r</b> (recursive search) option, any files whose names match the pattern
are excluded. The pattern is a PCRE regular expression. If a file name matches
both <b>--include</b> and <b>--exclude</b>, it is excluded. There is no short
form for this option.
</P>
<P>
<b>-F</b>, <b>--fixed-strings</b>
Interpret each pattern as a list of fixed strings, separated by newlines,
instead of as a regular expression. The <b>-w</b> (match as a word) and <b>-x</b>
(match whole line) options can be used with <b>-F</b>. They apply to each of the
fixed strings. A line is selected if any of the fixed strings are found in it
(subject to <b>-w</b> or <b>-x</b>, if present).
</P>
<P>
<b>-f</b> <i>filename</i>, <b>--file=</b><i>filename</i>
Read a number of patterns from the file, one per line, and match them against
each line of input. A data line is output if any of the patterns match it. The
filename can be given as "-" to refer to the standard input. When <b>-f</b> is
used, patterns specified on the command line using <b>-e</b> may also be
present; they are tested before the file's patterns. However, no other pattern
is taken from the command line; all arguments are treated as file names. There
is an overall maximum of 100 patterns. Trailing white space is removed from
each line, and blank lines are ignored. An empty file contains no patterns and
therefore matches nothing.
</P>
<P>
<b>-H</b>, <b>--with-filename</b>
Force the inclusion of the filename at the start of output lines when searching
a single file. By default, the filename is not shown in this case. For matching
lines, the filename is followed by a colon and a space; for context lines, a
hyphen separator is used. If a line number is also being output, it follows the
file name without a space.
</P>
<P>
<b>-h</b>, <b>--no-filename</b>
Suppress the output filenames when searching multiple files. By default,
filenames are shown when multiple files are searched. For matching lines, the
filename is followed by a colon and a space; for context lines, a hyphen
separator is used. If a line number is also being output, it follows the file
name without a space.
</P>
<P>
<b>--help</b>
Output a brief help message and exit.
</P>
<P>
<b>-i</b>, <b>--ignore-case</b>
Ignore upper/lower case distinctions during comparisons.
</P>
<P>
<b>--include</b>=<i>pattern</i>
When <b>pcregrep</b> is searching the files in a directory as a consequence of
the <b>-r</b> (recursive search) option, only those files whose names match the
pattern are included. The pattern is a PCRE regular expression. If a file name
matches both <b>--include</b> and <b>--exclude</b>, it is excluded. There is no
short form for this option.
</P>
<P>
<b>-L</b>, <b>--files-without-match</b>
Instead of outputting lines from the files, just output the names of the files
that do not contain any lines that would have been output. Each file name is
output once, on a separate line.
</P>
<P>
<b>-l</b>, <b>--files-with-matches</b>
Instead of outputting lines from the files, just output the names of the files
containing lines that would have been output. Each file name is output
once, on a separate line. Searching stops as soon as a matching line is found
in a file.
</P>
<P>
<b>--label</b>=<i>name</i>
This option supplies a name to be used for the standard input when file names
are being output. If not supplied, "(standard input)" is used. There is no
short form for this option.
</P>
<P>
<b>--locale</b>=<i>locale-name</i>
This option specifies a locale to be used for pattern matching. It overrides
the value in the <b>LC_ALL</b> or <b>LC_CTYPE</b> environment variables. If no
locale is specified, the PCRE library's default (usually the "C" locale) is
used. There is no short form for this option.
</P>
<P>
<b>-M</b>, <b>--multiline</b>
Allow patterns to match more than one line. When this option is given, patterns
may usefully contain literal newline characters and internal occurrences of ^
and $ characters. The output for any one match may consist of more than one
line. When this option is set, the PCRE library is called in "multiline" mode.
There is a limit to the number of lines that can be matched, imposed by the way
that <b>pcregrep</b> buffers the input file as it scans it. However,
<b>pcregrep</b> ensures that at least 8K characters or the rest of the document
(whichever is the shorter) are available for forward matching, and similarly
the previous 8K characters (or all the previous characters, if fewer than 8K)
are guaranteed to be available for lookbehind assertions.
</P>
<P>
<b>-N</b> <i>newline-type</i>, <b>--newline=</b><i>newline-type</i>
The PCRE library supports three different character sequences for indicating
the ends of lines. They are the single-character sequences CR (carriage return)
and LF (linefeed), and the two-character sequence CR, LF. When the library is
built, a default line-ending sequence is specified. This is normally the
standard sequence for the operating system. Unless otherwise specified by this
option, <b>pcregrep</b> uses the default. The possible values for this option
are CR, LF, or CRLF. This makes it possible to use <b>pcregrep</b> on files that
have come from other environments without having to modify their line endings.
If the data that is being scanned does not agree with the convention set by
this option, <b>pcregrep</b> may behave in strange ways.
</P>
<P>
<b>-n</b>, <b>--line-number</b>
Precede each output line by its line number in the file, followed by a colon
and a space for matching lines or a hyphen and a space for context lines. If
the filename is also being output, it precedes the line number.
</P>
<P>
<b>-o</b>, <b>--only-matching</b>
Show only the part of the line that matched a pattern. In this mode, no
context is shown. That is, the <b>-A</b>, <b>-B</b>, and <b>-C</b> options are
ignored.
</P>
<P>
<b>-q</b>, <b>--quiet</b>
Work quietly, that is, display nothing except error messages. The exit
status indicates whether or not any matches were found.
</P>
<P>
<b>-r</b>, <b>--recursive</b>
If any given path is a directory, recursively scan the files it contains,
taking note of any <b>--include</b> and <b>--exclude</b> settings. By default, a
directory is read as a normal file; in some operating systems this gives an
immediate end-of-file. This option is a shorthand for setting the <b>-d</b>
option to "recurse".
</P>
<P>
<b>-s</b>, <b>--no-messages</b>
Suppress error messages about non-existent or unreadable files. Such files are
quietly skipped. However, the return code is still 2, even if matches were
found in other files.
</P>
<P>
<b>-u</b>, <b>--utf-8</b>
Operate in UTF-8 mode. This option is available only if PCRE has been compiled
with UTF-8 support. Both patterns and subject lines must be valid strings of
UTF-8 characters.
</P>
<P>
<b>-V</b>, <b>--version</b>
Write the version numbers of <b>pcregrep</b> and the PCRE library that is being
used to the standard error stream.
</P>
<P>
<b>-v</b>, <b>--invert-match</b>
Invert the sense of the match, so that lines which do <i>not</i> match any of
the patterns are the ones that are found.
</P>
<P>
<b>-w</b>, <b>--word-regex</b>, <b>--word-regexp</b>
Force the patterns to match only whole words. This is equivalent to having \b
at the start and end of the pattern.
</P>
<P>
<b>-x</b>, <b>--line-regex</b>, \fP--line-regexp\fP
Force the patterns to be anchored (each must start matching at the beginning of
a line) and in addition, require them to match entire lines. This is
equivalent to having ^ and $ characters at the start and end of each
alternative branch in every pattern.
</P>
<br><a name="SEC4" href="#TOC1">ENVIRONMENT VARIABLES</a><br>
<P>
The environment variables <b>LC_ALL</b> and <b>LC_CTYPE</b> are examined, in that
order, for a locale. The first one that is set is used. This can be overridden
by the <b>--locale</b> option. If no locale is set, the PCRE library's default
(usually the "C" locale) is used.
</P>
<br><a name="SEC5" href="#TOC1">NEWLINES</a><br>
<P>
The <b>-N</b> (<b>--newline</b>) option allows <b>pcregrep</b> to scan files with
different newline conventions from the default. However, the setting of this
option does not affect the way in which <b>pcregrep</b> writes information to
the standard error and output streams. It uses the string "\n" in C
<b>printf()</b> calls to indicate newlines, relying on the C I/O library to
convert this to an appropriate sequence if the output is sent to a file.
</P>
<br><a name="SEC6" href="#TOC1">OPTIONS COMPATIBILITY</a><br>
<P>
The majority of short and long forms of <b>pcregrep</b>'s options are the same
as in the GNU <b>grep</b> program. Any long option of the form
<b>--xxx-regexp</b> (GNU terminology) is also available as <b>--xxx-regex</b>
(PCRE terminology). However, the <b>--locale</b>, <b>-M</b>, <b>--multiline</b>,
<b>-u</b>, and <b>--utf-8</b> options are specific to <b>pcregrep</b>.
</P>
<br><a name="SEC7" href="#TOC1">OPTIONS WITH DATA</a><br>
<P>
There are four different ways in which an option with data can be specified.
If a short form option is used, the data may follow immediately, or in the next
command line item. For example:
<pre>
-f/some/file
-f /some/file
</pre>
If a long form option is used, the data may appear in the same command line
item, separated by an equals character, or (with one exception) it may appear
in the next command line item. For example:
<pre>
--file=/some/file
--file /some/file
</pre>
Note, however, that if you want to supply a file name beginning with ~ as data
in a shell command, and have the shell expand ~ to a home directory, you must
separate the file name from the option, because the shell does not treat ~
specially unless it is at the start of an item.
</P>
<P>
The exception to the above is the <b>--colour</b> (or <b>--color</b>) option,
for which the data is optional. If this option does have data, it must be given
in the first form, using an equals character. Otherwise it will be assumed that
it has no data.
</P>
<br><a name="SEC8" href="#TOC1">MATCHING ERRORS</a><br>
<P>
It is possible to supply a regular expression that takes a very long time to
fail to match certain lines. Such patterns normally involve nested indefinite
repeats, for example: (a+)*\d when matched against a line of a's with no final
digit. The PCRE matching function has a resource limit that causes it to abort
in these circumstances. If this happens, <b>pcregrep</b> outputs an error
message and the line that caused the problem to the standard error stream. If
there are more than 20 such errors, <b>pcregrep</b> gives up.
</P>
<br><a name="SEC9" href="#TOC1">DIAGNOSTICS</a><br>
<P>
Exit status is 0 if any matches were found, 1 if no matches were found, and 2
for syntax errors and non-existent or inacessible files (even if matches were
found in other files) or too many matching errors. Using the <b>-s</b> option to
suppress error messages about inaccessble files does not affect the return
code.
</P>
<br><a name="SEC10" href="#TOC1">AUTHOR</a><br>
<P>
Philip Hazel
<br>
University Computing Service
<br>
Cambridge CB2 3QG, England.
</P>
<P>
Last updated: 06 June 2006
<br>
Copyright &copy; 1997-2006 University of Cambridge.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>

View File

@@ -0,0 +1,192 @@
<html>
<head>
<title>pcrematching specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcrematching man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<ul>
<li><a name="TOC1" href="#SEC1">PCRE MATCHING ALGORITHMS</a>
<li><a name="TOC2" href="#SEC2">REGULAR EXPRESSIONS AS TREES</a>
<li><a name="TOC3" href="#SEC3">THE STANDARD MATCHING ALGORITHM</a>
<li><a name="TOC4" href="#SEC4">THE DFA MATCHING ALGORITHM</a>
<li><a name="TOC5" href="#SEC5">ADVANTAGES OF THE DFA ALGORITHM</a>
<li><a name="TOC6" href="#SEC6">DISADVANTAGES OF THE DFA ALGORITHM</a>
</ul>
<br><a name="SEC1" href="#TOC1">PCRE MATCHING ALGORITHMS</a><br>
<P>
This document describes the two different algorithms that are available in PCRE
for matching a compiled regular expression against a given subject string. The
"standard" algorithm is the one provided by the <b>pcre_exec()</b> function.
This works in the same was as Perl's matching function, and provides a
Perl-compatible matching operation.
</P>
<P>
An alternative algorithm is provided by the <b>pcre_dfa_exec()</b> function;
this operates in a different way, and is not Perl-compatible. It has advantages
and disadvantages compared with the standard algorithm, and these are described
below.
</P>
<P>
When there is only one possible way in which a given subject string can match a
pattern, the two algorithms give the same answer. A difference arises, however,
when there are multiple possibilities. For example, if the pattern
<pre>
^&#60;.*&#62;
</pre>
is matched against the string
<pre>
&#60;something&#62; &#60;something else&#62; &#60;something further&#62;
</pre>
there are three possible answers. The standard algorithm finds only one of
them, whereas the DFA algorithm finds all three.
</P>
<br><a name="SEC2" href="#TOC1">REGULAR EXPRESSIONS AS TREES</a><br>
<P>
The set of strings that are matched by a regular expression can be represented
as a tree structure. An unlimited repetition in the pattern makes the tree of
infinite size, but it is still a tree. Matching the pattern to a given subject
string (from a given starting point) can be thought of as a search of the tree.
There are two ways to search a tree: depth-first and breadth-first, and these
correspond to the two matching algorithms provided by PCRE.
</P>
<br><a name="SEC3" href="#TOC1">THE STANDARD MATCHING ALGORITHM</a><br>
<P>
In the terminology of Jeffrey Friedl's book \fIMastering Regular
Expressions\fP, the standard algorithm is an "NFA algorithm". It conducts a
depth-first search of the pattern tree. That is, it proceeds along a single
path through the tree, checking that the subject matches what is required. When
there is a mismatch, the algorithm tries any alternatives at the current point,
and if they all fail, it backs up to the previous branch point in the tree, and
tries the next alternative branch at that level. This often involves backing up
(moving to the left) in the subject string as well. The order in which
repetition branches are tried is controlled by the greedy or ungreedy nature of
the quantifier.
</P>
<P>
If a leaf node is reached, a matching string has been found, and at that point
the algorithm stops. Thus, if there is more than one possible match, this
algorithm returns the first one that it finds. Whether this is the shortest,
the longest, or some intermediate length depends on the way the greedy and
ungreedy repetition quantifiers are specified in the pattern.
</P>
<P>
Because it ends up with a single path through the tree, it is relatively
straightforward for this algorithm to keep track of the substrings that are
matched by portions of the pattern in parentheses. This provides support for
capturing parentheses and back references.
</P>
<br><a name="SEC4" href="#TOC1">THE DFA MATCHING ALGORITHM</a><br>
<P>
DFA stands for "deterministic finite automaton", but you do not need to
understand the origins of that name. This algorithm conducts a breadth-first
search of the tree. Starting from the first matching point in the subject, it
scans the subject string from left to right, once, character by character, and
as it does this, it remembers all the paths through the tree that represent
valid matches.
</P>
<P>
The scan continues until either the end of the subject is reached, or there are
no more unterminated paths. At this point, terminated paths represent the
different matching possibilities (if there are none, the match has failed).
Thus, if there is more than one possible match, this algorithm finds all of
them, and in particular, it finds the longest. In PCRE, there is an option to
stop the algorithm after the first match (which is necessarily the shortest)
has been found.
</P>
<P>
Note that all the matches that are found start at the same point in the
subject. If the pattern
<pre>
cat(er(pillar)?)
</pre>
is matched against the string "the caterpillar catchment", the result will be
the three strings "cat", "cater", and "caterpillar" that start at the fourth
character of the subject. The algorithm does not automatically move on to find
matches that start at later positions.
</P>
<P>
There are a number of features of PCRE regular expressions that are not
supported by the DFA matching algorithm. They are as follows:
</P>
<P>
1. Because the algorithm finds all possible matches, the greedy or ungreedy
nature of repetition quantifiers is not relevant. Greedy and ungreedy
quantifiers are treated in exactly the same way.
</P>
<P>
2. When dealing with multiple paths through the tree simultaneously, it is not
straightforward to keep track of captured substrings for the different matching
possibilities, and PCRE's implementation of this algorithm does not attempt to
do this. This means that no captured substrings are available.
</P>
<P>
3. Because no substrings are captured, back references within the pattern are
not supported, and cause errors if encountered.
</P>
<P>
4. For the same reason, conditional expressions that use a backreference as the
condition are not supported.
</P>
<P>
5. Callouts are supported, but the value of the <i>capture_top</i> field is
always 1, and the value of the <i>capture_last</i> field is always -1.
</P>
<P>
6.
The \C escape sequence, which (in the standard algorithm) matches a single
byte, even in UTF-8 mode, is not supported because the DFA algorithm moves
through the subject string one character at a time, for all active paths
through the tree.
</P>
<br><a name="SEC5" href="#TOC1">ADVANTAGES OF THE DFA ALGORITHM</a><br>
<P>
Using the DFA matching algorithm provides the following advantages:
</P>
<P>
1. All possible matches (at a single point in the subject) are automatically
found, and in particular, the longest match is found. To find more than one
match using the standard algorithm, you have to do kludgy things with
callouts.
</P>
<P>
2. There is much better support for partial matching. The restrictions on the
content of the pattern that apply when using the standard algorithm for partial
matching do not apply to the DFA algorithm. For non-anchored patterns, the
starting position of a partial match is available.
</P>
<P>
3. Because the DFA algorithm scans the subject string just once, and never
needs to backtrack, it is possible to pass very long subject strings to the
matching function in several pieces, checking for partial matching each time.
</P>
<br><a name="SEC6" href="#TOC1">DISADVANTAGES OF THE DFA ALGORITHM</a><br>
<P>
The DFA algorithm suffers from a number of disadvantages:
</P>
<P>
1. It is substantially slower than the standard algorithm. This is partly
because it has to search for all possible matches, but is also because it is
less susceptible to optimization.
</P>
<P>
2. Capturing parentheses and back references are not supported.
</P>
<P>
3. The "atomic group" feature of PCRE regular expressions is supported, but
does not provide the advantage that it does for the standard algorithm.
</P>
<P>
Last updated: 06 June 2006
<br>
Copyright &copy; 1997-2006 University of Cambridge.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>

View File

@@ -0,0 +1,225 @@
<html>
<head>
<title>pcrepartial specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcrepartial man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<ul>
<li><a name="TOC1" href="#SEC1">PARTIAL MATCHING IN PCRE</a>
<li><a name="TOC2" href="#SEC2">RESTRICTED PATTERNS FOR PCRE_PARTIAL</a>
<li><a name="TOC3" href="#SEC3">EXAMPLE OF PARTIAL MATCHING USING PCRETEST</a>
<li><a name="TOC4" href="#SEC4">MULTI-SEGMENT MATCHING WITH pcre_dfa_exec()</a>
</ul>
<br><a name="SEC1" href="#TOC1">PARTIAL MATCHING IN PCRE</a><br>
<P>
In normal use of PCRE, if the subject string that is passed to
<b>pcre_exec()</b> or <b>pcre_dfa_exec()</b> matches as far as it goes, but is
too short to match the entire pattern, PCRE_ERROR_NOMATCH is returned. There
are circumstances where it might be helpful to distinguish this case from other
cases in which there is no match.
</P>
<P>
Consider, for example, an application where a human is required to type in data
for a field with specific formatting requirements. An example might be a date
in the form <i>ddmmmyy</i>, defined by this pattern:
<pre>
^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$
</pre>
If the application sees the user's keystrokes one by one, and can check that
what has been typed so far is potentially valid, it is able to raise an error
as soon as a mistake is made, possibly beeping and not reflecting the
character that has been typed. This immediate feedback is likely to be a better
user interface than a check that is delayed until the entire string has been
entered.
</P>
<P>
PCRE supports the concept of partial matching by means of the PCRE_PARTIAL
option, which can be set when calling <b>pcre_exec()</b> or
<b>pcre_dfa_exec()</b>. When this flag is set for <b>pcre_exec()</b>, the return
code PCRE_ERROR_NOMATCH is converted into PCRE_ERROR_PARTIAL if at any time
during the matching process the last part of the subject string matched part of
the pattern. Unfortunately, for non-anchored matching, it is not possible to
obtain the position of the start of the partial match. No captured data is set
when PCRE_ERROR_PARTIAL is returned.
</P>
<P>
When PCRE_PARTIAL is set for <b>pcre_dfa_exec()</b>, the return code
PCRE_ERROR_NOMATCH is converted into PCRE_ERROR_PARTIAL if the end of the
subject is reached, there have been no complete matches, but there is still at
least one matching possibility. The portion of the string that provided the
partial match is set as the first matching string.
</P>
<P>
Using PCRE_PARTIAL disables one of PCRE's optimizations. PCRE remembers the
last literal byte in a pattern, and abandons matching immediately if such a
byte is not present in the subject string. This optimization cannot be used
for a subject string that might match only partially.
</P>
<br><a name="SEC2" href="#TOC1">RESTRICTED PATTERNS FOR PCRE_PARTIAL</a><br>
<P>
Because of the way certain internal optimizations are implemented in the
<b>pcre_exec()</b> function, the PCRE_PARTIAL option cannot be used with all
patterns. These restrictions do not apply when <b>pcre_dfa_exec()</b> is used.
For <b>pcre_exec()</b>, repeated single characters such as
<pre>
a{2,4}
</pre>
and repeated single metasequences such as
<pre>
\d+
</pre>
are not permitted if the maximum number of occurrences is greater than one.
Optional items such as \d? (where the maximum is one) are permitted.
Quantifiers with any values are permitted after parentheses, so the invalid
examples above can be coded thus:
<pre>
(a){2,4}
(\d)+
</pre>
These constructions run more slowly, but for the kinds of application that are
envisaged for this facility, this is not felt to be a major restriction.
</P>
<P>
If PCRE_PARTIAL is set for a pattern that does not conform to the restrictions,
<b>pcre_exec()</b> returns the error code PCRE_ERROR_BADPARTIAL (-13).
</P>
<br><a name="SEC3" href="#TOC1">EXAMPLE OF PARTIAL MATCHING USING PCRETEST</a><br>
<P>
If the escape sequence \P is present in a <b>pcretest</b> data line, the
PCRE_PARTIAL flag is used for the match. Here is a run of <b>pcretest</b> that
uses the date example quoted above:
<pre>
re&#62; /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
data&#62; 25jun04\P
0: 25jun04
1: jun
data&#62; 25dec3\P
Partial match
data&#62; 3ju\P
Partial match
data&#62; 3juj\P
No match
data&#62; j\P
No match
</pre>
The first data string is matched completely, so <b>pcretest</b> shows the
matched substrings. The remaining four strings do not match the complete
pattern, but the first two are partial matches. The same test, using DFA
matching (by means of the \D escape sequence), produces the following output:
<pre>
re&#62; /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
data&#62; 25jun04\P\D
0: 25jun04
data&#62; 23dec3\P\D
Partial match: 23dec3
data&#62; 3ju\P\D
Partial match: 3ju
data&#62; 3juj\P\D
No match
data&#62; j\P\D
No match
</pre>
Notice that in this case the portion of the string that was matched is made
available.
</P>
<br><a name="SEC4" href="#TOC1">MULTI-SEGMENT MATCHING WITH pcre_dfa_exec()</a><br>
<P>
When a partial match has been found using <b>pcre_dfa_exec()</b>, it is possible
to continue the match by providing additional subject data and calling
<b>pcre_dfa_exec()</b> again with the PCRE_DFA_RESTART option and the same
working space (where details of the previous partial match are stored). Here is
an example using <b>pcretest</b>, where the \R escape sequence sets the
PCRE_DFA_RESTART option and the \D escape sequence requests the use of
<b>pcre_dfa_exec()</b>:
<pre>
re&#62; /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
data&#62; 23ja\P\D
Partial match: 23ja
data&#62; n05\R\D
0: n05
</pre>
The first call has "23ja" as the subject, and requests partial matching; the
second call has "n05" as the subject for the continued (restarted) match.
Notice that when the match is complete, only the last part is shown; PCRE does
not retain the previously partially-matched string. It is up to the calling
program to do that if it needs to.
</P>
<P>
This facility can be used to pass very long subject strings to
<b>pcre_dfa_exec()</b>. However, some care is needed for certain types of
pattern.
</P>
<P>
1. If the pattern contains tests for the beginning or end of a line, you need
to pass the PCRE_NOTBOL or PCRE_NOTEOL options, as appropriate, when the
subject string for any call does not contain the beginning or end of a line.
</P>
<P>
2. If the pattern contains backward assertions (including \b or \B), you need
to arrange for some overlap in the subject strings to allow for this. For
example, you could pass the subject in chunks that were 500 bytes long, but in
a buffer of 700 bytes, with the starting offset set to 200 and the previous 200
bytes at the start of the buffer.
</P>
<P>
3. Matching a subject string that is split into multiple segments does not
always produce exactly the same result as matching over one single long string.
The difference arises when there are multiple matching possibilities, because a
partial match result is given only when there are no completed matches in a
call to fBpcre_dfa_exec()\fP. This means that as soon as the shortest match has
been found, continuation to a new subject segment is no longer possible.
Consider this <b>pcretest</b> example:
<pre>
re&#62; /dog(sbody)?/
data&#62; do\P\D
Partial match: do
data&#62; gsb\R\P\D
0: g
data&#62; dogsbody\D
0: dogsbody
1: dog
</pre>
The pattern matches the words "dog" or "dogsbody". When the subject is
presented in several parts ("do" and "gsb" being the first two) the match stops
when "dog" has been found, and it is not possible to continue. On the other
hand, if "dogsbody" is presented as a single string, both matches are found.
</P>
<P>
Because of this phenomenon, it does not usually make sense to end a pattern
that is going to be matched in this way with a variable repeat.
</P>
<P>
4. Patterns that contain alternatives at the top level which do not all
start with the same pattern item may not work as expected. For example,
consider this pattern:
<pre>
1234|3789
</pre>
If the first part of the subject is "ABC123", a partial match of the first
alternative is found at offset 3. There is no partial match for the second
alternative, because such a match does not start at the same point in the
subject string. Attempting to continue with the string "789" does not yield a
match because only those alternatives that match at one point in the subject
are remembered. The problem arises because the start of the second alternative
matches within the first alternative. There is no problem with anchored
patterns or patterns such as:
<pre>
1234|ABCD
</pre>
where no string can be a partial match for both alternatives.
</P>
<P>
Last updated: 16 January 2006
<br>
Copyright &copy; 1997-2006 University of Cambridge.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,97 @@
<html>
<head>
<title>pcreperform specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcreperform man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<br><b>
PCRE PERFORMANCE
</b><br>
<P>
Certain items that may appear in regular expression patterns are more efficient
than others. It is more efficient to use a character class like [aeiou] than a
set of alternatives such as (a|e|i|o|u). In general, the simplest construction
that provides the required behaviour is usually the most efficient. Jeffrey
Friedl's book contains a lot of useful general discussion about optimizing
regular expressions for efficient performance. This document contains a few
observations about PCRE.
</P>
<P>
Using Unicode character properties (the \p, \P, and \X escapes) is slow,
because PCRE has to scan a structure that contains data for over fifteen
thousand characters whenever it needs a character's property. If you can find
an alternative pattern that does not use character properties, it will probably
be faster.
</P>
<P>
When a pattern begins with .* not in parentheses, or in parentheses that are
not the subject of a backreference, and the PCRE_DOTALL option is set, the
pattern is implicitly anchored by PCRE, since it can match only at the start of
a subject string. However, if PCRE_DOTALL is not set, PCRE cannot make this
optimization, because the . metacharacter does not then match a newline, and if
the subject string contains newlines, the pattern may match from the character
immediately following one of them instead of from the very start. For example,
the pattern
<pre>
.*second
</pre>
matches the subject "first\nand second" (where \n stands for a newline
character), with the match starting at the seventh character. In order to do
this, PCRE has to retry the match starting after every newline in the subject.
</P>
<P>
If you are using such a pattern with subject strings that do not contain
newlines, the best performance is obtained by setting PCRE_DOTALL, or starting
the pattern with ^.* or ^.*? to indicate explicit anchoring. That saves PCRE
from having to scan along the subject looking for a newline to restart at.
</P>
<P>
Beware of patterns that contain nested indefinite repeats. These can take a
long time to run when applied to a string that does not match. Consider the
pattern fragment
<pre>
(a+)*
</pre>
This can match "aaaa" in 33 different ways, and this number increases very
rapidly as the string gets longer. (The * repeat can match 0, 1, 2, 3, or 4
times, and for each of those cases other than 0, the + repeats can match
different numbers of times.) When the remainder of the pattern is such that the
entire match is going to fail, PCRE has in principle to try every possible
variation, and this can take an extremely long time.
</P>
<P>
An optimization catches some of the more simple cases such as
<pre>
(a+)*b
</pre>
where a literal character follows. Before embarking on the standard matching
procedure, PCRE checks that there is a "b" later in the subject string, and if
there is not, it fails the match immediately. However, when there is no
following literal this optimization cannot be used. You can see the difference
by comparing the behaviour of
<pre>
(a+)*\d
</pre>
with the pattern above. The former gives a failure almost instantly when
applied to a whole line of "a" characters, whereas the latter takes an
appreciable time with strings longer than about 20 characters.
</P>
<P>
In many cases, the solution to this kind of performance issue is to use an
atomic group or a possessive quantifier.
</P>
<P>
Last updated: 28 February 2005
<br>
Copyright &copy; 1997-2005 University of Cambridge.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>

View File

@@ -0,0 +1,244 @@
<html>
<head>
<title>pcreposix specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcreposix man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<ul>
<li><a name="TOC1" href="#SEC1">SYNOPSIS OF POSIX API</a>
<li><a name="TOC2" href="#SEC2">DESCRIPTION</a>
<li><a name="TOC3" href="#SEC3">COMPILING A PATTERN</a>
<li><a name="TOC4" href="#SEC4">MATCHING NEWLINE CHARACTERS</a>
<li><a name="TOC5" href="#SEC5">MATCHING A PATTERN</a>
<li><a name="TOC6" href="#SEC6">ERROR MESSAGES</a>
<li><a name="TOC7" href="#SEC7">MEMORY USAGE</a>
<li><a name="TOC8" href="#SEC8">AUTHOR</a>
</ul>
<br><a name="SEC1" href="#TOC1">SYNOPSIS OF POSIX API</a><br>
<P>
<b>#include &#60;pcreposix.h&#62;</b>
</P>
<P>
<b>int regcomp(regex_t *<i>preg</i>, const char *<i>pattern</i>,</b>
<b>int <i>cflags</i>);</b>
</P>
<P>
<b>int regexec(regex_t *<i>preg</i>, const char *<i>string</i>,</b>
<b>size_t <i>nmatch</i>, regmatch_t <i>pmatch</i>[], int <i>eflags</i>);</b>
</P>
<P>
<b>size_t regerror(int <i>errcode</i>, const regex_t *<i>preg</i>,</b>
<b>char *<i>errbuf</i>, size_t <i>errbuf_size</i>);</b>
</P>
<P>
<b>void regfree(regex_t *<i>preg</i>);</b>
</P>
<br><a name="SEC2" href="#TOC1">DESCRIPTION</a><br>
<P>
This set of functions provides a POSIX-style API to the PCRE regular expression
package. See the
<a href="pcreapi.html"><b>pcreapi</b></a>
documentation for a description of PCRE's native API, which contains much
additional functionality.
</P>
<P>
The functions described here are just wrapper functions that ultimately call
the PCRE native API. Their prototypes are defined in the <b>pcreposix.h</b>
header file, and on Unix systems the library itself is called
<b>pcreposix.a</b>, so can be accessed by adding <b>-lpcreposix</b> to the
command for linking an application that uses them. Because the POSIX functions
call the native ones, it is also necessary to add <b>-lpcre</b>.
</P>
<P>
I have implemented only those option bits that can be reasonably mapped to PCRE
native options. In addition, the option REG_EXTENDED is defined with the value
zero. This has no effect, but since programs that are written to the POSIX
interface often use it, this makes it easier to slot in PCRE as a replacement
library. Other POSIX options are not even defined.
</P>
<P>
When PCRE is called via these functions, it is only the API that is POSIX-like
in style. The syntax and semantics of the regular expressions themselves are
still those of Perl, subject to the setting of various PCRE options, as
described below. "POSIX-like in style" means that the API approximates to the
POSIX definition; it is not fully POSIX-compatible, and in multi-byte encoding
domains it is probably even less compatible.
</P>
<P>
The header for these functions is supplied as <b>pcreposix.h</b> to avoid any
potential clash with other POSIX libraries. It can, of course, be renamed or
aliased as <b>regex.h</b>, which is the "correct" name. It provides two
structure types, <i>regex_t</i> for compiled internal forms, and
<i>regmatch_t</i> for returning captured substrings. It also defines some
constants whose names start with "REG_"; these are used for setting options and
identifying error codes.
</P>
<P>
</P>
<br><a name="SEC3" href="#TOC1">COMPILING A PATTERN</a><br>
<P>
The function <b>regcomp()</b> is called to compile a pattern into an
internal form. The pattern is a C string terminated by a binary zero, and
is passed in the argument <i>pattern</i>. The <i>preg</i> argument is a pointer
to a <b>regex_t</b> structure that is used as a base for storing information
about the compiled regular expression.
</P>
<P>
The argument <i>cflags</i> is either zero, or contains one or more of the bits
defined by the following macros:
<pre>
REG_DOTALL
</pre>
The PCRE_DOTALL option is set when the regular expression is passed for
compilation to the native function. Note that REG_DOTALL is not part of the
POSIX standard.
<pre>
REG_ICASE
</pre>
The PCRE_CASELESS option is set when the regular expression is passed for
compilation to the native function.
<pre>
REG_NEWLINE
</pre>
The PCRE_MULTILINE option is set when the regular expression is passed for
compilation to the native function. Note that this does <i>not</i> mimic the
defined POSIX behaviour for REG_NEWLINE (see the following section).
<pre>
REG_NOSUB
</pre>
The PCRE_NO_AUTO_CAPTURE option is set when the regular expression is passed
for compilation to the native function. In addition, when a pattern that is
compiled with this flag is passed to <b>regexec()</b> for matching, the
<i>nmatch</i> and <i>pmatch</i> arguments are ignored, and no captured strings
are returned.
<pre>
REG_UTF8
</pre>
The PCRE_UTF8 option is set when the regular expression is passed for
compilation to the native function. This causes the pattern itself and all data
strings used for matching it to be treated as UTF-8 strings. Note that REG_UTF8
is not part of the POSIX standard.
</P>
<P>
In the absence of these flags, no options are passed to the native function.
This means the the regex is compiled with PCRE default semantics. In
particular, the way it handles newline characters in the subject string is the
Perl way, not the POSIX way. Note that setting PCRE_MULTILINE has only
<i>some</i> of the effects specified for REG_NEWLINE. It does not affect the way
newlines are matched by . (they aren't) or by a negative class such as [^a]
(they are).
</P>
<P>
The yield of <b>regcomp()</b> is zero on success, and non-zero otherwise. The
<i>preg</i> structure is filled in on success, and one member of the structure
is public: <i>re_nsub</i> contains the number of capturing subpatterns in
the regular expression. Various error codes are defined in the header file.
</P>
<br><a name="SEC4" href="#TOC1">MATCHING NEWLINE CHARACTERS</a><br>
<P>
This area is not simple, because POSIX and Perl take different views of things.
It is not possible to get PCRE to obey POSIX semantics, but then PCRE was never
intended to be a POSIX engine. The following table lists the different
possibilities for matching newline characters in PCRE:
<pre>
Default Change with
. matches newline no PCRE_DOTALL
newline matches [^a] yes not changeable
$ matches \n at end yes PCRE_DOLLARENDONLY
$ matches \n in middle no PCRE_MULTILINE
^ matches \n in middle no PCRE_MULTILINE
</pre>
This is the equivalent table for POSIX:
<pre>
Default Change with
. matches newline yes REG_NEWLINE
newline matches [^a] yes REG_NEWLINE
$ matches \n at end no REG_NEWLINE
$ matches \n in middle no REG_NEWLINE
^ matches \n in middle no REG_NEWLINE
</pre>
PCRE's behaviour is the same as Perl's, except that there is no equivalent for
PCRE_DOLLAR_ENDONLY in Perl. In both PCRE and Perl, there is no way to stop
newline from matching [^a].
</P>
<P>
The default POSIX newline handling can be obtained by setting PCRE_DOTALL and
PCRE_DOLLAR_ENDONLY, but there is no way to make PCRE behave exactly as for the
REG_NEWLINE action.
</P>
<br><a name="SEC5" href="#TOC1">MATCHING A PATTERN</a><br>
<P>
The function <b>regexec()</b> is called to match a compiled pattern <i>preg</i>
against a given <i>string</i>, which is terminated by a zero byte, subject to
the options in <i>eflags</i>. These can be:
<pre>
REG_NOTBOL
</pre>
The PCRE_NOTBOL option is set when calling the underlying PCRE matching
function.
<pre>
REG_NOTEOL
</pre>
The PCRE_NOTEOL option is set when calling the underlying PCRE matching
function.
</P>
<P>
If the pattern was compiled with the REG_NOSUB flag, no data about any matched
strings is returned. The <i>nmatch</i> and <i>pmatch</i> arguments of
<b>regexec()</b> are ignored.
</P>
<P>
Otherwise,the portion of the string that was matched, and also any captured
substrings, are returned via the <i>pmatch</i> argument, which points to an
array of <i>nmatch</i> structures of type <i>regmatch_t</i>, containing the
members <i>rm_so</i> and <i>rm_eo</i>. These contain the offset to the first
character of each substring and the offset to the first character after the end
of each substring, respectively. The 0th element of the vector relates to the
entire portion of <i>string</i> that was matched; subsequent elements relate to
the capturing subpatterns of the regular expression. Unused entries in the
array have both structure members set to -1.
</P>
<P>
A successful match yields a zero return; various error codes are defined in the
header file, of which REG_NOMATCH is the "expected" failure code.
</P>
<br><a name="SEC6" href="#TOC1">ERROR MESSAGES</a><br>
<P>
The <b>regerror()</b> function maps a non-zero errorcode from either
<b>regcomp()</b> or <b>regexec()</b> to a printable message. If <i>preg</i> is not
NULL, the error should have arisen from the use of that structure. A message
terminated by a binary zero is placed in <i>errbuf</i>. The length of the
message, including the zero, is limited to <i>errbuf_size</i>. The yield of the
function is the size of buffer needed to hold the whole message.
</P>
<br><a name="SEC7" href="#TOC1">MEMORY USAGE</a><br>
<P>
Compiling a regular expression causes memory to be allocated and associated
with the <i>preg</i> structure. The function <b>regfree()</b> frees all such
memory, after which <i>preg</i> may no longer be used as a compiled expression.
</P>
<br><a name="SEC8" href="#TOC1">AUTHOR</a><br>
<P>
Philip Hazel
<br>
University Computing Service,
<br>
Cambridge CB2 3QG, England.
</P>
<P>
Last updated: 16 January 2006
<br>
Copyright &copy; 1997-2006 University of Cambridge.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>

View File

@@ -0,0 +1,140 @@
<html>
<head>
<title>pcreprecompile specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcreprecompile man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<ul>
<li><a name="TOC1" href="#SEC1">SAVING AND RE-USING PRECOMPILED PCRE PATTERNS</a>
<li><a name="TOC2" href="#SEC2">SAVING A COMPILED PATTERN</a>
<li><a name="TOC3" href="#SEC3">RE-USING A PRECOMPILED PATTERN</a>
<li><a name="TOC4" href="#SEC4">COMPATIBILITY WITH DIFFERENT PCRE RELEASES</a>
</ul>
<br><a name="SEC1" href="#TOC1">SAVING AND RE-USING PRECOMPILED PCRE PATTERNS</a><br>
<P>
If you are running an application that uses a large number of regular
expression patterns, it may be useful to store them in a precompiled form
instead of having to compile them every time the application is run.
If you are not using any private character tables (see the
<a href="pcre_maketables.html"><b>pcre_maketables()</b></a>
documentation), this is relatively straightforward. If you are using private
tables, it is a little bit more complicated.
</P>
<P>
If you save compiled patterns to a file, you can copy them to a different host
and run them there. This works even if the new host has the opposite endianness
to the one on which the patterns were compiled. There may be a small
performance penalty, but it should be insignificant.
</P>
<br><a name="SEC2" href="#TOC1">SAVING A COMPILED PATTERN</a><br>
<P>
The value returned by <b>pcre_compile()</b> points to a single block of memory
that holds the compiled pattern and associated data. You can find the length of
this block in bytes by calling <b>pcre_fullinfo()</b> with an argument of
PCRE_INFO_SIZE. You can then save the data in any appropriate manner. Here is
sample code that compiles a pattern and writes it to a file. It assumes that
the variable <i>fd</i> refers to a file that is open for output:
<pre>
int erroroffset, rc, size;
char *error;
pcre *re;
re = pcre_compile("my pattern", 0, &error, &erroroffset, NULL);
if (re == NULL) { ... handle errors ... }
rc = pcre_fullinfo(re, NULL, PCRE_INFO_SIZE, &size);
if (rc &#60; 0) { ... handle errors ... }
rc = fwrite(re, 1, size, fd);
if (rc != size) { ... handle errors ... }
</pre>
In this example, the bytes that comprise the compiled pattern are copied
exactly. Note that this is binary data that may contain any of the 256 possible
byte values. On systems that make a distinction between binary and non-binary
data, be sure that the file is opened for binary output.
</P>
<P>
If you want to write more than one pattern to a file, you will have to devise a
way of separating them. For binary data, preceding each pattern with its length
is probably the most straightforward approach. Another possibility is to write
out the data in hexadecimal instead of binary, one pattern to a line.
</P>
<P>
Saving compiled patterns in a file is only one possible way of storing them for
later use. They could equally well be saved in a database, or in the memory of
some daemon process that passes them via sockets to the processes that want
them.
</P>
<P>
If the pattern has been studied, it is also possible to save the study data in
a similar way to the compiled pattern itself. When studying generates
additional information, <b>pcre_study()</b> returns a pointer to a
<b>pcre_extra</b> data block. Its format is defined in the
<a href="pcreapi.html#extradata">section on matching a pattern</a>
in the
<a href="pcreapi.html"><b>pcreapi</b></a>
documentation. The <i>study_data</i> field points to the binary study data, and
this is what you must save (not the <b>pcre_extra</b> block itself). The length
of the study data can be obtained by calling <b>pcre_fullinfo()</b> with an
argument of PCRE_INFO_STUDYSIZE. Remember to check that <b>pcre_study()</b> did
return a non-NULL value before trying to save the study data.
</P>
<br><a name="SEC3" href="#TOC1">RE-USING A PRECOMPILED PATTERN</a><br>
<P>
Re-using a precompiled pattern is straightforward. Having reloaded it into main
memory, you pass its pointer to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b> in
the usual way. This should work even on another host, and even if that host has
the opposite endianness to the one where the pattern was compiled.
</P>
<P>
However, if you passed a pointer to custom character tables when the pattern
was compiled (the <i>tableptr</i> argument of <b>pcre_compile()</b>), you must
now pass a similar pointer to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>,
because the value saved with the compiled pattern will obviously be nonsense. A
field in a <b>pcre_extra()</b> block is used to pass this data, as described in
the
<a href="pcreapi.html#extradata">section on matching a pattern</a>
in the
<a href="pcreapi.html"><b>pcreapi</b></a>
documentation.
</P>
<P>
If you did not provide custom character tables when the pattern was compiled,
the pointer in the compiled pattern is NULL, which causes <b>pcre_exec()</b> to
use PCRE's internal tables. Thus, you do not need to take any special action at
run time in this case.
</P>
<P>
If you saved study data with the compiled pattern, you need to create your own
<b>pcre_extra</b> data block and set the <i>study_data</i> field to point to the
reloaded study data. You must also set the PCRE_EXTRA_STUDY_DATA bit in the
<i>flags</i> field to indicate that study data is present. Then pass the
<b>pcre_extra</b> block to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b> in the
usual way.
</P>
<br><a name="SEC4" href="#TOC1">COMPATIBILITY WITH DIFFERENT PCRE RELEASES</a><br>
<P>
The layout of the control block that is at the start of the data that makes up
a compiled pattern was changed for release 5.0. If you have any saved patterns
that were compiled with previous releases (not a facility that was previously
advertised), you will have to recompile them for release 5.0. However, from now
on, it should be possible to make changes in a compatible manner.
</P>
<P>
Notwithstanding the above, if you have any saved patterns in UTF-8 mode that
use \p or \P that were compiled with any release up to and including 6.4, you
will have to recompile them for release 6.5 and above.
</P>
<P>
Last updated: 01 February 2006
<br>
Copyright &copy; 1997-2006 University of Cambridge.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>

View File

@@ -0,0 +1,81 @@
<html>
<head>
<title>pcresample specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcresample man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<br><b>
PCRE SAMPLE PROGRAM
</b><br>
<P>
A simple, complete demonstration program, to get you started with using PCRE,
is supplied in the file <i>pcredemo.c</i> in the PCRE distribution.
</P>
<P>
The program compiles the regular expression that is its first argument, and
matches it against the subject string in its second argument. No PCRE options
are set, and default character tables are used. If matching succeeds, the
program outputs the portion of the subject that matched, together with the
contents of any captured substrings.
</P>
<P>
If the -g option is given on the command line, the program then goes on to
check for further matches of the same regular expression in the same subject
string. The logic is a little bit tricky because of the possibility of matching
an empty string. Comments in the code explain what is going on.
</P>
<P>
If PCRE is installed in the standard include and library directories for your
system, you should be able to compile the demonstration program using this
command:
<pre>
gcc -o pcredemo pcredemo.c -lpcre
</pre>
If PCRE is installed elsewhere, you may need to add additional options to the
command line. For example, on a Unix-like system that has PCRE installed in
<i>/usr/local</i>, you can compile the demonstration program using a command
like this:
<pre>
gcc -o pcredemo -I/usr/local/include pcredemo.c -L/usr/local/lib -lpcre
</pre>
Once you have compiled the demonstration program, you can run simple tests like
this:
<pre>
./pcredemo 'cat|dog' 'the cat sat on the mat'
./pcredemo -g 'cat|dog' 'the dog sat on the cat'
</pre>
Note that there is a much more comprehensive test program, called
<a href="pcretest.html"><b>pcretest</b>,</a>
which supports many more facilities for testing regular expressions and the
PCRE library. The <b>pcredemo</b> program is provided as a simple coding
example.
</P>
<P>
On some operating systems (e.g. Solaris), when PCRE is not installed in the
standard library directory, you may get an error like this when you try to run
<b>pcredemo</b>:
<pre>
ld.so.1: a.out: fatal: libpcre.so.0: open failed: No such file or directory
</pre>
This is caused by the way shared library support works on those systems. You
need to add
<pre>
-R/usr/local/lib
</pre>
(for example) to the compile command to get round this problem.
</P>
<P>
Last updated: 09 September 2004
<br>
Copyright &copy; 1997-2004 University of Cambridge.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>

View File

@@ -0,0 +1,127 @@
<html>
<head>
<title>pcrestack specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcrestack man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<br><b>
PCRE DISCUSSION OF STACK USAGE
</b><br>
<P>
When you call <b>pcre_exec()</b>, it makes use of an internal function called
<b>match()</b>. This calls itself recursively at branch points in the pattern,
in order to remember the state of the match so that it can back up and try a
different alternative if the first one fails. As matching proceeds deeper and
deeper into the tree of possibilities, the recursion depth increases.
</P>
<P>
Not all calls of <b>match()</b> increase the recursion depth; for an item such
as a* it may be called several times at the same level, after matching
different numbers of a's. Furthermore, in a number of cases where the result of
the recursive call would immediately be passed back as the result of the
current call (a "tail recursion"), the function is just restarted instead.
</P>
<P>
The <b>pcre_dfa_exec()</b> function operates in an entirely different way, and
hardly uses recursion at all. The limit on its complexity is the amount of
workspace it is given. The comments that follow do NOT apply to
<b>pcre_dfa_exec()</b>; they are relevant only for <b>pcre_exec()</b>.
</P>
<P>
You can set limits on the number of times that <b>match()</b> is called, both in
total and recursively. If the limit is exceeded, an error occurs. For details,
see the
<a href="pcreapi.html#extradata">section on extra data for <b>pcre_exec()</b></a>
in the
<a href="pcreapi.html"><b>pcreapi</b></a>
documentation.
</P>
<P>
Each time that <b>match()</b> is actually called recursively, it uses memory
from the process stack. For certain kinds of pattern and data, very large
amounts of stack may be needed, despite the recognition of "tail recursion".
You can often reduce the amount of recursion, and therefore the amount of stack
used, by modifying the pattern that is being matched. Consider, for example,
this pattern:
<pre>
([^&#60;]|&#60;(?!inet))+
</pre>
It matches from wherever it starts until it encounters "&#60;inet" or the end of
the data, and is the kind of pattern that might be used when processing an XML
file. Each iteration of the outer parentheses matches either one character that
is not "&#60;" or a "&#60;" that is not followed by "inet". However, each time a
parenthesis is processed, a recursion occurs, so this formulation uses a stack
frame for each matched character. For a long string, a lot of stack is
required. Consider now this rewritten pattern, which matches exactly the same
strings:
<pre>
([^&#60;]++|&#60;(?!inet))
</pre>
This uses very much less stack, because runs of characters that do not contain
"&#60;" are "swallowed" in one item inside the parentheses. Recursion happens only
when a "&#60;" character that is not followed by "inet" is encountered (and we
assume this is relatively rare). A possessive quantifier is used to stop any
backtracking into the runs of non-"&#60;" characters, but that is not related to
stack usage.
</P>
<P>
In environments where stack memory is constrained, you might want to compile
PCRE to use heap memory instead of stack for remembering back-up points. This
makes it run a lot more slowly, however. Details of how to do this are given in
the
<a href="pcrebuild.html"><b>pcrebuild</b></a>
documentation.
</P>
<P>
In Unix-like environments, there is not often a problem with the stack, though
the default limit on stack size varies from system to system. Values from 8Mb
to 64Mb are common. You can find your default limit by running the command:
<pre>
ulimit -s
</pre>
The effect of running out of stack is often SIGSEGV, though sometimes an error
message is given. You can normally increase the limit on stack size by code
such as this:
<pre>
struct rlimit rlim;
getrlimit(RLIMIT_STACK, &rlim);
rlim.rlim_cur = 100*1024*1024;
setrlimit(RLIMIT_STACK, &rlim);
</pre>
This reads the current limits (soft and hard) using <b>getrlimit()</b>, then
attempts to increase the soft limit to 100Mb using <b>setrlimit()</b>. You must
do this before calling <b>pcre_exec()</b>.
</P>
<P>
PCRE has an internal counter that can be used to limit the depth of recursion,
and thus cause <b>pcre_exec()</b> to give an error code before it runs out of
stack. By default, the limit is very large, and unlikely ever to operate. It
can be changed when PCRE is built, and it can also be set when
<b>pcre_exec()</b> is called. For details of these interfaces, see the
<a href="pcrebuild.html"><b>pcrebuild</b></a>
and
<a href="pcreapi.html"><b>pcreapi</b></a>
documentation.
</P>
<P>
As a very rough rule of thumb, you should reckon on about 500 bytes per
recursion. Thus, if you want to limit your stack usage to 8Mb, you
should set the limit at 16000 recursions. A 64Mb stack, on the other hand, can
support around 128000 recursions. The <b>pcretest</b> test program has a command
line option (<b>-S</b>) that can be used to increase its stack.
</P>
<P>
Last updated: 29 June 2006
<br>
Copyright &copy; 1997-2006 University of Cambridge.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>

View File

@@ -0,0 +1,616 @@
<html>
<head>
<title>pcretest specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcretest man page</h1>
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>
<p>
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
<br>
<ul>
<li><a name="TOC1" href="#SEC1">SYNOPSIS</a>
<li><a name="TOC2" href="#SEC2">OPTIONS</a>
<li><a name="TOC3" href="#SEC3">DESCRIPTION</a>
<li><a name="TOC4" href="#SEC4">PATTERN MODIFIERS</a>
<li><a name="TOC5" href="#SEC5">DATA LINES</a>
<li><a name="TOC6" href="#SEC6">THE ALTERNATIVE MATCHING FUNCTION</a>
<li><a name="TOC7" href="#SEC7">DEFAULT OUTPUT FROM PCRETEST</a>
<li><a name="TOC8" href="#SEC8">OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION</a>
<li><a name="TOC9" href="#SEC9">RESTARTING AFTER A PARTIAL MATCH</a>
<li><a name="TOC10" href="#SEC10">CALLOUTS</a>
<li><a name="TOC11" href="#SEC11">SAVING AND RELOADING COMPILED PATTERNS</a>
<li><a name="TOC12" href="#SEC12">AUTHOR</a>
</ul>
<br><a name="SEC1" href="#TOC1">SYNOPSIS</a><br>
<P>
<b>pcretest [options] [source] [destination]</b>
<br>
<br>
<b>pcretest</b> was written as a test program for the PCRE regular expression
library itself, but it can also be used for experimenting with regular
expressions. This document describes the features of the test program; for
details of the regular expressions themselves, see the
<a href="pcrepattern.html"><b>pcrepattern</b></a>
documentation. For details of the PCRE library function calls and their
options, see the
<a href="pcreapi.html"><b>pcreapi</b></a>
documentation.
</P>
<br><a name="SEC2" href="#TOC1">OPTIONS</a><br>
<P>
<b>-C</b>
Output the version number of the PCRE library, and all available information
about the optional features that are included, and then exit.
</P>
<P>
<b>-d</b>
Behave as if each regex has the <b>/D</b> (debug) modifier; the internal
form is output after compilation.
</P>
<P>
<b>-dfa</b>
Behave as if each data line contains the \D escape sequence; this causes the
alternative matching function, <b>pcre_dfa_exec()</b>, to be used instead of the
standard <b>pcre_exec()</b> function (more detail is given below).
</P>
<P>
<b>-i</b>
Behave as if each regex has the <b>/I</b> modifier; information about the
compiled pattern is given after compilation.
</P>
<P>
<b>-m</b>
Output the size of each compiled pattern after it has been compiled. This is
equivalent to adding <b>/M</b> to each regular expression. For compatibility
with earlier versions of pcretest, <b>-s</b> is a synonym for <b>-m</b>.
</P>
<P>
<b>-o</b> <i>osize</i>
Set the number of elements in the output vector that is used when calling
<b>pcre_exec()</b> to be <i>osize</i>. The default value is 45, which is enough
for 14 capturing subexpressions. The vector size can be changed for individual
matching calls by including \O in the data line (see below).
</P>
<P>
<b>-p</b>
Behave as if each regex has the <b>/P</b> modifier; the POSIX wrapper API is
used to call PCRE. None of the other options has any effect when <b>-p</b> is
set.
</P>
<P>
<b>-q</b>
Do not output the version number of <b>pcretest</b> at the start of execution.
</P>
<P>
<b>-S</b> <i>size</i>
On Unix-like systems, set the size of the runtime stack to <i>size</i>
megabytes.
</P>
<P>
<b>-t</b>
Run each compile, study, and match many times with a timer, and output
resulting time per compile or match (in milliseconds). Do not set <b>-m</b> with
<b>-t</b>, because you will then get the size output a zillion times, and the
timing will be distorted.
</P>
<br><a name="SEC3" href="#TOC1">DESCRIPTION</a><br>
<P>
If <b>pcretest</b> is given two filename arguments, it reads from the first and
writes to the second. If it is given only one filename argument, it reads from
that file and writes to stdout. Otherwise, it reads from stdin and writes to
stdout, and prompts for each line of input, using "re&#62;" to prompt for regular
expressions, and "data&#62;" to prompt for data lines.
</P>
<P>
The program handles any number of sets of input on a single input file. Each
set starts with a regular expression, and continues with any number of data
lines to be matched against the pattern.
</P>
<P>
Each data line is matched separately and independently. If you want to do
multi-line matches, you have to use the \n escape sequence (or \r or \r\n,
depending on the newline setting) in a single line of input to encode the
newline characters. There is no limit on the length of data lines; the input
buffer is automatically extended if it is too small.
</P>
<P>
An empty line signals the end of the data lines, at which point a new regular
expression is read. The regular expressions are given enclosed in any
non-alphanumeric delimiters other than backslash, for example:
<pre>
/(a|bc)x+yz/
</pre>
White space before the initial delimiter is ignored. A regular expression may
be continued over several input lines, in which case the newline characters are
included within it. It is possible to include the delimiter within the pattern
by escaping it, for example
<pre>
/abc\/def/
</pre>
If you do so, the escape and the delimiter form part of the pattern, but since
delimiters are always non-alphanumeric, this does not affect its interpretation.
If the terminating delimiter is immediately followed by a backslash, for
example,
<pre>
/abc/\
</pre>
then a backslash is added to the end of the pattern. This is done to provide a
way of testing the error condition that arises if a pattern finishes with a
backslash, because
<pre>
/abc\/
</pre>
is interpreted as the first line of a pattern that starts with "abc/", causing
pcretest to read the next line as a continuation of the regular expression.
</P>
<br><a name="SEC4" href="#TOC1">PATTERN MODIFIERS</a><br>
<P>
A pattern may be followed by any number of modifiers, which are mostly single
characters. Following Perl usage, these are referred to below as, for example,
"the <b>/i</b> modifier", even though the delimiter of the pattern need not
always be a slash, and no slash is used when writing modifiers. Whitespace may
appear between the final pattern delimiter and the first modifier, and between
the modifiers themselves.
</P>
<P>
The <b>/i</b>, <b>/m</b>, <b>/s</b>, and <b>/x</b> modifiers set the PCRE_CASELESS,
PCRE_MULTILINE, PCRE_DOTALL, or PCRE_EXTENDED options, respectively, when
<b>pcre_compile()</b> is called. These four modifier letters have the same
effect as they do in Perl. For example:
<pre>
/caseless/i
</pre>
The following table shows additional modifiers for setting PCRE options that do
not correspond to anything in Perl:
<pre>
<b>/A</b> PCRE_ANCHORED
<b>/C</b> PCRE_AUTO_CALLOUT
<b>/E</b> PCRE_DOLLAR_ENDONLY
<b>/f</b> PCRE_FIRSTLINE
<b>/J</b> PCRE_DUPNAMES
<b>/N</b> PCRE_NO_AUTO_CAPTURE
<b>/U</b> PCRE_UNGREEDY
<b>/X</b> PCRE_EXTRA
<b>/&#60;cr&#62;</b> PCRE_NEWLINE_CR
<b>/&#60;lf&#62;</b> PCRE_NEWLINE_LF
<b>/&#60;crlf&#62;</b> PCRE_NEWLINE_CRLF
</pre>
Those specifying line endings are literal strings as shown. Details of the
meanings of these PCRE options are given in the
<a href="pcreapi.html"><b>pcreapi</b></a>
documentation.
</P>
<br><b>
Finding all matches in a string
</b><br>
<P>
Searching for all possible matches within each subject string can be requested
by the <b>/g</b> or <b>/G</b> modifier. After finding a match, PCRE is called
again to search the remainder of the subject string. The difference between
<b>/g</b> and <b>/G</b> is that the former uses the <i>startoffset</i> argument to
<b>pcre_exec()</b> to start searching at a new point within the entire string
(which is in effect what Perl does), whereas the latter passes over a shortened
substring. This makes a difference to the matching process if the pattern
begins with a lookbehind assertion (including \b or \B).
</P>
<P>
If any call to <b>pcre_exec()</b> in a <b>/g</b> or <b>/G</b> sequence matches an
empty string, the next call is done with the PCRE_NOTEMPTY and PCRE_ANCHORED
flags set in order to search for another, non-empty, match at the same point.
If this second match fails, the start offset is advanced by one, and the normal
match is retried. This imitates the way Perl handles such cases when using the
<b>/g</b> modifier or the <b>split()</b> function.
</P>
<br><b>
Other modifiers
</b><br>
<P>
There are yet more modifiers for controlling the way <b>pcretest</b>
operates.
</P>
<P>
The <b>/+</b> modifier requests that as well as outputting the substring that
matched the entire pattern, pcretest should in addition output the remainder of
the subject string. This is useful for tests where the subject contains
multiple copies of the same substring.
</P>
<P>
The <b>/L</b> modifier must be followed directly by the name of a locale, for
example,
<pre>
/pattern/Lfr_FR
</pre>
For this reason, it must be the last modifier. The given locale is set,
<b>pcre_maketables()</b> is called to build a set of character tables for the
locale, and this is then passed to <b>pcre_compile()</b> when compiling the
regular expression. Without an <b>/L</b> modifier, NULL is passed as the tables
pointer; that is, <b>/L</b> applies only to the expression on which it appears.
</P>
<P>
The <b>/I</b> modifier requests that <b>pcretest</b> output information about the
compiled pattern (whether it is anchored, has a fixed first character, and
so on). It does this by calling <b>pcre_fullinfo()</b> after compiling a
pattern. If the pattern is studied, the results of that are also output.
</P>
<P>
The <b>/D</b> modifier is a PCRE debugging feature, which also assumes <b>/I</b>.
It causes the internal form of compiled regular expressions to be output after
compilation. If the pattern was studied, the information returned is also
output.
</P>
<P>
The <b>/F</b> modifier causes <b>pcretest</b> to flip the byte order of the
fields in the compiled pattern that contain 2-byte and 4-byte numbers. This
facility is for testing the feature in PCRE that allows it to execute patterns
that were compiled on a host with a different endianness. This feature is not
available when the POSIX interface to PCRE is being used, that is, when the
<b>/P</b> pattern modifier is specified. See also the section about saving and
reloading compiled patterns below.
</P>
<P>
The <b>/S</b> modifier causes <b>pcre_study()</b> to be called after the
expression has been compiled, and the results used when the expression is
matched.
</P>
<P>
The <b>/M</b> modifier causes the size of memory block used to hold the compiled
pattern to be output.
</P>
<P>
The <b>/P</b> modifier causes <b>pcretest</b> to call PCRE via the POSIX wrapper
API rather than its native API. When this is done, all other modifiers except
<b>/i</b>, <b>/m</b>, and <b>/+</b> are ignored. REG_ICASE is set if <b>/i</b> is
present, and REG_NEWLINE is set if <b>/m</b> is present. The wrapper functions
force PCRE_DOLLAR_ENDONLY always, and PCRE_DOTALL unless REG_NEWLINE is set.
</P>
<P>
The <b>/8</b> modifier causes <b>pcretest</b> to call PCRE with the PCRE_UTF8
option set. This turns on support for UTF-8 character handling in PCRE,
provided that it was compiled with this support enabled. This modifier also
causes any non-printing characters in output strings to be printed using the
\x{hh...} notation if they are valid UTF-8 sequences.
</P>
<P>
If the <b>/?</b> modifier is used with <b>/8</b>, it causes <b>pcretest</b> to
call <b>pcre_compile()</b> with the PCRE_NO_UTF8_CHECK option, to suppress the
checking of the string for UTF-8 validity.
</P>
<br><a name="SEC5" href="#TOC1">DATA LINES</a><br>
<P>
Before each data line is passed to <b>pcre_exec()</b>, leading and trailing
whitespace is removed, and it is then scanned for \ escapes. Some of these are
pretty esoteric features, intended for checking out some of the more
complicated features of PCRE. If you are just testing "ordinary" regular
expressions, you probably don't need any of these. The following escapes are
recognized:
<pre>
\a alarm (= BEL)
\b backspace
\e escape
\f formfeed
\n newline
\qdd set the PCRE_MATCH_LIMIT limit to dd (any number of digits)
\r carriage return
\t tab
\v vertical tab
\nnn octal character (up to 3 octal digits)
\xhh hexadecimal character (up to 2 hex digits)
\x{hh...} hexadecimal character, any number of digits in UTF-8 mode
\A pass the PCRE_ANCHORED option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
\B pass the PCRE_NOTBOL option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
\Cdd call pcre_copy_substring() for substring dd after a successful match (number less than 32)
\Cname call pcre_copy_named_substring() for substring "name" after a successful match (name termin-
ated by next non alphanumeric character)
\C+ show the current captured substrings at callout time
\C- do not supply a callout function
\C!n return 1 instead of 0 when callout number n is reached
\C!n!m return 1 instead of 0 when callout number n is reached for the nth time
\C*n pass the number n (may be negative) as callout data; this is used as the callout return value
\D use the <b>pcre_dfa_exec()</b> match function
\F only shortest match for <b>pcre_dfa_exec()</b>
\Gdd call pcre_get_substring() for substring dd after a successful match (number less than 32)
\Gname call pcre_get_named_substring() for substring "name" after a successful match (name termin-
ated by next non-alphanumeric character)
\L call pcre_get_substringlist() after a successful match
\M discover the minimum MATCH_LIMIT and MATCH_LIMIT_RECURSION settings
\N pass the PCRE_NOTEMPTY option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
\Odd set the size of the output vector passed to <b>pcre_exec()</b> to dd (any number of digits)
\P pass the PCRE_PARTIAL option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
\Qdd set the PCRE_MATCH_LIMIT_RECURSION limit to dd (any number of digits)
\R pass the PCRE_DFA_RESTART option to <b>pcre_dfa_exec()</b>
\S output details of memory get/free calls during matching
\Z pass the PCRE_NOTEOL option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
\? pass the PCRE_NO_UTF8_CHECK option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
\&#62;dd start the match at offset dd (any number of digits);
this sets the <i>startoffset</i> argument for <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
\&#60;cr&#62; pass the PCRE_NEWLINE_CR option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
\&#60;lf&#62; pass the PCRE_NEWLINE_LF option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
\&#60;crlf&#62; pass the PCRE_NEWLINE_CRLF option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
</pre>
The escapes that specify line endings are literal strings, exactly as shown.
A backslash followed by anything else just escapes the anything else. If the
very last character is a backslash, it is ignored. This gives a way of passing
an empty line as data, since a real empty line terminates the data input.
</P>
<P>
If \M is present, <b>pcretest</b> calls <b>pcre_exec()</b> several times, with
different values in the <i>match_limit</i> and <i>match_limit_recursion</i>
fields of the <b>pcre_extra</b> data structure, until it finds the minimum
numbers for each parameter that allow <b>pcre_exec()</b> to complete. The
<i>match_limit</i> number is a measure of the amount of backtracking that takes
place, and checking it out can be instructive. For most simple matches, the
number is quite small, but for patterns with very large numbers of matching
possibilities, it can become large very quickly with increasing length of
subject string. The <i>match_limit_recursion</i> number is a measure of how much
stack (or, if PCRE is compiled with NO_RECURSE, how much heap) memory is needed
to complete the match attempt.
</P>
<P>
When \O is used, the value specified may be higher or lower than the size set
by the <b>-O</b> command line option (or defaulted to 45); \O applies only to
the call of <b>pcre_exec()</b> for the line in which it appears.
</P>
<P>
If the <b>/P</b> modifier was present on the pattern, causing the POSIX wrapper
API to be used, the only option-setting sequences that have any effect are \B
and \Z, causing REG_NOTBOL and REG_NOTEOL, respectively, to be passed to
<b>regexec()</b>.
</P>
<P>
The use of \x{hh...} to represent UTF-8 characters is not dependent on the use
of the <b>/8</b> modifier on the pattern. It is recognized always. There may be
any number of hexadecimal digits inside the braces. The result is from one to
six bytes, encoded according to the UTF-8 rules.
</P>
<br><a name="SEC6" href="#TOC1">THE ALTERNATIVE MATCHING FUNCTION</a><br>
<P>
By default, <b>pcretest</b> uses the standard PCRE matching function,
<b>pcre_exec()</b> to match each data line. From release 6.0, PCRE supports an
alternative matching function, <b>pcre_dfa_test()</b>, which operates in a
different way, and has some restrictions. The differences between the two
functions are described in the
<a href="pcrematching.html"><b>pcrematching</b></a>
documentation.
</P>
<P>
If a data line contains the \D escape sequence, or if the command line
contains the <b>-dfa</b> option, the alternative matching function is called.
This function finds all possible matches at a given point. If, however, the \F
escape sequence is present in the data line, it stops after the first match is
found. This is always the shortest possible match.
</P>
<br><a name="SEC7" href="#TOC1">DEFAULT OUTPUT FROM PCRETEST</a><br>
<P>
This section describes the output when the normal matching function,
<b>pcre_exec()</b>, is being used.
</P>
<P>
When a match succeeds, pcretest outputs the list of captured substrings that
<b>pcre_exec()</b> returns, starting with number 0 for the string that matched
the whole pattern. Otherwise, it outputs "No match" or "Partial match"
when <b>pcre_exec()</b> returns PCRE_ERROR_NOMATCH or PCRE_ERROR_PARTIAL,
respectively, and otherwise the PCRE negative error number. Here is an example
of an interactive <b>pcretest</b> run.
<pre>
$ pcretest
PCRE version 5.00 07-Sep-2004
re&#62; /^abc(\d+)/
data&#62; abc123
0: abc123
1: 123
data&#62; xyz
No match
</pre>
If the strings contain any non-printing characters, they are output as \0x
escapes, or as \x{...} escapes if the <b>/8</b> modifier was present on the
pattern. If the pattern has the <b>/+</b> modifier, the output for substring 0
is followed by the the rest of the subject string, identified by "0+" like
this:
<pre>
re&#62; /cat/+
data&#62; cataract
0: cat
0+ aract
</pre>
If the pattern has the <b>/g</b> or <b>/G</b> modifier, the results of successive
matching attempts are output in sequence, like this:
<pre>
re&#62; /\Bi(\w\w)/g
data&#62; Mississippi
0: iss
1: ss
0: iss
1: ss
0: ipp
1: pp
</pre>
"No match" is output only if the first match attempt fails.
</P>
<P>
If any of the sequences <b>\C</b>, <b>\G</b>, or <b>\L</b> are present in a
data line that is successfully matched, the substrings extracted by the
convenience functions are output with C, G, or L after the string number
instead of a colon. This is in addition to the normal full list. The string
length (that is, the return from the extraction function) is given in
parentheses after each string for <b>\C</b> and <b>\G</b>.
</P>
<P>
Note that while patterns can be continued over several lines (a plain "&#62;"
prompt is used for continuations), data lines may not. However newlines can be
included in data by means of the \n escape (or \r or \r\n for those newline
settings).
</P>
<br><a name="SEC8" href="#TOC1">OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION</a><br>
<P>
When the alternative matching function, <b>pcre_dfa_exec()</b>, is used (by
means of the \D escape sequence or the <b>-dfa</b> command line option), the
output consists of a list of all the matches that start at the first point in
the subject where there is at least one match. For example:
<pre>
re&#62; /(tang|tangerine|tan)/
data&#62; yellow tangerine\D
0: tangerine
1: tang
2: tan
</pre>
(Using the normal matching function on this data finds only "tang".) The
longest matching string is always given first (and numbered zero).
</P>
<P>
If \fB/g\P is present on the pattern, the search for further matches resumes
at the end of the longest match. For example:
<pre>
re&#62; /(tang|tangerine|tan)/g
data&#62; yellow tangerine and tangy sultana\D
0: tangerine
1: tang
2: tan
0: tang
1: tan
0: tan
</pre>
Since the matching function does not support substring capture, the escape
sequences that are concerned with captured substrings are not relevant.
</P>
<br><a name="SEC9" href="#TOC1">RESTARTING AFTER A PARTIAL MATCH</a><br>
<P>
When the alternative matching function has given the PCRE_ERROR_PARTIAL return,
indicating that the subject partially matched the pattern, you can restart the
match with additional subject data by means of the \R escape sequence. For
example:
<pre>
re&#62; /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
data&#62; 23ja\P\D
Partial match: 23ja
data&#62; n05\R\D
0: n05
</pre>
For further information about partial matching, see the
<a href="pcrepartial.html"><b>pcrepartial</b></a>
documentation.
</P>
<br><a name="SEC10" href="#TOC1">CALLOUTS</a><br>
<P>
If the pattern contains any callout requests, <b>pcretest</b>'s callout function
is called during matching. This works with both matching functions. By default,
the called function displays the callout number, the start and current
positions in the text at the callout time, and the next pattern item to be
tested. For example, the output
<pre>
---&#62;pqrabcdef
0 ^ ^ \d
</pre>
indicates that callout number 0 occurred for a match attempt starting at the
fourth character of the subject string, when the pointer was at the seventh
character of the data, and when the next pattern item was \d. Just one
circumflex is output if the start and current positions are the same.
</P>
<P>
Callouts numbered 255 are assumed to be automatic callouts, inserted as a
result of the <b>/C</b> pattern modifier. In this case, instead of showing the
callout number, the offset in the pattern, preceded by a plus, is output. For
example:
<pre>
re&#62; /\d?[A-E]\*/C
data&#62; E*
---&#62;E*
+0 ^ \d?
+3 ^ [A-E]
+8 ^^ \*
+10 ^ ^
0: E*
</pre>
The callout function in <b>pcretest</b> returns zero (carry on matching) by
default, but you can use a \C item in a data line (as described above) to
change this.
</P>
<P>
Inserting callouts can be helpful when using <b>pcretest</b> to check
complicated regular expressions. For further information about callouts, see
the
<a href="pcrecallout.html"><b>pcrecallout</b></a>
documentation.
</P>
<br><a name="SEC11" href="#TOC1">SAVING AND RELOADING COMPILED PATTERNS</a><br>
<P>
The facilities described in this section are not available when the POSIX
inteface to PCRE is being used, that is, when the <b>/P</b> pattern modifier is
specified.
</P>
<P>
When the POSIX interface is not in use, you can cause <b>pcretest</b> to write a
compiled pattern to a file, by following the modifiers with &#62; and a file name.
For example:
<pre>
/pattern/im &#62;/some/file
</pre>
See the
<a href="pcreprecompile.html"><b>pcreprecompile</b></a>
documentation for a discussion about saving and re-using compiled patterns.
</P>
<P>
The data that is written is binary. The first eight bytes are the length of the
compiled pattern data followed by the length of the optional study data, each
written as four bytes in big-endian order (most significant byte first). If
there is no study data (either the pattern was not studied, or studying did not
return any data), the second length is zero. The lengths are followed by an
exact copy of the compiled pattern. If there is additional study data, this
follows immediately after the compiled pattern. After writing the file,
<b>pcretest</b> expects to read a new pattern.
</P>
<P>
A saved pattern can be reloaded into <b>pcretest</b> by specifing &#60; and a file
name instead of a pattern. The name of the file must not contain a &#60; character,
as otherwise <b>pcretest</b> will interpret the line as a pattern delimited by &#60;
characters.
For example:
<pre>
re&#62; &#60;/some/file
Compiled regex loaded from /some/file
No study data
</pre>
When the pattern has been loaded, <b>pcretest</b> proceeds to read data lines in
the usual way.
</P>
<P>
You can copy a file written by <b>pcretest</b> to a different host and reload it
there, even if the new host has opposite endianness to the one on which the
pattern was compiled. For example, you can compile on an i86 machine and run on
a SPARC machine.
</P>
<P>
File names for saving and reloading can be absolute or relative, but note that
the shell facility of expanding a file name that starts with a tilde (~) is not
available.
</P>
<P>
The ability to save and reload files in <b>pcretest</b> is intended for testing
and experimentation. It is not intended for production use because only a
single pattern can be written to a file. Furthermore, there is no facility for
supplying custom character tables for use with a reloaded pattern. If the
original pattern was compiled with custom tables, an attempt to match a subject
string using a reloaded pattern is likely to cause <b>pcretest</b> to crash.
Finally, if you attempt to load a file that is not in the correct format, the
result is undefined.
</P>
<br><a name="SEC12" href="#TOC1">AUTHOR</a><br>
<P>
Philip Hazel
<br>
University Computing Service,
<br>
Cambridge CB2 3QG, England.
</P>
<P>
Last updated: 29 June 2006
<br>
Copyright &copy; 1997-2006 University of Cambridge.
<p>
Return to the <a href="index.html">PCRE index page</a>.
</p>