update to pcre 7.9

git-svn-id: http://svn.freeswitch.org/svn/freeswitch/trunk@13706 d0543943-73ff-0310-b7d9-9358b9ac24b2
2025-08-13 01:26:58 +00:00 · 2009-06-08 23:51:30 +00:00
parent a1e5add731
commit f7efdaa901
178 changed files with 43560 additions and 11382 deletions
--- a/libs/pcre/doc/html/pcrematching.html
+++ b/libs/pcre/doc/html/pcrematching.html
@@ -16,9 +16,11 @@ man page, in case the conversion went wrong.
 <li><a name="TOC1" href="#SEC1">PCRE MATCHING ALGORITHMS</a>
 <li><a name="TOC2" href="#SEC2">REGULAR EXPRESSIONS AS TREES</a>
 <li><a name="TOC3" href="#SEC3">THE STANDARD MATCHING ALGORITHM</a>
-<li><a name="TOC4" href="#SEC4">THE DFA MATCHING ALGORITHM</a>
-<li><a name="TOC5" href="#SEC5">ADVANTAGES OF THE DFA ALGORITHM</a>
-<li><a name="TOC6" href="#SEC6">DISADVANTAGES OF THE DFA ALGORITHM</a>
+<li><a name="TOC4" href="#SEC4">THE ALTERNATIVE MATCHING ALGORITHM</a>
+<li><a name="TOC5" href="#SEC5">ADVANTAGES OF THE ALTERNATIVE ALGORITHM</a>
+<li><a name="TOC6" href="#SEC6">DISADVANTAGES OF THE ALTERNATIVE ALGORITHM</a>
+<li><a name="TOC7" href="#SEC7">AUTHOR</a>
+<li><a name="TOC8" href="#SEC8">REVISION</a>
 </ul>
 <br><a name="SEC1" href="#TOC1">PCRE MATCHING ALGORITHMS</a><br>
 <P>
@@ -46,7 +48,7 @@ is matched against the string
  &#60;something&#62; &#60;something else&#62; &#60;something further&#62;
 </pre>
 there are three possible answers. The standard algorithm finds only one of
-them, whereas the DFA algorithm finds all three.
+them, whereas the alternative algorithm finds all three.
 </P>
 <br><a name="SEC2" href="#TOC1">REGULAR EXPRESSIONS AS TREES</a><br>
 <P>
@@ -59,8 +61,8 @@ correspond to the two matching algorithms provided by PCRE.
 </P>
 <br><a name="SEC3" href="#TOC1">THE STANDARD MATCHING ALGORITHM</a><br>
 <P>
-In the terminology of Jeffrey Friedl's book \fIMastering Regular
-Expressions\fP, the standard algorithm is an "NFA algorithm". It conducts a
+In the terminology of Jeffrey Friedl's book "Mastering Regular
+Expressions", the standard algorithm is an "NFA algorithm". It conducts a
 depth-first search of the pattern tree. That is, it proceeds along a single
 path through the tree, checking that the subject matches what is required. When
 there is a mismatch, the algorithm tries any alternatives at the current point,
@@ -83,14 +85,15 @@ straightforward for this algorithm to keep track of the substrings that are
 matched by portions of the pattern in parentheses. This provides support for
 capturing parentheses and back references.
 </P>
-<br><a name="SEC4" href="#TOC1">THE DFA MATCHING ALGORITHM</a><br>
+<br><a name="SEC4" href="#TOC1">THE ALTERNATIVE MATCHING ALGORITHM</a><br>
 <P>
-DFA stands for "deterministic finite automaton", but you do not need to
-understand the origins of that name. This algorithm conducts a breadth-first
-search of the tree. Starting from the first matching point in the subject, it
-scans the subject string from left to right, once, character by character, and
-as it does this, it remembers all the paths through the tree that represent
-valid matches.
+This algorithm conducts a breadth-first search of the tree. Starting from the
+first matching point in the subject, it scans the subject string from left to
+right, once, character by character, and as it does this, it remembers all the
+paths through the tree that represent valid matches. In Friedl's terminology,
+this is a kind of "DFA algorithm", though it is not implemented as a
+traditional finite state machine (it keeps multiple states active
+simultaneously).
 </P>
 <P>
 The scan continues until either the end of the subject is reached, or there are
@@ -114,12 +117,21 @@ matches that start at later positions.
 </P>
 <P>
 There are a number of features of PCRE regular expressions that are not
-supported by the DFA matching algorithm. They are as follows:
+supported by the alternative matching algorithm. They are as follows:
 </P>
 <P>
 1. Because the algorithm finds all possible matches, the greedy or ungreedy
 nature of repetition quantifiers is not relevant. Greedy and ungreedy
-quantifiers are treated in exactly the same way.
+quantifiers are treated in exactly the same way. However, possessive
+quantifiers can make a difference when what follows could also match what is
+quantified, for example in a pattern like this:
+<pre>
+  ^a++\w!
+</pre>
+This pattern matches "aaab!" but not "aaa!", which would be matched by a
+non-possessive quantifier. Similarly, if an atomic group is present, it is
+matched as if it were a standalone pattern at the current point, and the
+longest match is then "locked in" for the rest of the overall pattern.
 </P>
 <P>
 2. When dealing with multiple paths through the tree simultaneously, it is not
@@ -133,22 +145,30 @@ not supported, and cause errors if encountered.
 </P>
 <P>
 4. For the same reason, conditional expressions that use a backreference as the
-condition are not supported.
+condition or test for a specific group recursion are not supported.
 </P>
 <P>
-5. Callouts are supported, but the value of the <i>capture_top</i> field is
+5. Because many paths through the tree may be active, the \K escape sequence,
+which resets the start of the match when encountered (but may be on some paths
+and not on others), is not supported. It causes an error if encountered.
+</P>
+<P>
+6. Callouts are supported, but the value of the <i>capture_top</i> field is
 always 1, and the value of the <i>capture_last</i> field is always -1.
 </P>
 <P>
-6.
-The \C escape sequence, which (in the standard algorithm) matches a single
-byte, even in UTF-8 mode, is not supported because the DFA algorithm moves
-through the subject string one character at a time, for all active paths
+7. The \C escape sequence, which (in the standard algorithm) matches a single
+byte, even in UTF-8 mode, is not supported because the alternative algorithm
+moves through the subject string one character at a time, for all active paths
 through the tree.
 </P>
-<br><a name="SEC5" href="#TOC1">ADVANTAGES OF THE DFA ALGORITHM</a><br>
 <P>
-Using the DFA matching algorithm provides the following advantages:
+8. Except for (*FAIL), the backtracking control verbs such as (*PRUNE) are not
+supported. (*FAIL) is supported, and behaves like a failing negative assertion.
+</P>
+<br><a name="SEC5" href="#TOC1">ADVANTAGES OF THE ALTERNATIVE ALGORITHM</a><br>
+<P>
+Using the alternative matching algorithm provides the following advantages:
 </P>
 <P>
 1. All possible matches (at a single point in the subject) are automatically
@@ -159,17 +179,18 @@ callouts.
 <P>
 2. There is much better support for partial matching. The restrictions on the
 content of the pattern that apply when using the standard algorithm for partial
-matching do not apply to the DFA algorithm. For non-anchored patterns, the
-starting position of a partial match is available.
+matching do not apply to the alternative algorithm. For non-anchored patterns,
+the starting position of a partial match is available.
 </P>
 <P>
-3. Because the DFA algorithm scans the subject string just once, and never
-needs to backtrack, it is possible to pass very long subject strings to the
-matching function in several pieces, checking for partial matching each time.
+3. Because the alternative algorithm scans the subject string just once, and
+never needs to backtrack, it is possible to pass very long subject strings to
+the matching function in several pieces, checking for partial matching each
+time.
 </P>
-<br><a name="SEC6" href="#TOC1">DISADVANTAGES OF THE DFA ALGORITHM</a><br>
+<br><a name="SEC6" href="#TOC1">DISADVANTAGES OF THE ALTERNATIVE ALGORITHM</a><br>
 <P>
-The DFA algorithm suffers from a number of disadvantages:
+The alternative algorithm suffers from a number of disadvantages:
 </P>
 <P>
 1. It is substantially slower than the standard algorithm. This is partly
@@ -180,13 +201,24 @@ less susceptible to optimization.
 2. Capturing parentheses and back references are not supported.
 </P>
 <P>
-3. The "atomic group" feature of PCRE regular expressions is supported, but
-does not provide the advantage that it does for the standard algorithm.
+3. Although atomic groups are supported, their use does not provide the
+performance advantage that it does for the standard algorithm.
 </P>
+<br><a name="SEC7" href="#TOC1">AUTHOR</a><br>
 <P>
-Last updated: 06 June 2006
+Philip Hazel
+<br>
+University Computing Service
+<br>
+Cambridge CB2 3QH, England.
+<br>
+</P>
+<br><a name="SEC8" href="#TOC1">REVISION</a><br>
+<P>
+Last updated: 19 April 2008
+<br>
+Copyright &copy; 1997-2008 University of Cambridge.
 <br>
-Copyright &copy; 1997-2006 University of Cambridge.
 <p>
 Return to the <a href="index.html">PCRE index page</a>.
 </p>
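
Note on the patched text: the difference between "finds only one of them" and "finds all three" can be tried out with a small sketch. It uses Python's backtracking `re` module as a stand-in for the standard algorithm and a brute-force prefix scan to imitate the result (not the single-pass breadth-first method) of the alternative algorithm. The pattern `^<.*>` is an assumption, since the hunk shows only the subject string, and the helper names are hypothetical. The possessive `^a++\w!` is emulated with a lookahead plus backreference so that the sketch does not depend on possessive-quantifier support in the regex engine.

```python
import re

SUBJECT = "<something> <something else> <something further>"
PATTERN = r"^<.*>"  # assumed pattern; the hunk shows only the subject string

# Emulation of the possessive pattern ^a++\w! for engines without "++":
# the lookahead grabs the longest run of "a" and cannot be backtracked into.
POSSESSIVE_EMULATED = r"^(?=(a+))\1\w!"
NON_POSSESSIVE = r"^a+\w!"


def first_match(pattern, subject):
    """Return the one match a backtracking matcher reports, or None."""
    m = re.match(pattern, subject)
    return m.group(0) if m else None


def all_matches_at_start(pattern, subject):
    """Return every match that starts at offset 0, longest first.

    Brute force: try each possible end offset and keep those where the
    pattern matches the whole prefix. This reproduces the set of answers
    only; it does not scan the subject once as the alternative algorithm does.
    """
    compiled = re.compile(pattern)
    found = []
    for end in range(len(subject), -1, -1):
        if compiled.fullmatch(subject, 0, end):
            found.append(subject[:end])
    return found


if __name__ == "__main__":
    print("backtracking matcher:", first_match(PATTERN, SUBJECT))
    for candidate in all_matches_at_start(PATTERN, SUBJECT):
        print("match at start:", candidate)

    for text in ("aaab!", "aaa!"):
        print(
            text,
            "possessive:", bool(re.match(POSSESSIVE_EMULATED, text)),
            "non-possessive:", bool(re.match(NON_POSSESSIVE, text)),
        )
```

The prefix scan is quadratic in the worst case and is meant only to make the three answers visible. With PCRE itself, the alternative algorithm is reached through `pcre_dfa_exec()`, which this sketch does not call.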