Mirror of https://github.com/signalwire/freeswitch.git, synced 2025-03-27 09:10:51 +00:00
add pcre to in tree libs
git-svn-id: http://svn.freeswitch.org/svn/freeswitch/trunk@3732 d0543943-73ff-0310-b7d9-9358b9ac24b2
parent f82b80b57c
commit 9da5d7e90f
23
libs/pcre/AUTHORS
Normal file
@@ -0,0 +1,23 @@
THE MAIN PCRE LIBRARY
---------------------

Written by: Philip Hazel
Email local part: ph10
Email domain: cam.ac.uk

University of Cambridge Computing Service,
Cambridge, England. Phone: +44 1223 334714.

Copyright (c) 1997-2006 University of Cambridge
All rights reserved


THE C++ WRAPPER LIBRARY
-----------------------

Written by: Google Inc.

Copyright (c) 2006 Google Inc
All rights reserved

####
68
libs/pcre/COPYING
Normal file
@@ -0,0 +1,68 @@
PCRE LICENCE
------------

PCRE is a library of functions to support regular expressions whose syntax
and semantics are as close as possible to those of the Perl 5 language.

Release 6 of PCRE is distributed under the terms of the "BSD" licence, as
specified below. The documentation for PCRE, supplied in the "doc"
directory, is distributed under the same terms as the software itself.

The basic library functions are written in C and are freestanding. Also
included in the distribution is a set of C++ wrapper functions.


THE BASIC LIBRARY FUNCTIONS
---------------------------

Written by: Philip Hazel
Email local part: ph10
Email domain: cam.ac.uk

University of Cambridge Computing Service,
Cambridge, England. Phone: +44 1223 334714.

Copyright (c) 1997-2006 University of Cambridge
All rights reserved.


THE C++ WRAPPER FUNCTIONS
-------------------------

Contributed by: Google Inc.

Copyright (c) 2006, Google Inc.
All rights reserved.


THE "BSD" LICENCE
-----------------

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

    * Redistributions of source code must retain the above copyright notice,
      this list of conditions and the following disclaimer.

    * Redistributions in binary form must reproduce the above copyright
      notice, this list of conditions and the following disclaimer in the
      documentation and/or other materials provided with the distribution.

    * Neither the name of the University of Cambridge nor the name of Google
      Inc. nor the names of their contributors may be used to endorse or
      promote products derived from this software without specific prior
      written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.

End
2279
libs/pcre/ChangeLog
Normal file
File diff suppressed because it is too large
185
libs/pcre/INSTALL
Normal file
@@ -0,0 +1,185 @@
Basic Installation
==================

These are generic installation instructions that apply to systems that
can run the `configure' shell script - Unix systems and any that imitate
it. They are not specific to PCRE. There are PCRE-specific instructions
for non-Unix systems in the file NON-UNIX-USE.

The `configure' shell script attempts to guess correct values for
various system-dependent variables used during compilation. It uses
those values to create a `Makefile' in each directory of the package.
It may also create one or more `.h' files containing system-dependent
definitions. Finally, it creates a shell script `config.status' that
you can run in the future to recreate the current configuration, a file
`config.cache' that saves the results of its tests to speed up
reconfiguring, and a file `config.log' containing compiler output
(useful mainly for debugging `configure').

If you need to do unusual things to compile the package, please try
to figure out how `configure' could check whether to do them, and mail
diffs or instructions to the address given in the `README' so they can
be considered for the next release. If at some point `config.cache'
contains results you don't want to keep, you may remove or edit it.

The file `configure.in' is used to create `configure' by a program
called `autoconf'. You only need `configure.in' if you want to change
it or regenerate `configure' using a newer version of `autoconf'.

The simplest way to compile this package is:

  1. `cd' to the directory containing the package's source code and type
     `./configure' to configure the package for your system. If you're
     using `csh' on an old version of System V, you might need to type
     `sh ./configure' instead to prevent `csh' from trying to execute
     `configure' itself.

     Running `configure' takes a while. While running, it prints some
     messages telling which features it is checking for.

  2. Type `make' to compile the package.

  3. Optionally, type `make check' to run any self-tests that come with
     the package.

  4. Type `make install' to install the programs and any data files and
     documentation.

  5. You can remove the program binaries and object files from the
     source code directory by typing `make clean'. To also remove the
     files that `configure' created (so you can compile the package for
     a different kind of computer), type `make distclean'. There is
     also a `make maintainer-clean' target, but that is intended mainly
     for the package's developers. If you use it, you may have to get
     all sorts of other programs in order to regenerate files that came
     with the distribution.

Compilers and Options
=====================

Some systems require unusual options for compilation or linking that
the `configure' script does not know about. You can give `configure'
initial values for variables by setting them in the environment. Using
a Bourne-compatible shell, you can do that on the command line like
this:
     CC=c89 CFLAGS=-O2 LIBS=-lposix ./configure

Or on systems that have the `env' program, you can do it like this:
     env CPPFLAGS=-I/usr/local/include LDFLAGS=-s ./configure

Compiling For Multiple Architectures
====================================

You can compile the package for more than one kind of computer at the
same time, by placing the object files for each architecture in their
own directory. To do this, you must use a version of `make' that
supports the `VPATH' variable, such as GNU `make'. `cd' to the
directory where you want the object files and executables to go and run
the `configure' script. `configure' automatically checks for the
source code in the directory that `configure' is in and in `..'.

If you have to use a `make' that does not support the `VPATH'
variable, you have to compile the package for one architecture at a time
in the source code directory. After you have installed the package for
one architecture, use `make distclean' before reconfiguring for another
architecture.

Installation Names
==================

By default, `make install' will install the package's files in
`/usr/local/bin', `/usr/local/man', etc. You can specify an
installation prefix other than `/usr/local' by giving `configure' the
option `--prefix=PATH'.

You can specify separate installation prefixes for
architecture-specific files and architecture-independent files. If you
give `configure' the option `--exec-prefix=PATH', the package will use
PATH as the prefix for installing programs and libraries.
Documentation and other data files will still use the regular prefix.

In addition, if you use an unusual directory layout you can give
options like `--bindir=PATH' to specify different values for particular
kinds of files. Run `configure --help' for a list of the directories
you can set and what kinds of files go in them.

If the package supports it, you can cause programs to be installed
with an extra prefix or suffix on their names by giving `configure' the
option `--program-prefix=PREFIX' or `--program-suffix=SUFFIX'.

Optional Features
=================

Some packages pay attention to `--enable-FEATURE' options to
`configure', where FEATURE indicates an optional part of the package.
They may also pay attention to `--with-PACKAGE' options, where PACKAGE
is something like `gnu-as' or `x' (for the X Window System). The
`README' should mention any `--enable-' and `--with-' options that the
package recognizes.

For packages that use the X Window System, `configure' can usually
find the X include and library files automatically, but if it doesn't,
you can use the `configure' options `--x-includes=DIR' and
`--x-libraries=DIR' to specify their locations.

Specifying the System Type
==========================

There may be some features `configure' can not figure out
automatically, but needs to determine by the type of host the package
will run on. Usually `configure' can figure that out, but if it prints
a message saying it can not guess the host type, give it the
`--host=TYPE' option. TYPE can either be a short name for the system
type, such as `sun4', or a canonical name with three fields:
     CPU-COMPANY-SYSTEM

See the file `config.sub' for the possible values of each field. If
`config.sub' isn't included in this package, then this package doesn't
need to know the host type.

If you are building compiler tools for cross-compiling, you can also
use the `--target=TYPE' option to select the type of system they will
produce code for and the `--build=TYPE' option to select the type of
system on which you are compiling the package.

Sharing Defaults
================

If you want to set default values for `configure' scripts to share,
you can create a site shell script called `config.site' that gives
default values for variables like `CC', `cache_file', and `prefix'.
`configure' looks for `PREFIX/share/config.site' if it exists, then
`PREFIX/etc/config.site' if it exists. Or, you can set the
`CONFIG_SITE' environment variable to the location of the site script.
A warning: not all `configure' scripts look for a site script.

Operation Controls
==================

`configure' recognizes the following options to control how it
operates.

`--cache-file=FILE'
     Use and save the results of the tests in FILE instead of
     `./config.cache'. Set FILE to `/dev/null' to disable caching, for
     debugging `configure'.

`--help'
     Print a summary of the options to `configure', and exit.

`--quiet'
`--silent'
`-q'
     Do not print messages saying which checks are being made. To
     suppress all normal output, redirect it to `/dev/null' (any error
     messages will still be shown).

`--srcdir=DIR'
     Look for the package's source code in directory DIR. Usually
     `configure' can determine that directory automatically.

`--version'
     Print the version of Autoconf used to generate the `configure'
     script, and exit.

`configure' also accepts some other, not widely useful, options.
68
libs/pcre/LICENCE
Normal file
@@ -0,0 +1,68 @@
PCRE LICENCE
------------

PCRE is a library of functions to support regular expressions whose syntax
and semantics are as close as possible to those of the Perl 5 language.

Release 6 of PCRE is distributed under the terms of the "BSD" licence, as
specified below. The documentation for PCRE, supplied in the "doc"
directory, is distributed under the same terms as the software itself.

The basic library functions are written in C and are freestanding. Also
included in the distribution is a set of C++ wrapper functions.


THE BASIC LIBRARY FUNCTIONS
---------------------------

Written by: Philip Hazel
Email local part: ph10
Email domain: cam.ac.uk

University of Cambridge Computing Service,
Cambridge, England. Phone: +44 1223 334714.

Copyright (c) 1997-2006 University of Cambridge
All rights reserved.


THE C++ WRAPPER FUNCTIONS
-------------------------

Contributed by: Google Inc.

Copyright (c) 2006, Google Inc.
All rights reserved.


THE "BSD" LICENCE
-----------------

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

    * Redistributions of source code must retain the above copyright notice,
      this list of conditions and the following disclaimer.

    * Redistributions in binary form must reproduce the above copyright
      notice, this list of conditions and the following disclaimer in the
      documentation and/or other materials provided with the distribution.

    * Neither the name of the University of Cambridge nor the name of Google
      Inc. nor the names of their contributors may be used to endorse or
      promote products derived from this software without specific prior
      written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.

End
606
libs/pcre/Makefile.in
Normal file
@@ -0,0 +1,606 @@

# Makefile.in for PCRE (Perl-Compatible Regular Expression) library.


#############################################################################

# PCRE is developed on a Unix system. I do not use Windows or Macs, and know
# nothing about building software on them. Although the code of PCRE should
# be very portable, the building system in this Makefile is designed for Unix
# systems. However, there are features that have been supplied to me by various
# people that should make it work on MinGW and Cygwin systems.

# This setting enables Unix-style directory scanning in pcregrep, triggered
# by the -f option. Maybe one day someone will add code for other systems.

PCREGREP_OSTYPE=-DIS_UNIX

#############################################################################


# Libtool places .o files in the .libs directory; this can mean that "make"
# thinks it is not up-to-date when in fact it is. This setting helps when
# GNU "make" is being used. It presumably does no harm in other cases.

VPATH=.libs


#---------------------------------------------------------------------------#
# The following lines are modified by "configure" to insert data that it is #
# given in its arguments, or which it finds out for itself.                 #
#---------------------------------------------------------------------------#

SHELL = @SHELL@
prefix = @prefix@
exec_prefix = @exec_prefix@
top_srcdir = @top_srcdir@

mkinstalldirs = $(SHELL) $(top_srcdir)/mkinstalldirs

# NB: top_builddir is not referred to directly below, but it is used in the
# setting of $(LIBTOOL), so don't remove it!

top_builddir = .

# BINDIR is the directory in which the pcregrep, pcretest, and pcre-config
# commands are installed.
# INCDIR is the directory in which the public header files pcre.h and
# pcreposix.h are installed.
# LIBDIR is the directory in which the libraries are installed.
# MANDIR is the directory in which the man pages are installed.

BINDIR = @bindir@
LIBDIR = @libdir@
INCDIR = @includedir@
MANDIR = @mandir@

# EXEEXT is set by configure to the extension of an executable file
# OBJEXT is set by configure to the extension of an object file
# The BUILD_* equivalents are the same but for the host we're building on

EXEEXT = @EXEEXT@
OBJEXT = @OBJEXT@
# Note that these are just here to have a convenient place to look at the
# outcome.
BUILD_EXEEXT = @BUILD_EXEEXT@
BUILD_OBJEXT = @BUILD_OBJEXT@

# POSIX_OBJ and POSIX_LOBJ are either set empty, or to the names of the
# POSIX object files.

POSIX_OBJ = @POSIX_OBJ@
POSIX_LOBJ = @POSIX_LOBJ@

# The compiler, C flags, preprocessor flags, etc

CC = @CC@
CXX = @CXX@
CFLAGS = @CFLAGS@
CXXFLAGS = @CXXFLAGS@
LDFLAGS = @LDFLAGS@
CXXLDFLAGS = @CXXLDFLAGS@

CC_FOR_BUILD = @CC_FOR_BUILD@
CFLAGS_FOR_BUILD = @CFLAGS_FOR_BUILD@
CXX_FOR_BUILD = @CXX_FOR_BUILD@
CXXFLAGS_FOR_BUILD = @CXXFLAGS_FOR_BUILD@
LDFLAGS_FOR_BUILD = $(LDFLAGS)

UCP = @UCP@
UTF8 = @UTF8@
NEWLINE = @NEWLINE@
POSIX_MALLOC_THRESHOLD = @POSIX_MALLOC_THRESHOLD@
LINK_SIZE = @LINK_SIZE@
MATCH_LIMIT = @MATCH_LIMIT@ @MATCH_LIMIT_RECURSION@
NO_RECURSE = @NO_RECURSE@
EBCDIC = @EBCDIC@

INSTALL = @INSTALL@
INSTALL_DATA = @INSTALL_DATA@

# LIBTOOL enables the building of shared and static libraries. It is set up
# to do one or the other or both by ./configure.

LIBTOOL = @LIBTOOL@
LTCOMPILE = $(LIBTOOL) --mode=compile $(CC) -c $(CFLAGS) -I. -I$(top_srcdir) $(NEWLINE) $(LINK_SIZE) $(MATCH_LIMIT) $(NO_RECURSE) $(EBCDIC)
LTCXXCOMPILE = $(LIBTOOL) --mode=compile $(CXX) -c $(CXXFLAGS) -I. -I$(top_srcdir) $(NEWLINE) $(LINK_SIZE) $(MATCH_LIMIT) $(NO_RECURSE) $(EBCDIC)
@ON_WINDOWS@LINK = $(CC) $(LDFLAGS) -I. -I$(top_srcdir) -L.libs
@NOT_ON_WINDOWS@LINK = $(LIBTOOL) --mode=link $(CC) $(CFLAGS) $(LDFLAGS) -I. -I$(top_srcdir)
LINKLIB = $(LIBTOOL) --mode=link $(CC) -export-symbols-regex '^[^_]' $(LDFLAGS) -I. -I$(top_srcdir)
LINK_FOR_BUILD = $(LIBTOOL) --mode=link $(CC_FOR_BUILD) $(CFLAGS_FOR_BUILD) $(LDFLAGS_FOR_BUILD) -I. -I$(top_srcdir)
@ON_WINDOWS@CXXLINK = $(CXX) $(LDFLAGS) -I. -I$(top_srcdir) -L.libs
@NOT_ON_WINDOWS@CXXLINK = $(LIBTOOL) --mode=link $(CXX) $(CXXFLAGS) $(CXXLDFLAGS) -I. -I$(top_srcdir)
CXXLINKLIB = $(LIBTOOL) --mode=link $(CXX) $(LDFLAGS) -I. -I$(top_srcdir)

# These are the version numbers for the shared libraries

PCRELIBVERSION = @PCRE_LIB_VERSION@
PCREPOSIXLIBVERSION = @PCRE_POSIXLIB_VERSION@
PCRECPPLIBVERSION = @PCRE_CPPLIB_VERSION@

##############################################################################


OBJ = pcre_chartables.@OBJEXT@ \
  pcre_compile.@OBJEXT@ \
  pcre_config.@OBJEXT@ \
  pcre_dfa_exec.@OBJEXT@ \
  pcre_exec.@OBJEXT@ \
  pcre_fullinfo.@OBJEXT@ \
  pcre_get.@OBJEXT@ \
  pcre_globals.@OBJEXT@ \
  pcre_info.@OBJEXT@ \
  pcre_maketables.@OBJEXT@ \
  pcre_ord2utf8.@OBJEXT@ \
  pcre_refcount.@OBJEXT@ \
  pcre_study.@OBJEXT@ \
  pcre_tables.@OBJEXT@ \
  pcre_try_flipped.@OBJEXT@ \
  pcre_ucp_searchfuncs.@OBJEXT@ \
  pcre_valid_utf8.@OBJEXT@ \
  pcre_version.@OBJEXT@ \
  pcre_xclass.@OBJEXT@ \
  $(POSIX_OBJ)

LOBJ = pcre_chartables.lo \
  pcre_compile.lo \
  pcre_config.lo \
  pcre_dfa_exec.lo \
  pcre_exec.lo \
  pcre_fullinfo.lo \
  pcre_get.lo \
  pcre_globals.lo \
  pcre_info.lo \
  pcre_maketables.lo \
  pcre_ord2utf8.lo \
  pcre_refcount.lo \
  pcre_study.lo \
  pcre_tables.lo \
  pcre_try_flipped.lo \
  pcre_ucp_searchfuncs.lo \
  pcre_valid_utf8.lo \
  pcre_version.lo \
  pcre_xclass.lo \
  $(POSIX_LOBJ)

CPPOBJ = pcrecpp.@OBJEXT@ \
  pcre_scanner.@OBJEXT@ \
  pcre_stringpiece.@OBJEXT@

CPPLOBJ = pcrecpp.lo \
  pcre_scanner.lo \
  pcre_stringpiece.lo

CPP_TARGETS = libpcrecpp.la \
  pcrecpp_unittest@EXEEXT@ \
  pcre_scanner_unittest@EXEEXT@ \
  pcre_stringpiece_unittest@EXEEXT@

all: libpcre.la @POSIX_LIB@ pcretest@EXEEXT@ pcregrep@EXEEXT@ \
  @MAYBE_CPP_TARGETS@ @ON_WINDOWS@ winshared

pcregrep@EXEEXT@: libpcre.la pcregrep.@OBJEXT@ @ON_WINDOWS@ winshared
	$(LINK) -o pcregrep@EXEEXT@ pcregrep.@OBJEXT@ libpcre.la

pcretest@EXEEXT@: libpcre.la @POSIX_LIB@ pcretest.@OBJEXT@ \
  @ON_WINDOWS@ winshared
	$(LINK) $(PURIFY) $(EFENCE) -o pcretest@EXEEXT@ \
	  pcretest.@OBJEXT@ \
	  libpcre.la @POSIX_LIB@

pcrecpp_unittest@EXEEXT@: libpcrecpp.la pcrecpp_unittest.@OBJEXT@ \
  @ON_WINDOWS@ winshared
	$(CXXLINK) $(PURIFY) $(EFENCE) -o pcrecpp_unittest@EXEEXT@ \
	  pcrecpp_unittest.@OBJEXT@ \
	  libpcrecpp.la @POSIX_LIB@

pcre_scanner_unittest@EXEEXT@: libpcrecpp.la pcre_scanner_unittest.@OBJEXT@ \
  @ON_WINDOWS@ winshared
	$(CXXLINK) $(PURIFY) $(EFENCE) \
	  -o pcre_scanner_unittest@EXEEXT@ \
	  pcre_scanner_unittest.@OBJEXT@ \
	  libpcrecpp.la @POSIX_LIB@

pcre_stringpiece_unittest@EXEEXT@: libpcrecpp.la \
  pcre_stringpiece_unittest.@OBJEXT@ @ON_WINDOWS@ winshared
	$(CXXLINK) $(PURIFY) $(EFENCE) \
	  -o pcre_stringpiece_unittest@EXEEXT@ \
	  pcre_stringpiece_unittest.@OBJEXT@ \
	  libpcrecpp.la @POSIX_LIB@

libpcre.la: $(OBJ)
	-rm -f libpcre.la
	$(LINKLIB) -rpath $(LIBDIR) -version-info \
	  '$(PCRELIBVERSION)' -o libpcre.la $(LOBJ)

libpcreposix.la: libpcre.la pcreposix.@OBJEXT@
	-rm -f libpcreposix.la
	$(LINKLIB) -rpath $(LIBDIR) libpcre.la -version-info \
	  '$(PCREPOSIXLIBVERSION)' -o libpcreposix.la pcreposix.lo

libpcrecpp.la: libpcre.la $(CPPOBJ)
	-rm -f libpcrecpp.la
	$(CXXLINKLIB) -rpath $(LIBDIR) libpcre.la -version-info \
	  '$(PCRECPPLIBVERSION)' -o libpcrecpp.la $(CPPLOBJ)

# Note that files generated by ./configure and by dftables are in the current
# directory, not the source directory.

pcre_chartables.@OBJEXT@: pcre_chartables.c
	@$(LTCOMPILE) pcre_chartables.c

pcre_compile.@OBJEXT@: Makefile config.h $(top_srcdir)/pcre.h \
  $(top_srcdir)/pcre_internal.h $(top_srcdir)/pcre_compile.c \
  $(top_srcdir)/pcre_printint.src
	@$(LTCOMPILE) $(UTF8) $(UCP) $(POSIX_MALLOC_THRESHOLD) \
	  $(top_srcdir)/pcre_compile.c

pcre_config.@OBJEXT@: Makefile config.h $(top_srcdir)/pcre.h \
  $(top_srcdir)/pcre_internal.h $(top_srcdir)/pcre_config.c
	@$(LTCOMPILE) $(UTF8) $(UCP) $(POSIX_MALLOC_THRESHOLD) \
	  $(top_srcdir)/pcre_config.c

pcre_dfa_exec.@OBJEXT@: Makefile config.h $(top_srcdir)/pcre.h \
  $(top_srcdir)/pcre_internal.h $(top_srcdir)/pcre_dfa_exec.c
	@$(LTCOMPILE) $(UTF8) $(UCP) $(POSIX_MALLOC_THRESHOLD) \
	  $(top_srcdir)/pcre_dfa_exec.c

pcre_exec.@OBJEXT@: Makefile config.h $(top_srcdir)/pcre.h \
  $(top_srcdir)/pcre_internal.h $(top_srcdir)/pcre_exec.c
	@$(LTCOMPILE) $(UTF8) $(UCP) $(POSIX_MALLOC_THRESHOLD) \
	  $(top_srcdir)/pcre_exec.c

pcre_fullinfo.@OBJEXT@: Makefile config.h $(top_srcdir)/pcre.h \
  $(top_srcdir)/pcre_internal.h $(top_srcdir)/pcre_fullinfo.c
	@$(LTCOMPILE) $(UTF8) $(UCP) $(POSIX_MALLOC_THRESHOLD) \
	  $(top_srcdir)/pcre_fullinfo.c

pcre_get.@OBJEXT@: Makefile config.h $(top_srcdir)/pcre.h \
  $(top_srcdir)/pcre_internal.h $(top_srcdir)/pcre_get.c
	@$(LTCOMPILE) $(UTF8) $(UCP) $(POSIX_MALLOC_THRESHOLD) \
	  $(top_srcdir)/pcre_get.c

pcre_globals.@OBJEXT@: Makefile config.h $(top_srcdir)/pcre.h \
  $(top_srcdir)/pcre_internal.h $(top_srcdir)/pcre_globals.c
	@$(LTCOMPILE) $(UTF8) $(UCP) $(POSIX_MALLOC_THRESHOLD) \
	  $(top_srcdir)/pcre_globals.c

pcre_info.@OBJEXT@: Makefile config.h $(top_srcdir)/pcre.h \
  $(top_srcdir)/pcre_internal.h $(top_srcdir)/pcre_info.c
	@$(LTCOMPILE) $(UTF8) $(UCP) $(POSIX_MALLOC_THRESHOLD) \
	  $(top_srcdir)/pcre_info.c

pcre_maketables.@OBJEXT@: Makefile config.h $(top_srcdir)/pcre.h \
  $(top_srcdir)/pcre_internal.h $(top_srcdir)/pcre_maketables.c
	@$(LTCOMPILE) $(UTF8) $(UCP) $(POSIX_MALLOC_THRESHOLD) \
	  $(top_srcdir)/pcre_maketables.c

pcre_ord2utf8.@OBJEXT@: Makefile config.h $(top_srcdir)/pcre.h \
  $(top_srcdir)/pcre_internal.h $(top_srcdir)/pcre_ord2utf8.c
	@$(LTCOMPILE) $(UTF8) $(UCP) $(POSIX_MALLOC_THRESHOLD) \
	  $(top_srcdir)/pcre_ord2utf8.c

pcre_refcount.@OBJEXT@: Makefile config.h $(top_srcdir)/pcre.h \
  $(top_srcdir)/pcre_internal.h $(top_srcdir)/pcre_refcount.c
	@$(LTCOMPILE) $(UTF8) $(UCP) $(POSIX_MALLOC_THRESHOLD) \
	  $(top_srcdir)/pcre_refcount.c

pcre_study.@OBJEXT@: Makefile config.h $(top_srcdir)/pcre.h \
  $(top_srcdir)/pcre_internal.h $(top_srcdir)/pcre_study.c
	@$(LTCOMPILE) $(UTF8) $(UCP) $(POSIX_MALLOC_THRESHOLD) \
	  $(top_srcdir)/pcre_study.c

pcre_tables.@OBJEXT@: Makefile config.h $(top_srcdir)/pcre.h \
  $(top_srcdir)/pcre_internal.h $(top_srcdir)/pcre_tables.c
	@$(LTCOMPILE) $(UTF8) $(UCP) $(POSIX_MALLOC_THRESHOLD) \
	  $(top_srcdir)/pcre_tables.c

pcre_try_flipped.@OBJEXT@: Makefile config.h $(top_srcdir)/pcre.h \
  $(top_srcdir)/pcre_internal.h $(top_srcdir)/pcre_try_flipped.c
	@$(LTCOMPILE) $(UTF8) $(UCP) $(POSIX_MALLOC_THRESHOLD) \
	  $(top_srcdir)/pcre_try_flipped.c

pcre_ucp_searchfuncs.@OBJEXT@: Makefile config.h $(top_srcdir)/pcre.h \
  $(top_srcdir)/pcre_internal.h \
  $(top_srcdir)/pcre_ucp_searchfuncs.c \
  $(top_srcdir)/ucptable.c
	@$(LTCOMPILE) $(UTF8) $(UCP) $(POSIX_MALLOC_THRESHOLD) \
	  $(top_srcdir)/pcre_ucp_searchfuncs.c

pcre_valid_utf8.@OBJEXT@: Makefile config.h $(top_srcdir)/pcre.h \
  $(top_srcdir)/pcre_internal.h $(top_srcdir)/pcre_valid_utf8.c
	@$(LTCOMPILE) $(UTF8) $(UCP) $(POSIX_MALLOC_THRESHOLD) \
	  $(top_srcdir)/pcre_valid_utf8.c

pcre_version.@OBJEXT@: Makefile config.h $(top_srcdir)/pcre.h \
  $(top_srcdir)/pcre_internal.h $(top_srcdir)/pcre_version.c
	@$(LTCOMPILE) $(UTF8) $(UCP) $(POSIX_MALLOC_THRESHOLD) \
	  $(top_srcdir)/pcre_version.c

pcre_xclass.@OBJEXT@: Makefile config.h $(top_srcdir)/pcre.h \
  $(top_srcdir)/pcre_internal.h $(top_srcdir)/pcre_xclass.c
	@$(LTCOMPILE) $(UTF8) $(UCP) $(POSIX_MALLOC_THRESHOLD) \
	  $(top_srcdir)/pcre_xclass.c

pcreposix.@OBJEXT@: $(top_srcdir)/pcreposix.c $(top_srcdir)/pcreposix.h \
  $(top_srcdir)/pcre_internal.h $(top_srcdir)/pcre.h config.h Makefile
	@$(LTCOMPILE) $(POSIX_MALLOC_THRESHOLD) \
	  $(top_srcdir)/pcreposix.c

pcrecpp.@OBJEXT@: $(top_srcdir)/pcrecpp.cc $(top_srcdir)/pcrecpp.h \
  pcrecpparg.h pcre_stringpiece.h $(top_srcdir)/pcre.h config.h Makefile
	@$(LTCXXCOMPILE) $(UTF8) $(UCP) $(POSIX_MALLOC_THRESHOLD) \
	  $(top_srcdir)/pcrecpp.cc

pcre_scanner.@OBJEXT@: $(top_srcdir)/pcre_scanner.cc \
  $(top_srcdir)/pcre_scanner.h \
  $(top_srcdir)/pcrecpp.h pcrecpparg.h pcre_stringpiece.h \
  $(top_srcdir)/pcre.h config.h Makefile
	@$(LTCXXCOMPILE) $(UTF8) $(UCP) $(POSIX_MALLOC_THRESHOLD) \
	  $(top_srcdir)/pcre_scanner.cc

pcre_stringpiece.@OBJEXT@: $(top_srcdir)/pcre_stringpiece.cc \
  pcre_stringpiece.h \
  config.h Makefile
	@$(LTCXXCOMPILE) $(UTF8) $(UCP) $(POSIX_MALLOC_THRESHOLD) \
	  $(top_srcdir)/pcre_stringpiece.cc

pcretest.@OBJEXT@: $(top_srcdir)/pcretest.c $(top_srcdir)/pcre_internal.h \
  $(top_srcdir)/pcre_printint.src $(top_srcdir)/pcre.h config.h Makefile
	$(CC) -c $(CFLAGS) -I. -I$(top_srcdir) $(UTF8) $(UCP) \
	  $(LINK_SIZE) $(top_srcdir)/pcretest.c

pcrecpp_unittest.@OBJEXT@: $(top_srcdir)/pcrecpp_unittest.cc \
  $(top_srcdir)/pcrecpp.h \
  pcrecpparg.h pcre_stringpiece.h $(top_srcdir)/pcre.h config.h Makefile
	$(CXX) -c $(CXXFLAGS) -I. -I$(top_srcdir) $(UTF8) $(UCP) \
	  $(LINK_SIZE) $(top_srcdir)/pcrecpp_unittest.cc

pcre_stringpiece_unittest.@OBJEXT@: $(top_srcdir)/pcre_stringpiece_unittest.cc \
  pcre_stringpiece.h pcrecpparg.h config.h Makefile
	$(CXX) -c $(CXXFLAGS) -I. -I$(top_srcdir) $(UTF8) $(UCP) \
	  $(LINK_SIZE) $(top_srcdir)/pcre_stringpiece_unittest.cc

pcre_scanner_unittest.@OBJEXT@: $(top_srcdir)/pcre_scanner_unittest.cc \
  $(top_srcdir)/pcre_scanner.h \
  $(top_srcdir)/pcrecpp.h pcre_stringpiece.h \
  $(top_srcdir)/pcre.h pcrecpparg.h config.h Makefile
	$(CXX) -c $(CXXFLAGS) -I. -I$(top_srcdir) $(UTF8) $(UCP) \
	  $(LINK_SIZE) $(top_srcdir)/pcre_scanner_unittest.cc

pcregrep.@OBJEXT@: $(top_srcdir)/pcregrep.c $(top_srcdir)/pcre.h Makefile config.h
	$(CC) -c $(CFLAGS) -I. -I$(top_srcdir) $(UTF8) $(UCP) \
	  $(PCREGREP_OSTYPE) $(top_srcdir)/pcregrep.c

|
||||
# Some Windows-specific targets for MinGW. Do not use for Cygwin.

winshared : .libs/@WIN_PREFIX@pcre.dll .libs/@WIN_PREFIX@pcreposix.dll \
		.libs/@WIN_PREFIX@pcrecpp.dll

.libs/@WIN_PREFIX@pcre.dll : libpcre.la
	$(CC) $(CFLAGS) -shared -o $@ \
	  -Wl,--whole-archive .libs/libpcre.a \
	  -Wl,--out-implib,.libs/libpcre.dll.a \
	  -Wl,--output-def,.libs/@WIN_PREFIX@pcre.dll-def \
	  -Wl,--export-all-symbols \
	  -Wl,--no-whole-archive
	sed -e "s#dlname=''#dlname='../bin/@WIN_PREFIX@pcre.dll'#" \
	  -e "s#library_names=''#library_names='libpcre.dll.a'#" \
	  < .libs/libpcre.lai > .libs/libpcre.lai.tmp && \
	  mv -f .libs/libpcre.lai.tmp .libs/libpcre.lai
	sed -e "s#dlname=''#dlname='../bin/@WIN_PREFIX@pcre.dll'#" \
	  -e "s#library_names=''#library_names='libpcre.dll.a'#" \
	  < libpcre.la > libpcre.la.tmp && \
	  mv -f libpcre.la.tmp libpcre.la


.libs/@WIN_PREFIX@pcreposix.dll: libpcreposix.la libpcre.la
	$(CC) $(CFLAGS) -shared -o $@ \
	  -Wl,--whole-archive .libs/libpcreposix.a \
	  -Wl,--out-implib,.libs/@WIN_PREFIX@pcreposix.dll.a \
	  -Wl,--output-def,.libs/@WIN_PREFIX@libpcreposix.dll-def \
	  -Wl,--export-all-symbols \
	  -Wl,--no-whole-archive .libs/libpcre.a
	sed -e "s#dlname=''#dlname='../bin/@WIN_PREFIX@pcreposix.dll'#" \
	  -e "s#library_names=''#library_names='libpcreposix.dll.a'#" \
	  < .libs/libpcreposix.lai > .libs/libpcreposix.lai.tmp && \
	  mv -f .libs/libpcreposix.lai.tmp .libs/libpcreposix.lai
	sed -e "s#dlname=''#dlname='../bin/@WIN_PREFIX@pcreposix.dll'#" \
	  -e "s#library_names=''#library_names='libpcreposix.dll.a'#" \
	  < libpcreposix.la > libpcreposix.la.tmp && \
	  mv -f libpcreposix.la.tmp libpcreposix.la

.libs/@WIN_PREFIX@pcrecpp.dll: libpcrecpp.la libpcre.la
	$(CXX) $(CXXFLAGS) -shared -o $@ \
	  -Wl,--whole-archive .libs/libpcrecpp.a \
	  -Wl,--out-implib,.libs/@WIN_PREFIX@pcrecpp.dll.a \
	  -Wl,--output-def,.libs/@WIN_PREFIX@libpcrecpp.dll-def \
	  -Wl,--export-all-symbols \
	  -Wl,--no-whole-archive .libs/libpcre.a
	sed -e "s#dlname=''#dlname='../bin/@WIN_PREFIX@pcrecpp.dll'#" \
	  -e "s#library_names=''#library_names='libpcrecpp.dll.a'#" \
	  < .libs/libpcrecpp.lai > .libs/libpcrecpp.lai.tmp && \
	  mv -f .libs/libpcrecpp.lai.tmp .libs/libpcrecpp.lai
	sed -e "s#dlname=''#dlname='../bin/@WIN_PREFIX@pcrecpp.dll'#" \
	  -e "s#library_names=''#library_names='libpcrecpp.dll.a'#" \
	  < libpcrecpp.la > libpcrecpp.la.tmp && \
	  mv -f libpcrecpp.la.tmp libpcrecpp.la


wininstall : winshared
	$(mkinstalldirs) $(DESTDIR)$(LIBDIR)
	$(mkinstalldirs) $(DESTDIR)$(BINDIR)
	$(INSTALL) .libs/@WIN_PREFIX@pcre.dll $(DESTDIR)$(BINDIR)/@WIN_PREFIX@pcre.dll
	$(INSTALL) .libs/@WIN_PREFIX@pcreposix.dll $(DESTDIR)$(BINDIR)/@WIN_PREFIX@pcreposix.dll
	$(INSTALL) .libs/@WIN_PREFIX@libpcreposix.dll.a $(DESTDIR)$(LIBDIR)/@WIN_PREFIX@libpcreposix.dll.a
	$(INSTALL) .libs/@WIN_PREFIX@libpcre.dll.a $(DESTDIR)$(LIBDIR)/@WIN_PREFIX@libpcre.dll.a
@HAVE_CPP@	$(INSTALL) .libs/@WIN_PREFIX@pcrecpp.dll $(DESTDIR)$(BINDIR)/@WIN_PREFIX@pcrecpp.dll
@HAVE_CPP@	$(INSTALL) .libs/@WIN_PREFIX@libpcrecpp.dll.a $(DESTDIR)$(LIBDIR)/@WIN_PREFIX@libpcrecpp.dll.a
	-strip -g $(DESTDIR)$(BINDIR)/@WIN_PREFIX@pcre.dll
	-strip -g $(DESTDIR)$(BINDIR)/@WIN_PREFIX@pcreposix.dll
@HAVE_CPP@	-strip -g $(DESTDIR)$(BINDIR)/@WIN_PREFIX@pcrecpp.dll
	-strip $(DESTDIR)$(BINDIR)/pcregrep@EXEEXT@
	-strip $(DESTDIR)$(BINDIR)/pcretest@EXEEXT@

# An auxiliary program makes the default character table source. This is put
# in the current directory, NOT the $top_srcdir directory.

pcre_chartables.c: dftables@BUILD_EXEEXT@
	./dftables@BUILD_EXEEXT@ pcre_chartables.c

dftables.@BUILD_OBJEXT@: $(top_srcdir)/dftables.c \
		$(top_srcdir)/pcre_maketables.c $(top_srcdir)/pcre_internal.h \
		$(top_srcdir)/pcre.h config.h Makefile
	$(CC_FOR_BUILD) -c $(CFLAGS_FOR_BUILD) -I. $(top_srcdir)/dftables.c

dftables@BUILD_EXEEXT@: dftables.@BUILD_OBJEXT@
	$(LINK_FOR_BUILD) -o dftables@BUILD_EXEEXT@ dftables.@BUILD_OBJEXT@

install: all @ON_WINDOWS@ wininstall
@NOT_ON_WINDOWS@	$(mkinstalldirs) $(DESTDIR)$(LIBDIR)
@NOT_ON_WINDOWS@	echo "$(LIBTOOL) --mode=install $(INSTALL) libpcre.la $(DESTDIR)$(LIBDIR)/libpcre.la"
@NOT_ON_WINDOWS@	$(LIBTOOL) --mode=install $(INSTALL) libpcre.la $(DESTDIR)$(LIBDIR)/libpcre.la
@NOT_ON_WINDOWS@	echo "$(LIBTOOL) --mode=install $(INSTALL) libpcreposix.la $(DESTDIR)$(LIBDIR)/libpcreposix.la"
@NOT_ON_WINDOWS@	$(LIBTOOL) --mode=install $(INSTALL) libpcreposix.la $(DESTDIR)$(LIBDIR)/libpcreposix.la
@NOT_ON_WINDOWS@@HAVE_CPP@	echo "$(LIBTOOL) --mode=install $(INSTALL) libpcrecpp.la $(DESTDIR)$(LIBDIR)/libpcrecpp.la"
@NOT_ON_WINDOWS@@HAVE_CPP@	$(LIBTOOL) --mode=install $(INSTALL) libpcrecpp.la $(DESTDIR)$(LIBDIR)/libpcrecpp.la
@NOT_ON_WINDOWS@	$(LIBTOOL) --finish $(DESTDIR)$(LIBDIR)
	$(mkinstalldirs) $(DESTDIR)$(INCDIR)
	$(INSTALL_DATA) $(top_srcdir)/pcre.h $(DESTDIR)$(INCDIR)/pcre.h
	$(INSTALL_DATA) $(top_srcdir)/pcreposix.h $(DESTDIR)$(INCDIR)/pcreposix.h
@HAVE_CPP@	$(INSTALL_DATA) $(top_srcdir)/pcrecpp.h $(DESTDIR)$(INCDIR)/pcrecpp.h
@HAVE_CPP@	$(INSTALL_DATA) pcrecpparg.h $(DESTDIR)$(INCDIR)/pcrecpparg.h
@HAVE_CPP@	$(INSTALL_DATA) pcre_stringpiece.h $(DESTDIR)$(INCDIR)/pcre_stringpiece.h
@HAVE_CPP@	$(INSTALL_DATA) $(top_srcdir)/pcre_scanner.h $(DESTDIR)$(INCDIR)/pcre_scanner.h
	$(mkinstalldirs) $(DESTDIR)$(MANDIR)/man3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcre.3 $(DESTDIR)$(MANDIR)/man3/pcre.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcreapi.3 $(DESTDIR)$(MANDIR)/man3/pcreapi.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcrebuild.3 $(DESTDIR)$(MANDIR)/man3/pcrebuild.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcrecallout.3 $(DESTDIR)$(MANDIR)/man3/pcrecallout.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcrecompat.3 $(DESTDIR)$(MANDIR)/man3/pcrecompat.3
@HAVE_CPP@	$(INSTALL_DATA) $(top_srcdir)/doc/pcrecpp.3 $(DESTDIR)$(MANDIR)/man3/pcrecpp.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcrematching.3 $(DESTDIR)$(MANDIR)/man3/pcrematching.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcrepartial.3 $(DESTDIR)$(MANDIR)/man3/pcrepartial.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcrepattern.3 $(DESTDIR)$(MANDIR)/man3/pcrepattern.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcreperform.3 $(DESTDIR)$(MANDIR)/man3/pcreperform.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcreposix.3 $(DESTDIR)$(MANDIR)/man3/pcreposix.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcreprecompile.3 $(DESTDIR)$(MANDIR)/man3/pcreprecompile.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcresample.3 $(DESTDIR)$(MANDIR)/man3/pcresample.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcrestack.3 $(DESTDIR)$(MANDIR)/man3/pcrestack.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcre_compile.3 $(DESTDIR)$(MANDIR)/man3/pcre_compile.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcre_compile2.3 $(DESTDIR)$(MANDIR)/man3/pcre_compile2.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcre_config.3 $(DESTDIR)$(MANDIR)/man3/pcre_config.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcre_copy_named_substring.3 $(DESTDIR)$(MANDIR)/man3/pcre_copy_named_substring.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcre_copy_substring.3 $(DESTDIR)$(MANDIR)/man3/pcre_copy_substring.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcre_dfa_exec.3 $(DESTDIR)$(MANDIR)/man3/pcre_dfa_exec.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcre_exec.3 $(DESTDIR)$(MANDIR)/man3/pcre_exec.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcre_free_substring.3 $(DESTDIR)$(MANDIR)/man3/pcre_free_substring.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcre_free_substring_list.3 $(DESTDIR)$(MANDIR)/man3/pcre_free_substring_list.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcre_fullinfo.3 $(DESTDIR)$(MANDIR)/man3/pcre_fullinfo.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcre_get_named_substring.3 $(DESTDIR)$(MANDIR)/man3/pcre_get_named_substring.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcre_get_stringnumber.3 $(DESTDIR)$(MANDIR)/man3/pcre_get_stringnumber.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcre_get_stringtable_entries.3 $(DESTDIR)$(MANDIR)/man3/pcre_get_stringtable_entries.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcre_get_substring.3 $(DESTDIR)$(MANDIR)/man3/pcre_get_substring.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcre_get_substring_list.3 $(DESTDIR)$(MANDIR)/man3/pcre_get_substring_list.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcre_info.3 $(DESTDIR)$(MANDIR)/man3/pcre_info.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcre_maketables.3 $(DESTDIR)$(MANDIR)/man3/pcre_maketables.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcre_refcount.3 $(DESTDIR)$(MANDIR)/man3/pcre_refcount.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcre_study.3 $(DESTDIR)$(MANDIR)/man3/pcre_study.3
	$(INSTALL_DATA) $(top_srcdir)/doc/pcre_version.3 $(DESTDIR)$(MANDIR)/man3/pcre_version.3
	$(mkinstalldirs) $(DESTDIR)$(MANDIR)/man1
	$(INSTALL_DATA) $(top_srcdir)/doc/pcregrep.1 $(DESTDIR)$(MANDIR)/man1/pcregrep.1
	$(INSTALL_DATA) $(top_srcdir)/doc/pcretest.1 $(DESTDIR)$(MANDIR)/man1/pcretest.1
	$(mkinstalldirs) $(DESTDIR)$(BINDIR)
	$(LIBTOOL) --mode=install $(INSTALL) pcregrep@EXEEXT@ $(DESTDIR)$(BINDIR)/pcregrep@EXEEXT@
	$(LIBTOOL) --mode=install $(INSTALL) pcretest@EXEEXT@ $(DESTDIR)$(BINDIR)/pcretest@EXEEXT@
	$(INSTALL) pcre-config $(DESTDIR)$(BINDIR)/pcre-config
	$(mkinstalldirs) $(DESTDIR)$(LIBDIR)/pkgconfig
	$(INSTALL_DATA) libpcre.pc $(DESTDIR)$(LIBDIR)/pkgconfig/libpcre.pc

# The uninstall target removes all the files that were installed.

uninstall:; -rm -rf \
	$(DESTDIR)$(LIBDIR)/libpcre.* \
	$(DESTDIR)$(LIBDIR)/libpcreposix.* \
	$(DESTDIR)$(LIBDIR)/libpcrecpp.* \
	$(DESTDIR)$(INCDIR)/pcre.h \
	$(DESTDIR)$(INCDIR)/pcreposix.h \
	$(DESTDIR)$(INCDIR)/pcrecpp.h \
	$(DESTDIR)$(INCDIR)/pcrecpparg.h \
	$(DESTDIR)$(INCDIR)/pcre_scanner.h \
	$(DESTDIR)$(INCDIR)/pcre_stringpiece.h \
	$(DESTDIR)$(MANDIR)/man3/pcre.3 \
	$(DESTDIR)$(MANDIR)/man3/pcreapi.3 \
	$(DESTDIR)$(MANDIR)/man3/pcrebuild.3 \
	$(DESTDIR)$(MANDIR)/man3/pcrecallout.3 \
	$(DESTDIR)$(MANDIR)/man3/pcrecompat.3 \
	$(DESTDIR)$(MANDIR)/man3/pcrecpp.3 \
	$(DESTDIR)$(MANDIR)/man3/pcrematching.3 \
	$(DESTDIR)$(MANDIR)/man3/pcrepartial.3 \
	$(DESTDIR)$(MANDIR)/man3/pcrepattern.3 \
	$(DESTDIR)$(MANDIR)/man3/pcreperform.3 \
	$(DESTDIR)$(MANDIR)/man3/pcreposix.3 \
	$(DESTDIR)$(MANDIR)/man3/pcreprecompile.3 \
	$(DESTDIR)$(MANDIR)/man3/pcresample.3 \
	$(DESTDIR)$(MANDIR)/man3/pcrestack.3 \
	$(DESTDIR)$(MANDIR)/man3/pcre_compile.3 \
	$(DESTDIR)$(MANDIR)/man3/pcre_compile2.3 \
	$(DESTDIR)$(MANDIR)/man3/pcre_config.3 \
	$(DESTDIR)$(MANDIR)/man3/pcre_copy_named_substring.3 \
	$(DESTDIR)$(MANDIR)/man3/pcre_copy_substring.3 \
	$(DESTDIR)$(MANDIR)/man3/pcre_dfa_exec.3 \
	$(DESTDIR)$(MANDIR)/man3/pcre_exec.3 \
	$(DESTDIR)$(MANDIR)/man3/pcre_free_substring.3 \
	$(DESTDIR)$(MANDIR)/man3/pcre_free_substring_list.3 \
	$(DESTDIR)$(MANDIR)/man3/pcre_fullinfo.3 \
	$(DESTDIR)$(MANDIR)/man3/pcre_get_named_substring.3 \
	$(DESTDIR)$(MANDIR)/man3/pcre_get_stringnumber.3 \
	$(DESTDIR)$(MANDIR)/man3/pcre_get_stringtable_entries.3 \
	$(DESTDIR)$(MANDIR)/man3/pcre_get_substring.3 \
	$(DESTDIR)$(MANDIR)/man3/pcre_get_substring_list.3 \
	$(DESTDIR)$(MANDIR)/man3/pcre_info.3 \
	$(DESTDIR)$(MANDIR)/man3/pcre_maketables.3 \
	$(DESTDIR)$(MANDIR)/man3/pcre_refcount.3 \
	$(DESTDIR)$(MANDIR)/man3/pcre_study.3 \
	$(DESTDIR)$(MANDIR)/man3/pcre_version.3 \
	$(DESTDIR)$(MANDIR)/man1/pcregrep.1 \
	$(DESTDIR)$(MANDIR)/man1/pcretest.1 \
	$(DESTDIR)$(BINDIR)/pcregrep@EXEEXT@ \
	$(DESTDIR)$(BINDIR)/pcretest@EXEEXT@ \
	$(DESTDIR)$(BINDIR)/pcre-config \
	$(DESTDIR)$(LIBDIR)/pkgconfig/libpcre.pc

# We deliberately omit dftables and pcre_chartables.c from 'make clean'; once
# made pcre_chartables.c shouldn't change, and if people have edited the tables
# by hand, you don't want to throw them away.

clean:; -rm -rf *.@OBJEXT@ *.lo *.a *.la .libs pcretest@EXEEXT@ pcre_stringpiece_unittest@EXEEXT@ pcrecpp_unittest@EXEEXT@ pcre_scanner_unittest@EXEEXT@ pcregrep@EXEEXT@ testtry

# But "make distclean" should get back to a virgin distribution

distclean: clean
	-rm -f pcre_chartables.c libtool pcre-config libpcre.pc \
	  pcre_stringpiece.h pcrecpparg.h \
	  dftables@EXEEXT@ RunGrepTest RunTest \
	  Makefile config.h config.status config.log config.cache

check: runtest

@WIN_PREFIX@pcre.dll : winshared
	cp .libs/@WIN_PREFIX@pcre.dll .

test: runtest

runtest: all @ON_WINDOWS@ @WIN_PREFIX@pcre.dll
	@./RunTest
	@./RunGrepTest
@HAVE_CPP@	@echo ""
@HAVE_CPP@	@echo "Testing C++ wrapper"
@HAVE_CPP@	@echo ""; echo "Test 1++: stringpiece"
@HAVE_CPP@	@./pcre_stringpiece_unittest@EXEEXT@
@HAVE_CPP@	@echo ""; echo "Test 2++: RE class"
@HAVE_CPP@	@./pcrecpp_unittest@EXEEXT@
@HAVE_CPP@	@echo ""; echo "Test 3++: Scanner class"
@HAVE_CPP@	@./pcre_scanner_unittest@EXEEXT@

# End
266
libs/pcre/NEWS
Normal file
@ -0,0 +1,266 @@
News about PCRE releases
------------------------

Release 6.7 04-Jul-06
---------------------

The main additions to this release are the ability to use the same name for
multiple sets of parentheses, and support for CRLF line endings in both the
library and pcregrep (and in pcretest for testing).

Thanks to Ian Taylor, the stack usage for many kinds of pattern has been
significantly reduced for certain subject strings.


Release 6.5 01-Feb-06
---------------------

Important changes in this release:

1. A number of new features have been added to pcregrep.

2. The Unicode property tables have been updated to Unicode 4.1.0, and the
   supported properties have been extended with script names such as "Arabic",
   and the derived properties "Any" and "L&". This has necessitated a change to
   the internal format of compiled patterns. Any saved compiled patterns that
   use \p or \P must be recompiled.

3. The specification of recursion in patterns has been changed so that all
   recursive subpatterns are automatically treated as atomic groups. Thus, for
   example, (?R) is treated as if it were (?>(?R)). This is necessary because
   otherwise there are situations where recursion does not work.

See the ChangeLog for a complete list of changes, which include a number of bug
fixes and tidies.


Release 6.0 07-Jun-05
---------------------

The release number has been increased to 6.0 because of the addition of several
major new pieces of functionality.

A new function, pcre_dfa_exec(), which implements pattern matching using a DFA
algorithm, has been added. This has a number of advantages for certain cases,
though it does run more slowly, and lacks the ability to capture substrings. On
the other hand, it does find all matches, not just the first, and it works
better for partial matching. The pcrematching man page discusses the
differences.

The pcretest program has been enhanced so that it can make use of the new
pcre_dfa_exec() matching function and the extra features it provides.

The distribution now includes a C++ wrapper library. This is built
automatically if a C++ compiler is found. The pcrecpp man page discusses this
interface.

The code itself has been re-organized into many more files, one for each
function, so it no longer requires everything to be linked in when static
linkage is used. As a consequence, some internal functions have had to have
their names exposed. These functions all have names starting with _pcre_. They
are undocumented, and are not intended for use by outside callers.

The pcregrep program has been enhanced with new functionality such as
multiline matching and options for outputting more matching context. See the
ChangeLog for a complete list of changes to the library and the utility
programs.


Release 5.0 13-Sep-04
---------------------

The licence under which PCRE is released has been changed to the more
conventional "BSD" licence.

In the code, some bugs have been fixed, and there are also some major changes
in this release (which is why I've increased the number to 5.0). Some changes
are internal rearrangements, and some provide a number of new facilities. The
new features are:

1. There's an "automatic callout" feature that inserts callouts before every
   item in the regex, and there's a new callout field that gives the position
   in the pattern - useful for debugging and tracing.

2. The extra_data structure can now be used to pass in a set of character
   tables at exec time. This is useful if compiled regexes are saved and
   re-used at a later time when the tables may not be at the same address. If
   the default internal tables are used, the pointer saved with the compiled
   pattern is now set to NULL, which means that you don't need to do anything
   special unless you are using custom tables.

3. It is possible, with some restrictions on the content of the regex, to
   request "partial" matching. A special return code is given if all of the
   subject string matched part of the regex. This could be useful for testing
   an input field as it is being typed.

4. There is now some optional support for Unicode character properties, which
   means that pattern items such as \p{Lu} and \X can now be used. Only
   the general category properties are supported. If PCRE is compiled with this
   support, an additional 90K data structure is included, which increases the
   size of the library dramatically.

5. There is support for saving compiled patterns and re-using them later.

6. There is support for running regular expressions that were compiled on a
   different host with the opposite endianness.

7. The pcretest program has been extended to accommodate the new features.

The main internal rearrangement is that sequences of literal characters are no
longer handled as strings. Instead, each character is handled on its own. This
makes some UTF-8 handling easier, and makes the support of partial matching
possible. Compiled patterns containing long literal strings will be larger as a
result of this change; I hope that performance will not be much affected.


Release 4.5 01-Dec-03
---------------------

Again mainly a bug-fix and tidying release, with only a couple of new features:

1. It's possible now to compile PCRE so that it does not use recursive
function calls when matching. Instead it gets memory from the heap. This slows
things down, but may be necessary on systems with limited stacks.

2. UTF-8 string checking has been tightened to reject overlong sequences and to
check that a starting offset points to the start of a character. Failure of the
latter returns a new error code: PCRE_ERROR_BADUTF8_OFFSET.

3. PCRE can now be compiled for systems that use EBCDIC code.


Release 4.4 21-Aug-03
---------------------

This is mainly a bug-fix and tidying release. The only new feature is that PCRE
checks UTF-8 strings for validity by default. There is an option to suppress
this, just in case anybody wants that teeny extra bit of performance.


Releases 4.1 - 4.3
------------------

Sorry, I forgot about updating the NEWS file for these releases. Please take a
look at ChangeLog.


Release 4.0 17-Feb-03
---------------------

There have been a lot of changes for the 4.0 release, adding additional
functionality and mending bugs. Below is a list of the highlights of the new
functionality. For full details of these features, please consult the
documentation. For a complete list of changes, see the ChangeLog file.

1. Support for Perl's \Q...\E escapes.

2. "Possessive quantifiers" ?+, *+, ++, and {,}+ which come from Sun's Java
package. They provide some syntactic sugar for simple cases of "atomic
grouping".

3. Support for the \G assertion. It is true when the current matching position
is at the start point of the match.

4. A new feature that provides some of the functionality that Perl provides
with (?{...}). The facility is termed a "callout". The way it is done in PCRE
is for the caller to provide an optional function, by setting pcre_callout to
its entry point. To get the function called, the regex must include (?C) at
appropriate points.

5. Support for recursive calls to individual subpatterns. This makes it really
easy to get totally confused.

6. Support for named subpatterns. The Python syntax (?P<name>...) is used to
name a group.

7. Several extensions to UTF-8 support; it is now fairly complete. There is an
option for pcregrep to make it operate in UTF-8 mode.

8. The single man page has been split into a number of separate man pages.
These also give rise to individual HTML pages which are put in a separate
directory. There is an index.html page that lists them all. Some hyperlinking
between the pages has been installed.


Release 3.5 15-Aug-01
---------------------

1. The configuring system has been upgraded to use later versions of autoconf
and libtool. By default it builds both a shared and a static library if the OS
supports it. You can use --disable-shared or --disable-static on the configure
command if you want only one of them.

2. The pcretest utility is now installed along with pcregrep because it is
useful for users (to test regexes) and by doing this, it automatically gets
relinked by libtool. The documentation has been turned into a man page, so
there are now .1, .txt, and .html versions in /doc.

3. Upgrades to pcregrep:
   (i)   Added long-form option names like GNU grep.
   (ii)  Added --help to list all options with an explanatory phrase.
   (iii) Added -r, --recursive to recurse into sub-directories.
   (iv)  Added -f, --file to read patterns from a file.

4. Added --enable-newline-is-cr and --enable-newline-is-lf to the configure
script, to force use of CR or LF instead of \n in the source. On non-Unix
systems, the value can be set in config.h.

5. The limit of 200 on non-capturing parentheses is a _nesting_ limit, not an
absolute limit. Changed the text of the error message to make this clear, and
likewise updated the man page.

6. The limit of 99 on the number of capturing subpatterns has been removed.
The new limit is 65535, which I hope will not be a "real" limit.


Release 3.3 01-Aug-00
---------------------

There is some support for UTF-8 character strings. This is incomplete and
experimental. The documentation describes what is and what is not implemented.
Otherwise, this is just a bug-fixing release.


Release 3.0 01-Feb-00
---------------------

1. A "configure" script is now used to configure PCRE for Unix systems. It
builds a Makefile, a config.h file, and the pcre-config script.

2. PCRE is built as a shared library by default.

3. There is support for POSIX classes such as [:alpha:].

4. There is an experimental recursion feature.

----------------------------------------------------------------------------
IMPORTANT FOR THOSE UPGRADING FROM VERSIONS BEFORE 2.00

Please note that there has been a change in the API such that a larger
ovector is required at matching time, to provide some additional workspace.
The new man page has details. This change was necessary in order to support
some of the new functionality in Perl 5.005.

IMPORTANT FOR THOSE UPGRADING FROM VERSION 2.00

Another (I hope this is the last!) change has been made to the API for the
pcre_compile() function. An additional argument has been added to make it
possible to pass over a pointer to character tables built in the current
locale by pcre_maketables(). To use the default tables, this new argument
should be passed as NULL.

IMPORTANT FOR THOSE UPGRADING FROM VERSION 2.05

Yet another (and again I hope this really is the last) change has been made
to the API for the pcre_exec() function. An additional argument has been
added to make it possible to start the match other than at the start of the
subject string. This is important if there are lookbehinds. The new man
page has the details, but if you just want to convert existing programs, all
you need to do is to stick in a new fifth argument to pcre_exec(), with a
value of zero. For example, change

    pcre_exec(pattern, extra, subject, length, options, ovec, ovecsize)
to
    pcre_exec(pattern, extra, subject, length, 0, options, ovec, ovecsize)

****
269
libs/pcre/NON-UNIX-USE
Normal file
@ -0,0 +1,269 @@
Compiling PCRE on non-Unix systems
----------------------------------

See below for comments on Cygwin or MinGW and OpenVMS usage. I (Philip Hazel)
have no knowledge of Windows or VMS systems and how their libraries work. The
items in the PCRE Makefile that relate to anything other than Unix-like systems
have been contributed by PCRE users. There are some other comments and files in
the Contrib directory on the ftp site that you may find useful. See

  ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/Contrib

If you want to compile PCRE for a non-Unix system (or perhaps, more strictly,
for a system that does not support "configure" and "make" files), note that
the basic PCRE library consists entirely of code written in Standard C, and so
should compile successfully on any system that has a Standard C compiler and
library. The C++ wrapper functions are a separate issue (see below).


GENERIC INSTRUCTIONS FOR THE C LIBRARY

The following are generic comments about building PCRE. The interspersed
indented commands are suggestions from Mark Tetrode as to which commands you
might use on a Windows system to build a static library.

(1) Copy or rename the file config.in as config.h, and change the macros that
define HAVE_STRERROR and HAVE_MEMMOVE to define them as 1 rather than 0.
Unfortunately, because of the way Unix autoconf works, the default setting has
to be 0. You may also want to make changes to other macros in config.h. In
particular, if you want to force a specific value for newline, you can define
the NEWLINE macro. The default is to use '\n', thereby using whatever value
your compiler gives to '\n'.

  rem Mark Tetrode's commands
  copy config.in config.h
  rem Use write, because notepad cannot handle UNIX files. Change values.
  write config.h

(2) Compile dftables.c as a stand-alone program, and then run it with
the single argument "pcre_chartables.c". This generates a set of standard
character tables and writes them to that file.

  rem Mark Tetrode's commands
  rem Compile & run
  cl -DSUPPORT_UTF8 -DSUPPORT_UCP dftables.c
  dftables.exe pcre_chartables.c

(3) Compile the following source files:

  pcre_chartables.c
  pcre_compile.c
  pcre_config.c
  pcre_dfa_exec.c
  pcre_exec.c
  pcre_fullinfo.c
  pcre_get.c
  pcre_globals.c
  pcre_info.c
  pcre_maketables.c
  pcre_ord2utf8.c
  pcre_refcount.c
  pcre_study.c
  pcre_tables.c
  pcre_try_flipped.c
  pcre_ucp_searchfuncs.c
  pcre_valid_utf8.c
  pcre_version.c
  pcre_xclass.c

and link them all together into an object library in whichever form your system
keeps such libraries. This is the pcre C library. If your system has static and
shared libraries, you may have to do this once for each type.

  rem These comments are out-of-date, referring to a previous release which
  rem had fewer source files. Replace with the file names from above.
  rem Mark Tetrode's commands, for a static library
  rem Compile & lib
  cl -DSUPPORT_UTF8 -DSUPPORT_UCP -DPOSIX_MALLOC_THRESHOLD=10 /c maketables.c get.c study.c pcre.c
  lib /OUT:pcre.lib maketables.obj get.obj study.obj pcre.obj

(4) Similarly, compile pcreposix.c and link it (on its own) as the pcreposix
library.

  rem Mark Tetrode's commands, for a static library
  rem Compile & lib
  cl -DSUPPORT_UTF8 -DSUPPORT_UCP -DPOSIX_MALLOC_THRESHOLD=10 /c pcreposix.c
  lib /OUT:pcreposix.lib pcreposix.obj

(5) Compile the test program pcretest.c. This needs the functions in the
|
||||
pcre and pcreposix libraries when linking.
|
||||
|
||||
rem Mark Tetrode's commands
|
||||
rem compile & link
|
||||
cl /F0x400000 pcretest.c pcre.lib pcreposix.lib
|
||||
|
||||
(6) Run pcretest on the testinput files in the testdata directory, and check
|
||||
that the output matches the corresponding testoutput files. You must use the
|
||||
-i option when checking testinput2. Note that the supplied files are in Unix
|
||||
format, with just LF characters as line terminators. You may need to edit them
|
||||
to change this if your system uses a different convention.
|
||||
|
||||
rem Mark Tetrode's commands
|
||||
pcretest testdata\testinput1 testdata\myoutput1
|
||||
windiff testdata\testoutput1 testdata\myoutput1
|
||||
pcretest -i testdata\testinput2 testdata\myoutput2
|
||||
windiff testdata\testoutput2 testdata\myoutput2
|
||||
pcretest testdata\testinput3 testdata\myoutput3
|
||||
windiff testdata\testoutput3 testdata\myoutput3
|
||||
pcretest testdata\testinput4 testdata\myoutput4
|
||||
windiff testdata\testoutput4 testdata\myoutput4
|
||||
pcretest testdata\testinput5 testdata\myoutput5
|
||||
windiff testdata\testoutput5 testdata\myoutput5
|
||||
pcretest testdata\testinput6 testdata\myoutput6
|
||||
windiff testdata\testoutput6 testdata\myoutput6
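Step (6) notes that the supplied test files use bare LF line terminators and may need converting. The following is a minimal sketch of such a conversion, not part of the PCRE distribution; the function name lf_to_crlf is made up for this illustration, and the caller is assumed to supply an output buffer of at least twice the input length.

```c
#include <stddef.h>

/* Copy n bytes from src to dst, writing CR LF for every bare LF.
 * An LF that already follows a CR is copied unchanged.
 * dst must have room for up to 2*n bytes. Returns the bytes written. */
size_t lf_to_crlf(const char *src, size_t n, char *dst)
{
    size_t out = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (src[i] == '\n' && (i == 0 || src[i - 1] != '\r'))
            dst[out++] = '\r';
        dst[out++] = src[i];
    }
    return out;
}
```

A test file could be read into memory, passed through this function, and written back out in binary mode so that the C library does not translate the line ends a second time.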
|
||||
|
||||
Note that there are now three more tests (7, 8, 9) that did not exist when Mark
|
||||
wrote those comments. They test the new pcre_dfa_exec() function.
|
||||
|
||||
(7) If you want to use the pcregrep command, compile and link pcregrep.c; it
|
||||
uses only the basic PCRE library.
|
||||
|
||||
|
||||
THE C++ WRAPPER FUNCTIONS
|
||||
|
||||
The PCRE distribution now contains some C++ wrapper functions and tests,
|
||||
contributed by Google Inc. On a system that can use "configure" and "make",
|
||||
the functions are automatically built into a library called pcrecpp. It should
|
||||
be straightforward to compile the .cc files manually on other systems. The
|
||||
files called xxx_unittest.cc are test programs for each of the corresponding
|
||||
xxx.cc files.
|
||||
|
||||
|
||||
FURTHER REMARKS
|
||||
|
||||
If you have a system without "configure" but where you can use a Makefile, edit
|
||||
Makefile.in to create Makefile, substituting suitable values for the variables
|
||||
at the head of the file.
|
||||
|
||||
Some help in building a Win32 DLL of PCRE in GnuWin32 environments was
|
||||
contributed by Paul Sokolovsky. These environments are Mingw32
|
||||
(http://www.xraylith.wisc.edu/~khan/software/gnu-win32/) and CygWin
|
||||
(http://sourceware.cygnus.com/cygwin/). Paul comments:
|
||||
|
||||
For CygWin, set CFLAGS=-mno-cygwin, and do 'make dll'. You'll get
|
||||
pcre.dll (containing pcreposix also), libpcre.dll.a, and dynamically
|
||||
linked pgrep and pcretest. If you have /bin/sh, run RunTest (three
|
||||
main test go ok, locale not supported).
|
||||
|
||||
Changes to do MinGW with autoconf 2.50 were supplied by Fred Cox
|
||||
<sailorFred@yahoo.com>, who comments as follows:
|
||||
|
||||
If you are using the PCRE DLL, the normal Unix style configure && make &&
|
||||
make check && make install should just work[*]. If you want to statically
|
||||
link against the .a file, you must define PCRE_STATIC before including
|
||||
pcre.h, otherwise the pcre_malloc and pcre_free exported functions will be
|
||||
declared __declspec(dllimport), with hilarious results. See the configure.in
|
||||
and pcretest.c for how it is done for the static test.
|
||||
|
||||
Also, there will only be a libpcre.la, not a libpcreposix.la, as you
|
||||
would expect from the Unix version. The single DLL includes the pcreposix
|
||||
interface.
|
||||
|
||||
[*] But note that the supplied test files are in Unix format, with just LF
|
||||
characters as line terminators. You will have to edit them to change to CR LF
|
||||
terminators.
|
||||
|
||||
A script for building PCRE using Borland's C++ compiler for use with VPASCAL
|
||||
was contributed by Alexander Tokarev. It is called makevp.bat.
|
||||
|
||||
These are some further comments about Win32 builds from Mark Evans. They
|
||||
were contributed before Fred Cox's changes were made, so it is possible that
|
||||
they may no longer be relevant.
|
||||
|
||||
"The documentation for Win32 builds is a bit shy. Under MSVC6 I
|
||||
followed their instructions to the letter, but there were still
|
||||
some things missing.
|
||||
|
||||
(1) Must #define STATIC for entire project if linking statically.
|
||||
(I see no reason to use DLLs for code this compact.) This of
|
||||
course is a project setting in MSVC under Preprocessor.
|
||||
|
||||
(2) Missing some #ifdefs relating to the function pointers
|
||||
pcre_malloc and pcre_free. See my solution below. (The stubs
|
||||
may not be mandatory but they made me feel better.)"
|
||||
|
||||
=========================
|
||||
#ifdef _WIN32
|
||||
#include <malloc.h>
|
||||
|
||||
void* malloc_stub(size_t N)
|
||||
{ return malloc(N); }
|
||||
void free_stub(void* p)
|
||||
{ free(p); }
|
||||
void *(*pcre_malloc)(size_t) = &malloc_stub;
|
||||
void (*pcre_free)(void *) = &free_stub;
|
||||
|
||||
#else
|
||||
|
||||
void *(*pcre_malloc)(size_t) = malloc;
|
||||
void (*pcre_free)(void *) = free;
|
||||
|
||||
#endif
|
||||
=========================
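Because pcre_malloc and pcre_free are function pointers, an application can point them at its own routines. The sketch below illustrates the idea with stand-in pointers named demo_malloc and demo_free (hypothetical names, chosen so that the example compiles without pcre.h) and a pair of counting wrappers; it is an illustration of the indirection, not code from PCRE.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the pcre_malloc / pcre_free pointers. */
void *(*demo_malloc)(size_t) = malloc;
void (*demo_free)(void *) = free;

static size_t live_blocks = 0;   /* blocks allocated and not yet freed */

static void *counting_malloc(size_t n)
{
    void *p = malloc(n);
    if (p != NULL)
        live_blocks++;
    return p;
}

static void counting_free(void *p)
{
    if (p != NULL)
        live_blocks--;
    free(p);
}

/* Route every allocation made through the hooks to the counting versions. */
void install_counting_allocator(void)
{
    demo_malloc = counting_malloc;
    demo_free = counting_free;
}

/* Number of blocks obtained through the hooks that are still allocated. */
size_t live_block_count(void)
{
    return live_blocks;
}
```

After install_counting_allocator() has been called, any code that allocates through the pointers is counted, which can be a simple way to look for leaks.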
|
||||
|
||||
|
||||
BUILDING PCRE ON OPENVMS
|
||||
|
||||
Dan Mooney sent the following comments about building PCRE on OpenVMS. They
|
||||
relate to an older version of PCRE that used fewer source files, so the exact
|
||||
commands will need changing. See the current list of source files above.
|
||||
|
||||
"It was quite easy to compile and link the library. I don't have a formal
|
||||
make file but the attached file [reproduced below] contains the OpenVMS DCL
|
||||
commands I used to build the library. I had to add #define
|
||||
POSIX_MALLOC_THRESHOLD 10 to pcre.h since it was not defined anywhere.
|
||||
|
||||
The library was built on:
|
||||
O/S: HP OpenVMS v7.3-1
|
||||
Compiler: Compaq C v6.5-001-48BCD
|
||||
Linker: vA13-01
|
||||
|
||||
The test results did not match 100% due to the issues you mention in your
|
||||
documentation regarding isprint(), iscntrl(), isgraph() and ispunct(). I
|
||||
modified some of the character tables temporarily and was able to get the
|
||||
results to match. Tests using the fr locale did not match since I don't have
|
||||
that locale loaded. The study size was always reported to be 3 less than the
|
||||
value in the standard test output files."
|
||||
|
||||
=========================
|
||||
$! This DCL procedure builds PCRE on OpenVMS
|
||||
$!
|
||||
$! I followed the instructions in the non-unix-use file in the distribution.
|
||||
$!
|
||||
$ COMPILE == "CC/LIST/NOMEMBER_ALIGNMENT/PREFIX_LIBRARY_ENTRIES=ALL_ENTRIES"
|
||||
$ COMPILE DFTABLES.C
|
||||
$ LINK/EXE=DFTABLES.EXE DFTABLES.OBJ
|
||||
$ RUN DFTABLES.EXE/OUTPUT=CHARTABLES.C
|
||||
$ COMPILE MAKETABLES.C
|
||||
$ COMPILE GET.C
|
||||
$ COMPILE STUDY.C
|
||||
$! I had to set POSIX_MALLOC_THRESHOLD to 10 in PCRE.H since the symbol
|
||||
$! did not seem to be defined anywhere.
|
||||
$! I edited pcre.h and added #DEFINE SUPPORT_UTF8 to enable UTF8 support.
|
||||
$ COMPILE PCRE.C
|
||||
$ LIB/CREATE PCRE MAKETABLES.OBJ, GET.OBJ, STUDY.OBJ, PCRE.OBJ
|
||||
$! I had to set POSIX_MALLOC_THRESHOLD to 10 in PCRE.H since the symbol
|
||||
$! did not seem to be defined anywhere.
|
||||
$ COMPILE PCREPOSIX.C
|
||||
$ LIB/CREATE PCREPOSIX PCREPOSIX.OBJ
|
||||
$ COMPILE PCRETEST.C
|
||||
$ LINK/EXE=PCRETEST.EXE PCRETEST.OBJ, PCRE/LIB, PCREPOSIX/LIB
|
||||
$! C programs that want access to command line arguments must be
|
||||
$! defined as a symbol
|
||||
$ PCRETEST :== "$ SYS$ROADSUSERS:[DMOONEY.REGEXP]PCRETEST.EXE"
|
||||
$! Arguments must be enclosed in quotes.
|
||||
$ PCRETEST "-C"
|
||||
$! Test results:
|
||||
$!
|
||||
$! The test results did not match 100%. The functions isprint(), iscntrl(),
|
||||
$! isgraph() and ispunct() on OpenVMS must not produce the same results
|
||||
$! as the system that built the test output files provided with the
|
||||
$! distribution.
|
||||
$!
|
||||
$! The study size did not match and was always 3 less on OpenVMS.
|
||||
$!
|
||||
$! Locale could not be set to fr
|
||||
$!
|
||||
=========================
|
||||
|
||||
****
|
528
libs/pcre/README
Normal file
@ -0,0 +1,528 @@
|
||||
README file for PCRE (Perl-compatible regular expression library)
|
||||
-----------------------------------------------------------------
|
||||
|
||||
The latest release of PCRE is always available from
|
||||
|
||||
ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre-xxx.tar.gz
|
||||
|
||||
Please read the NEWS file if you are upgrading from a previous release.
|
||||
|
||||
|
||||
The PCRE APIs
|
||||
-------------
|
||||
|
||||
PCRE is written in C, and it has its own API. The distribution now includes a
|
||||
set of C++ wrapper functions, courtesy of Google Inc. (see the pcrecpp man page
|
||||
for details).
|
||||
|
||||
Also included are a set of C wrapper functions that are based on the POSIX
|
||||
API. These end up in the library called libpcreposix. Note that this just
|
||||
provides a POSIX calling interface to PCRE: the regular expressions themselves
|
||||
still follow Perl syntax and semantics. The header file for the POSIX-style
|
||||
functions is called pcreposix.h. The official POSIX name is regex.h, but I
|
||||
didn't want to risk possible problems with existing files of that name by
|
||||
distributing it that way. To use it with an existing program that uses the
|
||||
POSIX API, it will have to be renamed or pointed at by a link.
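For readers unfamiliar with the POSIX calling interface, here is a small sketch of its shape. It is written against the system's own <regex.h> so that it compiles without PCRE installed; that is an assumption made for the example. With PCRE's wrapper the header would be pcreposix.h, the program would be linked with libpcreposix, and the pattern would be interpreted with Perl syntax and semantics. The function name posix_style_match is invented here.

```c
#include <regex.h>    /* system POSIX header, used here in place of pcreposix.h */
#include <stddef.h>

/* Return 1 if pattern matches somewhere in subject, 0 if it does not,
 * and -1 if the pattern fails to compile. */
int posix_style_match(const char *pattern, const char *subject)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, subject, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

The three calls regcomp(), regexec(), and regfree() are the core of the POSIX interface; everything else in the function is ordinary error handling.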
|
||||
|
||||
If you are using the POSIX interface to PCRE and there is already a POSIX regex
|
||||
library installed on your system, you must take care when linking programs to
|
||||
ensure that they link with PCRE's libpcreposix library. Otherwise they may pick
|
||||
up the "real" POSIX functions of the same name.
|
||||
|
||||
|
||||
Documentation for PCRE
|
||||
----------------------
|
||||
|
||||
If you install PCRE in the normal way, you will end up with an installed set of
|
||||
man pages whose names all start with "pcre". The one that is just called "pcre"
|
||||
lists all the others. In addition to these man pages, the PCRE documentation is
|
||||
supplied in two other forms; however, as there is no standard place to install
|
||||
them, they are left in the doc directory of the unpacked source distribution.
|
||||
These forms are:
|
||||
|
||||
1. Files called doc/pcre.txt, doc/pcregrep.txt, and doc/pcretest.txt. The
|
||||
first of these is a concatenation of the text forms of all the section 3
|
||||
man pages except those that summarize individual functions. The other two
|
||||
are the text forms of the section 1 man pages for the pcregrep and
|
||||
pcretest commands. Text forms are provided for ease of scanning with text
|
||||
editors or similar tools.
|
||||
|
||||
2. A subdirectory called doc/html contains all the documentation in HTML
|
||||
form, hyperlinked in various ways, and rooted in a file called
|
||||
doc/index.html.
|
||||
|
||||
|
||||
Contributions by users of PCRE
|
||||
------------------------------
|
||||
|
||||
You can find contributions from PCRE users in the directory
|
||||
|
||||
ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/Contrib
|
||||
|
||||
where there is also a README file giving brief descriptions of what they are.
|
||||
Several of them provide support for compiling PCRE on various flavours of
|
||||
Windows systems (I myself do not use Windows). Some are complete in themselves;
|
||||
others are pointers to URLs containing relevant files.
|
||||
|
||||
|
||||
Building PCRE on a Unix-like system
|
||||
-----------------------------------
|
||||
|
||||
If you are using HP's ANSI C++ compiler (aCC), please see the special note
|
||||
in the section entitled "Using HP's ANSI C++ compiler (aCC)" below.
|
||||
|
||||
To build PCRE on a Unix-like system, first run the "configure" command from the
|
||||
PCRE distribution directory, with your current directory set to the directory
|
||||
where you want the files to be created. This command is a standard GNU
|
||||
"autoconf" configuration script, for which generic instructions are supplied in
|
||||
INSTALL.
|
||||
|
||||
Most commonly, people build PCRE within its own distribution directory, and in
|
||||
this case, on many systems, just running "./configure" is sufficient, but the
|
||||
usual methods of changing standard defaults are available. For example:
|
||||
|
||||
CFLAGS='-O2 -Wall' ./configure --prefix=/opt/local
|
||||
|
||||
specifies that the C compiler should be run with the flags '-O2 -Wall' instead
|
||||
of the default, and that "make install" should install PCRE under /opt/local
|
||||
instead of the default /usr/local.
|
||||
|
||||
If you want to build in a different directory, just run "configure" with that
|
||||
directory as current. For example, suppose you have unpacked the PCRE source
|
||||
into /source/pcre/pcre-xxx, but you want to build it in /build/pcre/pcre-xxx:
|
||||
|
||||
cd /build/pcre/pcre-xxx
|
||||
/source/pcre/pcre-xxx/configure
|
||||
|
||||
PCRE is written in C and is normally compiled as a C library. However, it is
|
||||
possible to build it as a C++ library, though the provided building apparatus
|
||||
does not have any features to support this.
|
||||
|
||||
There are some optional features that can be included or omitted from the PCRE
|
||||
library. You can read more about them in the pcrebuild man page.
|
||||
|
||||
. If you want to suppress the building of the C++ wrapper library, you can add
|
||||
--disable-cpp to the "configure" command. Otherwise, when "configure" is run,
|
||||
it will try to find a C++ compiler and C++ header files, and if it succeeds, it
|
||||
will try to build the C++ wrapper.
|
||||
|
||||
. If you want to make use of the support for UTF-8 character strings in PCRE,
|
||||
you must add --enable-utf8 to the "configure" command. Without it, the code
|
||||
for handling UTF-8 is not included in the library. (Even when included, it
|
||||
still has to be enabled by an option at run time.)
|
||||
|
||||
. If, in addition to support for UTF-8 character strings, you want to include
|
||||
support for the \P, \p, and \X sequences that recognize Unicode character
|
||||
properties, you must add --enable-unicode-properties to the "configure"
|
||||
command. This adds about 30K to the size of the library (in the form of a
|
||||
property table); only the basic two-letter properties such as Lu are
|
||||
supported.
|
||||
|
||||
. You can build PCRE to recognize either CR or LF or the sequence CRLF as
|
||||
indicating the end of a line. Whatever you specify at build time is the
|
||||
default; the caller of PCRE can change the selection at run time. The default
|
||||
newline indicator is a single LF character (the Unix standard). You can
|
||||
specify the default newline indicator by adding --enable-newline-is-cr or
--enable-newline-is-lf or --enable-newline-is-crlf to the "configure" command,
|
||||
respectively.
|
||||
|
||||
. When called via the POSIX interface, PCRE uses malloc() to get additional
|
||||
storage for processing capturing parentheses if there are more than 10 of
|
||||
them. You can increase this threshold by setting, for example,
|
||||
|
||||
--with-posix-malloc-threshold=20
|
||||
|
||||
on the "configure" command.
|
||||
|
||||
. PCRE has a counter that can be set to limit the amount of resources it uses.
|
||||
If the limit is exceeded during a match, the match fails. The default is ten
|
||||
million. You can change the default by setting, for example,
|
||||
|
||||
--with-match-limit=500000
|
||||
|
||||
on the "configure" command. This is just the default; individual calls to
|
||||
pcre_exec() can supply their own value. There is discussion on the pcreapi
|
||||
man page.
|
||||
|
||||
. There is a separate counter that limits the depth of recursive function calls
|
||||
during a matching process. This also has a default of ten million, which is
|
||||
essentially "unlimited". You can change the default by setting, for example,
|
||||
|
||||
--with-match-limit-recursion=500000
|
||||
|
||||
Recursive function calls use up the runtime stack; running out of stack can
|
||||
cause programs to crash in strange ways. There is a discussion about stack
|
||||
sizes in the pcrestack man page.
|
||||
|
||||
. The default maximum compiled pattern size is around 64K. You can increase
|
||||
this by adding --with-link-size=3 to the "configure" command. You can
|
||||
increase it even more by setting --with-link-size=4, but this is unlikely
|
||||
ever to be necessary. If you build PCRE with an increased link size, test 2
|
||||
(and 5 if you are using UTF-8) will fail. Part of the output of these tests
|
||||
is a representation of the compiled pattern, and this changes with the link
|
||||
size.
|
||||
|
||||
. You can build PCRE so that its internal match() function that is called from
|
||||
pcre_exec() does not call itself recursively. Instead, it uses blocks of data
|
||||
from the heap via special functions pcre_stack_malloc() and pcre_stack_free()
|
||||
to save data that would otherwise be saved on the stack. To build PCRE like
|
||||
this, use
|
||||
|
||||
--disable-stack-for-recursion
|
||||
|
||||
on the "configure" command. PCRE runs more slowly in this mode, but it may be
|
||||
necessary in environments with limited stack sizes. This applies only to the
|
||||
pcre_exec() function; it does not apply to pcre_dfa_exec(), which does not
|
||||
use deeply nested recursion.
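The match limit described in the list above can be pictured with a toy backtracking matcher. The sketch below is not PCRE's algorithm: it handles only literal characters, "." and "*", always matches the whole subject, and all of its names (demo_match and the DEMO_ codes) are made up. It counts calls to its internal matching function and gives up with a distinct code once the count passes the limit, which is the general idea behind --with-match-limit.

```c
#include <stddef.h>

/* Hypothetical result codes for this sketch. */
enum { DEMO_NOMATCH = 0, DEMO_MATCH = 1, DEMO_LIMIT = -1 };

struct demo_state {
    unsigned long calls;   /* calls to demo_match_here() so far */
    unsigned long limit;   /* give up once calls exceeds this */
};

static int demo_match_here(const char *re, const char *s,
                           struct demo_state *st);

/* Match "c*" followed by the rest of the pattern, trying the longest
 * run first and backtracking one character at a time. */
static int demo_match_star(int c, const char *re, const char *s,
                           struct demo_state *st)
{
    const char *t = s;
    int rc;

    while (*t != '\0' && (c == '.' || *t == c))
        t++;
    for (;;) {
        rc = demo_match_here(re, t, st);
        if (rc != DEMO_NOMATCH)
            return rc;              /* a match, or the limit was hit */
        if (t == s)
            return DEMO_NOMATCH;
        t--;
    }
}

/* Match the pattern re against the whole of s. */
static int demo_match_here(const char *re, const char *s,
                           struct demo_state *st)
{
    if (++st->calls > st->limit)
        return DEMO_LIMIT;
    if (re[0] == '\0')
        return *s == '\0' ? DEMO_MATCH : DEMO_NOMATCH;
    if (re[1] == '*')
        return demo_match_star(re[0], re + 2, s, st);
    if (*s != '\0' && (re[0] == '.' || re[0] == *s))
        return demo_match_here(re + 1, s + 1, st);
    return DEMO_NOMATCH;
}

/* Entry point: match re against s, allowing at most limit internal calls. */
int demo_match(const char *re, const char *s, unsigned long limit)
{
    struct demo_state st = { 0, limit };
    return demo_match_here(re, s, &st);
}
```

A pattern such as "a*a*a*a*b" applied to a long run of "a" characters with no "b" forces a great deal of backtracking, so a small limit stops the search early while a large one lets it run to an ordinary failure.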
|
||||
|
||||
The "configure" script builds seven files for the basic C library:
|
||||
|
||||
. Makefile is the makefile that builds the library
|
||||
. config.h contains build-time configuration options for the library
|
||||
. pcre-config is a script that shows the settings of "configure" options
|
||||
. libpcre.pc is data for the pkg-config command
|
||||
. libtool is a script that builds shared and/or static libraries
|
||||
. RunTest is a script for running tests on the library
|
||||
. RunGrepTest is a script for running tests on the pcregrep command
|
||||
|
||||
In addition, if a C++ compiler is found, the following are also built:
|
||||
|
||||
. pcrecpp.h is the header file for programs that call PCRE via the C++ wrapper
|
||||
. pcre_stringpiece.h is the header for the C++ "stringpiece" functions
|
||||
|
||||
The "configure" script also creates config.status, which is an executable
|
||||
script that can be run to recreate the configuration, and config.log, which
|
||||
contains compiler output from tests that "configure" runs.
|
||||
|
||||
Once "configure" has run, you can run "make". It builds two libraries, called
|
||||
libpcre and libpcreposix, a test program called pcretest, and the pcregrep
|
||||
command. If a C++ compiler was found on your system, it also builds the C++
|
||||
wrapper library, which is called libpcrecpp, and some test programs called
|
||||
pcrecpp_unittest, pcre_scanner_unittest, and pcre_stringpiece_unittest.
|
||||
|
||||
The command "make test" runs all the appropriate tests. Details of the PCRE
|
||||
tests are given in a separate section of this document, below.
|
||||
|
||||
You can use "make install" to copy the libraries, the public header files
|
||||
pcre.h, pcreposix.h, pcrecpp.h, and pcre_stringpiece.h (the last two only if
|
||||
the C++ wrapper was built), and the man pages to appropriate live directories
|
||||
on your system, in the normal way.
|
||||
|
||||
If you want to remove PCRE from your system, you can run "make uninstall".
|
||||
This removes all the files that "make install" installed. However, it does not
|
||||
remove any directories, because these are often shared with other programs.
|
||||
|
||||
|
||||
Retrieving configuration information on Unix-like systems
|
||||
---------------------------------------------------------
|
||||
|
||||
Running "make install" also installs the command pcre-config, which can be used
|
||||
to recall information about the PCRE configuration and installation. For
|
||||
example:
|
||||
|
||||
pcre-config --version
|
||||
|
||||
prints the version number, and
|
||||
|
||||
pcre-config --libs
|
||||
|
||||
outputs information about where the library is installed. This command can be
|
||||
included in makefiles for programs that use PCRE, saving the programmer from
|
||||
having to remember too many details.
|
||||
|
||||
The pkg-config command is another system for saving and retrieving information
|
||||
about installed libraries. Instead of separate commands for each library, a
|
||||
single command is used. For example:
|
||||
|
||||
pkg-config --cflags pcre
|
||||
|
||||
The data is held in *.pc files that are installed in a directory called
|
||||
pkgconfig.
|
||||
|
||||
|
||||
Shared libraries on Unix-like systems
|
||||
-------------------------------------
|
||||
|
||||
The default distribution builds PCRE as shared libraries and static libraries,
|
||||
as long as the operating system supports shared libraries. Shared library
|
||||
support relies on the "libtool" script which is built as part of the
|
||||
"configure" process.
|
||||
|
||||
The libtool script is used to compile and link both shared and static
|
||||
libraries. They are placed in a subdirectory called .libs when they are newly
|
||||
built. The programs pcretest and pcregrep are built to use these uninstalled
|
||||
libraries (by means of wrapper scripts in the case of shared libraries). When
|
||||
you use "make install" to install shared libraries, pcregrep and pcretest are
|
||||
automatically re-built to use the newly installed shared libraries before being
|
||||
installed themselves. However, the versions left in the source directory still
|
||||
use the uninstalled libraries.
|
||||
|
||||
To build PCRE using static libraries only you must use --disable-shared when
|
||||
configuring it. For example:
|
||||
|
||||
./configure --prefix=/usr/gnu --disable-shared
|
||||
|
||||
Then run "make" in the usual way. Similarly, you can use --disable-static to
|
||||
build only shared libraries.
|
||||
|
||||
|
||||
Cross-compiling on a Unix-like system
|
||||
-------------------------------------
|
||||
|
||||
You can specify CC and CFLAGS in the normal way to the "configure" command, in
|
||||
order to cross-compile PCRE for some other host. However, during the building
|
||||
process, the dftables.c source file is compiled *and run* on the local host, in
|
||||
order to generate the default character tables (the chartables.c file). It
|
||||
therefore needs to be compiled with the local compiler, not the cross compiler.
|
||||
You can do this by specifying CC_FOR_BUILD (and if necessary CFLAGS_FOR_BUILD;
|
||||
there are also CXX_FOR_BUILD and CXXFLAGS_FOR_BUILD for the C++ wrapper)
|
||||
when calling the "configure" command. If they are not specified, they default
|
||||
to the values of CC and CFLAGS.
|
||||
|
||||
|
||||
Using HP's ANSI C++ compiler (aCC)
|
||||
----------------------------------
|
||||
|
||||
Unless C++ support is disabled by specifying the "--disable-cpp" option of the
|
||||
"configure" script, you *must* include the "-AA" option in the CXXFLAGS
|
||||
environment variable in order for the C++ components to compile correctly.
|
||||
|
||||
Also, note that the aCC compiler on PA-RISC platforms may have a defect whereby
|
||||
needed libraries fail to get included when specifying the "-AA" compiler
|
||||
option. If you experience unresolved symbols when linking the C++ programs,
|
||||
use the workaround of specifying the following environment variable prior to
|
||||
running the "configure" script:
|
||||
|
||||
CXXLDFLAGS="-lstd_v2 -lCsup_v2"
|
||||
|
||||
|
||||
Building on non-Unix systems
|
||||
----------------------------
|
||||
|
||||
For a non-Unix system, read the comments in the file NON-UNIX-USE, though if
|
||||
the system supports the use of "configure" and "make" you may be able to build
|
||||
PCRE in the same way as for Unix systems.
|
||||
|
||||
PCRE has been compiled on Windows systems and on Macintoshes, but I don't know
|
||||
the details because I don't use those systems. It should be straightforward to
|
||||
build PCRE on any system that has a Standard C compiler, because it uses only
|
||||
Standard C functions.
|
||||
|
||||
|
||||
Testing PCRE
|
||||
------------
|
||||
|
||||
To test PCRE on a Unix system, run the RunTest script that is created by the
|
||||
configuring process. There is also a script called RunGrepTest that tests the
|
||||
options of the pcregrep command. If the C++ wrapper library is built, three
|
||||
test programs called pcrecpp_unittest, pcre_scanner_unittest, and
|
||||
pcre_stringpiece_unittest are provided.
|
||||
|
||||
Both the scripts and all the program tests are run if you use "make runtest",
|
||||
"make check", or "make test". For other systems, see the instructions in
|
||||
NON-UNIX-USE.
|
||||
|
||||
The RunTest script runs the pcretest test program (which is documented in its
|
||||
own man page) on each of the testinput files (in the testdata directory) in
|
||||
turn, and compares the output with the contents of the corresponding testoutput
|
||||
file. A file called testtry is used to hold the main output from pcretest
|
||||
(testsavedregex is also used as a working file). To run pcretest on just one of
|
||||
the test files, give its number as an argument to RunTest, for example:
|
||||
|
||||
RunTest 2
|
||||
|
||||
The first file can also be fed directly into the perltest script to check that
|
||||
Perl gives the same results. The only difference you should see is in the first
|
||||
few lines, where the Perl version is given instead of the PCRE version.
|
||||
|
||||
The second set of tests checks pcre_fullinfo(), pcre_info(), pcre_study(),
|
||||
pcre_copy_substring(), pcre_get_substring(), pcre_get_substring_list(), error
|
||||
detection, and run-time flags that are specific to PCRE, as well as the POSIX
|
||||
wrapper API. It also uses the debugging flag to check some of the internals of
|
||||
pcre_compile().
|
||||
|
||||
If you build PCRE with a locale setting that is not the standard C locale, the
|
||||
character tables may be different (see next paragraph). In some cases, this may
|
||||
cause failures in the second set of tests. For example, in a locale where the
|
||||
isprint() function yields TRUE for characters in the range 128-255, the use of
|
||||
[:isascii:] inside a character class defines a different set of characters, and
|
||||
this shows up in this test as a difference in the compiled code, which is being
|
||||
listed for checking. Where the comparison test output contains [\x00-\x7f] the
|
||||
test will contain [\x00-\xff], and similarly in some other cases. This is not a
|
||||
bug in PCRE.
|
||||
|
||||
The third set of tests checks pcre_maketables(), the facility for building a
|
||||
set of character tables for a specific locale and using them instead of the
|
||||
default tables. The tests make use of the "fr_FR" (French) locale. Before
|
||||
running the test, the script checks for the presence of this locale by running
|
||||
the "locale" command. If that command fails, or if it doesn't include "fr_FR"
|
||||
in the list of available locales, the third test cannot be run, and a comment
|
||||
is output to say why. If running this test produces instances of the error
|
||||
|
||||
** Failed to set locale "fr_FR"
|
||||
|
||||
in the comparison output, it means that locale is not available on your system,
|
||||
despite being listed by "locale". This does not mean that PCRE is broken.
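A program can make a similar check for itself by asking the C library whether it accepts a locale name. The following is a minimal sketch, assuming only the standard setlocale() function; the name locale_is_usable is invented for this example, and the RunTest script itself uses the "locale" command rather than anything like this.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Return 1 if setlocale() accepts the named locale for LC_CTYPE, else 0.
 * The locale that was in force on entry is restored before returning. */
int locale_is_usable(const char *name)
{
    const char *cur = setlocale(LC_CTYPE, NULL);
    char *saved;
    int ok;

    if (cur == NULL)
        return 0;
    /* Copy the current name: later setlocale() calls may overwrite it. */
    saved = malloc(strlen(cur) + 1);
    if (saved == NULL)
        return 0;
    strcpy(saved, cur);

    ok = setlocale(LC_CTYPE, name) != NULL;
    setlocale(LC_CTYPE, saved);
    free(saved);
    return ok;
}
```

Calling locale_is_usable("fr_FR") before running the third test would show whether the failure message above is to be expected. Some C libraries accept almost any locale name, so a positive answer is only a hint.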
|
||||
|
||||
The fourth test checks the UTF-8 support. It is not run automatically unless
|
||||
PCRE is built with UTF-8 support. To do this you must set --enable-utf8 when
|
||||
running "configure". This file can also be fed directly to the perltest script,
|
||||
provided you are running Perl 5.8 or higher. (For Perl 5.6, a small patch,
|
||||
commented in the script, can be used.)
|
||||
|
||||
The fifth test checks error handling with UTF-8 encoding, and internal UTF-8
|
||||
features of PCRE that are not relevant to Perl.
|
||||
|
||||
The sixth test checks the support for Unicode character properties. It is
|
||||
not run automatically unless PCRE is built with Unicode property support. To do
|
||||
this you must set --enable-unicode-properties when running "configure".
|
||||
|
||||
The seventh, eighth, and ninth tests check the pcre_dfa_exec() alternative
|
||||
matching function, in non-UTF-8 mode, UTF-8 mode, and UTF-8 mode with Unicode
|
||||
property support, respectively. The eighth and ninth tests are not run
|
||||
automatically unless PCRE is built with the relevant support.
|
||||
|
||||
|
||||
Character tables
|
||||
----------------
|
||||
|
||||
PCRE uses four tables for manipulating and identifying characters whose values
|
||||
are less than 256. The final argument of the pcre_compile() function is a
|
||||
pointer to a block of memory containing the concatenated tables. A call to
|
||||
pcre_maketables() can be used to generate a set of tables in the current
|
||||
locale. If the final argument for pcre_compile() is passed as NULL, a set of
|
||||
default tables that is built into the binary is used.
|
||||
|
||||
The source file called chartables.c contains the default set of tables. This is
|
||||
not supplied in the distribution, but is built by the program dftables
|
||||
(compiled from dftables.c), which uses the ANSI C character handling functions
|
||||
such as isalnum(), isalpha(), isupper(), islower(), etc. to build the table
|
||||
sources. This means that the default C locale which is set for your system will
|
||||
control the contents of these default tables. You can change the default tables
|
||||
by editing chartables.c and then re-building PCRE. If you do this, you should
|
||||
probably also edit Makefile to ensure that the file doesn't ever get
|
||||
re-generated.
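To make the dependence on the locale concrete, here is a rough sketch of how tables of this kind can be filled from the <ctype.h> functions. It is an illustration under the assumption that each table is a plain array of 256 bytes; the function name build_case_tables is made up, and the code is not copied from dftables.c.

```c
#include <ctype.h>

/* Fill a 256-byte lower-casing table and a 256-byte case-flipping table
 * from the <ctype.h> functions, as evaluated in the current locale. */
void build_case_tables(unsigned char lower[256], unsigned char flip[256])
{
    int c;

    for (c = 0; c < 256; c++) {
        lower[c] = (unsigned char)tolower(c);
        /* Flip: lower case becomes upper case, and everything else is
         * passed through tolower(), which only changes upper case. */
        flip[c] = (unsigned char)(islower(c) ? toupper(c) : tolower(c));
    }
}
```

Because tolower(), toupper(), and islower() consult the current locale, running the same code after a call to setlocale() can give different table contents for characters above 127.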
|
||||
|
||||
The first two 256-byte tables provide lower casing and case flipping functions,
|
||||
respectively. The next table consists of three 32-byte bit maps which identify
|
||||
digits, "word" characters, and white space, respectively. These are used when
|
||||
building 32-byte bit maps that represent character classes.
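A 32-byte bit map has one bit for each of the 256 possible character values. The sketch below shows one way such maps can be built and combined; the bit order within each byte and all the function names are assumptions made for the illustration, not a description of PCRE's internal layout.

```c
#include <ctype.h>
#include <string.h>

/* Set the bit for character value c in a 32-byte (256-bit) map. */
static void bitmap_set(unsigned char map[32], int c)
{
    map[c / 8] |= (unsigned char)(1u << (c % 8));
}

/* Return 1 if the bit for character value c is set, else 0. */
int bitmap_test(const unsigned char map[32], int c)
{
    return (map[c / 8] >> (c % 8)) & 1;
}

/* Build the map of decimal digits from isdigit(). */
void build_digit_map(unsigned char map[32])
{
    int c;

    memset(map, 0, 32);
    for (c = 0; c < 256; c++)
        if (isdigit(c))
            bitmap_set(map, c);
}

/* Build the map for a class such as [0-9_] by starting from a base map
 * and adding each character of the string extra. */
void class_from_map(unsigned char cls[32], const unsigned char base[32],
                    const char *extra)
{
    memcpy(cls, base, 32);
    for (; *extra != '\0'; extra++)
        bitmap_set(cls, (unsigned char)*extra);
}
```

Testing whether a character belongs to a class then costs one shift and one mask, whatever the size of the class.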
|
||||
|
||||
The final 256-byte table has bits indicating various character types, as
|
||||
follows:
|
||||
|
||||
1 white space character
|
||||
2 letter
|
||||
4 decimal digit
|
||||
8 hexadecimal digit
|
||||
16 alphanumeric or '_'
|
||||
128 regular expression metacharacter or binary zero
|
||||
|
||||
You should not alter the set of characters that contain the 128 bit, as that
|
||||
will cause PCRE to malfunction.
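As an illustration of how the bit values listed above combine, the following sketch fills a 256-byte type table from the <ctype.h> functions. The DEMO_ names are invented, and the set of metacharacters in the string is an assumption made for the example rather than a list taken from the PCRE source.

```c
#include <ctype.h>
#include <string.h>

/* Bit values as listed in the table above; the names are made up here. */
enum {
    DEMO_SPACE = 1, DEMO_LETTER = 2, DEMO_DIGIT = 4,
    DEMO_XDIGIT = 8, DEMO_WORD = 16, DEMO_META = 128
};

/* Fill a 256-byte table with the type bits for each character value. */
void build_type_table(unsigned char types[256])
{
    /* Assumption: an illustrative metacharacter set. */
    static const char meta[] = "\\*+?{^.$|()[";
    int c;

    for (c = 0; c < 256; c++) {
        unsigned char x = 0;

        if (isspace(c))  x |= DEMO_SPACE;
        if (isalpha(c))  x |= DEMO_LETTER;
        if (isdigit(c))  x |= DEMO_DIGIT;
        if (isxdigit(c)) x |= DEMO_XDIGIT;
        if (isalnum(c) || c == '_') x |= DEMO_WORD;
        /* strchr() also finds the terminating zero of the string, so
         * binary zero is flagged along with the metacharacters. */
        if (strchr(meta, c) != NULL) x |= DEMO_META;
        types[c] = x;
    }
}
```

In the C locale the letter "a" ends up with the value 26 (letter, hexadecimal digit, and word character), while "g" has 18 because it is not a hexadecimal digit.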
|
||||
|
||||
|
||||
Manifest
|
||||
--------
|
||||
|
||||
The distribution should contain the following files:
|
||||
|
||||
(A) The actual source files of the PCRE library functions and their
|
||||
headers:
|
||||
|
||||
dftables.c auxiliary program for building chartables.c
|
||||
|
||||
pcreposix.c )
|
||||
pcre_compile.c )
|
||||
pcre_config.c )
|
||||
pcre_dfa_exec.c )
|
||||
pcre_exec.c )
|
||||
pcre_fullinfo.c )
|
||||
pcre_get.c ) sources for the functions in the library,
|
||||
pcre_globals.c ) and some internal functions that they use
|
||||
pcre_info.c )
|
||||
pcre_maketables.c )
|
||||
pcre_ord2utf8.c )
|
||||
pcre_refcount.c )
|
||||
pcre_study.c )
|
||||
pcre_tables.c )
|
||||
pcre_try_flipped.c )
|
||||
pcre_ucp_searchfuncs.c)
|
||||
pcre_valid_utf8.c )
|
||||
pcre_version.c )
|
||||
pcre_xclass.c )
|
||||
ucptable.c )
|
||||
|
||||
pcre_printint.src ) debugging function that is #included in pcretest, and
|
||||
) can also be #included in pcre_compile()
|
||||
|
||||
pcre.h the public PCRE header file
|
||||
pcreposix.h header for the external POSIX wrapper API
|
||||
pcre_internal.h header for internal use
|
||||
ucp.h ) headers concerned with
|
||||
ucpinternal.h ) Unicode property handling
|
||||
config.in template for config.h, which is built by configure
|
||||
|
||||
pcrecpp.h the header file for the C++ wrapper
|
||||
pcrecpparg.h.in "source" for another C++ header file
|
||||
pcrecpp.cc )
|
||||
pcre_scanner.cc ) source for the C++ wrapper library
|
||||
|
||||
pcre_stringpiece.h.in "source" for pcre_stringpiece.h, the header for the
|
||||
C++ stringpiece functions
|
||||
pcre_stringpiece.cc source for the C++ stringpiece functions
|
||||
|
||||
(B) Auxiliary files:
|
||||
|
||||
AUTHORS information about the author of PCRE
|
||||
ChangeLog log of changes to the code
|
||||
INSTALL generic installation instructions
|
||||
LICENCE conditions for the use of PCRE
|
||||
COPYING the same, using GNU's standard name
|
||||
Makefile.in template for Unix Makefile, which is built by configure
|
||||
NEWS important changes in this release
|
||||
NON-UNIX-USE notes on building PCRE on non-Unix systems
|
||||
README this file
|
||||
RunTest.in template for a Unix shell script for running tests
|
||||
RunGrepTest.in template for a Unix shell script for pcregrep tests
|
||||
config.guess ) files used by libtool,
|
||||
config.sub ) used only when building a shared library
|
||||
config.h.in "source" for the config.h header file
|
||||
configure a configuring shell script (built by autoconf)
|
||||
configure.ac the autoconf input used to build configure
|
||||
doc/Tech.Notes notes on the encoding
|
||||
doc/*.3 man page sources for the PCRE functions
|
||||
doc/*.1 man page sources for pcregrep and pcretest
|
||||
doc/html/* HTML documentation
|
||||
doc/pcre.txt plain text version of the man pages
|
||||
doc/pcretest.txt plain text documentation of test program
|
||||
doc/perltest.txt plain text documentation of Perl test program
|
||||
install-sh a shell script for installing files
|
||||
libpcre.pc.in "source" for libpcre.pc for pkg-config
|
||||
ltmain.sh file used to build a libtool script
|
||||
mkinstalldirs script for making install directories
|
||||
pcretest.c comprehensive test program
|
||||
pcredemo.c simple demonstration of coding calls to PCRE
|
||||
perltest Perl test program
|
||||
pcregrep.c source of a grep utility that uses PCRE
|
||||
pcre-config.in source of script which retains PCRE information
|
||||
pcrecpp_unittest.cc )
|
||||
pcre_scanner_unittest.cc ) test programs for the C++ wrapper
|
||||
pcre_stringpiece_unittest.cc )
|
||||
testdata/testinput* test data for main library tests
|
||||
testdata/testoutput* expected test results
|
||||
testdata/grep* input and output for pcregrep tests
|
||||
|
||||
(C) Auxiliary files for Win32 DLL
|
||||
|
||||
libpcre.def
|
||||
libpcreposix.def
|
||||
|
||||
(D) Auxiliary file for VPASCAL
|
||||
|
||||
makevp.bat
|
||||
|
||||
Philip Hazel
|
||||
Email local part: ph10
|
||||
Email domain: cam.ac.uk
|
||||
June 2006
|
208
libs/pcre/RunGrepTest.in
Normal file
@ -0,0 +1,208 @@
|
||||
#! /bin/sh
|
||||
|
||||
# This file is generated by configure from RunGrepTest.in. Make any changes
|
||||
# to that file.
|
||||
|
||||
echo "Testing pcregrep"
|
||||
./pcregrep -V
|
||||
|
||||
# Run pcregrep tests. The assumption is that the PCRE tests check the library
|
||||
# itself. What we are checking here is the file handling and options that are
|
||||
# supported by pcregrep.
|
||||
|
||||
cf=diff
|
||||
valgrind=
|
||||
if [ ! -d testdata ] ; then
|
||||
ln -s @top_srcdir@/testdata testdata
|
||||
fi
|
||||
testdata=./testdata
|
||||
|
||||
while [ $# -gt 0 ] ; do
|
||||
case $1 in
|
||||
valgrind) valgrind="valgrind -q --leak-check=no";;
|
||||
*) echo "Unknown argument $1"; exit 1;;
|
||||
esac
|
||||
shift
|
||||
done
|
||||
|
||||
echo "---------------------------- Test 1 ------------------------------" >testtry
|
||||
$valgrind ./pcregrep PATTERN $testdata/grepinput >>testtry
|
||||
|
||||
echo "---------------------------- Test 2 ------------------------------" >>testtry
|
||||
$valgrind ./pcregrep '^PATTERN' $testdata/grepinput >>testtry
|
||||
|
||||
echo "---------------------------- Test 3 ------------------------------" >>testtry
|
||||
$valgrind ./pcregrep -in PATTERN $testdata/grepinput >>testtry
|
||||
|
||||
echo "---------------------------- Test 4 ------------------------------" >>testtry
|
||||
$valgrind ./pcregrep -ic PATTERN $testdata/grepinput >>testtry
|
||||
|
||||
echo "---------------------------- Test 5 ------------------------------" >>testtry
|
||||
$valgrind ./pcregrep -in PATTERN $testdata/grepinput $testdata/grepinputx >>testtry
|
||||
|
||||
echo "---------------------------- Test 6 ------------------------------" >>testtry
|
||||
$valgrind ./pcregrep -inh PATTERN $testdata/grepinput $testdata/grepinputx >>testtry
|
||||
|
||||
echo "---------------------------- Test 7 ------------------------------" >>testtry
|
||||
$valgrind ./pcregrep -il PATTERN $testdata/grepinput $testdata/grepinputx >>testtry
|
||||
|
||||
echo "---------------------------- Test 8 ------------------------------" >>testtry
|
||||
$valgrind ./pcregrep -l PATTERN $testdata/grepinput $testdata/grepinputx >>testtry
|
||||
|
||||
echo "---------------------------- Test 9 ------------------------------" >>testtry
|
||||
$valgrind ./pcregrep -q PATTERN $testdata/grepinput $testdata/grepinputx >>testtry
|
||||
echo "RC=$?" >>testtry
|
||||
|
||||
echo "---------------------------- Test 10 -----------------------------" >>testtry
|
||||
$valgrind ./pcregrep -q NEVER-PATTERN $testdata/grepinput $testdata/grepinputx >>testtry
|
||||
echo "RC=$?" >>testtry
|
||||
|
||||
echo "---------------------------- Test 11 -----------------------------" >>testtry
|
||||
$valgrind ./pcregrep -vn pattern $testdata/grepinputx >>testtry
|
||||
|
||||
echo "---------------------------- Test 12 -----------------------------" >>testtry
|
||||
$valgrind ./pcregrep -ix pattern $testdata/grepinputx >>testtry
|
||||
|
||||
echo "---------------------------- Test 13 -----------------------------" >>testtry
|
||||
$valgrind ./pcregrep -f$testdata/greplist $testdata/grepinputx >>testtry
|
||||
|
||||
echo "---------------------------- Test 14 -----------------------------" >>testtry
|
||||
$valgrind ./pcregrep -w pat $testdata/grepinput $testdata/grepinputx >>testtry
|
||||
|
||||
echo "---------------------------- Test 15 -----------------------------" >>testtry
|
||||
$valgrind ./pcregrep 'abc^*' $testdata/grepinput 2>>testtry >>testtry
|
||||
|
||||
echo "---------------------------- Test 16 -----------------------------" >>testtry
|
||||
$valgrind ./pcregrep abc $testdata/grepinput $testdata/nonexistfile 2>>testtry >>testtry
|
||||
|
||||
echo "---------------------------- Test 17 -----------------------------" >>testtry
|
||||
$valgrind ./pcregrep -M 'the\noutput' $testdata/grepinput >>testtry
|
||||
|
||||
echo "---------------------------- Test 18 -----------------------------" >>testtry
|
||||
$valgrind ./pcregrep -Mn '(the\noutput|dog\.\n--)' $testdata/grepinput >>testtry
|
||||
|
||||
echo "---------------------------- Test 19 -----------------------------" >>testtry
|
||||
$valgrind ./pcregrep -Mix 'Pattern' $testdata/grepinputx >>testtry
|
||||
|
||||
echo "---------------------------- Test 20 -----------------------------" >>testtry
|
||||
$valgrind ./pcregrep -Mixn 'complete pair\nof lines' $testdata/grepinputx >>testtry
|
||||
|
||||
echo "---------------------------- Test 21 -----------------------------" >>testtry
|
||||
$valgrind ./pcregrep -nA3 'four' $testdata/grepinputx >>testtry
|
||||
|
||||
echo "---------------------------- Test 22 -----------------------------" >>testtry
|
||||
$valgrind ./pcregrep -nB3 'four' $testdata/grepinputx >>testtry
|
||||
|
||||
echo "---------------------------- Test 23 -----------------------------" >>testtry
|
||||
$valgrind ./pcregrep -C3 'four' $testdata/grepinputx >>testtry
|
||||
|
||||
echo "---------------------------- Test 24 -----------------------------" >>testtry
|
||||
$valgrind ./pcregrep -A9 'four' $testdata/grepinputx >>testtry
|
||||
|
||||
echo "---------------------------- Test 25 -----------------------------" >>testtry
|
||||
$valgrind ./pcregrep -nB9 'four' $testdata/grepinputx >>testtry
|
||||
|
||||
echo "---------------------------- Test 26 -----------------------------" >>testtry
|
||||
$valgrind ./pcregrep -A9 -B9 'four' $testdata/grepinputx >>testtry
|
||||
|
||||
echo "---------------------------- Test 27 -----------------------------" >>testtry
|
||||
$valgrind ./pcregrep -A10 'four' $testdata/grepinputx >>testtry
|
||||
|
||||
echo "---------------------------- Test 28 -----------------------------" >>testtry
|
||||
$valgrind ./pcregrep -nB10 'four' $testdata/grepinputx >>testtry
|
||||
|
||||
echo "---------------------------- Test 29 -----------------------------" >>testtry
|
||||
$valgrind ./pcregrep -C12 -B10 'four' $testdata/grepinputx >>testtry
|
||||
|
||||
echo "---------------------------- Test 30 -----------------------------" >>testtry
|
||||
$valgrind ./pcregrep -inB3 'pattern' $testdata/grepinput $testdata/grepinputx >>testtry
|
||||
|
||||
echo "---------------------------- Test 31 -----------------------------" >>testtry
|
||||
$valgrind ./pcregrep -inA3 'pattern' $testdata/grepinput $testdata/grepinputx >>testtry
|
||||
|
||||
echo "---------------------------- Test 32 -----------------------------" >>testtry
|
||||
$valgrind ./pcregrep -L 'fox' $testdata/grepinput $testdata/grepinputx >>testtry
|
||||
|
||||
echo "---------------------------- Test 33 -----------------------------" >>testtry
|
||||
$valgrind ./pcregrep 'fox' $testdata/grepnonexist >>testtry 2>&1
|
||||
echo "RC=$?" >>testtry
|
||||
|
||||
echo "---------------------------- Test 34 -----------------------------" >>testtry
|
||||
$valgrind ./pcregrep -s 'fox' $testdata/grepnonexist >>testtry 2>&1
|
||||
echo "RC=$?" >>testtry
|
||||
|
||||
echo "---------------------------- Test 35 -----------------------------" >>testtry
|
||||
$valgrind ./pcregrep -L -r --include=grepinputx 'fox' $testdata >>testtry
|
||||
echo "RC=$?" >>testtry
|
||||
|
||||
echo "---------------------------- Test 36 -----------------------------" >>testtry
|
||||
$valgrind ./pcregrep -L -r --include=grepinput --exclude 'grepinput$' 'fox' $testdata >>testtry
|
||||
echo "RC=$?" >>testtry
|
||||
|
||||
echo "---------------------------- Test 37 -----------------------------" >>testtry
|
||||
$valgrind ./pcregrep '^(a+)*\d' $testdata/grepinput >>testtry 2>teststderr
|
||||
echo "RC=$?" >>testtry
|
||||
echo "======== STDERR ========" >>testtry
|
||||
cat teststderr >>testtry
|
||||
|
||||
echo "---------------------------- Test 38 ------------------------------" >>testtry
|
||||
$valgrind ./pcregrep '>\x00<' $testdata/grepinput >>testtry
|
||||
|
||||
echo "---------------------------- Test 39 ------------------------------" >>testtry
|
||||
$valgrind ./pcregrep -A1 'before the binary zero' $testdata/grepinput >>testtry
|
||||
|
||||
echo "---------------------------- Test 40 ------------------------------" >>testtry
|
||||
$valgrind ./pcregrep -B1 'after the binary zero' $testdata/grepinput >>testtry
|
||||
|
||||
echo "---------------------------- Test 41 ------------------------------" >>testtry
|
||||
$valgrind ./pcregrep -B1 -o '\w+ the binary zero' $testdata/grepinput >>testtry
|
||||
|
||||
echo "---------------------------- Test 41 ------------------------------" >>testtry
|
||||
$valgrind ./pcregrep -B1 -onH '\w+ the binary zero' $testdata/grepinput >>testtry
|
||||
|
||||
echo "---------------------------- Test 42 ------------------------------" >>testtry
|
||||
$valgrind ./pcregrep -on 'before|zero|after' $testdata/grepinput >>testtry
|
||||
|
||||
echo "---------------------------- Test 43 ------------------------------" >>testtry
|
||||
$valgrind ./pcregrep -on -e before -e zero -e after $testdata/grepinput >>testtry
|
||||
|
||||
echo "---------------------------- Test 44 ------------------------------" >>testtry
|
||||
$valgrind ./pcregrep -on -f $testdata/greplist -e binary $testdata/grepinput >>testtry
|
||||
|
||||
echo "---------------------------- Test 45 ------------------------------" >>testtry
|
||||
$valgrind ./pcregrep -e abc -e '(unclosed' $testdata/grepinput 2>>testtry >>testtry
|
||||
|
||||
echo "---------------------------- Test 46 ------------------------------" >>testtry
|
||||
$valgrind ./pcregrep -Fx "AB.VE
|
||||
elephant" $testdata/grepinput >>testtry
|
||||
|
||||
echo "---------------------------- Test 47 ------------------------------" >>testtry
|
||||
$valgrind ./pcregrep -F "AB.VE
|
||||
elephant" $testdata/grepinput >>testtry
|
||||
|
||||
echo "---------------------------- Test 48 ------------------------------" >>testtry
|
||||
$valgrind ./pcregrep -F -e DATA -e "AB.VE
|
||||
elephant" $testdata/grepinput >>testtry
|
||||
|
||||
echo "---------------------------- Test 49 ------------------------------" >>testtry
|
||||
$valgrind ./pcregrep "^(abc|def|ghi|jkl)" $testdata/grepinputx >>testtry
|
||||
|
||||
echo "---------------------------- Test 50 ------------------------------" >>testtry
|
||||
$valgrind ./pcregrep -N CR "^(abc|def|ghi|jkl)" $testdata/grepinputx >>testtry
|
||||
|
||||
echo "---------------------------- Test 51 ------------------------------" >>testtry
|
||||
$valgrind ./pcregrep --newline=crlf "^(abc|def|ghi|jkl)" $testdata/grepinputx >>testtry
|
||||
|
||||
echo "---------------------------- Test 52 ------------------------------" >>testtry
|
||||
$valgrind ./pcregrep --newline=cr -F "def
jkl" $testdata/grepinputx >>testtry
|
||||
|
||||
echo "---------------------------- Test 53 ------------------------------" >>testtry
|
||||
$valgrind ./pcregrep --newline=crlf -F "xxx
|
||||
jkl" $testdata/grepinputx >>testtry
|
||||
|
||||
# Now compare the results.
|
||||
|
||||
$cf testtry $testdata/grepoutput
|
||||
if [ $? != 0 ] ; then exit 1; else exit 0; fi
|
||||
|
||||
# End
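
The script above repeats one pattern for every test: write a banner to testtry, append the command's output (and, for quiet runs, its exit status), then compare the whole file with the expected results. The following is a minimal sketch of that pattern, not part of the commit. It assumes the system grep as a stand-in for ./pcregrep so that it runs without a PCRE build, and every file name in it is hypothetical.

```shell
#! /bin/sh
# Sketch of the RunGrepTest pattern, using the system grep as a stand-in for
# ./pcregrep. All file names here are hypothetical and live in a scratch dir.

work=`mktemp -d` || exit 1
printf 'a PATTERN line\nno match here\n' >"$work/input"

# Expected results, playing the role of testdata/grepoutput.
cat >"$work/expected" <<'EOF'
---- Test 1 ----
a PATTERN line
---- Test 2 ----
RC=1
EOF

# Test 1: matching lines are appended after the banner.
echo "---- Test 1 ----" >"$work/testtry"
grep PATTERN "$work/input" >>"$work/testtry"

# Test 2: -q prints nothing, so the exit status is recorded instead.
echo "---- Test 2 ----" >>"$work/testtry"
grep -q NEVER-PATTERN "$work/input" >>"$work/testtry"
echo "RC=$?" >>"$work/testtry"

# Compare the accumulated output with the expected file, as the real script
# does with $cf.
if diff "$work/testtry" "$work/expected" >/dev/null ; then
  result=pass
else
  result=fail
fi
echo "$result"
rm -rf "$work"
```

Because all results go into a single file, one diff at the end covers every test, and the banners show which test a difference belongs to.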
|
258
libs/pcre/RunTest.in
Executable file
@ -0,0 +1,258 @@
|
||||
#! /bin/sh
|
||||
|
||||
# This file is generated by configure from RunTest.in. Make any changes
|
||||
# to that file.
|
||||
|
||||
# Run PCRE tests
|
||||
|
||||
cf=diff
|
||||
valgrind=
|
||||
if [ ! -d testdata ] ; then
|
||||
ln -s @top_srcdir@/testdata testdata
|
||||
fi
|
||||
testdata=./testdata
|
||||
|
||||
|
||||
# Select which tests to run; if no selection, run all
|
||||
|
||||
do1=no
|
||||
do2=no
|
||||
do3=no
|
||||
do4=no
|
||||
do5=no
|
||||
do6=no
|
||||
do7=no
|
||||
do8=no
|
||||
do9=no
|
||||
|
||||
while [ $# -gt 0 ] ; do
|
||||
case $1 in
|
||||
1) do1=yes;;
|
||||
2) do2=yes;;
|
||||
3) do3=yes;;
|
||||
4) do4=yes;;
|
||||
5) do5=yes;;
|
||||
6) do6=yes;;
|
||||
7) do7=yes;;
|
||||
8) do8=yes;;
|
||||
9) do9=yes;;
|
||||
valgrind) valgrind="valgrind -q";;
|
||||
*) echo "Unknown test number $1"; exit 1;;
|
||||
esac
|
||||
shift
|
||||
done
|
||||
|
||||
if [ "@LINK_SIZE@" != "" -a "@LINK_SIZE@" != "-DLINK_SIZE=2" ] ; then
|
||||
if [ $do2 = yes ] ; then
|
||||
echo "Can't run test 2 with an internal link size other than 2"
|
||||
exit 1
|
||||
fi
|
||||
if [ $do5 = yes ] ; then
|
||||
echo "Can't run test 5 with an internal link size other than 2"
|
||||
exit 1
|
||||
fi
|
||||
if [ $do6 = yes ] ; then
|
||||
echo "Can't run test 6 with an internal link size other than 2"
|
||||
exit 1
|
||||
fi
|
||||
fi
|
||||
|
||||
if [ "@UTF8@" = "" ] ; then
|
||||
if [ $do4 = yes ] ; then
|
||||
echo "Can't run test 4 because UTF-8 support is not configured"
|
||||
exit 1
|
||||
fi
|
||||
if [ $do5 = yes ] ; then
|
||||
echo "Can't run test 5 because UTF-8 support is not configured"
|
||||
exit 1
|
||||
fi
|
||||
if [ $do6 = yes ] ; then
|
||||
echo "Can't run test 6 because UTF-8 support is not configured"
|
||||
exit 1
|
||||
fi
|
||||
if [ $do8 = yes ] ; then
|
||||
echo "Can't run test 8 because UTF-8 support is not configured"
|
||||
exit 1
|
||||
fi
|
||||
if [ $do9 = yes ] ; then
|
||||
echo "Can't run test 9 because UTF-8 support is not configured"
|
||||
exit 1
|
||||
fi
|
||||
fi
|
||||
|
||||
if [ "@UCP@" = "" ] ; then
|
||||
if [ $do6 = yes ] ; then
|
||||
echo "Can't run test 6 because Unicode property support is not configured"
|
||||
exit 1
|
||||
fi
|
||||
if [ $do9 = yes ] ; then
|
||||
echo "Can't run test 9 because Unicode property support is not configured"
|
||||
exit 1
|
||||
fi
|
||||
fi
|
||||
|
||||
if [ $do1 = no -a $do2 = no -a $do3 = no -a $do4 = no -a \
|
||||
$do5 = no -a $do6 = no -a $do7 = no -a $do8 = no -a \
|
||||
$do9 = no ] ; then
|
||||
do1=yes
|
||||
do2=yes
|
||||
do3=yes
|
||||
if [ "@UTF8@" != "" ] ; then do4=yes; fi
|
||||
if [ "@UTF8@" != "" ] ; then do5=yes; fi
|
||||
if [ "@UTF8@" != "" -a "@UCP@" != "" ] ; then do6=yes; fi
|
||||
do7=yes
|
||||
if [ "@UTF8@" != "" ] ; then do8=yes; fi
|
||||
if [ "@UTF8@" != "" -a "@UCP@" != "" ] ; then do9=yes; fi
|
||||
fi
|
||||
|
||||
# Show which release
|
||||
|
||||
./pcretest /dev/null
|
||||
|
||||
# Primary test, Perl-compatible
|
||||
|
||||
if [ $do1 = yes ] ; then
|
||||
echo "Test 1: main functionality (Perl compatible)"
|
||||
$valgrind ./pcretest -q $testdata/testinput1 testtry
|
||||
if [ $? = 0 ] ; then
|
||||
$cf testtry $testdata/testoutput1
|
||||
if [ $? != 0 ] ; then exit 1; fi
|
||||
else exit 1
|
||||
fi
|
||||
echo "OK"
|
||||
echo " "
|
||||
fi
|
||||
|
||||
# PCRE tests that are not Perl-compatible - API & error tests, mostly
|
||||
|
||||
if [ $do2 = yes ] ; then
|
||||
if [ "@LINK_SIZE@" = "" -o "@LINK_SIZE@" = "-DLINK_SIZE=2" ] ; then
|
||||
echo "Test 2: API and error handling (not Perl compatible)"
|
||||
$valgrind ./pcretest -q -i $testdata/testinput2 testtry
|
||||
if [ $? = 0 ] ; then
|
||||
$cf testtry $testdata/testoutput2
|
||||
if [ $? != 0 ] ; then exit 1; fi
|
||||
else exit 1
|
||||
fi
|
||||
echo "OK"
|
||||
echo " "
|
||||
else
|
||||
echo Test 2 skipped for link size other than 2 \(@LINK_SIZE@\)
|
||||
echo " "
|
||||
fi
|
||||
fi
|
||||
|
||||
# Locale-specific tests, provided the "fr_FR" locale is available
|
||||
|
||||
if [ $do3 = yes ] ; then
|
||||
locale -a | grep '^fr_FR$' >/dev/null
|
||||
if [ $? -eq 0 ] ; then
|
||||
echo "Test 3: locale-specific features (using 'fr_FR' locale)"
|
||||
$valgrind ./pcretest -q $testdata/testinput3 testtry
|
||||
if [ $? = 0 ] ; then
|
||||
$cf testtry $testdata/testoutput3
|
||||
if [ $? != 0 ] ; then
|
||||
echo " "
|
||||
echo "Locale test did not run entirely successfully."
|
||||
echo "This usually means that there is a problem with the locale"
|
||||
echo "settings rather than a bug in PCRE."
|
||||
else
|
||||
echo "OK"
|
||||
fi
|
||||
echo " "
|
||||
else exit 1
|
||||
fi
|
||||
else
|
||||
echo "Cannot test locale-specific features - 'fr_FR' locale not found,"
|
||||
echo "or the \"locale\" command is not available to check for it."
|
||||
echo " "
|
||||
fi
|
||||
fi
|
||||
|
||||
# Additional tests for UTF8 support
|
||||
|
||||
if [ $do4 = yes ] ; then
|
||||
echo "Test 4: UTF-8 support (Perl compatible)"
|
||||
$valgrind ./pcretest -q $testdata/testinput4 testtry
|
||||
if [ $? = 0 ] ; then
|
||||
$cf testtry $testdata/testoutput4
|
||||
if [ $? != 0 ] ; then exit 1; fi
|
||||
else exit 1
|
||||
fi
|
||||
echo "OK"
|
||||
echo " "
|
||||
fi
|
||||
|
||||
if [ $do5 = yes ] ; then
|
||||
if [ "@LINK_SIZE@" = "" -o "@LINK_SIZE@" = "-DLINK_SIZE=2" ] ; then
|
||||
echo "Test 5: API and internals for UTF-8 support (not Perl compatible)"
|
||||
$valgrind ./pcretest -q $testdata/testinput5 testtry
|
||||
if [ $? = 0 ] ; then
|
||||
$cf testtry $testdata/testoutput5
|
||||
if [ $? != 0 ] ; then exit 1; fi
|
||||
else exit 1
|
||||
fi
|
||||
echo "OK"
|
||||
echo " "
|
||||
else
|
||||
echo Test 5 skipped for link size other than 2 \(@LINK_SIZE@\)
|
||||
echo " "
|
||||
fi
|
||||
fi
|
||||
|
||||
if [ $do6 = yes ] ; then
|
||||
if [ "@LINK_SIZE@" = "" -o "@LINK_SIZE@" = "-DLINK_SIZE=2" ] ; then
|
||||
echo "Test 6: Unicode property support"
|
||||
$valgrind ./pcretest -q $testdata/testinput6 testtry
|
||||
if [ $? = 0 ] ; then
|
||||
$cf testtry $testdata/testoutput6
|
||||
if [ $? != 0 ] ; then exit 1; fi
|
||||
else exit 1
|
||||
fi
|
||||
echo "OK"
|
||||
echo " "
|
||||
else
|
||||
echo Test 6 skipped for link size other than 2 \(@LINK_SIZE@\)
|
||||
echo " "
|
||||
fi
|
||||
fi
|
||||
|
||||
# Tests for DFA matching support
|
||||
|
||||
if [ $do7 = yes ] ; then
|
||||
echo "Test 7: DFA matching"
|
||||
$valgrind ./pcretest -q -dfa $testdata/testinput7 testtry
|
||||
if [ $? = 0 ] ; then
|
||||
$cf testtry $testdata/testoutput7
|
||||
if [ $? != 0 ] ; then exit 1; fi
|
||||
else exit 1
|
||||
fi
|
||||
echo "OK"
|
||||
echo " "
|
||||
fi
|
||||
|
||||
if [ $do8 = yes ] ; then
|
||||
echo "Test 8: DFA matching with UTF-8"
|
||||
$valgrind ./pcretest -q -dfa $testdata/testinput8 testtry
|
||||
if [ $? = 0 ] ; then
|
||||
$cf testtry $testdata/testoutput8
|
||||
if [ $? != 0 ] ; then exit 1; fi
|
||||
else exit 1
|
||||
fi
|
||||
echo "OK"
|
||||
echo " "
|
||||
fi
|
||||
|
||||
if [ $do9 = yes ] ; then
|
||||
echo "Test 9: DFA matching with Unicode properties"
|
||||
$valgrind ./pcretest -q -dfa $testdata/testinput9 testtry
|
||||
if [ $? = 0 ] ; then
|
||||
$cf testtry $testdata/testoutput9
|
||||
if [ $? != 0 ] ; then exit 1; fi
|
||||
else exit 1
|
||||
fi
|
||||
echo "OK"
|
||||
echo " "
|
||||
fi
|
||||
|
||||
# End
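
The selection logic in RunTest can be hard to follow in this rendering, so here is a reduced sketch of it, not part of the commit. It assumes only three tests, and the function name select_tests is hypothetical; the real script does the same work inline and also checks the LINK_SIZE, UTF8 and UCP settings.

```shell
#! /bin/sh
# Sketch of the RunTest selection logic, reduced to three tests. The function
# name select_tests is hypothetical; the real script does this inline.

select_tests() {
  do1=no; do2=no; do3=no
  while [ $# -gt 0 ] ; do
    case $1 in
      1) do1=yes;;
      2) do2=yes;;
      3) do3=yes;;
      *) echo "Unknown test number $1"; return 1;;
    esac
    shift
  done
  # No explicit selection means "run everything".
  if [ $do1 = no -a $do2 = no -a $do3 = no ] ; then
    do1=yes; do2=yes; do3=yes
  fi
  echo "$do1 $do2 $do3"
}

all=`select_tests`        # no arguments: every test is enabled
only2=`select_tests 2`    # one argument: only that test is enabled
echo "$all"
echo "$only2"
```

In the real script the valgrind argument sets no do flag, so passing valgrind on its own still runs the full set of tests.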
|
1495
libs/pcre/config.guess
vendored
Executable file
File diff suppressed because it is too large
143
libs/pcre/config.h.in
Normal file
@ -0,0 +1,143 @@
|
||||
|
||||
/* On Unix-like systems config.h.in is converted by "configure" into config.h.
|
||||
Some other environments also support the use of "configure". PCRE is written in
|
||||
Standard C, but there are a few non-standard things it can cope with, allowing
|
||||
it to run on SunOS4 and other "close to standard" systems.
|
||||
|
||||
On a non-Unix-like system you should just copy this file into config.h, and set
|
||||
up the macros the way you need them. You should normally change the definitions
|
||||
of HAVE_STRERROR and HAVE_MEMMOVE to 1. Unfortunately, because of the way
|
||||
autoconf works, these cannot be made the defaults. If your system has bcopy()
|
||||
and not memmove(), change the definition of HAVE_BCOPY instead of HAVE_MEMMOVE.
|
||||
If your system has neither bcopy() nor memmove(), leave them both as 0; an
|
||||
emulation function will be used. */
|
||||
|
||||
/* If you are compiling for a system that uses EBCDIC instead of ASCII
|
||||
character codes, define this macro as 1. On systems that can use "configure",
|
||||
this can be done via --enable-ebcdic. */
|
||||
|
||||
#ifndef EBCDIC
|
||||
#define EBCDIC 0
|
||||
#endif
|
||||
|
||||
/* If you are compiling for a system other than a Unix-like system or Win32,
|
||||
and it needs some magic to be inserted before the definition of a function that
|
||||
is exported by the library, define this macro to contain the relevant magic. If
|
||||
you do not define this macro, it defaults to "extern" for a C compiler and
|
||||
"extern C" for a C++ compiler on non-Win32 systems. This macro appears at the
|
||||
start of every exported function that is part of the external API. It does not
|
||||
appear on functions that are "external" in the C sense, but which are internal
|
||||
to the library. */
|
||||
|
||||
/* #define PCRE_DATA_SCOPE */
|
||||
|
||||
/* Define the following macro to empty if the "const" keyword does not work. */
|
||||
|
||||
#undef const
|
||||
|
||||
/* Define the following macro to "unsigned" if <stddef.h> does not define
|
||||
size_t. */
|
||||
|
||||
#undef size_t
|
||||
|
||||
/* The following two definitions are mainly for the benefit of SunOS4, which
|
||||
does not have the strerror() or memmove() functions that should be present in
|
||||
all Standard C libraries. The macros HAVE_STRERROR and HAVE_MEMMOVE should
|
||||
normally be defined with the value 1 for other systems, but unfortunately we
|
||||
cannot make this the default because "configure" files generated by autoconf
|
||||
will only change 0 to 1; they won't change 1 to 0 if the functions are not
|
||||
found. */
|
||||
|
||||
#define HAVE_STRERROR 0
|
||||
#define HAVE_MEMMOVE 0
|
||||
|
||||
/* There are some non-Unix-like systems that don't even have bcopy(). If this
|
||||
macro is false, an emulation is used. If HAVE_MEMMOVE is set to 1, the value of
|
||||
HAVE_BCOPY is not relevant. */
|
||||
|
||||
#define HAVE_BCOPY 0
|
||||
|
||||
/* The value of NEWLINE determines the newline character. The default is to
|
||||
leave it up to the compiler, but some sites want to force a particular value.
|
||||
On Unix-like systems, "configure" can be used to override this default. */
|
||||
|
||||
#ifndef NEWLINE
|
||||
#define NEWLINE '\n'
|
||||
#endif
|
||||
|
||||
/* The value of LINK_SIZE determines the number of bytes used to store links as
|
||||
offsets within the compiled regex. The default is 2, which allows for compiled
|
||||
patterns up to 64K long. This covers the vast majority of cases. However, PCRE
|
||||
can also be compiled to use 3 or 4 bytes instead. This allows for longer
|
||||
patterns in extreme cases. On systems that support it, "configure" can be used
|
||||
to override this default. */
|
||||
|
||||
#ifndef LINK_SIZE
|
||||
#define LINK_SIZE 2
|
||||
#endif
|
||||
|
||||
/* When calling PCRE via the POSIX interface, additional working storage is
|
||||
required for holding the pointers to capturing substrings because PCRE requires
|
||||
three integers per substring, whereas the POSIX interface provides only two. If
|
||||
the number of expected substrings is small, the wrapper function uses space on
|
||||
the stack, because this is faster than using malloc() for each call. The
|
||||
threshold above which the stack is no longer used is defined by
|
||||
POSIX_MALLOC_THRESHOLD. On systems that support it, "configure" can be used to override this
|
||||
default. */
|
||||
|
||||
#ifndef POSIX_MALLOC_THRESHOLD
|
||||
#define POSIX_MALLOC_THRESHOLD 10
|
||||
#endif
|
||||
|
||||
/* PCRE uses recursive function calls to handle backtracking while matching.
|
||||
This can sometimes be a problem on systems that have stacks of limited size.
|
||||
Define NO_RECURSE to get a version that doesn't use recursion in the match()
|
||||
function; instead it creates its own stack by hand using pcre_recurse_malloc()
|
||||
to obtain memory from the heap. For more detail, see the comments and other
|
||||
stuff just above the match() function. On systems that support it, "configure"
|
||||
can be used to set this in the Makefile (use --disable-stack-for-recursion). */
|
||||
|
||||
/* #define NO_RECURSE */
|
||||
|
||||
/* The value of MATCH_LIMIT determines the default number of times the internal
|
||||
match() function can be called during a single execution of pcre_exec(). There
|
||||
is a runtime interface for setting a different limit. The limit exists in order
|
||||
to catch runaway regular expressions that take for ever to determine that they
|
||||
do not match. The default is set very large so that it does not accidentally
|
||||
catch legitimate cases. On systems that support it, "configure" can be used to
|
||||
override this default. */
|
||||
|
||||
#ifndef MATCH_LIMIT
|
||||
#define MATCH_LIMIT 10000000
|
||||
#endif
|
||||
|
||||
/* The above limit applies to all calls of match(), whether or not they
|
||||
increase the recursion depth. In some environments it is desirable to limit the
|
||||
depth of recursive calls of match() more strictly, in order to restrict the
|
||||
maximum amount of stack (or heap, if NO_RECURSE is defined) that is used. The
|
||||
value of MATCH_LIMIT_RECURSION applies only to recursive calls of match(). To
|
||||
have any useful effect, it must be less than the value of MATCH_LIMIT. There is
|
||||
a runtime method for setting a different limit. On systems that support it,
|
||||
"configure" can be used to override this default. */
|
||||
|
||||
#ifndef MATCH_LIMIT_RECURSION
|
||||
#define MATCH_LIMIT_RECURSION MATCH_LIMIT
|
||||
#endif
|
||||
|
||||
/* These three limits are parameterized just in case anybody ever wants to
|
||||
change them. Care must be taken if they are increased, because they guard
|
||||
against integer overflow caused by enormously large patterns. */
|
||||
|
||||
#ifndef MAX_NAME_SIZE
|
||||
#define MAX_NAME_SIZE 32
|
||||
#endif
|
||||
|
||||
#ifndef MAX_NAME_COUNT
|
||||
#define MAX_NAME_COUNT 10000
|
||||
#endif
|
||||
|
||||
#ifndef MAX_DUPLENGTH
|
||||
#define MAX_DUPLENGTH 30000
|
||||
#endif
|
||||
|
||||
/* End */
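
The LINK_SIZE comment above says that 2-byte links allow compiled patterns up to 64K long. The arithmetic behind that figure can be sketched as follows. This is an illustration only: max_link_offset is a hypothetical helper, not part of PCRE, and the value it prints is simply the number of distinct offsets that fit in the given number of bytes, which is an upper bound rather than the exact limit PCRE enforces.

```shell
#! /bin/sh
# Upper bound on the offset that fits in LINK_SIZE bytes: 2^(8 * LINK_SIZE).
# max_link_offset is a hypothetical helper, not part of PCRE.

max_link_offset() {
  echo $(( 1 << (8 * $1) ))
}

two=`max_link_offset 2`
three=`max_link_offset 3`
echo "LINK_SIZE=2 -> $two"
echo "LINK_SIZE=3 -> $three"
```

With LINK_SIZE set to 2 the bound is 65536, which matches the 64K figure in the comment; each extra byte multiplies it by 256.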
|
1627
libs/pcre/config.sub
vendored
Executable file
File diff suppressed because it is too large
21093
libs/pcre/configure
vendored
Executable file
File diff suppressed because it is too large
302
libs/pcre/configure.ac
Normal file
@ -0,0 +1,302 @@
|
||||
dnl Process this file with autoconf to produce a configure script.
|
||||
|
||||
dnl This configure.ac file has been hacked around quite a lot as a result of
|
||||
dnl patches that various people have sent to me (PH). Sometimes the information
|
||||
dnl I get is contradictory. I've tried to put in comments that explain things,
|
||||
dnl but in some cases the information is second-hand and I have no way of
|
||||
dnl verifying it. I am not an autoconf or libtool expert!
|
||||
|
||||
dnl This is required at the start; the name is the name of a file
|
||||
dnl it should be seeing, to verify it is in the same directory.
|
||||
|
||||
AC_INIT(dftables.c)
|
||||
AC_CONFIG_SRCDIR([pcre.h])
|
||||
|
||||
dnl A safety precaution
|
||||
|
||||
AC_PREREQ(2.57)
|
||||
|
||||
dnl Arrange to build config.h from config.h.in.
|
||||
dnl Manual says this macro should come right after AC_INIT.
|
||||
AC_CONFIG_HEADER(config.h)
|
||||
|
||||
dnl Default values for miscellaneous macros
|
||||
|
||||
POSIX_MALLOC_THRESHOLD=-DPOSIX_MALLOC_THRESHOLD=10
|
||||
|
||||
dnl Provide versioning information for libtool shared libraries that
|
||||
dnl are built by default on Unix systems.
|
||||
|
||||
PCRE_LIB_VERSION=0:1:0
|
||||
PCRE_POSIXLIB_VERSION=0:0:0
|
||||
PCRE_CPPLIB_VERSION=0:0:0
|
||||
|
||||
dnl Find the PCRE version from the pcre.h file. The PCRE_VERSION variable is
|
||||
dnl substituted in pcre-config.in.
|
||||
|
||||
PCRE_MAJOR=`grep '#define PCRE_MAJOR' ${srcdir}/pcre.h | cut -c 29-`
|
||||
PCRE_MINOR=`grep '#define PCRE_MINOR' ${srcdir}/pcre.h | cut -c 29-`
|
||||
PCRE_PRERELEASE=`grep '#define PCRE_PRERELEASE' ${srcdir}/pcre.h | cut -c 29-`
|
||||
PCRE_VERSION=${PCRE_MAJOR}.${PCRE_MINOR}${PCRE_PRERELEASE}
|
||||
|
||||
dnl Handle --disable-cpp
|
||||
|
||||
AC_ARG_ENABLE(cpp,
|
||||
[ --disable-cpp disable C++ support],
|
||||
want_cpp="$enableval", want_cpp=yes)
|
||||
|
||||
dnl Checks for programs.
|
||||
|
||||
AC_PROG_CC
|
||||
|
||||
dnl Test for C++ for the C++ wrapper libpcrecpp. It seems, however, that
|
||||
dnl AC_PROG_CXX will set $CXX to "g++" when no C++ compiler is installed, even
|
||||
dnl though that is completely bogus. (This may happen only on certain systems
|
||||
dnl with certain versions of autoconf, of course.) An attempt to include this
|
||||
dnl test inside a check for want_cpp was criticized by a libtool expert, who
|
||||
dnl tells me that it isn't allowed.
|
||||
|
||||
AC_PROG_CXX
|
||||
|
||||
dnl The icc compiler has the same options as gcc, so let the rest of the
|
||||
dnl configure script think it has gcc when setting up options etc.
|
||||
dnl This is a nasty hack which no longer seems necessary with the update
|
||||
dnl to the latest libtool files, so I have commented it out.
|
||||
dnl
|
||||
dnl if test "$CC" = "icc" ; then GCC=yes ; fi
|
||||
|
||||
AC_PROG_INSTALL
|
||||
AC_LIBTOOL_WIN32_DLL
|
||||
AC_PROG_LIBTOOL
|
||||
|
||||
dnl We need to find a compiler for compiling a program to run on the local host
|
||||
dnl while building. It needs to be different from CC when cross-compiling.
|
||||
dnl There is a macro called AC_PROG_CC_FOR_BUILD in the GNU archive for
|
||||
dnl figuring this out automatically. Unfortunately, it does not work with the
|
||||
dnl latest versions of autoconf. So for the moment, we just default to the
|
||||
dnl same values as the "main" compiler. People who are cross-compiling will
|
||||
dnl just have to adjust the Makefile by hand or set these values when they
|
||||
dnl run "configure".
|
||||
|
||||
CC_FOR_BUILD=${CC_FOR_BUILD:-'$(CC)'}
|
||||
CXX_FOR_BUILD=${CXX_FOR_BUILD:-'$(CXX)'}
|
||||
CFLAGS_FOR_BUILD=${CFLAGS_FOR_BUILD:-'$(CFLAGS)'}
|
||||
CPPFLAGS_FOR_BUILD=${CPPFLAGS_FOR_BUILD:-'$(CPPFLAGS)'}
|
||||
CXXFLAGS_FOR_BUILD=${CXXFLAGS_FOR_BUILD:-'$(CXXFLAGS)'}
|
||||
BUILD_EXEEXT=${BUILD_EXEEXT:-'$(EXEEXT)'}
|
||||
BUILD_OBJEXT=${BUILD_OBJEXT:-'$(OBJEXT)'}
|
||||
|
||||
dnl Checks for header files.
|
||||
|
||||
AC_HEADER_STDC
|
||||
AC_CHECK_HEADERS(limits.h)
|
||||
|
||||
dnl The files below are C++ header files. One person told me (PH) that
|
||||
dnl AC_LANG_CPLUSPLUS unsets CXX if it was explicitly set to something which
|
||||
dnl doesn't work. However, this doesn't always seem to be the case.
|
||||
|
||||
if test "x$want_cpp" = "xyes" -a -n "$CXX"
|
||||
then
|
||||
AC_LANG_SAVE
|
||||
AC_LANG_CPLUSPLUS
|
||||
|
||||
dnl We could be more clever here, given we're doing AC_SUBST with this
|
||||
dnl (eg set a var to be the name of the include file we want). But we're not
|
||||
dnl so it's easy to change back to 'regular' autoconf vars if we needed to.
|
||||
AC_CHECK_HEADERS(string, [pcre_have_cpp_headers="1"],
|
||||
[pcre_have_cpp_headers="0"])
|
||||
AC_CHECK_HEADERS(bits/type_traits.h, [pcre_have_bits_type_traits="1"],
|
||||
[pcre_have_bits_type_traits="0"])
|
||||
AC_CHECK_HEADERS(type_traits.h, [pcre_have_type_traits="1"],
|
||||
[pcre_have_type_traits="0"])
|
||||
dnl Using AC_SUBST eliminates the need to include config.h in a public .h file
|
||||
AC_SUBST(pcre_have_bits_type_traits)
|
||||
AC_SUBST(pcre_have_type_traits)
|
||||
AC_LANG_RESTORE
|
||||
fi
|
||||
|
||||
dnl From the above, we now have enough info to know if C++ is fully installed
|
||||
if test "x$want_cpp" = "xyes" -a -n "$CXX" -a "$pcre_have_cpp_headers" = 1; then
|
||||
MAYBE_CPP_TARGETS='$(CPP_TARGETS)'
|
||||
HAVE_CPP=
|
||||
else
|
||||
MAYBE_CPP_TARGETS=
|
||||
HAVE_CPP="#"
|
||||
fi
|
||||
AC_SUBST(MAYBE_CPP_TARGETS)
|
||||
AC_SUBST(HAVE_CPP)
|
||||
|
||||
dnl Checks for typedefs, structures, and compiler characteristics.
|
||||
|
||||
AC_C_CONST
|
||||
AC_TYPE_SIZE_T
|
||||
|
||||
AC_CHECK_TYPES([long long], [pcre_have_long_long="1"], [pcre_have_long_long="0"])
|
||||
AC_CHECK_TYPES([unsigned long long], [pcre_have_ulong_long="1"], [pcre_have_ulong_long="0"])
|
||||
AC_SUBST(pcre_have_long_long)
|
||||
AC_SUBST(pcre_have_ulong_long)
|
||||
|
||||
dnl Checks for library functions.
|
||||
|
||||
AC_CHECK_FUNCS(bcopy memmove strerror strtoq strtoll)
|
||||
|
||||
dnl Handle --enable-utf8
|
||||
|
||||
AC_ARG_ENABLE(utf8,
|
||||
[ --enable-utf8 enable UTF8 support],
|
||||
if test "$enableval" = "yes"; then
|
||||
UTF8=-DSUPPORT_UTF8
|
||||
fi
|
||||
)
|
||||
|
||||
dnl Handle --enable-unicode-properties
|
||||
|
||||
AC_ARG_ENABLE(unicode-properties,
|
||||
[ --enable-unicode-properties enable Unicode properties support],
|
||||
if test "$enableval" = "yes"; then
|
||||
UCP=-DSUPPORT_UCP
|
||||
fi
|
||||
)
|
||||
|
||||
dnl Handle --enable-newline-is-cr
|
||||
|
||||
AC_ARG_ENABLE(newline-is-cr,
|
||||
[ --enable-newline-is-cr use CR as the newline character],
|
||||
if test "$enableval" = "yes"; then
|
||||
NEWLINE=-DNEWLINE=13
|
||||
fi
|
||||
)
|
||||
|
||||
dnl Handle --enable-newline-is-lf
|
||||
|
||||
AC_ARG_ENABLE(newline-is-lf,
|
||||
[ --enable-newline-is-lf use LF as the newline character],
|
||||
if test "$enableval" = "yes"; then
|
||||
NEWLINE=-DNEWLINE=10
|
||||
fi
|
||||
)
|
||||
|
||||
dnl Handle --enable-newline-is-crlf
|
||||
|
||||
AC_ARG_ENABLE(newline-is-crlf,
|
||||
[ --enable-newline-is-crlf use CRLF as the newline sequence],
|
||||
if test "$enableval" = "yes"; then
|
||||
NEWLINE=-DNEWLINE=3338
|
||||
fi
|
||||
)
|
||||
|
||||
dnl Handle --enable-ebcdic
|
||||
|
||||
AC_ARG_ENABLE(ebcdic,
|
||||
[ --enable-ebcdic assume EBCDIC coding rather than ASCII],
|
||||
if test "$enableval" = "yes"; then
|
||||
EBCDIC=-DEBCDIC=1
|
||||
fi
|
||||
)
|
||||
|
||||
dnl Handle --disable-stack-for-recursion
|
||||
|
||||
AC_ARG_ENABLE(stack-for-recursion,
|
||||
[ --disable-stack-for-recursion disable use of stack recursion when matching],
|
||||
if test "$enableval" = "no"; then
|
||||
NO_RECURSE=-DNO_RECURSE
|
||||
fi
|
||||
)
|
||||
|
||||
dnl There doesn't seem to be a straightforward way of having parameters
|
||||
dnl that set values, other than fudging the --with thing. So that's what
|
||||
dnl I've done.
|
||||
|
||||
dnl Handle --with-posix-malloc-threshold=n
|
||||
|
||||
AC_ARG_WITH(posix-malloc-threshold,
|
||||
[ --with-posix-malloc-threshold=10 threshold for POSIX malloc usage],
|
||||
POSIX_MALLOC_THRESHOLD=-DPOSIX_MALLOC_THRESHOLD=$withval
|
||||
)
|
||||
|
||||
dnl Handle --with-link-size=n
|
||||
|
||||
AC_ARG_WITH(link-size,
|
||||
[ --with-link-size=2 internal link size (2, 3, or 4 allowed)],
|
||||
LINK_SIZE=-DLINK_SIZE=$withval
|
||||
)
|
||||
|
||||
dnl Handle --with-match-limit=n
|
||||
|
||||
AC_ARG_WITH(match-limit,
|
||||
[ --with-match-limit=10000000 default limit on internal looping],
|
||||
MATCH_LIMIT=-DMATCH_LIMIT=$withval
|
||||
)
|
||||
|
||||
dnl Handle --with-match-limit-recursion=n
|
||||
|
||||
AC_ARG_WITH(match-limit-recursion,
|
||||
[ --with-match-limit-recursion=10000000 default limit on internal recursion],
|
||||
MATCH_LIMIT_RECURSION=-DMATCH_LIMIT_RECURSION=$withval
|
||||
)
|
||||
|
||||
dnl Unicode character property support implies UTF-8 support
|
||||
|
||||
if test "$UCP" != "" ; then
|
||||
UTF8=-DSUPPORT_UTF8
|
||||
fi
|
||||
|
||||
dnl "Export" these variables
|
||||
|
||||
AC_SUBST(BUILD_EXEEXT)
|
||||
AC_SUBST(BUILD_OBJEXT)
|
||||
AC_SUBST(CC_FOR_BUILD)
|
||||
AC_SUBST(CXX_FOR_BUILD)
|
||||
AC_SUBST(CFLAGS_FOR_BUILD)
|
||||
AC_SUBST(CXXFLAGS_FOR_BUILD)
|
||||
AC_SUBST(CXXLDFLAGS)
|
||||
AC_SUBST(EBCDIC)
|
||||
AC_SUBST(HAVE_MEMMOVE)
|
||||
AC_SUBST(HAVE_STRERROR)
|
||||
AC_SUBST(LINK_SIZE)
|
||||
AC_SUBST(MATCH_LIMIT)
|
||||
AC_SUBST(MATCH_LIMIT_RECURSION)
|
||||
AC_SUBST(NEWLINE)
|
||||
AC_SUBST(NO_RECURSE)
|
||||
AC_SUBST(PCRE_LIB_VERSION)
|
||||
AC_SUBST(PCRE_POSIXLIB_VERSION)
|
||||
AC_SUBST(PCRE_CPPLIB_VERSION)
|
||||
AC_SUBST(PCRE_VERSION)
|
||||
AC_SUBST(POSIX_MALLOC_THRESHOLD)
|
||||
AC_SUBST(UCP)
|
||||
AC_SUBST(UTF8)
|
||||
|
||||
dnl Stuff to make MinGW work better. Special treatment is no longer
|
||||
dnl needed for Cygwin.
|
||||
|
||||
case $host_os in
|
||||
mingw* )
|
||||
POSIX_OBJ=pcreposix.o
|
||||
POSIX_LOBJ=pcreposix.lo
|
||||
POSIX_LIB=
|
||||
ON_WINDOWS=
|
||||
NOT_ON_WINDOWS="#"
|
||||
WIN_PREFIX=
|
||||
;;
|
||||
* )
|
||||
ON_WINDOWS="#"
|
||||
NOT_ON_WINDOWS=
|
||||
POSIX_OBJ=
|
||||
POSIX_LOBJ=
|
||||
POSIX_LIB=libpcreposix.la
|
||||
WIN_PREFIX=
|
||||
;;
|
||||
esac
|
||||
AC_SUBST(WIN_PREFIX)
|
||||
AC_SUBST(ON_WINDOWS)
|
||||
AC_SUBST(NOT_ON_WINDOWS)
|
||||
AC_SUBST(POSIX_OBJ)
|
||||
AC_SUBST(POSIX_LOBJ)
|
||||
AC_SUBST(POSIX_LIB)
|
||||
|
||||
if test "x$enable_shared" = "xno" ; then
|
||||
AC_DEFINE([PCRE_STATIC],[1],[to link statically])
|
||||
fi
|
||||
|
||||
dnl This must be last; it determines what files are written as well as config.h
|
||||
AC_OUTPUT(Makefile pcre-config:pcre-config.in libpcre.pc:libpcre.pc.in pcrecpparg.h:pcrecpparg.h.in pcre_stringpiece.h:pcre_stringpiece.h.in RunGrepTest:RunGrepTest.in RunTest:RunTest.in,[chmod a+x RunTest RunGrepTest pcre-config])
|
172
libs/pcre/dftables.c
Normal file
@ -0,0 +1,172 @@
|
||||
/*************************************************
|
||||
* Perl-Compatible Regular Expressions *
|
||||
*************************************************/
|
||||
|
||||
/* PCRE is a library of functions to support regular expressions whose syntax
|
||||
and semantics are as close as possible to those of the Perl 5 language.
|
||||
|
||||
Written by Philip Hazel
|
||||
Copyright (c) 1997-2006 University of Cambridge
|
||||
|
||||
-----------------------------------------------------------------------------
|
||||
Redistribution and use in source and binary forms, with or without
|
||||
modification, are permitted provided that the following conditions are met:
|
||||
|
||||
* Redistributions of source code must retain the above copyright notice,
|
||||
this list of conditions and the following disclaimer.
|
||||
|
||||
* Redistributions in binary form must reproduce the above copyright
|
||||
notice, this list of conditions and the following disclaimer in the
|
||||
documentation and/or other materials provided with the distribution.
|
||||
|
||||
* Neither the name of the University of Cambridge nor the names of its
|
||||
contributors may be used to endorse or promote products derived from
|
||||
this software without specific prior written permission.
|
||||
|
||||
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
|
||||
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
|
||||
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
|
||||
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
|
||||
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
|
||||
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
|
||||
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
|
||||
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
|
||||
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
|
||||
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
|
||||
POSSIBILITY OF SUCH DAMAGE.
|
||||
-----------------------------------------------------------------------------
|
||||
*/
|
||||
|
||||
|
||||
/* This is a freestanding support program to generate a file containing default
|
||||
character tables for PCRE. The tables are built according to the default C
|
||||
locale. Now that pcre_maketables is a function visible to the outside world, we
|
||||
make use of its code from here in order to be consistent. */
|
||||
|
||||
#include <ctype.h>
|
||||
#include <stdio.h>
|
||||
#include <string.h>
|
||||
|
||||
#include "pcre_internal.h"
|
||||
|
||||
#define DFTABLES /* pcre_maketables.c notices this */
|
||||
#include "pcre_maketables.c"
|
||||
|
||||
|
||||
int main(int argc, char **argv)
|
||||
{
|
||||
int i;
|
||||
FILE *f;
|
||||
const unsigned char *tables = pcre_maketables();
|
||||
const unsigned char *base_of_tables = tables;
|
||||
|
||||
if (argc != 2)
|
||||
{
|
||||
fprintf(stderr, "dftables: one filename argument is required\n");
|
||||
return 1;
|
||||
}
|
||||
|
||||
f = fopen(argv[1], "wb");
|
||||
if (f == NULL)
|
||||
{
|
||||
fprintf(stderr, "dftables: failed to open %s for writing\n", argv[1]);
|
||||
return 1;
|
||||
}
|
||||
|
||||
/* There are two fprintf() calls here, because gcc in pedantic mode complains
|
||||
about the very long string otherwise. */
|
||||
|
||||
fprintf(f,
|
||||
"/*************************************************\n"
|
||||
"* Perl-Compatible Regular Expressions *\n"
|
||||
"*************************************************/\n\n"
|
||||
"/* This file is automatically written by the dftables auxiliary \n"
|
||||
"program. If you edit it by hand, you might like to edit the Makefile to \n"
|
||||
"prevent its ever being regenerated.\n\n");
|
||||
fprintf(f,
|
||||
"This file contains the default tables for characters with codes less than\n"
|
||||
"128 (ASCII characters). These tables are used when no external tables are\n"
|
||||
"passed to PCRE. */\n\n"
|
||||
"const unsigned char _pcre_default_tables[] = {\n\n"
|
||||
"/* This table is a lower casing table. */\n\n");
|
||||
|
||||
fprintf(f, " ");
|
||||
for (i = 0; i < 256; i++)
|
||||
{
|
||||
if ((i & 7) == 0 && i != 0) fprintf(f, "\n ");
|
||||
fprintf(f, "%3d", *tables++);
|
||||
if (i != 255) fprintf(f, ",");
|
||||
}
|
||||
fprintf(f, ",\n\n");
|
||||
|
||||
fprintf(f, "/* This table is a case flipping table. */\n\n");
|
||||
|
||||
fprintf(f, " ");
|
||||
for (i = 0; i < 256; i++)
|
||||
{
|
||||
if ((i & 7) == 0 && i != 0) fprintf(f, "\n ");
|
||||
fprintf(f, "%3d", *tables++);
|
||||
if (i != 255) fprintf(f, ",");
|
||||
}
|
||||
fprintf(f, ",\n\n");
|
||||
|
||||
fprintf(f,
|
||||
"/* This table contains bit maps for various character classes.\n"
|
||||
"Each map is 32 bytes long and the bits run from the least\n"
|
||||
"significant end of each byte. The classes that have their own\n"
|
||||
"maps are: space, xdigit, digit, upper, lower, word, graph,\n"
|
||||
"print, punct, and cntrl. Other classes are built from combinations. */\n\n");
|
||||
|
||||
fprintf(f, " ");
|
||||
for (i = 0; i < cbit_length; i++)
|
||||
{
|
||||
if ((i & 7) == 0 && i != 0)
|
||||
{
|
||||
if ((i & 31) == 0) fprintf(f, "\n");
|
||||
fprintf(f, "\n ");
|
||||
}
|
||||
fprintf(f, "0x%02x", *tables++);
|
||||
if (i != cbit_length - 1) fprintf(f, ",");
|
||||
}
|
||||
fprintf(f, ",\n\n");
|
||||
|
||||
fprintf(f,
|
||||
"/* This table identifies various classes of character by individual bits:\n"
|
||||
" 0x%02x white space character\n"
|
||||
" 0x%02x letter\n"
|
||||
" 0x%02x decimal digit\n"
|
||||
" 0x%02x hexadecimal digit\n"
|
||||
" 0x%02x alphanumeric or '_'\n"
|
||||
" 0x%02x regular expression metacharacter or binary zero\n*/\n\n",
|
||||
ctype_space, ctype_letter, ctype_digit, ctype_xdigit, ctype_word,
|
||||
ctype_meta);
|
||||
|
||||
fprintf(f, " ");
|
||||
for (i = 0; i < 256; i++)
|
||||
{
|
||||
if ((i & 7) == 0 && i != 0)
|
||||
{
|
||||
fprintf(f, " /* ");
|
||||
if (isprint(i-8)) fprintf(f, " %c -", i-8);
|
||||
else fprintf(f, "%3d-", i-8);
|
||||
if (isprint(i-1)) fprintf(f, " %c ", i-1);
|
||||
else fprintf(f, "%3d", i-1);
|
||||
fprintf(f, " */\n ");
|
||||
}
|
||||
fprintf(f, "0x%02x", *tables++);
|
||||
if (i != 255) fprintf(f, ",");
|
||||
}
|
||||
|
||||
fprintf(f, "};/* ");
|
||||
if (isprint(i-8)) fprintf(f, " %c -", i-8);
|
||||
else fprintf(f, "%3d-", i-8);
|
||||
if (isprint(i-1)) fprintf(f, " %c ", i-1);
|
||||
else fprintf(f, "%3d", i-1);
|
||||
fprintf(f, " */\n\n/* End of chartables.c */\n");
|
||||
|
||||
fclose(f);
|
||||
free((void *)base_of_tables);
|
||||
return 0;
|
||||
}
|
||||
|
||||
/* End of dftables.c */
|
348
libs/pcre/doc/Tech.Notes
Normal file
@ -0,0 +1,348 @@
|
||||
Technical Notes about PCRE
|
||||
--------------------------
|
||||
|
||||
These are very rough technical notes that record potentially useful information
|
||||
about PCRE internals.
|
||||
|
||||
Historical note 1
|
||||
-----------------
|
||||
|
||||
Many years ago I implemented some regular expression functions to an algorithm
|
||||
suggested by Martin Richards. These were not Unix-like in form, and were quite
|
||||
restricted in what they could do by comparison with Perl. The interesting part
|
||||
about the algorithm was that the amount of space required to hold the compiled
|
||||
form of an expression was known in advance. The code to apply an expression did
|
||||
not operate by backtracking, as the original Henry Spencer code and current
|
||||
Perl code do, but instead checked all possibilities simultaneously by keeping
|
||||
a list of current states and checking all of them as it advanced through the
|
||||
subject string. In the terminology of Jeffrey Friedl's book, it was a "DFA
|
||||
algorithm". When the pattern was all used up, all remaining states were
|
||||
possible matches, and the one matching the longest subset of the subject string
|
||||
was chosen. This did not necessarily maximize the individual wild portions of
|
||||
the pattern, as is expected in Unix and Perl-style regular expressions.
|
||||
|
||||
Historical note 2
|
||||
-----------------
|
||||
|
||||
By contrast, the code originally written by Henry Spencer (which was
|
||||
subsequently heavily modified for Perl) compiles the expression twice: once in
|
||||
a dummy mode in order to find out how much store will be needed, and then for
|
||||
real. (The Perl version probably doesn't do this any more; I'm talking about
|
||||
the original library.) The execution function operates by backtracking and
|
||||
maximizing (or, optionally, minimizing in Perl) the amount of the subject that
|
||||
matches individual wild portions of the pattern. This is an "NFA algorithm" in
|
||||
Friedl's terminology.
|
||||
|
||||
OK, here's the real stuff
|
||||
-------------------------
|
||||
|
||||
For the set of functions that form the "basic" PCRE library (which are
|
||||
unrelated to those mentioned above), I tried at first to invent an algorithm
|
||||
that used an amount of store bounded by a multiple of the number of characters
|
||||
in the pattern, to save on compiling time. However, because of the greater
|
||||
complexity in Perl regular expressions, I couldn't do this. In any case, a
|
||||
first pass through the pattern is needed, for a number of reasons. PCRE works
|
||||
by running a very degenerate first pass to calculate a maximum store size, and
|
||||
then a second pass to do the real compile - which may use a bit less than the
|
||||
predicted amount of store. The idea is that this is going to turn out faster
|
||||
because the first pass is degenerate and the second pass can just store stuff
|
||||
straight into the vector, which it knows is big enough. It does make the
|
||||
compiling functions bigger, of course, but they have become quite big anyway to
|
||||
handle all the Perl stuff.
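The two-pass idea can be sketched with a toy compiler. This is an illustration only and not PCRE's code: the TOY_* opcodes, the toy_* function names, and the tiny pattern language (literal characters and a postfix *) are all invented for the example. The first pass returns a cheap upper bound on the store needed; the second pass writes straight into a vector of that size and may use less.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical opcodes for a toy pattern language; not PCRE's values. */
enum { TOY_END = 0, TOY_CHAR = 1, TOY_STAR = 2 };

/* Pass 1: a degenerate scan that only computes an upper bound. Every
pattern character is assumed to need a two-byte item, plus one byte for
the terminating TOY_END. */
static size_t toy_max_size(const char *pattern)
{
  return 2 * strlen(pattern) + 1;
}

/* Pass 2: the real compile. It stores items without any bounds checks
because the vector is known to be big enough. Returns the bytes used. */
static size_t toy_emit(const char *pattern, unsigned char *code)
{
  unsigned char *start = code;
  while (*pattern != '\0')
    {
    unsigned char c = (unsigned char)*pattern++;
    if (*pattern == '*') { *code++ = TOY_STAR; pattern++; }
    else *code++ = TOY_CHAR;
    *code++ = c;
    }
  *code++ = TOY_END;
  return (size_t)(code - start);
}

/* Run both passes; the caller frees the returned vector. */
static unsigned char *toy_compile(const char *pattern, size_t *used)
{
  unsigned char *code = malloc(toy_max_size(pattern));
  if (code == NULL) return NULL;
  *used = toy_emit(pattern, code);
  return code;
}
```

For the pattern "ab*c" the first pass predicts 9 bytes, while the second pass uses only 7, because "b*" collapses into a single two-byte item.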
|
||||
|
||||
Traditional matching function
|
||||
-----------------------------
|
||||
|
||||
The "traditional", and original, matching function is called pcre_exec(), and
|
||||
it implements an NFA algorithm, similar to the original Henry Spencer algorithm
|
||||
and the way that Perl works. Not surprising, since it is intended to be as
|
||||
compatible with Perl as possible. This is the function most users of PCRE will
|
||||
use most of the time.
|
||||
|
||||
Supplementary matching function
|
||||
-------------------------------
|
||||
|
||||
From PCRE 6.0, there is also a supplementary matching function called
|
||||
pcre_dfa_exec(). This implements a DFA matching algorithm that searches
|
||||
simultaneously for all possible matches that start at one point in the subject
|
||||
string. (Going back to my roots: see Historical Note 1 above.) This function
|
||||
interprets the same compiled pattern data as pcre_exec(); however, not all the
|
||||
facilities are available, and those that are do not always work in quite the
|
||||
same way. See the user documentation for details.
|
||||
|
||||
Format of compiled patterns
|
||||
---------------------------
|
||||
|
||||
The compiled form of a pattern is a vector of bytes, containing items of
|
||||
variable length. The first byte in an item is an opcode, and the length of the
|
||||
item is either implicit in the opcode or contained in the data bytes that
|
||||
follow it.
|
||||
|
||||
In many cases below "two-byte" data values are specified. This is in fact just
|
||||
a default. PCRE can be compiled to use 3-byte or 4-byte values (impairing the
|
||||
performance). This is necessary only when patterns whose compiled length is
|
||||
greater than 64K are going to be processed. In this description, we assume the
|
||||
"normal" compilation options.
|
||||
|
||||
A list of all the opcodes follows:
|
||||
|
||||
Opcodes with no following data
|
||||
------------------------------
|
||||
|
||||
These items are all just one byte long:
|
||||
|
||||
OP_END end of pattern
|
||||
OP_ANY match any character
|
||||
OP_ANYBYTE match any single byte, even in UTF-8 mode
|
||||
OP_SOD match start of data: \A
|
||||
OP_SOM start of match (subject + offset): \G
|
||||
OP_CIRC ^ (start of data, or after \n in multiline)
|
||||
OP_NOT_WORD_BOUNDARY \B
|
||||
OP_WORD_BOUNDARY \b
|
||||
OP_NOT_DIGIT \D
|
||||
OP_DIGIT \d
|
||||
OP_NOT_WHITESPACE \S
|
||||
OP_WHITESPACE \s
|
||||
OP_NOT_WORDCHAR \W
|
||||
OP_WORDCHAR \w
|
||||
OP_EODN match end of data or \n at end: \Z
|
||||
OP_EOD match end of data: \z
|
||||
OP_DOLL $ (end of data, or before \n in multiline)
|
||||
OP_EXTUNI match an extended Unicode character
|
||||
|
||||
|
||||
Repeating single characters
|
||||
---------------------------
|
||||
|
||||
The common repeats (*, +, ?) when applied to a single character use the
|
||||
following opcodes:
|
||||
|
||||
OP_STAR
|
||||
OP_MINSTAR
|
||||
OP_PLUS
|
||||
OP_MINPLUS
|
||||
OP_QUERY
|
||||
OP_MINQUERY
|
||||
|
||||
In ASCII mode, these are two-byte items; in UTF-8 mode, the length is variable.
|
||||
Those with "MIN" in their name are the minimizing versions. Each is followed by
|
||||
the character that is to be repeated. Other repeats make use of
|
||||
|
||||
OP_UPTO
|
||||
OP_MINUPTO
|
||||
OP_EXACT
|
||||
|
||||
which are followed by a two-byte count (most significant first) and the
|
||||
repeated character. OP_UPTO matches from 0 to the given number. A repeat with a
|
||||
non-zero minimum and a fixed maximum is coded as an OP_EXACT followed by an
|
||||
OP_UPTO (or OP_MINUPTO).
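A minimal sketch of the count encoding described above. The helper names are made up for this example and are not PCRE identifiers. The split of a {min,max} repeat assumes that, since OP_UPTO matches from 0 to its count, the OP_UPTO part carries max minus min.

```c
/* Read a two-byte count stored most significant byte first. */
static unsigned int read_count2(const unsigned char *p)
{
  return ((unsigned int)p[0] << 8) | (unsigned int)p[1];
}

/* Store a count in the range 0..65535 in two bytes, most significant
byte first. */
static void write_count2(unsigned char *p, unsigned int n)
{
  p[0] = (unsigned char)((n >> 8) & 0xff);
  p[1] = (unsigned char)(n & 0xff);
}

/* Split x{min,max}, with 0 < min < max, into the count for the OP_EXACT
item and the count for the OP_UPTO (or OP_MINUPTO) item that follows it.
Assumption: the second count is the number of optional extra matches. */
static void split_repeat(unsigned int min, unsigned int max,
                         unsigned int *exact, unsigned int *upto)
{
  *exact = min;
  *upto = max - min;
}
```

Under that assumption, x{2,5} becomes an OP_EXACT item with count 2 followed by an OP_UPTO item with count 3.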
|
||||
|
||||
|
||||
Repeating character types
|
||||
-------------------------
|
||||
|
||||
Repeats of things like \d are done exactly as for single characters, except
|
||||
that instead of a character, the opcode for the type is stored in the data
|
||||
byte. The opcodes are:
|
||||
|
||||
OP_TYPESTAR
|
||||
OP_TYPEMINSTAR
|
||||
OP_TYPEPLUS
|
||||
OP_TYPEMINPLUS
|
||||
OP_TYPEQUERY
|
||||
OP_TYPEMINQUERY
|
||||
OP_TYPEUPTO
|
||||
OP_TYPEMINUPTO
|
||||
OP_TYPEEXACT
|
||||
|
||||
|
||||
Match by Unicode property
|
||||
-------------------------
|
||||
|
||||
OP_PROP and OP_NOTPROP are used for positive and negative matches of a
|
||||
character by testing its Unicode property (the \p and \P escape sequences).
|
||||
Each is followed by two bytes that encode the desired property as a type and a
|
||||
value.
|
||||
|
||||
Repeats of these items use the OP_TYPESTAR etc. set of opcodes, followed by
|
||||
three bytes: OP_PROP or OP_NOTPROP and then the desired property type and
|
||||
value.
|
||||
|
||||
|
||||
Matching literal characters
|
||||
---------------------------
|
||||
|
||||
The OP_CHAR opcode is followed by a single character that is to be matched
|
||||
casefully. For caseless matching, OP_CHARNC is used. In UTF-8 mode, the
|
||||
character may be more than one byte long. (Earlier versions of PCRE used
|
||||
multi-character strings, but this was changed to allow some new features to be
|
||||
added.)
|
||||
|
||||
|
||||
Character classes
|
||||
-----------------
|
||||
|
||||
If there is only one character, OP_CHAR or OP_CHARNC is used for a positive
|
||||
class, and OP_NOT for a negative one (that is, for something like [^a]).
|
||||
However, in UTF-8 mode, the use of OP_NOT applies only to characters with
|
||||
values < 128, because OP_NOT is confined to single bytes.
|
||||
|
||||
Another set of repeating opcodes (OP_NOTSTAR etc.) is used for a repeated,
|
||||
negated, single-character class. The normal ones (OP_STAR etc.) are used for a
|
||||
repeated positive single-character class.
|
||||
|
||||
When there's more than one character in a class and all the characters are less
|
||||
than 256, OP_CLASS is used for a positive class, and OP_NCLASS for a negative
|
||||
one. In either case, the opcode is followed by a 32-byte bit map containing a 1
|
||||
bit for every character that is acceptable. The bits are counted from the least
|
||||
significant end of each byte.
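The bit numbering can be sketched as follows. The function names are hypothetical, chosen for this example; only the layout (32 bytes, bits counted from the least significant end of each byte) comes from the text above.

```c
/* Mark character c (0..255) as acceptable in a 32-byte class bit map. */
static void class_map_set(unsigned char map[32], unsigned int c)
{
  map[c >> 3] |= (unsigned char)(1u << (c & 7));
}

/* Return 1 if character c (0..255) is acceptable, 0 otherwise. */
static int class_map_test(const unsigned char map[32], unsigned int c)
{
  return (map[c >> 3] >> (c & 7)) & 1;
}
```

For example, 'a' has code 97 in ASCII, so it lives in byte 12 (97 / 8) at bit 1 (97 % 8), and setting it in an empty map makes that byte 0x02.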
|
||||
|
||||
The reason for having both OP_CLASS and OP_NCLASS is so that, in UTF-8 mode,
|
||||
subject characters with values greater than 255 can be handled correctly. For
|
||||
OP_CLASS they don't match, whereas for OP_NCLASS they do.
|
||||
|
||||
For classes containing characters with values > 255, OP_XCLASS is used. It
|
||||
optionally uses a bit map (if any characters lie within it), followed by a list
|
||||
of pairs and single characters. There is a flag character that indicates
|
||||
whether it's a positive or a negative class.
|
||||
|
||||
|
||||
Back references
|
||||
---------------
|
||||
|
||||
OP_REF is followed by two bytes containing the reference number.
|
||||
|
||||
|
||||
Repeating character classes and back references
|
||||
-----------------------------------------------
|
||||
|
||||
Single-character classes are handled specially (see above). This section applies to
|
||||
OP_CLASS and OP_REF. In both cases, the repeat information follows the base
|
||||
item. The matching code looks at the following opcode to see if it is one of
|
||||
|
||||
OP_CRSTAR
|
||||
OP_CRMINSTAR
|
||||
OP_CRPLUS
|
||||
OP_CRMINPLUS
|
||||
OP_CRQUERY
|
||||
OP_CRMINQUERY
|
||||
OP_CRRANGE
|
||||
OP_CRMINRANGE
|
||||
|
||||
All but the last two are just single-byte items. The others are followed by
|
||||
four bytes of data, comprising the minimum and maximum repeat counts.
|
||||
|
||||
|
||||
Brackets and alternation
|
||||
------------------------
|
||||
|
||||
A pair of non-capturing (round) brackets is wrapped round each expression at
|
||||
compile time, so alternation always happens in the context of brackets.
|
||||
|
||||
Non-capturing brackets use the opcode OP_BRA, while capturing brackets use
|
||||
OP_BRA+1, OP_BRA+2, etc. [Note for North Americans: "bracket" to some English
|
||||
speakers, including myself, can be round, square, curly, or pointy. Hence this
|
||||
usage.]
|
||||
|
||||
Originally PCRE was limited to 99 capturing brackets (so as not to use up all
|
||||
the opcodes). From release 3.5, there is no limit. What happens is that the
|
||||
first ones, up to EXTRACT_BASIC_MAX, are handled with separate opcodes, as
|
||||
above. If there are more, the opcode is set to EXTRACT_BASIC_MAX+1, and the
|
||||
first operation in the bracket is OP_BRANUMBER, followed by a 2-byte bracket
|
||||
number. This opcode is ignored while matching, but is fished out when handling
|
||||
the bracket itself. (They could have all been done like this, but I was making
|
||||
minimal changes.)
|
||||
|
||||
A bracket opcode is followed by LINK_SIZE bytes which give the offset to the
|
||||
next alternative OP_ALT or, if there aren't any branches, to the matching
|
||||
OP_KET opcode. Each OP_ALT is followed by LINK_SIZE bytes giving the offset to
|
||||
the next one, or to the OP_KET opcode.
|
||||
|
||||
OP_KET is used for subpatterns that do not repeat indefinitely, while
|
||||
OP_KETRMIN and OP_KETRMAX are used for indefinite repetitions, minimally or
|
||||
maximally respectively. All three are followed by LINK_SIZE bytes giving (as a
|
||||
positive number) the offset back to the matching OP_BRA opcode.
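The chain of links through a bracketed group can be walked roughly like this. It is a sketch under stated assumptions, not PCRE's matcher: the numeric opcode values are invented, LINK_SIZE is taken as the default of 2, links are assumed to be stored most significant byte first, and each forward offset is assumed to be measured from the opcode that carries it.

```c
#define LINK_SIZE 2

/* Hypothetical opcode numbers, for illustration only. */
enum { OP_ALT = 1, OP_KET = 2, OP_BRA = 3, OP_CHAR = 9 };

/* Read a LINK_SIZE (here 2) byte link, most significant byte first. */
static unsigned int get_link(const unsigned char *p)
{
  return ((unsigned int)p[0] << 8) | (unsigned int)p[1];
}

/* Count the alternatives of the group whose bracket opcode is at code,
by following the links from the bracket through each OP_ALT. The walk
stops at the first opcode that is not OP_ALT, which in a well-formed
group is the matching OP_KET. */
static int count_branches(const unsigned char *code)
{
  int n = 1;
  code += get_link(code + 1);
  while (*code == OP_ALT)
    {
    n++;
    code += get_link(code + 1);
    }
  return n;
}
```

With a hand-built vector for (a|b), namely OP_BRA with link 5, a two-byte body, OP_ALT with link 5, another two-byte body, and OP_KET with a back link of 10, the walk visits one OP_ALT and reports two branches.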
|
||||
|
||||
If a subpattern is quantified such that it is permitted to match zero times, it
|
||||
is preceded by one of OP_BRAZERO or OP_BRAMINZERO. These are single-byte
|
||||
opcodes which tell the matcher that skipping this subpattern entirely is a
|
||||
valid branch.
|
||||
|
||||
A subpattern with an indefinite maximum repetition is replicated in the
|
||||
compiled data its minimum number of times (or once with OP_BRAZERO if the
|
||||
minimum is zero), with the final copy terminating with OP_KETRMIN or OP_KETRMAX
|
||||
as appropriate.
|
||||
|
||||
A subpattern with a bounded maximum repetition is replicated in a nested
|
||||
fashion up to the maximum number of times, with OP_BRAZERO or OP_BRAMINZERO
|
||||
before each replication after the minimum, so that, for example, (abc){2,5} is
|
||||
compiled as (abc)(abc)((abc)((abc)(abc)?)?)?.
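The nesting can be reproduced textually with a small helper. This sketch only builds the equivalent pattern string for a bounded repeat; it does not emit opcodes, and the function name is made up for the example.

```c
#include <string.h>

/* Write the nested expansion of item{min,max} into out. The item is
copied min times, then each optional copy is nested inside the previous
one, with the innermost written as item followed by ?. Returns 0 on
success, or -1 if the arguments are invalid or out is too small. */
static int expand_bounded(const char *item, int min, int max,
                          char *out, size_t outsize)
{
  size_t len = strlen(item);
  int optional = max - min;
  int i;

  if (min < 0 || max < min) return -1;

  /* Upper bound: max copies, at most 3 extra characters per optional
  copy, and the terminating zero. */
  if (len * (size_t)max + 3 * (size_t)optional + 1 > outsize) return -1;

  out[0] = '\0';
  for (i = 0; i < min; i++) strcat(out, item);

  for (i = 0; i < optional; i++)
    {
    if (i < optional - 1) { strcat(out, "("); strcat(out, item); }
    else { strcat(out, item); strcat(out, "?"); }
    }

  /* Close every wrapper opened above. */
  for (i = 0; i < optional - 1; i++) strcat(out, ")?");
  return 0;
}
```

Calling expand_bounded("(abc)", 2, 5, buf, sizeof(buf)) with a 64-byte buffer yields the string (abc)(abc)((abc)((abc)(abc)?)?)?, matching the example in the text.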
|
||||
|
||||
|
||||
Assertions
|
||||
----------
|
||||
|
||||
Forward assertions are just like other subpatterns, but starting with one of
|
||||
the opcodes OP_ASSERT or OP_ASSERT_NOT. Backward assertions use the opcodes
|
||||
OP_ASSERTBACK and OP_ASSERTBACK_NOT, and the first opcode inside the assertion
|
||||
is OP_REVERSE, followed by a two-byte count of the number of characters to move
|
||||
back the pointer in the subject string. When operating in UTF-8 mode, the count
|
||||
is a character count rather than a byte count. A separate count is present in
|
||||
each alternative of a lookbehind assertion, allowing them to have different
|
||||
fixed lengths.
|
||||
|
||||
|
||||
Once-only subpatterns
|
||||
---------------------
|
||||
|
||||
These are also just like other subpatterns, but they start with the opcode
|
||||
OP_ONCE.
|
||||
|
||||
|
||||
Conditional subpatterns
|
||||
-----------------------
|
||||
|
||||
These are like other subpatterns, but they start with the opcode OP_COND. If
|
||||
the condition is a back reference, this is stored at the start of the
|
||||
subpattern using the opcode OP_CREF followed by two bytes containing the
|
||||
reference number. If the condition is "in recursion" (coded as "(?(R)"), the
|
||||
same scheme is used, with a "reference number" of 0xffff. Otherwise, a
|
||||
conditional subpattern always starts with one of the assertions.
|
||||
|
||||
|
||||
Recursion
|
||||
---------
|
||||
|
||||
Recursion either matches the current regex, or some subexpression. The opcode
|
||||
OP_RECURSE is followed by a value which is the offset to the starting bracket
|
||||
from the start of the whole pattern. From release 6.5, OP_RECURSE is
|
||||
automatically wrapped inside OP_ONCE brackets (because otherwise some patterns
|
||||
broke it). OP_RECURSE is also used for "subroutine" calls, even though they
|
||||
are not strictly a recursion.
|
||||
|
||||
|
||||
Callout
|
||||
-------
|
||||
|
||||
OP_CALLOUT is followed by one byte of data that holds a callout number in the
|
||||
range 0 to 254 for manual callouts, or 255 for an automatic callout. In both
|
||||
cases there follows a two-byte value giving the offset in the pattern to the
|
||||
start of the following item, and another two-byte item giving the length of the
|
||||
next item.
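The layout of a callout item can be decoded as in the sketch below. The struct and function names are invented for this example, and the two-byte values are assumed to be stored most significant byte first, as for the other two-byte fields in the default configuration.

```c
/* Decoded view of one OP_CALLOUT item (hypothetical type). */
typedef struct {
  unsigned int number;       /* 0..254 manual, 255 automatic */
  int          automatic;    /* 1 if this is an automatic callout */
  unsigned int item_offset;  /* offset in the pattern of the next item */
  unsigned int item_length;  /* length of the next item */
} callout_info;

/* Decode the item whose OP_CALLOUT opcode byte is at code[0]. */
static callout_info decode_callout(const unsigned char *code)
{
  callout_info ci;
  ci.number = code[1];
  ci.automatic = (code[1] == 255);
  ci.item_offset = ((unsigned int)code[2] << 8) | (unsigned int)code[3];
  ci.item_length = ((unsigned int)code[4] << 8) | (unsigned int)code[5];
  return ci;
}
```

The whole item is therefore six bytes long when two-byte values are in use: the opcode, the callout number, and the two two-byte fields.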
|
||||
|
||||
|
||||
Changing options
|
||||
----------------
|
||||
|
||||
If any of the /i, /m, or /s options are changed within a pattern, an OP_OPT
|
||||
opcode is compiled, followed by one byte containing the new settings of these
|
||||
flags. If there are several alternatives, there is an occurrence of OP_OPT at
|
||||
the start of all those following the first options change, to set appropriate
|
||||
options for the start of the alternative. Immediately after the end of the
|
||||
group there is another such item to reset the flags to their previous values. A
|
||||
change of flag right at the very start of the pattern can be handled entirely
|
||||
at compile time, and so does not cause anything to be put into the compiled
|
||||
data.
|
||||
|
||||
Philip Hazel
|
||||
June 2006
|
128
libs/pcre/doc/html/index.html
Normal file
@ -0,0 +1,128 @@
|
||||
<html>
|
||||
<head>
|
||||
<title>PCRE specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>Perl-compatible Regular Expressions (PCRE)</h1>
|
||||
<p>
|
||||
The HTML documentation for PCRE comprises the following pages:
|
||||
</p>
|
||||
|
||||
<table>
|
||||
<tr><td><a href="pcre.html">pcre</a></td>
|
||||
<td> Introductory page</td></tr>
|
||||
|
||||
<tr><td><a href="pcreapi.html">pcreapi</a></td>
|
||||
<td> PCRE's native API</td></tr>
|
||||
|
||||
<tr><td><a href="pcrebuild.html">pcrebuild</a></td>
|
||||
<td> Options for building PCRE</td></tr>
|
||||
|
||||
<tr><td><a href="pcrecallout.html">pcrecallout</a></td>
|
||||
<td> The <i>callout</i> facility</td></tr>
|
||||
|
||||
<tr><td><a href="pcrecompat.html">pcrecompat</a></td>
|
||||
<td> Compatibility with Perl</td></tr>
|
||||
|
||||
<tr><td><a href="pcrecpp.html">pcrecpp</a></td>
|
||||
<td> The C++ wrapper for the PCRE library</td></tr>
|
||||
|
||||
<tr><td><a href="pcregrep.html">pcregrep</a></td>
|
||||
<td> The <b>pcregrep</b> command</td></tr>
|
||||
|
||||
<tr><td><a href="pcrematching.html">pcrematching</a></td>
|
||||
<td> Discussion of the two matching algorithms</td></tr>
|
||||
|
||||
<tr><td><a href="pcrepartial.html">pcrepartial</a></td>
|
||||
<td> Using PCRE for partial matching</td></tr>
|
||||
|
||||
<tr><td><a href="pcrepattern.html">pcrepattern</a></td>
|
||||
<td> Specification of the regular expressions supported by PCRE</td></tr>
|
||||
|
||||
<tr><td><a href="pcreperform.html">pcreperform</a></td>
|
||||
<td> Some comments on performance</td></tr>
|
||||
|
||||
<tr><td><a href="pcreposix.html">pcreposix</a></td>
|
||||
<td> The POSIX API to the PCRE library</td></tr>
|
||||
|
||||
<tr><td><a href="pcreprecompile.html">pcreprecompile</a></td>
|
||||
<td> How to save and re-use compiled patterns</td></tr>
|
||||
|
||||
<tr><td><a href="pcresample.html">pcresample</a></td>
|
||||
<td> Description of the sample program</td></tr>
|
||||
|
||||
<tr><td><a href="pcrestack.html">pcrestack</a></td>
|
||||
<td> Discussion of PCRE's stack usage</td></tr>
|
||||
|
||||
<tr><td><a href="pcretest.html">pcretest</a></td>
|
||||
<td> The <b>pcretest</b> command for testing PCRE</td></tr>
|
||||
</table>
|
||||
|
||||
<p>
|
||||
There are also individual pages that summarize the interface for each function
|
||||
in the library:
|
||||
</p>
|
||||
|
||||
<table>
|
||||
|
||||
<tr><td><a href="pcre_compile.html">pcre_compile</a></td>
|
||||
<td> Compile a regular expression</td></tr>
|
||||
|
||||
<tr><td><a href="pcre_compile2.html">pcre_compile2</a></td>
|
||||
<td> Compile a regular expression (alternate interface)</td></tr>
|
||||
|
||||
<tr><td><a href="pcre_config.html">pcre_config</a></td>
|
||||
<td> Show build-time configuration options</td></tr>
|
||||
|
||||
<tr><td><a href="pcre_copy_named_substring.html">pcre_copy_named_substring</a></td>
|
||||
<td> Extract named substring into given buffer</td></tr>
|
||||
|
||||
<tr><td><a href="pcre_copy_substring.html">pcre_copy_substring</a></td>
|
||||
<td> Extract numbered substring into given buffer</td></tr>
|
||||
|
||||
<tr><td><a href="pcre_dfa_exec.html">pcre_dfa_exec</a></td>
|
||||
<td> Match a compiled pattern to a subject string
|
||||
(DFA algorithm; <i>not</i> Perl compatible)</td></tr>
|
||||
|
||||
<tr><td><a href="pcre_exec.html">pcre_exec</a></td>
|
||||
<td> Match a compiled pattern to a subject string
|
||||
(Perl compatible)</td></tr>
|
||||
|
||||
<tr><td><a href="pcre_free_substring.html">pcre_free_substring</a></td>
|
||||
<td> Free extracted substring</td></tr>
|
||||
|
||||
<tr><td><a href="pcre_free_substring_list.html">pcre_free_substring_list</a></td>
|
||||
<td> Free list of extracted substrings</td></tr>
|
||||
|
||||
<tr><td><a href="pcre_fullinfo.html">pcre_fullinfo</a></td>
|
||||
<td> Extract information about a pattern</td></tr>
|
||||
|
||||
<tr><td><a href="pcre_get_named_substring.html">pcre_get_named_substring</a></td>
|
||||
<td> Extract named substring into new memory</td></tr>
|
||||
|
||||
<tr><td><a href="pcre_get_stringnumber.html">pcre_get_stringnumber</a></td>
|
||||
<td> Convert captured string name to number</td></tr>
|
||||
|
||||
<tr><td><a href="pcre_get_substring.html">pcre_get_substring</a></td>
|
||||
<td> Extract numbered substring into new memory</td></tr>
|
||||
|
||||
<tr><td><a href="pcre_get_substring_list.html">pcre_get_substring_list</a></td>
|
||||
<td> Extract all substrings into new memory</td></tr>
|
||||
|
||||
<tr><td><a href="pcre_info.html">pcre_info</a></td>
|
||||
<td> Obsolete information extraction function</td></tr>
|
||||
|
||||
<tr><td><a href="pcre_maketables.html">pcre_maketables</a></td>
|
||||
<td> Build character tables in current locale</td></tr>
|
||||
|
||||
<tr><td><a href="pcre_refcount.html">pcre_refcount</a></td>
|
||||
<td> Maintain reference count in compiled pattern</td></tr>
|
||||
|
||||
<tr><td><a href="pcre_study.html">pcre_study</a></td>
|
||||
<td> Study a compiled pattern</td></tr>
|
||||
|
||||
<tr><td><a href="pcre_version.html">pcre_version</a></td>
|
||||
<td> Return PCRE version and release date</td></tr>
|
||||
</table>
|
||||
|
||||
</html>
|
252
libs/pcre/doc/html/pcre.html
Normal file
@ -0,0 +1,252 @@
|
||||
<html>
|
||||
<head>
|
||||
<title>pcre specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>pcre man page</h1>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
||||
<p>
|
||||
This page is part of the PCRE HTML documentation. It was generated automatically
|
||||
from the original man page. If there is any nonsense in it, please consult the
|
||||
man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<ul>
|
||||
<li><a name="TOC1" href="#SEC1">INTRODUCTION</a>
|
||||
<li><a name="TOC2" href="#SEC2">USER DOCUMENTATION</a>
|
||||
<li><a name="TOC3" href="#SEC3">LIMITATIONS</a>
|
||||
<li><a name="TOC4" href="#SEC4">UTF-8 AND UNICODE PROPERTY SUPPORT</a>
|
||||
<li><a name="TOC5" href="#SEC5">AUTHOR</a>
|
||||
</ul>
|
||||
<br><a name="SEC1" href="#TOC1">INTRODUCTION</a><br>
|
||||
<P>
|
||||
The PCRE library is a set of functions that implement regular expression
|
||||
pattern matching using the same syntax and semantics as Perl, with just a few
|
||||
differences. The current implementation of PCRE (release 6.x) corresponds
|
||||
approximately with Perl 5.8, including support for UTF-8 encoded strings and
|
||||
Unicode general category properties. However, this support has to be explicitly
|
||||
enabled; it is not the default.
|
||||
</P>
|
||||
<P>
|
||||
In addition to the Perl-compatible matching function, PCRE also contains an
|
||||
alternative matching function that matches the same compiled patterns in a
|
||||
different way. In certain circumstances, the alternative function has some
|
||||
advantages. For a discussion of the two matching algorithms, see the
|
||||
<a href="pcrematching.html"><b>pcrematching</b></a>
|
||||
page.
|
||||
</P>
|
||||
<P>
|
||||
PCRE is written in C and released as a C library. A number of people have
|
||||
written wrappers and interfaces of various kinds. In particular, Google Inc.
|
||||
have provided a comprehensive C++ wrapper. This is now included as part of the
|
||||
PCRE distribution. The
|
||||
<a href="pcrecpp.html"><b>pcrecpp</b></a>
|
||||
page has details of this interface. Other people's contributions can be found
|
||||
in the <i>Contrib</i> directory at the primary FTP site, which is:
|
||||
<a href="ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre">ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre</a>
|
||||
</P>
|
||||
<P>
|
||||
Details of exactly which Perl regular expression features are and are not
|
||||
supported by PCRE are given in separate documents. See the
|
||||
<a href="pcrepattern.html"><b>pcrepattern</b></a>
|
||||
and
|
||||
<a href="pcrecompat.html"><b>pcrecompat</b></a>
|
||||
pages.
|
||||
</P>
|
||||
<P>
|
||||
Some features of PCRE can be included, excluded, or changed when the library is
|
||||
built. The
|
||||
<a href="pcre_config.html"><b>pcre_config()</b></a>
|
||||
function makes it possible for a client to discover which features are
|
||||
available. The features themselves are described in the
|
||||
<a href="pcrebuild.html"><b>pcrebuild</b></a>
|
||||
page. Documentation about building PCRE for various operating systems can be
|
||||
found in the <b>README</b> file in the source distribution.
|
||||
</P>
|
||||
<P>
|
||||
The library contains a number of undocumented internal functions and data
|
||||
tables that are used by more than one of the exported external functions, but
|
||||
which are not intended for use by external callers. Their names all begin with
|
||||
"_pcre_", which hopefully will not provoke any name clashes. In some
|
||||
environments, it is possible to control which external symbols are exported
|
||||
when a shared library is built, and in these cases the undocumented symbols are
|
||||
not exported.
|
||||
</P>
|
||||
<br><a name="SEC2" href="#TOC1">USER DOCUMENTATION</a><br>
|
||||
<P>
|
||||
The user documentation for PCRE comprises a number of different sections. In
|
||||
the "man" format, each of these is a separate "man page". In the HTML format,
|
||||
each is a separate page, linked from the index page. In the plain text format,
|
||||
all the sections are concatenated, for ease of searching. The sections are as
|
||||
follows:
|
||||
<pre>
|
||||
pcre this document
|
||||
pcreapi details of PCRE's native C API
|
||||
pcrebuild options for building PCRE
|
||||
pcrecallout details of the callout feature
|
||||
pcrecompat discussion of Perl compatibility
|
||||
pcrecpp details of the C++ wrapper
|
||||
pcregrep description of the <b>pcregrep</b> command
|
||||
pcrematching discussion of the two matching algorithms
|
||||
pcrepartial details of the partial matching facility
|
||||
pcrepattern syntax and semantics of supported regular expressions
|
||||
pcreperform discussion of performance issues
|
||||
pcreposix the POSIX-compatible C API
|
||||
pcreprecompile details of saving and re-using precompiled patterns
|
||||
pcresample discussion of the sample program
|
||||
pcrestack discussion of stack usage
|
||||
pcretest description of the <b>pcretest</b> testing command
|
||||
</pre>
|
||||
In addition, in the "man" and HTML formats, there is a short page for each
|
||||
C library function, listing its arguments and results.
|
||||
</P>
|
||||
<br><a name="SEC3" href="#TOC1">LIMITATIONS</a><br>
|
||||
<P>
|
||||
There are some size limitations in PCRE but it is hoped that they will never in
|
||||
practice be relevant.
|
||||
</P>
|
||||
<P>
|
||||
The maximum length of a compiled pattern is 65539 (sic) bytes if PCRE is
|
||||
compiled with the default internal linkage size of 2. If you want to process
|
||||
regular expressions that are truly enormous, you can compile PCRE with an
|
||||
internal linkage size of 3 or 4 (see the <b>README</b> file in the source
|
||||
distribution and the
|
||||
<a href="pcrebuild.html"><b>pcrebuild</b></a>
|
||||
documentation for details). In these cases the limit is substantially larger.
|
||||
However, the speed of execution will be slower.
|
||||
</P>
|
||||
<P>
|
||||
All values in repeating quantifiers must be less than 65536. The maximum
|
||||
compiled length of a subpattern with an explicit repeat count is 30000 bytes. The
|
||||
maximum number of capturing subpatterns is 65535.
|
||||
</P>
|
||||
<P>
|
||||
There is no limit to the number of non-capturing subpatterns, but the maximum
|
||||
depth of nesting of all kinds of parenthesized subpattern, including capturing
|
||||
subpatterns, assertions, and other types of subpattern, is 200.
|
||||
</P>
|
||||
<P>
|
||||
The maximum length of a name for a named subpattern is 32, and the maximum number
|
||||
of named subpatterns is 10000.
|
||||
</P>
|
||||
<P>
|
||||
The maximum length of a subject string is the largest positive number that an
|
||||
integer variable can hold. However, when using the traditional matching
|
||||
function, PCRE uses recursion to handle subpatterns and indefinite repetition.
|
||||
This means that the available stack space may limit the size of a subject
|
||||
string that can be processed by certain patterns. For a discussion of stack
|
||||
issues, see the
|
||||
<a href="pcrestack.html"><b>pcrestack</b></a>
|
||||
documentation.
|
||||
<a name="utf8support"></a></P>
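<P>
Because the subject length is limited to what an integer variable can hold, a
caller that measures strings with <b>strlen()</b> may want to check the range
before converting. The following is a minimal sketch of such a check; the
helper name <i>subject_length_as_int</i> is hypothetical and is not part of
PCRE.
</P>

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper (not part of PCRE): convert a size_t length to the
 * int length that the matching functions take. Returns 1 and stores the
 * value in *out when it fits, or returns 0 and leaves *out untouched. */
int subject_length_as_int(size_t len, int *out)
{
    assert(out != NULL);
    if (len > (size_t)INT_MAX)
        return 0;            /* too long to be described by an int */
    *out = (int)len;
    return 1;
}
```

<P>
A caller would typically pass <b>strlen(subject)</b> as the first argument and
refuse to match when the helper returns 0. This check says nothing about stack
usage, which depends on the pattern as well as on the subject.
</P>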
|
||||
<br><a name="SEC4" href="#TOC1">UTF-8 AND UNICODE PROPERTY SUPPORT</a><br>
|
||||
<P>
|
||||
From release 3.3, PCRE has had some support for character strings encoded in
|
||||
the UTF-8 format. For release 4.0 this was greatly extended to cover most
|
||||
common requirements, and in release 5.0 additional support for Unicode general
|
||||
category properties was added.
|
||||
</P>
|
||||
<P>
|
||||
In order to process UTF-8 strings, you must build PCRE to include UTF-8 support in
|
||||
the code, and, in addition, you must call
|
||||
<a href="pcre_compile.html"><b>pcre_compile()</b></a>
|
||||
with the PCRE_UTF8 option flag. When you do this, both the pattern and any
|
||||
subject strings that are matched against it are treated as UTF-8 strings
|
||||
instead of just strings of bytes.
|
||||
</P>
|
||||
<P>
|
||||
If you compile PCRE with UTF-8 support, but do not use it at run time, the
|
||||
library will be a bit bigger, but the additional run time overhead is limited
|
||||
to testing the PCRE_UTF8 flag in several places, so should not be very large.
|
||||
</P>
|
||||
<P>
|
||||
If PCRE is built with Unicode character property support (which implies UTF-8
|
||||
support), the escape sequences \p{..}, \P{..}, and \X are supported.
|
||||
The available properties that can be tested are limited to the general
|
||||
category properties such as Lu for an upper case letter or Nd for a decimal
|
||||
number, the Unicode script names such as Arabic or Han, and the derived
|
||||
properties Any and L&amp;. A full list is given in the
|
||||
<a href="pcrepattern.html"><b>pcrepattern</b></a>
|
||||
documentation. Only the short names for properties are supported. For example,
|
||||
\p{L} matches a letter. Its Perl synonym, \p{Letter}, is not supported.
|
||||
Furthermore, in Perl, many properties may optionally be prefixed by "Is", for
|
||||
compatibility with Perl 5.6. PCRE does not support this.
|
||||
</P>
|
||||
<P>
|
||||
The following comments apply when PCRE is running in UTF-8 mode:
|
||||
</P>
|
||||
<P>
|
||||
1. When you set the PCRE_UTF8 flag, the strings passed as patterns and subjects
|
||||
are checked for validity on entry to the relevant functions. If an invalid
|
||||
UTF-8 string is passed, an error return is given. In some situations, you may
|
||||
already know that your strings are valid, and therefore want to skip these
|
||||
checks in order to improve performance. If you set the PCRE_NO_UTF8_CHECK flag
|
||||
at compile time or at run time, PCRE assumes that the pattern or subject it
|
||||
is given (respectively) contains only valid UTF-8 codes. In this case, it does
|
||||
not diagnose an invalid UTF-8 string. If you pass an invalid UTF-8 string to
|
||||
PCRE when PCRE_NO_UTF8_CHECK is set, the results are undefined. Your program
|
||||
may crash.
|
||||
</P>
|
||||
<P>
|
||||
2. An unbraced hexadecimal escape sequence (such as \xb3) matches a two-byte
|
||||
UTF-8 character if the value is greater than 127.
|
||||
</P>
|
||||
<P>
|
||||
3. Octal numbers up to \777 are recognized, and match two-byte UTF-8
|
||||
characters for values greater than \177.
|
||||
</P>
|
||||
<P>
|
||||
4. Repeat quantifiers apply to complete UTF-8 characters, not to individual
|
||||
bytes, for example: \x{100}{3}.
|
||||
</P>
|
||||
<P>
|
||||
5. The dot metacharacter matches one UTF-8 character instead of a single byte.
|
||||
</P>
|
||||
<P>
|
||||
6. The escape sequence \C can be used to match a single byte in UTF-8 mode,
|
||||
but its use can lead to some strange effects. This facility is not available in
|
||||
the alternative matching function, <b>pcre_dfa_exec()</b>.
|
||||
</P>
|
||||
<P>
|
||||
7. The character escapes \b, \B, \d, \D, \s, \S, \w, and \W correctly
|
||||
test characters of any code value, but the characters that PCRE recognizes as
|
||||
digits, spaces, or word characters remain the same set as before, all with
|
||||
values less than 256. This remains true even when PCRE includes Unicode
|
||||
property support, because to do otherwise would slow down PCRE in many common
|
||||
cases. If you really want to test for a wider sense of, say, "digit", you
|
||||
must use Unicode property tests such as \p{Nd}.
|
||||
</P>
|
||||
<P>
|
||||
8. Similarly, characters that match the POSIX named character classes are all
|
||||
low-valued characters.
|
||||
</P>
|
||||
<P>
|
||||
9. Case-insensitive matching applies only to characters whose values are less
|
||||
than 128, unless PCRE is built with Unicode property support. Even when Unicode
|
||||
property support is available, PCRE still uses its own character tables when
|
||||
checking the case of low-valued characters, so as not to degrade performance.
|
||||
The Unicode property information is used only for characters with higher
|
||||
values. Even when Unicode property support is available, PCRE supports
|
||||
case-insensitive matching only when there is a one-to-one mapping between a
|
||||
letter's cases. There are a small number of many-to-one mappings in Unicode;
|
||||
these are not supported by PCRE.
|
||||
</P>
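<P>
Comments 2 to 5 above depend on the difference between a character value and
the bytes that encode it. The following is a minimal sketch, using only the
standard UTF-8 encoding rules, of how values such as \xb3 and \777 become
two-byte sequences and how characters can be counted in a string that is
already known to be valid. The function names are hypothetical and this is not
PCRE's own code.
</P>

```c
#include <assert.h>
#include <stddef.h>

/* Encode a character value below 0x800 as UTF-8. Returns the number of
 * bytes written (1 or 2), or 0 for values outside the range this sketch
 * handles. */
size_t utf8_encode_small(unsigned int cp, unsigned char out[2])
{
    assert(out != NULL);
    if (cp < 0x80) {                 /* 0..127: a single byte */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                /* 128..2047: two bytes */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    return 0;
}

/* Count characters in a valid UTF-8 string by counting the bytes that
 * are not continuation bytes (those of the form 10xxxxxx). */
size_t utf8_char_count(const unsigned char *s, size_t nbytes)
{
    size_t i, count = 0;
    assert(s != NULL);
    for (i = 0; i < nbytes; i++)
        if ((s[i] & 0xC0) != 0x80)
            count++;
    return count;
}
```

<P>
With these helpers, the value 0xb3 encodes as the two bytes 0xC2 0xB3, and a
subject made of three copies of the character 0x100 occupies six bytes but
counts as three characters, which is the unit that a quantifier or the dot
metacharacter works with in UTF-8 mode.
</P>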
|
||||
<br><a name="SEC5" href="#TOC1">AUTHOR</a><br>
|
||||
<P>
|
||||
Philip Hazel
|
||||
<br>
|
||||
University Computing Service,
|
||||
<br>
|
||||
Cambridge CB2 3QG, England.
|
||||
</P>
|
||||
<P>
|
||||
Putting an actual email address here seems to have been a spam magnet, so I've
|
||||
taken it away. If you want to email me, use my initial and surname, separated
|
||||
by a dot, at the domain ucs.cam.ac.uk.
|
||||
<br>
Last updated: 05 June 2006
|
||||
<br>
|
||||
Copyright © 1997-2006 University of Cambridge.
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
80
libs/pcre/doc/html/pcre_compile.html
Normal file
@ -0,0 +1,80 @@
|
||||
<html>
|
||||
<head>
|
||||
<title>pcre_compile specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>pcre_compile man page</h1>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
||||
<p>
|
||||
This page is part of the PCRE HTML documentation. It was generated automatically
|
||||
from the original man page. If there is any nonsense in it, please consult the
|
||||
man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<br><b>
|
||||
SYNOPSIS
|
||||
</b><br>
|
||||
<P>
|
||||
<b>#include &lt;pcre.h&gt;</b>
|
||||
</P>
|
||||
<P>
|
||||
<b>pcre *pcre_compile(const char *<i>pattern</i>, int <i>options</i>,</b>
|
||||
<b>const char **<i>errptr</i>, int *<i>erroffset</i>,</b>
|
||||
<b>const unsigned char *<i>tableptr</i>);</b>
|
||||
</P>
|
||||
<br><b>
|
||||
DESCRIPTION
|
||||
</b><br>
|
||||
<P>
|
||||
This function compiles a regular expression into an internal form. Its
|
||||
arguments are:
|
||||
<pre>
|
||||
<i>pattern</i> A zero-terminated string containing the
|
||||
regular expression to be compiled
|
||||
<i>options</i> Zero or more option bits
|
||||
<i>errptr</i> Where to put an error message
|
||||
<i>erroffset</i> Offset in pattern where error was found
|
||||
<i>tableptr</i> Pointer to character tables, or NULL to
|
||||
use the built-in default
|
||||
</pre>
|
||||
The option bits are:
|
||||
<pre>
|
||||
PCRE_ANCHORED Force pattern anchoring
|
||||
PCRE_AUTO_CALLOUT Compile automatic callouts
|
||||
PCRE_CASELESS Do caseless matching
|
||||
PCRE_DOLLAR_ENDONLY $ not to match newline at end
|
||||
PCRE_DOTALL . matches anything including NL
|
||||
PCRE_DUPNAMES Allow duplicate names for subpatterns
|
||||
PCRE_EXTENDED Ignore whitespace and # comments
|
||||
PCRE_EXTRA PCRE extra features
|
||||
(not much use currently)
|
||||
PCRE_FIRSTLINE Force matching to be before newline
|
||||
PCRE_MULTILINE ^ and $ match newlines within data
|
||||
PCRE_NEWLINE_CR Set CR as the newline sequence
|
||||
PCRE_NEWLINE_CRLF Set CRLF as the newline sequence
|
||||
PCRE_NEWLINE_LF Set LF as the newline sequence
|
||||
PCRE_NO_AUTO_CAPTURE Disable numbered capturing paren-
|
||||
theses (named ones available)
|
||||
PCRE_UNGREEDY Invert greediness of quantifiers
|
||||
PCRE_UTF8 Run in UTF-8 mode
|
||||
PCRE_NO_UTF8_CHECK Do not check the pattern for UTF-8
|
||||
validity (only relevant if
|
||||
PCRE_UTF8 is set)
|
||||
</pre>
|
||||
PCRE must be built with UTF-8 support in order to use PCRE_UTF8 and
|
||||
PCRE_NO_UTF8_CHECK.
|
||||
</P>
|
||||
<P>
|
||||
The yield of the function is a pointer to a private data structure that
|
||||
contains the compiled pattern, or NULL if an error was detected.
|
||||
</P>
|
||||
<P>
|
||||
There is a complete description of the PCRE native API in the
|
||||
<a href="pcreapi.html"><b>pcreapi</b></a>
|
||||
page and a description of the POSIX API in the
|
||||
<a href="pcreposix.html"><b>pcreposix</b></a>
|
||||
page.
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
85
libs/pcre/doc/html/pcre_compile2.html
Normal file
@ -0,0 +1,85 @@
|
||||
<html>
|
||||
<head>
|
||||
<title>pcre_compile2 specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>pcre_compile2 man page</h1>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
||||
<p>
|
||||
This page is part of the PCRE HTML documentation. It was generated automatically
|
||||
from the original man page. If there is any nonsense in it, please consult the
|
||||
man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<br><b>
|
||||
SYNOPSIS
|
||||
</b><br>
|
||||
<P>
|
||||
<b>#include &lt;pcre.h&gt;</b>
|
||||
</P>
|
||||
<P>
|
||||
<b>pcre *pcre_compile2(const char *<i>pattern</i>, int <i>options</i>,</b>
|
||||
<b>int *<i>errorcodeptr</i>,</b>
|
||||
<b>const char **<i>errptr</i>, int *<i>erroffset</i>,</b>
|
||||
<b>const unsigned char *<i>tableptr</i>);</b>
|
||||
</P>
|
||||
<br><b>
|
||||
DESCRIPTION
|
||||
</b><br>
|
||||
<P>
|
||||
This function compiles a regular expression into an internal form. It is the
|
||||
same as <b>pcre_compile()</b>, except for the addition of the <i>errorcodeptr</i>
|
||||
argument. The arguments are:
|
||||
</P>
|
||||
<P>
|
||||
<pre>
|
||||
<i>pattern</i> A zero-terminated string containing the
|
||||
regular expression to be compiled
|
||||
<i>options</i> Zero or more option bits
|
||||
<i>errorcodeptr</i> Where to put an error code
|
||||
<i>errptr</i> Where to put an error message
|
||||
<i>erroffset</i> Offset in pattern where error was found
|
||||
<i>tableptr</i> Pointer to character tables, or NULL to
|
||||
use the built-in default
|
||||
</pre>
|
||||
The option bits are:
|
||||
<pre>
|
||||
PCRE_ANCHORED Force pattern anchoring
|
||||
PCRE_AUTO_CALLOUT Compile automatic callouts
|
||||
PCRE_CASELESS Do caseless matching
|
||||
PCRE_DOLLAR_ENDONLY $ not to match newline at end
|
||||
PCRE_DOTALL . matches anything including NL
|
||||
PCRE_DUPNAMES Allow duplicate names for subpatterns
|
||||
PCRE_EXTENDED Ignore whitespace and # comments
|
||||
PCRE_EXTRA PCRE extra features
|
||||
(not much use currently)
|
||||
PCRE_FIRSTLINE Force matching to be before newline
|
||||
PCRE_MULTILINE ^ and $ match newlines within data
|
||||
PCRE_NEWLINE_CR Set CR as the newline sequence
|
||||
PCRE_NEWLINE_CRLF Set CRLF as the newline sequence
|
||||
PCRE_NEWLINE_LF Set LF as the newline sequence
|
||||
PCRE_NO_AUTO_CAPTURE Disable numbered capturing paren-
|
||||
theses (named ones available)
|
||||
PCRE_UNGREEDY Invert greediness of quantifiers
|
||||
PCRE_UTF8 Run in UTF-8 mode
|
||||
PCRE_NO_UTF8_CHECK Do not check the pattern for UTF-8
|
||||
validity (only relevant if
|
||||
PCRE_UTF8 is set)
|
||||
</pre>
|
||||
PCRE must be built with UTF-8 support in order to use PCRE_UTF8 and
|
||||
PCRE_NO_UTF8_CHECK.
|
||||
</P>
|
||||
<P>
|
||||
The yield of the function is a pointer to a private data structure that
|
||||
contains the compiled pattern, or NULL if an error was detected.
|
||||
</P>
|
||||
<P>
|
||||
There is a complete description of the PCRE native API in the
|
||||
<a href="pcreapi.html"><b>pcreapi</b></a>
|
||||
page and a description of the POSIX API in the
|
||||
<a href="pcreposix.html"><b>pcreposix</b></a>
|
||||
page.
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
62
libs/pcre/doc/html/pcre_config.html
Normal file
@ -0,0 +1,62 @@
|
||||
<html>
|
||||
<head>
|
||||
<title>pcre_config specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>pcre_config man page</h1>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
||||
<p>
|
||||
This page is part of the PCRE HTML documentation. It was generated automatically
|
||||
from the original man page. If there is any nonsense in it, please consult the
|
||||
man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<br><b>
|
||||
SYNOPSIS
|
||||
</b><br>
|
||||
<P>
|
||||
<b>#include &lt;pcre.h&gt;</b>
|
||||
</P>
|
||||
<P>
|
||||
<b>int pcre_config(int <i>what</i>, void *<i>where</i>);</b>
|
||||
</P>
|
||||
<br><b>
|
||||
DESCRIPTION
|
||||
</b><br>
|
||||
<P>
|
||||
This function makes it possible for a client program to find out which optional
|
||||
features are available in the version of the PCRE library it is using. Its
|
||||
arguments are as follows:
|
||||
<pre>
|
||||
<i>what</i> A code specifying what information is required
|
||||
<i>where</i> Points to where to put the data
|
||||
</pre>
|
||||
The available codes are:
|
||||
<pre>
|
||||
PCRE_CONFIG_LINK_SIZE Internal link size: 2, 3, or 4
|
||||
PCRE_CONFIG_MATCH_LIMIT Internal resource limit
|
||||
PCRE_CONFIG_MATCH_LIMIT_RECURSION
|
||||
Internal recursion depth limit
|
||||
PCRE_CONFIG_NEWLINE Value of the newline sequence
|
||||
PCRE_CONFIG_POSIX_MALLOC_THRESHOLD
|
||||
Threshold of return slots, above
|
||||
which <b>malloc()</b> is used by
|
||||
the POSIX API
|
||||
PCRE_CONFIG_STACKRECURSE Recursion implementation (1=stack 0=heap)
|
||||
PCRE_CONFIG_UTF8 Availability of UTF-8 support (1=yes 0=no)
|
||||
PCRE_CONFIG_UNICODE_PROPERTIES
|
||||
Availability of Unicode property support
|
||||
(1=yes 0=no)
|
||||
</pre>
|
||||
The function yields 0 on success or PCRE_ERROR_BADOPTION otherwise.
|
||||
</P>
|
||||
<P>
|
||||
There is a complete description of the PCRE native API in the
|
||||
<a href="pcreapi.html"><b>pcreapi</b></a>
|
||||
page and a description of the POSIX API in the
|
||||
<a href="pcreposix.html"><b>pcreposix</b></a>
|
||||
page.
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
53
libs/pcre/doc/html/pcre_copy_named_substring.html
Normal file
@ -0,0 +1,53 @@
|
||||
<html>
|
||||
<head>
|
||||
<title>pcre_copy_named_substring specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>pcre_copy_named_substring man page</h1>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
||||
<p>
|
||||
This page is part of the PCRE HTML documentation. It was generated automatically
|
||||
from the original man page. If there is any nonsense in it, please consult the
|
||||
man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<br><b>
|
||||
SYNOPSIS
|
||||
</b><br>
|
||||
<P>
|
||||
<b>#include &lt;pcre.h&gt;</b>
|
||||
</P>
|
||||
<P>
|
||||
<b>int pcre_copy_named_substring(const pcre *<i>code</i>,</b>
|
||||
<b>const char *<i>subject</i>, int *<i>ovector</i>,</b>
|
||||
<b>int <i>stringcount</i>, const char *<i>stringname</i>,</b>
|
||||
<b>char *<i>buffer</i>, int <i>buffersize</i>);</b>
|
||||
</P>
|
||||
<br><b>
|
||||
DESCRIPTION
|
||||
</b><br>
|
||||
<P>
|
||||
This is a convenience function for extracting a captured substring, identified
|
||||
by name, into a given buffer. The arguments are:
|
||||
<pre>
|
||||
<i>code</i> Pattern that was successfully matched
|
||||
<i>subject</i> Subject that has been successfully matched
|
||||
<i>ovector</i> Offset vector that <b>pcre_exec()</b> used
|
||||
<i>stringcount</i> Value returned by <b>pcre_exec()</b>
|
||||
<i>stringname</i> Name of the required substring
|
||||
<i>buffer</i> Buffer to receive the string
|
||||
<i>buffersize</i> Size of buffer
|
||||
</pre>
|
||||
The yield is the length of the substring, PCRE_ERROR_NOMEMORY if the buffer was
|
||||
too small, or PCRE_ERROR_NOSUBSTRING if the string name is invalid.
|
||||
</P>
|
||||
<P>
|
||||
There is a complete description of the PCRE native API in the
|
||||
<a href="pcreapi.html"><b>pcreapi</b></a>
|
||||
page and a description of the POSIX API in the
|
||||
<a href="pcreposix.html"><b>pcreposix</b></a>
|
||||
page.
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
51
libs/pcre/doc/html/pcre_copy_substring.html
Normal file
@ -0,0 +1,51 @@
|
||||
<html>
|
||||
<head>
|
||||
<title>pcre_copy_substring specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>pcre_copy_substring man page</h1>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
||||
<p>
|
||||
This page is part of the PCRE HTML documentation. It was generated automatically
|
||||
from the original man page. If there is any nonsense in it, please consult the
|
||||
man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<br><b>
|
||||
SYNOPSIS
|
||||
</b><br>
|
||||
<P>
|
||||
<b>#include &lt;pcre.h&gt;</b>
|
||||
</P>
|
||||
<P>
|
||||
<b>int pcre_copy_substring(const char *<i>subject</i>, int *<i>ovector</i>,</b>
|
||||
<b>int <i>stringcount</i>, int <i>stringnumber</i>, char *<i>buffer</i>,</b>
|
||||
<b>int <i>buffersize</i>);</b>
|
||||
</P>
|
||||
<br><b>
|
||||
DESCRIPTION
|
||||
</b><br>
|
||||
<P>
|
||||
This is a convenience function for extracting a captured substring into a given
|
||||
buffer. The arguments are:
|
||||
<pre>
|
||||
<i>subject</i> Subject that has been successfully matched
|
||||
<i>ovector</i> Offset vector that <b>pcre_exec()</b> used
|
||||
<i>stringcount</i> Value returned by <b>pcre_exec()</b>
|
||||
<i>stringnumber</i> Number of the required substring
|
||||
<i>buffer</i> Buffer to receive the string
|
||||
<i>buffersize</i> Size of buffer
|
||||
</pre>
|
||||
The yield is the length of the string, PCRE_ERROR_NOMEMORY if the buffer was
|
||||
too small, or PCRE_ERROR_NOSUBSTRING if the string number is invalid.
|
||||
</P>
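<P>
The return values described above can be illustrated with a small stand-alone
model of the function. This is a sketch only, not the library's source. It
assumes that <i>ovector</i> holds pairs of start and end byte offsets, one pair
per captured string, as described in the <b>pcreapi</b> page. The name
<i>sketch_copy_substring</i> and the SKETCH_ error codes are hypothetical
stand-ins; real programs use PCRE_ERROR_NOMEMORY and PCRE_ERROR_NOSUBSTRING
from <b>pcre.h</b>.
</P>

```c
#include <assert.h>
#include <string.h>

/* Stand-in error codes for this sketch only; real code uses the
 * PCRE_ERROR_NOMEMORY and PCRE_ERROR_NOSUBSTRING macros from pcre.h. */
enum { SKETCH_ERROR_NOMEMORY = -6, SKETCH_ERROR_NOSUBSTRING = -7 };

/* Model of the documented behaviour: copy captured string number
 * stringnumber into buffer, adding a terminating zero. Assumes that
 * ovector[2*n] and ovector[2*n+1] are the start and end offsets of
 * string n, and that the requested string was actually set. */
int sketch_copy_substring(const char *subject, const int *ovector,
                          int stringcount, int stringnumber,
                          char *buffer, int buffersize)
{
    int start, length;
    assert(subject != NULL && ovector != NULL && buffer != NULL);
    if (stringnumber < 0 || stringnumber >= stringcount)
        return SKETCH_ERROR_NOSUBSTRING;   /* invalid string number */
    start = ovector[2 * stringnumber];
    length = ovector[2 * stringnumber + 1] - start;
    if (buffersize < length + 1)
        return SKETCH_ERROR_NOMEMORY;      /* no room for string plus zero */
    memcpy(buffer, subject + start, (size_t)length);
    buffer[length] = '\0';
    return length;
}
```

<P>
For the subject "key=value" with offsets {0, 9, 0, 3, 4, 9} and a string count
of 3, asking for string 2 with an 8-byte buffer yields 5 and copies "value".
Asking for string 0 with the same buffer gives the no-memory error, because the
nine-byte match plus its terminating zero does not fit.
</P>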
|
||||
<P>
|
||||
There is a complete description of the PCRE native API in the
|
||||
<a href="pcreapi.html"><b>pcreapi</b></a>
|
||||
page and a description of the POSIX API in the
|
||||
<a href="pcreposix.html"><b>pcreposix</b></a>
|
||||
page.
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
93
libs/pcre/doc/html/pcre_dfa_exec.html
Normal file
@ -0,0 +1,93 @@
|
||||
<html>
|
||||
<head>
|
||||
<title>pcre_dfa_exec specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>pcre_dfa_exec man page</h1>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
||||
<p>
|
||||
This page is part of the PCRE HTML documentation. It was generated automatically
|
||||
from the original man page. If there is any nonsense in it, please consult the
|
||||
man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<br><b>
|
||||
SYNOPSIS
|
||||
</b><br>
|
||||
<P>
|
||||
<b>#include &lt;pcre.h&gt;</b>
|
||||
</P>
|
||||
<P>
|
||||
<b>int pcre_dfa_exec(const pcre *<i>code</i>, const pcre_extra *<i>extra</i>,</b>
|
||||
<b>const char *<i>subject</i>, int <i>length</i>, int <i>startoffset</i>,</b>
|
||||
<b>int <i>options</i>, int *<i>ovector</i>, int <i>ovecsize</i>,</b>
|
||||
<b>int *<i>workspace</i>, int <i>wscount</i>);</b>
|
||||
</P>
|
||||
<br><b>
|
||||
DESCRIPTION
|
||||
</b><br>
|
||||
<P>
|
||||
This function matches a compiled regular expression against a given subject
|
||||
string, using a DFA matching algorithm (<i>not</i> Perl-compatible). Note that
|
||||
the main, Perl-compatible, matching function is <b>pcre_exec()</b>. The
|
||||
arguments for this function are:
|
||||
<pre>
|
||||
<i>code</i> Points to the compiled pattern
|
||||
<i>extra</i> Points to an associated <b>pcre_extra</b> structure,
|
||||
or is NULL
|
||||
<i>subject</i> Points to the subject string
|
||||
<i>length</i> Length of the subject string, in bytes
|
||||
<i>startoffset</i> Offset in bytes in the subject at which to
|
||||
start matching
|
||||
<i>options</i> Option bits
|
||||
<i>ovector</i> Points to a vector of ints for result offsets
|
||||
<i>ovecsize</i> Number of elements in the vector
|
||||
<i>workspace</i> Points to a vector of ints used as working space
|
||||
<i>wscount</i> Number of elements in the vector
|
||||
</pre>
|
||||
The options are:
|
||||
<pre>
|
||||
PCRE_ANCHORED Match only at the first position
|
||||
PCRE_NEWLINE_CR Set CR as the newline sequence
|
||||
PCRE_NEWLINE_CRLF Set CRLF as the newline sequence
|
||||
PCRE_NEWLINE_LF Set LF as the newline sequence
|
||||
PCRE_NOTBOL Subject is not the beginning of a line
|
||||
PCRE_NOTEOL Subject is not the end of a line
|
||||
PCRE_NOTEMPTY An empty string is not a valid match
|
||||
PCRE_NO_UTF8_CHECK Do not check the subject for UTF-8
|
||||
validity (only relevant if PCRE_UTF8
|
||||
was set at compile time)
|
||||
PCRE_PARTIAL Return PCRE_ERROR_PARTIAL for a partial match
|
||||
PCRE_DFA_SHORTEST Return only the shortest match
|
||||
PCRE_DFA_RESTART This is a restart after a partial match
|
||||
</pre>
|
||||
There are restrictions on what may appear in a pattern when matching using the
|
||||
DFA algorithm is requested. Details are given in the
|
||||
<a href="pcrematching.html"><b>pcrematching</b></a>
|
||||
documentation.
|
||||
</P>
|
||||
<P>
|
||||
A <b>pcre_extra</b> structure contains the following fields:
|
||||
<pre>
|
||||
<i>flags</i> Bits indicating which fields are set
|
||||
<i>study_data</i> Opaque data from <b>pcre_study()</b>
|
||||
<i>match_limit</i> Limit on internal resource use
|
||||
<i>match_limit_recursion</i> Limit on internal recursion depth
|
||||
<i>callout_data</i> Opaque data passed back to callouts
|
||||
<i>tables</i> Points to character tables or is NULL
|
||||
</pre>
|
||||
The flag bits are PCRE_EXTRA_STUDY_DATA, PCRE_EXTRA_MATCH_LIMIT,
|
||||
PCRE_EXTRA_MATCH_LIMIT_RECURSION, PCRE_EXTRA_CALLOUT_DATA, and
|
||||
PCRE_EXTRA_TABLES. For DFA matching, the <i>match_limit</i> and
|
||||
<i>match_limit_recursion</i> fields are not used, and must not be set.
|
||||
</P>
|
||||
<P>
|
||||
There is a complete description of the PCRE native API in the
|
||||
<a href="pcreapi.html"><b>pcreapi</b></a>
|
||||
page and a description of the POSIX API in the
|
||||
<a href="pcreposix.html"><b>pcreposix</b></a>
|
||||
page.
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
84
libs/pcre/doc/html/pcre_exec.html
Normal file
@ -0,0 +1,84 @@
|
||||
<html>
|
||||
<head>
|
||||
<title>pcre_exec specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>pcre_exec man page</h1>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
||||
<p>
|
||||
This page is part of the PCRE HTML documentation. It was generated automatically
|
||||
from the original man page. If there is any nonsense in it, please consult the
|
||||
man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<br><b>
|
||||
SYNOPSIS
|
||||
</b><br>
|
||||
<P>
|
||||
<b>#include <pcre.h></b>
|
||||
</P>
|
||||
<P>
|
||||
<b>int pcre_exec(const pcre *<i>code</i>, const pcre_extra *<i>extra</i>,</b>
|
||||
<b>const char *<i>subject</i>, int <i>length</i>, int <i>startoffset</i>,</b>
|
||||
<b>int <i>options</i>, int *<i>ovector</i>, int <i>ovecsize</i>);</b>
|
||||
</P>
|
||||
<br><b>
|
||||
DESCRIPTION
|
||||
</b><br>
|
||||
<P>
|
||||
This function matches a compiled regular expression against a given subject
|
||||
string, using a matching algorithm that is similar to Perl's. It returns
|
||||
offsets to captured substrings. Its arguments are:
|
||||
<pre>
|
||||
<i>code</i> Points to the compiled pattern
|
||||
<i>extra</i> Points to an associated <b>pcre_extra</b> structure,
|
||||
or is NULL
|
||||
<i>subject</i> Points to the subject string
|
||||
<i>length</i> Length of the subject string, in bytes
|
||||
<i>startoffset</i> Offset in bytes in the subject at which to
|
||||
start matching
|
||||
<i>options</i> Option bits
|
||||
<i>ovector</i> Points to a vector of ints for result offsets
|
||||
<i>ovecsize</i> Number of elements in the vector (a multiple of 3)
|
||||
</pre>
|
||||
The options are:
|
||||
<pre>
|
||||
PCRE_ANCHORED Match only at the first position
|
||||
PCRE_NEWLINE_CR Set CR as the newline sequence
|
||||
PCRE_NEWLINE_CRLF Set CRLF as the newline sequence
|
||||
PCRE_NEWLINE_LF Set LF as the newline sequence
|
||||
PCRE_NOTBOL Subject is not the beginning of a line
|
||||
PCRE_NOTEOL Subject is not the end of a line
|
||||
PCRE_NOTEMPTY An empty string is not a valid match
|
||||
PCRE_NO_UTF8_CHECK Do not check the subject for UTF-8
|
||||
validity (only relevant if PCRE_UTF8
|
||||
was set at compile time)
|
||||
PCRE_PARTIAL Return PCRE_ERROR_PARTIAL for a partial match
|
||||
</pre>
|
||||
There are restrictions on what may appear in a pattern when partial matching is
|
||||
requested.
|
||||
</P>
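<P>
The note that <i>ovecsize</i> should be a multiple of 3 can be illustrated with a
small sizing helper. This is a minimal sketch, assuming the layout described on
the <b>pcreapi</b> page (it is not spelled out on this page): the first two-thirds
of the vector hold start/end offset pairs and the final third is workspace. The
helper names are hypothetical and are not part of PCRE.
</P>

```c
#include <assert.h>

/* Hypothetical helper: number of ints to allocate for ovector so that the
   whole match plus `capture_count` capturing groups can be reported.
   Each reported substring needs 2 offsets, plus 1 int of workspace. */
static int ovector_size_for(int capture_count)
{
    assert(capture_count >= 0);
    return (capture_count + 1) * 3;
}

/* Hypothetical helper: how many offset pairs a vector of `ovecsize` ints can
   report. Integer division models a size that is not a multiple of 3 being
   rounded down. */
static int ovector_pair_capacity(int ovecsize)
{
    assert(ovecsize >= 0);
    return ovecsize / 3;
}
```

<P>
For a pattern with nine capturing groups, this sketch suggests a vector of 30
ints, of which pairs 0 to 9 would carry offsets.
</P>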
|
||||
<P>
|
||||
A <b>pcre_extra</b> structure contains the following fields:
|
||||
<pre>
|
||||
<i>flags</i> Bits indicating which fields are set
|
||||
<i>study_data</i> Opaque data from <b>pcre_study()</b>
|
||||
<i>match_limit</i> Limit on internal resource use
|
||||
<i>match_limit_recursion</i> Limit on internal recursion depth
|
||||
<i>callout_data</i> Opaque data passed back to callouts
|
||||
<i>tables</i> Points to character tables or is NULL
|
||||
</pre>
|
||||
The flag bits are PCRE_EXTRA_STUDY_DATA, PCRE_EXTRA_MATCH_LIMIT,
|
||||
PCRE_EXTRA_MATCH_LIMIT_RECURSION, PCRE_EXTRA_CALLOUT_DATA, and
|
||||
PCRE_EXTRA_TABLES.
|
||||
</P>
|
||||
<P>
|
||||
There is a complete description of the PCRE native API in the
|
||||
<a href="pcreapi.html"><b>pcreapi</b></a>
|
||||
page and a description of the POSIX API in the
|
||||
<a href="pcreposix.html"><b>pcreposix</b></a>
|
||||
page.
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
40
libs/pcre/doc/html/pcre_free_substring.html
Normal file
@ -0,0 +1,40 @@
|
||||
<html>
|
||||
<head>
|
||||
<title>pcre_free_substring specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>pcre_free_substring man page</h1>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
||||
<p>
|
||||
This page is part of the PCRE HTML documentation. It was generated automatically
|
||||
from the original man page. If there is any nonsense in it, please consult the
|
||||
man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<br><b>
|
||||
SYNOPSIS
|
||||
</b><br>
|
||||
<P>
|
||||
<b>#include <pcre.h></b>
|
||||
</P>
|
||||
<P>
|
||||
<b>void pcre_free_substring(const char *<i>stringptr</i>);</b>
|
||||
</P>
|
||||
<br><b>
|
||||
DESCRIPTION
|
||||
</b><br>
|
||||
<P>
|
||||
This is a convenience function for freeing the store obtained by a previous
|
||||
call to <b>pcre_get_substring()</b> or <b>pcre_get_named_substring()</b>. Its
|
||||
only argument is a pointer to the string.
|
||||
</P>
|
||||
<P>
|
||||
There is a complete description of the PCRE native API in the
|
||||
<a href="pcreapi.html"><b>pcreapi</b></a>
|
||||
page and a description of the POSIX API in the
|
||||
<a href="pcreposix.html"><b>pcreposix</b></a>
|
||||
page.
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
40
libs/pcre/doc/html/pcre_free_substring_list.html
Normal file
@ -0,0 +1,40 @@
|
||||
<html>
|
||||
<head>
|
||||
<title>pcre_free_substring_list specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>pcre_free_substring_list man page</h1>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
||||
<p>
|
||||
This page is part of the PCRE HTML documentation. It was generated automatically
|
||||
from the original man page. If there is any nonsense in it, please consult the
|
||||
man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<br><b>
|
||||
SYNOPSIS
|
||||
</b><br>
|
||||
<P>
|
||||
<b>#include <pcre.h></b>
|
||||
</P>
|
||||
<P>
|
||||
<b>void pcre_free_substring_list(const char **<i>stringptr</i>);</b>
|
||||
</P>
|
||||
<br><b>
|
||||
DESCRIPTION
|
||||
</b><br>
|
||||
<P>
|
||||
This is a convenience function for freeing the store obtained by a previous
|
||||
call to <b>pcre_get_substring_list()</b>. Its only argument is a pointer to the
|
||||
list of string pointers.
|
||||
</P>
|
||||
<P>
|
||||
There is a complete description of the PCRE native API in the
|
||||
<a href="pcreapi.html"><b>pcreapi</b></a>
|
||||
page and a description of the POSIX API in the
|
||||
<a href="pcreposix.html"><b>pcreposix</b></a>
|
||||
page.
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
71
libs/pcre/doc/html/pcre_fullinfo.html
Normal file
@ -0,0 +1,71 @@
|
||||
<html>
|
||||
<head>
|
||||
<title>pcre_fullinfo specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>pcre_fullinfo man page</h1>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
||||
<p>
|
||||
This page is part of the PCRE HTML documentation. It was generated automatically
|
||||
from the original man page. If there is any nonsense in it, please consult the
|
||||
man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<br><b>
|
||||
SYNOPSIS
|
||||
</b><br>
|
||||
<P>
|
||||
<b>#include <pcre.h></b>
|
||||
</P>
|
||||
<P>
|
||||
<b>int pcre_fullinfo(const pcre *<i>code</i>, const pcre_extra *<i>extra</i>,</b>
|
||||
<b>int <i>what</i>, void *<i>where</i>);</b>
|
||||
</P>
|
||||
<br><b>
|
||||
DESCRIPTION
|
||||
</b><br>
|
||||
<P>
|
||||
This function returns information about a compiled pattern. Its arguments are:
|
||||
<pre>
|
||||
<i>code</i> Compiled regular expression
|
||||
<i>extra</i> Result of <b>pcre_study()</b> or NULL
|
||||
<i>what</i> What information is required
|
||||
<i>where</i> Where to put the information
|
||||
</pre>
|
||||
The following information is available:
|
||||
<pre>
|
||||
PCRE_INFO_BACKREFMAX Number of highest back reference
|
||||
PCRE_INFO_CAPTURECOUNT Number of capturing subpatterns
|
||||
PCRE_INFO_DEFAULT_TABLES Pointer to default tables
|
||||
PCRE_INFO_FIRSTBYTE Fixed first byte for a match, or
|
||||
-1 for start of string
|
||||
or after newline, or
|
||||
-2 otherwise
|
||||
PCRE_INFO_FIRSTTABLE Table of first bytes
|
||||
(after studying)
|
||||
PCRE_INFO_LASTLITERAL Literal last byte required
|
||||
PCRE_INFO_NAMECOUNT Number of named subpatterns
|
||||
PCRE_INFO_NAMEENTRYSIZE Size of name table entry
|
||||
PCRE_INFO_NAMETABLE Pointer to name table
|
||||
PCRE_INFO_OPTIONS Options used for compilation
|
||||
PCRE_INFO_SIZE Size of compiled pattern
|
||||
PCRE_INFO_STUDYSIZE Size of study data
|
||||
</pre>
|
||||
The yield of the function is zero on success or:
|
||||
<pre>
|
||||
PCRE_ERROR_NULL the argument <i>code</i> was NULL
|
||||
the argument <i>where</i> was NULL
|
||||
PCRE_ERROR_BADMAGIC the "magic number" was not found
|
||||
PCRE_ERROR_BADOPTION the value of <i>what</i> was invalid
|
||||
</pre>
|
||||
</P>
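<P>
The three possible meanings of the PCRE_INFO_FIRSTBYTE result listed above can
be sketched as a small classifier. This is only an illustration of how a caller
might interpret the integer written to <i>where</i>; the enum and function names
are hypothetical and do not exist in PCRE.
</P>

```c
#include <assert.h>

/* Hypothetical classification of a PCRE_INFO_FIRSTBYTE result. */
enum firstbyte_kind {
    FIRSTBYTE_FIXED,       /* value >= 0: every match starts with this byte */
    FIRSTBYTE_LINE_START,  /* -1: match starts at start of string or after newline */
    FIRSTBYTE_NONE         /* -2: neither of the above is known */
};

static enum firstbyte_kind classify_firstbyte(int value)
{
    /* The page documents only byte values, -1 and -2. */
    assert(value >= -2 && value <= 255);

    if (value >= 0)
        return FIRSTBYTE_FIXED;
    return (value == -1) ? FIRSTBYTE_LINE_START : FIRSTBYTE_NONE;
}
```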
|
||||
<P>
|
||||
There is a complete description of the PCRE native API in the
|
||||
<a href="pcreapi.html"><b>pcreapi</b></a>
|
||||
page and a description of the POSIX API in the
|
||||
<a href="pcreposix.html"><b>pcreposix</b></a>
|
||||
page.
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
54
libs/pcre/doc/html/pcre_get_named_substring.html
Normal file
@ -0,0 +1,54 @@
|
||||
<html>
|
||||
<head>
|
||||
<title>pcre_get_named_substring specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>pcre_get_named_substring man page</h1>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
||||
<p>
|
||||
This page is part of the PCRE HTML documentation. It was generated automatically
|
||||
from the original man page. If there is any nonsense in it, please consult the
|
||||
man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<br><b>
|
||||
SYNOPSIS
|
||||
</b><br>
|
||||
<P>
|
||||
<b>#include <pcre.h></b>
|
||||
</P>
|
||||
<P>
|
||||
<b>int pcre_get_named_substring(const pcre *<i>code</i>,</b>
|
||||
<b>const char *<i>subject</i>, int *<i>ovector</i>,</b>
|
||||
<b>int <i>stringcount</i>, const char *<i>stringname</i>,</b>
|
||||
<b>const char **<i>stringptr</i>);</b>
|
||||
</P>
|
||||
<br><b>
|
||||
DESCRIPTION
|
||||
</b><br>
|
||||
<P>
|
||||
This is a convenience function for extracting a captured substring by name. The
|
||||
arguments are:
|
||||
<pre>
|
||||
<i>code</i> Compiled pattern
|
||||
<i>subject</i> Subject that has been successfully matched
|
||||
<i>ovector</i> Offset vector that <b>pcre_exec()</b> used
|
||||
<i>stringcount</i> Value returned by <b>pcre_exec()</b>
|
||||
<i>stringname</i> Name of the required substring
|
||||
<i>stringptr</i> Where to put the string pointer
|
||||
</pre>
|
||||
The memory in which the substring is placed is obtained by calling
|
||||
<b>pcre_malloc()</b>. The yield of the function is the length of the extracted
|
||||
substring, PCRE_ERROR_NOMEMORY if sufficient memory could not be obtained, or
|
||||
PCRE_ERROR_NOSUBSTRING if the string name is invalid.
|
||||
</P>
|
||||
<P>
|
||||
There is a complete description of the PCRE native API in the
|
||||
<a href="pcreapi.html"><b>pcreapi</b></a>
|
||||
page and a description of the POSIX API in the
|
||||
<a href="pcreposix.html"><b>pcreposix</b></a>
|
||||
page.
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
46
libs/pcre/doc/html/pcre_get_stringnumber.html
Normal file
@ -0,0 +1,46 @@
|
||||
<html>
|
||||
<head>
|
||||
<title>pcre_get_stringnumber specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>pcre_get_stringnumber man page</h1>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
||||
<p>
|
||||
This page is part of the PCRE HTML documentation. It was generated automatically
|
||||
from the original man page. If there is any nonsense in it, please consult the
|
||||
man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<br><b>
|
||||
SYNOPSIS
|
||||
</b><br>
|
||||
<P>
|
||||
<b>#include <pcre.h></b>
|
||||
</P>
|
||||
<P>
|
||||
<b>int pcre_get_stringnumber(const pcre *<i>code</i>,</b>
|
||||
<b>const char *<i>name</i>);</b>
|
||||
</P>
|
||||
<br><b>
|
||||
DESCRIPTION
|
||||
</b><br>
|
||||
<P>
|
||||
This convenience function finds the number of a named substring capturing
|
||||
parenthesis in a compiled pattern. Its arguments are:
|
||||
<pre>
|
||||
<i>code</i> Compiled regular expression
|
||||
<i>name</i> Name whose number is required
|
||||
</pre>
|
||||
The yield of the function is the number of the parenthesis if the name is
|
||||
found, or PCRE_ERROR_NOSUBSTRING otherwise.
|
||||
</P>
|
||||
<P>
|
||||
There is a complete description of the PCRE native API in the
|
||||
<a href="pcreapi.html"><b>pcreapi</b></a>
|
||||
page and a description of the POSIX API in the
|
||||
<a href="pcreposix.html"><b>pcreposix</b></a>
|
||||
page.
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
52
libs/pcre/doc/html/pcre_get_stringtable_entries.html
Normal file
@ -0,0 +1,52 @@
|
||||
<html>
|
||||
<head>
|
||||
<title>pcre_get_stringtable_entries specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>pcre_get_stringtable_entries man page</h1>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
||||
<p>
|
||||
This page is part of the PCRE HTML documentation. It was generated automatically
|
||||
from the original man page. If there is any nonsense in it, please consult the
|
||||
man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<br><b>
|
||||
SYNOPSIS
|
||||
</b><br>
|
||||
<P>
|
||||
<b>#include <pcre.h></b>
|
||||
</P>
|
||||
<P>
|
||||
<b>int pcre_get_stringtable_entries(const pcre *<i>code</i>,</b>
|
||||
<b>const char *<i>name</i>, char **<i>first</i>, char **<i>last</i>);</b>
|
||||
</P>
|
||||
<br><b>
|
||||
DESCRIPTION
|
||||
</b><br>
|
||||
<P>
|
||||
This convenience function finds, for a compiled pattern, the first and last
|
||||
entries for a given name in the table that translates capturing parenthesis
|
||||
names into numbers. When names are required to be unique (PCRE_DUPNAMES is
|
||||
<i>not</i> set), it is usually easier to use <b>pcre_get_stringnumber()</b>
|
||||
instead.
|
||||
<pre>
|
||||
<i>code</i> Compiled regular expression
|
||||
<i>name</i> Name whose entries are required
|
||||
<i>first</i> Where to return a pointer to the first entry
|
||||
<i>last</i> Where to return a pointer to the last entry
|
||||
</pre>
|
||||
The yield of the function is the length of each entry, or
|
||||
PCRE_ERROR_NOSUBSTRING if none are found.
|
||||
</P>
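<P>
The lookup described above can be modelled without the library. This is a
minimal sketch, assuming the entry format given on the <b>pcreapi</b> page (not on
this page): each entry is a two-byte group number, most significant byte first,
followed by the zero-terminated name, and entries with the same name are
adjacent. The function names and the error constant are hypothetical stand-ins,
and a linear scan is used here for brevity rather than whatever search PCRE
itself performs.
</P>

```c
#include <assert.h>
#include <string.h>

#define DEMO_ERROR_NOSUBSTRING (-1)  /* stand-in, not PCRE's real value */

/* Find the first and last entries for `name` in a name table of `count`
   entries, each `entrysize` bytes long. Returns the entry size on success. */
static int demo_stringtable_entries(const unsigned char *table, int count,
                                    int entrysize, const char *name,
                                    const unsigned char **first,
                                    const unsigned char **last)
{
    int i, found = 0;

    assert(entrysize > 2);  /* 2 bytes of number plus at least a terminator */
    for (i = 0; i < count; i++) {
        const unsigned char *entry = table + i * entrysize;
        if (strcmp(name, (const char *)(entry + 2)) == 0) {
            if (!found)
                *first = entry;
            *last = entry;
            found = 1;
        }
    }
    return found ? entrysize : DEMO_ERROR_NOSUBSTRING;
}

/* Decode the group number stored in the first two bytes of an entry. */
static int demo_entry_number(const unsigned char *entry)
{
    return (entry[0] << 8) | entry[1];
}
```

<P>
A caller would step from <i>first</i> to <i>last</i> in increments of the returned
entry size, decoding each group number in turn.
</P>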
|
||||
<P>
|
||||
There is a complete description of the PCRE native API, including the format of
|
||||
the table entries, in the
|
||||
<a href="pcreapi.html"><b>pcreapi</b></a>
|
||||
page and a description of the POSIX API in the
|
||||
<a href="pcreposix.html"><b>pcreposix</b></a>
|
||||
page.
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
52
libs/pcre/doc/html/pcre_get_substring.html
Normal file
@ -0,0 +1,52 @@
|
||||
<html>
|
||||
<head>
|
||||
<title>pcre_get_substring specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>pcre_get_substring man page</h1>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
||||
<p>
|
||||
This page is part of the PCRE HTML documentation. It was generated automatically
|
||||
from the original man page. If there is any nonsense in it, please consult the
|
||||
man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<br><b>
|
||||
SYNOPSIS
|
||||
</b><br>
|
||||
<P>
|
||||
<b>#include <pcre.h></b>
|
||||
</P>
|
||||
<P>
|
||||
<b>int pcre_get_substring(const char *<i>subject</i>, int *<i>ovector</i>,</b>
|
||||
<b>int <i>stringcount</i>, int <i>stringnumber</i>,</b>
|
||||
<b>const char **<i>stringptr</i>);</b>
|
||||
</P>
|
||||
<br><b>
|
||||
DESCRIPTION
|
||||
</b><br>
|
||||
<P>
|
||||
This is a convenience function for extracting a captured substring. The
|
||||
arguments are:
|
||||
<pre>
|
||||
<i>subject</i> Subject that has been successfully matched
|
||||
<i>ovector</i> Offset vector that <b>pcre_exec()</b> used
|
||||
<i>stringcount</i> Value returned by <b>pcre_exec()</b>
|
||||
<i>stringnumber</i> Number of the required substring
|
||||
<i>stringptr</i> Where to put the string pointer
|
||||
</pre>
|
||||
The memory in which the substring is placed is obtained by calling
|
||||
<b>pcre_malloc()</b>. The yield of the function is the length of the substring,
|
||||
PCRE_ERROR_NOMEMORY if sufficient memory could not be obtained, or
|
||||
PCRE_ERROR_NOSUBSTRING if the string number is invalid.
|
||||
</P>
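<P>
The behaviour described above can be modelled with plain C. This is a sketch
under stated assumptions, not PCRE's implementation: it uses <b>malloc()</b> in
place of <b>pcre_malloc()</b>, it assumes substring <i>n</i> occupies
<i>ovector</i>[2n] to <i>ovector</i>[2n+1], it treats a negative start offset as an
unset group, and the error constants are hypothetical stand-ins.
</P>

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_ERROR_NOMEMORY    (-1)  /* stand-ins, not PCRE's real values */
#define DEMO_ERROR_NOSUBSTRING (-2)

/* Copy captured substring `stringnumber` into freshly allocated memory.
   Returns its length, or a negative stand-in error code. */
static int demo_get_substring(const char *subject, const int *ovector,
                              int stringcount, int stringnumber,
                              const char **stringptr)
{
    int start, length;
    char *copy;

    if (stringnumber < 0 || stringnumber >= stringcount)
        return DEMO_ERROR_NOSUBSTRING;

    start = ovector[2 * stringnumber];
    length = ovector[2 * stringnumber + 1] - start;
    if (start < 0) {            /* assumed: unset group, yield empty string */
        start = 0;
        length = 0;
    }
    assert(length >= 0);

    copy = malloc((size_t)length + 1);
    if (copy == NULL)
        return DEMO_ERROR_NOMEMORY;

    memcpy(copy, subject + start, (size_t)length);
    copy[length] = '\0';
    *stringptr = copy;
    return length;
}
```

<P>
In this model the caller releases the copy with <b>free()</b>, which plays the
role that <b>pcre_free_substring()</b> plays for the real function.
</P>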
|
||||
<P>
|
||||
There is a complete description of the PCRE native API in the
|
||||
<a href="pcreapi.html"><b>pcreapi</b></a>
|
||||
page and a description of the POSIX API in the
|
||||
<a href="pcreposix.html"><b>pcreposix</b></a>
|
||||
page.
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
51
libs/pcre/doc/html/pcre_get_substring_list.html
Normal file
@ -0,0 +1,51 @@
|
||||
<html>
|
||||
<head>
|
||||
<title>pcre_get_substring_list specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>pcre_get_substring_list man page</h1>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
||||
<p>
|
||||
This page is part of the PCRE HTML documentation. It was generated automatically
|
||||
from the original man page. If there is any nonsense in it, please consult the
|
||||
man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<br><b>
|
||||
SYNOPSIS
|
||||
</b><br>
|
||||
<P>
|
||||
<b>#include <pcre.h></b>
|
||||
</P>
|
||||
<P>
|
||||
<b>int pcre_get_substring_list(const char *<i>subject</i>,</b>
|
||||
<b>int *<i>ovector</i>, int <i>stringcount</i>, const char ***<i>listptr</i>);</b>
|
||||
</P>
|
||||
<br><b>
|
||||
DESCRIPTION
|
||||
</b><br>
|
||||
<P>
|
||||
This is a convenience function for extracting a list of all the captured
|
||||
substrings. The arguments are:
|
||||
<pre>
|
||||
<i>subject</i> Subject that has been successfully matched
|
||||
<i>ovector</i> Offset vector that <b>pcre_exec()</b> used
|
||||
<i>stringcount</i> Value returned by <b>pcre_exec()</b>
|
||||
<i>listptr</i> Where to put a pointer to the list
|
||||
</pre>
|
||||
The memory in which the substrings and the list are placed is obtained by
|
||||
calling <b>pcre_malloc()</b>. A pointer to a list of pointers is put in
|
||||
the variable whose address is in <i>listptr</i>. The list is terminated by a
|
||||
NULL pointer. The yield of the function is zero on success or
|
||||
PCRE_ERROR_NOMEMORY if sufficient memory could not be obtained.
|
||||
</P>
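<P>
The NULL-terminated list described above can be modelled as follows. This is a
sketch only: it uses <b>malloc()</b> in place of <b>pcre_malloc()</b>, it places the
pointer list and the string data in one allocation so that a single free
releases everything, and the function name and error constant are hypothetical.
</P>

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_ERROR_NOMEMORY (-1)  /* stand-in, not PCRE's real value */

/* Build a NULL-terminated list of copies of all captured substrings.
   Returns 0 on success or a negative stand-in error code. */
static int demo_get_substring_list(const char *subject, const int *ovector,
                                   int stringcount, const char ***listptr)
{
    size_t size = sizeof(char *);  /* room for the terminating NULL */
    char **list;
    char *data;
    int i;

    assert(stringcount >= 0);
    for (i = 0; i < stringcount; i++) {
        int len = ovector[2 * i + 1] - ovector[2 * i];
        size += sizeof(char *) + (size_t)len + 1;
    }

    list = malloc(size);
    if (list == NULL)
        return DEMO_ERROR_NOMEMORY;

    /* String data starts right after the stringcount + 1 pointers. */
    data = (char *)(list + stringcount + 1);
    for (i = 0; i < stringcount; i++) {
        int start = ovector[2 * i];
        int len = ovector[2 * i + 1] - start;
        if (len > 0)
            memcpy(data, subject + start, (size_t)len);
        list[i] = data;
        data += len;
        *data++ = '\0';
    }
    list[stringcount] = NULL;
    *listptr = (const char **)list;
    return 0;
}
```

<P>
Because the model uses one allocation, a single <b>free()</b> of the list pointer
stands in for <b>pcre_free_substring_list()</b>.
</P>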
|
||||
<P>
|
||||
There is a complete description of the PCRE native API in the
|
||||
<a href="pcreapi.html"><b>pcreapi</b></a>
|
||||
page and a description of the POSIX API in the
|
||||
<a href="pcreposix.html"><b>pcreposix</b></a>
|
||||
page.
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
39
libs/pcre/doc/html/pcre_info.html
Normal file
@ -0,0 +1,39 @@
|
||||
<html>
|
||||
<head>
|
||||
<title>pcre_info specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>pcre_info man page</h1>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
||||
<p>
|
||||
This page is part of the PCRE HTML documentation. It was generated automatically
|
||||
from the original man page. If there is any nonsense in it, please consult the
|
||||
man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<br><b>
|
||||
SYNOPSIS
|
||||
</b><br>
|
||||
<P>
|
||||
<b>#include <pcre.h></b>
|
||||
</P>
|
||||
<P>
|
||||
<b>int pcre_info(const pcre *<i>code</i>, int *<i>optptr</i>, int</b>
|
||||
<b>*<i>firstcharptr</i>);</b>
|
||||
</P>
|
||||
<br><b>
|
||||
DESCRIPTION
|
||||
</b><br>
|
||||
<P>
|
||||
This function is obsolete. You should be using <b>pcre_fullinfo()</b> instead.
|
||||
</P>
|
||||
<P>
|
||||
There is a complete description of the PCRE native API in the
|
||||
<a href="pcreapi.html"><b>pcreapi</b></a>
|
||||
page and a description of the POSIX API in the
|
||||
<a href="pcreposix.html"><b>pcreposix</b></a>
|
||||
page.
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
42
libs/pcre/doc/html/pcre_maketables.html
Normal file
@ -0,0 +1,42 @@
|
||||
<html>
|
||||
<head>
|
||||
<title>pcre_maketables specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>pcre_maketables man page</h1>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
||||
<p>
|
||||
This page is part of the PCRE HTML documentation. It was generated automatically
|
||||
from the original man page. If there is any nonsense in it, please consult the
|
||||
man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<br><b>
|
||||
SYNOPSIS
|
||||
</b><br>
|
||||
<P>
|
||||
<b>#include <pcre.h></b>
|
||||
</P>
|
||||
<P>
|
||||
<b>const unsigned char *pcre_maketables(void);</b>
|
||||
</P>
|
||||
<br><b>
|
||||
DESCRIPTION
|
||||
</b><br>
|
||||
<P>
|
||||
This function builds a set of character tables for character values less than
|
||||
256. These can be passed to <b>pcre_compile()</b> to override PCRE's internal,
|
||||
built-in tables (which were made by <b>pcre_maketables()</b> when PCRE was
|
||||
compiled). You might want to do this if you are using a non-standard locale.
|
||||
The function yields a pointer to the tables.
|
||||
</P>
|
||||
<P>
|
||||
There is a complete description of the PCRE native API in the
|
||||
<a href="pcreapi.html"><b>pcreapi</b></a>
|
||||
page and a description of the POSIX API in the
|
||||
<a href="pcreposix.html"><b>pcreposix</b></a>
|
||||
page.
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
45
libs/pcre/doc/html/pcre_refcount.html
Normal file
@ -0,0 +1,45 @@
|
||||
<html>
|
||||
<head>
|
||||
<title>pcre_refcount specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>pcre_refcount man page</h1>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
||||
<p>
|
||||
This page is part of the PCRE HTML documentation. It was generated automatically
|
||||
from the original man page. If there is any nonsense in it, please consult the
|
||||
man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<br><b>
|
||||
SYNOPSIS
|
||||
</b><br>
|
||||
<P>
|
||||
<b>#include <pcre.h></b>
|
||||
</P>
|
||||
<P>
|
||||
<b>int pcre_refcount(pcre *<i>code</i>, int <i>adjust</i>);</b>
|
||||
</P>
|
||||
<br><b>
|
||||
DESCRIPTION
|
||||
</b><br>
|
||||
<P>
|
||||
This function is used to maintain a reference count inside a data block that
|
||||
contains a compiled pattern. Its arguments are:
|
||||
<pre>
|
||||
<i>code</i> Compiled regular expression
|
||||
<i>adjust</i> Adjustment to reference value
|
||||
</pre>
|
||||
The yield of the function is the adjusted reference value, which is constrained
|
||||
to lie between 0 and 65535.
|
||||
</P>
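<P>
The constraint on the returned value can be sketched as a clamped addition.
This models only the arithmetic stated above (the result stays between 0 and
65535); the function name is hypothetical and the real function stores the
count inside the compiled pattern rather than taking it as an argument.
</P>

```c
#include <assert.h>

/* Apply `adjust` to a reference count and clamp the result to 0..65535. */
static int demo_refcount_adjust(int current, int adjust)
{
    long long next;

    assert(current >= 0 && current <= 65535);
    next = (long long)current + adjust;  /* widened to avoid int overflow */
    if (next < 0)
        next = 0;
    if (next > 65535)
        next = 65535;
    return (int)next;
}
```

<P>
Passing an adjustment of zero returns the current value unchanged, which is one
way to read the count without modifying it.
</P>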
|
||||
<P>
|
||||
There is a complete description of the PCRE native API in the
|
||||
<a href="pcreapi.html"><b>pcreapi</b></a>
|
||||
page and a description of the POSIX API in the
|
||||
<a href="pcreposix.html"><b>pcreposix</b></a>
|
||||
page.
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
56
libs/pcre/doc/html/pcre_study.html
Normal file
@ -0,0 +1,56 @@
|
||||
<html>
|
||||
<head>
|
||||
<title>pcre_study specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>pcre_study man page</h1>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
||||
<p>
|
||||
This page is part of the PCRE HTML documentation. It was generated automatically
|
||||
from the original man page. If there is any nonsense in it, please consult the
|
||||
man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<br><b>
|
||||
SYNOPSIS
|
||||
</b><br>
|
||||
<P>
|
||||
<b>#include <pcre.h></b>
|
||||
</P>
|
||||
<P>
|
||||
<b>pcre_extra *pcre_study(const pcre *<i>code</i>, int <i>options</i>,</b>
|
||||
<b>const char **<i>errptr</i>);</b>
|
||||
</P>
|
||||
<br><b>
|
||||
DESCRIPTION
|
||||
</b><br>
|
||||
<P>
|
||||
This function studies a compiled pattern, to see if additional information can
|
||||
be extracted that might speed up matching. Its arguments are:
|
||||
<pre>
|
||||
<i>code</i> A compiled regular expression
|
||||
<i>options</i> Options for <b>pcre_study()</b>
|
||||
<i>errptr</i> Where to put an error message
|
||||
</pre>
|
||||
If the function succeeds, it returns a value that can be passed to
|
||||
<b>pcre_exec()</b> via its <i>extra</i> argument.
|
||||
</P>
|
||||
<P>
|
||||
If the function returns NULL, either it could not find any additional
|
||||
information, or there was an error. You can tell the difference by looking at
|
||||
the error value. It is NULL in the first case.
|
||||
</P>
|
||||
<P>
|
||||
There are currently no options defined; the value of the second argument should
|
||||
always be zero.
|
||||
</P>
|
||||
<P>
|
||||
There is a complete description of the PCRE native API in the
|
||||
<a href="pcreapi.html"><b>pcreapi</b></a>
|
||||
page and a description of the POSIX API in the
|
||||
<a href="pcreposix.html"><b>pcreposix</b></a>
|
||||
page.
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
39
libs/pcre/doc/html/pcre_version.html
Normal file
@ -0,0 +1,39 @@
|
||||
<html>
|
||||
<head>
|
||||
<title>pcre_version specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>pcre_version man page</h1>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
||||
<p>
|
||||
This page is part of the PCRE HTML documentation. It was generated automatically
|
||||
from the original man page. If there is any nonsense in it, please consult the
|
||||
man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<br><b>
|
||||
SYNOPSIS
|
||||
</b><br>
|
||||
<P>
|
||||
<b>#include <pcre.h></b>
|
||||
</P>
|
||||
<P>
|
||||
<b>char *pcre_version(void);</b>
|
||||
</P>
|
||||
<br><b>
|
||||
DESCRIPTION
|
||||
</b><br>
|
||||
<P>
|
||||
This function returns a character string that gives the version number of the
|
||||
PCRE library and the date of its release.
|
||||
</P>
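<P>
A caller that needs the numeric version might parse the returned string. This
is a minimal sketch, assuming the string begins with "major.minor" followed by
the release date; that exact layout is an assumption rather than something this
page states, and the function name is hypothetical.
</P>

```c
#include <assert.h>
#include <stdio.h>

/* Extract the leading "major.minor" pair from a version string.
   Returns 1 if both numbers were found, 0 otherwise. */
static int demo_parse_version(const char *version, int *major, int *minor)
{
    assert(version != NULL && major != NULL && minor != NULL);
    return sscanf(version, "%d.%d", major, minor) == 2;
}
```

<P>
In real use the first argument would be the result of <b>pcre_version()</b>; any
text after the two numbers, such as the date, is ignored by this sketch.
</P>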
|
||||
<P>
|
||||
There is a complete description of the PCRE native API in the
|
||||
<a href="pcreapi.html"><b>pcreapi</b></a>
|
||||
page and a description of the POSIX API in the
|
||||
<a href="pcreposix.html"><b>pcreposix</b></a>
|
||||
page.
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
1770
libs/pcre/doc/html/pcreapi.html
Normal file
File diff suppressed because it is too large
Load Diff
225
libs/pcre/doc/html/pcrebuild.html
Normal file
@ -0,0 +1,225 @@
|
||||
<html>
|
||||
<head>
|
||||
<title>pcrebuild specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>pcrebuild man page</h1>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
||||
<p>
|
||||
This page is part of the PCRE HTML documentation. It was generated automatically
|
||||
from the original man page. If there is any nonsense in it, please consult the
|
||||
man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<ul>
|
||||
<li><a name="TOC1" href="#SEC1">PCRE BUILD-TIME OPTIONS</a>
|
||||
<li><a name="TOC2" href="#SEC2">C++ SUPPORT</a>
|
||||
<li><a name="TOC3" href="#SEC3">UTF-8 SUPPORT</a>
|
||||
<li><a name="TOC4" href="#SEC4">UNICODE CHARACTER PROPERTY SUPPORT</a>
|
||||
<li><a name="TOC5" href="#SEC5">CODE VALUE OF NEWLINE</a>
|
||||
<li><a name="TOC6" href="#SEC6">BUILDING SHARED AND STATIC LIBRARIES</a>
|
||||
<li><a name="TOC7" href="#SEC7">POSIX MALLOC USAGE</a>
|
||||
<li><a name="TOC8" href="#SEC8">HANDLING VERY LARGE PATTERNS</a>
|
||||
<li><a name="TOC9" href="#SEC9">AVOIDING EXCESSIVE STACK USAGE</a>
|
||||
<li><a name="TOC10" href="#SEC10">LIMITING PCRE RESOURCE USAGE</a>
|
||||
<li><a name="TOC11" href="#SEC11">USING EBCDIC CODE</a>
|
||||
</ul>
|
||||
<br><a name="SEC1" href="#TOC1">PCRE BUILD-TIME OPTIONS</a><br>
|
||||
<P>
|
||||
This document describes the optional features of PCRE that can be selected when
|
||||
the library is compiled. They are all selected, or deselected, by providing
|
||||
options to the <b>configure</b> script that is run before the <b>make</b>
|
||||
command. The complete list of options for <b>configure</b> (which includes the
|
||||
standard ones such as the selection of the installation directory) can be
|
||||
obtained by running
|
||||
<pre>
|
||||
./configure --help
|
||||
</pre>
|
||||
The following sections describe certain options whose names begin with --enable
|
||||
or --disable. These settings specify changes to the defaults for the
|
||||
<b>configure</b> command. Because of the way that <b>configure</b> works,
|
||||
--enable and --disable always come in pairs, so the complementary option always
|
||||
exists as well, but as it specifies the default, it is not described.
|
||||
</P>
|
||||
<br><a name="SEC2" href="#TOC1">C++ SUPPORT</a><br>
|
||||
<P>
|
||||
By default, the <b>configure</b> script will search for a C++ compiler and C++
|
||||
header files. If it finds them, it automatically builds the C++ wrapper library
|
||||
for PCRE. You can disable this by adding
|
||||
<pre>
|
||||
--disable-cpp
|
||||
</pre>
|
||||
to the <b>configure</b> command.
|
||||
</P>
|
||||
<br><a name="SEC3" href="#TOC1">UTF-8 SUPPORT</a><br>
|
||||
<P>
|
||||
To build PCRE with support for UTF-8 character strings, add
|
||||
<pre>
|
||||
--enable-utf8
|
||||
</pre>
|
||||
to the <b>configure</b> command. Of itself, this does not make PCRE treat
|
||||
strings as UTF-8. As well as compiling PCRE with this option, you also
|
||||
have to set the PCRE_UTF8 option when you call the <b>pcre_compile()</b>
|
||||
function.
|
||||
</P>
|
||||
<br><a name="SEC4" href="#TOC1">UNICODE CHARACTER PROPERTY SUPPORT</a><br>
|
||||
<P>
|
||||
UTF-8 support allows PCRE to process character values greater than 255 in the
|
||||
strings that it handles. On its own, however, it does not provide any
|
||||
facilities for accessing the properties of such characters. If you want to be
|
||||
able to use the pattern escapes \P, \p, and \X, which refer to Unicode
|
||||
character properties, you must add
|
||||
<pre>
|
||||
--enable-unicode-properties
|
||||
</pre>
|
||||
to the <b>configure</b> command. This implies UTF-8 support, even if you have
|
||||
not explicitly requested it.
|
||||
</P>
|
||||
<P>
|
||||
Including Unicode property support adds around 90K of tables to the PCRE
|
||||
library, approximately doubling its size. Only the general category properties
|
||||
such as <i>Lu</i> and <i>Nd</i> are supported. Details are given in the
|
||||
<a href="pcrepattern.html"><b>pcrepattern</b></a>
|
||||
documentation.
|
||||
</P>
|
||||
<br><a name="SEC5" href="#TOC1">CODE VALUE OF NEWLINE</a><br>
|
||||
<P>
|
||||
By default, PCRE interprets character 10 (linefeed, LF) as indicating the end
|
||||
of a line. This is the normal newline character on Unix-like systems. You can
|
||||
compile PCRE to use character 13 (carriage return, CR) instead, by adding
|
||||
<pre>
|
||||
--enable-newline-is-cr
|
||||
</pre>
|
||||
to the <b>configure</b> command. There is also a --enable-newline-is-lf option,
|
||||
which explicitly specifies linefeed as the newline character.
|
||||
<br>
|
||||
<br>
|
||||
Alternatively, you can specify that line endings are to be indicated by the two
|
||||
character sequence CRLF. If you want this, add
|
||||
<pre>
|
||||
--enable-newline-is-crlf
|
||||
</pre>
|
||||
to the <b>configure</b> command. Whatever line ending convention is selected
|
||||
when PCRE is built can be overridden when the library functions are called. At
|
||||
build time it is conventional to use the standard for your operating system.
|
||||
</P>
|
||||
<br><a name="SEC6" href="#TOC1">BUILDING SHARED AND STATIC LIBRARIES</a><br>
|
||||
<P>
|
||||
The PCRE building process uses <b>libtool</b> to build both shared and static
|
||||
Unix libraries by default. You can suppress one of these by adding one of
|
||||
<pre>
|
||||
--disable-shared
|
||||
--disable-static
|
||||
</pre>
|
||||
to the <b>configure</b> command, as required.
|
||||
</P>
|
||||
<br><a name="SEC7" href="#TOC1">POSIX MALLOC USAGE</a><br>
|
||||
<P>
|
||||
When PCRE is called through the POSIX interface (see the
|
||||
<a href="pcreposix.html"><b>pcreposix</b></a>
|
||||
documentation), additional working storage is required for holding the pointers
|
||||
to capturing substrings, because PCRE requires three integers per substring,
|
||||
whereas the POSIX interface provides only two. If the number of expected
|
||||
substrings is small, the wrapper function uses space on the stack, because this
|
||||
is faster than using <b>malloc()</b> for each call. The default threshold above
|
||||
which the stack is no longer used is 10; it can be changed by adding a setting
|
||||
such as
|
||||
<pre>
|
||||
--with-posix-malloc-threshold=20
|
||||
</pre>
|
||||
to the <b>configure</b> command.
|
||||
</P>
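<P>
The stack-versus-heap decision described above can be sketched as follows.
The names <b>DEMO_THRESHOLD</b> and <b>demo_workspace_source()</b> are
hypothetical and the real wrapper differs in detail; the sketch only shows how
a threshold of 10 selects between a fixed stack array and <b>malloc()</b> for
the three integers needed per substring.
</P>

```c
#include <stddef.h>
#include <stdlib.h>

#define DEMO_THRESHOLD 10   /* mirrors the documented default of 10 */

/* Hypothetical stand-in for a wrapper function: obtains nmatch*3 ints of
   working storage, from the stack when nmatch is at or below the threshold
   and from malloc() otherwise. Returns 1 if the heap was used, 0 if the
   stack was used, and -1 if malloc() failed. */
int demo_workspace_source(size_t nmatch)
{
    int small[DEMO_THRESHOLD * 3];
    int *ovector = small;
    int used_heap = 0;

    if (nmatch > DEMO_THRESHOLD) {
        ovector = malloc(nmatch * 3 * sizeof *ovector);
        if (ovector == NULL)
            return -1;
        used_heap = 1;
    }

    /* A real wrapper would run the match here and then copy two ints
       (start and end) per substring into the caller's POSIX array. */
    if (nmatch > 0)
        ovector[0] = -1;

    if (used_heap)
        free(ovector);
    return used_heap;
}
```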
|
||||
<br><a name="SEC8" href="#TOC1">HANDLING VERY LARGE PATTERNS</a><br>
|
||||
<P>
|
||||
Within a compiled pattern, offset values are used to point from one part to
|
||||
another (for example, from an opening parenthesis to an alternation
|
||||
metacharacter). By default, two-byte values are used for these offsets, leading
|
||||
to a maximum size for a compiled pattern of around 64K. This is sufficient to
|
||||
handle all but the most gigantic patterns. Nevertheless, some people do want to
|
||||
process enormous patterns, so it is possible to compile PCRE to use three-byte
|
||||
or four-byte offsets by adding a setting such as
|
||||
<pre>
|
||||
--with-link-size=3
|
||||
</pre>
|
||||
to the <b>configure</b> command. The value given must be 2, 3, or 4. Using
|
||||
longer offsets slows down the operation of PCRE because it has to load
|
||||
additional bytes when handling them.
|
||||
</P>
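<P>
The arithmetic behind the 64K figure can be sketched as below. The helper
names are hypothetical, and storing the most significant byte first is an
assumption of this sketch rather than a statement about PCRE's internal
layout. A two-byte link can hold offsets up to 65535, which is where the limit
of around 64K comes from; each extra byte means one more byte to store and
load per link.
</P>

```c
/* Store `value` in `link_size` bytes (2, 3, or 4), most significant byte
   first. The byte order is an assumption made for this sketch. */
void demo_put_link(unsigned char *p, int link_size, unsigned long value)
{
    int i;
    for (i = link_size - 1; i >= 0; i--) {
        p[i] = (unsigned char)(value & 0xffu);
        value >>= 8;
    }
}

/* Read back an offset written by demo_put_link(). */
unsigned long demo_get_link(const unsigned char *p, int link_size)
{
    unsigned long value = 0;
    int i;
    for (i = 0; i < link_size; i++)
        value = (value << 8) | p[i];
    return value;
}

/* Largest offset representable in link_size bytes. */
unsigned long demo_max_link(int link_size)
{
    return link_size >= 4 ? 0xfffffffful : (1ul << (8 * link_size)) - 1;
}
```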
|
||||
<P>
|
||||
If you build PCRE with an increased link size, test 2 (and test 5 if you are
|
||||
using UTF-8) will fail. Part of the output of these tests is a representation
|
||||
of the compiled pattern, and this changes with the link size.
|
||||
</P>
|
||||
<br><a name="SEC9" href="#TOC1">AVOIDING EXCESSIVE STACK USAGE</a><br>
|
||||
<P>
|
||||
When matching with the <b>pcre_exec()</b> function, PCRE implements backtracking
|
||||
by making recursive calls to an internal function called <b>match()</b>. In
|
||||
environments where the size of the stack is limited, this can severely limit
|
||||
PCRE's operation. (The Unix environment does not usually suffer from this
|
||||
problem, but it may sometimes be necessary to increase the maximum stack size.
|
||||
There is a discussion in the
|
||||
<a href="pcrestack.html"><b>pcrestack</b></a>
|
||||
documentation.) An alternative approach to recursion that uses memory from the
|
||||
heap to remember data, instead of using recursive function calls, has been
|
||||
implemented to work round the problem of limited stack size. If you want to
|
||||
build a version of PCRE that works this way, add
|
||||
<pre>
|
||||
--disable-stack-for-recursion
|
||||
</pre>
|
||||
to the <b>configure</b> command. With this configuration, PCRE will use the
|
||||
<b>pcre_stack_malloc</b> and <b>pcre_stack_free</b> variables to call memory
|
||||
management functions. Separate functions are provided because the usage is very
|
||||
predictable: the block sizes requested are always the same, and the blocks are
|
||||
always freed in reverse order. A calling program might be able to implement
|
||||
optimized functions that perform better than the standard <b>malloc()</b> and
|
||||
<b>free()</b> functions. PCRE runs noticeably more slowly when built in this
|
||||
way. This option affects only the <b>pcre_exec()</b> function; it is not
|
||||
relevant for the <b>pcre_dfa_exec()</b> function.
|
||||
</P>
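<P>
Because blocks are always freed in reverse order, a calling program could
serve them from a simple arena whose top moves up and down like a stack. The
following is a minimal sketch under stated assumptions: the <b>demo_</b> names
and the arena size are hypothetical, the arena is a single static buffer (so
it is not thread-safe), and a request that does not fit returns NULL instead
of falling back to <b>malloc()</b>.
</P>

```c
#include <stddef.h>

#define DEMO_ARENA_BYTES (64 * 1024)   /* arbitrary size for the sketch */
#define DEMO_ALIGN 16                  /* blocks start on 16-byte offsets */

/* The union gives the byte array an alignment suitable for common types. */
static union {
    unsigned char bytes[DEMO_ARENA_BYTES];
    long double ld;
    void *p;
    long l;
} demo_arena;

static size_t demo_top = 0;            /* offset of the first free byte */

/* Hand out the next block from the top of the arena, or NULL if full. */
void *demo_stack_malloc(size_t size)
{
    size_t need;
    void *block;

    if (size > DEMO_ARENA_BYTES)
        return NULL;
    need = (size + DEMO_ALIGN - 1) / DEMO_ALIGN * DEMO_ALIGN;
    if (need > DEMO_ARENA_BYTES - demo_top)
        return NULL;                   /* arena exhausted */
    block = demo_arena.bytes + demo_top;
    demo_top += need;
    return block;
}

/* Relies on strict reverse-order frees: releasing a block simply moves the
   top of the arena back to where that block began. */
void demo_stack_free(void *block)
{
    demo_top = (size_t)((unsigned char *)block - demo_arena.bytes);
}
```

<P>
With a library built using --disable-stack-for-recursion, such functions would
be installed by assigning them to the <b>pcre_stack_malloc</b> and
<b>pcre_stack_free</b> variables before matching.
</P>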
|
||||
<br><a name="SEC10" href="#TOC1">LIMITING PCRE RESOURCE USAGE</a><br>
|
||||
<P>
|
||||
Internally, PCRE has a function called <b>match()</b>, which it calls repeatedly
|
||||
(sometimes recursively) when matching a pattern with the <b>pcre_exec()</b>
|
||||
function. By controlling the maximum number of times this function may be
|
||||
called during a single matching operation, a limit can be placed on the
|
||||
resources used by a single call to <b>pcre_exec()</b>. The limit can be changed
|
||||
at run time, as described in the
|
||||
<a href="pcreapi.html"><b>pcreapi</b></a>
|
||||
documentation. The default is 10 million, but this can be changed by adding a
|
||||
setting such as
|
||||
<pre>
|
||||
--with-match-limit=500000
|
||||
</pre>
|
||||
to the <b>configure</b> command. This setting has no effect on the
|
||||
<b>pcre_dfa_exec()</b> matching function.
|
||||
</P>
|
||||
<P>
|
||||
In some environments it is desirable to limit the depth of recursive calls of
|
||||
<b>match()</b> more strictly than the total number of calls, in order to
|
||||
restrict the maximum amount of stack (or heap, if --disable-stack-for-recursion
|
||||
is specified) that is used. A second limit controls this; it defaults to the
|
||||
value that is set for --with-match-limit, which imposes no additional
|
||||
constraints. However, you can set a lower limit by adding, for example,
|
||||
<pre>
|
||||
--with-match-limit-recursion=10000
|
||||
</pre>
|
||||
to the <b>configure</b> command. This value can also be overridden at run time.
|
||||
</P>
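<P>
The difference between the two limits can be illustrated with a toy
backtracking matcher. This is not PCRE's <b>match()</b> function: every name
below is hypothetical, and the matcher understands only literal characters,
".", and a trailing "*". It counts every call against one limit and checks the
nesting depth against another, so a pattern that backtracks heavily can exceed
the call limit while staying shallow, and a long run of literals can exceed
the depth limit after very few calls.
</P>

```c
#include <stddef.h>

#define DEMO_NOMATCH     0
#define DEMO_MATCH       1
#define DEMO_ERR_CALLS (-1)   /* total number of calls exceeded */
#define DEMO_ERR_DEPTH (-2)   /* recursion depth exceeded */

typedef struct {
    unsigned long calls;        /* calls made so far */
    unsigned long call_limit;   /* plays the role of a match limit */
    unsigned long depth_limit;  /* plays the role of a recursion limit */
} demo_limits;

/* Match the whole of `text` against `pat`, recursing once per item. */
static int demo_match(const char *pat, const char *text,
                      demo_limits *lim, unsigned long depth)
{
    if (++lim->calls > lim->call_limit)
        return DEMO_ERR_CALLS;
    if (depth > lim->depth_limit)
        return DEMO_ERR_DEPTH;

    if (pat[0] == '\0')
        return *text == '\0' ? DEMO_MATCH : DEMO_NOMATCH;

    if (pat[1] == '*') {
        /* Greedy repeat: take as many characters as possible, then give
           them back one at a time until the rest of the pattern matches. */
        const char *t = text;
        while (*t != '\0' && (pat[0] == '.' || *t == pat[0]))
            t++;
        for (;;) {
            int rc = demo_match(pat + 2, t, lim, depth + 1);
            if (rc != DEMO_NOMATCH)
                return rc;          /* a match, or a limit error */
            if (t == text)
                return DEMO_NOMATCH;
            t--;
        }
    }

    if (*text != '\0' && (pat[0] == '.' || pat[0] == *text))
        return demo_match(pat + 1, text + 1, lim, depth + 1);
    return DEMO_NOMATCH;
}

/* Run one match under the given limits; optionally report calls used. */
int demo_run(const char *pat, const char *text,
             unsigned long call_limit, unsigned long depth_limit,
             unsigned long *calls_used)
{
    demo_limits lim;
    int rc;

    lim.calls = 0;
    lim.call_limit = call_limit;
    lim.depth_limit = depth_limit;
    rc = demo_match(pat, text, &lim, 0);
    if (calls_used != NULL)
        *calls_used = lim.calls;
    return rc;
}
```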
|
||||
<br><a name="SEC11" href="#TOC1">USING EBCDIC CODE</a><br>
|
||||
<P>
|
||||
PCRE assumes by default that it will run in an environment where the character
|
||||
code is ASCII (or Unicode, which is a superset of ASCII). PCRE can, however, be
|
||||
compiled to run in an EBCDIC environment by adding
|
||||
<pre>
|
||||
--enable-ebcdic
|
||||
</pre>
|
||||
to the <b>configure</b> command.
|
||||
</P>
|
||||
<P>
|
||||
Last updated: 06 June 2006
|
||||
<br>
|
||||
Copyright © 1997-2006 University of Cambridge.
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
186
libs/pcre/doc/html/pcrecallout.html
Normal file
@ -0,0 +1,186 @@
|
||||
<html>
|
||||
<head>
|
||||
<title>pcrecallout specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>pcrecallout man page</h1>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
||||
<p>
|
||||
This page is part of the PCRE HTML documentation. It was generated automatically
|
||||
from the original man page. If there is any nonsense in it, please consult the
|
||||
man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<ul>
|
||||
<li><a name="TOC1" href="#SEC1">PCRE CALLOUTS</a>
|
||||
<li><a name="TOC2" href="#SEC2">MISSING CALLOUTS</a>
|
||||
<li><a name="TOC3" href="#SEC3">THE CALLOUT INTERFACE</a>
|
||||
<li><a name="TOC4" href="#SEC4">RETURN VALUES</a>
|
||||
</ul>
|
||||
<br><a name="SEC1" href="#TOC1">PCRE CALLOUTS</a><br>
|
||||
<P>
|
||||
<b>int (*pcre_callout)(pcre_callout_block *);</b>
|
||||
</P>
|
||||
<P>
|
||||
PCRE provides a feature called "callout", which is a means of temporarily
|
||||
passing control to the caller of PCRE in the middle of pattern matching. The
|
||||
caller of PCRE provides an external function by putting its entry point in the
|
||||
global variable <i>pcre_callout</i>. By default, this variable contains NULL,
|
||||
which disables all calling out.
|
||||
</P>
|
||||
<P>
|
||||
Within a regular expression, (?C) indicates the points at which the external
|
||||
function is to be called. Different callout points can be identified by putting
|
||||
a number less than 256 after the letter C. The default value is zero.
|
||||
For example, this pattern has two callout points:
|
||||
<pre>
|
||||
(?C1)\deabc(?C2)def
|
||||
</pre>
|
||||
If the PCRE_AUTO_CALLOUT option bit is set when <b>pcre_compile()</b> is called,
|
||||
PCRE automatically inserts callouts, all with number 255, before each item in
|
||||
the pattern. For example, if PCRE_AUTO_CALLOUT is used with the pattern
|
||||
<pre>
|
||||
A(\d{2}|--)
|
||||
</pre>
|
||||
it is processed as if it were
|
||||
<br>
|
||||
<br>
|
||||
(?C255)A(?C255)((?C255)\d{2}(?C255)|(?C255)-(?C255)-(?C255))(?C255)
|
||||
<br>
|
||||
<br>
|
||||
Notice that there is a callout before and after each parenthesis and
|
||||
alternation bar. Automatic callouts can be used for tracking the progress of
|
||||
pattern matching. The
|
||||
<a href="pcretest.html"><b>pcretest</b></a>
|
||||
command has an option that sets automatic callouts; when it is used, the output
|
||||
indicates how the pattern is matched. This is useful information when you are
|
||||
trying to optimize the performance of a particular pattern.
|
||||
</P>
|
||||
<br><a name="SEC2" href="#TOC1">MISSING CALLOUTS</a><br>
|
||||
<P>
|
||||
You should be aware that, because of optimizations in the way PCRE matches
|
||||
patterns, callouts sometimes do not happen. For example, if the pattern is
|
||||
<pre>
|
||||
ab(?C4)cd
|
||||
</pre>
|
||||
PCRE knows that any matching string must contain the letter "d". If the subject
|
||||
string is "abyz", the lack of "d" means that matching doesn't ever start, and
|
||||
the callout is never reached. However, with "abyd", though the result is still
|
||||
no match, the callout is obeyed.
|
||||
</P>
|
||||
<br><a name="SEC3" href="#TOC1">THE CALLOUT INTERFACE</a><br>
|
||||
<P>
|
||||
During matching, when PCRE reaches a callout point, the external function
|
||||
defined by <i>pcre_callout</i> is called (if it is set). This applies to both
|
||||
the <b>pcre_exec()</b> and the <b>pcre_dfa_exec()</b> matching functions. The
|
||||
only argument to the callout function is a pointer to a <b>pcre_callout</b>
|
||||
block. This structure contains the following fields:
|
||||
<pre>
|
||||
int <i>version</i>;
|
||||
int <i>callout_number</i>;
|
||||
int *<i>offset_vector</i>;
|
||||
const char *<i>subject</i>;
|
||||
int <i>subject_length</i>;
|
||||
int <i>start_match</i>;
|
||||
int <i>current_position</i>;
|
||||
int <i>capture_top</i>;
|
||||
int <i>capture_last</i>;
|
||||
void *<i>callout_data</i>;
|
||||
int <i>pattern_position</i>;
|
||||
int <i>next_item_length</i>;
|
||||
</pre>
|
||||
The <i>version</i> field is an integer containing the version number of the
|
||||
block format. The initial version was 0; the current version is 1. The version
|
||||
number will change again in future if additional fields are added, but the
|
||||
intention is never to remove any of the existing fields.
|
||||
</P>
|
||||
<P>
|
||||
The <i>callout_number</i> field contains the number of the callout, as compiled
|
||||
into the pattern (that is, the number after ?C for manual callouts, and 255 for
|
||||
automatically generated callouts).
|
||||
</P>
|
||||
<P>
|
||||
The <i>offset_vector</i> field is a pointer to the vector of offsets that was
|
||||
passed by the caller to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>. When
|
||||
<b>pcre_exec()</b> is used, the contents can be inspected in order to extract
|
||||
substrings that have been matched so far, in the same way as for extracting
|
||||
substrings after a match has completed. For <b>pcre_dfa_exec()</b> this field is
|
||||
not useful.
|
||||
</P>
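<P>
A callout function might copy out a substring captured so far along the lines
of the sketch below. <b>demo_copy_capture()</b> is a hypothetical helper; it
assumes the usual layout in which group <i>n</i> occupies elements 2<i>n</i>
and 2<i>n</i>+1 of the offsets vector, and it treats a negative offset as
"not set". The <i>capture_top</i> argument corresponds to the field of that
name in the callout block.
</P>

```c
#include <stddef.h>
#include <string.h>

/* Copy capture group n, as seen from inside a callout, into buf.
   Returns the substring length, or -1 if n is out of range, the group is
   not set, or buf is too small to hold the text and a terminating zero. */
int demo_copy_capture(const char *subject, const int *offset_vector,
                      int capture_top, int n, char *buf, int buf_size)
{
    int start, end, len;

    if (n < 1 || n >= capture_top)
        return -1;
    start = offset_vector[2 * n];
    end = offset_vector[2 * n + 1];
    if (start < 0 || end < start)
        return -1;
    len = end - start;
    if (len >= buf_size)
        return -1;
    memcpy(buf, subject + start, (size_t)len);
    buf[len] = '\0';
    return len;
}
```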
|
||||
<P>
|
||||
The <i>subject</i> and <i>subject_length</i> fields contain copies of the values
|
||||
that were passed to <b>pcre_exec()</b>.
|
||||
</P>
|
||||
<P>
|
||||
The <i>start_match</i> field contains the offset within the subject at which the
|
||||
current match attempt started. If the pattern is not anchored, the callout
|
||||
function may be called several times from the same point in the pattern for
|
||||
different starting points in the subject.
|
||||
</P>
|
||||
<P>
|
||||
The <i>current_position</i> field contains the offset within the subject of the
|
||||
current match pointer.
|
||||
</P>
|
||||
<P>
|
||||
When the <b>pcre_exec()</b> function is used, the <i>capture_top</i> field
|
||||
contains one more than the number of the highest numbered captured substring so
|
||||
far. If no substrings have been captured, the value of <i>capture_top</i> is
|
||||
one. This is always the case when <b>pcre_dfa_exec()</b> is used, because it
|
||||
does not support captured substrings.
|
||||
</P>
|
||||
<P>
|
||||
The <i>capture_last</i> field contains the number of the most recently captured
|
||||
substring. If no substrings have been captured, its value is -1. This is always
|
||||
the case when <b>pcre_dfa_exec()</b> is used.
|
||||
</P>
|
||||
<P>
|
||||
The <i>callout_data</i> field contains a value that is passed to
|
||||
<b>pcre_exec()</b> or <b>pcre_dfa_exec()</b> specifically so that it can be
|
||||
passed back in callouts. It is passed in the <i>callout_data</i> field of the
|
||||
<b>pcre_extra</b> data structure. If no such data was passed, the value of
|
||||
<i>callout_data</i> in a <b>pcre_callout</b> block is NULL. There is a
|
||||
description of the <b>pcre_extra</b> structure in the
|
||||
<a href="pcreapi.html"><b>pcreapi</b></a>
|
||||
documentation.
|
||||
</P>
|
||||
<P>
|
||||
The <i>pattern_position</i> field is present from version 1 of the
|
||||
<i>pcre_callout</i> structure. It contains the offset to the next item to be
|
||||
matched in the pattern string.
|
||||
</P>
|
||||
<P>
|
||||
The <i>next_item_length</i> field is present from version 1 of the
|
||||
<i>pcre_callout</i> structure. It contains the length of the next item to be
|
||||
matched in the pattern string. When the callout immediately precedes an
|
||||
alternation bar, a closing parenthesis, or the end of the pattern, the length
|
||||
is zero. When the callout precedes an opening parenthesis, the length is that
|
||||
of the entire subpattern.
|
||||
</P>
|
||||
<P>
|
||||
The <i>pattern_position</i> and <i>next_item_length</i> fields are intended to
|
||||
help in distinguishing between different automatic callouts, which all have the
|
||||
same callout number. However, they are set for all callouts.
|
||||
</P>
|
||||
<br><a name="SEC4" href="#TOC1">RETURN VALUES</a><br>
|
||||
<P>
|
||||
The external callout function returns an integer to PCRE. If the value is zero,
|
||||
matching proceeds as normal. If the value is greater than zero, matching fails
|
||||
at the current point, but the testing of other matching possibilities goes
|
||||
ahead, just as if a lookahead assertion had failed. If the value is less than
|
||||
zero, the match is abandoned, and <b>pcre_exec()</b> (or <b>pcre_dfa_exec()</b>)
|
||||
returns the negative value.
|
||||
</P>
|
||||
<P>
|
||||
Negative values should normally be chosen from the set of PCRE_ERROR_xxx
|
||||
values. In particular, PCRE_ERROR_NOMATCH forces a standard "no match" failure.
|
||||
The error number PCRE_ERROR_CALLOUT is reserved for use by callout functions;
|
||||
it will never be used by PCRE itself.
|
||||
</P>
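<P>
The three kinds of return value can be sketched with a callout that applies a
small policy. To keep the example compilable without <i>pcre.h</i>, it
declares a local mirror of the block fields listed earlier; real code would
use the structure from <i>pcre.h</i> instead. The <b>demo_policy</b> type, its
fields, and the value given to <b>DEMO_ERROR_CALLOUT</b> are assumptions of
this sketch.
</P>

```c
#include <stddef.h>

/* Local mirror of the documented fields, so the sketch needs no pcre.h. */
typedef struct {
    int version;
    int callout_number;
    int *offset_vector;
    const char *subject;
    int subject_length;
    int start_match;
    int current_position;
    int capture_top;
    int capture_last;
    void *callout_data;
    int pattern_position;
    int next_item_length;
} demo_callout_block;

/* Hypothetical policy handed to the callout through callout_data. */
typedef struct {
    int max_position;   /* fail any branch that scans past this offset */
    int abort_after;    /* abandon the match after this many callouts */
    int seen;           /* callouts seen so far */
} demo_policy;

/* Stand-in for PCRE_ERROR_CALLOUT; the numeric value is an assumption. */
#define DEMO_ERROR_CALLOUT (-9)

int demo_callout(demo_callout_block *block)
{
    demo_policy *policy = (demo_policy *)block->callout_data;

    if (policy == NULL)
        return 0;                     /* zero: matching proceeds as normal */
    if (++policy->seen > policy->abort_after)
        return DEMO_ERROR_CALLOUT;    /* negative: the match is abandoned */
    if (block->current_position > policy->max_position)
        return 1;                     /* positive: fail here and backtrack */
    return 0;
}
```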
|
||||
<P>
|
||||
Last updated: 28 February 2005
|
||||
<br>
|
||||
Copyright © 1997-2005 University of Cambridge.
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
156
libs/pcre/doc/html/pcrecompat.html
Normal file
@ -0,0 +1,156 @@
|
||||
<html>
|
||||
<head>
|
||||
<title>pcrecompat specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>pcrecompat man page</h1>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
||||
<p>
|
||||
This page is part of the PCRE HTML documentation. It was generated automatically
|
||||
from the original man page. If there is any nonsense in it, please consult the
|
||||
man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<br><b>
|
||||
DIFFERENCES BETWEEN PCRE AND PERL
|
||||
</b><br>
|
||||
<P>
|
||||
This document describes the differences in the ways that PCRE and Perl handle
|
||||
regular expressions. The differences described here are with respect to Perl
|
||||
5.8.
|
||||
</P>
|
||||
<P>
|
||||
1. PCRE has only a subset of Perl's UTF-8 and Unicode support. Details of what
|
||||
it does have are given in the
|
||||
<a href="pcre.html#utf8support">section on UTF-8 support</a>
|
||||
in the main
|
||||
<a href="pcre.html"><b>pcre</b></a>
|
||||
page.
|
||||
</P>
|
||||
<P>
|
||||
2. PCRE does not allow repeat quantifiers on lookahead assertions. Perl permits
|
||||
them, but they do not mean what you might think. For example, (?!a){3} does
|
||||
not assert that the next three characters are not "a". It just asserts that the
|
||||
next character is not "a" three times.
|
||||
</P>
|
||||
<P>
|
||||
3. Capturing subpatterns that occur inside negative lookahead assertions are
|
||||
counted, but their entries in the offsets vector are never set. Perl sets its
|
||||
numerical variables from any such patterns that are matched before the
|
||||
assertion fails to match something (thereby succeeding), but only if the
|
||||
negative lookahead assertion contains just one branch.
|
||||
</P>
|
||||
<P>
|
||||
4. Though binary zero characters are supported in the subject string, they are
|
||||
not allowed in a pattern string because it is passed as a normal C string,
|
||||
terminated by zero. The escape sequence \0 can be used in the pattern to
|
||||
represent a binary zero.
|
||||
</P>
|
||||
<P>
|
||||
5. The following Perl escape sequences are not supported: \l, \u, \L,
|
||||
\U, and \N. In fact these are implemented by Perl's general string-handling
|
||||
and are not part of its pattern matching engine. If any of these are
|
||||
encountered by PCRE, an error is generated.
|
||||
</P>
|
||||
<P>
|
||||
6. The Perl escape sequences \p, \P, and \X are supported only if PCRE is
|
||||
built with Unicode character property support. The properties that can be
|
||||
tested with \p and \P are limited to the general category properties such as
|
||||
Lu and Nd, script names such as Greek or Han, and the derived properties Any
|
||||
and L&amp;.
|
||||
</P>
|
||||
<P>
|
||||
7. PCRE does support the \Q...\E escape for quoting substrings. Characters in
|
||||
between are treated as literals. This is slightly different from Perl in that $
|
||||
and @ are also handled as literals inside the quotes. In Perl, they cause
|
||||
variable interpolation (but of course PCRE does not have variables). Note the
|
||||
following examples:
|
||||
<pre>
|
||||
  Pattern            PCRE matches    Perl matches

  \Qabc$xyz\E        abc$xyz         abc followed by the contents of $xyz
  \Qabc\$xyz\E       abc\$xyz        abc\$xyz
  \Qabc\E\$\Qxyz\E   abc$xyz         abc$xyz
|
||||
</pre>
|
||||
The \Q...\E sequence is recognized both inside and outside character classes.
|
||||
</P>
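<P>
Since everything between \Q and \E is literal in PCRE, a program can quote an
arbitrary string before embedding it in a pattern. The sketch below is one
possible approach, not a PCRE function: <b>demo_quote_literal()</b> is a
hypothetical helper. The only sequence needing care is a literal \E inside the
text, which the sketch handles by closing the quote, emitting an escaped
backslash followed by E, and reopening the quote.
</P>

```c
#include <stddef.h>
#include <string.h>

/* Append n bytes of s to out, keeping out zero-terminated.
   Returns 1 on success, 0 if the result would not fit. */
static int demo_append(char *out, size_t out_size, size_t *used,
                       const char *s, size_t n)
{
    if (n >= out_size - *used)
        return 0;                      /* no room for text plus the zero */
    memcpy(out + *used, s, n);
    *used += n;
    out[*used] = '\0';
    return 1;
}

/* Wrap `literal` in \Q...\E so that it is matched character for character.
   Returns the length written, or -1 if `out` is too small. */
int demo_quote_literal(const char *literal, char *out, size_t out_size)
{
    size_t used = 0;
    const char *p = literal;

    if (!demo_append(out, out_size, &used, "\\Q", 2))
        return -1;
    while (*p != '\0') {
        const char *e = strstr(p, "\\E");
        size_t run = (e != NULL) ? (size_t)(e - p) : strlen(p);

        if (!demo_append(out, out_size, &used, p, run))
            return -1;
        if (e == NULL)
            break;
        /* Close the quote, match a backslash and an E, reopen the quote. */
        if (!demo_append(out, out_size, &used, "\\E\\\\E\\Q", 7))
            return -1;
        p = e + 2;
    }
    if (!demo_append(out, out_size, &used, "\\E", 2))
        return -1;
    return (int)used;
}
```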
|
||||
<P>
|
||||
8. Fairly obviously, PCRE does not support the (?{code}) and (?p{code})
|
||||
constructions. However, there is support for recursive patterns using the
|
||||
non-Perl items (?R), (?number), and (?P>name). Also, the PCRE "callout" feature
|
||||
allows an external function to be called during pattern matching. See the
|
||||
<a href="pcrecallout.html"><b>pcrecallout</b></a>
|
||||
documentation for details.
|
||||
</P>
|
||||
<P>
|
||||
9. There are some differences that are concerned with the settings of captured
|
||||
strings when part of a pattern is repeated. For example, matching "aba" against
|
||||
the pattern /^(a(b)?)+$/ in Perl leaves $2 unset, but in PCRE it is set to "b".
|
||||
</P>
|
||||
<P>
|
||||
10. PCRE provides some extensions to the Perl regular expression facilities:
|
||||
<br>
|
||||
<br>
|
||||
(a) Although lookbehind assertions must match fixed length strings, each
|
||||
alternative branch of a lookbehind assertion can match a different length of
|
||||
string. Perl requires them all to have the same length.
|
||||
<br>
|
||||
<br>
|
||||
(b) If PCRE_DOLLAR_ENDONLY is set and PCRE_MULTILINE is not set, the $
|
||||
meta-character matches only at the very end of the string.
|
||||
<br>
|
||||
<br>
|
||||
(c) If PCRE_EXTRA is set, a backslash followed by a letter with no special
|
||||
meaning is faulted. Otherwise, like Perl, the backslash is ignored. (Perl can
|
||||
be made to issue a warning.)
|
||||
<br>
|
||||
<br>
|
||||
(d) If PCRE_UNGREEDY is set, the greediness of the repetition quantifiers is
|
||||
inverted, that is, by default they are not greedy, but if followed by a
|
||||
question mark they are.
|
||||
<br>
|
||||
<br>
|
||||
(e) PCRE_ANCHORED can be used at matching time to force a pattern to be tried
|
||||
only at the first matching position in the subject string.
|
||||
<br>
|
||||
<br>
|
||||
(f) The PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, and PCRE_NO_AUTO_CAPTURE
|
||||
options for <b>pcre_exec()</b> have no Perl equivalents.
|
||||
<br>
|
||||
<br>
|
||||
(g) The (?R), (?number), and (?P>name) constructs allow for recursive pattern
|
||||
matching. (Perl can do this using the (?p{code}) construct, which PCRE cannot
|
||||
support.)
|
||||
<br>
|
||||
<br>
|
||||
(h) PCRE supports named capturing substrings, using the Python syntax.
|
||||
<br>
|
||||
<br>
|
||||
(i) PCRE supports the possessive quantifier "++" syntax, taken from Sun's Java
|
||||
package.
|
||||
<br>
|
||||
<br>
|
||||
(j) The (R) condition, for testing recursion, is a PCRE extension.
|
||||
<br>
|
||||
<br>
|
||||
(k) The callout facility is PCRE-specific.
|
||||
<br>
|
||||
<br>
|
||||
(l) The partial matching facility is PCRE-specific.
|
||||
<br>
|
||||
<br>
|
||||
(m) Patterns compiled by PCRE can be saved and re-used at a later time, even on
|
||||
different hosts that have the other endianness.
|
||||
<br>
|
||||
<br>
|
||||
(n) The alternative matching function (<b>pcre_dfa_exec()</b>) matches in a
|
||||
different way and is not Perl-compatible.
|
||||
</P>
|
||||
<P>
|
||||
Last updated: 06 June 2006
|
||||
<br>
|
||||
Copyright © 1997-2006 University of Cambridge.
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
337
libs/pcre/doc/html/pcrecpp.html
Normal file
@ -0,0 +1,337 @@
|
||||
<html>
|
||||
<head>
|
||||
<title>pcrecpp specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>pcrecpp man page</h1>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
||||
<p>
|
||||
This page is part of the PCRE HTML documentation. It was generated automatically
|
||||
from the original man page. If there is any nonsense in it, please consult the
|
||||
man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<ul>
|
||||
<li><a name="TOC1" href="#SEC1">SYNOPSIS OF C++ WRAPPER</a>
|
||||
<li><a name="TOC2" href="#SEC2">DESCRIPTION</a>
|
||||
<li><a name="TOC3" href="#SEC3">MATCHING INTERFACE</a>
|
||||
<li><a name="TOC4" href="#SEC4">PARTIAL MATCHES</a>
|
||||
<li><a name="TOC5" href="#SEC5">UTF-8 AND THE MATCHING INTERFACE</a>
|
||||
<li><a name="TOC6" href="#SEC6">PASSING MODIFIERS TO THE REGULAR EXPRESSION ENGINE</a>
|
||||
<li><a name="TOC7" href="#SEC7">SCANNING TEXT INCREMENTALLY</a>
|
||||
<li><a name="TOC8" href="#SEC8">PARSING HEX/OCTAL/C-RADIX NUMBERS</a>
|
||||
<li><a name="TOC9" href="#SEC9">REPLACING PARTS OF STRINGS</a>
|
||||
<li><a name="TOC10" href="#SEC10">AUTHOR</a>
|
||||
</ul>
|
||||
<br><a name="SEC1" href="#TOC1">SYNOPSIS OF C++ WRAPPER</a><br>
|
||||
<P>
|
||||
<b>#include &lt;pcrecpp.h&gt;</b>
|
||||
</P>
|
||||
<P>
|
||||
</P>
|
||||
<br><a name="SEC2" href="#TOC1">DESCRIPTION</a><br>
|
||||
<P>
|
||||
The C++ wrapper for PCRE was provided by Google Inc. Some additional
|
||||
functionality was added by Giuseppe Maxia. This brief man page was constructed
|
||||
from the notes in the <i>pcrecpp.h</i> file, which should be consulted for
|
||||
further details.
|
||||
</P>
|
||||
<br><a name="SEC3" href="#TOC1">MATCHING INTERFACE</a><br>
|
||||
<P>
|
||||
The "FullMatch" operation checks that supplied text matches a supplied pattern
|
||||
exactly. If pointer arguments are supplied, it copies matched sub-strings that
|
||||
match sub-patterns into them.
|
||||
<pre>
|
||||
Example: successful match
|
||||
pcrecpp::RE re("h.*o");
|
||||
re.FullMatch("hello");
|
||||
|
||||
Example: unsuccessful match (requires full match):
|
||||
pcrecpp::RE re("e");
|
||||
!re.FullMatch("hello");
|
||||
|
||||
Example: creating a temporary RE object:
|
||||
pcrecpp::RE("h.*o").FullMatch("hello");
|
||||
</pre>
|
||||
You can pass in a "const char*" or a "string" for "text". The examples below
|
||||
tend to use a const char*. You can, as in the different examples above, store
|
||||
the RE object explicitly in a variable or use a temporary RE object. The
|
||||
examples below use one mode or the other arbitrarily. Either could correctly be
|
||||
used for any of these examples.
|
||||
</P>
|
||||
<P>
|
||||
You must supply extra pointer arguments to extract matched subpieces.
|
||||
<pre>
|
||||
Example: extracts "ruby" into "s" and 1234 into "i"
|
||||
int i;
|
||||
string s;
|
||||
pcrecpp::RE re("(\\w+):(\\d+)");
|
||||
re.FullMatch("ruby:1234", &s, &i);
|
||||
|
||||
Example: does not try to extract any extra sub-patterns
|
||||
re.FullMatch("ruby:1234", &s);
|
||||
|
||||
Example: does not try to extract into NULL
|
||||
re.FullMatch("ruby:1234", NULL, &i);
|
||||
|
||||
Example: integer overflow causes failure
|
||||
!re.FullMatch("ruby:1234567891234", NULL, &i);
|
||||
|
||||
Example: fails because there aren't enough sub-patterns:
|
||||
!pcrecpp::RE("\\w+:\\d+").FullMatch("ruby:1234", &s);
|
||||
|
||||
Example: fails because string cannot be stored in integer
|
||||
!pcrecpp::RE("(.*)").FullMatch("ruby", &i);
|
||||
</pre>
|
||||
The provided pointer arguments can be pointers to any scalar numeric
|
||||
type, or one of:
|
||||
<pre>
|
||||
string (matched piece is copied to string)
|
||||
StringPiece (StringPiece is mutated to point to matched piece)
|
||||
T (where "bool T::ParseFrom(const char*, int)" exists)
|
||||
NULL (the corresponding matched sub-pattern is not copied)
|
||||
</pre>
|
||||
The function returns true iff all of the following conditions are satisfied:
|
||||
<pre>
|
||||
a. "text" matches "pattern" exactly;
|
||||
|
||||
b. The number of matched sub-patterns is >= number of supplied
|
||||
pointers;
|
||||
|
||||
c. The "i"th argument has a suitable type for holding the
|
||||
string captured as the "i"th sub-pattern. If you pass in
|
||||
NULL for the "i"th argument, or pass fewer arguments than
|
||||
number of sub-patterns, "i"th captured sub-pattern is
|
||||
ignored.
|
||||
</pre>
|
||||
The matching interface supports at most 16 arguments per call.
|
||||
If you need more, consider using the more general interface
|
||||
<b>pcrecpp::RE::DoMatch</b>. See <b>pcrecpp.h</b> for the signature for
|
||||
<b>DoMatch</b>.
|
||||
</P>
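<P>
The "integer overflow causes failure" case above can be pictured with a
strict conversion routine such as the following C sketch. It is not the
wrapper's own code, which may differ; <b>demo_parse_int()</b> is a
hypothetical helper showing one way a captured piece can be rejected when it
is not entirely numeric or does not fit in an <b>int</b>.
</P>

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Convert exactly `len` bytes of `text` to an int. Returns 1 on success,
   and 0 for empty input, trailing non-digits, or a value out of range. */
int demo_parse_int(const char *text, int len, int *out)
{
    char buf[32];
    char *end;
    long value;

    if (len <= 0 || len >= (int)sizeof buf)
        return 0;
    memcpy(buf, text, (size_t)len);    /* the piece is not zero-terminated */
    buf[len] = '\0';

    errno = 0;
    value = strtol(buf, &end, 10);
    if (end != buf + len)              /* not every character was consumed */
        return 0;
    if (errno == ERANGE || value > INT_MAX || value < INT_MIN)
        return 0;
    *out = (int)value;
    return 1;
}
```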
|
||||
<br><a name="SEC4" href="#TOC1">PARTIAL MATCHES</a><br>
|
||||
<P>
|
||||
You can use the "PartialMatch" operation when you want the pattern
|
||||
to match any substring of the text.
|
||||
<pre>
|
||||
Example: simple search for a string:
|
||||
pcrecpp::RE("ell").PartialMatch("hello");
|
||||
|
||||
Example: find first number in a string:
|
||||
int number;
|
||||
pcrecpp::RE re("(\\d+)");
|
||||
re.PartialMatch("x*100 + 20", &number);
|
||||
assert(number == 100);
|
||||
</PRE>
|
||||
</P>
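<P>
For readers without pcrecpp at hand, the difference between a full and a partial match can be sketched with std::regex, which is a different engine with its own syntax. The function names below are hypothetical stand-ins, assuming std::regex_match corresponds roughly to FullMatch and std::regex_search to PartialMatch.
</P>

```cpp
#include <regex>
#include <string>

// Stand-in for FullMatch: the pattern must account for the entire text.
bool MatchesWhole(const std::string& pattern, const std::string& text) {
  return std::regex_match(text, std::regex(pattern));
}

// Stand-in for PartialMatch: the pattern may match any substring.
bool MatchesAnywhere(const std::string& pattern, const std::string& text) {
  return std::regex_search(text, std::regex(pattern));
}

// Stores the first run of digits found in `text` into *number.
bool FirstNumber(const std::string& text, int* number) {
  static const std::regex digits("(\\d+)");
  std::smatch m;
  if (!std::regex_search(text, m, digits)) return false;
  *number = std::stoi(m[1].str());  // throws std::out_of_range on overflow
  return true;
}
```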
|
||||
<br><a name="SEC5" href="#TOC1">UTF-8 AND THE MATCHING INTERFACE</a><br>
|
||||
<P>
|
||||
By default, pattern and text are plain text, one byte per character. The UTF8
|
||||
flag, passed to the constructor, causes both pattern and string to be treated
|
||||
as UTF-8 text, still a byte stream but potentially multiple bytes per
|
||||
character. In practice, the text is likelier to be UTF-8 than the pattern, but
|
||||
the match returned may depend on the UTF8 flag, so always use it when matching
|
||||
UTF-8 text. For example, "." will match one byte normally but with UTF8 set
|
||||
matches one whole character, which may occupy several bytes.
|
||||
<pre>
|
||||
Example:
|
||||
pcrecpp::RE_Options options;
|
||||
options.set_utf8();
|
||||
pcrecpp::RE re(utf8_pattern, options);
|
||||
re.FullMatch(utf8_string);
|
||||
|
||||
Example: using the convenience function UTF8():
|
||||
pcrecpp::RE re(utf8_pattern, pcrecpp::UTF8());
|
||||
re.FullMatch(utf8_string);
|
||||
</pre>
|
||||
NOTE: The UTF8 flag is ignored if pcre was not configured with the
|
||||
<pre>
|
||||
--enable-utf8 flag.
|
||||
</PRE>
|
||||
</P>
|
||||
<br><a name="SEC6" href="#TOC1">PASSING MODIFIERS TO THE REGULAR EXPRESSION ENGINE</a><br>
|
||||
<P>
|
||||
PCRE defines some modifiers to change the behavior of the regular expression
|
||||
engine. The C++ wrapper defines an auxiliary class, RE_Options, as a vehicle to
|
||||
pass such modifiers to a RE class. Currently, the following modifiers are
|
||||
supported:
|
||||
<pre>
|
||||
modifier              description                Perl corresponding

PCRE_CASELESS         case insensitive match     /i
PCRE_MULTILINE        multiple lines match       /m
PCRE_DOTALL           dot matches newlines       /s
PCRE_DOLLAR_ENDONLY   $ matches only at end      N/A
PCRE_EXTRA            strict escape parsing      N/A
PCRE_EXTENDED         ignore whitespaces         /x
PCRE_UTF8             handles UTF8 chars         built-in
PCRE_UNGREEDY         reverses * and *?          N/A
PCRE_NO_AUTO_CAPTURE  disables capturing parens  N/A (*)
|
||||
</pre>
|
||||
(*) Both Perl and PCRE allow non capturing parentheses by means of the
|
||||
"?:" modifier within the pattern itself. e.g. (?:ab|cd) does not
|
||||
capture, while (ab|cd) does.
|
||||
</P>
|
||||
<P>
|
||||
For a full account of how each modifier works, please check the
|
||||
PCRE API reference page.
|
||||
</P>
|
||||
<P>
|
||||
For each modifier, there are two member functions whose name is made
|
||||
out of the modifier in lowercase, without the "PCRE_" prefix. For
|
||||
instance, PCRE_CASELESS is handled by
|
||||
<pre>
|
||||
bool caseless()
|
||||
</pre>
|
||||
which returns true if the modifier is set, and
|
||||
<pre>
|
||||
RE_Options & set_caseless(bool)
|
||||
</pre>
|
||||
which sets or unsets the modifier. Moreover, PCRE_EXTRA_MATCH_LIMIT can be
|
||||
accessed through the <b>set_match_limit()</b> and <b>match_limit()</b> member
|
||||
functions. Setting <i>match_limit</i> to a non-zero value will limit the
|
||||
execution of pcre to keep it from doing bad things like blowing the stack or
|
||||
taking an eternity to return a result. A value of 5000 is good enough to stop
|
||||
stack blowup in a 2MB thread stack. Setting <i>match_limit</i> to zero disables
|
||||
match limiting. Alternatively, you can call <b>match_limit_recursion()</b>
|
||||
which uses PCRE_EXTRA_MATCH_LIMIT_RECURSION to limit how much PCRE
|
||||
recurses. <b>match_limit()</b> limits the number of matches PCRE does;
|
||||
<b>match_limit_recursion()</b> limits the depth of internal recursion, and
|
||||
therefore the amount of stack that is used.
|
||||
</P>
|
||||
<P>
|
||||
Normally, to pass one or more modifiers to a RE class, you declare
|
||||
a <i>RE_Options</i> object, set the appropriate options, and pass this
|
||||
object to a RE constructor. Example:
|
||||
<pre>
|
||||
RE_Options opt;
|
||||
opt.set_caseless(true);
|
||||
if (RE("HELLO", opt).PartialMatch("hello world")) ...
|
||||
</pre>
|
||||
RE_Options has two constructors. The default constructor takes no arguments and
|
||||
creates a set of flags that are off by default. The optional parameter
|
||||
<i>option_flags</i> is to facilitate transfer of legacy code from C programs.
|
||||
This lets you do
|
||||
<pre>
|
||||
RE(pattern,
|
||||
RE_Options(PCRE_CASELESS|PCRE_MULTILINE)).PartialMatch(str);
|
||||
</pre>
|
||||
However, new code is better off doing
|
||||
<pre>
|
||||
RE(pattern,
|
||||
RE_Options().set_caseless(true).set_multiline(true))
|
||||
.PartialMatch(str);
|
||||
</pre>
|
||||
If you are going to pass one of the most used modifiers, there are some
|
||||
convenience functions that return a RE_Options class with the
|
||||
appropriate modifier already set: <b>CASELESS()</b>, <b>UTF8()</b>,
|
||||
<b>MULTILINE()</b>, <b>DOTALL()</b>, and <b>EXTENDED()</b>.
|
||||
</P>
|
||||
<P>
|
||||
If you need to set several options at once, and you don't want to go through
|
||||
the pains of declaring a RE_Options object and setting several options, there
|
||||
is a parallel method that gives you such ability on the fly. You can concatenate
|
||||
several <b>set_xxxxx()</b> member functions, since each of them returns a
|
||||
reference to its class object. For example, to pass PCRE_CASELESS,
|
||||
PCRE_EXTENDED, and PCRE_MULTILINE to a RE with one statement, you may write:
|
||||
<pre>
|
||||
RE(" ^ xyz \\s+ .* blah$",
|
||||
RE_Options()
|
||||
.set_caseless(true)
|
||||
.set_extended(true)
|
||||
.set_multiline(true)).PartialMatch(sometext);
|
||||
|
||||
</PRE>
|
||||
</P>
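<P>
Chaining works because each setter returns a reference to the object it was called on. The class below is a hypothetical stand-in written for illustration, not the real RE_Options; its flag values are made up and are not the PCRE_* constants.
</P>

```cpp
// Hypothetical stand-in for RE_Options. The flag values are illustrative
// only and do not correspond to the real PCRE_* constants.
class SketchOptions {
 public:
  enum Flag { kCaseless = 1, kMultiline = 2, kExtended = 4 };

  SketchOptions() : flags_(0) {}                       // all flags off
  explicit SketchOptions(int flags) : flags_(flags) {} // legacy-style bitmask

  bool caseless() const { return (flags_ & kCaseless) != 0; }
  bool multiline() const { return (flags_ & kMultiline) != 0; }
  bool extended() const { return (flags_ & kExtended) != 0; }

  // Each setter returns *this so that calls can be chained.
  SketchOptions& set_caseless(bool on) { return Set(kCaseless, on); }
  SketchOptions& set_multiline(bool on) { return Set(kMultiline, on); }
  SketchOptions& set_extended(bool on) { return Set(kExtended, on); }

  int flags() const { return flags_; }

 private:
  SketchOptions& Set(Flag flag, bool on) {
    if (on) {
      flags_ |= flag;
    } else {
      flags_ &= ~flag;
    }
    return *this;
  }

  int flags_;
};
```

With this shape, SketchOptions().set_caseless(true).set_extended(true) builds the option set in a single expression, which is the same pattern as the RE_Options example above.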
|
||||
<br><a name="SEC7" href="#TOC1">SCANNING TEXT INCREMENTALLY</a><br>
|
||||
<P>
|
||||
The "Consume" operation may be useful if you want to repeatedly
|
||||
match regular expressions at the front of a string and skip over
|
||||
them as they match. This requires use of the "StringPiece" type,
|
||||
which represents a sub-range of a real string. Like RE, StringPiece
|
||||
is defined in the pcrecpp namespace.
|
||||
<pre>
|
||||
Example: read lines of the form "var = value" from a string.
|
||||
string contents = ...; // Fill string somehow
|
||||
pcrecpp::StringPiece input(contents); // Wrap in a StringPiece
|
||||
</PRE>
|
||||
</P>
|
||||
<P>
|
||||
<pre>
|
||||
string var;
|
||||
int value;
|
||||
pcrecpp::RE re("(\\w+) = (\\d+)\n");
|
||||
while (re.Consume(&input, &var, &value)) {
|
||||
...;
|
||||
}
|
||||
</pre>
|
||||
Each successful call to "Consume" will set "var/value", and also
|
||||
advance "input" so it points past the matched text.
|
||||
</P>
|
||||
<P>
|
||||
The "FindAndConsume" operation is similar to "Consume" but does not
|
||||
anchor your match at the beginning of the string. For example, you
|
||||
could extract all words from a string by repeatedly calling
|
||||
<pre>
|
||||
pcrecpp::RE("(\\w+)").FindAndConsume(&input, &word)
|
||||
</PRE>
|
||||
</P>
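<P>
The consume-and-advance idea can be sketched with std::regex in place of pcrecpp. ConsumeVarValue is a hypothetical function; it assumes that anchoring the search at the front of the input (match_continuous) and then dropping the matched prefix is a fair approximation of "Consume". Unlike StringPiece, which only moves a pointer, this version copies the remaining text on every call.
</P>

```cpp
#include <regex>
#include <string>

// Hypothetical stand-in for RE::Consume on lines of the form "var = value".
// On success, stores the two captures and removes the matched text from the
// front of *input. On failure, *input is left unchanged.
bool ConsumeVarValue(std::string* input, std::string* var, int* value) {
  static const std::regex re("(\\w+) = (\\d+)\n");
  std::smatch m;
  // match_continuous: the match must start at the first character.
  if (!std::regex_search(*input, m, re,
                         std::regex_constants::match_continuous)) {
    return false;
  }
  *var = m[1].str();
  *value = std::stoi(m[2].str());
  input->erase(0, m.length(0));  // advance past the matched text
  return true;
}
```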
|
||||
<br><a name="SEC8" href="#TOC1">PARSING HEX/OCTAL/C-RADIX NUMBERS</a><br>
|
||||
<P>
|
||||
By default, if you pass a pointer to a numeric value, the
|
||||
corresponding text is interpreted as a base-10 number. You can
|
||||
instead wrap the pointer with a call to one of the operators Hex(),
|
||||
Octal(), or CRadix() to interpret the text in another base. The
|
||||
CRadix operator interprets C-style "0" (base-8) and "0x" (base-16)
|
||||
prefixes, but defaults to base-10.
|
||||
<pre>
|
||||
Example:
|
||||
int a, b, c, d;
|
||||
pcrecpp::RE re("(.*) (.*) (.*) (.*)");
|
||||
re.FullMatch("100 40 0100 0x40",
|
||||
pcrecpp::Octal(&a), pcrecpp::Hex(&b),
|
||||
pcrecpp::CRadix(&c), pcrecpp::CRadix(&d));
|
||||
</pre>
|
||||
will leave 64 in a, b, c, and d.
|
||||
</P>
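<P>
The arithmetic behind the example can be checked with std::strtol, whose base argument behaves in a comparable way. This is only an analogy, assuming base 8 stands in for Octal(), base 16 for Hex(), and base 0 (prefix detection) for CRadix(); ParseInBase is a hypothetical helper.
</P>

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: interprets `text` in the given base.
// Base 0 makes strtol honour C-style prefixes: "0" means octal,
// "0x" means hexadecimal, and anything else is decimal.
long ParseInBase(const std::string& text, int base) {
  return std::strtol(text.c_str(), nullptr, base);
}
```

Octal 100, hexadecimal 40, C-style 0100 and C-style 0x40 all denote the decimal value 64, which is why a, b, c, and d end up equal.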
|
||||
<br><a name="SEC9" href="#TOC1">REPLACING PARTS OF STRINGS</a><br>
|
||||
<P>
|
||||
You can replace the first match of "pattern" in "str" with "rewrite".
|
||||
Within "rewrite", backslash-escaped digits (\1 to \9) can be
|
||||
used to insert text matching corresponding parenthesized group
|
||||
from the pattern. \0 in "rewrite" refers to the entire matching
|
||||
text. For example:
|
||||
<pre>
|
||||
string s = "yabba dabba doo";
|
||||
pcrecpp::RE("b+").Replace("d", &s);
|
||||
</pre>
|
||||
will leave "s" containing "yada dabba doo". The result is true if the pattern
|
||||
matches and a replacement occurs, false otherwise.
|
||||
</P>
|
||||
<P>
|
||||
<b>GlobalReplace</b> is like <b>Replace</b> except that it replaces all
|
||||
occurrences of the pattern in the string with the rewrite. Replacements are
|
||||
not subject to re-matching. For example:
|
||||
<pre>
|
||||
string s = "yabba dabba doo";
|
||||
pcrecpp::RE("b+").GlobalReplace("d", &s);
|
||||
</pre>
|
||||
will leave "s" containing "yada dada doo". It returns the number of
|
||||
replacements made.
|
||||
</P>
|
||||
<P>
|
||||
<b>Extract</b> is like <b>Replace</b>, except that if the pattern matches,
|
||||
"rewrite" is copied into "out" (an additional argument) with substitutions.
|
||||
The non-matching portions of "text" are ignored. Returns true iff a match
|
||||
occurred and the extraction happened successfully; if no match occurs, the
|
||||
string is left unaffected.
|
||||
</P>
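<P>
The rewrite rules can be sketched on top of std::regex. The functions below are hypothetical and are not how pcrecpp implements Replace or GlobalReplace; they only show the \0 to \9 substitution and the "no re-matching of replacements" rule under that assumption.
</P>

```cpp
#include <cctype>
#include <cstddef>
#include <regex>
#include <string>

// Expands \0..\9 in `rewrite` using the groups of match `m`.
std::string ExpandRewrite(const std::string& rewrite, const std::smatch& m) {
  std::string out;
  for (std::size_t i = 0; i < rewrite.size(); ++i) {
    const char c = rewrite[i];
    if (c == '\\' && i + 1 < rewrite.size() &&
        std::isdigit(static_cast<unsigned char>(rewrite[i + 1]))) {
      const std::size_t group = static_cast<std::size_t>(rewrite[i + 1] - '0');
      if (group < m.size()) out += m[group].str();
      ++i;  // skip the digit
    } else {
      out += c;
    }
  }
  return out;
}

// Replaces the first match only; returns true if a replacement occurred.
bool ReplaceFirst(const std::string& pattern, const std::string& rewrite,
                  std::string* s) {
  const std::regex re(pattern);
  std::smatch m;
  if (!std::regex_search(*s, m, re)) return false;
  const std::string expanded = ExpandRewrite(rewrite, m);
  s->replace(m.position(0), m.length(0), expanded);
  return true;
}

// Replaces every match; returns the number of replacements. Output is built
// separately, so inserted text is never scanned again.
int ReplaceAll(const std::string& pattern, const std::string& rewrite,
               std::string* s) {
  const std::regex re(pattern);
  const std::string input = *s;  // search a stable copy
  std::string out;
  int count = 0;
  std::string::const_iterator pos = input.begin();
  std::smatch m;
  while (std::regex_search(pos, input.end(), m, re)) {
    out.append(pos, m[0].first);  // text before the match
    out += ExpandRewrite(rewrite, m);
    pos = m[0].second;
    ++count;
    if (m[0].first == m[0].second) {  // empty match: step forward one char
      if (pos == input.end()) break;
      out += *pos++;
    }
  }
  out.append(pos, input.end());
  *s = out;
  return count;
}
```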
|
||||
<br><a name="SEC10" href="#TOC1">AUTHOR</a><br>
|
||||
<P>
|
||||
The C++ wrapper was contributed by Google Inc.
|
||||
<br>
|
||||
Copyright © 2005 Google Inc.
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
424
libs/pcre/doc/html/pcregrep.html
Normal file
@ -0,0 +1,424 @@
|
||||
<html>
|
||||
<head>
|
||||
<title>pcregrep specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>pcregrep man page</h1>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
||||
<p>
|
||||
This page is part of the PCRE HTML documentation. It was generated automatically
|
||||
from the original man page. If there is any nonsense in it, please consult the
|
||||
man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<ul>
|
||||
<li><a name="TOC1" href="#SEC1">SYNOPSIS</a>
|
||||
<li><a name="TOC2" href="#SEC2">DESCRIPTION</a>
|
||||
<li><a name="TOC3" href="#SEC3">OPTIONS</a>
|
||||
<li><a name="TOC4" href="#SEC4">ENVIRONMENT VARIABLES</a>
|
||||
<li><a name="TOC5" href="#SEC5">NEWLINES</a>
|
||||
<li><a name="TOC6" href="#SEC6">OPTIONS COMPATIBILITY</a>
|
||||
<li><a name="TOC7" href="#SEC7">OPTIONS WITH DATA</a>
|
||||
<li><a name="TOC8" href="#SEC8">MATCHING ERRORS</a>
|
||||
<li><a name="TOC9" href="#SEC9">DIAGNOSTICS</a>
|
||||
<li><a name="TOC10" href="#SEC10">AUTHOR</a>
|
||||
</ul>
|
||||
<br><a name="SEC1" href="#TOC1">SYNOPSIS</a><br>
|
||||
<P>
|
||||
<b>pcregrep [options] [long options] [pattern] [path1 path2 ...]</b>
|
||||
</P>
|
||||
<br><a name="SEC2" href="#TOC1">DESCRIPTION</a><br>
|
||||
<P>
|
||||
<b>pcregrep</b> searches files for character patterns, in the same way as other
|
||||
grep commands do, but it uses the PCRE regular expression library to support
|
||||
patterns that are compatible with the regular expressions of Perl 5. See
|
||||
<a href="pcrepattern.html"><b>pcrepattern</b></a>
|
||||
for a full description of syntax and semantics of the regular expressions that
|
||||
PCRE supports.
|
||||
</P>
|
||||
<P>
|
||||
Patterns, whether supplied on the command line or in a separate file, are given
|
||||
without delimiters. For example:
|
||||
<pre>
|
||||
pcregrep Thursday /etc/motd
|
||||
</pre>
|
||||
If you attempt to use delimiters (for example, by surrounding a pattern with
|
||||
slashes, as is common in Perl scripts), they are interpreted as part of the
|
||||
pattern. Quotes can of course be used on the command line because they are
|
||||
interpreted by the shell, and indeed they are required if a pattern contains
|
||||
white space or shell metacharacters.
|
||||
</P>
|
||||
<P>
|
||||
The first argument that follows any option settings is treated as the single
|
||||
pattern to be matched when neither <b>-e</b> nor <b>-f</b> is present.
|
||||
Conversely, when one or both of these options are used to specify patterns, all
|
||||
arguments are treated as path names. At least one of <b>-e</b>, <b>-f</b>, or an
|
||||
argument pattern must be provided.
|
||||
</P>
|
||||
<P>
|
||||
If no files are specified, <b>pcregrep</b> reads the standard input. The
|
||||
standard input can also be referenced by a name consisting of a single hyphen.
|
||||
For example:
|
||||
<pre>
|
||||
pcregrep some-pattern /file1 - /file3
|
||||
</pre>
|
||||
By default, each line that matches the pattern is copied to the standard
|
||||
output, and if there is more than one file, the file name is output at the
|
||||
start of each line. However, there are options that can change how
|
||||
<b>pcregrep</b> behaves. In particular, the <b>-M</b> option makes it possible to
|
||||
search for patterns that span line boundaries. What defines a line boundary is
|
||||
controlled by the <b>-N</b> (<b>--newline</b>) option.
|
||||
</P>
|
||||
<P>
|
||||
Patterns are limited to 8K or BUFSIZ characters, whichever is the greater.
|
||||
BUFSIZ is defined in <b>&lt;stdio.h&gt;</b>.
|
||||
</P>
|
||||
<P>
|
||||
If the <b>LC_ALL</b> or <b>LC_CTYPE</b> environment variable is set,
|
||||
<b>pcregrep</b> uses the value to set a locale when calling the PCRE library.
|
||||
The <b>--locale</b> option can be used to override this.
|
||||
</P>
|
||||
<br><a name="SEC3" href="#TOC1">OPTIONS</a><br>
|
||||
<P>
|
||||
<b>--</b>
|
||||
This terminates the list of options. It is useful if the next item on the
|
||||
command line starts with a hyphen but is not an option. This allows for the
|
||||
processing of patterns and filenames that start with hyphens.
|
||||
</P>
|
||||
<P>
|
||||
<b>-A</b> <i>number</i>, <b>--after-context=</b><i>number</i>
|
||||
Output <i>number</i> lines of context after each matching line. If filenames
|
||||
and/or line numbers are being output, a hyphen separator is used instead of a
|
||||
colon for the context lines. A line containing "--" is output between each
|
||||
group of lines, unless they are in fact contiguous in the input file. The value
|
||||
of <i>number</i> is expected to be relatively small. However, <b>pcregrep</b>
|
||||
guarantees to have up to 8K of following text available for context output.
|
||||
</P>
|
||||
<P>
|
||||
<b>-B</b> <i>number</i>, <b>--before-context=</b><i>number</i>
|
||||
Output <i>number</i> lines of context before each matching line. If filenames
|
||||
and/or line numbers are being output, a hyphen separator is used instead of a
|
||||
colon for the context lines. A line containing "--" is output between each
|
||||
group of lines, unless they are in fact contiguous in the input file. The value
|
||||
of <i>number</i> is expected to be relatively small. However, <b>pcregrep</b>
|
||||
guarantees to have up to 8K of preceding text available for context output.
|
||||
</P>
|
||||
<P>
|
||||
<b>-C</b> <i>number</i>, <b>--context=</b><i>number</i>
|
||||
Output <i>number</i> lines of context both before and after each matching line.
|
||||
This is equivalent to setting both <b>-A</b> and <b>-B</b> to the same value.
|
||||
</P>
|
||||
<P>
|
||||
<b>-c</b>, <b>--count</b>
|
||||
Do not output individual lines; instead just output a count of the number of
|
||||
lines that would otherwise have been output. If several files are given, a
|
||||
count is output for each of them. In this mode, the <b>-A</b>, <b>-B</b>, and
|
||||
<b>-C</b> options are ignored.
|
||||
</P>
|
||||
<P>
|
||||
<b>--colour</b>, <b>--color</b>
|
||||
If this option is given without any data, it is equivalent to "--colour=auto".
|
||||
If data is required, it must be given in the same shell item, separated by an
|
||||
equals sign.
|
||||
</P>
|
||||
<P>
|
||||
<b>--colour=</b><i>value</i>, <b>--color=</b><i>value</i>
|
||||
This option specifies under what circumstances the part of a line that matched
|
||||
a pattern should be coloured in the output. The value may be "never" (the
|
||||
default), "always", or "auto". In the latter case, colouring happens only if
|
||||
the standard output is connected to a terminal. The colour can be specified by
|
||||
setting the environment variable PCREGREP_COLOUR or PCREGREP_COLOR. The value
|
||||
of this variable should be a string of two numbers, separated by a semicolon.
|
||||
They are copied directly into the control string for setting colour on a
|
||||
terminal, so it is your responsibility to ensure that they make sense. If
|
||||
neither of the environment variables is set, the default is "1;31", which gives
|
||||
red.
|
||||
</P>
|
||||
<P>
|
||||
<b>-D</b> <i>action</i>, <b>--devices=</b><i>action</i>
|
||||
If an input path is not a regular file or a directory, "action" specifies how
|
||||
it is to be processed. Valid values are "read" (the default) or "skip"
|
||||
(silently skip the path).
|
||||
</P>
|
||||
<P>
|
||||
<b>-d</b> <i>action</i>, <b>--directories=</b><i>action</i>
|
||||
If an input path is a directory, "action" specifies how it is to be processed.
|
||||
Valid values are "read" (the default), "recurse" (equivalent to the <b>-r</b>
|
||||
option), or "skip" (silently skip the path). In the default case, directories
|
||||
are read as if they were ordinary files. In some operating systems the effect
|
||||
of reading a directory like this is an immediate end-of-file.
|
||||
</P>
|
||||
<P>
|
||||
<b>-e</b> <i>pattern</i>, <b>--regex=</b><i>pattern</i>,
|
||||
<b>--regexp=</b><i>pattern</i> Specify a pattern to be matched. This option can
|
||||
be used multiple times in order to specify several patterns. It can also be
|
||||
used as a way of specifying a single pattern that starts with a hyphen. When
|
||||
<b>-e</b> is used, no argument pattern is taken from the command line; all
|
||||
arguments are treated as file names. There is an overall maximum of 100
|
||||
patterns. They are applied to each line in the order in which they are defined
|
||||
until one matches (or fails to match if <b>-v</b> is used). If <b>-f</b> is used
|
||||
with <b>-e</b>, the command line patterns are matched first, followed by the
|
||||
patterns from the file, independent of the order in which these options are
|
||||
specified. Note that multiple use of <b>-e</b> is not the same as a single
|
||||
pattern with alternatives. For example, X|Y finds the first character in a line
|
||||
that is X or Y, whereas if the two patterns are given separately,
|
||||
<b>pcregrep</b> finds X if it is present, even if it follows Y in the line. It
|
||||
finds Y only if there is no X in the line. This really matters only if you are
|
||||
using <b>-o</b> to show the portion of the line that matched.
|
||||
</P>
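<P>
The difference between several <b>-e</b> patterns and one pattern with alternatives can be reproduced with std::regex. This is a sketch under the assumption that patterns are tried in list order and the first one that matches wins; FirstPatternMatch is a hypothetical function, not pcregrep code.
</P>

```cpp
#include <regex>
#include <string>
#include <vector>

// Returns the text matched by the first pattern (in list order) that matches
// anywhere in `line`, or an empty string if no pattern matches.
std::string FirstPatternMatch(const std::vector<std::string>& patterns,
                              const std::string& line) {
  for (const std::string& pattern : patterns) {
    std::smatch m;
    if (std::regex_search(line, m, std::regex(pattern))) {
      return m[0].str();
    }
  }
  return std::string();
}
```

For a line in which Y appears before X, the single pattern X|Y reports Y (the leftmost match), while the separate patterns X and Y report X, because X is tried first over the whole line.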
|
||||
<P>
|
||||
<b>--exclude</b>=<i>pattern</i>
|
||||
When <b>pcregrep</b> is searching the files in a directory as a consequence of
|
||||
the <b>-r</b> (recursive search) option, any files whose names match the pattern
|
||||
are excluded. The pattern is a PCRE regular expression. If a file name matches
|
||||
both <b>--include</b> and <b>--exclude</b>, it is excluded. There is no short
|
||||
form for this option.
|
||||
</P>
|
||||
<P>
|
||||
<b>-F</b>, <b>--fixed-strings</b>
|
||||
Interpret each pattern as a list of fixed strings, separated by newlines,
|
||||
instead of as a regular expression. The <b>-w</b> (match as a word) and <b>-x</b>
|
||||
(match whole line) options can be used with <b>-F</b>. They apply to each of the
|
||||
fixed strings. A line is selected if any of the fixed strings are found in it
|
||||
(subject to <b>-w</b> or <b>-x</b>, if present).
|
||||
</P>
|
||||
<P>
|
||||
<b>-f</b> <i>filename</i>, <b>--file=</b><i>filename</i>
|
||||
Read a number of patterns from the file, one per line, and match them against
|
||||
each line of input. A data line is output if any of the patterns match it. The
|
||||
filename can be given as "-" to refer to the standard input. When <b>-f</b> is
|
||||
used, patterns specified on the command line using <b>-e</b> may also be
|
||||
present; they are tested before the file's patterns. However, no other pattern
|
||||
is taken from the command line; all arguments are treated as file names. There
|
||||
is an overall maximum of 100 patterns. Trailing white space is removed from
|
||||
each line, and blank lines are ignored. An empty file contains no patterns and
|
||||
therefore matches nothing.
|
||||
</P>
|
||||
<P>
|
||||
<b>-H</b>, <b>--with-filename</b>
|
||||
Force the inclusion of the filename at the start of output lines when searching
|
||||
a single file. By default, the filename is not shown in this case. For matching
|
||||
lines, the filename is followed by a colon and a space; for context lines, a
|
||||
hyphen separator is used. If a line number is also being output, it follows the
|
||||
file name without a space.
|
||||
</P>
|
||||
<P>
|
||||
<b>-h</b>, <b>--no-filename</b>
|
||||
Suppress the output of filenames when searching multiple files. By default,
|
||||
filenames are shown when multiple files are searched. For matching lines, the
|
||||
filename is followed by a colon and a space; for context lines, a hyphen
|
||||
separator is used. If a line number is also being output, it follows the file
|
||||
name without a space.
|
||||
</P>
|
||||
<P>
|
||||
<b>--help</b>
|
||||
Output a brief help message and exit.
|
||||
</P>
|
||||
<P>
|
||||
<b>-i</b>, <b>--ignore-case</b>
|
||||
Ignore upper/lower case distinctions during comparisons.
|
||||
</P>
|
||||
<P>
|
||||
<b>--include</b>=<i>pattern</i>
|
||||
When <b>pcregrep</b> is searching the files in a directory as a consequence of
|
||||
the <b>-r</b> (recursive search) option, only those files whose names match the
|
||||
pattern are included. The pattern is a PCRE regular expression. If a file name
|
||||
matches both <b>--include</b> and <b>--exclude</b>, it is excluded. There is no
|
||||
short form for this option.
|
||||
</P>
|
||||
<P>
|
||||
<b>-L</b>, <b>--files-without-match</b>
|
||||
Instead of outputting lines from the files, just output the names of the files
|
||||
that do not contain any lines that would have been output. Each file name is
|
||||
output once, on a separate line.
|
||||
</P>
|
||||
<P>
|
||||
<b>-l</b>, <b>--files-with-matches</b>
|
||||
Instead of outputting lines from the files, just output the names of the files
|
||||
containing lines that would have been output. Each file name is output
|
||||
once, on a separate line. Searching stops as soon as a matching line is found
|
||||
in a file.
|
||||
</P>
|
||||
<P>
|
||||
<b>--label</b>=<i>name</i>
|
||||
This option supplies a name to be used for the standard input when file names
|
||||
are being output. If not supplied, "(standard input)" is used. There is no
|
||||
short form for this option.
|
||||
</P>
|
||||
<P>
|
||||
<b>--locale</b>=<i>locale-name</i>
|
||||
This option specifies a locale to be used for pattern matching. It overrides
|
||||
the value in the <b>LC_ALL</b> or <b>LC_CTYPE</b> environment variables. If no
|
||||
locale is specified, the PCRE library's default (usually the "C" locale) is
|
||||
used. There is no short form for this option.
|
||||
</P>
|
||||
<P>
|
||||
<b>-M</b>, <b>--multiline</b>
|
||||
Allow patterns to match more than one line. When this option is given, patterns
|
||||
may usefully contain literal newline characters and internal occurrences of ^
|
||||
and $ characters. The output for any one match may consist of more than one
|
||||
line. When this option is set, the PCRE library is called in "multiline" mode.
|
||||
There is a limit to the number of lines that can be matched, imposed by the way
|
||||
that <b>pcregrep</b> buffers the input file as it scans it. However,
|
||||
<b>pcregrep</b> ensures that at least 8K characters or the rest of the document
|
||||
(whichever is the shorter) are available for forward matching, and similarly
|
||||
the previous 8K characters (or all the previous characters, if fewer than 8K)
|
||||
are guaranteed to be available for lookbehind assertions.
|
||||
</P>
|
||||
<P>
|
||||
<b>-N</b> <i>newline-type</i>, <b>--newline=</b><i>newline-type</i>
|
||||
The PCRE library supports three different character sequences for indicating
|
||||
the ends of lines. They are the single-character sequences CR (carriage return)
|
||||
and LF (linefeed), and the two-character sequence CR, LF. When the library is
|
||||
built, a default line-ending sequence is specified. This is normally the
|
||||
standard sequence for the operating system. Unless otherwise specified by this
|
||||
option, <b>pcregrep</b> uses the default. The possible values for this option
|
||||
are CR, LF, or CRLF. This makes it possible to use <b>pcregrep</b> on files that
|
||||
have come from other environments without having to modify their line endings.
|
||||
If the data that is being scanned does not agree with the convention set by
|
||||
this option, <b>pcregrep</b> may behave in strange ways.
|
||||
</P>
|
||||
<P>
|
||||
<b>-n</b>, <b>--line-number</b>
|
||||
Precede each output line by its line number in the file, followed by a colon
|
||||
and a space for matching lines or a hyphen and a space for context lines. If
|
||||
the filename is also being output, it precedes the line number.
|
||||
</P>
|
||||
<P>
|
||||
<b>-o</b>, <b>--only-matching</b>
|
||||
Show only the part of the line that matched a pattern. In this mode, no
|
||||
context is shown. That is, the <b>-A</b>, <b>-B</b>, and <b>-C</b> options are
|
||||
ignored.
|
||||
</P>
|
||||
<P>
|
||||
<b>-q</b>, <b>--quiet</b>
|
||||
Work quietly, that is, display nothing except error messages. The exit
|
||||
status indicates whether or not any matches were found.
|
||||
</P>
|
||||
<P>
|
||||
<b>-r</b>, <b>--recursive</b>
|
||||
If any given path is a directory, recursively scan the files it contains,
|
||||
taking note of any <b>--include</b> and <b>--exclude</b> settings. By default, a
|
||||
directory is read as a normal file; in some operating systems this gives an
|
||||
immediate end-of-file. This option is a shorthand for setting the <b>-d</b>
|
||||
option to "recurse".
|
||||
</P>
|
||||
<P>
|
||||
<b>-s</b>, <b>--no-messages</b>
|
||||
Suppress error messages about non-existent or unreadable files. Such files are
|
||||
quietly skipped. However, the return code is still 2, even if matches were
|
||||
found in other files.
|
||||
</P>
|
||||
<P>
|
||||
<b>-u</b>, <b>--utf-8</b>
|
||||
Operate in UTF-8 mode. This option is available only if PCRE has been compiled
|
||||
with UTF-8 support. Both patterns and subject lines must be valid strings of
|
||||
UTF-8 characters.
|
||||
</P>
|
||||
<P>
|
||||
<b>-V</b>, <b>--version</b>
|
||||
Write the version numbers of <b>pcregrep</b> and the PCRE library that is being
|
||||
used to the standard error stream.
|
||||
</P>
|
||||
<P>
|
||||
<b>-v</b>, <b>--invert-match</b>
|
||||
Invert the sense of the match, so that lines which do <i>not</i> match any of
|
||||
the patterns are the ones that are found.
|
||||
</P>
|
||||
<P>
|
||||
<b>-w</b>, <b>--word-regex</b>, <b>--word-regexp</b>
|
||||
Force the patterns to match only whole words. This is equivalent to having \b
|
||||
at the start and end of the pattern.
|
||||
</P>
|
||||
<P>
|
||||
<b>-x</b>, <b>--line-regex</b>, <b>--line-regexp</b>
|
||||
Force the patterns to be anchored (each must start matching at the beginning of
|
||||
a line) and in addition, require them to match entire lines. This is
|
||||
equivalent to having ^ and $ characters at the start and end of each
|
||||
alternative branch in every pattern.
|
||||
</P>
|
||||
<br><a name="SEC4" href="#TOC1">ENVIRONMENT VARIABLES</a><br>
|
||||
<P>
|
||||
The environment variables <b>LC_ALL</b> and <b>LC_CTYPE</b> are examined, in that
|
||||
order, for a locale. The first one that is set is used. This can be overridden
|
||||
by the <b>--locale</b> option. If no locale is set, the PCRE library's default
|
||||
(usually the "C" locale) is used.
|
||||
</P>
|
||||
<br><a name="SEC5" href="#TOC1">NEWLINES</a><br>
|
||||
<P>
|
||||
The <b>-N</b> (<b>--newline</b>) option allows <b>pcregrep</b> to scan files with
|
||||
different newline conventions from the default. However, the setting of this
|
||||
option does not affect the way in which <b>pcregrep</b> writes information to
|
||||
the standard error and output streams. It uses the string "\n" in C
|
||||
<b>printf()</b> calls to indicate newlines, relying on the C I/O library to
|
||||
convert this to an appropriate sequence if the output is sent to a file.
|
||||
</P>
|
||||
<br><a name="SEC6" href="#TOC1">OPTIONS COMPATIBILITY</a><br>
|
||||
<P>
|
||||
The majority of short and long forms of <b>pcregrep</b>'s options are the same
|
||||
as in the GNU <b>grep</b> program. Any long option of the form
|
||||
<b>--xxx-regexp</b> (GNU terminology) is also available as <b>--xxx-regex</b>
|
||||
(PCRE terminology). However, the <b>--locale</b>, <b>-M</b>, <b>--multiline</b>,
|
||||
<b>-u</b>, and <b>--utf-8</b> options are specific to <b>pcregrep</b>.
|
||||
</P>
|
||||
<br><a name="SEC7" href="#TOC1">OPTIONS WITH DATA</a><br>
|
||||
<P>
|
||||
There are four different ways in which an option with data can be specified.
|
||||
If a short form option is used, the data may follow immediately, or in the next
|
||||
command line item. For example:
|
||||
<pre>
|
||||
-f/some/file
|
||||
-f /some/file
|
||||
</pre>
|
||||
If a long form option is used, the data may appear in the same command line
|
||||
item, separated by an equals character, or (with one exception) it may appear
|
||||
in the next command line item. For example:
|
||||
<pre>
|
||||
--file=/some/file
|
||||
--file /some/file
|
||||
</pre>
|
||||
Note, however, that if you want to supply a file name beginning with ~ as data
|
||||
in a shell command, and have the shell expand ~ to a home directory, you must
|
||||
separate the file name from the option, because the shell does not treat ~
|
||||
specially unless it is at the start of an item.
|
||||
</P>
|
||||
<P>
|
||||
The exception to the above is the <b>--colour</b> (or <b>--color</b>) option,
|
||||
for which the data is optional. If this option does have data, it must be given
|
||||
in the first form, using an equals character. Otherwise it will be assumed that
|
||||
it has no data.
|
||||
</P>
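<P>
The four forms described above can be recognised with a few string comparisons. The function below is a hypothetical sketch, not taken from the pcregrep source, and it deliberately ignores the <b>--colour</b> exception, for which only the equals form carries data.
</P>

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch: if args[*i] is the option with the given short and
// long names, stores its data in *data and returns true. When the data is
// taken from the following item, *i is advanced to that item.
bool OptionData(const std::vector<std::string>& args, std::size_t* i,
                char short_name, const std::string& long_name,
                std::string* data) {
  const std::string& arg = args[*i];
  const std::string short_form = std::string("-") + short_name;
  const std::string long_form = "--" + long_name;

  if (arg.compare(0, 2, short_form) == 0) {
    if (arg.size() > 2) {               // -f/some/file
      *data = arg.substr(2);
      return true;
    }
    if (*i + 1 < args.size()) {         // -f /some/file
      *data = args[++*i];
      return true;
    }
    return false;                       // data is missing
  }
  if (arg.compare(0, long_form.size() + 1, long_form + "=") == 0) {
    *data = arg.substr(long_form.size() + 1);  // --file=/some/file
    return true;
  }
  if (arg == long_form && *i + 1 < args.size()) {
    *data = args[++*i];                 // --file /some/file
    return true;
  }
  return false;
}
```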
|
||||
<br><a name="SEC8" href="#TOC1">MATCHING ERRORS</a><br>
|
||||
<P>
|
||||
It is possible to supply a regular expression that takes a very long time to
|
||||
fail to match certain lines. Such patterns normally involve nested indefinite
|
||||
repeats, for example: (a+)*\d when matched against a line of a's with no final
|
||||
digit. The PCRE matching function has a resource limit that causes it to abort
|
||||
in these circumstances. If this happens, <b>pcregrep</b> outputs an error
|
||||
message and the line that caused the problem to the standard error stream. If
|
||||
there are more than 20 such errors, <b>pcregrep</b> gives up.
|
||||
</P>
|
||||
<br><a name="SEC9" href="#TOC1">DIAGNOSTICS</a><br>
|
||||
<P>
|
||||
Exit status is 0 if any matches were found, 1 if no matches were found, and 2
|
||||
for syntax errors and non-existent or inaccessible files (even if matches were
|
||||
found in other files) or too many matching errors. Using the <b>-s</b> option to
|
||||
suppress error messages about inaccessible files does not affect the return
|
||||
code.
|
||||
</P>
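<P>
The precedence between the three exit codes can be written as a small decision function. This is a hypothetical sketch of the rule stated above, not code from pcregrep: any error forces status 2, even when matches were found.
</P>

```cpp
// Hypothetical sketch of the exit status rule.
// any_error covers syntax errors, missing or unreadable files, and too many
// matching errors. It takes priority over the match result.
int ExitStatus(bool any_match, bool any_error) {
  if (any_error) return 2;
  return any_match ? 0 : 1;
}
```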
|
||||
<br><a name="SEC10" href="#TOC1">AUTHOR</a><br>
|
||||
<P>
|
||||
Philip Hazel
|
||||
<br>
|
||||
University Computing Service
|
||||
<br>
|
||||
Cambridge CB2 3QG, England.
|
||||
</P>
|
||||
<P>
|
||||
Last updated: 06 June 2006
|
||||
<br>
|
||||
Copyright © 1997-2006 University of Cambridge.
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
192
libs/pcre/doc/html/pcrematching.html
Normal file
@ -0,0 +1,192 @@
|
||||
<html>
|
||||
<head>
|
||||
<title>pcrematching specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>pcrematching man page</h1>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
||||
<p>
|
||||
This page is part of the PCRE HTML documentation. It was generated automatically
|
||||
from the original man page. If there is any nonsense in it, please consult the
|
||||
man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<ul>
|
||||
<li><a name="TOC1" href="#SEC1">PCRE MATCHING ALGORITHMS</a>
|
||||
<li><a name="TOC2" href="#SEC2">REGULAR EXPRESSIONS AS TREES</a>
|
||||
<li><a name="TOC3" href="#SEC3">THE STANDARD MATCHING ALGORITHM</a>
|
||||
<li><a name="TOC4" href="#SEC4">THE DFA MATCHING ALGORITHM</a>
|
||||
<li><a name="TOC5" href="#SEC5">ADVANTAGES OF THE DFA ALGORITHM</a>
|
||||
<li><a name="TOC6" href="#SEC6">DISADVANTAGES OF THE DFA ALGORITHM</a>
|
||||
</ul>
|
||||
<br><a name="SEC1" href="#TOC1">PCRE MATCHING ALGORITHMS</a><br>
|
||||
<P>
|
||||
This document describes the two different algorithms that are available in PCRE
|
||||
for matching a compiled regular expression against a given subject string. The
|
||||
"standard" algorithm is the one provided by the <b>pcre_exec()</b> function.
|
||||
This works in the same way as Perl's matching function, and provides a
|
||||
Perl-compatible matching operation.
|
||||
</P>
|
||||
<P>
|
||||
An alternative algorithm is provided by the <b>pcre_dfa_exec()</b> function;
|
||||
this operates in a different way, and is not Perl-compatible. It has advantages
|
||||
and disadvantages compared with the standard algorithm, and these are described
|
||||
below.
|
||||
</P>
|
||||
<P>
|
||||
When there is only one possible way in which a given subject string can match a
|
||||
pattern, the two algorithms give the same answer. A difference arises, however,
|
||||
when there are multiple possibilities. For example, if the pattern
|
||||
<pre>
|
||||
^&lt;.*&gt;
|
||||
</pre>
|
||||
is matched against the string
|
||||
<pre>
|
||||
&lt;something&gt; &lt;something else&gt; &lt;something further&gt;
|
||||
</pre>
|
||||
there are three possible answers. The standard algorithm finds only one of
|
||||
them, whereas the DFA algorithm finds all three.
|
||||
</P>
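<P>
The difference can be imitated outside PCRE. The following Python sketch is an
illustration only: it uses Python's <b>re</b> module as a stand-in for a
backtracking matcher, and <i>all_matches_at_start()</i> is a hypothetical
helper that simply tests every prefix of the subject. It is not how
<b>pcre_dfa_exec()</b> is implemented.
</P>

```python
import re

SUBJECT = "<something> <something else> <something further>"


def all_matches_at_start(pattern, subject):
    """Return every prefix of subject that the whole pattern matches."""
    compiled = re.compile(pattern)
    return [
        subject[:end]
        for end in range(len(subject) + 1)
        # fullmatch() with an end position asks: does the pattern match
        # exactly the first `end` characters?
        if compiled.fullmatch(subject, 0, end)
    ]


# A backtracking matcher stops at the first match it reaches.
first_only = re.match(r"^<.*>", SUBJECT).group(0)

# Testing every prefix recovers all the possible answers.
every_answer = all_matches_at_start(r"^<.*>", SUBJECT)

print(first_only)
for answer in every_answer:
    print(answer)
```

<P>
Testing every prefix is quadratic in the subject length, so this is only a way
of seeing the set of answers, not a practical matching strategy.
</P>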
|
||||
<br><a name="SEC2" href="#TOC1">REGULAR EXPRESSIONS AS TREES</a><br>
|
||||
<P>
|
||||
The set of strings that are matched by a regular expression can be represented
|
||||
as a tree structure. An unlimited repetition in the pattern makes the tree of
|
||||
infinite size, but it is still a tree. Matching the pattern to a given subject
|
||||
string (from a given starting point) can be thought of as a search of the tree.
|
||||
There are two ways to search a tree: depth-first and breadth-first, and these
|
||||
correspond to the two matching algorithms provided by PCRE.
|
||||
</P>
|
||||
<br><a name="SEC3" href="#TOC1">THE STANDARD MATCHING ALGORITHM</a><br>
|
||||
<P>
|
||||
In the terminology of Jeffrey Friedl's book <i>Mastering Regular
Expressions</i>, the standard algorithm is an "NFA algorithm". It conducts a
|
||||
depth-first search of the pattern tree. That is, it proceeds along a single
|
||||
path through the tree, checking that the subject matches what is required. When
|
||||
there is a mismatch, the algorithm tries any alternatives at the current point,
|
||||
and if they all fail, it backs up to the previous branch point in the tree, and
|
||||
tries the next alternative branch at that level. This often involves backing up
|
||||
(moving to the left) in the subject string as well. The order in which
|
||||
repetition branches are tried is controlled by the greedy or ungreedy nature of
|
||||
the quantifier.
|
||||
</P>
|
||||
<P>
|
||||
If a leaf node is reached, a matching string has been found, and at that point
|
||||
the algorithm stops. Thus, if there is more than one possible match, this
|
||||
algorithm returns the first one that it finds. Whether this is the shortest,
|
||||
the longest, or some intermediate length depends on the way the greedy and
|
||||
ungreedy repetition quantifiers are specified in the pattern.
|
||||
</P>
|
||||
<P>
|
||||
Because it ends up with a single path through the tree, it is relatively
|
||||
straightforward for this algorithm to keep track of the substrings that are
|
||||
matched by portions of the pattern in parentheses. This provides support for
|
||||
capturing parentheses and back references.
|
||||
</P>
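<P>
The effect of greediness on which match is found first, and the availability
of captured substrings, can be observed with any backtracking matcher. The
sketch below uses Python's <b>re</b> module as such a matcher; that it behaves
like <b>pcre_exec()</b> for these particular patterns is an assumption of the
example, and the subject strings are made up for illustration.
</P>

```python
import re

SUBJECT = "<something> <something else> <something further>"

# Greedy: the repeat first tries to take as much as possible, so the first
# leaf reached is the longest match.
greedy = re.match(r"<(.*)>", SUBJECT)

# Ungreedy: the repeat first tries to take as little as possible, so the
# first leaf reached is the shortest match.
ungreedy = re.match(r"<(.*?)>", SUBJECT)

print(greedy.group(0))
print(ungreedy.group(0))
print(ungreedy.group(1))    # the substring captured by the parentheses

# A back reference relies on the substring captured along the current path.
doubled = re.search(r"\b(\w+) \1\b", "this is is a test")
print(doubled.group(1))
```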
|
||||
<br><a name="SEC4" href="#TOC1">THE DFA MATCHING ALGORITHM</a><br>
|
||||
<P>
|
||||
DFA stands for "deterministic finite automaton", but you do not need to
|
||||
understand the origins of that name. This algorithm conducts a breadth-first
|
||||
search of the tree. Starting from the first matching point in the subject, it
|
||||
scans the subject string from left to right, once, character by character, and
|
||||
as it does this, it remembers all the paths through the tree that represent
|
||||
valid matches.
|
||||
</P>
|
||||
<P>
|
||||
The scan continues until either the end of the subject is reached, or there are
|
||||
no more unterminated paths. At this point, terminated paths represent the
|
||||
different matching possibilities (if there are none, the match has failed).
|
||||
Thus, if there is more than one possible match, this algorithm finds all of
|
||||
them, and in particular, it finds the longest. In PCRE, there is an option to
|
||||
stop the algorithm after the first match (which is necessarily the shortest)
|
||||
has been found.
|
||||
</P>
|
||||
<P>
|
||||
Note that all the matches that are found start at the same point in the
|
||||
subject. If the pattern
|
||||
<pre>
|
||||
cat(er(pillar)?)?
|
||||
</pre>
|
||||
is matched against the string "the caterpillar catchment", the result will be
|
||||
the three strings "cat", "cater", and "caterpillar" that start at the fifth
|
||||
character of the subject. The algorithm does not automatically move on to find
|
||||
matches that start at later positions.
|
||||
</P>
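<P>
A breadth-first scan of this kind can be sketched in a few lines of Python.
The sketch makes a strong simplifying assumption: the pattern has been expanded
by hand into the three literal strings it can match, so each "path through the
tree" is just one of those strings. The function name
<i>breadth_first_matches()</i> is hypothetical, and nothing here reflects the
internals of <b>pcre_dfa_exec()</b>.
</P>

```python
def breadth_first_matches(alternatives, subject, start):
    """Scan subject once from start, keeping every alternative still alive.

    Returns the alternatives that matched in full, shortest first.
    """
    active = set(alternatives)   # paths that have matched every character so far
    found = []
    consumed = 0                 # characters consumed since start
    while active:
        # A path whose literal is used up has terminated: record it.
        finished = sorted(a for a in active if len(a) == consumed)
        found.extend(finished)
        active -= set(finished)
        if start + consumed >= len(subject):
            break                # end of subject reached
        ch = subject[start + consumed]
        # Advance all surviving paths over the same character at once.
        active = {a for a in active if a[consumed] == ch}
        consumed += 1
    return found


# Assumption: the three literals stand in for the expanded pattern.
WORDS = ["cat", "cater", "caterpillar"]
SUBJECT = "the caterpillar catchment"

at_caterpillar = breadth_first_matches(WORDS, SUBJECT, SUBJECT.index("cat"))
at_beginning = breadth_first_matches(WORDS, SUBJECT, 0)

print(at_caterpillar)
print(at_beginning)
```

<P>
The second call returns nothing because the scan is tied to its starting
point; finding matches at later positions needs a separate call for each
position.
</P>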
|
||||
<P>
|
||||
There are a number of features of PCRE regular expressions that are not
|
||||
supported by the DFA matching algorithm. They are as follows:
|
||||
</P>
|
||||
<P>
|
||||
1. Because the algorithm finds all possible matches, the greedy or ungreedy
|
||||
nature of repetition quantifiers is not relevant. Greedy and ungreedy
|
||||
quantifiers are treated in exactly the same way.
|
||||
</P>
|
||||
<P>
|
||||
2. When dealing with multiple paths through the tree simultaneously, it is not
|
||||
straightforward to keep track of captured substrings for the different matching
|
||||
possibilities, and PCRE's implementation of this algorithm does not attempt to
|
||||
do this. This means that no captured substrings are available.
|
||||
</P>
|
||||
<P>
|
||||
3. Because no substrings are captured, back references within the pattern are
|
||||
not supported, and cause errors if encountered.
|
||||
</P>
|
||||
<P>
|
||||
4. For the same reason, conditional expressions that use a backreference as the
|
||||
condition are not supported.
|
||||
</P>
|
||||
<P>
|
||||
5. Callouts are supported, but the value of the <i>capture_top</i> field is
|
||||
always 1, and the value of the <i>capture_last</i> field is always -1.
|
||||
</P>
|
||||
<P>
|
||||
6. The \C escape sequence, which (in the standard algorithm) matches a single
|
||||
byte, even in UTF-8 mode, is not supported because the DFA algorithm moves
|
||||
through the subject string one character at a time, for all active paths
|
||||
through the tree.
|
||||
</P>
|
||||
<br><a name="SEC5" href="#TOC1">ADVANTAGES OF THE DFA ALGORITHM</a><br>
|
||||
<P>
|
||||
Using the DFA matching algorithm provides the following advantages:
|
||||
</P>
|
||||
<P>
|
||||
1. All possible matches (at a single point in the subject) are automatically
|
||||
found, and in particular, the longest match is found. To find more than one
|
||||
match using the standard algorithm, you have to do kludgy things with
|
||||
callouts.
|
||||
</P>
|
||||
<P>
|
||||
2. There is much better support for partial matching. The restrictions on the
|
||||
content of the pattern that apply when using the standard algorithm for partial
|
||||
matching do not apply to the DFA algorithm. For non-anchored patterns, the
|
||||
starting position of a partial match is available.
|
||||
</P>
|
||||
<P>
|
||||
3. Because the DFA algorithm scans the subject string just once, and never
|
||||
needs to backtrack, it is possible to pass very long subject strings to the
|
||||
matching function in several pieces, checking for partial matching each time.
|
||||
</P>
|
||||
<br><a name="SEC6" href="#TOC1">DISADVANTAGES OF THE DFA ALGORITHM</a><br>
|
||||
<P>
|
||||
The DFA algorithm suffers from a number of disadvantages:
|
||||
</P>
|
||||
<P>
|
||||
1. It is substantially slower than the standard algorithm. This is partly
|
||||
because it has to search for all possible matches, but is also because it is
|
||||
less susceptible to optimization.
|
||||
</P>
|
||||
<P>
|
||||
2. Capturing parentheses and back references are not supported.
|
||||
</P>
|
||||
<P>
|
||||
3. The "atomic group" feature of PCRE regular expressions is supported, but
|
||||
does not provide the advantage that it does for the standard algorithm.
|
||||
</P>
|
||||
<P>
|
||||
Last updated: 06 June 2006
|
||||
<br>
|
||||
Copyright © 1997-2006 University of Cambridge.
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
225
libs/pcre/doc/html/pcrepartial.html
Normal file
@ -0,0 +1,225 @@
|
||||
<html>
|
||||
<head>
|
||||
<title>pcrepartial specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>pcrepartial man page</h1>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
||||
<p>
|
||||
This page is part of the PCRE HTML documentation. It was generated automatically
|
||||
from the original man page. If there is any nonsense in it, please consult the
|
||||
man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<ul>
|
||||
<li><a name="TOC1" href="#SEC1">PARTIAL MATCHING IN PCRE</a>
|
||||
<li><a name="TOC2" href="#SEC2">RESTRICTED PATTERNS FOR PCRE_PARTIAL</a>
|
||||
<li><a name="TOC3" href="#SEC3">EXAMPLE OF PARTIAL MATCHING USING PCRETEST</a>
|
||||
<li><a name="TOC4" href="#SEC4">MULTI-SEGMENT MATCHING WITH pcre_dfa_exec()</a>
|
||||
</ul>
|
||||
<br><a name="SEC1" href="#TOC1">PARTIAL MATCHING IN PCRE</a><br>
|
||||
<P>
|
||||
In normal use of PCRE, if the subject string that is passed to
|
||||
<b>pcre_exec()</b> or <b>pcre_dfa_exec()</b> matches as far as it goes, but is
|
||||
too short to match the entire pattern, PCRE_ERROR_NOMATCH is returned. There
|
||||
are circumstances where it might be helpful to distinguish this case from other
|
||||
cases in which there is no match.
|
||||
</P>
|
||||
<P>
|
||||
Consider, for example, an application where a human is required to type in data
|
||||
for a field with specific formatting requirements. An example might be a date
|
||||
in the form <i>ddmmmyy</i>, defined by this pattern:
|
||||
<pre>
|
||||
^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$
|
||||
</pre>
|
||||
If the application sees the user's keystrokes one by one, and can check that
|
||||
what has been typed so far is potentially valid, it is able to raise an error
|
||||
as soon as a mistake is made, possibly beeping and not reflecting the
|
||||
character that has been typed. This immediate feedback is likely to be a better
|
||||
user interface than a check that is delayed until the entire string has been
|
||||
entered.
|
||||
</P>
|
||||
<P>
|
||||
PCRE supports the concept of partial matching by means of the PCRE_PARTIAL
|
||||
option, which can be set when calling <b>pcre_exec()</b> or
|
||||
<b>pcre_dfa_exec()</b>. When this flag is set for <b>pcre_exec()</b>, the return
|
||||
code PCRE_ERROR_NOMATCH is converted into PCRE_ERROR_PARTIAL if at any time
|
||||
during the matching process the last part of the subject string matched part of
|
||||
the pattern. Unfortunately, for non-anchored matching, it is not possible to
|
||||
obtain the position of the start of the partial match. No captured data is set
|
||||
when PCRE_ERROR_PARTIAL is returned.
|
||||
</P>
|
||||
<P>
|
||||
When PCRE_PARTIAL is set for <b>pcre_dfa_exec()</b>, the return code
|
||||
PCRE_ERROR_NOMATCH is converted into PCRE_ERROR_PARTIAL if the end of the
|
||||
subject is reached, there have been no complete matches, but there is still at
|
||||
least one matching possibility. The portion of the string that provided the
|
||||
partial match is set as the first matching string.
|
||||
</P>
|
||||
<P>
|
||||
Using PCRE_PARTIAL disables one of PCRE's optimizations. PCRE remembers the
|
||||
last literal byte in a pattern, and abandons matching immediately if such a
|
||||
byte is not present in the subject string. This optimization cannot be used
|
||||
for a subject string that might match only partially.
|
||||
</P>
|
||||
<br><a name="SEC2" href="#TOC1">RESTRICTED PATTERNS FOR PCRE_PARTIAL</a><br>
|
||||
<P>
|
||||
Because of the way certain internal optimizations are implemented in the
|
||||
<b>pcre_exec()</b> function, the PCRE_PARTIAL option cannot be used with all
|
||||
patterns. These restrictions do not apply when <b>pcre_dfa_exec()</b> is used.
|
||||
For <b>pcre_exec()</b>, repeated single characters such as
|
||||
<pre>
|
||||
a{2,4}
|
||||
</pre>
|
||||
and repeated single metasequences such as
|
||||
<pre>
|
||||
\d+
|
||||
</pre>
|
||||
are not permitted if the maximum number of occurrences is greater than one.
|
||||
Optional items such as \d? (where the maximum is one) are permitted.
|
||||
Quantifiers with any values are permitted after parentheses, so the invalid
|
||||
examples above can be coded thus:
|
||||
<pre>
|
||||
(a){2,4}
|
||||
(\d)+
|
||||
</pre>
|
||||
These constructions run more slowly, but for the kinds of application that are
|
||||
envisaged for this facility, this is not felt to be a major restriction.
|
||||
</P>
|
||||
<P>
|
||||
If PCRE_PARTIAL is set for a pattern that does not conform to the restrictions,
|
||||
<b>pcre_exec()</b> returns the error code PCRE_ERROR_BADPARTIAL (-13).
|
||||
</P>
|
||||
<br><a name="SEC3" href="#TOC1">EXAMPLE OF PARTIAL MATCHING USING PCRETEST</a><br>
|
||||
<P>
|
||||
If the escape sequence \P is present in a <b>pcretest</b> data line, the
|
||||
PCRE_PARTIAL flag is used for the match. Here is a run of <b>pcretest</b> that
|
||||
uses the date example quoted above:
|
||||
<pre>
|
||||
re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
|
||||
data> 25jun04\P
|
||||
0: 25jun04
|
||||
1: jun
|
||||
data> 25dec3\P
|
||||
Partial match
|
||||
data> 3ju\P
|
||||
Partial match
|
||||
data> 3juj\P
|
||||
No match
|
||||
data> j\P
|
||||
No match
|
||||
</pre>
|
||||
The first data string is matched completely, so <b>pcretest</b> shows the
|
||||
matched substrings. The remaining four strings do not match the complete
|
||||
pattern, but the first two are partial matches. The same test, using DFA
|
||||
matching (by means of the \D escape sequence), produces the following output:
|
||||
<pre>
|
||||
re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
|
||||
data> 25jun04\P\D
|
||||
0: 25jun04
|
||||
data> 23dec3\P\D
|
||||
Partial match: 23dec3
|
||||
data> 3ju\P\D
|
||||
Partial match: 3ju
|
||||
data> 3juj\P\D
|
||||
No match
|
||||
data> j\P\D
|
||||
No match
|
||||
</pre>
|
||||
Notice that in this case the portion of the string that was matched is made
|
||||
available.
|
||||
</P>
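<P>
The three outcomes shown by <b>pcretest</b> (complete match, partial match, no
match) can be imitated for this one pattern with a small Python sketch. Python's
<b>re</b> module has no equivalent of PCRE_PARTIAL, so the sketch expands the
date pattern by hand into explicit character-by-character paths; the names
<i>expand_date_pattern()</i> and <i>classify()</i> are hypothetical, and the
approach only works for anchored patterns with a finite set of shapes.
</P>

```python
import string

MONTHS = ("jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec")
DIGIT = frozenset(string.digits)


def expand_date_pattern():
    """Expand the ddmmmyy pattern into explicit per-character paths.

    Each path is a tuple of character sets, one set per subject position.
    """
    paths = []
    for month in MONTHS:
        letters = [frozenset(ch) for ch in month]
        # The day is either one digit or two digits.
        for day in ([DIGIT], [DIGIT, DIGIT]):
            paths.append(tuple(day + letters + [DIGIT, DIGIT]))
    return paths


def classify(subject, paths):
    """Return "match", "partial" or "no match" for an anchored pattern."""
    active = list(paths)
    for index, ch in enumerate(subject):
        # Keep only the paths that accept this character at this position.
        active = [p for p in active if index < len(p) and ch in p[index]]
        if not active:
            return "no match"
    if any(len(p) == len(subject) for p in active):
        return "match"
    return "partial"


PATHS = expand_date_pattern()
for typed in ("25jun04", "25dec3", "3ju", "3juj", "j"):
    print(typed, "->", classify(typed, PATHS))
```

<P>
An input field could call such a check after every keystroke and reject the
keystroke as soon as the result is "no match".
</P>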
|
||||
<br><a name="SEC4" href="#TOC1">MULTI-SEGMENT MATCHING WITH pcre_dfa_exec()</a><br>
|
||||
<P>
|
||||
When a partial match has been found using <b>pcre_dfa_exec()</b>, it is possible
|
||||
to continue the match by providing additional subject data and calling
|
||||
<b>pcre_dfa_exec()</b> again with the PCRE_DFA_RESTART option and the same
|
||||
working space (where details of the previous partial match are stored). Here is
|
||||
an example using <b>pcretest</b>, where the \R escape sequence sets the
|
||||
PCRE_DFA_RESTART option and the \D escape sequence requests the use of
|
||||
<b>pcre_dfa_exec()</b>:
|
||||
<pre>
|
||||
re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
|
||||
data> 23ja\P\D
|
||||
Partial match: 23ja
|
||||
data> n05\R\D
|
||||
0: n05
|
||||
</pre>
|
||||
The first call has "23ja" as the subject, and requests partial matching; the
|
||||
second call has "n05" as the subject for the continued (restarted) match.
|
||||
Notice that when the match is complete, only the last part is shown; PCRE does
|
||||
not retain the previously partially-matched string. It is up to the calling
|
||||
program to do that if it needs to.
|
||||
</P>
|
||||
<P>
|
||||
This facility can be used to pass very long subject strings to
|
||||
<b>pcre_dfa_exec()</b>. However, some care is needed for certain types of
|
||||
pattern.
|
||||
</P>
|
||||
<P>
|
||||
1. If the pattern contains tests for the beginning or end of a line, you need
|
||||
to pass the PCRE_NOTBOL or PCRE_NOTEOL options, as appropriate, when the
|
||||
subject string for any call does not contain the beginning or end of a line.
|
||||
</P>
|
||||
<P>
|
||||
2. If the pattern contains backward assertions (including \b or \B), you need
|
||||
to arrange for some overlap in the subject strings to allow for this. For
|
||||
example, you could pass the subject in chunks that were 500 bytes long, but in
|
||||
a buffer of 700 bytes, with the starting offset set to 200 and the previous 200
|
||||
bytes at the start of the buffer.
|
||||
</P>
|
||||
<P>
|
||||
3. Matching a subject string that is split into multiple segments does not
|
||||
always produce exactly the same result as matching over one single long string.
|
||||
The difference arises when there are multiple matching possibilities, because a
|
||||
partial match result is given only when there are no completed matches in a
|
||||
call to <b>pcre_dfa_exec()</b>. This means that as soon as the shortest match has
|
||||
been found, continuation to a new subject segment is no longer possible.
|
||||
Consider this <b>pcretest</b> example:
|
||||
<pre>
|
||||
re> /dog(sbody)?/
|
||||
data> do\P\D
|
||||
Partial match: do
|
||||
data> gsb\R\P\D
|
||||
0: g
|
||||
data> dogsbody\D
|
||||
0: dogsbody
|
||||
1: dog
|
||||
</pre>
|
||||
The pattern matches the words "dog" or "dogsbody". When the subject is
|
||||
presented in several parts ("do" and "gsb" being the first two) the match stops
|
||||
when "dog" has been found, and it is not possible to continue. On the other
|
||||
hand, if "dogsbody" is presented as a single string, both matches are found.
|
||||
</P>
|
||||
<P>
|
||||
Because of this phenomenon, it does not usually make sense to end a pattern
|
||||
that is going to be matched in this way with a variable repeat.
|
||||
</P>
|
||||
<P>
|
||||
4. Patterns that contain alternatives at the top level which do not all
|
||||
start with the same pattern item may not work as expected. For example,
|
||||
consider this pattern:
|
||||
<pre>
|
||||
1234|3789
|
||||
</pre>
|
||||
If the first part of the subject is "ABC123", a partial match of the first
|
||||
alternative is found at offset 3. There is no partial match for the second
|
||||
alternative, because such a match does not start at the same point in the
|
||||
subject string. Attempting to continue with the string "789" does not yield a
|
||||
match because only those alternatives that match at one point in the subject
|
||||
are remembered. The problem arises because the start of the second alternative
|
||||
matches within the first alternative. There is no problem with anchored
|
||||
patterns or patterns such as:
|
||||
<pre>
|
||||
1234|ABCD
|
||||
</pre>
|
||||
where no string can be a partial match for both alternatives.
|
||||
</P>
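<P>
The effect described in item 3 can be reproduced with a toy model of segmented
matching. In the Python sketch below, the pattern is assumed to have been
expanded by hand into the two literals it can match, and the saved state is
simply the list of alternatives still alive with their offsets. The functions
<i>start()</i> and <i>feed()</i> are hypothetical and are not part of any PCRE
interface; they only mimic the rule that a partial result is given when no
match completed during a call.
</P>

```python
def start(words):
    """Fresh state: every literal alternative is alive at offset 0."""
    return [(word, 0) for word in words]


def feed(state, segment):
    """Advance the surviving paths over one segment of the subject.

    Returns (result, completed, new_state), where result is "match" if any
    path finished during this call, "partial" if none finished but some are
    still alive, and "no match" otherwise.
    """
    completed = []
    for ch in segment:
        state = [(w, i + 1) for w, i in state if i < len(w) and w[i] == ch]
        completed += [w for w, i in state if i == len(w)]
        if not state:
            break
    if completed:
        return "match", completed, state
    if state:
        return "partial", completed, state
    return "no match", completed, state


WORDS = ["dog", "dogsbody"]   # hand expansion of dog(sbody)?

# Subject supplied in two pieces: the second call already sees "dog" finish.
first, _, saved = feed(start(WORDS), "do")
second, found_split, _ = feed(saved, "gsb")

# Subject supplied in one piece: both alternatives finish in the same call.
whole, found_whole, _ = feed(start(WORDS), "dogsbody")

print(first, second, found_split)
print(whole, found_whole)
```

<P>
Once a call has reported a match, the caller has no partial result to continue
from, which is why the split subject never reaches "dogsbody".
</P>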
|
||||
<P>
|
||||
Last updated: 16 January 2006
|
||||
<br>
|
||||
Copyright © 1997-2006 University of Cambridge.
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
1661
libs/pcre/doc/html/pcrepattern.html
Normal file
File diff suppressed because it is too large
97
libs/pcre/doc/html/pcreperform.html
Normal file
@ -0,0 +1,97 @@
|
||||
<html>
|
||||
<head>
|
||||
<title>pcreperform specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>pcreperform man page</h1>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
||||
<p>
|
||||
This page is part of the PCRE HTML documentation. It was generated automatically
|
||||
from the original man page. If there is any nonsense in it, please consult the
|
||||
man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<br><b>
|
||||
PCRE PERFORMANCE
|
||||
</b><br>
|
||||
<P>
|
||||
Certain items that may appear in regular expression patterns are more efficient
|
||||
than others. It is more efficient to use a character class like [aeiou] than a
|
||||
set of alternatives such as (a|e|i|o|u). In general, the simplest construction
|
||||
that provides the required behaviour is usually the most efficient. Jeffrey
|
||||
Friedl's book contains a lot of useful general discussion about optimizing
|
||||
regular expressions for efficient performance. This document contains a few
|
||||
observations about PCRE.
|
||||
</P>
|
||||
<P>
|
||||
Using Unicode character properties (the \p, \P, and \X escapes) is slow,
|
||||
because PCRE has to scan a structure that contains data for over fifteen
|
||||
thousand characters whenever it needs a character's property. If you can find
|
||||
an alternative pattern that does not use character properties, it will probably
|
||||
be faster.
|
||||
</P>
|
||||
<P>
|
||||
When a pattern begins with .* not in parentheses, or in parentheses that are
|
||||
not the subject of a backreference, and the PCRE_DOTALL option is set, the
|
||||
pattern is implicitly anchored by PCRE, since it can match only at the start of
|
||||
a subject string. However, if PCRE_DOTALL is not set, PCRE cannot make this
|
||||
optimization, because the . metacharacter does not then match a newline, and if
|
||||
the subject string contains newlines, the pattern may match from the character
|
||||
immediately following one of them instead of from the very start. For example,
|
||||
the pattern
|
||||
<pre>
|
||||
.*second
|
||||
</pre>
|
||||
matches the subject "first\nand second" (where \n stands for a newline
|
||||
character), with the match starting at the seventh character. In order to do
|
||||
this, PCRE has to retry the match starting after every newline in the subject.
|
||||
</P>
|
||||
<P>
|
||||
If you are using such a pattern with subject strings that do not contain
|
||||
newlines, the best performance is obtained by setting PCRE_DOTALL, or starting
|
||||
the pattern with ^.* or ^.*? to indicate explicit anchoring. That saves PCRE
|
||||
from having to scan along the subject looking for a newline to restart at.
|
||||
</P>
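<P>
The starting position of the match in the example above can be checked with any
Perl-style matcher. The sketch below uses Python's <b>re</b> module, whose
<i>re.DOTALL</i> flag plays the role of PCRE_DOTALL here; it shows where the
match starts, not how PCRE's anchoring optimization is implemented.
</P>

```python
import re

SUBJECT = "first\nand second"

# Without DOTALL, "." stops at the newline, so the match has to begin
# after it.
plain = re.search(r".*second", SUBJECT)

# With DOTALL, ".*" can run across the newline and the match begins at
# the very start of the subject.
dotall = re.search(r".*second", SUBJECT, re.DOTALL)

print(plain.start(), repr(plain.group(0)))
print(dotall.start(), repr(dotall.group(0)))
```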
|
||||
<P>
|
||||
Beware of patterns that contain nested indefinite repeats. These can take a
|
||||
long time to run when applied to a string that does not match. Consider the
|
||||
pattern fragment
|
||||
<pre>
|
||||
(a+)*
|
||||
</pre>
|
||||
This can match "aaaa" in 16 different ways, and this number increases very
rapidly as the string gets longer. (The * repeat can match 0, 1, 2, 3, or 4
times, and for each of those cases other than 0 or 4, the + repeats can match
|
||||
different numbers of times.) When the remainder of the pattern is such that the
|
||||
entire match is going to fail, PCRE has in principle to try every possible
|
||||
variation, and this can take an extremely long time.
|
||||
</P>
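<P>
The growth can be made concrete by enumerating the possibilities. The Python
sketch below counts one way for each distinct sequence of run lengths that the
inner repeat can take over a prefix of the subject; treating that as the
quantity of interest is an assumption of the example, and <i>ways()</i> is a
hypothetical helper, not anything PCRE provides.
</P>

```python
def ways(n):
    """List every way (a+)* can consume a prefix of n "a" characters.

    Each way is a tuple of run lengths, one per iteration of the * repeat.
    """
    results = [()]                      # zero iterations: the empty match

    def extend(prefix, used):
        # Try every length the next a+ could take, then recurse for more.
        for run in range(1, n - used + 1):
            way = prefix + (run,)
            results.append(way)
            extend(way, used + run)

    extend((), 0)
    return results


for length in range(1, 9):
    print(length, len(ways(length)))
```

<P>
Under this way of counting, the total doubles with every additional character,
which is why a failing match against a long run of "a" characters becomes
impractical so quickly.
</P>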
|
||||
<P>
|
||||
An optimization catches some of the simpler cases such as
|
||||
<pre>
|
||||
(a+)*b
|
||||
</pre>
|
||||
where a literal character follows. Before embarking on the standard matching
|
||||
procedure, PCRE checks that there is a "b" later in the subject string, and if
|
||||
there is not, it fails the match immediately. However, when there is no
|
||||
following literal this optimization cannot be used. You can see the difference
|
||||
by comparing the behaviour of
|
||||
<pre>
|
||||
(a+)*\d
|
||||
</pre>
|
||||
with the pattern above. The pattern (a+)*b fails almost instantly when
applied to a whole line of "a" characters, whereas (a+)*\d takes an
appreciable time with strings longer than about 20 characters.
|
||||
</P>
|
||||
<P>
|
||||
In many cases, the solution to this kind of performance issue is to use an
|
||||
atomic group or a possessive quantifier.
|
||||
</P>
|
||||
<P>
|
||||
Last updated: 28 February 2005
|
||||
<br>
|
||||
Copyright © 1997-2005 University of Cambridge.
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
244
libs/pcre/doc/html/pcreposix.html
Normal file
@ -0,0 +1,244 @@
|
||||
<html>
|
||||
<head>
|
||||
<title>pcreposix specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>pcreposix man page</h1>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
||||
<p>
|
||||
This page is part of the PCRE HTML documentation. It was generated automatically
|
||||
from the original man page. If there is any nonsense in it, please consult the
|
||||
man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<ul>
|
||||
<li><a name="TOC1" href="#SEC1">SYNOPSIS OF POSIX API</a>
|
||||
<li><a name="TOC2" href="#SEC2">DESCRIPTION</a>
|
||||
<li><a name="TOC3" href="#SEC3">COMPILING A PATTERN</a>
|
||||
<li><a name="TOC4" href="#SEC4">MATCHING NEWLINE CHARACTERS</a>
|
||||
<li><a name="TOC5" href="#SEC5">MATCHING A PATTERN</a>
|
||||
<li><a name="TOC6" href="#SEC6">ERROR MESSAGES</a>
|
||||
<li><a name="TOC7" href="#SEC7">MEMORY USAGE</a>
|
||||
<li><a name="TOC8" href="#SEC8">AUTHOR</a>
|
||||
</ul>
|
||||
<br><a name="SEC1" href="#TOC1">SYNOPSIS OF POSIX API</a><br>
|
||||
<P>
|
||||
<b>#include &lt;pcreposix.h&gt;</b>
|
||||
</P>
|
||||
<P>
|
||||
<b>int regcomp(regex_t *<i>preg</i>, const char *<i>pattern</i>,</b>
|
||||
<b>int <i>cflags</i>);</b>
|
||||
</P>
|
||||
<P>
|
||||
<b>int regexec(regex_t *<i>preg</i>, const char *<i>string</i>,</b>
|
||||
<b>size_t <i>nmatch</i>, regmatch_t <i>pmatch</i>[], int <i>eflags</i>);</b>
|
||||
</P>
|
||||
<P>
|
||||
<b>size_t regerror(int <i>errcode</i>, const regex_t *<i>preg</i>,</b>
|
||||
<b>char *<i>errbuf</i>, size_t <i>errbuf_size</i>);</b>
|
||||
</P>
|
||||
<P>
|
||||
<b>void regfree(regex_t *<i>preg</i>);</b>
|
||||
</P>
|
||||
<br><a name="SEC2" href="#TOC1">DESCRIPTION</a><br>
|
||||
<P>
|
||||
This set of functions provides a POSIX-style API to the PCRE regular expression
|
||||
package. See the
|
||||
<a href="pcreapi.html"><b>pcreapi</b></a>
|
||||
documentation for a description of PCRE's native API, which contains much
|
||||
additional functionality.
|
||||
</P>
|
||||
<P>
|
||||
The functions described here are just wrapper functions that ultimately call
|
||||
the PCRE native API. Their prototypes are defined in the <b>pcreposix.h</b>
|
||||
header file, and on Unix systems the library itself is called
|
||||
<b>pcreposix.a</b>, so can be accessed by adding <b>-lpcreposix</b> to the
|
||||
command for linking an application that uses them. Because the POSIX functions
|
||||
call the native ones, it is also necessary to add <b>-lpcre</b>.
|
||||
</P>
|
||||
<P>
|
||||
I have implemented only those option bits that can be reasonably mapped to PCRE
|
||||
native options. In addition, the option REG_EXTENDED is defined with the value
|
||||
zero. This has no effect, but since programs that are written to the POSIX
|
||||
interface often use it, this makes it easier to slot in PCRE as a replacement
|
||||
library. Other POSIX options are not even defined.
|
||||
</P>
|
||||
<P>
|
||||
When PCRE is called via these functions, it is only the API that is POSIX-like
|
||||
in style. The syntax and semantics of the regular expressions themselves are
|
||||
still those of Perl, subject to the setting of various PCRE options, as
|
||||
described below. "POSIX-like in style" means that the API approximates to the
|
||||
POSIX definition; it is not fully POSIX-compatible, and in multi-byte encoding
|
||||
domains it is probably even less compatible.
|
||||
</P>
|
||||
<P>
|
||||
The header for these functions is supplied as <b>pcreposix.h</b> to avoid any
|
||||
potential clash with other POSIX libraries. It can, of course, be renamed or
|
||||
aliased as <b>regex.h</b>, which is the "correct" name. It provides two
|
||||
structure types, <i>regex_t</i> for compiled internal forms, and
|
||||
<i>regmatch_t</i> for returning captured substrings. It also defines some
|
||||
constants whose names start with "REG_"; these are used for setting options and
|
||||
identifying error codes.
|
||||
</P>
|
||||
|
||||
<br><a name="SEC3" href="#TOC1">COMPILING A PATTERN</a><br>
|
||||
<P>
|
||||
The function <b>regcomp()</b> is called to compile a pattern into an
|
||||
internal form. The pattern is a C string terminated by a binary zero, and
|
||||
is passed in the argument <i>pattern</i>. The <i>preg</i> argument is a pointer
|
||||
to a <b>regex_t</b> structure that is used as a base for storing information
|
||||
about the compiled regular expression.
|
||||
</P>
|
||||
<P>
|
||||
The argument <i>cflags</i> is either zero, or contains one or more of the bits
|
||||
defined by the following macros:
|
||||
<pre>
|
||||
REG_DOTALL
|
||||
</pre>
|
||||
The PCRE_DOTALL option is set when the regular expression is passed for
|
||||
compilation to the native function. Note that REG_DOTALL is not part of the
|
||||
POSIX standard.
|
||||
<pre>
|
||||
REG_ICASE
|
||||
</pre>
|
||||
The PCRE_CASELESS option is set when the regular expression is passed for
|
||||
compilation to the native function.
|
||||
<pre>
|
||||
REG_NEWLINE
|
||||
</pre>
|
||||
The PCRE_MULTILINE option is set when the regular expression is passed for
|
||||
compilation to the native function. Note that this does <i>not</i> mimic the
|
||||
defined POSIX behaviour for REG_NEWLINE (see the following section).
|
||||
<pre>
|
||||
REG_NOSUB
|
||||
</pre>
|
||||
The PCRE_NO_AUTO_CAPTURE option is set when the regular expression is passed
|
||||
for compilation to the native function. In addition, when a pattern that is
|
||||
compiled with this flag is passed to <b>regexec()</b> for matching, the
|
||||
<i>nmatch</i> and <i>pmatch</i> arguments are ignored, and no captured strings
|
||||
are returned.
|
||||
<pre>
|
||||
REG_UTF8
|
||||
</pre>
|
||||
The PCRE_UTF8 option is set when the regular expression is passed for
|
||||
compilation to the native function. This causes the pattern itself and all data
|
||||
strings used for matching it to be treated as UTF-8 strings. Note that REG_UTF8
|
||||
is not part of the POSIX standard.
|
||||
</P>
|
||||
<P>
|
||||
In the absence of these flags, no options are passed to the native function.
|
||||
This means that the regex is compiled with PCRE default semantics. In
|
||||
particular, the way it handles newline characters in the subject string is the
|
||||
Perl way, not the POSIX way. Note that setting PCRE_MULTILINE has only
|
||||
<i>some</i> of the effects specified for REG_NEWLINE. It does not affect the way
|
||||
newlines are matched by . (they aren't) or by a negative class such as [^a]
|
||||
(they are).
|
||||
</P>
|
||||
<P>
|
||||
The yield of <b>regcomp()</b> is zero on success, and non-zero otherwise. The
|
||||
<i>preg</i> structure is filled in on success, and one member of the structure
|
||||
is public: <i>re_nsub</i> contains the number of capturing subpatterns in
|
||||
the regular expression. Various error codes are defined in the header file.
|
||||
</P>
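<P>
As a minimal sketch of the compile step, the helper below compiles a pattern, reads the public <i>re_nsub</i> member, and frees the structure. It is written against the system <i>regex.h</i> so that it builds without PCRE, which is why it passes REG_EXTENDED (a flag that is not among those listed above). To use PCRE's wrapper you would presumably include <i>pcreposix.h</i> instead. The name <i>count_captures</i> is hypothetical.
</P>

```c
#include <assert.h>
#include <regex.h>    /* system POSIX API; assumption: swap for "pcreposix.h" with PCRE */
#include <stddef.h>

/* Hypothetical helper: compile a pattern and report how many capturing
   subpatterns it contains, or -1 if compilation fails. */
long count_captures(const char *pattern, int cflags)
{
    regex_t preg;
    long n;

    assert(pattern != NULL);
    if (regcomp(&preg, pattern, cflags) != 0)
        return -1;              /* non-zero yield: compilation failed */
    n = (long)preg.re_nsub;     /* the public member of the structure */
    regfree(&preg);             /* release the memory tied to preg */
    return n;
}
```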
|
||||
<br><a name="SEC4" href="#TOC1">MATCHING NEWLINE CHARACTERS</a><br>
|
||||
<P>
|
||||
This area is not simple, because POSIX and Perl take different views of things.
|
||||
It is not possible to get PCRE to obey POSIX semantics, but then PCRE was never
|
||||
intended to be a POSIX engine. The following table lists the different
|
||||
possibilities for matching newline characters in PCRE:
|
||||
<pre>
|
||||
Default Change with
|
||||
|
||||
. matches newline no PCRE_DOTALL
|
||||
newline matches [^a] yes not changeable
|
||||
$ matches \n at end yes PCRE_DOLLAR_ENDONLY
|
||||
$ matches \n in middle no PCRE_MULTILINE
|
||||
^ matches \n in middle no PCRE_MULTILINE
|
||||
</pre>
|
||||
This is the equivalent table for POSIX:
|
||||
<pre>
|
||||
Default Change with
|
||||
|
||||
. matches newline yes REG_NEWLINE
|
||||
newline matches [^a] yes REG_NEWLINE
|
||||
$ matches \n at end no REG_NEWLINE
|
||||
$ matches \n in middle no REG_NEWLINE
|
||||
^ matches \n in middle no REG_NEWLINE
|
||||
</pre>
|
||||
PCRE's behaviour is the same as Perl's, except that there is no equivalent for
|
||||
PCRE_DOLLAR_ENDONLY in Perl. In both PCRE and Perl, there is no way to stop
|
||||
newline from matching [^a].
|
||||
</P>
|
||||
<P>
|
||||
The default POSIX newline handling can be obtained by setting PCRE_DOTALL and
|
||||
PCRE_DOLLAR_ENDONLY, but there is no way to make PCRE behave exactly as for the
|
||||
REG_NEWLINE action.
|
||||
</P>
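<P>
The POSIX table above can be checked against a native POSIX matcher. The sketch below deliberately uses the system <i>regex.h</i>, not PCRE, and the helper name <i>posix_matches</i> is hypothetical. It always adds REG_EXTENDED and REG_NOSUB, which are choices made for this example. Running the same patterns through the PCRE wrapper would be expected to give different answers for some rows, which is the point of this section.
</P>

```c
#include <assert.h>
#include <regex.h>   /* the system's POSIX matcher, not PCRE */
#include <stddef.h>

/* Hypothetical helper: return 1 if pattern matches subject, 0 if it does
   not, and -1 if the pattern fails to compile. */
int posix_matches(const char *pattern, const char *subject, int cflags)
{
    regex_t preg;
    int rc;

    assert(pattern != NULL && subject != NULL);
    if (regcomp(&preg, pattern, cflags | REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&preg, subject, 0, NULL, 0);   /* no offsets requested */
    regfree(&preg);
    return rc == 0;
}
```

<P>
For example, <i>posix_matches("a.b", "a\nb", 0)</i> corresponds to the ". matches newline" row, and adding REG_NEWLINE as the third argument switches that behaviour off.
</P>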
|
||||
<br><a name="SEC5" href="#TOC1">MATCHING A PATTERN</a><br>
|
||||
<P>
|
||||
The function <b>regexec()</b> is called to match a compiled pattern <i>preg</i>
|
||||
against a given <i>string</i>, which is terminated by a zero byte, subject to
|
||||
the options in <i>eflags</i>. These can be:
|
||||
<pre>
|
||||
REG_NOTBOL
|
||||
</pre>
|
||||
The PCRE_NOTBOL option is set when calling the underlying PCRE matching
|
||||
function.
|
||||
<pre>
|
||||
REG_NOTEOL
|
||||
</pre>
|
||||
The PCRE_NOTEOL option is set when calling the underlying PCRE matching
|
||||
function.
|
||||
</P>
|
||||
<P>
|
||||
If the pattern was compiled with the REG_NOSUB flag, no data about any matched
|
||||
strings is returned. The <i>nmatch</i> and <i>pmatch</i> arguments of
|
||||
<b>regexec()</b> are ignored.
|
||||
</P>
|
||||
<P>
|
||||
Otherwise, the portion of the string that was matched, and also any captured
|
||||
substrings, are returned via the <i>pmatch</i> argument, which points to an
|
||||
array of <i>nmatch</i> structures of type <i>regmatch_t</i>, containing the
|
||||
members <i>rm_so</i> and <i>rm_eo</i>. These contain the offset to the first
|
||||
character of each substring and the offset to the first character after the end
|
||||
of each substring, respectively. The 0th element of the vector relates to the
|
||||
entire portion of <i>string</i> that was matched; subsequent elements relate to
|
||||
the capturing subpatterns of the regular expression. Unused entries in the
|
||||
array have both structure members set to -1.
|
||||
</P>
|
||||
<P>
|
||||
A successful match yields a zero return; various error codes are defined in the
|
||||
header file, of which REG_NOMATCH is the "expected" failure code.
|
||||
</P>
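<P>
A short sketch of reading the offsets described above. It uses the system <i>regex.h</i> (hence REG_EXTENDED) so that it compiles without PCRE; the helper name <i>match_offsets</i> is hypothetical, and the -1 return for a compile failure is a convention of this example only.
</P>

```c
#include <assert.h>
#include <regex.h>    /* assumption: swap for "pcreposix.h" to use PCRE */
#include <stddef.h>

/* Hypothetical helper: match pattern against subject and let regexec()
   fill in up to nmatch (rm_so, rm_eo) pairs. Returns the regexec() yield,
   or -1 if the pattern does not compile. */
int match_offsets(const char *pattern, const char *subject,
                  size_t nmatch, regmatch_t *pmatch)
{
    regex_t preg;
    int rc;

    assert(pattern != NULL && subject != NULL);
    if (regcomp(&preg, pattern, REG_EXTENDED) != 0)
        return -1;
    rc = regexec(&preg, subject, nmatch, pmatch, 0);  /* eflags = 0 */
    regfree(&preg);
    return rc;
}
```

<P>
With the pattern <i>(c)(at)</i> and the subject "the cat sat", element 0 covers offsets 4 to 7, elements 1 and 2 cover the two groups, and any further element has both members set to -1.
</P>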
|
||||
<br><a name="SEC6" href="#TOC1">ERROR MESSAGES</a><br>
|
||||
<P>
|
||||
The <b>regerror()</b> function maps a non-zero errorcode from either
|
||||
<b>regcomp()</b> or <b>regexec()</b> to a printable message. If <i>preg</i> is not
|
||||
NULL, the error should have arisen from the use of that structure. A message
|
||||
terminated by a binary zero is placed in <i>errbuf</i>. The length of the
|
||||
message, including the zero, is limited to <i>errbuf_size</i>. The yield of the
|
||||
function is the size of buffer needed to hold the whole message.
|
||||
</P>
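<P>
Because the yield is the buffer size needed for the whole message, one common approach is to call <b>regerror()</b> twice: once with a zero-size buffer to learn the size, then again with a buffer of that size. The sketch below assumes the system <i>regex.h</i>; the helper name <i>error_text</i> is hypothetical.
</P>

```c
#include <assert.h>
#include <regex.h>    /* assumption: swap for "pcreposix.h" to use PCRE */
#include <stdlib.h>

/* Hypothetical helper: return a malloc'd copy of the message for errcode,
   or NULL if memory cannot be allocated. The caller frees the result. */
char *error_text(int errcode, const regex_t *preg)
{
    size_t needed;
    char *buf;

    assert(errcode != 0);
    /* Zero-size buffer: nothing is written, but the yield is still the
       size needed for the message, including the terminating zero. */
    needed = regerror(errcode, preg, NULL, 0);
    buf = malloc(needed);
    if (buf != NULL)
        regerror(errcode, preg, buf, needed);
    return buf;
}
```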
|
||||
<br><a name="SEC7" href="#TOC1">MEMORY USAGE</a><br>
|
||||
<P>
|
||||
Compiling a regular expression causes memory to be allocated and associated
|
||||
with the <i>preg</i> structure. The function <b>regfree()</b> frees all such
|
||||
memory, after which <i>preg</i> may no longer be used as a compiled expression.
|
||||
</P>
|
||||
<br><a name="SEC8" href="#TOC1">AUTHOR</a><br>
|
||||
<P>
|
||||
Philip Hazel
|
||||
<br>
|
||||
University Computing Service,
|
||||
<br>
|
||||
Cambridge CB2 3QG, England.
|
||||
</P>
|
||||
<P>
|
||||
Last updated: 16 January 2006
|
||||
<br>
|
||||
Copyright © 1997-2006 University of Cambridge.
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
140
libs/pcre/doc/html/pcreprecompile.html
Normal file
@ -0,0 +1,140 @@
|
||||
<html>
|
||||
<head>
|
||||
<title>pcreprecompile specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>pcreprecompile man page</h1>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
||||
<p>
|
||||
This page is part of the PCRE HTML documentation. It was generated automatically
|
||||
from the original man page. If there is any nonsense in it, please consult the
|
||||
man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<ul>
|
||||
<li><a name="TOC1" href="#SEC1">SAVING AND RE-USING PRECOMPILED PCRE PATTERNS</a>
|
||||
<li><a name="TOC2" href="#SEC2">SAVING A COMPILED PATTERN</a>
|
||||
<li><a name="TOC3" href="#SEC3">RE-USING A PRECOMPILED PATTERN</a>
|
||||
<li><a name="TOC4" href="#SEC4">COMPATIBILITY WITH DIFFERENT PCRE RELEASES</a>
|
||||
</ul>
|
||||
<br><a name="SEC1" href="#TOC1">SAVING AND RE-USING PRECOMPILED PCRE PATTERNS</a><br>
|
||||
<P>
|
||||
If you are running an application that uses a large number of regular
|
||||
expression patterns, it may be useful to store them in a precompiled form
|
||||
instead of having to compile them every time the application is run.
|
||||
If you are not using any private character tables (see the
|
||||
<a href="pcre_maketables.html"><b>pcre_maketables()</b></a>
|
||||
documentation), this is relatively straightforward. If you are using private
|
||||
tables, it is a little bit more complicated.
|
||||
</P>
|
||||
<P>
|
||||
If you save compiled patterns to a file, you can copy them to a different host
|
||||
and run them there. This works even if the new host has the opposite endianness
|
||||
to the one on which the patterns were compiled. There may be a small
|
||||
performance penalty, but it should be insignificant.
|
||||
</P>
|
||||
<br><a name="SEC2" href="#TOC1">SAVING A COMPILED PATTERN</a><br>
|
||||
<P>
|
||||
The value returned by <b>pcre_compile()</b> points to a single block of memory
|
||||
that holds the compiled pattern and associated data. You can find the length of
|
||||
this block in bytes by calling <b>pcre_fullinfo()</b> with an argument of
|
||||
PCRE_INFO_SIZE. You can then save the data in any appropriate manner. Here is
|
||||
sample code that compiles a pattern and writes it to a file. It assumes that
|
||||
the variable <i>fd</i> refers to a file that is open for output:
|
||||
<pre>
|
||||
  int erroroffset, rc;
  size_t size, written;
  const char *error;
  pcre *re;

  /* Compile the pattern; re is NULL on failure. */
  re = pcre_compile("my pattern", 0, &error, &erroroffset, NULL);
  if (re == NULL) { ... handle errors ... }
  /* PCRE_INFO_SIZE stores its result in a size_t variable. */
  rc = pcre_fullinfo(re, NULL, PCRE_INFO_SIZE, &size);
  if (rc < 0) { ... handle errors ... }
  /* fd must be a FILE * that is open for binary output. */
  written = fwrite(re, 1, size, fd);
  if (written != size) { ... handle errors ... }
|
||||
</pre>
|
||||
In this example, the bytes that comprise the compiled pattern are copied
|
||||
exactly. Note that this is binary data that may contain any of the 256 possible
|
||||
byte values. On systems that make a distinction between binary and non-binary
|
||||
data, be sure that the file is opened for binary output.
|
||||
</P>
|
||||
<P>
|
||||
If you want to write more than one pattern to a file, you will have to devise a
|
||||
way of separating them. For binary data, preceding each pattern with its length
|
||||
is probably the most straightforward approach. Another possibility is to write
|
||||
out the data in hexadecimal instead of binary, one pattern to a line.
|
||||
</P>
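<P>
The length-prefix approach could be sketched as follows. The record layout (a 4-byte big-endian length followed by the raw bytes) and the names <i>write_record</i> and <i>read_record</i> are assumptions made for this example, not part of PCRE. The data passed in would be the compiled pattern block, with the length obtained from PCRE_INFO_SIZE.
</P>

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Write one record: a 4-byte big-endian length, then the raw bytes.
   Returns 0 on success, -1 on failure. */
int write_record(FILE *f, const void *data, uint32_t len)
{
    unsigned char hdr[4];

    assert(f != NULL && (data != NULL || len == 0));
    hdr[0] = (unsigned char)(len >> 24);
    hdr[1] = (unsigned char)(len >> 16);
    hdr[2] = (unsigned char)(len >> 8);
    hdr[3] = (unsigned char)len;
    if (fwrite(hdr, 1, 4, f) != 4) return -1;
    if (len > 0 && fwrite(data, 1, len, f) != len) return -1;
    return 0;
}

/* Read one record into a malloc'd buffer. Returns the buffer (the caller
   frees it) and stores its length in *len; returns NULL at end of file
   or on error. */
void *read_record(FILE *f, uint32_t *len)
{
    unsigned char hdr[4];
    void *buf;

    assert(f != NULL && len != NULL);
    if (fread(hdr, 1, 4, f) != 4) return NULL;
    *len = ((uint32_t)hdr[0] << 24) | ((uint32_t)hdr[1] << 16) |
           ((uint32_t)hdr[2] << 8) | (uint32_t)hdr[3];
    buf = malloc(*len > 0 ? *len : 1);   /* never request zero bytes */
    if (buf == NULL) return NULL;
    if (*len > 0 && fread(buf, 1, *len, f) != *len) {
        free(buf);
        return NULL;
    }
    return buf;
}
```

<P>
Writing the length in a fixed byte order keeps the file usable on a host of the opposite endianness, in line with the portability of the pattern data itself.
</P>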
|
||||
<P>
|
||||
Saving compiled patterns in a file is only one possible way of storing them for
|
||||
later use. They could equally well be saved in a database, or in the memory of
|
||||
some daemon process that passes them via sockets to the processes that want
|
||||
them.
|
||||
</P>
|
||||
<P>
|
||||
If the pattern has been studied, it is also possible to save the study data in
|
||||
a similar way to the compiled pattern itself. When studying generates
|
||||
additional information, <b>pcre_study()</b> returns a pointer to a
|
||||
<b>pcre_extra</b> data block. Its format is defined in the
|
||||
<a href="pcreapi.html#extradata">section on matching a pattern</a>
|
||||
in the
|
||||
<a href="pcreapi.html"><b>pcreapi</b></a>
|
||||
documentation. The <i>study_data</i> field points to the binary study data, and
|
||||
this is what you must save (not the <b>pcre_extra</b> block itself). The length
|
||||
of the study data can be obtained by calling <b>pcre_fullinfo()</b> with an
|
||||
argument of PCRE_INFO_STUDYSIZE. Remember to check that <b>pcre_study()</b> did
|
||||
return a non-NULL value before trying to save the study data.
|
||||
</P>
|
||||
<br><a name="SEC3" href="#TOC1">RE-USING A PRECOMPILED PATTERN</a><br>
|
||||
<P>
|
||||
Re-using a precompiled pattern is straightforward. Having reloaded it into main
|
||||
memory, you pass its pointer to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b> in
|
||||
the usual way. This should work even on another host, and even if that host has
|
||||
the opposite endianness to the one where the pattern was compiled.
|
||||
</P>
|
||||
<P>
|
||||
However, if you passed a pointer to custom character tables when the pattern
|
||||
was compiled (the <i>tableptr</i> argument of <b>pcre_compile()</b>), you must
|
||||
now pass a similar pointer to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>,
|
||||
because the value saved with the compiled pattern will obviously be nonsense. A
|
||||
field in a <b>pcre_extra</b> block is used to pass this data, as described in
|
||||
the
|
||||
<a href="pcreapi.html#extradata">section on matching a pattern</a>
|
||||
in the
|
||||
<a href="pcreapi.html"><b>pcreapi</b></a>
|
||||
documentation.
|
||||
</P>
|
||||
<P>
|
||||
If you did not provide custom character tables when the pattern was compiled,
|
||||
the pointer in the compiled pattern is NULL, which causes <b>pcre_exec()</b> to
|
||||
use PCRE's internal tables. Thus, you do not need to take any special action at
|
||||
run time in this case.
|
||||
</P>
|
||||
<P>
|
||||
If you saved study data with the compiled pattern, you need to create your own
|
||||
<b>pcre_extra</b> data block and set the <i>study_data</i> field to point to the
|
||||
reloaded study data. You must also set the PCRE_EXTRA_STUDY_DATA bit in the
|
||||
<i>flags</i> field to indicate that study data is present. Then pass the
|
||||
<b>pcre_extra</b> block to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b> in the
|
||||
usual way.
|
||||
</P>
|
||||
<br><a name="SEC4" href="#TOC1">COMPATIBILITY WITH DIFFERENT PCRE RELEASES</a><br>
|
||||
<P>
|
||||
The layout of the control block that is at the start of the data that makes up
|
||||
a compiled pattern was changed for release 5.0. If you have any saved patterns
|
||||
that were compiled with previous releases (not a facility that was previously
|
||||
advertised), you will have to recompile them for release 5.0. However, from now
|
||||
on, it should be possible to make changes in a compatible manner.
|
||||
</P>
|
||||
<P>
|
||||
Notwithstanding the above, if you have any saved patterns in UTF-8 mode that
|
||||
use \p or \P that were compiled with any release up to and including 6.4, you
|
||||
will have to recompile them for release 6.5 and above.
|
||||
</P>
|
||||
<P>
|
||||
Last updated: 01 February 2006
|
||||
<br>
|
||||
Copyright © 1997-2006 University of Cambridge.
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
81
libs/pcre/doc/html/pcresample.html
Normal file
@ -0,0 +1,81 @@
|
||||
<html>
|
||||
<head>
|
||||
<title>pcresample specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>pcresample man page</h1>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
||||
<p>
|
||||
This page is part of the PCRE HTML documentation. It was generated automatically
|
||||
from the original man page. If there is any nonsense in it, please consult the
|
||||
man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<br><b>
|
||||
PCRE SAMPLE PROGRAM
|
||||
</b><br>
|
||||
<P>
|
||||
A simple, complete demonstration program, to get you started with using PCRE,
|
||||
is supplied in the file <i>pcredemo.c</i> in the PCRE distribution.
|
||||
</P>
|
||||
<P>
|
||||
The program compiles the regular expression that is its first argument, and
|
||||
matches it against the subject string in its second argument. No PCRE options
|
||||
are set, and default character tables are used. If matching succeeds, the
|
||||
program outputs the portion of the subject that matched, together with the
|
||||
contents of any captured substrings.
|
||||
</P>
|
||||
<P>
|
||||
If the -g option is given on the command line, the program then goes on to
|
||||
check for further matches of the same regular expression in the same subject
|
||||
string. The logic is a little bit tricky because of the possibility of matching
|
||||
an empty string. Comments in the code explain what is going on.
|
||||
</P>
|
||||
<P>
|
||||
If PCRE is installed in the standard include and library directories for your
|
||||
system, you should be able to compile the demonstration program using this
|
||||
command:
|
||||
<pre>
|
||||
gcc -o pcredemo pcredemo.c -lpcre
|
||||
</pre>
|
||||
If PCRE is installed elsewhere, you may need to add additional options to the
|
||||
command line. For example, on a Unix-like system that has PCRE installed in
|
||||
<i>/usr/local</i>, you can compile the demonstration program using a command
|
||||
like this:
|
||||
<pre>
|
||||
gcc -o pcredemo -I/usr/local/include pcredemo.c -L/usr/local/lib -lpcre
|
||||
</pre>
|
||||
Once you have compiled the demonstration program, you can run simple tests like
|
||||
this:
|
||||
<pre>
|
||||
./pcredemo 'cat|dog' 'the cat sat on the mat'
|
||||
./pcredemo -g 'cat|dog' 'the dog sat on the cat'
|
||||
</pre>
|
||||
Note that there is a much more comprehensive test program, called
|
||||
<a href="pcretest.html"><b>pcretest</b>,</a>
|
||||
which supports many more facilities for testing regular expressions and the
|
||||
PCRE library. The <b>pcredemo</b> program is provided as a simple coding
|
||||
example.
|
||||
</P>
|
||||
<P>
|
||||
On some operating systems (e.g. Solaris), when PCRE is not installed in the
|
||||
standard library directory, you may get an error like this when you try to run
|
||||
<b>pcredemo</b>:
|
||||
<pre>
|
||||
ld.so.1: a.out: fatal: libpcre.so.0: open failed: No such file or directory
|
||||
</pre>
|
||||
This is caused by the way shared library support works on those systems. You
|
||||
need to add
|
||||
<pre>
|
||||
-R/usr/local/lib
|
||||
</pre>
|
||||
(for example) to the compile command to get round this problem.
|
||||
</P>
|
||||
<P>
|
||||
Last updated: 09 September 2004
|
||||
<br>
|
||||
Copyright © 1997-2004 University of Cambridge.
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
127
libs/pcre/doc/html/pcrestack.html
Normal file
@ -0,0 +1,127 @@
|
||||
<html>
|
||||
<head>
|
||||
<title>pcrestack specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>pcrestack man page</h1>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
||||
<p>
|
||||
This page is part of the PCRE HTML documentation. It was generated automatically
|
||||
from the original man page. If there is any nonsense in it, please consult the
|
||||
man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<br><b>
|
||||
PCRE DISCUSSION OF STACK USAGE
|
||||
</b><br>
|
||||
<P>
|
||||
When you call <b>pcre_exec()</b>, it makes use of an internal function called
|
||||
<b>match()</b>. This calls itself recursively at branch points in the pattern,
|
||||
in order to remember the state of the match so that it can back up and try a
|
||||
different alternative if the first one fails. As matching proceeds deeper and
|
||||
deeper into the tree of possibilities, the recursion depth increases.
|
||||
</P>
|
||||
<P>
|
||||
Not all calls of <b>match()</b> increase the recursion depth; for an item such
|
||||
as a* it may be called several times at the same level, after matching
|
||||
different numbers of a's. Furthermore, in a number of cases where the result of
|
||||
the recursive call would immediately be passed back as the result of the
|
||||
current call (a "tail recursion"), the function is just restarted instead.
|
||||
</P>
|
||||
<P>
|
||||
The <b>pcre_dfa_exec()</b> function operates in an entirely different way, and
|
||||
hardly uses recursion at all. The limit on its complexity is the amount of
|
||||
workspace it is given. The comments that follow do NOT apply to
|
||||
<b>pcre_dfa_exec()</b>; they are relevant only for <b>pcre_exec()</b>.
|
||||
</P>
|
||||
<P>
|
||||
You can set limits on the number of times that <b>match()</b> is called, both in
|
||||
total and recursively. If the limit is exceeded, an error occurs. For details,
|
||||
see the
|
||||
<a href="pcreapi.html#extradata">section on extra data for <b>pcre_exec()</b></a>
|
||||
in the
|
||||
<a href="pcreapi.html"><b>pcreapi</b></a>
|
||||
documentation.
|
||||
</P>
|
||||
<P>
|
||||
Each time that <b>match()</b> is actually called recursively, it uses memory
|
||||
from the process stack. For certain kinds of pattern and data, very large
|
||||
amounts of stack may be needed, despite the recognition of "tail recursion".
|
||||
You can often reduce the amount of recursion, and therefore the amount of stack
|
||||
used, by modifying the pattern that is being matched. Consider, for example,
|
||||
this pattern:
|
||||
<pre>
|
||||
([^<]|<(?!inet))+
|
||||
</pre>
|
||||
It matches from wherever it starts until it encounters "<inet" or the end of
|
||||
the data, and is the kind of pattern that might be used when processing an XML
|
||||
file. Each iteration of the outer parentheses matches either one character that
|
||||
is not "<" or a "<" that is not followed by "inet". However, each time a
|
||||
parenthesis is processed, a recursion occurs, so this formulation uses a stack
|
||||
frame for each matched character. For a long string, a lot of stack is
|
||||
required. Consider now this rewritten pattern, which matches exactly the same
|
||||
strings:
|
||||
<pre>
|
||||
([^<]++|<(?!inet))
|
||||
</pre>
|
||||
This uses very much less stack, because runs of characters that do not contain
|
||||
"<" are "swallowed" in one item inside the parentheses. Recursion happens only
|
||||
when a "<" character that is not followed by "inet" is encountered (and we
|
||||
assume this is relatively rare). A possessive quantifier is used to stop any
|
||||
backtracking into the runs of non-"<" characters, but that is not related to
|
||||
stack usage.
|
||||
</P>
|
||||
<P>
|
||||
In environments where stack memory is constrained, you might want to compile
|
||||
PCRE to use heap memory instead of stack for remembering back-up points. This
|
||||
makes it run a lot more slowly, however. Details of how to do this are given in
|
||||
the
|
||||
<a href="pcrebuild.html"><b>pcrebuild</b></a>
|
||||
documentation.
|
||||
</P>
|
||||
<P>
|
||||
In Unix-like environments, the stack is rarely a problem, though
|
||||
the default limit on stack size varies from system to system. Values from 8MB
|
||||
to 64MB are common. You can find your default limit by running the command:
|
||||
<pre>
|
||||
ulimit -s
|
||||
</pre>
|
||||
The effect of running out of stack is often SIGSEGV, though sometimes an error
|
||||
message is given. You can normally increase the limit on stack size by code
|
||||
such as this:
|
||||
<pre>
|
||||
struct rlimit rlim;
|
||||
getrlimit(RLIMIT_STACK, &rlim);
|
||||
rlim.rlim_cur = 100*1024*1024;
|
||||
setrlimit(RLIMIT_STACK, &rlim);
|
||||
</pre>
|
||||
This reads the current limits (soft and hard) using <b>getrlimit()</b>, then
|
||||
attempts to increase the soft limit to 100MB using <b>setrlimit()</b>. You must
|
||||
do this before calling <b>pcre_exec()</b>.
|
||||
</P>
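<P>
A slightly more defensive variant of the same idea, as a sketch: it checks the return values, leaves the limit alone when it is already large enough, and clamps the request to the hard limit. The helper name <i>ensure_stack_limit</i> is hypothetical, and clamping rather than failing is a design choice of this example.
</P>

```c
#include <assert.h>
#include <sys/resource.h>

/* Hypothetical helper: raise the soft stack limit towards `wanted` bytes,
   never beyond the hard limit. Returns 0 on success, -1 if the limits
   cannot be read or changed. */
int ensure_stack_limit(rlim_t wanted)
{
    struct rlimit rlim;

    assert(wanted > 0);
    if (getrlimit(RLIMIT_STACK, &rlim) != 0)
        return -1;
    if (rlim.rlim_cur == RLIM_INFINITY || rlim.rlim_cur >= wanted)
        return 0;                       /* already large enough */
    if (rlim.rlim_max != RLIM_INFINITY && wanted > rlim.rlim_max)
        wanted = rlim.rlim_max;         /* cannot exceed the hard limit */
    rlim.rlim_cur = wanted;
    return setrlimit(RLIMIT_STACK, &rlim) == 0 ? 0 : -1;
}
```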
|
||||
<P>
|
||||
PCRE has an internal counter that can be used to limit the depth of recursion,
|
||||
and thus cause <b>pcre_exec()</b> to give an error code before it runs out of
|
||||
stack. By default, the limit is very large, and unlikely ever to operate. It
|
||||
can be changed when PCRE is built, and it can also be set when
|
||||
<b>pcre_exec()</b> is called. For details of these interfaces, see the
|
||||
<a href="pcrebuild.html"><b>pcrebuild</b></a>
|
||||
and
|
||||
<a href="pcreapi.html"><b>pcreapi</b></a>
|
||||
documentation.
|
||||
</P>
|
||||
<P>
|
||||
As a very rough rule of thumb, you should reckon on about 500 bytes per
|
||||
recursion. Thus, if you want to limit your stack usage to 8MB, you
|
||||
should set the limit at 16000 recursions. A 64MB stack, on the other hand, can
|
||||
support around 128000 recursions. The <b>pcretest</b> test program has a command
|
||||
line option (<b>-S</b>) that can be used to increase its stack.
|
||||
</P>
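<P>
The rule of thumb is a single division, sketched below with a hypothetical helper name. The figures in the text come out exactly if a megabyte is taken as one million bytes; with binary megabytes the results are slightly larger (16777 rather than 16000 for the smaller stack). The 500-byte cost is the rough estimate quoted above, not a measured value.
</P>

```c
#include <assert.h>

/* Hypothetical helper: estimate how many recursions fit in stack_bytes
   when each recursion costs roughly bytes_per_recursion bytes. */
unsigned long recursion_limit_for(unsigned long stack_bytes,
                                  unsigned long bytes_per_recursion)
{
    assert(bytes_per_recursion > 0);    /* avoid division by zero */
    return stack_bytes / bytes_per_recursion;
}
```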
|
||||
<P>
|
||||
Last updated: 29 June 2006
|
||||
<br>
|
||||
Copyright © 1997-2006 University of Cambridge.
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
616
libs/pcre/doc/html/pcretest.html
Normal file
@ -0,0 +1,616 @@
|
||||
<html>
|
||||
<head>
|
||||
<title>pcretest specification</title>
|
||||
</head>
|
||||
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
|
||||
<h1>pcretest man page</h1>
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
||||
<p>
|
||||
This page is part of the PCRE HTML documentation. It was generated automatically
|
||||
from the original man page. If there is any nonsense in it, please consult the
|
||||
man page, in case the conversion went wrong.
|
||||
<br>
|
||||
<ul>
|
||||
<li><a name="TOC1" href="#SEC1">SYNOPSIS</a>
|
||||
<li><a name="TOC2" href="#SEC2">OPTIONS</a>
|
||||
<li><a name="TOC3" href="#SEC3">DESCRIPTION</a>
|
||||
<li><a name="TOC4" href="#SEC4">PATTERN MODIFIERS</a>
|
||||
<li><a name="TOC5" href="#SEC5">DATA LINES</a>
|
||||
<li><a name="TOC6" href="#SEC6">THE ALTERNATIVE MATCHING FUNCTION</a>
|
||||
<li><a name="TOC7" href="#SEC7">DEFAULT OUTPUT FROM PCRETEST</a>
|
||||
<li><a name="TOC8" href="#SEC8">OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION</a>
|
||||
<li><a name="TOC9" href="#SEC9">RESTARTING AFTER A PARTIAL MATCH</a>
|
||||
<li><a name="TOC10" href="#SEC10">CALLOUTS</a>
|
||||
<li><a name="TOC11" href="#SEC11">SAVING AND RELOADING COMPILED PATTERNS</a>
|
||||
<li><a name="TOC12" href="#SEC12">AUTHOR</a>
|
||||
</ul>
|
||||
<br><a name="SEC1" href="#TOC1">SYNOPSIS</a><br>
|
||||
<P>
|
||||
<b>pcretest [options] [source] [destination]</b>
|
||||
<br>
|
||||
<br>
|
||||
<b>pcretest</b> was written as a test program for the PCRE regular expression
|
||||
library itself, but it can also be used for experimenting with regular
|
||||
expressions. This document describes the features of the test program; for
|
||||
details of the regular expressions themselves, see the
|
||||
<a href="pcrepattern.html"><b>pcrepattern</b></a>
|
||||
documentation. For details of the PCRE library function calls and their
|
||||
options, see the
|
||||
<a href="pcreapi.html"><b>pcreapi</b></a>
|
||||
documentation.
|
||||
</P>
|
||||
<br><a name="SEC2" href="#TOC1">OPTIONS</a><br>
|
||||
<P>
|
||||
<b>-C</b>
|
||||
Output the version number of the PCRE library, and all available information
|
||||
about the optional features that are included, and then exit.
|
||||
</P>
|
||||
<P>
|
||||
<b>-d</b>
|
||||
Behave as if each regex has the <b>/D</b> (debug) modifier; the internal
|
||||
form is output after compilation.
|
||||
</P>
|
||||
<P>
|
||||
<b>-dfa</b>
|
||||
Behave as if each data line contains the \D escape sequence; this causes the
|
||||
alternative matching function, <b>pcre_dfa_exec()</b>, to be used instead of the
|
||||
standard <b>pcre_exec()</b> function (more detail is given below).
|
||||
</P>
|
||||
<P>
|
||||
<b>-i</b>
|
||||
Behave as if each regex has the <b>/I</b> modifier; information about the
|
||||
compiled pattern is given after compilation.
|
||||
</P>
|
||||
<P>
|
||||
<b>-m</b>
|
||||
Output the size of each compiled pattern after it has been compiled. This is
|
||||
equivalent to adding <b>/M</b> to each regular expression. For compatibility
|
||||
with earlier versions of pcretest, <b>-s</b> is a synonym for <b>-m</b>.
|
||||
</P>
|
||||
<P>
|
||||
<b>-o</b> <i>osize</i>
|
||||
Set the number of elements in the output vector that is used when calling
|
||||
<b>pcre_exec()</b> to be <i>osize</i>. The default value is 45, which is enough
|
||||
for 14 capturing subexpressions. The vector size can be changed for individual
|
||||
matching calls by including \O in the data line (see below).
|
||||
</P>
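<P>
The relationship between the vector size and the number of capturing subexpressions can be written down directly. This sketch assumes the usual <b>pcre_exec()</b> layout of three integers per reported substring (two offsets plus one of workspace), with one substring reserved for the whole match. Both helper names are hypothetical.
</P>

```c
#include <assert.h>

/* Number of capturing subexpressions whose offsets fit in an output
   vector of osize ints: three ints per substring, and one substring is
   taken by the whole match. */
int max_captures(int osize)
{
    assert(osize >= 3);
    return osize / 3 - 1;
}

/* Smallest output vector that can report ncaptures subexpressions. */
int min_osize(int ncaptures)
{
    assert(ncaptures >= 0);
    return (ncaptures + 1) * 3;
}
```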
|
||||
<P>
|
||||
<b>-p</b>
|
||||
Behave as if each regex has the <b>/P</b> modifier; the POSIX wrapper API is
|
||||
used to call PCRE. None of the other options has any effect when <b>-p</b> is
|
||||
set.
|
||||
</P>
|
||||
<P>
|
||||
<b>-q</b>
|
||||
Do not output the version number of <b>pcretest</b> at the start of execution.
|
||||
</P>
|
||||
<P>
|
||||
<b>-S</b> <i>size</i>
|
||||
On Unix-like systems, set the size of the runtime stack to <i>size</i>
|
||||
megabytes.
|
||||
</P>
|
||||
<P>
|
||||
<b>-t</b>
|
||||
Run each compile, study, and match many times with a timer, and output
|
||||
the resulting time per compile or match (in milliseconds). Do not set <b>-m</b> with
|
||||
<b>-t</b>, because you will then get the size output a zillion times, and the
|
||||
timing will be distorted.
|
||||
</P>
|
||||
<br><a name="SEC3" href="#TOC1">DESCRIPTION</a><br>
|
||||
<P>
|
||||
If <b>pcretest</b> is given two filename arguments, it reads from the first and
|
||||
writes to the second. If it is given only one filename argument, it reads from
|
||||
that file and writes to stdout. Otherwise, it reads from stdin and writes to
|
||||
stdout, and prompts for each line of input, using "re>" to prompt for regular
|
||||
expressions, and "data>" to prompt for data lines.
|
||||
</P>
|
||||
<P>
|
||||
The program handles any number of sets of input on a single input file. Each
|
||||
set starts with a regular expression, and continues with any number of data
|
||||
lines to be matched against the pattern.
|
||||
</P>
|
||||
<P>
|
||||
Each data line is matched separately and independently. If you want to do
|
||||
multi-line matches, you have to use the \n escape sequence (or \r or \r\n,
|
||||
depending on the newline setting) in a single line of input to encode the
|
||||
newline characters. There is no limit on the length of data lines; the input
|
||||
buffer is automatically extended if it is too small.
|
||||
</P>
|
||||
<P>
|
||||
An empty line signals the end of the data lines, at which point a new regular
|
||||
expression is read. The regular expressions are given enclosed in any
|
||||
non-alphanumeric delimiters other than backslash, for example:
|
||||
<pre>
|
||||
/(a|bc)x+yz/
|
||||
</pre>
|
||||
White space before the initial delimiter is ignored. A regular expression may
|
||||
be continued over several input lines, in which case the newline characters are
|
||||
included within it. It is possible to include the delimiter within the pattern
|
||||
by escaping it, for example
|
||||
<pre>
|
||||
/abc\/def/
|
||||
</pre>
|
||||
If you do so, the escape and the delimiter form part of the pattern, but since
|
||||
delimiters are always non-alphanumeric, this does not affect its interpretation.
|
||||
If the terminating delimiter is immediately followed by a backslash, for
|
||||
example,
|
||||
<pre>
|
||||
/abc/\
|
||||
</pre>
|
||||
then a backslash is added to the end of the pattern. This is done to provide a
|
||||
way of testing the error condition that arises if a pattern finishes with a
|
||||
backslash, because
|
||||
<pre>
|
||||
/abc\/
|
||||
</pre>
|
||||
is interpreted as the first line of a pattern that starts with "abc/", causing
|
||||
pcretest to read the next line as a continuation of the regular expression.
|
||||
</P>
|
||||
<br><a name="SEC4" href="#TOC1">PATTERN MODIFIERS</a><br>
|
||||
<P>
|
||||
A pattern may be followed by any number of modifiers, which are mostly single
|
||||
characters. Following Perl usage, these are referred to below as, for example,
|
||||
"the <b>/i</b> modifier", even though the delimiter of the pattern need not
|
||||
always be a slash, and no slash is used when writing modifiers. Whitespace may
|
||||
appear between the final pattern delimiter and the first modifier, and between
|
||||
the modifiers themselves.
|
||||
</P>
|
||||
<P>
|
||||
The <b>/i</b>, <b>/m</b>, <b>/s</b>, and <b>/x</b> modifiers set the PCRE_CASELESS,
|
||||
PCRE_MULTILINE, PCRE_DOTALL, and PCRE_EXTENDED options, respectively, when
|
||||
<b>pcre_compile()</b> is called. These four modifier letters have the same
|
||||
effect as they do in Perl. For example:
|
||||
<pre>
|
||||
/caseless/i
|
||||
</pre>
|
||||
The following table shows additional modifiers for setting PCRE options that do
|
||||
not correspond to anything in Perl:
|
||||
<pre>
|
||||
<b>/A</b> PCRE_ANCHORED
|
||||
<b>/C</b> PCRE_AUTO_CALLOUT
|
||||
<b>/E</b> PCRE_DOLLAR_ENDONLY
|
||||
<b>/f</b> PCRE_FIRSTLINE
|
||||
<b>/J</b> PCRE_DUPNAMES
|
||||
<b>/N</b> PCRE_NO_AUTO_CAPTURE
|
||||
<b>/U</b> PCRE_UNGREEDY
|
||||
<b>/X</b> PCRE_EXTRA
|
||||
<b>/<cr></b> PCRE_NEWLINE_CR
|
||||
<b>/<lf></b> PCRE_NEWLINE_LF
|
||||
<b>/<crlf></b> PCRE_NEWLINE_CRLF
|
||||
</pre>
|
||||
Those specifying line endings are literal strings as shown. Details of the
|
||||
meanings of these PCRE options are given in the
|
||||
<a href="pcreapi.html"><b>pcreapi</b></a>
|
||||
documentation.
|
||||
</P>
|
||||
<br><b>
|
||||
Finding all matches in a string
|
||||
</b><br>
|
||||
<P>
|
||||
Searching for all possible matches within each subject string can be requested
|
||||
by the <b>/g</b> or <b>/G</b> modifier. After finding a match, PCRE is called
|
||||
again to search the remainder of the subject string. The difference between
|
||||
<b>/g</b> and <b>/G</b> is that the former uses the <i>startoffset</i> argument to
|
||||
<b>pcre_exec()</b> to start searching at a new point within the entire string
|
||||
(which is in effect what Perl does), whereas the latter passes over a shortened
|
||||
substring. This makes a difference to the matching process if the pattern
|
||||
begins with a lookbehind assertion (including \b or \B).
|
||||
</P>
|
||||
<P>
|
||||
If any call to <b>pcre_exec()</b> in a <b>/g</b> or <b>/G</b> sequence matches an
|
||||
empty string, the next call is done with the PCRE_NOTEMPTY and PCRE_ANCHORED
|
||||
flags set in order to search for another, non-empty, match at the same point.
|
||||
If this second match fails, the start offset is advanced by one, and the normal
|
||||
match is retried. This imitates the way Perl handles such cases when using the
|
||||
<b>/g</b> modifier or the <b>split()</b> function.
|
||||
</P>
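The retry rule for empty matches can be sketched in C as follows. This is only an illustration of the control flow described above, not the code used by pcretest: <b>toy_exec()</b> is a hypothetical stand-in for <b>pcre_exec()</b> that matches the single pattern a*, and the SK_ flag values are invented for the sketch (the real PCRE_NOTEMPTY and PCRE_ANCHORED constants come from pcre.h).

```c
#include <string.h>

/* Hypothetical flag values for this sketch only; the real PCRE_ANCHORED and
   PCRE_NOTEMPTY constants are defined in pcre.h. */
#define SK_ANCHORED 0x1
#define SK_NOTEMPTY 0x2

/* Hypothetical stand-in for pcre_exec() that matches the pattern a* only.
   Returns 1 and fills ov[0] (start) and ov[1] (end) on a match, -1 otherwise. */
static int toy_exec(const char *s, int len, int start, int flags, int *ov)
{
    int pos;
    for (pos = start; pos <= len; pos++) {
        int n = 0;
        while (pos + n < len && s[pos + n] == 'a') n++;
        if (n > 0 || !(flags & SK_NOTEMPTY)) {
            ov[0] = pos;
            ov[1] = pos + n;
            return 1;
        }
        if (flags & SK_ANCHORED) break;   /* anchored: may not move the start */
    }
    return -1;
}

/* Collect up to max matches as (start, end) pairs; returns the count. */
int find_all(const char *s, int (*out)[2], int max)
{
    int len = (int)strlen(s);
    int start = 0, flags = 0, count = 0, ov[2];

    while (start <= len && count < max) {
        if (toy_exec(s, len, start, flags, ov) < 0) {
            if (flags == 0) break;        /* a normal attempt failed: finished */
            start++;                      /* the retry failed: advance by one */
            flags = 0;
            continue;
        }
        out[count][0] = ov[0];
        out[count][1] = ov[1];
        count++;
        start = ov[1];
        /* After an empty match, look for a non-empty one at the same point. */
        flags = (ov[0] == ov[1]) ? (SK_NOTEMPTY | SK_ANCHORED) : 0;
    }
    return count;
}
```

With the subject "baac" this sketch reports four matches: an empty string at offset 0, "aa" at offsets 1 to 3, and empty strings at offsets 3 and 4.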
|
||||
<br><b>
|
||||
Other modifiers
|
||||
</b><br>
|
||||
<P>
|
||||
There are yet more modifiers for controlling the way <b>pcretest</b>
|
||||
operates.
|
||||
</P>
|
||||
<P>
|
||||
The <b>/+</b> modifier requests that as well as outputting the substring that
|
||||
matched the entire pattern, pcretest should in addition output the remainder of
|
||||
the subject string. This is useful for tests where the subject contains
|
||||
multiple copies of the same substring.
|
||||
</P>
|
||||
<P>
|
||||
The <b>/L</b> modifier must be followed directly by the name of a locale, for
|
||||
example,
|
||||
<pre>
|
||||
/pattern/Lfr_FR
|
||||
</pre>
|
||||
For this reason, it must be the last modifier. The given locale is set,
|
||||
<b>pcre_maketables()</b> is called to build a set of character tables for the
|
||||
locale, and this is then passed to <b>pcre_compile()</b> when compiling the
|
||||
regular expression. Without an <b>/L</b> modifier, NULL is passed as the tables
|
||||
pointer; that is, <b>/L</b> applies only to the expression on which it appears.
|
||||
</P>
|
||||
<P>
|
||||
The <b>/I</b> modifier requests that <b>pcretest</b> output information about the
|
||||
compiled pattern (whether it is anchored, has a fixed first character, and
|
||||
so on). It does this by calling <b>pcre_fullinfo()</b> after compiling a
|
||||
pattern. If the pattern is studied, the results of that are also output.
|
||||
</P>
|
||||
<P>
|
||||
The <b>/D</b> modifier is a PCRE debugging feature, which also assumes <b>/I</b>.
|
||||
It causes the internal form of compiled regular expressions to be output after
|
||||
compilation. If the pattern was studied, the information returned is also
|
||||
output.
|
||||
</P>
|
||||
<P>
|
||||
The <b>/F</b> modifier causes <b>pcretest</b> to flip the byte order of the
|
||||
fields in the compiled pattern that contain 2-byte and 4-byte numbers. This
|
||||
facility is for testing the feature in PCRE that allows it to execute patterns
|
||||
that were compiled on a host with a different endianness. This feature is not
|
||||
available when the POSIX interface to PCRE is being used, that is, when the
|
||||
<b>/P</b> pattern modifier is specified. See also the section about saving and
|
||||
reloading compiled patterns below.
|
||||
</P>
|
||||
<P>
|
||||
The <b>/S</b> modifier causes <b>pcre_study()</b> to be called after the
|
||||
expression has been compiled, and the results used when the expression is
|
||||
matched.
|
||||
</P>
|
||||
<P>
|
||||
The <b>/M</b> modifier causes the size of memory block used to hold the compiled
|
||||
pattern to be output.
|
||||
</P>
|
||||
<P>
|
||||
The <b>/P</b> modifier causes <b>pcretest</b> to call PCRE via the POSIX wrapper
|
||||
API rather than its native API. When this is done, all other modifiers except
|
||||
<b>/i</b>, <b>/m</b>, and <b>/+</b> are ignored. REG_ICASE is set if <b>/i</b> is
|
||||
present, and REG_NEWLINE is set if <b>/m</b> is present. The wrapper functions
|
||||
force PCRE_DOLLAR_ENDONLY always, and PCRE_DOTALL unless REG_NEWLINE is set.
|
||||
</P>
|
||||
<P>
|
||||
The <b>/8</b> modifier causes <b>pcretest</b> to call PCRE with the PCRE_UTF8
|
||||
option set. This turns on support for UTF-8 character handling in PCRE,
|
||||
provided that it was compiled with this support enabled. This modifier also
|
||||
causes any non-printing characters in output strings to be printed using the
|
||||
\x{hh...} notation if they are valid UTF-8 sequences.
|
||||
</P>
|
||||
<P>
|
||||
If the <b>/?</b> modifier is used with <b>/8</b>, it causes <b>pcretest</b> to
|
||||
call <b>pcre_compile()</b> with the PCRE_NO_UTF8_CHECK option, to suppress the
|
||||
checking of the string for UTF-8 validity.
|
||||
</P>
|
||||
<br><a name="SEC5" href="#TOC1">DATA LINES</a><br>
|
||||
<P>
|
||||
Before each data line is passed to <b>pcre_exec()</b>, leading and trailing
|
||||
whitespace is removed, and it is then scanned for \ escapes. Some of these are
|
||||
pretty esoteric features, intended for checking out some of the more
|
||||
complicated features of PCRE. If you are just testing "ordinary" regular
|
||||
expressions, you probably don't need any of these. The following escapes are
|
||||
recognized:
|
||||
<pre>
|
||||
\a alarm (= BEL)
|
||||
\b backspace
|
||||
\e escape
|
||||
\f formfeed
|
||||
\n newline
|
||||
\qdd set the PCRE_MATCH_LIMIT limit to dd (any number of digits)
|
||||
\r carriage return
|
||||
\t tab
|
||||
\v vertical tab
|
||||
\nnn octal character (up to 3 octal digits)
|
||||
\xhh hexadecimal character (up to 2 hex digits)
|
||||
\x{hh...} hexadecimal character, any number of digits in UTF-8 mode
|
||||
\A pass the PCRE_ANCHORED option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
|
||||
\B pass the PCRE_NOTBOL option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
|
||||
\Cdd call pcre_copy_substring() for substring dd after a successful match (number less than 32)
|
||||
\Cname call pcre_copy_named_substring() for substring "name" after a successful match (name terminated by next non-alphanumeric character)
|
||||
\C+ show the current captured substrings at callout time
|
||||
\C- do not supply a callout function
|
||||
\C!n return 1 instead of 0 when callout number n is reached
|
||||
\C!n!m return 1 instead of 0 when callout number n is reached for the mth time
|
||||
\C*n pass the number n (may be negative) as callout data; this is used as the callout return value
|
||||
\D use the <b>pcre_dfa_exec()</b> match function
|
||||
\F only shortest match for <b>pcre_dfa_exec()</b>
|
||||
\Gdd call pcre_get_substring() for substring dd after a successful match (number less than 32)
|
||||
\Gname call pcre_get_named_substring() for substring "name" after a successful match (name terminated by next non-alphanumeric character)
|
||||
\L call pcre_get_substringlist() after a successful match
|
||||
\M discover the minimum MATCH_LIMIT and MATCH_LIMIT_RECURSION settings
|
||||
\N pass the PCRE_NOTEMPTY option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
|
||||
\Odd set the size of the output vector passed to <b>pcre_exec()</b> to dd (any number of digits)
|
||||
\P pass the PCRE_PARTIAL option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
|
||||
\Qdd set the PCRE_MATCH_LIMIT_RECURSION limit to dd (any number of digits)
|
||||
\R pass the PCRE_DFA_RESTART option to <b>pcre_dfa_exec()</b>
|
||||
\S output details of memory get/free calls during matching
|
||||
\Z pass the PCRE_NOTEOL option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
|
||||
\? pass the PCRE_NO_UTF8_CHECK option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
|
||||
\>dd start the match at offset dd (any number of digits);
|
||||
this sets the <i>startoffset</i> argument for <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
|
||||
\<cr> pass the PCRE_NEWLINE_CR option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
|
||||
\<lf> pass the PCRE_NEWLINE_LF option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
|
||||
\<crlf> pass the PCRE_NEWLINE_CRLF option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
|
||||
</pre>
|
||||
The escapes that specify line endings are literal strings, exactly as shown.
|
||||
A backslash followed by anything else just escapes the anything else. If the
|
||||
very last character is a backslash, it is ignored. This gives a way of passing
|
||||
an empty line as data, since a real empty line terminates the data input.
|
||||
</P>
|
||||
<P>
|
||||
If \M is present, <b>pcretest</b> calls <b>pcre_exec()</b> several times, with
|
||||
different values in the <i>match_limit</i> and <i>match_limit_recursion</i>
|
||||
fields of the <b>pcre_extra</b> data structure, until it finds the minimum
|
||||
numbers for each parameter that allow <b>pcre_exec()</b> to complete. The
|
||||
<i>match_limit</i> number is a measure of the amount of backtracking that takes
|
||||
place, and checking it out can be instructive. For most simple matches, the
|
||||
number is quite small, but for patterns with very large numbers of matching
|
||||
possibilities, it can become large very quickly with increasing length of
|
||||
subject string. The <i>match_limit_recursion</i> number is a measure of how much
|
||||
stack (or, if PCRE is compiled with NO_RECURSE, how much heap) memory is needed
|
||||
to complete the match attempt.
|
||||
</P>
|
||||
<P>
|
||||
When \O is used, the value specified may be higher or lower than the size set
|
||||
by the <b>-o</b> command line option (or defaulted to 45); \O applies only to
|
||||
the call of <b>pcre_exec()</b> for the line in which it appears.
|
||||
</P>
|
||||
<P>
|
||||
If the <b>/P</b> modifier was present on the pattern, causing the POSIX wrapper
|
||||
API to be used, the only option-setting sequences that have any effect are \B
|
||||
and \Z, causing REG_NOTBOL and REG_NOTEOL, respectively, to be passed to
|
||||
<b>regexec()</b>.
|
||||
</P>
|
||||
<P>
|
||||
The use of \x{hh...} to represent UTF-8 characters is not dependent on the use
|
||||
of the <b>/8</b> modifier on the pattern. It is recognized always. There may be
|
||||
any number of hexadecimal digits inside the braces. The result is from one to
|
||||
six bytes, encoded according to the UTF-8 rules.
|
||||
</P>
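The one-to-six-byte encoding mentioned above can be sketched as follows. This is a minimal illustration of the original UTF-8 scheme (which allowed code points up to 0x7fffffff), not the routine pcretest itself uses; the function name <b>encode_utf8()</b> is an assumption made for the example, and values above 0x7fffffff are not handled.

```c
/* Encode code point cp (at most 0x7fffffff) into buf, which must have room
   for 6 bytes, using the original UTF-8 scheme of up to six bytes.
   Returns the number of bytes written. Hypothetical helper for illustration. */
int encode_utf8(unsigned long cp, unsigned char *buf)
{
    /* Largest code point that fits in a sequence of 1, 2, ... 6 bytes. */
    static const unsigned long limit[] =
        { 0x7f, 0x7ff, 0xffff, 0x1fffff, 0x3ffffff, 0x7fffffff };
    /* Marker bits for the first byte of a sequence of each length. */
    static const unsigned char lead[] =
        { 0x00, 0xc0, 0xe0, 0xf0, 0xf8, 0xfc };
    int extra = 0, i;

    /* Count how many continuation bytes are needed. */
    while (extra < 5 && cp > limit[extra]) extra++;

    /* Fill the continuation bytes from the end, six bits at a time. */
    for (i = extra; i > 0; i--) {
        buf[i] = (unsigned char)(0x80 | (cp & 0x3f));
        cp >>= 6;
    }
    buf[0] = (unsigned char)(lead[extra] | cp);
    return extra + 1;
}
```

For example, \x{100} becomes the two bytes 0xc4 0x80, and \x{20ac} becomes the three bytes 0xe2 0x82 0xac.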
|
||||
<br><a name="SEC6" href="#TOC1">THE ALTERNATIVE MATCHING FUNCTION</a><br>
|
||||
<P>
|
||||
By default, <b>pcretest</b> uses the standard PCRE matching function,
|
||||
<b>pcre_exec()</b>, to match each data line. From release 6.0, PCRE supports an
|
||||
alternative matching function, <b>pcre_dfa_exec()</b>, which operates in a
|
||||
different way, and has some restrictions. The differences between the two
|
||||
functions are described in the
|
||||
<a href="pcrematching.html"><b>pcrematching</b></a>
|
||||
documentation.
|
||||
</P>
|
||||
<P>
|
||||
If a data line contains the \D escape sequence, or if the command line
|
||||
contains the <b>-dfa</b> option, the alternative matching function is called.
|
||||
This function finds all possible matches at a given point. If, however, the \F
|
||||
escape sequence is present in the data line, it stops after the first match is
|
||||
found. This is always the shortest possible match.
|
||||
</P>
|
||||
<br><a name="SEC7" href="#TOC1">DEFAULT OUTPUT FROM PCRETEST</a><br>
|
||||
<P>
|
||||
This section describes the output when the normal matching function,
|
||||
<b>pcre_exec()</b>, is being used.
|
||||
</P>
|
||||
<P>
|
||||
When a match succeeds, pcretest outputs the list of captured substrings that
|
||||
<b>pcre_exec()</b> returns, starting with number 0 for the string that matched
|
||||
the whole pattern. Otherwise, it outputs "No match" or "Partial match"
|
||||
when <b>pcre_exec()</b> returns PCRE_ERROR_NOMATCH or PCRE_ERROR_PARTIAL,
|
||||
respectively, and otherwise the PCRE negative error number. Here is an example
|
||||
of an interactive <b>pcretest</b> run.
|
||||
<pre>
|
||||
$ pcretest
|
||||
PCRE version 5.00 07-Sep-2004
|
||||
|
||||
re> /^abc(\d+)/
|
||||
data> abc123
|
||||
0: abc123
|
||||
1: 123
|
||||
data> xyz
|
||||
No match
|
||||
</pre>
|
||||
If the strings contain any non-printing characters, they are output as \0x
|
||||
escapes, or as \x{...} escapes if the <b>/8</b> modifier was present on the
|
||||
pattern. If the pattern has the <b>/+</b> modifier, the output for substring 0
|
||||
is followed by the rest of the subject string, identified by "0+" like
|
||||
this:
|
||||
<pre>
|
||||
re> /cat/+
|
||||
data> cataract
|
||||
0: cat
|
||||
0+ aract
|
||||
</pre>
|
||||
If the pattern has the <b>/g</b> or <b>/G</b> modifier, the results of successive
|
||||
matching attempts are output in sequence, like this:
|
||||
<pre>
|
||||
re> /\Bi(\w\w)/g
|
||||
data> Mississippi
|
||||
0: iss
|
||||
1: ss
|
||||
0: iss
|
||||
1: ss
|
||||
0: ipp
|
||||
1: pp
|
||||
</pre>
|
||||
"No match" is output only if the first match attempt fails.
|
||||
</P>
|
||||
<P>
|
||||
If any of the sequences <b>\C</b>, <b>\G</b>, or <b>\L</b> are present in a
|
||||
data line that is successfully matched, the substrings extracted by the
|
||||
convenience functions are output with C, G, or L after the string number
|
||||
instead of a colon. This is in addition to the normal full list. The string
|
||||
length (that is, the return from the extraction function) is given in
|
||||
parentheses after each string for <b>\C</b> and <b>\G</b>.
|
||||
</P>
|
||||
<P>
|
||||
Note that while patterns can be continued over several lines (a plain ">"
|
||||
prompt is used for continuations), data lines may not. However, newlines can be
|
||||
included in data by means of the \n escape (or \r or \r\n for those newline
|
||||
settings).
|
||||
</P>
|
||||
<br><a name="SEC8" href="#TOC1">OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION</a><br>
|
||||
<P>
|
||||
When the alternative matching function, <b>pcre_dfa_exec()</b>, is used (by
|
||||
means of the \D escape sequence or the <b>-dfa</b> command line option), the
|
||||
output consists of a list of all the matches that start at the first point in
|
||||
the subject where there is at least one match. For example:
|
||||
<pre>
|
||||
re> /(tang|tangerine|tan)/
|
||||
data> yellow tangerine\D
|
||||
0: tangerine
|
||||
1: tang
|
||||
2: tan
|
||||
</pre>
|
||||
(Using the normal matching function on this data finds only "tang".) The
|
||||
longest matching string is always given first (and numbered zero).
|
||||
</P>
|
||||
<P>
|
||||
If <b>/g</b> is present on the pattern, the search for further matches resumes
|
||||
at the end of the longest match. For example:
|
||||
<pre>
|
||||
re> /(tang|tangerine|tan)/g
|
||||
data> yellow tangerine and tangy sultana\D
|
||||
0: tangerine
|
||||
1: tang
|
||||
2: tan
|
||||
0: tang
|
||||
1: tan
|
||||
0: tan
|
||||
</pre>
|
||||
Since the matching function does not support substring capture, the escape
|
||||
sequences that are concerned with captured substrings are not relevant.
|
||||
</P>
|
||||
<br><a name="SEC9" href="#TOC1">RESTARTING AFTER A PARTIAL MATCH</a><br>
|
||||
<P>
|
||||
When the alternative matching function has given the PCRE_ERROR_PARTIAL return,
|
||||
indicating that the subject partially matched the pattern, you can restart the
|
||||
match with additional subject data by means of the \R escape sequence. For
|
||||
example:
|
||||
<pre>
|
||||
re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
|
||||
data> 23ja\P\D
|
||||
Partial match: 23ja
|
||||
data> n05\R\D
|
||||
0: n05
|
||||
</pre>
|
||||
For further information about partial matching, see the
|
||||
<a href="pcrepartial.html"><b>pcrepartial</b></a>
|
||||
documentation.
|
||||
</P>
|
||||
<br><a name="SEC10" href="#TOC1">CALLOUTS</a><br>
|
||||
<P>
|
||||
If the pattern contains any callout requests, <b>pcretest</b>'s callout function
|
||||
is called during matching. This works with both matching functions. By default,
|
||||
the called function displays the callout number, the start and current
|
||||
positions in the text at the callout time, and the next pattern item to be
|
||||
tested. For example, the output
|
||||
<pre>
|
||||
--->pqrabcdef
|
||||
0 ^ ^ \d
|
||||
</pre>
|
||||
indicates that callout number 0 occurred for a match attempt starting at the
|
||||
fourth character of the subject string, when the pointer was at the seventh
|
||||
character of the data, and when the next pattern item was \d. Just one
|
||||
circumflex is output if the start and current positions are the same.
|
||||
</P>
|
||||
<P>
|
||||
Callouts numbered 255 are assumed to be automatic callouts, inserted as a
|
||||
result of the <b>/C</b> pattern modifier. In this case, instead of showing the
|
||||
callout number, the offset in the pattern, preceded by a plus, is output. For
|
||||
example:
|
||||
<pre>
|
||||
re> /\d?[A-E]\*/C
|
||||
data> E*
|
||||
--->E*
|
||||
+0 ^ \d?
|
||||
+3 ^ [A-E]
|
||||
+8 ^^ \*
|
||||
+10 ^ ^
|
||||
0: E*
|
||||
</pre>
|
||||
The callout function in <b>pcretest</b> returns zero (carry on matching) by
|
||||
default, but you can use a \C item in a data line (as described above) to
|
||||
change this.
|
||||
</P>
|
||||
<P>
|
||||
Inserting callouts can be helpful when using <b>pcretest</b> to check
|
||||
complicated regular expressions. For further information about callouts, see
|
||||
the
|
||||
<a href="pcrecallout.html"><b>pcrecallout</b></a>
|
||||
documentation.
|
||||
</P>
|
||||
<br><a name="SEC11" href="#TOC1">SAVING AND RELOADING COMPILED PATTERNS</a><br>
|
||||
<P>
|
||||
The facilities described in this section are not available when the POSIX
|
||||
interface to PCRE is being used, that is, when the <b>/P</b> pattern modifier is
|
||||
specified.
|
||||
</P>
|
||||
<P>
|
||||
When the POSIX interface is not in use, you can cause <b>pcretest</b> to write a
|
||||
compiled pattern to a file, by following the modifiers with > and a file name.
|
||||
For example:
|
||||
<pre>
|
||||
/pattern/im >/some/file
|
||||
</pre>
|
||||
See the
|
||||
<a href="pcreprecompile.html"><b>pcreprecompile</b></a>
|
||||
documentation for a discussion about saving and re-using compiled patterns.
|
||||
</P>
|
||||
<P>
|
||||
The data that is written is binary. The first eight bytes are the length of the
|
||||
compiled pattern data followed by the length of the optional study data, each
|
||||
written as four bytes in big-endian order (most significant byte first). If
|
||||
there is no study data (either the pattern was not studied, or studying did not
|
||||
return any data), the second length is zero. The lengths are followed by an
|
||||
exact copy of the compiled pattern. If there is additional study data, this
|
||||
follows immediately after the compiled pattern. After writing the file,
|
||||
<b>pcretest</b> expects to read a new pattern.
|
||||
</P>
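The eight-byte header described above can be decoded with a few lines of C. This is a sketch under the stated layout only (two four-byte big-endian lengths); the names <b>read_be32()</b> and <b>parse_saved_header()</b> are invented for the example and are not part of PCRE or pcretest.

```c
#include <stddef.h>

/* Decode a four-byte big-endian (most significant byte first) length. */
static unsigned long read_be32(const unsigned char *p)
{
    return ((unsigned long)p[0] << 24) | ((unsigned long)p[1] << 16) |
           ((unsigned long)p[2] << 8)  |  (unsigned long)p[3];
}

/* Split the header of a saved-pattern file into its two lengths.
   data points at the start of the file contents and size is the number of
   bytes available. Returns 0 on success, or -1 if the header is incomplete.
   Hypothetical helper for illustration. */
int parse_saved_header(const unsigned char *data, size_t size,
                       unsigned long *pattern_len, unsigned long *study_len)
{
    if (size < 8) return -1;
    *pattern_len = read_be32(data);      /* length of the compiled pattern */
    *study_len   = read_be32(data + 4);  /* length of study data, 0 if none */
    return 0;
}
```

The compiled pattern then occupies the pattern_len bytes that follow the header, and any study data occupies the study_len bytes after that.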
|
||||
<P>
|
||||
A saved pattern can be reloaded into <b>pcretest</b> by specifying < and a file
|
||||
name instead of a pattern. The name of the file must not contain a < character,
|
||||
as otherwise <b>pcretest</b> will interpret the line as a pattern delimited by <
|
||||
characters.
|
||||
For example:
|
||||
<pre>
|
||||
re> </some/file
|
||||
Compiled regex loaded from /some/file
|
||||
No study data
|
||||
</pre>
|
||||
When the pattern has been loaded, <b>pcretest</b> proceeds to read data lines in
|
||||
the usual way.
|
||||
</P>
|
||||
<P>
|
||||
You can copy a file written by <b>pcretest</b> to a different host and reload it
|
||||
there, even if the new host has opposite endianness to the one on which the
|
||||
pattern was compiled. For example, you can compile on an i86 machine and run on
|
||||
a SPARC machine.
|
||||
</P>
|
||||
<P>
|
||||
File names for saving and reloading can be absolute or relative, but note that
|
||||
the shell facility of expanding a file name that starts with a tilde (~) is not
|
||||
available.
|
||||
</P>
|
||||
<P>
|
||||
The ability to save and reload files in <b>pcretest</b> is intended for testing
|
||||
and experimentation. It is not intended for production use because only a
|
||||
single pattern can be written to a file. Furthermore, there is no facility for
|
||||
supplying custom character tables for use with a reloaded pattern. If the
|
||||
original pattern was compiled with custom tables, an attempt to match a subject
|
||||
string using a reloaded pattern is likely to cause <b>pcretest</b> to crash.
|
||||
Finally, if you attempt to load a file that is not in the correct format, the
|
||||
result is undefined.
|
||||
</P>
|
||||
<br><a name="SEC12" href="#TOC1">AUTHOR</a><br>
|
||||
<P>
|
||||
Philip Hazel
|
||||
<br>
|
||||
University Computing Service,
|
||||
<br>
|
||||
Cambridge CB2 3QG, England.
|
||||
</P>
|
||||
<P>
|
||||
Last updated: 29 June 2006
|
||||
<br>
|
||||
Copyright © 1997-2006 University of Cambridge.
|
||||
<p>
|
||||
Return to the <a href="index.html">PCRE index page</a>.
|
||||
</p>
|
244
libs/pcre/doc/pcre.3
Normal file
@ -0,0 +1,244 @@
|
||||
.TH PCRE 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH INTRODUCTION
|
||||
.rs
|
||||
.sp
|
||||
The PCRE library is a set of functions that implement regular expression
|
||||
pattern matching using the same syntax and semantics as Perl, with just a few
|
||||
differences. The current implementation of PCRE (release 6.x) corresponds
|
||||
approximately with Perl 5.8, including support for UTF-8 encoded strings and
|
||||
Unicode general category properties. However, this support has to be explicitly
|
||||
enabled; it is not the default.
|
||||
.P
|
||||
In addition to the Perl-compatible matching function, PCRE also contains an
|
||||
alternative matching function that matches the same compiled patterns in a
|
||||
different way. In certain circumstances, the alternative function has some
|
||||
advantages. For a discussion of the two matching algorithms, see the
|
||||
.\" HREF
|
||||
\fBpcrematching\fP
|
||||
.\"
|
||||
page.
|
||||
.P
|
||||
PCRE is written in C and released as a C library. A number of people have
|
||||
written wrappers and interfaces of various kinds. In particular, Google Inc.
|
||||
have provided a comprehensive C++ wrapper. This is now included as part of the
|
||||
PCRE distribution. The
|
||||
.\" HREF
|
||||
\fBpcrecpp\fP
|
||||
.\"
|
||||
page has details of this interface. Other people's contributions can be found
|
||||
in the \fIContrib\fR directory at the primary FTP site, which is:
|
||||
.sp
|
||||
.\" HTML <a href="ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre">
|
||||
.\" </a>
|
||||
ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre
|
||||
.P
|
||||
Details of exactly which Perl regular expression features are and are not
|
||||
supported by PCRE are given in separate documents. See the
|
||||
.\" HREF
|
||||
\fBpcrepattern\fR
|
||||
.\"
|
||||
and
|
||||
.\" HREF
|
||||
\fBpcrecompat\fR
|
||||
.\"
|
||||
pages.
|
||||
.P
|
||||
Some features of PCRE can be included, excluded, or changed when the library is
|
||||
built. The
|
||||
.\" HREF
|
||||
\fBpcre_config()\fR
|
||||
.\"
|
||||
function makes it possible for a client to discover which features are
|
||||
available. The features themselves are described in the
|
||||
.\" HREF
|
||||
\fBpcrebuild\fP
|
||||
.\"
|
||||
page. Documentation about building PCRE for various operating systems can be
|
||||
found in the \fBREADME\fP file in the source distribution.
|
||||
.P
|
||||
The library contains a number of undocumented internal functions and data
|
||||
tables that are used by more than one of the exported external functions, but
|
||||
which are not intended for use by external callers. Their names all begin with
|
||||
"_pcre_", which hopefully will not provoke any name clashes. In some
|
||||
environments, it is possible to control which external symbols are exported
|
||||
when a shared library is built, and in these cases the undocumented symbols are
|
||||
not exported.
|
||||
.
|
||||
.
|
||||
.SH "USER DOCUMENTATION"
|
||||
.rs
|
||||
.sp
|
||||
The user documentation for PCRE comprises a number of different sections. In
|
||||
the "man" format, each of these is a separate "man page". In the HTML format,
|
||||
each is a separate page, linked from the index page. In the plain text format,
|
||||
all the sections are concatenated, for ease of searching. The sections are as
|
||||
follows:
|
||||
.sp
|
||||
pcre this document
|
||||
pcreapi details of PCRE's native C API
|
||||
pcrebuild options for building PCRE
|
||||
pcrecallout details of the callout feature
|
||||
pcrecompat discussion of Perl compatibility
|
||||
pcrecpp details of the C++ wrapper
|
||||
pcregrep description of the \fBpcregrep\fP command
|
||||
pcrematching discussion of the two matching algorithms
|
||||
pcrepartial details of the partial matching facility
|
||||
.\" JOIN
|
||||
pcrepattern syntax and semantics of supported
|
||||
regular expressions
|
||||
pcreperform discussion of performance issues
|
||||
pcreposix the POSIX-compatible C API
|
||||
pcreprecompile details of saving and re-using precompiled patterns
|
||||
pcresample discussion of the sample program
|
||||
pcrestack discussion of stack usage
|
||||
pcretest description of the \fBpcretest\fP testing command
|
||||
.sp
|
||||
In addition, in the "man" and HTML formats, there is a short page for each
|
||||
C library function, listing its arguments and results.
|
||||
.
|
||||
.
|
||||
.SH LIMITATIONS
|
||||
.rs
|
||||
.sp
|
||||
There are some size limitations in PCRE but it is hoped that they will never in
|
||||
practice be relevant.
|
||||
.P
|
||||
The maximum length of a compiled pattern is 65539 (sic) bytes if PCRE is
|
||||
compiled with the default internal linkage size of 2. If you want to process
|
||||
regular expressions that are truly enormous, you can compile PCRE with an
|
||||
internal linkage size of 3 or 4 (see the \fBREADME\fP file in the source
|
||||
distribution and the
|
||||
.\" HREF
|
||||
\fBpcrebuild\fP
|
||||
.\"
|
||||
documentation for details). In these cases the limit is substantially larger.
|
||||
However, the speed of execution will be slower.
|
||||
.P
|
||||
All values in repeating quantifiers must be less than 65536. The maximum
|
||||
compiled length of a subpattern with an explicit repeat count is 30000 bytes. The
|
||||
maximum number of capturing subpatterns is 65535.
|
||||
.P
|
||||
There is no limit to the number of non-capturing subpatterns, but the maximum
|
||||
depth of nesting of all kinds of parenthesized subpattern, including capturing
|
||||
subpatterns, assertions, and other types of subpattern, is 200.
|
||||
.P
|
||||
The maximum length of name for a named subpattern is 32, and the maximum number
|
||||
of named subpatterns is 10000.
|
||||
.P
|
||||
The maximum length of a subject string is the largest positive number that an
|
||||
integer variable can hold. However, when using the traditional matching
|
||||
function, PCRE uses recursion to handle subpatterns and indefinite repetition.
|
||||
This means that the available stack space may limit the size of a subject
|
||||
string that can be processed by certain patterns. For a discussion of stack
|
||||
issues, see the
|
||||
.\" HREF
|
||||
\fBpcrestack\fP
|
||||
.\"
|
||||
documentation.
|
||||
.sp
|
||||
.\" HTML <a name="utf8support"></a>
|
||||
.
|
||||
.
|
||||
.SH "UTF-8 AND UNICODE PROPERTY SUPPORT"
|
||||
.rs
|
||||
.sp
|
||||
From release 3.3, PCRE has had some support for character strings encoded in
|
||||
the UTF-8 format. For release 4.0 this was greatly extended to cover most
|
||||
common requirements, and in release 5.0 additional support for Unicode general
|
||||
category properties was added.
|
||||
.P
|
||||
In order to process UTF-8 strings, you must build PCRE to include UTF-8 support in
|
||||
the code, and, in addition, you must call
|
||||
.\" HREF
|
||||
\fBpcre_compile()\fP
|
||||
.\"
|
||||
with the PCRE_UTF8 option flag. When you do this, both the pattern and any
|
||||
subject strings that are matched against it are treated as UTF-8 strings
|
||||
instead of just strings of bytes.
|
||||
.P
|
||||
If you compile PCRE with UTF-8 support, but do not use it at run time, the
|
||||
library will be a bit bigger, but the additional run time overhead is limited
|
||||
to testing the PCRE_UTF8 flag in several places, so should not be very large.
|
||||
.P
|
||||
If PCRE is built with Unicode character property support (which implies UTF-8
|
||||
support), the escape sequences \ep{..}, \eP{..}, and \eX are supported.
|
||||
The available properties that can be tested are limited to the general
|
||||
category properties such as Lu for an upper case letter or Nd for a decimal
|
||||
number, the Unicode script names such as Arabic or Han, and the derived
|
||||
properties Any and L&. A full list is given in the
|
||||
.\" HREF
|
||||
\fBpcrepattern\fP
|
||||
.\"
|
||||
documentation. Only the short names for properties are supported. For example,
|
||||
\ep{L} matches a letter. Its Perl synonym, \ep{Letter}, is not supported.
|
||||
Furthermore, in Perl, many properties may optionally be prefixed by "Is", for
|
||||
compatibility with Perl 5.6. PCRE does not support this.
|
||||
.P
|
||||
The following comments apply when PCRE is running in UTF-8 mode:
|
||||
.P
|
||||
1. When you set the PCRE_UTF8 flag, the strings passed as patterns and subjects
|
||||
are checked for validity on entry to the relevant functions. If an invalid
|
||||
UTF-8 string is passed, an error return is given. In some situations, you may
|
||||
already know that your strings are valid, and therefore want to skip these
|
||||
checks in order to improve performance. If you set the PCRE_NO_UTF8_CHECK flag
|
||||
at compile time or at run time, PCRE assumes that the pattern or subject it
|
||||
is given (respectively) contains only valid UTF-8 codes. In this case, it does
|
||||
not diagnose an invalid UTF-8 string. If you pass an invalid UTF-8 string to
|
||||
PCRE when PCRE_NO_UTF8_CHECK is set, the results are undefined. Your program
|
||||
may crash.
|
||||
.P
|
||||
2. An unbraced hexadecimal escape sequence (such as \exb3) matches a two-byte
|
||||
UTF-8 character if the value is greater than 127.
|
||||
.P
|
||||
3. Octal numbers up to \e777 are recognized, and match two-byte UTF-8
|
||||
characters for values greater than \e177.
|
||||
.P
|
||||
4. Repeat quantifiers apply to complete UTF-8 characters, not to individual
|
||||
bytes, for example: \ex{100}{3}.
|
||||
.P
|
||||
5. The dot metacharacter matches one UTF-8 character instead of a single byte.
|
||||
.P
|
||||
6. The escape sequence \eC can be used to match a single byte in UTF-8 mode,
|
||||
but its use can lead to some strange effects. This facility is not available in
|
||||
the alternative matching function, \fBpcre_dfa_exec()\fP.
|
||||
.P
|
||||
7. The character escapes \eb, \eB, \ed, \eD, \es, \eS, \ew, and \eW correctly
|
||||
test characters of any code value, but the characters that PCRE recognizes as
|
||||
digits, spaces, or word characters remain the same set as before, all with
|
||||
values less than 256. This remains true even when PCRE includes Unicode
|
||||
property support, because to do otherwise would slow down PCRE in many common
|
||||
cases. If you really want to test for a wider sense of, say, "digit", you
|
||||
must use Unicode property tests such as \ep{Nd}.
|
||||
.P
|
||||
8. Similarly, characters that match the POSIX named character classes are all
|
||||
low-valued characters.
|
||||
.P
|
||||
9. Case-insensitive matching applies only to characters whose values are less
|
||||
than 128, unless PCRE is built with Unicode property support. Even when Unicode
|
||||
property support is available, PCRE still uses its own character tables when
|
||||
checking the case of low-valued characters, so as not to degrade performance.
|
||||
The Unicode property information is used only for characters with higher
|
||||
values. Even when Unicode property support is available, PCRE supports
|
||||
case-insensitive matching only when there is a one-to-one mapping between a
|
||||
letter's cases. There are a small number of many-to-one mappings in Unicode;
|
||||
these are not supported by PCRE.
|
||||
.
|
||||
.SH AUTHOR
|
||||
.rs
|
||||
.sp
|
||||
Philip Hazel
|
||||
.br
|
||||
University Computing Service,
|
||||
.br
|
||||
Cambridge CB2 3QG, England.
|
||||
.P
|
||||
Putting an actual email address here seems to have been a spam magnet, so I've
|
||||
taken it away. If you want to email me, use my initial and surname, separated
|
||||
by a dot, at the domain ucs.cam.ac.uk.
|
||||
.sp
|
||||
.in 0
|
||||
Last updated: 05 June 2006
|
||||
.br
|
||||
Copyright (c) 1997-2006 University of Cambridge.
|
5153
libs/pcre/doc/pcre.txt
Normal file
File diff suppressed because it is too large
69
libs/pcre/doc/pcre_compile.3
Normal file
@ -0,0 +1,69 @@
|
||||
.TH PCRE_COMPILE 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH SYNOPSIS
|
||||
.rs
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.br
|
||||
.B pcre *pcre_compile(const char *\fIpattern\fP, int \fIoptions\fP,
|
||||
.ti +5n
|
||||
.B const char **\fIerrptr\fP, int *\fIerroffset\fP,
|
||||
.ti +5n
|
||||
.B const unsigned char *\fItableptr\fP);
|
||||
.
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
.sp
|
||||
This function compiles a regular expression into an internal form. Its
|
||||
arguments are:
|
||||
.sp
|
||||
\fIpattern\fR A zero-terminated string containing the
|
||||
regular expression to be compiled
|
||||
\fIoptions\fR Zero or more option bits
|
||||
\fIerrptr\fR Where to put an error message
|
||||
\fIerroffset\fR Offset in pattern where error was found
|
||||
\fItableptr\fR Pointer to character tables, or NULL to
|
||||
use the built-in default
|
||||
.sp
|
||||
The option bits are:
|
||||
.sp
|
||||
PCRE_ANCHORED Force pattern anchoring
|
||||
PCRE_AUTO_CALLOUT Compile automatic callouts
|
||||
PCRE_CASELESS Do caseless matching
|
||||
PCRE_DOLLAR_ENDONLY $ not to match newline at end
|
||||
PCRE_DOTALL . matches anything including NL
|
||||
PCRE_DUPNAMES Allow duplicate names for subpatterns
|
||||
PCRE_EXTENDED Ignore whitespace and # comments
|
||||
PCRE_EXTRA PCRE extra features
|
||||
(not much use currently)
|
||||
PCRE_FIRSTLINE Force matching to be before newline
|
||||
PCRE_MULTILINE ^ and $ match newlines within data
|
||||
PCRE_NEWLINE_CR Set CR as the newline sequence
|
||||
PCRE_NEWLINE_CRLF Set CRLF as the newline sequence
|
||||
PCRE_NEWLINE_LF Set LF as the newline sequence
|
||||
PCRE_NO_AUTO_CAPTURE Disable numbered capturing paren-
|
||||
theses (named ones available)
|
||||
PCRE_UNGREEDY Invert greediness of quantifiers
|
||||
PCRE_UTF8 Run in UTF-8 mode
|
||||
PCRE_NO_UTF8_CHECK Do not check the pattern for UTF-8
|
||||
validity (only relevant if
|
||||
PCRE_UTF8 is set)
|
||||
.sp
|
||||
PCRE must be built with UTF-8 support in order to use PCRE_UTF8 and
|
||||
PCRE_NO_UTF8_CHECK.
|
||||
.P
|
||||
The yield of the function is a pointer to a private data structure that
|
||||
contains the compiled pattern, or NULL if an error was detected.
|
||||
.P
|
||||
There is a complete description of the PCRE native API in the
|
||||
.\" HREF
|
||||
\fBpcreapi\fR
|
||||
.\"
|
||||
page and a description of the POSIX API in the
|
||||
.\" HREF
|
||||
\fBpcreposix\fR
|
||||
.\"
|
||||
page.
|
74
libs/pcre/doc/pcre_compile2.3
Normal file
@ -0,0 +1,74 @@
|
||||
.TH PCRE_COMPILE2 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH SYNOPSIS
|
||||
.rs
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.br
|
||||
.B pcre *pcre_compile2(const char *\fIpattern\fP, int \fIoptions\fP,
|
||||
.ti +5n
|
||||
.B int *\fIerrorcodeptr\fP,
|
||||
.ti +5n
|
||||
.B const char **\fIerrptr\fP, int *\fIerroffset\fP,
|
||||
.ti +5n
|
||||
.B const unsigned char *\fItableptr\fP);
|
||||
.
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
.sp
|
||||
This function compiles a regular expression into an internal form. It is the
|
||||
same as \fBpcre_compile()\fP, except for the addition of the \fIerrorcodeptr\fP
|
||||
argument. The arguments are:
|
||||
|
||||
.sp
|
||||
\fIpattern\fR A zero-terminated string containing the
|
||||
regular expression to be compiled
|
||||
\fIoptions\fR Zero or more option bits
|
||||
\fIerrorcodeptr\fP Where to put an error code
|
||||
\fIerrptr\fR Where to put an error message
|
||||
\fIerroffset\fR Offset in pattern where error was found
|
||||
\fItableptr\fR Pointer to character tables, or NULL to
|
||||
use the built-in default
|
||||
.sp
|
||||
The option bits are:
|
||||
.sp
|
||||
PCRE_ANCHORED Force pattern anchoring
|
||||
PCRE_AUTO_CALLOUT Compile automatic callouts
|
||||
PCRE_CASELESS Do caseless matching
|
||||
PCRE_DOLLAR_ENDONLY $ not to match newline at end
|
||||
PCRE_DOTALL . matches anything including NL
|
||||
PCRE_DUPNAMES Allow duplicate names for subpatterns
|
||||
PCRE_EXTENDED Ignore whitespace and # comments
|
||||
PCRE_EXTRA PCRE extra features
|
||||
(not much use currently)
|
||||
PCRE_FIRSTLINE Force matching to be before newline
|
||||
PCRE_MULTILINE ^ and $ match newlines within data
|
||||
PCRE_NEWLINE_CR Set CR as the newline sequence
|
||||
PCRE_NEWLINE_CRLF Set CRLF as the newline sequence
|
||||
PCRE_NEWLINE_LF Set LF as the newline sequence
|
||||
PCRE_NO_AUTO_CAPTURE Disable numbered capturing paren-
|
||||
theses (named ones available)
|
||||
PCRE_UNGREEDY Invert greediness of quantifiers
|
||||
PCRE_UTF8 Run in UTF-8 mode
|
||||
PCRE_NO_UTF8_CHECK Do not check the pattern for UTF-8
|
||||
validity (only relevant if
|
||||
PCRE_UTF8 is set)
|
||||
.sp
|
||||
PCRE must be built with UTF-8 support in order to use PCRE_UTF8 and
|
||||
PCRE_NO_UTF8_CHECK.
|
||||
.P
|
||||
The yield of the function is a pointer to a private data structure that
|
||||
contains the compiled pattern, or NULL if an error was detected.
|
||||
.P
|
||||
There is a complete description of the PCRE native API in the
|
||||
.\" HREF
|
||||
\fBpcreapi\fR
|
||||
.\"
|
||||
page and a description of the POSIX API in the
|
||||
.\" HREF
|
||||
\fBpcreposix\fR
|
||||
.\"
|
||||
page.
|
50
libs/pcre/doc/pcre_config.3
Normal file
@ -0,0 +1,50 @@
|
||||
.TH PCRE_CONFIG 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH SYNOPSIS
|
||||
.rs
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.br
|
||||
.B int pcre_config(int \fIwhat\fP, void *\fIwhere\fP);
|
||||
.
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
.sp
|
||||
This function makes it possible for a client program to find out which optional
|
||||
features are available in the version of the PCRE library it is using. Its
|
||||
arguments are as follows:
|
||||
.sp
|
||||
\fIwhat\fR A code specifying what information is required
|
||||
\fIwhere\fR Points to where to put the data
|
||||
.sp
|
||||
The available codes are:
|
||||
.sp
|
||||
PCRE_CONFIG_LINK_SIZE Internal link size: 2, 3, or 4
|
||||
PCRE_CONFIG_MATCH_LIMIT Internal resource limit
|
||||
PCRE_CONFIG_MATCH_LIMIT_RECURSION
|
||||
Internal recursion depth limit
|
||||
PCRE_CONFIG_NEWLINE Value of the newline sequence
|
||||
PCRE_CONFIG_POSIX_MALLOC_THRESHOLD
|
||||
Threshold of return slots, above
|
||||
which \fBmalloc()\fR is used by
|
||||
the POSIX API
|
||||
PCRE_CONFIG_STACKRECURSE Recursion implementation (1=stack 0=heap)
|
||||
PCRE_CONFIG_UTF8 Availability of UTF-8 support (1=yes 0=no)
|
||||
PCRE_CONFIG_UNICODE_PROPERTIES
|
||||
Availability of Unicode property support
|
||||
(1=yes 0=no)
|
||||
.sp
|
||||
The function yields 0 on success or PCRE_ERROR_BADOPTION otherwise.
|
||||
.P
|
||||
There is a complete description of the PCRE native API in the
|
||||
.\" HREF
|
||||
\fBpcreapi\fR
|
||||
.\"
|
||||
page and a description of the POSIX API in the
|
||||
.\" HREF
|
||||
\fBpcreposix\fR
|
||||
.\"
|
||||
page.
|
44
libs/pcre/doc/pcre_copy_named_substring.3
Normal file
@ -0,0 +1,44 @@
|
||||
.TH PCRE_COPY_NAMED_SUBSTRING 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH SYNOPSIS
|
||||
.rs
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.br
|
||||
.B int pcre_copy_named_substring(const pcre *\fIcode\fP,
|
||||
.ti +5n
|
||||
.B const char *\fIsubject\fP, int *\fIovector\fP,
|
||||
.ti +5n
|
||||
.B int \fIstringcount\fP, const char *\fIstringname\fP,
|
||||
.ti +5n
|
||||
.B char *\fIbuffer\fP, int \fIbuffersize\fP);
|
||||
.
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
.sp
|
||||
This is a convenience function for extracting a captured substring, identified
|
||||
by name, into a given buffer. The arguments are:
|
||||
.sp
|
||||
\fIcode\fP Pattern that was successfully matched
|
||||
\fIsubject\fP Subject that has been successfully matched
|
||||
\fIovector\fP Offset vector that \fBpcre_exec()\fP used
|
||||
\fIstringcount\fP Value returned by \fBpcre_exec()\fP
|
||||
\fIstringname\fP Name of the required substring
|
||||
\fIbuffer\fP Buffer to receive the string
|
||||
\fIbuffersize\fP Size of buffer
|
||||
.sp
|
||||
The yield is the length of the substring, PCRE_ERROR_NOMEMORY if the buffer was
|
||||
too small, or PCRE_ERROR_NOSUBSTRING if the string name is invalid.
|
||||
.P
|
||||
There is a complete description of the PCRE native API in the
|
||||
.\" HREF
|
||||
\fBpcreapi\fP
|
||||
.\"
|
||||
page and a description of the POSIX API in the
|
||||
.\" HREF
|
||||
\fBpcreposix\fP
|
||||
.\"
|
||||
page.
|
41
libs/pcre/doc/pcre_copy_substring.3
Normal file
@ -0,0 +1,41 @@
|
||||
.TH PCRE_COPY_SUBSTRING 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH SYNOPSIS
|
||||
.rs
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.br
|
||||
.B int pcre_copy_substring(const char *\fIsubject\fP, int *\fIovector\fP,
|
||||
.ti +5n
|
||||
.B int \fIstringcount\fP, int \fIstringnumber\fP, char *\fIbuffer\fP,
|
||||
.ti +5n
|
||||
.B int \fIbuffersize\fP);
|
||||
.
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
.sp
|
||||
This is a convenience function for extracting a captured substring into a given
|
||||
buffer. The arguments are:
|
||||
.sp
|
||||
\fIsubject\fP Subject that has been successfully matched
|
||||
\fIovector\fP Offset vector that \fBpcre_exec()\fP used
|
||||
\fIstringcount\fP Value returned by \fBpcre_exec()\fP
|
||||
\fIstringnumber\fP Number of the required substring
|
||||
\fIbuffer\fP Buffer to receive the string
|
||||
\fIbuffersize\fP Size of buffer
|
||||
.sp
|
||||
The yield is the length of the string, PCRE_ERROR_NOMEMORY if the buffer was
|
||||
too small, or PCRE_ERROR_NOSUBSTRING if the string number is invalid.
|
||||
.P
|
||||
There is a complete description of the PCRE native API in the
|
||||
.\" HREF
|
||||
\fBpcreapi\fP
|
||||
.\"
|
||||
page and a description of the POSIX API in the
|
||||
.\" HREF
|
||||
\fBpcreposix\fP
|
||||
.\"
|
||||
page.
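
The behaviour described above can be sketched in plain C. This is an illustration, not the library's source: it assumes the offset-vector layout given in the pcreapi page (element 2n holds the start offset and element 2n+1 the end offset of substring n), and the function name and error macros are placeholders standing in for the real ones in pcre.h.

```c
#include <assert.h>
#include <string.h>

/* Placeholder error codes for this sketch only. Real programs should use
   PCRE_ERROR_NOMEMORY and PCRE_ERROR_NOSUBSTRING from <pcre.h>. */
#define SKETCH_ERROR_NOMEMORY    (-6)
#define SKETCH_ERROR_NOSUBSTRING (-7)

/* Hypothetical stand-in for pcre_copy_substring(): copy captured
   substring number stringnumber into buffer and return its length. */
static int sketch_copy_substring(const char *subject, const int *ovector,
                                 int stringcount, int stringnumber,
                                 char *buffer, int buffersize)
{
    int start, length;

    assert(subject != NULL && ovector != NULL && buffer != NULL);
    if (stringnumber < 0 || stringnumber >= stringcount)
        return SKETCH_ERROR_NOSUBSTRING;

    start = ovector[2 * stringnumber];
    length = ovector[2 * stringnumber + 1] - start;
    if (start < 0) {            /* unset group: treat as empty string */
        start = 0;
        length = 0;
    }
    if (buffersize < length + 1)    /* room for the terminating zero */
        return SKETCH_ERROR_NOMEMORY;

    memcpy(buffer, subject + start, (size_t)length);
    buffer[length] = '\0';
    return length;
}
```

With the subject "2006-06-05" and an offset vector of {0, 7, 0, 4, 5, 7} (as a pattern with two capturing groups might produce), substring 1 is "2006" and substring 2 is "06". A buffer must be at least one byte larger than the substring to hold the terminator.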
|
85
libs/pcre/doc/pcre_dfa_exec.3
Normal file
@ -0,0 +1,85 @@
|
||||
.TH PCRE_DFA_EXEC 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH SYNOPSIS
|
||||
.rs
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.br
|
||||
.B int pcre_dfa_exec(const pcre *\fIcode\fP, "const pcre_extra *\fIextra\fP,"
|
||||
.ti +5n
|
||||
.B "const char *\fIsubject\fP," int \fIlength\fP, int \fIstartoffset\fP,
|
||||
.ti +5n
|
||||
.B int \fIoptions\fP, int *\fIovector\fP, int \fIovecsize\fP,
|
||||
.ti +5n
|
||||
.B int *\fIworkspace\fP, int \fIwscount\fP);
|
||||
.
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
.sp
|
||||
This function matches a compiled regular expression against a given subject
|
||||
string, using a DFA matching algorithm (\fInot\fP Perl-compatible). Note that
|
||||
the main, Perl-compatible, matching function is \fBpcre_exec()\fP. The
|
||||
arguments for this function are:
|
||||
.sp
|
||||
\fIcode\fP Points to the compiled pattern
|
||||
\fIextra\fP Points to an associated \fBpcre_extra\fP structure,
|
||||
or is NULL
|
||||
\fIsubject\fP Points to the subject string
|
||||
\fIlength\fP Length of the subject string, in bytes
|
||||
\fIstartoffset\fP Offset in bytes in the subject at which to
|
||||
start matching
|
||||
\fIoptions\fP Option bits
|
||||
\fIovector\fP Points to a vector of ints for result offsets
|
||||
\fIovecsize\fP Number of elements in the vector
|
||||
\fIworkspace\fP Points to a vector of ints used as working space
|
||||
\fIwscount\fP Number of elements in the vector
|
||||
.sp
|
||||
The options are:
|
||||
.sp
|
||||
PCRE_ANCHORED Match only at the first position
|
||||
PCRE_NEWLINE_CR Set CR as the newline sequence
|
||||
PCRE_NEWLINE_CRLF Set CRLF as the newline sequence
|
||||
PCRE_NEWLINE_LF Set LF as the newline sequence
|
||||
PCRE_NOTBOL Subject is not the beginning of a line
|
||||
PCRE_NOTEOL Subject is not the end of a line
|
||||
PCRE_NOTEMPTY An empty string is not a valid match
|
||||
PCRE_NO_UTF8_CHECK Do not check the subject for UTF-8
|
||||
validity (only relevant if PCRE_UTF8
|
||||
was set at compile time)
|
||||
PCRE_PARTIAL Return PCRE_ERROR_PARTIAL for a partial match
|
||||
PCRE_DFA_SHORTEST Return only the shortest match
|
||||
PCRE_DFA_RESTART This is a restart after a partial match
|
||||
.sp
|
||||
There are restrictions on what may appear in a pattern when matching using the
|
||||
DFA algorithm is requested. Details are given in the
|
||||
.\" HREF
|
||||
\fBpcrematching\fP
|
||||
.\"
|
||||
documentation.
|
||||
.P
|
||||
A \fBpcre_extra\fP structure contains the following fields:
|
||||
.sp
|
||||
\fIflags\fP Bits indicating which fields are set
|
||||
\fIstudy_data\fP Opaque data from \fBpcre_study()\fP
|
||||
\fImatch_limit\fP Limit on internal resource use
|
||||
\fImatch_limit_recursion\fP Limit on internal recursion depth
|
||||
\fIcallout_data\fP Opaque data passed back to callouts
|
||||
\fItables\fP Points to character tables or is NULL
|
||||
.sp
|
||||
The flag bits are PCRE_EXTRA_STUDY_DATA, PCRE_EXTRA_MATCH_LIMIT,
|
||||
PCRE_EXTRA_MATCH_LIMIT_RECURSION, PCRE_EXTRA_CALLOUT_DATA, and
|
||||
PCRE_EXTRA_TABLES. For DFA matching, the \fImatch_limit\fP and
|
||||
\fImatch_limit_recursion\fP fields are not used, and must not be set.
|
||||
.P
|
||||
There is a complete description of the PCRE native API in the
|
||||
.\" HREF
|
||||
\fBpcreapi\fP
|
||||
.\"
|
||||
page and a description of the POSIX API in the
|
||||
.\" HREF
|
||||
\fBpcreposix\fP
|
||||
.\"
|
||||
page.
|
73
libs/pcre/doc/pcre_exec.3
Normal file
@ -0,0 +1,73 @@
|
||||
.TH PCRE_EXEC 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH SYNOPSIS
|
||||
.rs
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.br
|
||||
.B int pcre_exec(const pcre *\fIcode\fP, "const pcre_extra *\fIextra\fP,"
|
||||
.ti +5n
|
||||
.B "const char *\fIsubject\fP," int \fIlength\fP, int \fIstartoffset\fP,
|
||||
.ti +5n
|
||||
.B int \fIoptions\fP, int *\fIovector\fP, int \fIovecsize\fP);
|
||||
.
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
.sp
|
||||
This function matches a compiled regular expression against a given subject
|
||||
string, using a matching algorithm that is similar to Perl's. It returns
|
||||
offsets to captured substrings. Its arguments are:
|
||||
.sp
|
||||
\fIcode\fP Points to the compiled pattern
|
||||
\fIextra\fP Points to an associated \fBpcre_extra\fP structure,
|
||||
or is NULL
|
||||
\fIsubject\fP Points to the subject string
|
||||
\fIlength\fP Length of the subject string, in bytes
|
||||
\fIstartoffset\fP Offset in bytes in the subject at which to
|
||||
start matching
|
||||
\fIoptions\fP Option bits
|
||||
\fIovector\fP Points to a vector of ints for result offsets
|
||||
\fIovecsize\fP Number of elements in the vector (a multiple of 3)
|
||||
.sp
|
||||
The options are:
|
||||
.sp
|
||||
PCRE_ANCHORED Match only at the first position
|
||||
PCRE_NEWLINE_CR Set CR as the newline sequence
|
||||
PCRE_NEWLINE_CRLF Set CRLF as the newline sequence
|
||||
PCRE_NEWLINE_LF Set LF as the newline sequence
|
||||
PCRE_NOTBOL Subject is not the beginning of a line
|
||||
PCRE_NOTEOL Subject is not the end of a line
|
||||
PCRE_NOTEMPTY An empty string is not a valid match
|
||||
PCRE_NO_UTF8_CHECK Do not check the subject for UTF-8
|
||||
validity (only relevant if PCRE_UTF8
|
||||
was set at compile time)
|
||||
PCRE_PARTIAL Return PCRE_ERROR_PARTIAL for a partial match
|
||||
.sp
|
||||
There are restrictions on what may appear in a pattern when partial matching is
|
||||
requested.
|
||||
.P
|
||||
A \fBpcre_extra\fP structure contains the following fields:
|
||||
.sp
|
||||
\fIflags\fP Bits indicating which fields are set
|
||||
\fIstudy_data\fP Opaque data from \fBpcre_study()\fP
|
||||
\fImatch_limit\fP Limit on internal resource use
|
||||
\fImatch_limit_recursion\fP Limit on internal recursion depth
|
||||
\fIcallout_data\fP Opaque data passed back to callouts
|
||||
\fItables\fP Points to character tables or is NULL
|
||||
.sp
|
||||
The flag bits are PCRE_EXTRA_STUDY_DATA, PCRE_EXTRA_MATCH_LIMIT,
|
||||
PCRE_EXTRA_MATCH_LIMIT_RECURSION, PCRE_EXTRA_CALLOUT_DATA, and
|
||||
PCRE_EXTRA_TABLES.
|
||||
.P
|
||||
There is a complete description of the PCRE native API in the
|
||||
.\" HREF
|
||||
\fBpcreapi\fP
|
||||
.\"
|
||||
page and a description of the POSIX API in the
|
||||
.\" HREF
|
||||
\fBpcreposix\fP
|
||||
.\"
|
||||
page.
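
The requirement that \fIovecsize\fP be a multiple of 3 follows from the layout described in the pcreapi page: two ints per substring for the offsets, plus a final third that \fBpcre_exec()\fP uses as workspace. A small sketch of the sizing arithmetic, with a hypothetical helper name:

```c
#include <assert.h>

/* Hypothetical helper (not part of PCRE): number of ints needed in the
   offset vector to receive the whole match plus capture_count groups.
   Two ints per substring hold the offsets; the remaining third is
   workspace for the matcher. */
static int ovector_size_for(int capture_count)
{
    assert(capture_count >= 0);
    return (capture_count + 1) * 3;
}
```

The capture count for a compiled pattern can be obtained from \fBpcre_fullinfo()\fP with PCRE_INFO_CAPTURECOUNT. A pattern with two capturing groups therefore needs a vector of nine ints.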
|
28
libs/pcre/doc/pcre_free_substring.3
Normal file
@ -0,0 +1,28 @@
|
||||
.TH PCRE_FREE_SUBSTRING 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH SYNOPSIS
|
||||
.rs
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.br
|
||||
.B void pcre_free_substring(const char *\fIstringptr\fP);
|
||||
.
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
.sp
|
||||
This is a convenience function for freeing the store obtained by a previous
|
||||
call to \fBpcre_get_substring()\fP or \fBpcre_get_named_substring()\fP. Its
|
||||
only argument is a pointer to the string.
|
||||
.P
|
||||
There is a complete description of the PCRE native API in the
|
||||
.\" HREF
|
||||
\fBpcreapi\fP
|
||||
.\"
|
||||
page and a description of the POSIX API in the
|
||||
.\" HREF
|
||||
\fBpcreposix\fP
|
||||
.\"
|
||||
page.
|
28
libs/pcre/doc/pcre_free_substring_list.3
Normal file
@ -0,0 +1,28 @@
|
||||
.TH PCRE_FREE_SUBSTRING_LIST 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH SYNOPSIS
|
||||
.rs
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.br
|
||||
.B void pcre_free_substring_list(const char **\fIstringptr\fP);
|
||||
.
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
.sp
|
||||
This is a convenience function for freeing the store obtained by a previous
|
||||
call to \fBpcre_get_substring_list()\fP. Its only argument is a pointer to the
|
||||
list of string pointers.
|
||||
.P
|
||||
There is a complete description of the PCRE native API in the
|
||||
.\" HREF
|
||||
\fBpcreapi\fP
|
||||
.\"
|
||||
page and a description of the POSIX API in the
|
||||
.\" HREF
|
||||
\fBpcreposix\fP
|
||||
.\"
|
||||
page.
|
59
libs/pcre/doc/pcre_fullinfo.3
Normal file
@ -0,0 +1,59 @@
|
||||
.TH PCRE_FULLINFO 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH SYNOPSIS
|
||||
.rs
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.br
|
||||
.B int pcre_fullinfo(const pcre *\fIcode\fP, "const pcre_extra *\fIextra\fP,"
|
||||
.ti +5n
|
||||
.B int \fIwhat\fP, void *\fIwhere\fP);
|
||||
.
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
.sp
|
||||
This function returns information about a compiled pattern. Its arguments are:
|
||||
.sp
|
||||
\fIcode\fP Compiled regular expression
|
||||
\fIextra\fP Result of \fBpcre_study()\fP or NULL
|
||||
\fIwhat\fP What information is required
|
||||
\fIwhere\fP Where to put the information
|
||||
.sp
|
||||
The following information is available:
|
||||
.sp
|
||||
PCRE_INFO_BACKREFMAX Number of highest back reference
|
||||
PCRE_INFO_CAPTURECOUNT Number of capturing subpatterns
|
||||
PCRE_INFO_DEFAULT_TABLES Pointer to default tables
|
||||
PCRE_INFO_FIRSTBYTE Fixed first byte for a match, or
|
||||
-1 for start of string
|
||||
or after newline, or
|
||||
-2 otherwise
|
||||
PCRE_INFO_FIRSTTABLE Table of first bytes
|
||||
(after studying)
|
||||
PCRE_INFO_LASTLITERAL Literal last byte required
|
||||
PCRE_INFO_NAMECOUNT Number of named subpatterns
|
||||
PCRE_INFO_NAMEENTRYSIZE Size of name table entry
|
||||
PCRE_INFO_NAMETABLE Pointer to name table
|
||||
PCRE_INFO_OPTIONS Options used for compilation
|
||||
PCRE_INFO_SIZE Size of compiled pattern
|
||||
PCRE_INFO_STUDYSIZE Size of study data
|
||||
.sp
|
||||
The yield of the function is zero on success or:
|
||||
.sp
|
||||
PCRE_ERROR_NULL the argument \fIcode\fP was NULL
|
||||
the argument \fIwhere\fP was NULL
|
||||
PCRE_ERROR_BADMAGIC the "magic number" was not found
|
||||
PCRE_ERROR_BADOPTION the value of \fIwhat\fP was invalid
|
||||
.P
|
||||
There is a complete description of the PCRE native API in the
|
||||
.\" HREF
|
||||
\fBpcreapi\fP
|
||||
.\"
|
||||
page and a description of the POSIX API in the
|
||||
.\" HREF
|
||||
\fBpcreposix\fP
|
||||
.\"
|
||||
page.
|
45
libs/pcre/doc/pcre_get_named_substring.3
Normal file
@ -0,0 +1,45 @@
|
||||
.TH PCRE_GET_NAMED_SUBSTRING 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH SYNOPSIS
|
||||
.rs
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.br
|
||||
.B int pcre_get_named_substring(const pcre *\fIcode\fP,
|
||||
.ti +5n
|
||||
.B const char *\fIsubject\fP, int *\fIovector\fP,
|
||||
.ti +5n
|
||||
.B int \fIstringcount\fP, const char *\fIstringname\fP,
|
||||
.ti +5n
|
||||
.B const char **\fIstringptr\fP);
|
||||
.
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
.sp
|
||||
This is a convenience function for extracting a captured substring by name. The
|
||||
arguments are:
|
||||
.sp
|
||||
\fIcode\fP Compiled pattern
|
||||
\fIsubject\fP Subject that has been successfully matched
|
||||
\fIovector\fP Offset vector that \fBpcre_exec()\fP used
|
||||
\fIstringcount\fP Value returned by \fBpcre_exec()\fP
|
||||
\fIstringname\fP Name of the required substring
|
||||
\fIstringptr\fP Where to put the string pointer
|
||||
.sp
|
||||
The memory in which the substring is placed is obtained by calling
|
||||
\fBpcre_malloc()\fP. The yield of the function is the length of the extracted
|
||||
substring, PCRE_ERROR_NOMEMORY if sufficient memory could not be obtained, or
|
||||
PCRE_ERROR_NOSUBSTRING if the string name is invalid.
|
||||
.P
|
||||
There is a complete description of the PCRE native API in the
|
||||
.\" HREF
|
||||
\fBpcreapi\fP
|
||||
.\"
|
||||
page and a description of the POSIX API in the
|
||||
.\" HREF
|
||||
\fBpcreposix\fP
|
||||
.\"
|
||||
page.
|
35
libs/pcre/doc/pcre_get_stringnumber.3
Normal file
@ -0,0 +1,35 @@
|
||||
.TH PCRE_GET_STRINGNUMBER 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH SYNOPSIS
|
||||
.rs
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.br
|
||||
.B int pcre_get_stringnumber(const pcre *\fIcode\fP,
|
||||
.ti +5n
|
||||
.B const char *\fIname\fP);
|
||||
.
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
.sp
|
||||
This convenience function finds the number of a named substring capturing
|
||||
parenthesis in a compiled pattern. Its arguments are:
|
||||
.sp
|
||||
\fIcode\fP Compiled regular expression
|
||||
\fIname\fP Name whose number is required
|
||||
.sp
|
||||
The yield of the function is the number of the parenthesis if the name is
|
||||
found, or PCRE_ERROR_NOSUBSTRING otherwise.
|
||||
.P
|
||||
There is a complete description of the PCRE native API in the
|
||||
.\" HREF
|
||||
\fBpcreapi\fP
|
||||
.\"
|
||||
page and a description of the POSIX API in the
|
||||
.\" HREF
|
||||
\fBpcreposix\fP
|
||||
.\"
|
||||
page.
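
The lookup this function performs can be sketched against a hand-built name table. The sketch assumes the table format given in the pcreapi page: each fixed-size entry starts with the parenthesis number in two bytes, most significant first, followed by the zero-terminated name. The function name and the error macro are placeholders, and a simple linear scan is used here for clarity.

```c
#include <assert.h>
#include <string.h>

/* Placeholder for PCRE_ERROR_NOSUBSTRING from <pcre.h>. */
#define SKETCH_ERROR_NOSUBSTRING (-7)

/* Hypothetical stand-in for pcre_get_stringnumber(): scan a name table
   of name_count entries, each entry_size bytes long, for name. */
static int sketch_name_to_number(const unsigned char *table, int name_count,
                                 int entry_size, const char *name)
{
    int i;

    assert(table != NULL && name != NULL && entry_size > 2);
    for (i = 0; i < name_count; i++) {
        const unsigned char *entry = table + (size_t)i * (size_t)entry_size;
        /* Bytes 0 and 1 hold the number; the name starts at byte 2. */
        if (strcmp((const char *)(entry + 2), name) == 0)
            return (entry[0] << 8) | entry[1];
    }
    return SKETCH_ERROR_NOSUBSTRING;
}
```

In a real program the table pointer, entry count, and entry size come from \fBpcre_fullinfo()\fP with PCRE_INFO_NAMETABLE, PCRE_INFO_NAMECOUNT, and PCRE_INFO_NAMEENTRYSIZE.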
|
41
libs/pcre/doc/pcre_get_stringtable_entries.3
Normal file
@ -0,0 +1,41 @@
|
||||
.TH PCRE_GET_STRINGTABLE_ENTRIES 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH SYNOPSIS
|
||||
.rs
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.br
|
||||
.B int pcre_get_stringtable_entries(const pcre *\fIcode\fP,
|
||||
.ti +5n
|
||||
.B const char *\fIname\fP, char **\fIfirst\fP, char **\fIlast\fP);
|
||||
.
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
.sp
|
||||
This convenience function finds, for a compiled pattern, the first and last
|
||||
entries for a given name in the table that translates capturing parenthesis
|
||||
names into numbers. When names are required to be unique (PCRE_DUPNAMES is
|
||||
\fInot\fP set), it is usually easier to use \fBpcre_get_stringnumber()\fP
|
||||
instead.
|
||||
.sp
|
||||
\fIcode\fP Compiled regular expression
|
||||
\fIname\fP Name whose entries are required
|
||||
\fIfirst\fP Where to return a pointer to the first entry
|
||||
\fIlast\fP Where to return a pointer to the last entry
|
||||
.sp
|
||||
The yield of the function is the length of each entry, or
|
||||
PCRE_ERROR_NOSUBSTRING if none are found.
|
||||
.P
|
||||
There is a complete description of the PCRE native API, including the format of
|
||||
the table entries, in the
|
||||
.\" HREF
|
||||
\fBpcreapi\fP
|
||||
.\"
|
||||
page and a description of the POSIX API in the
|
||||
.\" HREF
|
||||
\fBpcreposix\fP
|
||||
.\"
|
||||
page.
|
42
libs/pcre/doc/pcre_get_substring.3
Normal file
@ -0,0 +1,42 @@
|
||||
.TH PCRE_GET_SUBSTRING 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH SYNOPSIS
|
||||
.rs
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.br
|
||||
.B int pcre_get_substring(const char *\fIsubject\fP, int *\fIovector\fP,
|
||||
.ti +5n
|
||||
.B int \fIstringcount\fP, int \fIstringnumber\fP,
|
||||
.ti +5n
|
||||
.B const char **\fIstringptr\fP);
|
||||
.
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
.sp
|
||||
This is a convenience function for extracting a captured substring. The
|
||||
arguments are:
|
||||
.sp
|
||||
\fIsubject\fP Subject that has been successfully matched
|
||||
\fIovector\fP Offset vector that \fBpcre_exec()\fP used
|
||||
\fIstringcount\fP Value returned by \fBpcre_exec()\fP
|
||||
\fIstringnumber\fP Number of the required substring
|
||||
\fIstringptr\fP Where to put the string pointer
|
||||
.sp
|
||||
The memory in which the substring is placed is obtained by calling
|
||||
\fBpcre_malloc()\fP. The yield of the function is the length of the substring,
|
||||
PCRE_ERROR_NOMEMORY if sufficient memory could not be obtained, or
|
||||
PCRE_ERROR_NOSUBSTRING if the string number is invalid.
|
||||
.P
|
||||
There is a complete description of the PCRE native API in the
|
||||
.\" HREF
|
||||
\fBpcreapi\fP
|
||||
.\"
|
||||
page and a description of the POSIX API in the
|
||||
.\" HREF
|
||||
\fBpcreposix\fP
|
||||
.\"
|
||||
page.
|
40
libs/pcre/doc/pcre_get_substring_list.3
Normal file
@ -0,0 +1,40 @@
|
||||
.TH PCRE_GET_SUBSTRING_LIST 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH SYNOPSIS
|
||||
.rs
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.br
|
||||
.B int pcre_get_substring_list(const char *\fIsubject\fP,
|
||||
.ti +5n
|
||||
.B int *\fIovector\fP, int \fIstringcount\fP, "const char ***\fIlistptr\fP);"
|
||||
.
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
.sp
|
||||
This is a convenience function for extracting a list of all the captured
|
||||
substrings. The arguments are:
|
||||
.sp
|
||||
\fIsubject\fP Subject that has been successfully matched
|
||||
\fIovector\fP Offset vector that \fBpcre_exec\fP used
|
||||
\fIstringcount\fP Value returned by \fBpcre_exec\fP
|
||||
\fIlistptr\fP Where to put a pointer to the list
|
||||
.sp
|
||||
The memory in which the substrings and the list are placed is obtained by
|
||||
calling \fBpcre_malloc()\fP. A pointer to a list of pointers is put in
|
||||
the variable whose address is in \fIlistptr\fP. The list is terminated by a
|
||||
NULL pointer. The yield of the function is zero on success or
|
||||
PCRE_ERROR_NOMEMORY if sufficient memory could not be obtained.
|
||||
.P
|
||||
There is a complete description of the PCRE native API in the
|
||||
.\" HREF
|
||||
\fBpcreapi\fP
|
||||
.\"
|
||||
page and a description of the POSIX API in the
|
||||
.\" HREF
|
||||
\fBpcreposix\fP
|
||||
.\"
|
||||
page.
|
27
libs/pcre/doc/pcre_info.3
Normal file
@ -0,0 +1,27 @@
|
||||
.TH PCRE_INFO 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH SYNOPSIS
|
||||
.rs
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.br
|
||||
.B int pcre_info(const pcre *\fIcode\fP, int *\fIoptptr\fP, int
|
||||
.B *\fIfirstcharptr\fP);
|
||||
.
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
.sp
|
||||
This function is obsolete. You should be using \fBpcre_fullinfo()\fP instead.
|
||||
.P
|
||||
There is a complete description of the PCRE native API in the
|
||||
.\" HREF
|
||||
\fBpcreapi\fP
|
||||
.\"
|
||||
page and a description of the POSIX API in the
|
||||
.\" HREF
|
||||
\fBpcreposix\fP
|
||||
.\"
|
||||
page.
|
30
libs/pcre/doc/pcre_maketables.3
Normal file
@ -0,0 +1,30 @@
|
||||
.TH PCRE_MAKETABLES 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH SYNOPSIS
|
||||
.rs
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.br
|
||||
.B const unsigned char *pcre_maketables(void);
|
||||
.
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
.sp
|
||||
This function builds a set of character tables for character values less than
|
||||
256. These can be passed to \fBpcre_compile()\fP to override PCRE's internal,
|
||||
built-in tables (which were made by \fBpcre_maketables()\fP when PCRE was
|
||||
compiled). You might want to do this if you are using a non-standard locale.
|
||||
The function yields a pointer to the tables.
|
||||
.P
|
||||
There is a complete description of the PCRE native API in the
|
||||
.\" HREF
|
||||
\fBpcreapi\fP
|
||||
.\"
|
||||
page and a description of the POSIX API in the
|
||||
.\" HREF
|
||||
\fBpcreposix\fP
|
||||
.\"
|
||||
page.
|
33
libs/pcre/doc/pcre_refcount.3
Normal file
@ -0,0 +1,33 @@
|
||||
.TH PCRE_REFCOUNT 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH SYNOPSIS
|
||||
.rs
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.br
|
||||
.B int pcre_refcount(pcre *\fIcode\fP, int \fIadjust\fP);
|
||||
.
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
.sp
|
||||
This function is used to maintain a reference count inside a data block that
|
||||
contains a compiled pattern. Its arguments are:
|
||||
.sp
|
||||
\fIcode\fP Compiled regular expression
|
||||
\fIadjust\fP Adjustment to reference value
|
||||
.sp
|
||||
The yield of the function is the adjusted reference value, which is constrained
|
||||
to lie between 0 and 65535.
|
||||
.P
|
||||
There is a complete description of the PCRE native API in the
|
||||
.\" HREF
|
||||
\fBpcreapi\fP
|
||||
.\"
|
||||
page and a description of the POSIX API in the
|
||||
.\" HREF
|
||||
\fBpcreposix\fP
|
||||
.\"
|
||||
page.
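
The clamping described above can be sketched as plain arithmetic. This is an illustration of the documented result only, not the library's code, and the function name is hypothetical.

```c
#include <assert.h>

/* Hypothetical model of the value pcre_refcount() yields: the current
   count plus adjust, constrained to the range 0..65535. The comparisons
   are arranged so that no intermediate sum can overflow an int. */
static int sketch_adjust_refcount(int current, int adjust)
{
    assert(current >= 0 && current <= 65535);
    if (adjust < -current)
        return 0;
    if (adjust > 65535 - current)
        return 65535;
    return current + adjust;
}
```

Passing an adjustment of zero is a convenient way to read the current value without changing it.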
|
43
libs/pcre/doc/pcre_study.3
Normal file
@ -0,0 +1,43 @@
|
||||
.TH PCRE_STUDY 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH SYNOPSIS
|
||||
.rs
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.br
|
||||
.B pcre_extra *pcre_study(const pcre *\fIcode\fP, int \fIoptions\fP,
|
||||
.ti +5n
|
||||
.B const char **\fIerrptr\fP);
|
||||
.
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
.sp
|
||||
This function studies a compiled pattern, to see if additional information can
|
||||
be extracted that might speed up matching. Its arguments are:
|
||||
.sp
|
||||
\fIcode\fP A compiled regular expression
|
||||
\fIoptions\fP Options for \fBpcre_study()\fP
|
||||
\fIerrptr\fP Where to put an error message
|
||||
.sp
|
||||
If the function succeeds, it returns a value that can be passed to
|
||||
\fBpcre_exec()\fP via its \fIextra\fP argument.
|
||||
.P
|
||||
If the function returns NULL, either it could not find any additional
|
||||
information, or there was an error. You can tell the difference by looking at
|
||||
the error value. It is NULL in the first case.
|
||||
.P
|
||||
There are currently no options defined; the value of the second argument should
|
||||
always be zero.
|
||||
.P
|
||||
There is a complete description of the PCRE native API in the
|
||||
.\" HREF
|
||||
\fBpcreapi\fP
|
||||
.\"
|
||||
page and a description of the POSIX API in the
|
||||
.\" HREF
|
||||
\fBpcreposix\fP
|
||||
.\"
|
||||
page.
|
27
libs/pcre/doc/pcre_version.3
Normal file
@ -0,0 +1,27 @@
|
||||
.TH PCRE_VERSION 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH SYNOPSIS
|
||||
.rs
|
||||
.sp
|
||||
.B #include <pcre.h>
|
||||
.PP
|
||||
.SM
|
||||
.br
|
||||
.B char *pcre_version(void);
|
||||
.
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
.sp
|
||||
This function returns a character string that gives the version number of the
|
||||
PCRE library and the date of its release.
|
||||
.P
|
||||
There is a complete description of the PCRE native API in the
|
||||
.\" HREF
|
||||
\fBpcreapi\fP
|
||||
.\"
|
||||
page and a description of the POSIX API in the
|
||||
.\" HREF
|
||||
\fBpcreposix\fP
|
||||
.\"
|
||||
page.
|
1789
libs/pcre/doc/pcreapi.3
Normal file
File diff suppressed because it is too large
213
libs/pcre/doc/pcrebuild.3
Normal file
@ -0,0 +1,213 @@
|
||||
.TH PCREBUILD 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH "PCRE BUILD-TIME OPTIONS"
|
||||
.rs
|
||||
.sp
|
||||
This document describes the optional features of PCRE that can be selected when
|
||||
the library is compiled. They are all selected, or deselected, by providing
|
||||
options to the \fBconfigure\fP script that is run before the \fBmake\fP
|
||||
command. The complete list of options for \fBconfigure\fP (which includes the
|
||||
standard ones such as the selection of the installation directory) can be
|
||||
obtained by running
|
||||
.sp
|
||||
./configure --help
|
||||
.sp
|
||||
The following sections describe certain options whose names begin with --enable
|
||||
or --disable. These settings specify changes to the defaults for the
|
||||
\fBconfigure\fP command. Because of the way that \fBconfigure\fP works,
|
||||
--enable and --disable always come in pairs, so the complementary option always
|
||||
exists as well, but as it specifies the default, it is not described.
|
||||
.
|
||||
.SH "C++ SUPPORT"
|
||||
.rs
|
||||
.sp
|
||||
By default, the \fBconfigure\fP script will search for a C++ compiler and C++
|
||||
header files. If it finds them, it automatically builds the C++ wrapper library
|
||||
for PCRE. You can disable this by adding
|
||||
.sp
|
||||
--disable-cpp
|
||||
.sp
|
||||
to the \fBconfigure\fP command.
|
||||
.
|
||||
.SH "UTF-8 SUPPORT"
|
||||
.rs
|
||||
.sp
|
||||
To build PCRE with support for UTF-8 character strings, add
|
||||
.sp
|
||||
--enable-utf8
|
||||
.sp
|
||||
to the \fBconfigure\fP command. Of itself, this does not make PCRE treat
|
||||
strings as UTF-8. As well as compiling PCRE with this option, you also have
|
||||
to set the PCRE_UTF8 option when you call the \fBpcre_compile()\fP
|
||||
function.
|
||||
.
|
||||
.SH "UNICODE CHARACTER PROPERTY SUPPORT"
|
||||
.rs
|
||||
.sp
|
||||
UTF-8 support allows PCRE to process character values greater than 255 in the
|
||||
strings that it handles. On its own, however, it does not provide any
|
||||
facilities for accessing the properties of such characters. If you want to be
|
||||
able to use the pattern escapes \eP, \ep, and \eX, which refer to Unicode
|
||||
character properties, you must add
|
||||
.sp
|
||||
--enable-unicode-properties
|
||||
.sp
|
||||
to the \fBconfigure\fP command. This implies UTF-8 support, even if you have
|
||||
not explicitly requested it.
|
||||
.P
|
||||
Including Unicode property support adds around 90K of tables to the PCRE
|
||||
library, approximately doubling its size. Only the general category properties
|
||||
such as \fILu\fP and \fINd\fP are supported. Details are given in the
|
||||
.\" HREF
|
||||
\fBpcrepattern\fP
|
||||
.\"
|
||||
documentation.
|
||||
.
|
||||
.SH "CODE VALUE OF NEWLINE"
|
||||
.rs
|
||||
.sp
|
||||
By default, PCRE interprets character 10 (linefeed, LF) as indicating the end
|
||||
of a line. This is the normal newline character on Unix-like systems. You can
|
||||
compile PCRE to use character 13 (carriage return, CR) instead, by adding
|
||||
.sp
|
||||
--enable-newline-is-cr
|
||||
.sp
|
||||
to the \fBconfigure\fP command. There is also a --enable-newline-is-lf option,
|
||||
which explicitly specifies linefeed as the newline character.
|
||||
.sp
|
||||
Alternatively, you can specify that line endings are to be indicated by the two
|
||||
character sequence CRLF. If you want this, add
|
||||
.sp
|
||||
--enable-newline-is-crlf
|
||||
.sp
|
||||
to the \fBconfigure\fP command. Whatever line ending convention is selected
|
||||
when PCRE is built can be overridden when the library functions are called. At
|
||||
build time it is conventional to use the standard for your operating system.
|
||||
.
|
||||
.SH "BUILDING SHARED AND STATIC LIBRARIES"
|
||||
.rs
|
||||
.sp
|
||||
The PCRE building process uses \fBlibtool\fP to build both shared and static
|
||||
Unix libraries by default. You can suppress one of these by adding one of
|
||||
.sp
|
||||
--disable-shared
|
||||
--disable-static
|
||||
.sp
|
||||
to the \fBconfigure\fP command, as required.
|
||||
.
|
||||
.SH "POSIX MALLOC USAGE"
|
||||
.rs
|
||||
.sp
|
||||
When PCRE is called through the POSIX interface (see the
|
||||
.\" HREF
|
||||
\fBpcreposix\fP
|
||||
.\"
|
||||
documentation), additional working storage is required for holding the pointers
|
||||
to capturing substrings, because PCRE requires three integers per substring,
|
||||
whereas the POSIX interface provides only two. If the number of expected
|
||||
substrings is small, the wrapper function uses space on the stack, because this
|
||||
is faster than using \fBmalloc()\fP for each call. The default threshold above
|
||||
which the stack is no longer used is 10; it can be changed by adding a setting
|
||||
such as
|
||||
.sp
|
||||
--with-posix-malloc-threshold=20
|
||||
.sp
|
||||
to the \fBconfigure\fP command.
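The stack-or-heap decision described above can be sketched as follows. This is an illustration only, not the POSIX wrapper's code: \fBSKETCH_MALLOC_THRESHOLD\fP and \fBrun_with_workspace_sketch\fP are hypothetical names, and the threshold is hard-wired to the default of 10 rather than taken from the build configuration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the build-time threshold (default 10). */
#define SKETCH_MALLOC_THRESHOLD 10

/* Obtains workspace of three ints per expected substring.
   Returns 0 if the stack buffer was used, 1 if the heap was used,
   and -1 if heap allocation failed. */
int run_with_workspace_sketch(size_t nmatch)
{
    int small[SKETCH_MALLOC_THRESHOLD * 3];
    int *ovector = small;
    int used_heap = 0;

    if (nmatch > SKETCH_MALLOC_THRESHOLD) {
        ovector = malloc(sizeof(int) * nmatch * 3);
        if (ovector == NULL) return -1;
        used_heap = 1;
    }

    /* A real wrapper would call the matcher here, passing ovector. */
    ovector[0] = -1;

    if (used_heap) free(ovector);
    return used_heap;
}
```

The point of the threshold is visible in the two paths: small requests cost nothing beyond a stack frame, while larger ones pay for an allocation and a release on every call.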
|
||||
.
|
||||
.SH "HANDLING VERY LARGE PATTERNS"
|
||||
.rs
|
||||
.sp
|
||||
Within a compiled pattern, offset values are used to point from one part to
|
||||
another (for example, from an opening parenthesis to an alternation
|
||||
metacharacter). By default, two-byte values are used for these offsets, leading
|
||||
to a maximum size for a compiled pattern of around 64K. This is sufficient to
|
||||
handle all but the most gigantic patterns. Nevertheless, some people do want to
|
||||
process enormous patterns, so it is possible to compile PCRE to use three-byte
|
||||
or four-byte offsets by adding a setting such as
|
||||
.sp
|
||||
--with-link-size=3
|
||||
.sp
|
||||
to the \fBconfigure\fP command. The value given must be 2, 3, or 4. Using
|
||||
longer offsets slows down the operation of PCRE because it has to load
|
||||
additional bytes when handling them.
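The "around 64K" figure follows from simple arithmetic: an offset stored in n bytes can take 2^(8n) distinct values. The sketch below, whose function name is hypothetical, computes that bound for each permitted link size. The exact limit that PCRE enforces may be slightly lower than these figures.

```c
#include <assert.h>

/* Number of distinct offsets representable in link_size bytes.
   link_size must be 2, 3, or 4, as for --with-link-size. */
unsigned long long max_offsets_for_link_size(int link_size)
{
    assert(link_size >= 2 && link_size <= 4);
    return 1ULL << (8 * link_size);
}
```

This gives 65536 for a link size of 2, 16777216 for 3, and 4294967296 for 4.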
|
||||
.P
|
||||
If you build PCRE with an increased link size, test 2 (and test 5 if you are
|
||||
using UTF-8) will fail. Part of the output of these tests is a representation
|
||||
of the compiled pattern, and this changes with the link size.
|
||||
.
|
||||
.SH "AVOIDING EXCESSIVE STACK USAGE"
|
||||
.rs
|
||||
.sp
|
||||
When matching with the \fBpcre_exec()\fP function, PCRE implements backtracking
|
||||
by making recursive calls to an internal function called \fBmatch()\fP. In
|
||||
environments where the size of the stack is limited, this can severely limit
|
||||
PCRE's operation. (The Unix environment does not usually suffer from this
|
||||
problem, but it may sometimes be necessary to increase the maximum stack size.
|
||||
There is a discussion in the
|
||||
.\" HREF
|
||||
\fBpcrestack\fP
|
||||
.\"
|
||||
documentation.) An alternative approach to recursion that uses memory from the
|
||||
heap to remember data, instead of using recursive function calls, has been
|
||||
implemented to work round the problem of limited stack size. If you want to
|
||||
build a version of PCRE that works this way, add
|
||||
.sp
|
||||
--disable-stack-for-recursion
|
||||
.sp
|
||||
to the \fBconfigure\fP command. With this configuration, PCRE will use the
|
||||
\fBpcre_stack_malloc\fP and \fBpcre_stack_free\fP variables to call memory
|
||||
management functions. Separate functions are provided because the usage is very
|
||||
predictable: the block sizes requested are always the same, and the blocks are
|
||||
always freed in reverse order. A calling program might be able to implement
|
||||
optimized functions that perform better than the standard \fBmalloc()\fP and
|
||||
\fBfree()\fP functions. PCRE runs noticeably more slowly when built in this
|
||||
way. This option affects only the \fBpcre_exec()\fP function; it is not
|
||||
relevant for the \fBpcre_dfa_exec()\fP function.
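Because the requested blocks are always the same size and are freed in reverse order, a calling program could keep released blocks on a simple last-in, first-out list and hand them out again without going back to \fBmalloc()\fP. The following is a minimal sketch under those assumptions. The names \fBframe_alloc_sketch\fP and \fBframe_free_sketch\fP are hypothetical, the code is not thread-safe, and cached blocks are never returned to the system.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* A released block is reused as a list node. */
struct frame_node { struct frame_node *next; };

static struct frame_node *free_list = NULL;  /* most recently freed first */
static size_t block_size = 0;                /* fixed by the first request */

/* Returns a block of the single size this sketch supports. */
void *frame_alloc_sketch(size_t size)
{
    struct frame_node *node;

    if (size < sizeof(struct frame_node)) size = sizeof(struct frame_node);
    if (block_size == 0) block_size = size;
    assert(size == block_size);   /* the sketch relies on equal sizes */

    if (free_list != NULL) {
        node = free_list;
        free_list = node->next;
        return node;
    }
    return malloc(block_size);
}

/* Puts a block on the free list instead of releasing it. */
void frame_free_sketch(void *block)
{
    struct frame_node *node = block;

    if (node == NULL) return;
    node->next = free_list;
    free_list = node;
}
```

Functions of this shape have the same signatures as \fBmalloc()\fP and \fBfree()\fP, so a program could assign them to \fBpcre_stack_malloc\fP and \fBpcre_stack_free\fP before matching.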
|
||||
.
|
||||
.SH "LIMITING PCRE RESOURCE USAGE"
|
||||
.rs
|
||||
.sp
|
||||
Internally, PCRE has a function called \fBmatch()\fP, which it calls repeatedly
|
||||
(sometimes recursively) when matching a pattern with the \fBpcre_exec()\fP
|
||||
function. By controlling the maximum number of times this function may be
|
||||
called during a single matching operation, a limit can be placed on the
|
||||
resources used by a single call to \fBpcre_exec()\fP. The limit can be changed
|
||||
at run time, as described in the
|
||||
.\" HREF
|
||||
\fBpcreapi\fP
|
||||
.\"
|
||||
documentation. The default is 10 million, but this can be changed by adding a
|
||||
setting such as
|
||||
.sp
|
||||
--with-match-limit=500000
|
||||
.sp
|
||||
to the \fBconfigure\fP command. This setting has no effect on the
|
||||
\fBpcre_dfa_exec()\fP matching function.
|
||||
.P
|
||||
In some environments it is desirable to limit the depth of recursive calls of
|
||||
\fBmatch()\fP more strictly than the total number of calls, in order to
|
||||
restrict the maximum amount of stack (or heap, if --disable-stack-for-recursion
|
||||
is specified) that is used. A second limit controls this; it defaults to the
|
||||
value that is set for --with-match-limit, which imposes no additional
|
||||
constraints. However, you can set a lower limit by adding, for example,
|
||||
.sp
|
||||
--with-match-limit-recursion=10000
|
||||
.sp
|
||||
to the \fBconfigure\fP command. This value can also be overridden at run time.
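The difference between the two limits can be seen in a toy model. The sketch below is not PCRE's \fBmatch()\fP function; it is a hypothetical recursive routine that tries two alternatives at each level, so the total number of calls grows much faster than the recursion depth. All names and error values in it are invented for illustration.

```c
#include <assert.h>

/* Hypothetical result codes for the toy model. */
enum { SKETCH_OK = 0, SKETCH_CALL_LIMIT = -1, SKETCH_DEPTH_LIMIT = -2 };

struct limits_sketch {
    unsigned long calls;      /* calls made so far */
    unsigned long max_calls;  /* plays the role of the match limit */
    unsigned long max_depth;  /* plays the role of the recursion limit */
};

/* Explores a binary tree of alternatives that is `levels` deep,
   stopping as soon as either limit is exceeded. */
int explore_sketch(struct limits_sketch *lim, unsigned long depth,
                   unsigned long levels)
{
    int rc;

    if (++lim->calls > lim->max_calls) return SKETCH_CALL_LIMIT;
    if (depth > lim->max_depth) return SKETCH_DEPTH_LIMIT;
    if (levels == 0) return SKETCH_OK;

    rc = explore_sketch(lim, depth + 1, levels - 1);    /* first alternative */
    if (rc != SKETCH_OK) return rc;
    return explore_sketch(lim, depth + 1, levels - 1);  /* second alternative */
}
```

With three levels the model makes 15 calls but never goes deeper than 3, which is why a depth limit can be set far below the call limit without affecting ordinary matches.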
|
||||
.
|
||||
.SH "USING EBCDIC CODE"
|
||||
.rs
|
||||
.sp
|
||||
PCRE assumes by default that it will run in an environment where the character
|
||||
code is ASCII (or Unicode, which is a superset of ASCII). PCRE can, however, be
|
||||
compiled to run in an EBCDIC environment by adding
|
||||
.sp
|
||||
--enable-ebcdic
|
||||
.sp
|
||||
to the \fBconfigure\fP command.
|
||||
.P
|
||||
.in 0
|
||||
Last updated: 06 June 2006
|
||||
.br
|
||||
Copyright (c) 1997-2006 University of Cambridge.
|
161
libs/pcre/doc/pcrecallout.3
Normal file
@ -0,0 +1,161 @@
|
||||
.TH PCRECALLOUT 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH "PCRE CALLOUTS"
|
||||
.rs
|
||||
.sp
|
||||
.B int (*pcre_callout)(pcre_callout_block *);
|
||||
.PP
|
||||
PCRE provides a feature called "callout", which is a means of temporarily
|
||||
passing control to the caller of PCRE in the middle of pattern matching. The
|
||||
caller of PCRE provides an external function by putting its entry point in the
|
||||
global variable \fIpcre_callout\fP. By default, this variable contains NULL,
|
||||
which disables all calling out.
|
||||
.P
|
||||
Within a regular expression, (?C) indicates the points at which the external
|
||||
function is to be called. Different callout points can be identified by putting
|
||||
a number less than 256 after the letter C. The default value is zero.
|
||||
For example, this pattern has two callout points:
|
||||
.sp
|
||||
(?C1)\edabc(?C2)def
|
||||
.sp
|
||||
If the PCRE_AUTO_CALLOUT option bit is set when \fBpcre_compile()\fP is called,
|
||||
PCRE automatically inserts callouts, all with number 255, before each item in
|
||||
the pattern. For example, if PCRE_AUTO_CALLOUT is used with the pattern
|
||||
.sp
|
||||
A(\ed{2}|--)
|
||||
.sp
|
||||
it is processed as if it were
|
||||
.sp
|
||||
(?C255)A(?C255)((?C255)\ed{2}(?C255)|(?C255)-(?C255)-(?C255))(?C255)
|
||||
.sp
|
||||
Notice that there is a callout before and after each parenthesis and
|
||||
alternation bar. Automatic callouts can be used for tracking the progress of
|
||||
pattern matching. The
|
||||
.\" HREF
|
||||
\fBpcretest\fP
|
||||
.\"
|
||||
command has an option that sets automatic callouts; when it is used, the output
|
||||
indicates how the pattern is matched. This is useful information when you are
|
||||
trying to optimize the performance of a particular pattern.
|
||||
.
|
||||
.
|
||||
.SH "MISSING CALLOUTS"
|
||||
.rs
|
||||
.sp
|
||||
You should be aware that, because of optimizations in the way PCRE matches
|
||||
patterns, callouts sometimes do not happen. For example, if the pattern is
|
||||
.sp
|
||||
ab(?C4)cd
|
||||
.sp
|
||||
PCRE knows that any matching string must contain the letter "d". If the subject
|
||||
string is "abyz", the lack of "d" means that matching doesn't ever start, and
|
||||
the callout is never reached. However, with "abyd", though the result is still
|
||||
no match, the callout is obeyed.
|
||||
.
|
||||
.
|
||||
.SH "THE CALLOUT INTERFACE"
|
||||
.rs
|
||||
.sp
|
||||
During matching, when PCRE reaches a callout point, the external function
|
||||
defined by \fIpcre_callout\fP is called (if it is set). This applies to both
|
||||
the \fBpcre_exec()\fP and the \fBpcre_dfa_exec()\fP matching functions. The
|
||||
only argument to the callout function is a pointer to a \fBpcre_callout\fP
|
||||
block. This structure contains the following fields:
|
||||
.sp
|
||||
int \fIversion\fP;
|
||||
int \fIcallout_number\fP;
|
||||
int *\fIoffset_vector\fP;
|
||||
const char *\fIsubject\fP;
|
||||
int \fIsubject_length\fP;
|
||||
int \fIstart_match\fP;
|
||||
int \fIcurrent_position\fP;
|
||||
int \fIcapture_top\fP;
|
||||
int \fIcapture_last\fP;
|
||||
void *\fIcallout_data\fP;
|
||||
int \fIpattern_position\fP;
|
||||
int \fInext_item_length\fP;
|
||||
.sp
|
||||
The \fIversion\fP field is an integer containing the version number of the
|
||||
block format. The initial version was 0; the current version is 1. The version
|
||||
number will change again in future if additional fields are added, but the
|
||||
intention is never to remove any of the existing fields.
|
||||
.P
|
||||
The \fIcallout_number\fP field contains the number of the callout, as compiled
|
||||
into the pattern (that is, the number after ?C for manual callouts, and 255 for
|
||||
automatically generated callouts).
|
||||
.P
|
||||
The \fIoffset_vector\fP field is a pointer to the vector of offsets that was
|
||||
passed by the caller to \fBpcre_exec()\fP or \fBpcre_dfa_exec()\fP. When
|
||||
\fBpcre_exec()\fP is used, the contents can be inspected in order to extract
|
||||
substrings that have been matched so far, in the same way as for extracting
|
||||
substrings after a match has completed. For \fBpcre_dfa_exec()\fP this field is
|
||||
not useful.
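As an illustration of inspecting the offset vector, the sketch below copies one captured substring into a caller-supplied buffer. It assumes the usual layout in which capture n occupies elements 2n and 2n+1 of the vector and an unset capture holds negative offsets. The function name is hypothetical, and the \fIcapture_top\fP argument is meant to be the value of the field of that name described further down.

```c
#include <assert.h>
#include <string.h>

/* Copies capture group n into buf as a NUL-terminated string.
   Returns the length copied, or -1 if the group is not available
   or buf is too small. */
int copy_capture_sketch(const char *subject, const int *ovector,
                        int capture_top, int n, char *buf, size_t bufsize)
{
    int start, end;
    size_t len;

    assert(subject != NULL && ovector != NULL && buf != NULL);
    if (n < 1 || n >= capture_top) return -1;   /* not captured so far */

    start = ovector[2 * n];
    end = ovector[2 * n + 1];
    if (start < 0 || end < start) return -1;    /* unset group */

    len = (size_t)(end - start);
    if (len + 1 > bufsize) return -1;
    memcpy(buf, subject + start, len);
    buf[len] = '\0';
    return (int)len;
}
```

A callout function could call this with the \fIsubject\fP, \fIoffset_vector\fP, and \fIcapture_top\fP fields of the block it receives.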
|
||||
.P
|
||||
The \fIsubject\fP and \fIsubject_length\fP fields contain copies of the values
|
||||
that were passed to \fBpcre_exec()\fP.
|
||||
.P
|
||||
The \fIstart_match\fP field contains the offset within the subject at which the
|
||||
current match attempt started. If the pattern is not anchored, the callout
|
||||
function may be called several times from the same point in the pattern for
|
||||
different starting points in the subject.
|
||||
.P
|
||||
The \fIcurrent_position\fP field contains the offset within the subject of the
|
||||
current match pointer.
|
||||
.P
|
||||
When the \fBpcre_exec()\fP function is used, the \fIcapture_top\fP field
|
||||
contains one more than the number of the highest numbered captured substring so
|
||||
far. If no substrings have been captured, the value of \fIcapture_top\fP is
|
||||
one. This is always the case when \fBpcre_dfa_exec()\fP is used, because it
|
||||
does not support captured substrings.
|
||||
.P
|
||||
The \fIcapture_last\fP field contains the number of the most recently captured
|
||||
substring. If no substrings have been captured, its value is -1. This is always
|
||||
the case when \fBpcre_dfa_exec()\fP is used.
|
||||
.P
|
||||
The \fIcallout_data\fP field contains a value that is passed to
|
||||
\fBpcre_exec()\fP or \fBpcre_dfa_exec()\fP specifically so that it can be
|
||||
passed back in callouts. It is passed in the \fIcallout_data\fP field of the
|
||||
\fBpcre_extra\fP data structure. If no such data was passed, the value of
|
||||
\fIcallout_data\fP in a \fBpcre_callout\fP block is NULL. There is a
|
||||
description of the \fBpcre_extra\fP structure in the
|
||||
.\" HREF
|
||||
\fBpcreapi\fP
|
||||
.\"
|
||||
documentation.
|
||||
.P
|
||||
The \fIpattern_position\fP field is present from version 1 of the
|
||||
\fIpcre_callout\fP structure. It contains the offset to the next item to be
|
||||
matched in the pattern string.
|
||||
.P
|
||||
The \fInext_item_length\fP field is present from version 1 of the
|
||||
\fIpcre_callout\fP structure. It contains the length of the next item to be
|
||||
matched in the pattern string. When the callout immediately precedes an
|
||||
alternation bar, a closing parenthesis, or the end of the pattern, the length
|
||||
is zero. When the callout precedes an opening parenthesis, the length is that
|
||||
of the entire subpattern.
|
||||
.P
|
||||
The \fIpattern_position\fP and \fInext_item_length\fP fields are intended to
|
||||
help in distinguishing between different automatic callouts, which all have the
|
||||
same callout number. However, they are set for all callouts.
|
||||
.
|
||||
.
|
||||
.SH "RETURN VALUES"
|
||||
.rs
|
||||
.sp
|
||||
The external callout function returns an integer to PCRE. If the value is zero,
|
||||
matching proceeds as normal. If the value is greater than zero, matching fails
|
||||
at the current point, but the testing of other matching possibilities goes
|
||||
ahead, just as if a lookahead assertion had failed. If the value is less than
|
||||
zero, the match is abandoned, and \fBpcre_exec()\fP (or \fBpcre_dfa_exec()\fP)
|
||||
returns the negative value.
|
||||
.P
|
||||
Negative values should normally be chosen from the set of PCRE_ERROR_xxx
|
||||
values. In particular, PCRE_ERROR_NOMATCH forces a standard "no match" failure.
|
||||
The error number PCRE_ERROR_CALLOUT is reserved for use by callout functions;
|
||||
it will never be used by PCRE itself.
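The three kinds of return value can be sketched in a small callout function. To keep the example free of any dependency, it declares a stand-in structure with the fields listed earlier; real code would include \fIpcre.h\fP and use \fBpcre_callout_block\fP instead. The policy (which callout numbers do what, and the position threshold of 100) and the name \fBSKETCH_ERROR_CALLOUT\fP are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for pcre_callout_block, with the fields listed above. */
struct callout_block_sketch {
    int version;
    int callout_number;
    int *offset_vector;
    const char *subject;
    int subject_length;
    int start_match;
    int current_position;
    int capture_top;
    int capture_last;
    void *callout_data;
    int pattern_position;
    int next_item_length;
};

/* Placeholder negative value; real code should use PCRE_ERROR_CALLOUT
   or another PCRE_ERROR_xxx value from <pcre.h>. */
#define SKETCH_ERROR_CALLOUT (-9)

int callout_sketch(struct callout_block_sketch *cb)
{
    if (cb == NULL) return SKETCH_ERROR_CALLOUT;

    /* Automatic callouts: let matching proceed as normal. */
    if (cb->callout_number == 255) return 0;

    /* Callout 1: reject this path once the match pointer is past
       offset 100, but let other possibilities be tried. */
    if (cb->callout_number == 1 && cb->current_position > 100) return 1;

    /* Callout 2: abandon the whole match if no user data was supplied. */
    if (cb->callout_number == 2 && cb->callout_data == NULL)
        return SKETCH_ERROR_CALLOUT;

    return 0;
}
```

A function of this kind, written against the real structure, is what a program would store in the \fIpcre_callout\fP variable.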
|
||||
.P
|
||||
.in 0
|
||||
Last updated: 28 February 2005
|
||||
.br
|
||||
Copyright (c) 1997-2005 University of Cambridge.
|
126
libs/pcre/doc/pcrecompat.3
Normal file
@ -0,0 +1,126 @@
|
||||
.TH PCRECOMPAT 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH "DIFFERENCES BETWEEN PCRE AND PERL"
|
||||
.rs
|
||||
.sp
|
||||
This document describes the differences in the ways that PCRE and Perl handle
|
||||
regular expressions. The differences described here are with respect to Perl
|
||||
5.8.
|
||||
.P
|
||||
1. PCRE has only a subset of Perl's UTF-8 and Unicode support. Details of what
|
||||
it does have are given in the
|
||||
.\" HTML <a href="pcre.html#utf8support">
|
||||
.\" </a>
|
||||
section on UTF-8 support
|
||||
.\"
|
||||
in the main
|
||||
.\" HREF
|
||||
\fBpcre\fP
|
||||
.\"
|
||||
page.
|
||||
.P
|
||||
2. PCRE does not allow repeat quantifiers on lookahead assertions. Perl permits
|
||||
them, but they do not mean what you might think. For example, (?!a){3} does
|
||||
not assert that the next three characters are not "a". It just asserts that the
|
||||
next character is not "a" three times.
|
||||
.P
|
||||
3. Capturing subpatterns that occur inside negative lookahead assertions are
|
||||
counted, but their entries in the offsets vector are never set. Perl sets its
|
||||
numerical variables from any such patterns that are matched before the
|
||||
assertion fails to match something (thereby succeeding), but only if the
|
||||
negative lookahead assertion contains just one branch.
|
||||
.P
|
||||
4. Though binary zero characters are supported in the subject string, they are
|
||||
not allowed in a pattern string because it is passed as a normal C string,
|
||||
terminated by zero. The escape sequence \e0 can be used in the pattern to
|
||||
represent a binary zero.
|
||||
.P
|
||||
5. The following Perl escape sequences are not supported: \el, \eu, \eL,
|
||||
\eU, and \eN. In fact these are implemented by Perl's general string-handling
|
||||
and are not part of its pattern matching engine. If any of these are
|
||||
encountered by PCRE, an error is generated.
|
||||
.P
|
||||
6. The Perl escape sequences \ep, \eP, and \eX are supported only if PCRE is
|
||||
built with Unicode character property support. The properties that can be
|
||||
tested with \ep and \eP are limited to the general category properties such as
|
||||
Lu and Nd, script names such as Greek or Han, and the derived properties Any
|
||||
and L&.
|
||||
.P
|
||||
7. PCRE does support the \eQ...\eE escape for quoting substrings. Characters in
|
||||
between are treated as literals. This is slightly different from Perl in that $
|
||||
and @ are also handled as literals inside the quotes. In Perl, they cause
|
||||
variable interpolation (but of course PCRE does not have variables). Note the
|
||||
following examples:
|
||||
.sp
|
||||
Pattern PCRE matches Perl matches
|
||||
.sp
|
||||
.\" JOIN
|
||||
\eQabc$xyz\eE abc$xyz abc followed by the
|
||||
contents of $xyz
|
||||
\eQabc\e$xyz\eE abc\e$xyz abc\e$xyz
|
||||
\eQabc\eE\e$\eQxyz\eE abc$xyz abc$xyz
|
||||
.sp
|
||||
The \eQ...\eE sequence is recognized both inside and outside character classes.
|
||||
.P
|
||||
8. Fairly obviously, PCRE does not support the (?{code}) and (?p{code})
|
||||
constructions. However, there is support for recursive patterns using the
|
||||
non-Perl items (?R), (?number), and (?P>name). Also, the PCRE "callout" feature
|
||||
allows an external function to be called during pattern matching. See the
|
||||
.\" HREF
|
||||
\fBpcrecallout\fP
|
||||
.\"
|
||||
documentation for details.
|
||||
.P
|
||||
9. There are some differences that are concerned with the settings of captured
|
||||
strings when part of a pattern is repeated. For example, matching "aba" against
|
||||
the pattern /^(a(b)?)+$/ in Perl leaves $2 unset, but in PCRE it is set to "b".
|
||||
.P
|
||||
10. PCRE provides some extensions to the Perl regular expression facilities:
|
||||
.sp
|
||||
(a) Although lookbehind assertions must match fixed length strings, each
|
||||
alternative branch of a lookbehind assertion can match a different length of
|
||||
string. Perl requires them all to have the same length.
|
||||
.sp
|
||||
(b) If PCRE_DOLLAR_ENDONLY is set and PCRE_MULTILINE is not set, the $
|
||||
meta-character matches only at the very end of the string.
|
||||
.sp
|
||||
(c) If PCRE_EXTRA is set, a backslash followed by a letter with no special
|
||||
meaning is faulted. Otherwise, like Perl, the backslash is ignored. (Perl can
|
||||
be made to issue a warning.)
|
||||
.sp
|
||||
(d) If PCRE_UNGREEDY is set, the greediness of the repetition quantifiers is
|
||||
inverted, that is, by default they are not greedy, but if followed by a
|
||||
question mark they are.
|
||||
.sp
|
||||
(e) PCRE_ANCHORED can be used at matching time to force a pattern to be tried
|
||||
only at the first matching position in the subject string.
|
||||
.sp
|
||||
(f) The PCRE_NOTBOL, PCRE_NOTEOL, PCRE_NOTEMPTY, and PCRE_NO_AUTO_CAPTURE
|
||||
options for \fBpcre_exec()\fP have no Perl equivalents.
|
||||
.sp
|
||||
(g) The (?R), (?number), and (?P>name) constructs allow for recursive pattern
|
||||
matching (Perl can do this using the (?p{code}) construct, which PCRE cannot
|
||||
support.)
|
||||
.sp
|
||||
(h) PCRE supports named capturing substrings, using the Python syntax.
|
||||
.sp
|
||||
(i) PCRE supports the possessive quantifier "++" syntax, taken from Sun's Java
|
||||
package.
|
||||
.sp
|
||||
(j) The (R) condition, for testing recursion, is a PCRE extension.
|
||||
.sp
|
||||
(k) The callout facility is PCRE-specific.
|
||||
.sp
|
||||
(l) The partial matching facility is PCRE-specific.
|
||||
.sp
|
||||
(m) Patterns compiled by PCRE can be saved and re-used at a later time, even on
|
||||
different hosts that have the other endianness.
|
||||
.sp
|
||||
(n) The alternative matching function (\fBpcre_dfa_exec()\fP) matches in a
|
||||
different way and is not Perl-compatible.
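Item 7 above notes that PCRE treats every character between \eQ and \eE as a literal, including $ and @. One practical use is building a pattern that matches arbitrary text literally. The sketch below wraps a string in \eQ...\eE. The function name is hypothetical, and the handling of an embedded \eE (closing the quote, adding an escaped backslash and an E, and reopening the quote) is an assumption about how to work around the terminator rather than something this document specifies.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns a malloc'd pattern that matches text literally, or NULL if
   allocation fails. The caller frees the result. */
char *quote_literal_sketch(const char *text)
{
    /* \E\\E\Q : end the quote, match a literal \E, restart the quote. */
    static const char splice[] = "\\E\\\\E\\Q";
    size_t count = 0, len;
    const char *p, *hit;
    char *out, *w;

    assert(text != NULL);
    len = strlen(text);
    for (p = text; (hit = strstr(p, "\\E")) != NULL; p = hit + 2) count++;

    /* 2 for \Q, 2 for \E, 1 for NUL, and 5 extra per embedded \E. */
    out = malloc(len + 5 * count + 5);
    if (out == NULL) return NULL;

    w = out;
    memcpy(w, "\\Q", 2);
    w += 2;
    for (p = text; (hit = strstr(p, "\\E")) != NULL; p = hit + 2) {
        memcpy(w, p, (size_t)(hit - p));
        w += hit - p;
        memcpy(w, splice, sizeof splice - 1);
        w += sizeof splice - 1;
    }
    strcpy(w, p);
    w += strlen(p);
    memcpy(w, "\\E", 3);   /* copies the terminating NUL as well */
    return out;
}
```

For the text abc$xyz this produces the first pattern in the table under item 7, which PCRE matches against the literal string abc$xyz.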
|
||||
.P
|
||||
.in 0
|
||||
Last updated: 06 June 2006
|
||||
.br
|
||||
Copyright (c) 1997-2006 University of Cambridge.
|
312
libs/pcre/doc/pcrecpp.3
Normal file
@ -0,0 +1,312 @@
|
||||
.TH PCRECPP 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH "SYNOPSIS OF C++ WRAPPER"
|
||||
.rs
|
||||
.sp
|
||||
.B #include <pcrecpp.h>
|
||||
.PP
|
||||
.SM
|
||||
.br
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
.sp
|
||||
The C++ wrapper for PCRE was provided by Google Inc. Some additional
|
||||
functionality was added by Giuseppe Maxia. This brief man page was constructed
|
||||
from the notes in the \fIpcrecpp.h\fP file, which should be consulted for
|
||||
further details.
|
||||
.
|
||||
.
|
||||
.SH "MATCHING INTERFACE"
|
||||
.rs
|
||||
.sp
|
||||
The "FullMatch" operation checks that supplied text matches a supplied pattern
|
||||
exactly. If pointer arguments are supplied, it copies matched sub-strings that
|
||||
match sub-patterns into them.
|
||||
.sp
|
||||
Example: successful match
|
||||
pcrecpp::RE re("h.*o");
|
||||
re.FullMatch("hello");
|
||||
.sp
|
||||
Example: unsuccessful match (requires full match):
|
||||
pcrecpp::RE re("e");
|
||||
!re.FullMatch("hello");
|
||||
.sp
|
||||
Example: creating a temporary RE object:
|
||||
pcrecpp::RE("h.*o").FullMatch("hello");
|
||||
.sp
|
||||
You can pass in a "const char*" or a "string" for "text". The examples below
|
||||
tend to use a const char*. You can, as in the different examples above, store
|
||||
the RE object explicitly in a variable or use a temporary RE object. The
|
||||
examples below use one mode or the other arbitrarily. Either could correctly be
|
||||
used for any of these examples.
|
||||
.P
|
||||
You must supply extra pointer arguments to extract matched subpieces.
|
||||
.sp
|
||||
Example: extracts "ruby" into "s" and 1234 into "i"
|
||||
int i;
|
||||
string s;
|
||||
pcrecpp::RE re("(\e\ew+):(\e\ed+)");
|
||||
re.FullMatch("ruby:1234", &s, &i);
|
||||
.sp
|
||||
Example: does not try to extract any extra sub-patterns
|
||||
re.FullMatch("ruby:1234", &s);
|
||||
.sp
|
||||
Example: does not try to extract into NULL
|
||||
re.FullMatch("ruby:1234", NULL, &i);
|
||||
.sp
|
||||
Example: integer overflow causes failure
|
||||
!re.FullMatch("ruby:1234567891234", NULL, &i);
|
||||
.sp
|
||||
Example: fails because there aren't enough sub-patterns:
|
||||
!pcrecpp::RE("\e\ew+:\e\ed+").FullMatch("ruby:1234", &s);
|
||||
.sp
|
||||
Example: fails because string cannot be stored in integer
|
||||
!pcrecpp::RE("(.*)").FullMatch("ruby", &i);
|
||||
.sp
|
||||
The provided pointer arguments can be pointers to any scalar numeric
|
||||
type, or one of:
|
||||
.sp
|
||||
string (matched piece is copied to string)
|
||||
StringPiece (StringPiece is mutated to point to matched piece)
|
||||
T (where "bool T::ParseFrom(const char*, int)" exists)
|
||||
NULL (the corresponding matched sub-pattern is not copied)
|
||||
.sp
|
||||
The function returns true iff all of the following conditions are satisfied:
|
||||
.sp
|
||||
a. "text" matches "pattern" exactly;
|
||||
.sp
|
||||
b. The number of matched sub-patterns is >= number of supplied
|
||||
pointers;
|
||||
.sp
|
||||
c. The "i"th argument has a suitable type for holding the
|
||||
string captured as the "i"th sub-pattern. If you pass in
|
||||
NULL for the "i"th argument, or pass fewer arguments than
|
||||
number of sub-patterns, "i"th captured sub-pattern is
|
||||
ignored.
|
||||
.sp
|
||||
The matching interface supports at most 16 arguments per call.
|
||||
If you need more, consider using the more general interface
|
||||
\fBpcrecpp::RE::DoMatch\fP. See \fBpcrecpp.h\fP for the signature for
|
||||
\fBDoMatch\fP.
|
||||
.
|
||||
.SH "PARTIAL MATCHES"
|
||||
.rs
|
||||
.sp
|
||||
You can use the "PartialMatch" operation when you want the pattern
|
||||
to match any substring of the text.
|
||||
.sp
|
||||
Example: simple search for a string:
|
||||
pcrecpp::RE("ell").PartialMatch("hello");
|
||||
.sp
|
||||
Example: find first number in a string:
|
||||
int number;
|
||||
pcrecpp::RE re("(\e\ed+)");
|
||||
re.PartialMatch("x*100 + 20", &number);
|
||||
assert(number == 100);
|
||||
.
|
||||
.
|
||||
.SH "UTF-8 AND THE MATCHING INTERFACE"
|
||||
.rs
|
||||
.sp
|
||||
By default, pattern and text are plain text, one byte per character. The UTF8
|
||||
flag, passed to the constructor, causes both pattern and string to be treated
|
||||
as UTF-8 text, still a byte stream but potentially multiple bytes per
|
||||
character. In practice, the text is likelier to be UTF-8 than the pattern, but
|
||||
the match returned may depend on the UTF8 flag, so always use it when matching
|
||||
UTF8 text. For example, "." will match one byte normally but with UTF8 set may
|
||||
match up to three bytes of a multi-byte character.
|
||||
.sp
|
||||
Example:
|
||||
pcrecpp::RE_Options options;
|
||||
options.set_utf8();
|
||||
pcrecpp::RE re(utf8_pattern, options);
|
||||
re.FullMatch(utf8_string);
|
||||
.sp
|
||||
Example: using the convenience function UTF8():
|
||||
pcrecpp::RE re(utf8_pattern, pcrecpp::UTF8());
|
||||
re.FullMatch(utf8_string);
|
||||
.sp
|
||||
NOTE: The UTF8 flag is ignored if pcre was not configured with the
|
||||
--enable-utf8 flag.
|
||||
.
|
||||
.
|
||||
.SH "PASSING MODIFIERS TO THE REGULAR EXPRESSION ENGINE"
|
||||
.rs
|
||||
.sp
|
||||
PCRE defines some modifiers to change the behavior of the regular expression
|
||||
engine. The C++ wrapper defines an auxiliary class, RE_Options, as a vehicle to
|
||||
pass such modifiers to a RE class. Currently, the following modifiers are
|
||||
supported:
|
||||
.sp
|
||||
modifier description Perl corresponding
|
||||
.sp
|
||||
PCRE_CASELESS case insensitive match /i
|
||||
PCRE_MULTILINE multiple lines match /m
|
||||
PCRE_DOTALL dot matches newlines /s
|
||||
PCRE_DOLLAR_ENDONLY $ matches only at end N/A
|
||||
PCRE_EXTRA strict escape parsing N/A
|
||||
PCRE_EXTENDED ignore whitespace /x
|
||||
PCRE_UTF8 handles UTF8 chars built-in
|
||||
PCRE_UNGREEDY reverses * and *? N/A
|
||||
PCRE_NO_AUTO_CAPTURE disables capturing parens N/A (*)
|
||||
.sp
|
||||
(*) Both Perl and PCRE allow non capturing parentheses by means of the
|
||||
"?:" modifier within the pattern itself. e.g. (?:ab|cd) does not
|
||||
capture, while (ab|cd) does.
|
||||
.P
|
||||
For a full account of how each modifier works, please check the
|
||||
PCRE API reference page.
|
||||
.P
|
||||
For each modifier, there are two member functions whose name is made
|
||||
out of the modifier in lowercase, without the "PCRE_" prefix. For
|
||||
instance, PCRE_CASELESS is handled by
|
||||
.sp
|
||||
bool caseless()
|
||||
.sp
|
||||
which returns true if the modifier is set, and
|
||||
.sp
|
||||
RE_Options & set_caseless(bool)
|
||||
.sp
|
||||
which sets or unsets the modifier. Moreover, PCRE_EXTRA_MATCH_LIMIT can be
|
||||
accessed through the \fBset_match_limit()\fR and \fBmatch_limit()\fR member
|
||||
functions. Setting \fImatch_limit\fR to a non-zero value will limit the
|
||||
execution of pcre to keep it from doing bad things like blowing the stack or
|
||||
taking an eternity to return a result. A value of 5000 is good enough to stop
|
||||
stack blowup in a 2MB thread stack. Setting \fImatch_limit\fR to zero disables
|
||||
match limiting. Alternatively, you can call \fBmatch_limit_recursion()\fP
|
||||
which uses PCRE_EXTRA_MATCH_LIMIT_RECURSION to limit how much PCRE
|
||||
recurses. \fBmatch_limit()\fP limits the number of matches PCRE does;
|
||||
\fBmatch_limit_recursion()\fP limits the depth of internal recursion, and
|
||||
therefore the amount of stack that is used.
|
||||
.P
|
||||
Normally, to pass one or more modifiers to a RE class, you declare
|
||||
a \fIRE_Options\fR object, set the appropriate options, and pass this
|
||||
object to a RE constructor. Example:
|
||||
.sp
|
||||
RE_Options opt;
|
||||
opt.set_caseless(true);
|
||||
if (RE("HELLO", opt).PartialMatch("hello world")) ...
|
||||
.sp
|
||||
RE_Options has two constructors. The default constructor takes no arguments and
|
||||
creates a set of flags that are off by default. The optional parameter
|
||||
\fIoption_flags\fR is to facilitate transfer of legacy code from C programs.
|
||||
This lets you do
|
||||
.sp
|
||||
RE(pattern,
|
||||
RE_Options(PCRE_CASELESS|PCRE_MULTILINE)).PartialMatch(str);
|
||||
.sp
|
||||
However, new code is better off doing
|
||||
.sp
|
||||
RE(pattern,
|
||||
RE_Options().set_caseless(true).set_multiline(true))
|
||||
.PartialMatch(str);
|
||||
.sp
|
||||
If you are going to pass one of the most used modifiers, there are some
|
||||
convenience functions that return a RE_Options class with the
|
||||
appropriate modifier already set: \fBCASELESS()\fR, \fBUTF8()\fR,
|
||||
\fBMULTILINE()\fR, \fBDOTALL()\fR, and \fBEXTENDED()\fR.
|
||||
.P
|
||||
If you need to set several options at once, and you don't want to go through
|
||||
the pains of declaring a RE_Options object and setting several options, there
|
||||
is a parallel method that gives you such ability on the fly. You can concatenate
|
||||
several \fBset_xxxxx()\fR member functions, since each of them returns a
|
||||
reference to its class object. For example, to pass PCRE_CASELESS,
|
||||
PCRE_EXTENDED, and PCRE_MULTILINE to a RE with one statement, you may write:
|
||||
.sp
|
||||
RE(" ^ xyz \e\es+ .* blah$",
|
||||
RE_Options()
|
||||
.set_caseless(true)
|
||||
.set_extended(true)
|
||||
.set_multiline(true)).PartialMatch(sometext);
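Why the setters can be chained is easiest to see in a stripped-down model. The class below is a hypothetical stand-in for RE_Options, written only for illustration: it models three flags and nothing else, and its name and layout are assumptions rather than the real pcrecpp implementation. Each setter returns a reference to the object it was called on, so the next call in the chain operates on the same object.

```cpp
// Hypothetical model of an options class with chainable setters.
class Options {
 public:
  Options() : caseless_(false), extended_(false), multiline_(false) {}

  // Getters: report whether a modifier is currently set.
  bool caseless() const { return caseless_; }
  bool extended() const { return extended_; }
  bool multiline() const { return multiline_; }

  // Setters: change one modifier and return *this so calls can be chained.
  Options& set_caseless(bool on) { caseless_ = on; return *this; }
  Options& set_extended(bool on) { extended_ = on; return *this; }
  Options& set_multiline(bool on) { multiline_ = on; return *this; }

 private:
  bool caseless_;
  bool extended_;
  bool multiline_;
};
```

With this model, Options().set_caseless(true).set_multiline(true) yields an object with two flags set, and the temporary lives until the end of the full expression, which is why it can be passed straight to a constructor.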
|
||||
.sp
|
||||
.
|
||||
.
|
||||
.SH "SCANNING TEXT INCREMENTALLY"
|
||||
.rs
|
||||
.sp
|
||||
The "Consume" operation may be useful if you want to repeatedly
|
||||
match regular expressions at the front of a string and skip over
|
||||
them as they match. This requires use of the "StringPiece" type,
|
||||
which represents a sub-range of a real string. Like RE, StringPiece
|
||||
is defined in the pcrecpp namespace.
|
||||
.sp
|
||||
Example: read lines of the form "var = value" from a string.
|
||||
string contents = ...; // Fill string somehow
|
||||
pcrecpp::StringPiece input(contents); // Wrap in a StringPiece
|
||||
|
||||
string var;
|
||||
int value;
|
||||
pcrecpp::RE re("(\e\ew+) = (\e\ed+)\en");
|
||||
while (re.Consume(&input, &var, &value)) {
|
||||
...;
|
||||
}
|
||||
.sp
|
||||
Each successful call to "Consume" will set "var/value", and also
|
||||
advance "input" so it points past the matched text.
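The anchored match-and-advance loop can be sketched with the standard library alone. This is not how pcrecpp implements Consume; it uses std::regex as a stand-in, with the match_continuous flag playing the role of the implicit anchor at the front of the input. The function name parse_assignments is hypothetical.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Read leading "var = value" lines from `contents`, stopping at the first
// position where the pattern no longer matches at the front of the
// remaining input. std::stoi throws if the digits do not fit in an int.
std::vector<std::pair<std::string, int> > parse_assignments(
    const std::string& contents) {
    static const std::regex line_re("(\\w+) = (\\d+)\\n");
    std::vector<std::pair<std::string, int> > result;
    std::string::const_iterator pos = contents.begin();
    std::smatch m;
    // match_continuous: the match must begin exactly at `pos`.
    while (std::regex_search(pos, contents.end(), m, line_re,
                             std::regex_constants::match_continuous)) {
        result.push_back(std::make_pair(m[1].str(), std::stoi(m[2].str())));
        pos = m[0].second;  // advance past the matched text
    }
    return result;
}
```

The iterator pos plays the part of the StringPiece: it always points at the unconsumed remainder of the string.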
|
||||
.P
|
||||
The "FindAndConsume" operation is similar to "Consume" but does not
|
||||
anchor your match at the beginning of the string. For example, you
|
||||
could extract all words from a string by repeatedly calling
|
||||
.sp
|
||||
pcrecpp::RE("(\e\ew+)").FindAndConsume(&input, &word)
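For comparison, the same unanchored "find the next match, then continue after it" loop looks like this with std::regex. It is a sketch of the idea only, using the standard library in place of pcrecpp, and extract_words is a hypothetical name.

```cpp
#include <regex>
#include <string>
#include <vector>

// Collect every run of word characters in `text`, scanning left to right
// and resuming after each match.
std::vector<std::string> extract_words(const std::string& text) {
    static const std::regex word_re("\\w+");
    std::vector<std::string> words;
    std::sregex_iterator it(text.begin(), text.end(), word_re);
    std::sregex_iterator end;
    for (; it != end; ++it) {
        words.push_back(it->str());
    }
    return words;
}
```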
|
||||
.
|
||||
.
|
||||
.SH "PARSING HEX/OCTAL/C-RADIX NUMBERS"
|
||||
.rs
|
||||
.sp
|
||||
By default, if you pass a pointer to a numeric value, the
|
||||
corresponding text is interpreted as a base-10 number. You can
|
||||
instead wrap the pointer with a call to one of the operators Hex(),
|
||||
Octal(), or CRadix() to interpret the text in another base. The
|
||||
CRadix operator interprets C-style "0" (base-8) and "0x" (base-16)
|
||||
prefixes, but defaults to base-10.
|
||||
.sp
|
||||
Example:
|
||||
int a, b, c, d;
|
||||
pcrecpp::RE re("(.*) (.*) (.*) (.*)");
|
||||
re.FullMatch("100 40 0100 0x40",
|
||||
pcrecpp::Octal(&a), pcrecpp::Hex(&b),
|
||||
pcrecpp::CRadix(&c), pcrecpp::CRadix(&d));
|
||||
.sp
|
||||
will leave 64 in a, b, c, and d.
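The arithmetic behind that result can be checked with the C library. The sketch below assumes the three operators behave like strtol called with base 8, base 16 and base 0 respectively; parse_long_radix is a hypothetical helper, not a pcrecpp function. Note that strtol also skips leading white space and accepts a sign, which may differ from what a capturing group hands over.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>

// Parse all of `text` as a long in the given base. Base 0 selects the
// C-style rule: "0x" prefix means hex, leading "0" means octal, otherwise
// decimal. Returns false on empty input, trailing characters, or overflow.
bool parse_long_radix(const char* text, int base, long* out) {
    assert(text != nullptr && out != nullptr);
    errno = 0;
    char* end = nullptr;
    long value = std::strtol(text, &end, base);
    if (end == text || *end != '\0' || errno == ERANGE) {
        return false;
    }
    *out = value;
    return true;
}
```

Under these assumptions "100" in base 8, "40" in base 16, and "0100" and "0x40" in base 0 all parse to 64.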
|
||||
.
|
||||
.
|
||||
.SH "REPLACING PARTS OF STRINGS"
|
||||
.rs
|
||||
.sp
|
||||
You can replace the first match of "pattern" in "str" with "rewrite".
|
||||
Within "rewrite", backslash-escaped digits (\e1 to \e9) can be
|
||||
used to insert text matching corresponding parenthesized group
|
||||
from the pattern. \e0 in "rewrite" refers to the entire matching
|
||||
text. For example:
|
||||
.sp
|
||||
string s = "yabba dabba doo";
|
||||
pcrecpp::RE("b+").Replace("d", &s);
|
||||
.sp
|
||||
will leave "s" containing "yada dabba doo". The result is true if the pattern
|
||||
matches and a replacement occurs, false otherwise.
|
||||
.P
|
||||
\fBGlobalReplace\fP is like \fBReplace\fP except that it replaces all
|
||||
occurrences of the pattern in the string with the rewrite. Replacements are
|
||||
not subject to re-matching. For example:
|
||||
.sp
|
||||
string s = "yabba dabba doo";
|
||||
pcrecpp::RE("b+").GlobalReplace("d", &s);
|
||||
.sp
|
||||
will leave "s" containing "yada dada doo". It returns the number of
|
||||
replacements made.
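The first-match versus all-matches distinction can be reproduced with std::regex_replace, used here purely as a stand-in for pcrecpp. The helper names are hypothetical. One difference to keep in mind: std::regex refers to groups in the replacement text as $1, $2 and so on, not with backslash-escaped digits.

```cpp
#include <regex>
#include <string>

// Replace only the first match of `pattern` in `s`.
std::string replace_first(const std::string& s, const std::string& pattern,
                          const std::string& rewrite) {
    return std::regex_replace(s, std::regex(pattern), rewrite,
                              std::regex_constants::format_first_only);
}

// Replace every non-overlapping match of `pattern` in `s`. Inserted text
// is not scanned again.
std::string replace_all(const std::string& s, const std::string& pattern,
                        const std::string& rewrite) {
    return std::regex_replace(s, std::regex(pattern), rewrite);
}
```

Applied to "yabba dabba doo" with the pattern "b+" and the rewrite "d", the first helper gives "yada dabba doo" and the second gives "yada dada doo".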
|
||||
.P
|
||||
\fBExtract\fP is like \fBReplace\fP, except that if the pattern matches,
|
||||
"rewrite" is copied into "out" (an additional argument) with substitutions.
|
||||
The non-matching portions of "text" are ignored. Returns true iff a match
|
||||
occurred and the extraction happened successfully; if no match occurs, the
|
||||
string is left unaffected.
|
||||
.
|
||||
.
|
||||
.SH AUTHOR
|
||||
.rs
|
||||
.sp
|
||||
The C++ wrapper was contributed by Google Inc.
|
||||
.br
|
||||
Copyright (c) 2005 Google Inc.
|
376
libs/pcre/doc/pcregrep.1
Normal file
@ -0,0 +1,376 @@
|
||||
.TH PCREGREP 1
|
||||
.SH NAME
|
||||
pcregrep - a grep with Perl-compatible regular expressions.
|
||||
.SH SYNOPSIS
|
||||
.B pcregrep [options] [long options] [pattern] [path1 path2 ...]
|
||||
.
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
.sp
|
||||
\fBpcregrep\fP searches files for character patterns, in the same way as other
|
||||
grep commands do, but it uses the PCRE regular expression library to support
|
||||
patterns that are compatible with the regular expressions of Perl 5. See
|
||||
.\" HREF
|
||||
\fBpcrepattern\fP
|
||||
.\"
|
||||
for a full description of syntax and semantics of the regular expressions that
|
||||
PCRE supports.
|
||||
.P
|
||||
Patterns, whether supplied on the command line or in a separate file, are given
|
||||
without delimiters. For example:
|
||||
.sp
|
||||
pcregrep Thursday /etc/motd
|
||||
.sp
|
||||
If you attempt to use delimiters (for example, by surrounding a pattern with
|
||||
slashes, as is common in Perl scripts), they are interpreted as part of the
|
||||
pattern. Quotes can of course be used on the command line because they are
|
||||
interpreted by the shell, and indeed they are required if a pattern contains
|
||||
white space or shell metacharacters.
|
||||
.P
|
||||
The first argument that follows any option settings is treated as the single
|
||||
pattern to be matched when neither \fB-e\fP nor \fB-f\fP is present.
|
||||
Conversely, when one or both of these options are used to specify patterns, all
|
||||
arguments are treated as path names. At least one of \fB-e\fP, \fB-f\fP, or an
|
||||
argument pattern must be provided.
|
||||
.P
|
||||
If no files are specified, \fBpcregrep\fP reads the standard input. The
|
||||
standard input can also be referenced by a name consisting of a single hyphen.
|
||||
For example:
|
||||
.sp
|
||||
pcregrep some-pattern /file1 - /file3
|
||||
.sp
|
||||
By default, each line that matches the pattern is copied to the standard
|
||||
output, and if there is more than one file, the file name is output at the
|
||||
start of each line. However, there are options that can change how
|
||||
\fBpcregrep\fP behaves. In particular, the \fB-M\fP option makes it possible to
|
||||
search for patterns that span line boundaries. What defines a line boundary is
|
||||
controlled by the \fB-N\fP (\fB--newline\fP) option.
|
||||
.P
|
||||
Patterns are limited to 8K or BUFSIZ characters, whichever is the greater.
|
||||
BUFSIZ is defined in \fB<stdio.h>\fP.
|
||||
.P
|
||||
If the \fBLC_ALL\fP or \fBLC_CTYPE\fP environment variable is set,
|
||||
\fBpcregrep\fP uses the value to set a locale when calling the PCRE library.
|
||||
The \fB--locale\fP option can be used to override this.
|
||||
.
|
||||
.SH OPTIONS
|
||||
.rs
|
||||
.TP 10
|
||||
\fB--\fP
|
||||
This terminates the list of options. It is useful if the next item on the
|
||||
command line starts with a hyphen but is not an option. This allows for the
|
||||
processing of patterns and filenames that start with hyphens.
|
||||
.TP
|
||||
\fB-A\fP \fInumber\fP, \fB--after-context=\fP\fInumber\fP
|
||||
Output \fInumber\fP lines of context after each matching line. If filenames
|
||||
and/or line numbers are being output, a hyphen separator is used instead of a
|
||||
colon for the context lines. A line containing "--" is output between each
|
||||
group of lines, unless they are in fact contiguous in the input file. The value
|
||||
of \fInumber\fP is expected to be relatively small. However, \fBpcregrep\fP
|
||||
guarantees to have up to 8K of following text available for context output.
|
||||
.TP
|
||||
\fB-B\fP \fInumber\fP, \fB--before-context=\fP\fInumber\fP
|
||||
Output \fInumber\fP lines of context before each matching line. If filenames
|
||||
and/or line numbers are being output, a hyphen separator is used instead of a
|
||||
colon for the context lines. A line containing "--" is output between each
|
||||
group of lines, unless they are in fact contiguous in the input file. The value
|
||||
of \fInumber\fP is expected to be relatively small. However, \fBpcregrep\fP
|
||||
guarantees to have up to 8K of preceding text available for context output.
|
||||
.TP
|
||||
\fB-C\fP \fInumber\fP, \fB--context=\fP\fInumber\fP
|
||||
Output \fInumber\fP lines of context both before and after each matching line.
|
||||
This is equivalent to setting both \fB-A\fP and \fB-B\fP to the same value.
|
||||
.TP
|
||||
\fB-c\fP, \fB--count\fP
|
||||
Do not output individual lines; instead just output a count of the number of
|
||||
lines that would otherwise have been output. If several files are given, a
|
||||
count is output for each of them. In this mode, the \fB-A\fP, \fB-B\fP, and
|
||||
\fB-C\fP options are ignored.
|
||||
.TP
|
||||
\fB--colour\fP, \fB--color\fP
|
||||
If this option is given without any data, it is equivalent to "--colour=auto".
|
||||
If data is required, it must be given in the same shell item, separated by an
|
||||
equals sign.
|
||||
.TP
|
||||
\fB--colour=\fP\fIvalue\fP, \fB--color=\fP\fIvalue\fP
|
||||
This option specifies under what circumstances the part of a line that matched
|
||||
a pattern should be coloured in the output. The value may be "never" (the
|
||||
default), "always", or "auto". In the last case, colouring happens only if
|
||||
the standard output is connected to a terminal. The colour can be specified by
|
||||
setting the environment variable PCREGREP_COLOUR or PCREGREP_COLOR. The value
|
||||
of this variable should be a string of two numbers, separated by a semicolon.
|
||||
They are copied directly into the control string for setting colour on a
|
||||
terminal, so it is your responsibility to ensure that they make sense. If
|
||||
neither of the environment variables is set, the default is "1;31", which gives
|
||||
red.
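How such a value ends up in the output can be sketched as follows. This is an assumption about the general mechanism, not a copy of the pcregrep source: it supposes the value is placed inside an ANSI SGR escape sequence (ESC, "[", the value, "m") and that the colour is switched off again with a reset sequence afterwards. The function name colourize is hypothetical.

```cpp
#include <string>

// Wrap `text` in an ANSI SGR sequence built from `sgr` (for example "1;31",
// bold red), followed by a reset. `sgr` is inserted as-is, so a value that
// is not a valid SGR parameter list produces a meaningless escape sequence.
std::string colourize(const std::string& text,
                      const std::string& sgr = "1;31") {
    return "\033[" + sgr + "m" + text + "\033[0m";
}
```

Because the value is copied through unchanged, a setting such as "4;32" would give underlined green on terminals that follow the usual ANSI conventions.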
|
||||
.TP
|
||||
\fB-D\fP \fIaction\fP, \fB--devices=\fP\fIaction\fP
|
||||
If an input path is not a regular file or a directory, "action" specifies how
|
||||
it is to be processed. Valid values are "read" (the default) or "skip"
|
||||
(silently skip the path).
|
||||
.TP
|
||||
\fB-d\fP \fIaction\fP, \fB--directories=\fP\fIaction\fP
|
||||
If an input path is a directory, "action" specifies how it is to be processed.
|
||||
Valid values are "read" (the default), "recurse" (equivalent to the \fB-r\fP
|
||||
option), or "skip" (silently skip the path). In the default case, directories
|
||||
are read as if they were ordinary files. In some operating systems the effect
|
||||
of reading a directory like this is an immediate end-of-file.
|
||||
.TP
|
||||
\fB-e\fP \fIpattern\fP, \fB--regex=\fP\fIpattern\fP,
|
||||
\fB--regexp=\fP\fIpattern\fP Specify a pattern to be matched. This option can
|
||||
be used multiple times in order to specify several patterns. It can also be
|
||||
used as a way of specifying a single pattern that starts with a hyphen. When
|
||||
\fB-e\fP is used, no argument pattern is taken from the command line; all
|
||||
arguments are treated as file names. There is an overall maximum of 100
|
||||
patterns. They are applied to each line in the order in which they are defined
|
||||
until one matches (or fails to match if \fB-v\fP is used). If \fB-f\fP is used
|
||||
with \fB-e\fP, the command line patterns are matched first, followed by the
|
||||
patterns from the file, independent of the order in which these options are
|
||||
specified. Note that multiple use of \fB-e\fP is not the same as a single
|
||||
pattern with alternatives. For example, X|Y finds the first character in a line
|
||||
that is X or Y, whereas if the two patterns are given separately,
|
||||
\fBpcregrep\fP finds X if it is present, even if it follows Y in the line. It
|
||||
finds Y only if there is no X in the line. This really matters only if you are
|
||||
using \fB-o\fP to show the portion of the line that matched.
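The difference between one pattern with alternatives and several separate patterns can be demonstrated with a small model. The sketch uses std::regex rather than PCRE, and first_matching_pattern is a hypothetical helper that tries each pattern in order and reports the text matched by the first pattern that succeeds.

```cpp
#include <regex>
#include <string>
#include <vector>

// Try the patterns in list order against `line`; return the text matched
// by the first pattern that matches anywhere in the line, or an empty
// string if none of them match.
std::string first_matching_pattern(const std::string& line,
                                   const std::vector<std::string>& patterns) {
    for (const std::string& p : patterns) {
        std::smatch m;
        if (std::regex_search(line, m, std::regex(p))) {
            return m.str();
        }
    }
    return std::string();
}
```

For the line "aYbX", the single pattern X|Y reports "Y" because Y occurs first in the line, whereas the two patterns X and Y tried in that order report "X".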
|
||||
.TP
|
||||
\fB--exclude\fP=\fIpattern\fP
|
||||
When \fBpcregrep\fP is searching the files in a directory as a consequence of
|
||||
the \fB-r\fP (recursive search) option, any files whose names match the pattern
|
||||
are excluded. The pattern is a PCRE regular expression. If a file name matches
|
||||
both \fB--include\fP and \fB--exclude\fP, it is excluded. There is no short
|
||||
form for this option.
|
||||
.TP
|
||||
\fB-F\fP, \fB--fixed-strings\fP
|
||||
Interpret each pattern as a list of fixed strings, separated by newlines,
|
||||
instead of as a regular expression. The \fB-w\fP (match as a word) and \fB-x\fP
|
||||
(match whole line) options can be used with \fB-F\fP. They apply to each of the
|
||||
fixed strings. A line is selected if any of the fixed strings are found in it
|
||||
(subject to \fB-w\fP or \fB-x\fP, if present).
|
||||
.TP
|
||||
\fB-f\fP \fIfilename\fP, \fB--file=\fP\fIfilename\fP
|
||||
Read a number of patterns from the file, one per line, and match them against
|
||||
each line of input. A data line is output if any of the patterns match it. The
|
||||
filename can be given as "-" to refer to the standard input. When \fB-f\fP is
|
||||
used, patterns specified on the command line using \fB-e\fP may also be
|
||||
present; they are tested before the file's patterns. However, no other pattern
|
||||
is taken from the command line; all arguments are treated as file names. There
|
||||
is an overall maximum of 100 patterns. Trailing white space is removed from
|
||||
each line, and blank lines are ignored. An empty file contains no patterns and
|
||||
therefore matches nothing.
|
||||
.TP
|
||||
\fB-H\fP, \fB--with-filename\fP
|
||||
Force the inclusion of the filename at the start of output lines when searching
|
||||
a single file. By default, the filename is not shown in this case. For matching
|
||||
lines, the filename is followed by a colon and a space; for context lines, a
|
||||
hyphen separator is used. If a line number is also being output, it follows the
|
||||
file name without a space.
|
||||
.TP
|
||||
\fB-h\fP, \fB--no-filename\fP
|
||||
Suppress the output of filenames when searching multiple files. By default,
|
||||
filenames are shown when multiple files are searched. For matching lines, the
|
||||
filename is followed by a colon and a space; for context lines, a hyphen
|
||||
separator is used. If a line number is also being output, it follows the file
|
||||
name without a space.
|
||||
.TP
|
||||
\fB--help\fP
|
||||
Output a brief help message and exit.
|
||||
.TP
|
||||
\fB-i\fP, \fB--ignore-case\fP
|
||||
Ignore upper/lower case distinctions during comparisons.
|
||||
.TP
|
||||
\fB--include\fP=\fIpattern\fP
|
||||
When \fBpcregrep\fP is searching the files in a directory as a consequence of
|
||||
the \fB-r\fP (recursive search) option, only those files whose names match the
|
||||
pattern are included. The pattern is a PCRE regular expression. If a file name
|
||||
matches both \fB--include\fP and \fB--exclude\fP, it is excluded. There is no
|
||||
short form for this option.
|
||||
.TP
|
||||
\fB-L\fP, \fB--files-without-match\fP
|
||||
Instead of outputting lines from the files, just output the names of the files
|
||||
that do not contain any lines that would have been output. Each file name is
|
||||
output once, on a separate line.
|
||||
.TP
|
||||
\fB-l\fP, \fB--files-with-matches\fP
|
||||
Instead of outputting lines from the files, just output the names of the files
|
||||
containing lines that would have been output. Each file name is output
|
||||
once, on a separate line. Searching stops as soon as a matching line is found
|
||||
in a file.
|
||||
.TP
|
||||
\fB--label\fP=\fIname\fP
|
||||
This option supplies a name to be used for the standard input when file names
|
||||
are being output. If not supplied, "(standard input)" is used. There is no
|
||||
short form for this option.
|
||||
.TP
|
||||
\fB--locale\fP=\fIlocale-name\fP
|
||||
This option specifies a locale to be used for pattern matching. It overrides
|
||||
the value in the \fBLC_ALL\fP or \fBLC_CTYPE\fP environment variables. If no
|
||||
locale is specified, the PCRE library's default (usually the "C" locale) is
|
||||
used. There is no short form for this option.
|
||||
.TP
|
||||
\fB-M\fP, \fB--multiline\fP
|
||||
Allow patterns to match more than one line. When this option is given, patterns
|
||||
may usefully contain literal newline characters and internal occurrences of ^
|
||||
and $ characters. The output for any one match may consist of more than one
|
||||
line. When this option is set, the PCRE library is called in "multiline" mode.
|
||||
There is a limit to the number of lines that can be matched, imposed by the way
|
||||
that \fBpcregrep\fP buffers the input file as it scans it. However,
|
||||
\fBpcregrep\fP ensures that at least 8K characters or the rest of the document
|
||||
(whichever is the shorter) are available for forward matching, and similarly
|
||||
the previous 8K characters (or all the previous characters, if fewer than 8K)
|
||||
are guaranteed to be available for lookbehind assertions.
|
||||
.TP
|
||||
\fB-N\fP \fInewline-type\fP, \fB--newline=\fP\fInewline-type\fP
|
||||
The PCRE library supports three different character sequences for indicating
|
||||
the ends of lines. They are the single-character sequences CR (carriage return)
|
||||
and LF (linefeed), and the two-character sequence CR, LF. When the library is
|
||||
built, a default line-ending sequence is specified. This is normally the
|
||||
standard sequence for the operating system. Unless otherwise specified by this
|
||||
option, \fBpcregrep\fP uses the default. The possible values for this option
|
||||
are CR, LF, or CRLF. This makes it possible to use \fBpcregrep\fP on files that
|
||||
have come from other environments without having to modify their line endings.
|
||||
If the data that is being scanned does not agree with the convention set by
|
||||
this option, \fBpcregrep\fP may behave in strange ways.
|
||||
.TP
|
||||
\fB-n\fP, \fB--line-number\fP
|
||||
Precede each output line by its line number in the file, followed by a colon
|
||||
and a space for matching lines or a hyphen and a space for context lines. If
|
||||
the filename is also being output, it precedes the line number.
|
||||
.TP
|
||||
\fB-o\fP, \fB--only-matching\fP
|
||||
Show only the part of the line that matched a pattern. In this mode, no
|
||||
context is shown. That is, the \fB-A\fP, \fB-B\fP, and \fB-C\fP options are
|
||||
ignored.
|
||||
.TP
|
||||
\fB-q\fP, \fB--quiet\fP
|
||||
Work quietly, that is, display nothing except error messages. The exit
|
||||
status indicates whether or not any matches were found.
|
||||
.TP
|
||||
\fB-r\fP, \fB--recursive\fP
|
||||
If any given path is a directory, recursively scan the files it contains,
|
||||
taking note of any \fB--include\fP and \fB--exclude\fP settings. By default, a
|
||||
directory is read as a normal file; in some operating systems this gives an
|
||||
immediate end-of-file. This option is a shorthand for setting the \fB-d\fP
|
||||
option to "recurse".
|
||||
.TP
|
||||
\fB-s\fP, \fB--no-messages\fP
|
||||
Suppress error messages about non-existent or unreadable files. Such files are
|
||||
quietly skipped. However, the return code is still 2, even if matches were
|
||||
found in other files.
|
||||
.TP
|
||||
\fB-u\fP, \fB--utf-8\fP
|
||||
Operate in UTF-8 mode. This option is available only if PCRE has been compiled
|
||||
with UTF-8 support. Both patterns and subject lines must be valid strings of
|
||||
UTF-8 characters.
|
||||
.TP
|
||||
\fB-V\fP, \fB--version\fP
|
||||
Write the version numbers of \fBpcregrep\fP and the PCRE library that is being
|
||||
used to the standard error stream.
|
||||
.TP
|
||||
\fB-v\fP, \fB--invert-match\fP
|
||||
Invert the sense of the match, so that lines which do \fInot\fP match any of
|
||||
the patterns are the ones that are found.
|
||||
.TP
|
||||
\fB-w\fP, \fB--word-regex\fP, \fB--word-regexp\fP
|
||||
Force the patterns to match only whole words. This is equivalent to having \eb
|
||||
at the start and end of the pattern.
|
||||
.TP
|
||||
\fB-x\fP, \fB--line-regex\fP, \fB--line-regexp\fP
|
||||
Force the patterns to be anchored (each must start matching at the beginning of
|
||||
a line) and in addition, require them to match entire lines. This is
|
||||
equivalent to having ^ and $ characters at the start and end of each
|
||||
alternative branch in every pattern.
|
||||
.
|
||||
.
|
||||
.SH "ENVIRONMENT VARIABLES"
|
||||
.rs
|
||||
.sp
|
||||
The environment variables \fBLC_ALL\fP and \fBLC_CTYPE\fP are examined, in that
|
||||
order, for a locale. The first one that is set is used. This can be overridden
|
||||
by the \fB--locale\fP option. If no locale is set, the PCRE library's default
|
||||
(usually the "C" locale) is used.
|
||||
.
|
||||
.
|
||||
.SH "NEWLINES"
|
||||
.rs
|
||||
.sp
|
||||
The \fB-N\fP (\fB--newline\fP) option allows \fBpcregrep\fP to scan files with
|
||||
different newline conventions from the default. However, the setting of this
|
||||
option does not affect the way in which \fBpcregrep\fP writes information to
|
||||
the standard error and output streams. It uses the string "\en" in C
|
||||
\fBprintf()\fP calls to indicate newlines, relying on the C I/O library to
|
||||
convert this to an appropriate sequence if the output is sent to a file.
|
||||
.
|
||||
.
|
||||
.SH "OPTIONS COMPATIBILITY"
|
||||
.rs
|
||||
.sp
|
||||
The majority of short and long forms of \fBpcregrep\fP's options are the same
|
||||
as in the GNU \fBgrep\fP program. Any long option of the form
|
||||
\fB--xxx-regexp\fP (GNU terminology) is also available as \fB--xxx-regex\fP
|
||||
(PCRE terminology). However, the \fB--locale\fP, \fB-M\fP, \fB--multiline\fP,
|
||||
\fB-u\fP, and \fB--utf-8\fP options are specific to \fBpcregrep\fP.
|
||||
.
|
||||
.
|
||||
.SH "OPTIONS WITH DATA"
|
||||
.rs
|
||||
.sp
|
||||
There are four different ways in which an option with data can be specified.
|
||||
If a short form option is used, the data may follow immediately, or in the next
|
||||
command line item. For example:
|
||||
.sp
|
||||
-f/some/file
|
||||
-f /some/file
|
||||
.sp
|
||||
If a long form option is used, the data may appear in the same command line
|
||||
item, separated by an equals character, or (with one exception) it may appear
|
||||
in the next command line item. For example:
|
||||
.sp
|
||||
--file=/some/file
|
||||
--file /some/file
|
||||
.sp
|
||||
Note, however, that if you want to supply a file name beginning with ~ as data
|
||||
in a shell command, and have the shell expand ~ to a home directory, you must
|
||||
separate the file name from the option, because the shell does not treat ~
|
||||
specially unless it is at the start of an item.
|
||||
.P
|
||||
The exception to the above is the \fB--colour\fP (or \fB--color\fP) option,
|
||||
for which the data is optional. If this option does have data, it must be given
|
||||
in the first form, using an equals character. Otherwise it will be assumed that
|
||||
it has no data.
|
||||
.
|
||||
.
|
||||
.SH MATCHING ERRORS
|
||||
.rs
|
||||
.sp
|
||||
It is possible to supply a regular expression that takes a very long time to
|
||||
fail to match certain lines. Such patterns normally involve nested indefinite
|
||||
repeats, for example: (a+)*\ed when matched against a line of a's with no final
|
||||
digit. The PCRE matching function has a resource limit that causes it to abort
|
||||
in these circumstances. If this happens, \fBpcregrep\fP outputs an error
|
||||
message and the line that caused the problem to the standard error stream. If
|
||||
there are more than 20 such errors, \fBpcregrep\fP gives up.
|
||||
.
|
||||
.
|
||||
.SH DIAGNOSTICS
|
||||
.rs
|
||||
.sp
|
||||
Exit status is 0 if any matches were found, 1 if no matches were found, and 2
|
||||
for syntax errors and non-existent or inaccessible files (even if matches were
|
||||
found in other files) or too many matching errors. Using the \fB-s\fP option to
|
||||
suppress error messages about inaccessible files does not affect the return
|
||||
code.
|
||||
.
|
||||
.
|
||||
.SH AUTHOR
|
||||
.rs
|
||||
.sp
|
||||
Philip Hazel
|
||||
.br
|
||||
University Computing Service
|
||||
.br
|
||||
Cambridge CB2 3QG, England.
|
||||
.P
|
||||
.in 0
|
||||
Last updated: 06 June 2006
|
||||
.br
|
||||
Copyright (c) 1997-2006 University of Cambridge.
|
399
libs/pcre/doc/pcregrep.txt
Normal file
@ -0,0 +1,399 @@
|
||||
PCREGREP(1) PCREGREP(1)
|
||||
|
||||
|
||||
NAME
|
||||
pcregrep - a grep with Perl-compatible regular expressions.
|
||||
|
||||
|
||||
SYNOPSIS
|
||||
pcregrep [options] [long options] [pattern] [path1 path2 ...]
|
||||
|
||||
|
||||
DESCRIPTION
|
||||
|
||||
pcregrep searches files for character patterns, in the same way as
|
||||
other grep commands do, but it uses the PCRE regular expression library
|
||||
to support patterns that are compatible with the regular expressions of
|
||||
Perl 5. See pcrepattern for a full description of syntax and semantics
|
||||
of the regular expressions that PCRE supports.
|
||||
|
||||
Patterns, whether supplied on the command line or in a separate file,
|
||||
are given without delimiters. For example:
|
||||
|
||||
pcregrep Thursday /etc/motd
|
||||
|
||||
If you attempt to use delimiters (for example, by surrounding a pattern
|
||||
with slashes, as is common in Perl scripts), they are interpreted as
|
||||
part of the pattern. Quotes can of course be used on the command line
|
||||
because they are interpreted by the shell, and indeed they are required
|
||||
if a pattern contains white space or shell metacharacters.
|
||||
|
||||
The first argument that follows any option settings is treated as the
|
||||
single pattern to be matched when neither -e nor -f is present. Con-
|
||||
versely, when one or both of these options are used to specify pat-
|
||||
terns, all arguments are treated as path names. At least one of -e, -f,
|
||||
or an argument pattern must be provided.
|
||||
|
||||
If no files are specified, pcregrep reads the standard input. The stan-
|
||||
dard input can also be referenced by a name consisting of a single
|
||||
hyphen. For example:
|
||||
|
||||
pcregrep some-pattern /file1 - /file3
|
||||
|
||||
By default, each line that matches the pattern is copied to the stan-
|
||||
dard output, and if there is more than one file, the file name is out-
|
||||
put at the start of each line. However, there are options that can
|
||||
change how pcregrep behaves. In particular, the -M option makes it pos-
|
||||
sible to search for patterns that span line boundaries. What defines a
|
||||
line boundary is controlled by the -N (--newline) option.
|
||||
|
||||
Patterns are limited to 8K or BUFSIZ characters, whichever is the
|
||||
greater. BUFSIZ is defined in <stdio.h>.
|
||||
|
||||
If the LC_ALL or LC_CTYPE environment variable is set, pcregrep uses
|
||||
the value to set a locale when calling the PCRE library. The --locale
|
||||
option can be used to override this.
|
||||
|
||||
|
||||
OPTIONS
|
||||
|
||||
-- This terminates the list of options. It is useful if the next
|
||||
item on the command line starts with a hyphen but is not an
|
||||
option. This allows for the processing of patterns and file-
|
||||
names that start with hyphens.
|
||||
|
||||
-A number, --after-context=number
|
||||
Output number lines of context after each matching line. If
|
||||
filenames and/or line numbers are being output, a hyphen sep-
|
||||
arator is used instead of a colon for the context lines. A
|
||||
line containing "--" is output between each group of lines,
|
||||
unless they are in fact contiguous in the input file. The
|
||||
value of number is expected to be relatively small. However,
|
||||
pcregrep guarantees to have up to 8K of following text avail-
|
||||
able for context output.
|
||||
|
||||
-B number, --before-context=number
|
||||
Output number lines of context before each matching line. If
|
||||
filenames and/or line numbers are being output, a hyphen sep-
|
||||
arator is used instead of a colon for the context lines. A
|
||||
line containing "--" is output between each group of lines,
|
||||
unless they are in fact contiguous in the input file. The
|
||||
value of number is expected to be relatively small. However,
|
||||
pcregrep guarantees to have up to 8K of preceding text avail-
|
||||
able for context output.
|
||||
|
||||
-C number, --context=number
|
||||
Output number lines of context both before and after each
|
||||
matching line. This is equivalent to setting both -A and -B
|
||||
to the same value.
|
||||
|
||||
-c, --count
|
||||
Do not output individual lines; instead just output a count
|
||||
of the number of lines that would otherwise have been output.
|
||||
If several files are given, a count is output for each of
|
||||
them. In this mode, the -A, -B, and -C options are ignored.
|
||||
|
||||
--colour, --color
|
||||
If this option is given without any data, it is equivalent to
|
||||
"--colour=auto". If data is required, it must be given in
|
||||
the same shell item, separated by an equals sign.
|
||||
|
||||
--colour=value, --color=value
|
||||
This option specifies under what circumstances the part of a
|
||||
line that matched a pattern should be coloured in the output.
|
||||
The value may be "never" (the default), "always", or "auto".
|
||||
In the latter case, colouring happens only if the standard
|
||||
output is connected to a terminal. The colour can be speci-
|
||||
fied by setting the environment variable PCREGREP_COLOUR or
|
||||
PCREGREP_COLOR. The value of this variable should be a string
|
||||
of two numbers, separated by a semicolon. They are copied
|
||||
directly into the control string for setting colour on a ter-
|
||||
minal, so it is your responsibility to ensure that they make
|
||||
sense. If neither of the environment variables is set, the
|
||||
default is "1;31", which gives red.
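A rough Python sketch of the colouring decision follows. It assumes that PCREGREP_COLOUR is consulted before PCREGREP_COLOR and that the colour is switched off again with the escape sequence ESC [ 0 m; neither detail is stated above, and all function names are invented for the illustration.

```python
import re

DEFAULT_COLOUR = "1;31"  # the documented default, which gives red


def colour_setting(environ):
    """Pick the colour string from a mapping of environment variables."""
    # Assumption: the British spelling is checked first.
    for name in ("PCREGREP_COLOUR", "PCREGREP_COLOR"):
        if name in environ:
            return environ[name]
    return DEFAULT_COLOUR


def should_colour(mode, output_is_terminal):
    """Apply the "never" / "always" / "auto" rule for --colour=value."""
    if mode == "always":
        return True
    if mode == "auto":
        return output_is_terminal
    if mode == "never":
        return False
    raise ValueError("unknown colour mode: " + mode)


def highlight(line, pattern, colour):
    """Wrap the first matched part of the line in terminal colour codes."""
    m = re.search(pattern, line)
    if m is None:
        return line
    # The colour string is copied verbatim into the control sequence.
    # Assumption: "\x1b[0m" is used to reset the colour afterwards.
    return (line[:m.start()] + "\x1b[" + colour + "m" + m.group()
            + "\x1b[0m" + line[m.end():])
```

Because the value is copied into the control sequence unchecked, a malformed setting produces a malformed escape sequence, which is why the text above leaves validation to the user.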
|
||||
|
||||
-D action, --devices=action
|
||||
If an input path is not a regular file or a directory,
|
||||
"action" specifies how it is to be processed. Valid values
|
||||
are "read" (the default) or "skip" (silently skip the path).
|
||||
|
||||
-d action, --directories=action
|
||||
If an input path is a directory, "action" specifies how it is
|
||||
to be processed. Valid values are "read" (the default),
|
||||
"recurse" (equivalent to the -r option), or "skip" (silently
|
||||
skip the path). In the default case, directories are read as
|
||||
if they were ordinary files. In some operating systems the
|
||||
effect of reading a directory like this is an immediate end-
|
||||
of-file.
|
||||
|
||||
-e pattern, --regex=pattern, --regexp=pattern
|
||||
Specify a pattern to be matched. This option
|
||||
can be used multiple times in order to specify several pat-
|
||||
terns. It can also be used as a way of specifying a single
|
||||
pattern that starts with a hyphen. When -e is used, no argu-
|
||||
ment pattern is taken from the command line; all arguments
|
||||
are treated as file names. There is an overall maximum of 100
|
||||
patterns. They are applied to each line in the order in which
|
||||
they are defined until one matches (or fails to match if -v
|
||||
is used). If -f is used with -e, the command line patterns
|
||||
are matched first, followed by the patterns from the file,
|
||||
independent of the order in which these options are speci-
|
||||
fied. Note that multiple use of -e is not the same as a sin-
|
||||
gle pattern with alternatives. For example, X|Y finds the
|
||||
first character in a line that is X or Y, whereas if the two
|
||||
patterns are given separately, pcregrep finds X if it is
|
||||
present, even if it follows Y in the line. It finds Y only if
|
||||
there is no X in the line. This really matters only if you
|
||||
are using -o to show the portion of the line that matched.
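The difference between one pattern with alternatives and several separate patterns can be reproduced with Python's re module, used here as a stand-in for PCRE. The helper names are invented for this sketch; the second helper models the "applied in order until one matches" rule described above.

```python
import re


def first_match_single(line, pattern):
    """One pattern: the leftmost match in the line wins."""
    m = re.search(pattern, line)
    return m.group() if m else None


def first_match_multiple(line, patterns):
    """Several patterns: each is tried against the whole line, in order."""
    for pattern in patterns:
        m = re.search(pattern, line)
        if m:
            return m.group()
    return None


line = "a Y then an X"
print(first_match_single(line, "X|Y"))          # prints Y
print(first_match_multiple(line, ["X", "Y"]))   # prints X
```

Both calls select the same line, so the distinction is visible only when the matched portion itself is reported, as with -o.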
|
||||
|
||||
--exclude=pattern
|
||||
When pcregrep is searching the files in a directory as a con-
|
||||
sequence of the -r (recursive search) option, any files whose
|
||||
names match the pattern are excluded. The pattern is a PCRE
|
||||
regular expression. If a file name matches both --include and
|
||||
--exclude, it is excluded. There is no short form for this
|
||||
option.
|
||||
|
||||
-F, --fixed-strings
|
||||
Interpret each pattern as a list of fixed strings, separated
|
||||
by newlines, instead of as a regular expression. The -w
|
||||
(match as a word) and -x (match whole line) options can be
|
||||
used with -F. They apply to each of the fixed strings. A line
|
||||
is selected if any of the fixed strings are found in it (sub-
|
||||
ject to -w or -x, if present).
|
||||
|
||||
-f filename, --file=filename
|
||||
Read a number of patterns from the file, one per line, and
|
||||
match them against each line of input. A data line is output
|
||||
if any of the patterns match it. The filename can be given as
|
||||
"-" to refer to the standard input. When -f is used, patterns
|
||||
specified on the command line using -e may also be present;
|
||||
they are tested before the file's patterns. However, no other
|
||||
pattern is taken from the command line; all arguments are
|
||||
treated as file names. There is an overall maximum of 100
|
||||
patterns. Trailing white space is removed from each line, and
|
||||
blank lines are ignored. An empty file contains no patterns
|
||||
and therefore matches nothing.
|
||||
|
||||
-H, --with-filename
|
||||
Force the inclusion of the filename at the start of output
|
||||
lines when searching a single file. By default, the filename
|
||||
is not shown in this case. For matching lines, the filename
|
||||
is followed by a colon and a space; for context lines, a
|
||||
hyphen separator is used. If a line number is also being out-
|
||||
put, it follows the file name without a space.
|
||||
|
||||
-h, --no-filename
|
||||
Suppress the output of filenames when searching multiple files.
|
||||
By default, filenames are shown when multiple files are
|
||||
searched. For matching lines, the filename is followed by a
|
||||
colon and a space; for context lines, a hyphen separator is
|
||||
used. If a line number is also being output, it follows the
|
||||
file name without a space.
|
||||
|
||||
--help Output a brief help message and exit.
|
||||
|
||||
-i, --ignore-case
|
||||
Ignore upper/lower case distinctions during comparisons.
|
||||
|
||||
--include=pattern
|
||||
When pcregrep is searching the files in a directory as a con-
|
||||
sequence of the -r (recursive search) option, only those
|
||||
files whose names match the pattern are included. The pattern
|
||||
is a PCRE regular expression. If a file name matches both
|
||||
--include and --exclude, it is excluded. There is no short
|
||||
form for this option.
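The interaction of --include and --exclude can be summarised as a small predicate. The sketch below is an assumption-laden illustration: it treats the patterns as unanchored searches over the file name, uses Python's re module in place of PCRE, and the function name is invented.

```python
import re


def is_searched(file_name, include=None, exclude=None):
    """Decide whether a file found during a recursive search is scanned."""
    # Exclusion is checked first, so a name matching both is excluded.
    if exclude is not None and re.search(exclude, file_name):
        return False
    if include is not None and not re.search(include, file_name):
        return False
    return True
```

For instance, with include set to `\.c$` and exclude set to `^test_`, the name `test_main.c` matches both patterns and is therefore skipped.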
|
||||
|
||||
-L, --files-without-match
|
||||
Instead of outputting lines from the files, just output the
|
||||
names of the files that do not contain any lines that would
|
||||
have been output. Each file name is output once, on a sepa-
|
||||
rate line.
|
||||
|
||||
-l, --files-with-matches
|
||||
Instead of outputting lines from the files, just output the
|
||||
names of the files containing lines that would have been out-
|
||||
put. Each file name is output once, on a separate line.
|
||||
Searching stops as soon as a matching line is found in a
|
||||
file.
|
||||
|
||||
--label=name
|
||||
This option supplies a name to be used for the standard input
|
||||
when file names are being output. If not supplied, "(standard
|
||||
input)" is used. There is no short form for this option.
|
||||
|
||||
--locale=locale-name
|
||||
This option specifies a locale to be used for pattern match-
|
||||
ing. It overrides the value in the LC_ALL or LC_CTYPE envi-
|
||||
ronment variables. If no locale is specified, the PCRE
|
||||
library's default (usually the "C" locale) is used. There is
|
||||
no short form for this option.
|
||||
|
||||
-M, --multiline
|
||||
Allow patterns to match more than one line. When this option
|
||||
is given, patterns may usefully contain literal newline char-
|
||||
acters and internal occurrences of ^ and $ characters. The
|
||||
output for any one match may consist of more than one line.
|
||||
When this option is set, the PCRE library is called in "mul-
|
||||
tiline" mode. There is a limit to the number of lines that
|
||||
can be matched, imposed by the way that pcregrep buffers the
|
||||
input file as it scans it. However, pcregrep ensures that at
|
||||
least 8K characters or the rest of the document (whichever is
|
||||
the shorter) are available for forward matching, and simi-
|
||||
larly the previous 8K characters (or all the previous charac-
|
||||
ters, if fewer than 8K) are guaranteed to be available for
|
||||
lookbehind assertions.
|
||||
|
||||
-N newline-type, --newline=newline-type
|
||||
The PCRE library supports three different character sequences
|
||||
for indicating the ends of lines. They are the single-charac-
|
||||
ter sequences CR (carriage return) and LF (linefeed), and the
|
||||
two-character sequence CR, LF. When the library is built, a
|
||||
default line-ending sequence is specified. This is normally
|
||||
the standard sequence for the operating system. Unless other-
|
||||
wise specified by this option, pcregrep uses the default. The
|
||||
possible values for this option are CR, LF, or CRLF. This
|
||||
makes it possible to use pcregrep on files that have come
|
||||
from other environments without having to modify their line
|
||||
endings. If the data that is being scanned does not agree
|
||||
with the convention set by this option, pcregrep may behave
|
||||
in strange ways.
|
||||
|
||||
-n, --line-number
|
||||
Precede each output line by its line number in the file, fol-
|
||||
lowed by a colon and a space for matching lines or a hyphen
|
||||
and a space for context lines. If the filename is also being
|
||||
output, it precedes the line number.
|
||||
|
||||
-o, --only-matching
|
||||
Show only the part of the line that matched a pattern. In
|
||||
this mode, no context is shown. That is, the -A, -B, and -C
|
||||
options are ignored.
|
||||
|
||||
-q, --quiet
|
||||
Work quietly, that is, display nothing except error messages.
|
||||
The exit status indicates whether or not any matches were
|
||||
found.
|
||||
|
||||
-r, --recursive
|
||||
If any given path is a directory, recursively scan the files
|
||||
it contains, taking note of any --include and --exclude set-
|
||||
tings. By default, a directory is read as a normal file; in
|
||||
some operating systems this gives an immediate end-of-file.
|
||||
This option is a shorthand for setting the -d option to
|
||||
"recurse".
|
||||
|
||||
-s, --no-messages
|
||||
Suppress error messages about non-existent or unreadable
|
||||
files. Such files are quietly skipped. However, the return
|
||||
code is still 2, even if matches were found in other files.
|
||||
|
||||
-u, --utf-8
|
||||
Operate in UTF-8 mode. This option is available only if PCRE
|
||||
has been compiled with UTF-8 support. Both patterns and sub-
|
||||
ject lines must be valid strings of UTF-8 characters.
|
||||
|
||||
-V, --version
|
||||
Write the version numbers of pcregrep and the PCRE library
|
||||
that is being used to the standard error stream.
|
||||
|
||||
-v, --invert-match
|
||||
Invert the sense of the match, so that lines which do not
|
||||
match any of the patterns are the ones that are found.
|
||||
|
||||
-w, --word-regex, --word-regexp
|
||||
Force the patterns to match only whole words. This is equiva-
|
||||
lent to having \b at the start and end of the pattern.
|
||||
|
||||
-x, --line-regex, --line-regexp
|
||||
Force the patterns to be anchored (each must start matching
|
||||
at the beginning of a line) and in addition, require them to
|
||||
match entire lines. This is equivalent to having ^ and $
|
||||
characters at the start and end of each alternative branch in
|
||||
every pattern.
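The equivalences given for -w and -x can be written out as pattern transformations. This sketch uses Python's re module as a stand-in for PCRE, and the helper names are invented. Wrapping the pattern in a non-capturing group is one way to get the effect of anchoring every alternative branch, as the description of -x requires.

```python
import re


def as_word_pattern(pattern):
    """Model of -w: \\b at the start and end of the pattern."""
    return r"\b" + pattern + r"\b"


def as_line_pattern(pattern):
    """Model of -x: ^ and $ applied to every alternative branch."""
    return "^(?:" + pattern + ")$"


print(re.search(as_word_pattern("cat"), "concatenate"))   # prints None
print(re.search(as_line_pattern("a|b"), "ab"))            # prints None
```

The grouping matters for -x: simply writing `^a|b$` would accept the line "ab", because the anchors would each bind to only one branch.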
|
||||
|
||||
|
||||
ENVIRONMENT VARIABLES
|
||||
|
||||
The environment variables LC_ALL and LC_CTYPE are examined, in that
|
||||
order, for a locale. The first one that is set is used. This can be
|
||||
overridden by the --locale option. If no locale is set, the PCRE
|
||||
library's default (usually the "C" locale) is used.
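The lookup order can be expressed as a short function. This is a sketch of the rule as documented, with an invented function name; it treats a variable as "set" whenever it is present in the environment, which is an assumption about empty values.

```python
def choose_locale(environ, locale_option=None):
    """Return the locale name to use, or None for the library default.

    `environ` is a mapping of environment variables and `locale_option`
    is the value given to --locale, if any.
    """
    if locale_option is not None:
        return locale_option          # --locale overrides the environment
    for name in ("LC_ALL", "LC_CTYPE"):
        if name in environ:
            return environ[name]      # the first variable that is set wins
    return None                       # PCRE's default, usually the "C" locale
```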
|
||||
|
||||
|
||||
NEWLINES
|
||||
|
||||
The -N (--newline) option allows pcregrep to scan files with different
|
||||
newline conventions from the default. However, the setting of this
|
||||
option does not affect the way in which pcregrep writes information to
|
||||
the standard error and output streams. It uses the string "\n" in C
|
||||
printf() calls to indicate newlines, relying on the C I/O library to
|
||||
convert this to an appropriate sequence if the output is sent to a
|
||||
file.
|
||||
|
||||
|
||||
OPTIONS COMPATIBILITY
|
||||
|
||||
The majority of short and long forms of pcregrep's options are the same
|
||||
as in the GNU grep program. Any long option of the form --xxx-regexp
|
||||
(GNU terminology) is also available as --xxx-regex (PCRE terminology).
|
||||
However, the --locale, -M, --multiline, -u, and --utf-8 options are
|
||||
specific to pcregrep.
|
||||
|
||||
|
||||
OPTIONS WITH DATA
|
||||
|
||||
There are four different ways in which an option with data can be spec-
|
||||
ified. If a short form option is used, the data may follow immedi-
|
||||
ately, or in the next command line item. For example:
|
||||
|
||||
-f/some/file
|
||||
-f /some/file
|
||||
|
||||
If a long form option is used, the data may appear in the same command
|
||||
line item, separated by an equals character, or (with one exception) it
|
||||
may appear in the next command line item. For example:
|
||||
|
||||
--file=/some/file
|
||||
--file /some/file
|
||||
|
||||
Note, however, that if you want to supply a file name beginning with ~
|
||||
as data in a shell command, and have the shell expand ~ to a home
|
||||
directory, you must separate the file name from the option, because the
|
||||
shell does not treat ~ specially unless it is at the start of an item.
|
||||
|
||||
The exception to the above is the --colour (or --color) option, for
|
||||
which the data is optional. If this option does have data, it must be
|
||||
given in the first form, using an equals character. Otherwise it will
|
||||
be assumed that it has no data.
|
||||
|
||||
|
||||
MATCHING ERRORS
|
||||
|
||||
It is possible to supply a regular expression that takes a very long
|
||||
time to fail to match certain lines. Such patterns normally involve
|
||||
nested indefinite repeats, for example: (a+)*\d when matched against a
|
||||
line of a's with no final digit. The PCRE matching function has a
|
||||
resource limit that causes it to abort in these circumstances. If this
|
||||
happens, pcregrep outputs an error message and the line that caused the
|
||||
problem to the standard error stream. If there are more than 20 such
|
||||
errors, pcregrep gives up.
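To see why such a pattern needs a resource limit, the following toy backtracking matcher handles only the pattern (a+)*\d and counts how many times it has to decide what follows a group. It is a model written for this illustration, not PCRE's matching function; the step limit and all names are hypothetical.

```python
class MatchLimitExceeded(Exception):
    """Raised when the toy matcher uses more steps than it is allowed."""


DIGITS = "0123456789"


def match_nested_repeat(subject, step_limit):
    r"""Match (a+)*\d at the start of `subject` by backtracking.

    Returns (matched, steps). Raises MatchLimitExceeded when more than
    `step_limit` steps are needed.
    """
    steps = 0

    def after_groups(pos):
        nonlocal steps
        steps += 1
        if steps > step_limit:
            raise MatchLimitExceeded("gave up after %d steps" % step_limit)
        # Length of the run of a's available for one more (a+) group.
        run = 0
        while pos + run < len(subject) and subject[pos + run] == "a":
            run += 1
        # Greedy: try the longest group first, then back off one by one.
        for length in range(run, 0, -1):
            if after_groups(pos + length):
                return True
        # No further group: the next character must be a digit.
        return pos < len(subject) and subject[pos] in DIGITS

    matched = after_groups(0)
    return matched, steps


print(match_nested_repeat("aaa7", 100))            # (True, 2)
print(match_nested_repeat("a" * 10, 10 ** 6))      # (False, 1024)
```

In this model each additional "a" doubles the work needed to report failure, so a line of a few dozen a's is already impractical without a limit.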
|
||||
|
||||
|
||||
DIAGNOSTICS
|
||||
|
||||
Exit status is 0 if any matches were found, 1 if no matches were found,
|
||||
and 2 for syntax errors and non-existent or inaccessible files (even if
|
||||
matches were found in other files) or too many matching errors. Using
|
||||
the -s option to suppress error messages about inaccessible files does
|
||||
not affect the return code.
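A script that calls pcregrep therefore has to treat status 1 as a normal result and status 2 as an error that may still be accompanied by output. The sketch below assumes a pcregrep executable is available on the search path; the wrapper names are invented, and only options described in this document are used.

```python
import subprocess

MEANINGS = {
    0: "at least one match was found",
    1: "no matches were found",
    2: "an error occurred (matches may still have been output)",
}


def describe_exit_status(code):
    """Translate a pcregrep exit status into a short description."""
    return MEANINGS.get(code, "unexpected exit status")


def run_pcregrep(pattern, paths):
    """Run pcregrep and return (exit_status, standard_output)."""
    # -e passes the pattern explicitly; "--" ends the option list so that
    # file names beginning with a hyphen are not taken as options.
    completed = subprocess.run(
        ["pcregrep", "-e", pattern, "--"] + list(paths),
        capture_output=True,
        text=True,
    )
    return completed.returncode, completed.stdout
```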
|
||||
|
||||
|
||||
AUTHOR
|
||||
|
||||
Philip Hazel
|
||||
University Computing Service
|
||||
Cambridge CB2 3QG, England.
|
||||
|
||||
Last updated: 06 June 2006
|
||||
Copyright (c) 1997-2006 University of Cambridge.
|
157
libs/pcre/doc/pcrematching.3
Normal file
@ -0,0 +1,157 @@
|
||||
.TH PCREMATCHING 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH "PCRE MATCHING ALGORITHMS"
|
||||
.rs
|
||||
.sp
|
||||
This document describes the two different algorithms that are available in PCRE
|
||||
for matching a compiled regular expression against a given subject string. The
|
||||
"standard" algorithm is the one provided by the \fBpcre_exec()\fP function.
|
||||
This works in the same way as Perl's matching function, and provides a
|
||||
Perl-compatible matching operation.
|
||||
.P
|
||||
An alternative algorithm is provided by the \fBpcre_dfa_exec()\fP function;
|
||||
this operates in a different way, and is not Perl-compatible. It has advantages
|
||||
and disadvantages compared with the standard algorithm, and these are described
|
||||
below.
|
||||
.P
|
||||
When there is only one possible way in which a given subject string can match a
|
||||
pattern, the two algorithms give the same answer. A difference arises, however,
|
||||
when there are multiple possibilities. For example, if the pattern
|
||||
.sp
|
||||
^<.*>
|
||||
.sp
|
||||
is matched against the string
|
||||
.sp
|
||||
<something> <something else> <something further>
|
||||
.sp
|
||||
there are three possible answers. The standard algorithm finds only one of
|
||||
them, whereas the DFA algorithm finds all three.
|
||||
.
|
||||
.SH "REGULAR EXPRESSIONS AS TREES"
|
||||
.rs
|
||||
.sp
|
||||
The set of strings that are matched by a regular expression can be represented
|
||||
as a tree structure. An unlimited repetition in the pattern makes the tree of
|
||||
infinite size, but it is still a tree. Matching the pattern to a given subject
|
||||
string (from a given starting point) can be thought of as a search of the tree.
|
||||
There are two ways to search a tree: depth-first and breadth-first, and these
|
||||
correspond to the two matching algorithms provided by PCRE.
|
||||
.
|
||||
.SH "THE STANDARD MATCHING ALGORITHM"
|
||||
.rs
|
||||
.sp
|
||||
In the terminology of Jeffrey Friedl's book \fIMastering Regular
|
||||
Expressions\fP, the standard algorithm is an "NFA algorithm". It conducts a
|
||||
depth-first search of the pattern tree. That is, it proceeds along a single
|
||||
path through the tree, checking that the subject matches what is required. When
|
||||
there is a mismatch, the algorithm tries any alternatives at the current point,
|
||||
and if they all fail, it backs up to the previous branch point in the tree, and
|
||||
tries the next alternative branch at that level. This often involves backing up
|
||||
(moving to the left) in the subject string as well. The order in which
|
||||
repetition branches are tried is controlled by the greedy or ungreedy nature of
|
||||
the quantifier.
|
||||
.P
|
||||
If a leaf node is reached, a matching string has been found, and at that point
|
||||
the algorithm stops. Thus, if there is more than one possible match, this
|
||||
algorithm returns the first one that it finds. Whether this is the shortest,
|
||||
the longest, or some intermediate length depends on the way the greedy and
|
||||
ungreedy repetition quantifiers are specified in the pattern.
|
||||
.P
|
||||
Because it ends up with a single path through the tree, it is relatively
|
||||
straightforward for this algorithm to keep track of the substrings that are
|
||||
matched by portions of the pattern in parentheses. This provides support for
|
||||
capturing parentheses and back references.
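Python's re module is also a backtracking matcher, so it can be used to illustrate these points without calling PCRE itself. The sketch below is only an analogy under that assumption; the helper name is invented. It shows that the single result returned depends on the greediness of the quantifier, and that captured substrings and back references are available.

```python
import re

SUBJECT = "<something> <something else> <something further>"


def first_match(pattern, subject):
    """Return the one match a backtracking search reports, or None."""
    m = re.match(pattern, subject)
    return m.group() if m else None


# Greedy: the longest alternative is tried first, so it is what is found.
print(first_match(r"^<.*>", SUBJECT))
# Ungreedy: the shortest alternative is tried first.
print(first_match(r"^<.*?>", SUBJECT))

# A single path through the tree makes captures and back references easy:
# \1 must repeat whatever the first group matched.
captured = re.match(r"^<(\w+)> <\1 (\w+)>", SUBJECT)
print(captured.groups())   # ('something', 'else')
```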
|
||||
.
|
||||
.SH "THE DFA MATCHING ALGORITHM"
|
||||
.rs
|
||||
.sp
|
||||
DFA stands for "deterministic finite automaton", but you do not need to
|
||||
understand the origins of that name. This algorithm conducts a breadth-first
|
||||
search of the tree. Starting from the first matching point in the subject, it
|
||||
scans the subject string from left to right, once, character by character, and
|
||||
as it does this, it remembers all the paths through the tree that represent
|
||||
valid matches.
|
||||
.P
|
||||
The scan continues until either the end of the subject is reached, or there are
|
||||
no more unterminated paths. At this point, terminated paths represent the
|
||||
different matching possibilities (if there are none, the match has failed).
|
||||
Thus, if there is more than one possible match, this algorithm finds all of
|
||||
them, and in particular, it finds the longest. In PCRE, there is an option to
|
||||
stop the algorithm after the first match (which is necessarily the shortest)
|
||||
has been found.
|
||||
.P
|
||||
Note that all the matches that are found start at the same point in the
|
||||
subject. If the pattern
|
||||
.sp
|
||||
cat(er(pillar)?)?
|
||||
.sp
|
||||
is matched against the string "the caterpillar catchment", the result will be
|
||||
the three strings "cat", "cater", and "caterpillar" that start at the fifth
|
||||
character of the subject. The algorithm does not automatically move on to find
|
||||
matches that start at later positions.
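The set of results described here can be reproduced by brute force in Python. The sketch does not implement a breadth-first scan: it locates the first matching point with Python's backtracking re module and then tests every possible end position, which is an assumption made purely to show the expected output. The function name is invented, and the pattern used makes the trailing group optional so that "cat" on its own is a match.

```python
import re


def all_matches_from_first_start(pattern, subject):
    """List every match that begins at the first matching point."""
    rx = re.compile(pattern)
    first = rx.search(subject)
    if first is None:
        return []
    start = first.start()
    # Test each candidate end position; later start positions are ignored.
    return [
        subject[start:end]
        for end in range(start, len(subject) + 1)
        if rx.fullmatch(subject, start, end)
    ]


print(all_matches_from_first_start(r"cat(er(pillar)?)?",
                                   "the caterpillar catchment"))
# ['cat', 'cater', 'caterpillar']
```

The "cat" in "catchment" does not appear in the result, because only matches starting at the first matching point are collected.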
|
||||
.P
|
||||
There are a number of features of PCRE regular expressions that are not
|
||||
supported by the DFA matching algorithm. They are as follows:
|
||||
.P
|
||||
1. Because the algorithm finds all possible matches, the greedy or ungreedy
|
||||
nature of repetition quantifiers is not relevant. Greedy and ungreedy
|
||||
quantifiers are treated in exactly the same way.
|
||||
.P
|
||||
2. When dealing with multiple paths through the tree simultaneously, it is not
|
||||
straightforward to keep track of captured substrings for the different matching
|
||||
possibilities, and PCRE's implementation of this algorithm does not attempt to
|
||||
do this. This means that no captured substrings are available.
|
||||
.P
|
||||
3. Because no substrings are captured, back references within the pattern are
|
||||
not supported, and cause errors if encountered.
|
||||
.P
|
||||
4. For the same reason, conditional expressions that use a backreference as the
|
||||
condition are not supported.
|
||||
.P
|
||||
5. Callouts are supported, but the value of the \fIcapture_top\fP field is
|
||||
always 1, and the value of the \fIcapture_last\fP field is always -1.
|
||||
.P
|
||||
6.
|
||||
The \eC escape sequence, which (in the standard algorithm) matches a single
|
||||
byte, even in UTF-8 mode, is not supported because the DFA algorithm moves
|
||||
through the subject string one character at a time, for all active paths
|
||||
through the tree.
|
||||
.
|
||||
.SH "ADVANTAGES OF THE DFA ALGORITHM"
|
||||
.rs
|
||||
.sp
|
||||
Using the DFA matching algorithm provides the following advantages:
|
||||
.P
|
||||
1. All possible matches (at a single point in the subject) are automatically
|
||||
found, and in particular, the longest match is found. To find more than one
|
||||
match using the standard algorithm, you have to do kludgy things with
|
||||
callouts.
|
||||
.P
|
||||
2. There is much better support for partial matching. The restrictions on the
|
||||
content of the pattern that apply when using the standard algorithm for partial
|
||||
matching do not apply to the DFA algorithm. For non-anchored patterns, the
|
||||
starting position of a partial match is available.
|
||||
.P
|
||||
3. Because the DFA algorithm scans the subject string just once, and never
|
||||
needs to backtrack, it is possible to pass very long subject strings to the
|
||||
matching function in several pieces, checking for partial matching each time.
|
||||
.
|
||||
.SH "DISADVANTAGES OF THE DFA ALGORITHM"
|
||||
.rs
|
||||
.sp
|
||||
The DFA algorithm suffers from a number of disadvantages:
|
||||
.P
|
||||
1. It is substantially slower than the standard algorithm. This is partly
|
||||
because it has to search for all possible matches, but is also because it is
|
||||
less susceptible to optimization.
|
||||
.P
|
||||
2. Capturing parentheses and back references are not supported.
|
||||
.P
|
||||
3. The "atomic group" feature of PCRE regular expressions is supported, but
|
||||
does not provide the advantage that it does for the standard algorithm.
|
||||
.P
|
||||
.in 0
|
||||
Last updated: 06 June 2006
|
||||
.br
|
||||
Copyright (c) 1997-2006 University of Cambridge.
|
203
libs/pcre/doc/pcrepartial.3
Normal file
@ -0,0 +1,203 @@
|
||||
.TH PCREPARTIAL 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH "PARTIAL MATCHING IN PCRE"
|
||||
.rs
|
||||
.sp
|
||||
In normal use of PCRE, if the subject string that is passed to
|
||||
\fBpcre_exec()\fP or \fBpcre_dfa_exec()\fP matches as far as it goes, but is
|
||||
too short to match the entire pattern, PCRE_ERROR_NOMATCH is returned. There
|
||||
are circumstances where it might be helpful to distinguish this case from other
|
||||
cases in which there is no match.
|
||||
.P
|
||||
Consider, for example, an application where a human is required to type in data
|
||||
for a field with specific formatting requirements. An example might be a date
|
||||
in the form \fIddmmmyy\fP, defined by this pattern:
|
||||
.sp
|
||||
^\ed?\ed(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\ed\ed$
|
||||
.sp
|
||||
If the application sees the user's keystrokes one by one, and can check that
|
||||
what has been typed so far is potentially valid, it is able to raise an error
|
||||
as soon as a mistake is made, possibly beeping and not reflecting the
|
||||
character that has been typed. This immediate feedback is likely to be a better
|
||||
user interface than a check that is delayed until the entire string has been
|
||||
entered.
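The keystroke-by-keystroke check can be modelled in Python for this one date pattern. Python's standard re module has no counterpart to partial matching, so the sketch expands the pattern by hand into fixed templates and compares the typed text against them; it is an illustration of the three possible outcomes, not of how PCRE arrives at them, and every name in it is invented.

```python
MONTHS = "jan feb mar apr may jun jul aug sep oct nov dec".split()
DIGITS = "0123456789"


def templates():
    """Yield every shape the date can take; None stands for any digit."""
    for day_digits in (1, 2):
        for month in MONTHS:
            yield [None] * day_digits + list(month) + [None, None]


def fits(char, wanted):
    """Check one typed character against one template position."""
    return char in DIGITS if wanted is None else char == wanted


def classify(text):
    """Return "match", "partial", or "no match" for the text typed so far."""
    partial = False
    for template in templates():
        if len(text) > len(template):
            continue
        if all(fits(c, w) for c, w in zip(text, template)):
            if len(text) == len(template):
                return "match"
            partial = True    # a valid prefix of at least one full date
    return "partial" if partial else "no match"


for typed in ("25jun04", "25dec3", "3ju", "3juj", "j"):
    print(typed, "->", classify(typed))
```

An input routine could call classify() after each keystroke and reject the character as soon as the result is "no match".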
|
||||
.P
|
||||
PCRE supports the concept of partial matching by means of the PCRE_PARTIAL
|
||||
option, which can be set when calling \fBpcre_exec()\fP or
|
||||
\fBpcre_dfa_exec()\fP. When this flag is set for \fBpcre_exec()\fP, the return
|
||||
code PCRE_ERROR_NOMATCH is converted into PCRE_ERROR_PARTIAL if at any time
|
||||
during the matching process the last part of the subject string matched part of
|
||||
the pattern. Unfortunately, for non-anchored matching, it is not possible to
|
||||
obtain the position of the start of the partial match. No captured data is set
|
||||
when PCRE_ERROR_PARTIAL is returned.
|
||||
.P
|
||||
When PCRE_PARTIAL is set for \fBpcre_dfa_exec()\fP, the return code
|
||||
PCRE_ERROR_NOMATCH is converted into PCRE_ERROR_PARTIAL if the end of the
|
||||
subject is reached, there have been no complete matches, but there is still at
|
||||
least one matching possibility. The portion of the string that provided the
|
||||
partial match is set as the first matching string.
|
||||
.P
|
||||
Using PCRE_PARTIAL disables one of PCRE's optimizations. PCRE remembers the
|
||||
last literal byte in a pattern, and abandons matching immediately if such a
|
||||
byte is not present in the subject string. This optimization cannot be used
|
||||
for a subject string that might match only partially.
|
||||
.
|
||||
.
|
||||
.SH "RESTRICTED PATTERNS FOR PCRE_PARTIAL"
|
||||
.rs
|
||||
.sp
|
||||
Because of the way certain internal optimizations are implemented in the
|
||||
\fBpcre_exec()\fP function, the PCRE_PARTIAL option cannot be used with all
|
||||
patterns. These restrictions do not apply when \fBpcre_dfa_exec()\fP is used.
|
||||
For \fBpcre_exec()\fP, repeated single characters such as
|
||||
.sp
|
||||
a{2,4}
|
||||
.sp
|
||||
and repeated single metasequences such as
|
||||
.sp
|
||||
\ed+
|
||||
.sp
|
||||
are not permitted if the maximum number of occurrences is greater than one.
|
||||
Optional items such as \ed? (where the maximum is one) are permitted.
|
||||
Quantifiers with any values are permitted after parentheses, so the invalid
|
||||
examples above can be coded thus:
|
||||
.sp
|
||||
(a){2,4}
|
||||
(\ed)+
|
||||
.sp
|
||||
These constructions run more slowly, but for the kinds of application that are
|
||||
envisaged for this facility, this is not felt to be a major restriction.
|
||||
.P
|
||||
If PCRE_PARTIAL is set for a pattern that does not conform to the restrictions,
|
||||
\fBpcre_exec()\fP returns the error code PCRE_ERROR_BADPARTIAL (-13).
|
||||
.
|
||||
.
|
||||
.SH "EXAMPLE OF PARTIAL MATCHING USING PCRETEST"
|
||||
.rs
|
||||
.sp
|
||||
If the escape sequence \eP is present in a \fBpcretest\fP data line, the
|
||||
PCRE_PARTIAL flag is used for the match. Here is a run of \fBpcretest\fP that
|
||||
uses the date example quoted above:
|
||||
.sp
|
||||
re> /^\ed?\ed(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\ed\ed$/
|
||||
data> 25jun04\eP
|
||||
0: 25jun04
|
||||
1: jun
|
||||
data> 25dec3\eP
|
||||
Partial match
|
||||
data> 3ju\eP
|
||||
Partial match
|
||||
data> 3juj\eP
|
||||
No match
|
||||
data> j\eP
|
||||
No match
|
||||
.sp
|
||||
The first data string is matched completely, so \fBpcretest\fP shows the
|
||||
matched substrings. The remaining four strings do not match the complete
|
||||
pattern, but the first two are partial matches. The same test, using DFA
|
||||
matching (by means of the \eD escape sequence), produces the following output:
|
||||
.sp
|
||||
re> /^\ed?\ed(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\ed\ed$/
|
||||
data> 25jun04\eP\eD
|
||||
0: 25jun04
|
||||
data> 23dec3\eP\eD
|
||||
Partial match: 23dec3
|
||||
data> 3ju\eP\eD
|
||||
Partial match: 3ju
|
||||
data> 3juj\eP\eD
|
||||
No match
|
||||
data> j\eP\eD
|
||||
No match
|
||||
.sp
|
||||
Notice that in this case the portion of the string that was matched is made
|
||||
available.
|
||||
.
|
||||
.
|
||||
.SH "MULTI-SEGMENT MATCHING WITH pcre_dfa_exec()"
|
||||
.rs
|
||||
.sp
|
||||
When a partial match has been found using \fBpcre_dfa_exec()\fP, it is possible
|
||||
to continue the match by providing additional subject data and calling
|
||||
\fBpcre_dfa_exec()\fP again with the PCRE_DFA_RESTART option and the same
|
||||
working space (where details of the previous partial match are stored). Here is
|
||||
an example using \fBpcretest\fP, where the \eR escape sequence sets the
|
||||
PCRE_DFA_RESTART option and the \eD escape sequence requests the use of
|
||||
\fBpcre_dfa_exec()\fP:
|
||||
.sp
|
||||
re> /^\ed?\ed(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\ed\ed$/
|
||||
data> 23ja\eP\eD
|
||||
Partial match: 23ja
|
||||
data> n05\eR\eD
|
||||
0: n05
|
||||
.sp
|
||||
The first call has "23ja" as the subject, and requests partial matching; the
|
||||
second call has "n05" as the subject for the continued (restarted) match.
|
||||
Notice that when the match is complete, only the last part is shown; PCRE does
|
||||
not retain the previously partially-matched string. It is up to the calling
|
||||
program to do that if it needs to.
|
||||
.P
|
||||
This facility can be used to pass very long subject strings to
|
||||
\fBpcre_dfa_exec()\fP. However, some care is needed for certain types of
|
||||
pattern.
|
||||
.P
|
||||
1. If the pattern contains tests for the beginning or end of a line, you need
|
||||
to pass the PCRE_NOTBOL or PCRE_NOTEOL options, as appropriate, when the
|
||||
subject string for any call does not contain the beginning or end of a line.
|
||||
.P
|
||||
2. If the pattern contains backward assertions (including \eb or \eB), you need
|
||||
to arrange for some overlap in the subject strings to allow for this. For
|
||||
example, you could pass the subject in chunks that were 500 bytes long, but in
|
||||
a buffer of 700 bytes, with the starting offset set to 200 and the previous 200
|
||||
bytes at the start of the buffer.
|
||||
.P
|
||||
3. Matching a subject string that is split into multiple segments does not
|
||||
always produce exactly the same result as matching over one single long string.
|
||||
The difference arises when there are multiple matching possibilities, because a
|
||||
partial match result is given only when there are no completed matches in a
|
||||
call to \fBpcre_dfa_exec()\fP. This means that as soon as the shortest match has
|
||||
been found, continuation to a new subject segment is no longer possible.
|
||||
Consider this \fBpcretest\fP example:
|
||||
.sp
|
||||
re> /dog(sbody)?/
|
||||
data> do\eP\eD
|
||||
Partial match: do
|
||||
data> gsb\eR\eP\eD
|
||||
0: g
|
||||
data> dogsbody\eD
|
||||
0: dogsbody
|
||||
1: dog
|
||||
.sp
|
||||
The pattern matches the words "dog" or "dogsbody". When the subject is
|
||||
presented in several parts ("do" and "gsb" being the first two) the match stops
|
||||
when "dog" has been found, and it is not possible to continue. On the other
|
||||
hand, if "dogsbody" is presented as a single string, both matches are found.
|
||||
.P
|
||||
Because of this phenomenon, it does not usually make sense to end a pattern
|
||||
that is going to be matched in this way with a variable repeat.
|
||||
.P
|
||||
4. Patterns that contain alternatives at the top level which do not all
|
||||
start with the same pattern item may not work as expected. For example,
|
||||
consider this pattern:
|
||||
.sp
|
||||
1234|3789
|
||||
.sp
|
||||
If the first part of the subject is "ABC123", a partial match of the first
|
||||
alternative is found at offset 3. There is no partial match for the second
|
||||
alternative, because such a match does not start at the same point in the
|
||||
subject string. Attempting to continue with the string "789" does not yield a
|
||||
match because only those alternatives that match at one point in the subject
|
||||
are remembered. The problem arises because the start of the second alternative
|
||||
matches within the first alternative. There is no problem with anchored
|
||||
patterns or patterns such as:
|
||||
.sp
|
||||
1234|ABCD
|
||||
.sp
|
||||
where no string can be a partial match for both alternatives.
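.P
The overlap arrangement described in item 2 can be sketched as follows. This is
only an illustration under stated assumptions: the names CHUNK, OVERLAP,
segment_fn, log_segment and feed_segments are invented here, and the callback
stands in for a real call to \fBpcre_dfa_exec()\fP, which is not made.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK   500   /* new bytes handed over per call, as in item 2 */
#define OVERLAP 200   /* bytes of earlier data kept as context */

/* Hypothetical callback; a real one would pass subject, length and
   start_offset on to pcre_dfa_exec(). */
typedef void (*segment_fn)(const char *subject, size_t length,
                           size_t start_offset, void *user);

/* Example callback that only records how it was called. */
struct segment_log {
    size_t calls;
    size_t last_length;
    size_t last_start;
};

static void log_segment(const char *subject, size_t length,
                        size_t start_offset, void *user)
{
    struct segment_log *rec = user;
    (void)subject;
    rec->calls++;
    rec->last_length = length;
    rec->last_start = start_offset;
}

/* Present data in CHUNK-sized pieces, each preceded in the buffer by up
   to OVERLAP bytes of the data that came before it. */
static void feed_segments(const char *data, size_t total,
                          segment_fn fn, void *user)
{
    char buf[OVERLAP + CHUNK];
    size_t keep = 0;   /* valid context bytes at the start of buf */
    size_t pos = 0;    /* bytes of data consumed so far */

    while (pos < total) {
        size_t n = total - pos;
        size_t have, next_keep;

        if (n > CHUNK) n = CHUNK;
        assert(keep + n <= sizeof buf);
        memcpy(buf + keep, data + pos, n);  /* new chunk follows the context */
        fn(buf, keep + n, keep, user);      /* matching starts after the context */
        pos += n;

        have = keep + n;
        next_keep = have < OVERLAP ? have : OVERLAP;
        memmove(buf, buf + have - next_keep, next_keep);
        keep = next_keep;
    }
}
```

.P
The first call has no earlier data, so its starting offset is zero; later calls
start at offset 200 with the previous 200 bytes in front of the new chunk.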
|
||||
.
|
||||
.
|
||||
.P
|
||||
.in 0
|
||||
Last updated: 16 January 2006
|
||||
.br
|
||||
Copyright (c) 1997-2006 University of Cambridge.
|
1645
libs/pcre/doc/pcrepattern.3
Normal file
File diff suppressed because it is too large
76
libs/pcre/doc/pcreperform.3
Normal file
@ -0,0 +1,76 @@
|
||||
.TH PCREPERFORM 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH "PCRE PERFORMANCE"
|
||||
.rs
|
||||
.sp
|
||||
Certain items that may appear in regular expression patterns are more efficient
|
||||
than others. It is more efficient to use a character class like [aeiou] than a
|
||||
set of alternatives such as (a|e|i|o|u). In general, the simplest construction
|
||||
that provides the required behaviour is usually the most efficient. Jeffrey
|
||||
Friedl's book contains a lot of useful general discussion about optimizing
|
||||
regular expressions for efficient performance. This document contains a few
|
||||
observations about PCRE.
|
||||
.P
|
||||
Using Unicode character properties (the \ep, \eP, and \eX escapes) is slow,
|
||||
because PCRE has to scan a structure that contains data for over fifteen
|
||||
thousand characters whenever it needs a character's property. If you can find
|
||||
an alternative pattern that does not use character properties, it will probably
|
||||
be faster.
|
||||
.P
|
||||
When a pattern begins with .* not in parentheses, or in parentheses that are
|
||||
not the subject of a backreference, and the PCRE_DOTALL option is set, the
|
||||
pattern is implicitly anchored by PCRE, since it can match only at the start of
|
||||
a subject string. However, if PCRE_DOTALL is not set, PCRE cannot make this
|
||||
optimization, because the . metacharacter does not then match a newline, and if
|
||||
the subject string contains newlines, the pattern may match from the character
|
||||
immediately following one of them instead of from the very start. For example,
|
||||
the pattern
|
||||
.sp
|
||||
.*second
|
||||
.sp
|
||||
matches the subject "first\enand second" (where \en stands for a newline
|
||||
character), with the match starting at the seventh character. In order to do
|
||||
this, PCRE has to retry the match starting after every newline in the subject.
|
||||
.P
|
||||
If you are using such a pattern with subject strings that do not contain
|
||||
newlines, the best performance is obtained by setting PCRE_DOTALL, or starting
|
||||
the pattern with ^.* or ^.*? to indicate explicit anchoring. That saves PCRE
|
||||
from having to scan along the subject looking for a newline to restart at.
|
||||
.P
|
||||
Beware of patterns that contain nested indefinite repeats. These can take a
|
||||
long time to run when applied to a string that does not match. Consider the
|
||||
pattern fragment
|
||||
.sp
|
||||
(a+)*
|
||||
.sp
|
||||
This can match "aaaa" in 33 different ways, and this number increases very
|
||||
rapidly as the string gets longer. (The * repeat can match 0, 1, 2, 3, or 4
|
||||
times, and for each of those cases other than 0, the + repeats can match
|
||||
different numbers of times.) When the remainder of the pattern is such that the
|
||||
entire match is going to fail, PCRE has in principle to try every possible
|
||||
variation, and this can take an extremely long time.
|
||||
.P
|
||||
An optimization catches some of the more simple cases such as
|
||||
.sp
|
||||
(a+)*b
|
||||
.sp
|
||||
where a literal character follows. Before embarking on the standard matching
|
||||
procedure, PCRE checks that there is a "b" later in the subject string, and if
|
||||
there is not, it fails the match immediately. However, when there is no
|
||||
following literal this optimization cannot be used. You can see the difference
|
||||
by comparing the behaviour of
|
||||
.sp
|
||||
(a+)*\ed
|
||||
.sp
|
||||
with the pattern above. The pattern above, which ends in a literal "b", gives a
failure almost instantly when applied to a whole line of "a" characters, whereas
the pattern ending in \ed takes an appreciable time with strings longer than
about 20 characters.
|
||||
.P
|
||||
In many cases, the solution to this kind of performance issue is to use an
|
||||
atomic group or a possessive quantifier.
|
||||
.P
|
||||
.in 0
|
||||
Last updated: 28 February 2005
|
||||
.br
|
||||
Copyright (c) 1997-2005 University of Cambridge.
|
226
libs/pcre/doc/pcreposix.3
Normal file
@ -0,0 +1,226 @@
|
||||
.TH PCREPOSIX 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions.
|
||||
.SH "SYNOPSIS OF POSIX API"
|
||||
.rs
|
||||
.sp
|
||||
.B #include <pcreposix.h>
|
||||
.PP
|
||||
.SM
|
||||
.br
|
||||
.B int regcomp(regex_t *\fIpreg\fP, const char *\fIpattern\fP,
|
||||
.ti +5n
|
||||
.B int \fIcflags\fP);
|
||||
.PP
|
||||
.br
|
||||
.B int regexec(regex_t *\fIpreg\fP, const char *\fIstring\fP,
|
||||
.ti +5n
|
||||
.B size_t \fInmatch\fP, regmatch_t \fIpmatch\fP[], int \fIeflags\fP);
|
||||
.PP
|
||||
.br
|
||||
.B size_t regerror(int \fIerrcode\fP, const regex_t *\fIpreg\fP,
|
||||
.ti +5n
|
||||
.B char *\fIerrbuf\fP, size_t \fIerrbuf_size\fP);
|
||||
.PP
|
||||
.br
|
||||
.B void regfree(regex_t *\fIpreg\fP);
|
||||
.
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
.sp
|
||||
This set of functions provides a POSIX-style API to the PCRE regular expression
|
||||
package. See the
|
||||
.\" HREF
|
||||
\fBpcreapi\fP
|
||||
.\"
|
||||
documentation for a description of PCRE's native API, which contains much
|
||||
additional functionality.
|
||||
.P
|
||||
The functions described here are just wrapper functions that ultimately call
|
||||
the PCRE native API. Their prototypes are defined in the \fBpcreposix.h\fP
|
||||
header file, and on Unix systems the library itself is called
|
||||
\fBlibpcreposix.a\fP, so it can be accessed by adding \fB-lpcreposix\fP to the
|
||||
command for linking an application that uses them. Because the POSIX functions
|
||||
call the native ones, it is also necessary to add \fB-lpcre\fP.
|
||||
.P
|
||||
I have implemented only those option bits that can be reasonably mapped to PCRE
|
||||
native options. In addition, the option REG_EXTENDED is defined with the value
|
||||
zero. This has no effect, but since programs that are written to the POSIX
|
||||
interface often use it, this makes it easier to slot in PCRE as a replacement
|
||||
library. Other POSIX options are not even defined.
|
||||
.P
|
||||
When PCRE is called via these functions, it is only the API that is POSIX-like
|
||||
in style. The syntax and semantics of the regular expressions themselves are
|
||||
still those of Perl, subject to the setting of various PCRE options, as
|
||||
described below. "POSIX-like in style" means that the API approximates to the
|
||||
POSIX definition; it is not fully POSIX-compatible, and in multi-byte encoding
|
||||
domains it is probably even less compatible.
|
||||
.P
|
||||
The header for these functions is supplied as \fBpcreposix.h\fP to avoid any
|
||||
potential clash with other POSIX libraries. It can, of course, be renamed or
|
||||
aliased as \fBregex.h\fP, which is the "correct" name. It provides two
|
||||
structure types, \fIregex_t\fP for compiled internal forms, and
|
||||
\fIregmatch_t\fP for returning captured substrings. It also defines some
|
||||
constants whose names start with "REG_"; these are used for setting options and
|
||||
identifying error codes.
|
||||
.P
|
||||
.SH "COMPILING A PATTERN"
|
||||
.rs
|
||||
.sp
|
||||
The function \fBregcomp()\fP is called to compile a pattern into an
|
||||
internal form. The pattern is a C string terminated by a binary zero, and
|
||||
is passed in the argument \fIpattern\fP. The \fIpreg\fP argument is a pointer
|
||||
to a \fBregex_t\fP structure that is used as a base for storing information
|
||||
about the compiled regular expression.
|
||||
.P
|
||||
The argument \fIcflags\fP is either zero, or contains one or more of the bits
|
||||
defined by the following macros:
|
||||
.sp
|
||||
REG_DOTALL
|
||||
.sp
|
||||
The PCRE_DOTALL option is set when the regular expression is passed for
|
||||
compilation to the native function. Note that REG_DOTALL is not part of the
|
||||
POSIX standard.
|
||||
.sp
|
||||
REG_ICASE
|
||||
.sp
|
||||
The PCRE_CASELESS option is set when the regular expression is passed for
|
||||
compilation to the native function.
|
||||
.sp
|
||||
REG_NEWLINE
|
||||
.sp
|
||||
The PCRE_MULTILINE option is set when the regular expression is passed for
|
||||
compilation to the native function. Note that this does \fInot\fP mimic the
|
||||
defined POSIX behaviour for REG_NEWLINE (see the following section).
|
||||
.sp
|
||||
REG_NOSUB
|
||||
.sp
|
||||
The PCRE_NO_AUTO_CAPTURE option is set when the regular expression is passed
|
||||
for compilation to the native function. In addition, when a pattern that is
|
||||
compiled with this flag is passed to \fBregexec()\fP for matching, the
|
||||
\fInmatch\fP and \fIpmatch\fP arguments are ignored, and no captured strings
|
||||
are returned.
|
||||
.sp
|
||||
REG_UTF8
|
||||
.sp
|
||||
The PCRE_UTF8 option is set when the regular expression is passed for
|
||||
compilation to the native function. This causes the pattern itself and all data
|
||||
strings used for matching it to be treated as UTF-8 strings. Note that REG_UTF8
|
||||
is not part of the POSIX standard.
|
||||
.P
|
||||
In the absence of these flags, no options are passed to the native function.
|
||||
This means that the regex is compiled with PCRE default semantics. In
|
||||
particular, the way it handles newline characters in the subject string is the
|
||||
Perl way, not the POSIX way. Note that setting PCRE_MULTILINE has only
|
||||
\fIsome\fP of the effects specified for REG_NEWLINE. It does not affect the way
|
||||
newlines are matched by . (they aren't) or by a negative class such as [^a]
|
||||
(they are).
|
||||
.P
|
||||
The yield of \fBregcomp()\fP is zero on success, and non-zero otherwise. The
|
||||
\fIpreg\fP structure is filled in on success, and one member of the structure
|
||||
is public: \fIre_nsub\fP contains the number of capturing subpatterns in
|
||||
the regular expression. Various error codes are defined in the header file.
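.P
As an illustration of the compile step and the public \fIre_nsub\fP member,
here is a minimal sketch. It assumes the system <regex.h> so that it builds
without PCRE installed; with the wrapper described here the header would be
\fBpcreposix.h\fP. The function name count_subpatterns is invented for the
example. REG_EXTENDED is passed because a native POSIX library needs it; in
\fBpcreposix.h\fP it is defined as zero, as noted above.

```c
#include <regex.h>   /* with PCRE's wrapper this would be <pcreposix.h> */
#include <stddef.h>

/* Compile pattern and report the number of capturing subpatterns.
   Returns the regcomp() result: zero on success, non-zero otherwise. */
static int count_subpatterns(const char *pattern, int cflags, size_t *nsub)
{
    regex_t re;
    int rc = regcomp(&re, pattern, cflags);

    if (rc != 0)
        return rc;            /* preg is not usable after a failure */
    *nsub = re.re_nsub;       /* the one public member of regex_t */
    regfree(&re);             /* release the memory tied to preg */
    return 0;
}
```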
|
||||
.
|
||||
.
|
||||
.SH "MATCHING NEWLINE CHARACTERS"
|
||||
.rs
|
||||
.sp
|
||||
This area is not simple, because POSIX and Perl take different views of things.
|
||||
It is not possible to get PCRE to obey POSIX semantics, but then PCRE was never
|
||||
intended to be a POSIX engine. The following table lists the different
|
||||
possibilities for matching newline characters in PCRE:
|
||||
.sp
|
||||
                          Default   Change with
.sp
  . matches newline          no     PCRE_DOTALL
  newline matches [^a]       yes    not changeable
  $ matches \en at end        yes    PCRE_DOLLAR_ENDONLY
  $ matches \en in middle     no     PCRE_MULTILINE
  ^ matches \en in middle     no     PCRE_MULTILINE
|
||||
.sp
|
||||
This is the equivalent table for POSIX:
|
||||
.sp
|
||||
                          Default   Change with
.sp
  . matches newline          yes    REG_NEWLINE
  newline matches [^a]       yes    REG_NEWLINE
  $ matches \en at end        no     REG_NEWLINE
  $ matches \en in middle     no     REG_NEWLINE
  ^ matches \en in middle     no     REG_NEWLINE
|
||||
.sp
|
||||
PCRE's behaviour is the same as Perl's, except that there is no equivalent for
|
||||
PCRE_DOLLAR_ENDONLY in Perl. In both PCRE and Perl, there is no way to stop
|
||||
newline from matching [^a].
|
||||
.P
|
||||
The default POSIX newline handling can be obtained by setting PCRE_DOTALL and
|
||||
PCRE_DOLLAR_ENDONLY, but there is no way to make PCRE behave exactly as for the
|
||||
REG_NEWLINE action.
|
||||
.
|
||||
.
|
||||
.SH "MATCHING A PATTERN"
|
||||
.rs
|
||||
.sp
|
||||
The function \fBregexec()\fP is called to match a compiled pattern \fIpreg\fP
|
||||
against a given \fIstring\fP, which is terminated by a zero byte, subject to
|
||||
the options in \fIeflags\fP. These can be:
|
||||
.sp
|
||||
REG_NOTBOL
|
||||
.sp
|
||||
The PCRE_NOTBOL option is set when calling the underlying PCRE matching
|
||||
function.
|
||||
.sp
|
||||
REG_NOTEOL
|
||||
.sp
|
||||
The PCRE_NOTEOL option is set when calling the underlying PCRE matching
|
||||
function.
|
||||
.P
|
||||
If the pattern was compiled with the REG_NOSUB flag, no data about any matched
|
||||
strings is returned. The \fInmatch\fP and \fIpmatch\fP arguments of
|
||||
\fBregexec()\fP are ignored.
|
||||
.P
|
||||
Otherwise, the portion of the string that was matched, and also any captured
|
||||
substrings, are returned via the \fIpmatch\fP argument, which points to an
|
||||
array of \fInmatch\fP structures of type \fIregmatch_t\fP, containing the
|
||||
members \fIrm_so\fP and \fIrm_eo\fP. These contain the offset to the first
|
||||
character of each substring and the offset to the first character after the end
|
||||
of each substring, respectively. The 0th element of the vector relates to the
|
||||
entire portion of \fIstring\fP that was matched; subsequent elements relate to
|
||||
the capturing subpatterns of the regular expression. Unused entries in the
|
||||
array have both structure members set to -1.
|
||||
.P
|
||||
A successful match yields a zero return; various error codes are defined in the
|
||||
header file, of which REG_NOMATCH is the "expected" failure code.
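.P
The way offsets come back through \fIpmatch\fP can be sketched like this.
Again the system <regex.h> is assumed so that the code builds without PCRE;
the names NMATCH and match_offsets are invented for the example, and the
vector size of 10 is an arbitrary choice.

```c
#include <regex.h>   /* with PCRE's wrapper this would be <pcreposix.h> */
#include <stddef.h>

#define NMATCH 10    /* room for the whole match plus nine subpatterns */

/* Match pattern against subject; on success copy offset pair number
   `group` into *so and *eo (both are -1 for an unused entry). */
static int match_offsets(const char *pattern, const char *subject,
                         size_t group, long *so, long *eo)
{
    regex_t re;
    regmatch_t pmatch[NMATCH];
    int rc = regcomp(&re, pattern, REG_EXTENDED);

    if (rc != 0)
        return rc;
    rc = regexec(&re, subject, NMATCH, pmatch, 0);
    if (rc == 0 && group < NMATCH) {
        *so = (long)pmatch[group].rm_so;   /* first character of the substring */
        *eo = (long)pmatch[group].rm_eo;   /* first character after its end */
    }
    regfree(&re);
    return rc;    /* zero, REG_NOMATCH, or another error code */
}
```

.P
Element 0 describes the whole match, so the length of the matched text is
\fIrm_eo\fP minus \fIrm_so\fP for that element.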
|
||||
.
|
||||
.
|
||||
.SH "ERROR MESSAGES"
|
||||
.rs
|
||||
.sp
|
||||
The \fBregerror()\fP function maps a non-zero errorcode from either
|
||||
\fBregcomp()\fP or \fBregexec()\fP to a printable message. If \fIpreg\fP is not
|
||||
NULL, the error should have arisen from the use of that structure. A message
|
||||
terminated by a binary zero is placed in \fIerrbuf\fP. The length of the
|
||||
message, including the zero, is limited to \fIerrbuf_size\fP. The yield of the
|
||||
function is the size of buffer needed to hold the whole message.
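.P
Because the yield is the buffer size needed, a message of any length can be
fetched in two calls. The following is a sketch only, assuming the system
<regex.h>; the helper name regex_message is invented here.

```c
#include <regex.h>   /* with PCRE's wrapper this would be <pcreposix.h> */
#include <stdlib.h>

/* Return a malloc()ed copy of the message for errcode, or NULL if
   memory runs out. The caller frees the result. */
static char *regex_message(int errcode, const regex_t *preg)
{
    char probe[1];
    /* First call: the message is truncated, but the yield is the size
       needed for the whole message, including the terminating zero. */
    size_t needed = regerror(errcode, preg, probe, sizeof probe);
    char *msg = malloc(needed);

    if (msg != NULL)
        regerror(errcode, preg, msg, needed);   /* second call: full text */
    return msg;
}
```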
|
||||
.
|
||||
.
|
||||
.SH "MEMORY USAGE"
|
||||
.rs
|
||||
.sp
|
||||
Compiling a regular expression causes memory to be allocated and associated
|
||||
with the \fIpreg\fP structure. The function \fBregfree()\fP frees all such
|
||||
memory, after which \fIpreg\fP may no longer be used as a compiled expression.
|
||||
.
|
||||
.
|
||||
.SH AUTHOR
|
||||
.rs
|
||||
.sp
|
||||
Philip Hazel
|
||||
.br
|
||||
University Computing Service,
|
||||
.br
|
||||
Cambridge CB2 3QG, England.
|
||||
.P
|
||||
.in 0
|
||||
Last updated: 16 January 2006
|
||||
.br
|
||||
Copyright (c) 1997-2006 University of Cambridge.
|
131
libs/pcre/doc/pcreprecompile.3
Normal file
@ -0,0 +1,131 @@
|
||||
.TH PCREPRECOMPILE 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH "SAVING AND RE-USING PRECOMPILED PCRE PATTERNS"
|
||||
.rs
|
||||
.sp
|
||||
If you are running an application that uses a large number of regular
|
||||
expression patterns, it may be useful to store them in a precompiled form
|
||||
instead of having to compile them every time the application is run.
|
||||
If you are not using any private character tables (see the
|
||||
.\" HREF
|
||||
\fBpcre_maketables()\fP
|
||||
.\"
|
||||
documentation), this is relatively straightforward. If you are using private
|
||||
tables, it is a little bit more complicated.
|
||||
.P
|
||||
If you save compiled patterns to a file, you can copy them to a different host
|
||||
and run them there. This works even if the new host has the opposite endianness
|
||||
to the one on which the patterns were compiled. There may be a small
|
||||
performance penalty, but it should be insignificant.
|
||||
.
|
||||
.
|
||||
.SH "SAVING A COMPILED PATTERN"
|
||||
.rs
|
||||
.sp
|
||||
The value returned by \fBpcre_compile()\fP points to a single block of memory
|
||||
that holds the compiled pattern and associated data. You can find the length of
|
||||
this block in bytes by calling \fBpcre_fullinfo()\fP with an argument of
|
||||
PCRE_INFO_SIZE. You can then save the data in any appropriate manner. Here is
|
||||
sample code that compiles a pattern and writes it to a file. It assumes that
|
||||
the variable \fIfd\fP refers to a file that is open for output:
|
||||
.sp
|
||||
  int erroroffset, rc;
  size_t size;
  const char *error;
  pcre *re;
.sp
  re = pcre_compile("my pattern", 0, &error, &erroroffset, NULL);
  if (re == NULL) { ... handle errors ... }
  rc = pcre_fullinfo(re, NULL, PCRE_INFO_SIZE, &size);
  if (rc < 0) { ... handle errors ... }
  if (fwrite(re, 1, size, fd) != size) { ... handle errors ... }
|
||||
.sp
|
||||
In this example, the bytes that comprise the compiled pattern are copied
|
||||
exactly. Note that this is binary data that may contain any of the 256 possible
|
||||
byte values. On systems that make a distinction between binary and non-binary
|
||||
data, be sure that the file is opened for binary output.
|
||||
.P
|
||||
If you want to write more than one pattern to a file, you will have to devise a
|
||||
way of separating them. For binary data, preceding each pattern with its length
|
||||
is probably the most straightforward approach. Another possibility is to write
|
||||
out the data in hexadecimal instead of binary, one pattern to a line.
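.P
The length-prefix approach might look like the sketch below. It deals only in
opaque blocks of bytes and makes no PCRE calls; the names write_block and
read_block, the 4-byte header, and its big-endian byte order are all
assumptions made for this example. A fixed byte order is chosen for the
header so that the file can be read on a host of either endianness.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Write a 4-byte big-endian length followed by the block itself.
   Returns 0 on success, -1 on a write error. */
static int write_block(FILE *fp, const void *data, uint32_t size)
{
    unsigned char hdr[4];

    hdr[0] = (unsigned char)(size >> 24);
    hdr[1] = (unsigned char)(size >> 16);
    hdr[2] = (unsigned char)(size >> 8);
    hdr[3] = (unsigned char)size;
    if (fwrite(hdr, 1, 4, fp) != 4)
        return -1;
    if (size > 0 && fwrite(data, 1, size, fp) != size)
        return -1;
    return 0;
}

/* Read the next block into malloc()ed memory and store its length in
   *size_out. Returns NULL at end of file or on any error. */
static void *read_block(FILE *fp, uint32_t *size_out)
{
    unsigned char hdr[4];
    uint32_t size;
    void *data;

    if (fread(hdr, 1, 4, fp) != 4)
        return NULL;
    size = ((uint32_t)hdr[0] << 24) | ((uint32_t)hdr[1] << 16) |
           ((uint32_t)hdr[2] << 8) | (uint32_t)hdr[3];
    data = malloc(size > 0 ? size : 1);
    if (data == NULL)
        return NULL;
    if (size > 0 && fread(data, 1, size, fp) != size) {
        free(data);
        return NULL;
    }
    *size_out = size;
    return data;
}
```

.P
The file must be open in binary mode on systems that make the distinction. A
block read back this way sits in memory obtained from malloc(), which is
suitably aligned for any object.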
|
||||
.P
|
||||
Saving compiled patterns in a file is only one possible way of storing them for
|
||||
later use. They could equally well be saved in a database, or in the memory of
|
||||
some daemon process that passes them via sockets to the processes that want
|
||||
them.
|
||||
.P
|
||||
If the pattern has been studied, it is also possible to save the study data in
|
||||
a similar way to the compiled pattern itself. When studying generates
|
||||
additional information, \fBpcre_study()\fP returns a pointer to a
|
||||
\fBpcre_extra\fP data block. Its format is defined in the
|
||||
.\" HTML <a href="pcreapi.html#extradata">
|
||||
.\" </a>
|
||||
section on matching a pattern
|
||||
.\"
|
||||
in the
|
||||
.\" HREF
|
||||
\fBpcreapi\fP
|
||||
.\"
|
||||
documentation. The \fIstudy_data\fP field points to the binary study data, and
|
||||
this is what you must save (not the \fBpcre_extra\fP block itself). The length
|
||||
of the study data can be obtained by calling \fBpcre_fullinfo()\fP with an
|
||||
argument of PCRE_INFO_STUDYSIZE. Remember to check that \fBpcre_study()\fP did
|
||||
return a non-NULL value before trying to save the study data.
|
||||
.
|
||||
.
|
||||
.SH "RE-USING A PRECOMPILED PATTERN"
|
||||
.rs
|
||||
.sp
|
||||
Re-using a precompiled pattern is straightforward. Having reloaded it into main
|
||||
memory, you pass its pointer to \fBpcre_exec()\fP or \fBpcre_dfa_exec()\fP in
|
||||
the usual way. This should work even on another host, and even if that host has
|
||||
the opposite endianness to the one where the pattern was compiled.
|
||||
.P
|
||||
However, if you passed a pointer to custom character tables when the pattern
|
||||
was compiled (the \fItableptr\fP argument of \fBpcre_compile()\fP), you must
|
||||
now pass a similar pointer to \fBpcre_exec()\fP or \fBpcre_dfa_exec()\fP,
|
||||
because the value saved with the compiled pattern will obviously be nonsense. A
|
||||
field in a \fBpcre_extra\fP block is used to pass this data, as described in
|
||||
the
|
||||
.\" HTML <a href="pcreapi.html#extradata">
|
||||
.\" </a>
|
||||
section on matching a pattern
|
||||
.\"
|
||||
in the
|
||||
.\" HREF
|
||||
\fBpcreapi\fP
|
||||
.\"
|
||||
documentation.
|
||||
.P
|
||||
If you did not provide custom character tables when the pattern was compiled,
|
||||
the pointer in the compiled pattern is NULL, which causes \fBpcre_exec()\fP to
|
||||
use PCRE's internal tables. Thus, you do not need to take any special action at
|
||||
run time in this case.
|
||||
.P
|
||||
If you saved study data with the compiled pattern, you need to create your own
|
||||
\fBpcre_extra\fP data block and set the \fIstudy_data\fP field to point to the
|
||||
reloaded study data. You must also set the PCRE_EXTRA_STUDY_DATA bit in the
|
||||
\fIflags\fP field to indicate that study data is present. Then pass the
|
||||
\fBpcre_extra\fP block to \fBpcre_exec()\fP or \fBpcre_dfa_exec()\fP in the
|
||||
usual way.
|
||||
.
|
||||
.
|
||||
.SH "COMPATIBILITY WITH DIFFERENT PCRE RELEASES"
|
||||
.rs
|
||||
.sp
|
||||
The layout of the control block that is at the start of the data that makes up
|
||||
a compiled pattern was changed for release 5.0. If you have any saved patterns
|
||||
that were compiled with previous releases (not a facility that was previously
|
||||
advertised), you will have to recompile them for release 5.0. However, from now
|
||||
on, it should be possible to make changes in a compatible manner.
|
||||
.P
|
||||
Notwithstanding the above, if you have any saved patterns in UTF-8 mode that
|
||||
use \ep or \eP that were compiled with any release up to and including 6.4, you
|
||||
will have to recompile them for release 6.5 and above.
|
||||
.P
|
||||
.in 0
|
||||
Last updated: 01 February 2006
|
||||
.br
|
||||
Copyright (c) 1997-2006 University of Cambridge.
|
66
libs/pcre/doc/pcresample.3
Normal file
@ -0,0 +1,66 @@
|
||||
.TH PCRESAMPLE 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH "PCRE SAMPLE PROGRAM"
|
||||
.rs
|
||||
.sp
|
||||
A simple, complete demonstration program, to get you started with using PCRE,
|
||||
is supplied in the file \fIpcredemo.c\fP in the PCRE distribution.
|
||||
.P
|
||||
The program compiles the regular expression that is its first argument, and
|
||||
matches it against the subject string in its second argument. No PCRE options
|
||||
are set, and default character tables are used. If matching succeeds, the
|
||||
program outputs the portion of the subject that matched, together with the
|
||||
contents of any captured substrings.
|
||||
.P
|
||||
If the -g option is given on the command line, the program then goes on to
|
||||
check for further matches of the same regular expression in the same subject
|
||||
string. The logic is a little bit tricky because of the possibility of matching
|
||||
an empty string. Comments in the code explain what is going on.
|
||||
.P
|
||||
If PCRE is installed in the standard include and library directories for your
|
||||
system, you should be able to compile the demonstration program using this
|
||||
command:
|
||||
.sp
|
||||
gcc -o pcredemo pcredemo.c -lpcre
|
||||
.sp
|
||||
If PCRE is installed elsewhere, you may need to add additional options to the
|
||||
command line. For example, on a Unix-like system that has PCRE installed in
|
||||
\fI/usr/local\fP, you can compile the demonstration program using a command
|
||||
like this:
|
||||
.sp
|
||||
.\" JOINSH
|
||||
  gcc -o pcredemo -I/usr/local/include pcredemo.c \e
      -L/usr/local/lib -lpcre
|
||||
.sp
|
||||
Once you have compiled the demonstration program, you can run simple tests like
|
||||
this:
|
||||
.sp
|
||||
  ./pcredemo 'cat|dog' 'the cat sat on the mat'
  ./pcredemo -g 'cat|dog' 'the dog sat on the cat'
|
||||
.sp
|
||||
Note that there is a much more comprehensive test program, called
|
||||
.\" HREF
|
||||
\fBpcretest\fP,
|
||||
.\"
|
||||
which supports many more facilities for testing regular expressions and the
|
||||
PCRE library. The \fBpcredemo\fP program is provided as a simple coding
|
||||
example.
|
||||
.P
|
||||
On some operating systems (e.g. Solaris), when PCRE is not installed in the
|
||||
standard library directory, you may get an error like this when you try to run
|
||||
\fBpcredemo\fP:
|
||||
.sp
|
||||
ld.so.1: a.out: fatal: libpcre.so.0: open failed: No such file or directory
|
||||
.sp
|
||||
This is caused by the way shared library support works on those systems. You
|
||||
need to add
|
||||
.sp
|
||||
-R/usr/local/lib
|
||||
.sp
|
||||
(for example) to the compile command to get round this problem.
|
||||
.P
|
||||
.in 0
|
||||
Last updated: 09 September 2004
|
||||
.br
|
||||
Copyright (c) 1997-2004 University of Cambridge.
|
115
libs/pcre/doc/pcrestack.3
Normal file
@ -0,0 +1,115 @@
|
||||
.TH PCRESTACK 3
|
||||
.SH NAME
|
||||
PCRE - Perl-compatible regular expressions
|
||||
.SH "PCRE DISCUSSION OF STACK USAGE"
|
||||
.rs
|
||||
.sp
|
||||
When you call \fBpcre_exec()\fP, it makes use of an internal function called
|
||||
\fBmatch()\fP. This calls itself recursively at branch points in the pattern,
|
||||
in order to remember the state of the match so that it can back up and try a
|
||||
different alternative if the first one fails. As matching proceeds deeper and
|
||||
deeper into the tree of possibilities, the recursion depth increases.
|
||||
.P
|
||||
Not all calls of \fBmatch()\fP increase the recursion depth; for an item such
|
||||
as a* it may be called several times at the same level, after matching
|
||||
different numbers of a's. Furthermore, in a number of cases where the result of
|
||||
the recursive call would immediately be passed back as the result of the
|
||||
current call (a "tail recursion"), the function is just restarted instead.
|
||||
.P
|
||||
The \fBpcre_dfa_exec()\fP function operates in an entirely different way, and
|
||||
hardly uses recursion at all. The limit on its complexity is the amount of
|
||||
workspace it is given. The comments that follow do NOT apply to
|
||||
\fBpcre_dfa_exec()\fP; they are relevant only for \fBpcre_exec()\fP.
|
||||
.P
|
||||
You can set limits on the number of times that \fBmatch()\fP is called, both in
|
||||
total and recursively. If the limit is exceeded, an error occurs. For details,
|
||||
see the
|
||||
.\" HTML <a href="pcreapi.html#extradata">
|
||||
.\" </a>
|
||||
section on extra data for \fBpcre_exec()\fP
|
||||
.\"
|
||||
in the
|
||||
.\" HREF
|
||||
\fBpcreapi\fP
|
||||
.\"
|
||||
documentation.
|
||||
.P
|
||||
Each time that \fBmatch()\fP is actually called recursively, it uses memory
|
||||
from the process stack. For certain kinds of pattern and data, very large
|
||||
amounts of stack may be needed, despite the recognition of "tail recursion".
|
||||
You can often reduce the amount of recursion, and therefore the amount of stack
|
||||
used, by modifying the pattern that is being matched. Consider, for example,
|
||||
this pattern:
|
||||
.sp
|
||||
([^<]|<(?!inet))+
|
||||
.sp
|
||||
It matches from wherever it starts until it encounters "<inet" or the end of
|
||||
the data, and is the kind of pattern that might be used when processing an XML
|
||||
file. Each iteration of the outer parentheses matches either one character that
|
||||
is not "<" or a "<" that is not followed by "inet". However, each time a
|
||||
parenthesis is processed, a recursion occurs, so this formulation uses a stack
|
||||
frame for each matched character. For a long string, a lot of stack is
|
||||
required. Consider now this rewritten pattern, which matches exactly the same
|
||||
strings:
|
||||
.sp
|
||||
([^<]++|<(?!inet))
|
||||
.sp
|
||||
This uses very much less stack, because runs of characters that do not contain
|
||||
"<" are "swallowed" in one item inside the parentheses. Recursion happens only
|
||||
when a "<" character that is not followed by "inet" is encountered (and we
|
||||
assume this is relatively rare). A possessive quantifier is used to stop any
|
||||
backtracking into the runs of non-"<" characters, but that is not related to
|
||||
stack usage.
|
||||
.P
|
||||
In environments where stack memory is constrained, you might want to compile
|
||||
PCRE to use heap memory instead of stack for remembering back-up points. This
|
||||
makes it run a lot more slowly, however. Details of how to do this are given in
|
||||
the
|
||||
.\" HREF
|
||||
\fBpcrebuild\fP
|
||||
.\"
|
||||
documentation.
|
||||
.P
|
||||
In Unix-like environments, there is not often a problem with the stack, though
|
||||
the default limit on stack size varies from system to system. Values from 8Mb
|
||||
to 64Mb are common. You can find your default limit by running the command:
|
||||
.sp
|
||||
ulimit -s
|
||||
.sp
|
||||
The effect of running out of stack is often SIGSEGV, though sometimes an error
|
||||
message is given. You can normally increase the limit on stack size by code
|
||||
such as this:
|
||||
.sp
|
||||
  struct rlimit rlim;
  getrlimit(RLIMIT_STACK, &rlim);
  rlim.rlim_cur = 100*1024*1024;
  setrlimit(RLIMIT_STACK, &rlim);
|
||||
.sp
|
||||
This reads the current limits (soft and hard) using \fBgetrlimit()\fP, then
|
||||
attempts to increase the soft limit to 100Mb using \fBsetrlimit()\fP. You must
|
||||
do this before calling \fBpcre_exec()\fP.
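.P
A slightly fuller sketch of the same idea checks the return values and keeps
the request within the hard limit, since an unprivileged process cannot raise
its soft limit beyond that. The function name raise_stack_limit is invented
for the example, and the policy of never lowering an existing limit is an
assumption, not something PCRE requires.

```c
#include <sys/resource.h>

/* Try to raise the soft stack limit towards `wanted` bytes, capped at the
   hard limit. Returns 0 on success (or if the limit was already at least
   `wanted`), -1 if a system call failed. */
static int raise_stack_limit(rlim_t wanted)
{
    struct rlimit rlim;

    if (getrlimit(RLIMIT_STACK, &rlim) != 0)
        return -1;
    if (rlim.rlim_cur == RLIM_INFINITY || rlim.rlim_cur >= wanted)
        return 0;                    /* already large enough */
    if (rlim.rlim_max != RLIM_INFINITY && wanted > rlim.rlim_max)
        wanted = rlim.rlim_max;      /* soft limit cannot exceed hard limit */
    rlim.rlim_cur = wanted;
    return setrlimit(RLIMIT_STACK, &rlim) == 0 ? 0 : -1;
}
```

.P
A call such as raise_stack_limit((rlim_t)100 * 1024 * 1024) corresponds to the
100Mb request shown above.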
|
||||
.P
|
||||
PCRE has an internal counter that can be used to limit the depth of recursion,
|
||||
and thus cause \fBpcre_exec()\fP to give an error code before it runs out of
|
||||
stack. By default, the limit is very large, and unlikely ever to operate. It
|
||||
can be changed when PCRE is built, and it can also be set when
|
||||
\fBpcre_exec()\fP is called. For details of these interfaces, see the
|
||||
.\" HREF
|
||||
\fBpcrebuild\fP
|
||||
.\"
|
||||
and
|
||||
.\" HREF
|
||||
\fBpcreapi\fP
|
||||
.\"
|
||||
documentation.
|
||||
.P
|
||||
As a very rough rule of thumb, you should reckon on about 500 bytes per
|
||||
recursion. Thus, if you want to limit your stack usage to 8Mb, you
|
||||
should set the limit at 16000 recursions. A 64Mb stack, on the other hand, can
|
||||
support around 128000 recursions. The \fBpcretest\fP test program has a command
|
||||
line option (\fB-S\fP) that can be used to increase its stack.
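.P
The rule of thumb is simple division, shown here as a sketch. The figure of
500 bytes is the rough estimate given above, not a measured value, and the
names BYTES_PER_RECURSION and recursion_limit_for_stack are invented for the
example.

```c
#define BYTES_PER_RECURSION 500UL   /* rough rule of thumb from the text */

/* Estimate how many recursions fit in stack_bytes of stack. */
static unsigned long recursion_limit_for_stack(unsigned long stack_bytes)
{
    return stack_bytes / BYTES_PER_RECURSION;
}
```

.P
The figures quoted above correspond to counting 8Mb as 8000000 bytes, which
gives 16000; counting it as 8*1024*1024 bytes gives 16777. Either way the
result is an estimate, and the rest of the program also needs stack, so some
margin is advisable.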
|
||||
.P
|
||||
.in 0
|
||||
Last updated: 29 June 2006
|
||||
.br
|
||||
Copyright (c) 1997-2006 University of Cambridge.
|
631
libs/pcre/doc/pcretest.1
Normal file
@ -0,0 +1,631 @@
|
||||
.TH PCRETEST 1
|
||||
.SH NAME
|
||||
pcretest - a program for testing Perl-compatible regular expressions.
|
||||
.SH SYNOPSIS
|
||||
.rs
|
||||
.sp
|
||||
.B pcretest "[options] [source] [destination]"
|
||||
.sp
|
||||
\fBpcretest\fP was written as a test program for the PCRE regular expression
|
||||
library itself, but it can also be used for experimenting with regular
|
||||
expressions. This document describes the features of the test program; for
|
||||
details of the regular expressions themselves, see the
|
||||
.\" HREF
|
||||
\fBpcrepattern\fP
|
||||
.\"
|
||||
documentation. For details of the PCRE library function calls and their
|
||||
options, see the
|
||||
.\" HREF
|
||||
\fBpcreapi\fP
|
||||
.\"
|
||||
documentation.
|
||||
.
|
||||
.
|
||||
.SH OPTIONS
|
||||
.rs
|
||||
.TP 10
|
||||
\fB-C\fP
|
||||
Output the version number of the PCRE library, and all available information
|
||||
about the optional features that are included, and then exit.
|
||||
.TP 10
|
||||
\fB-d\fP
|
||||
Behave as if each regex has the \fB/D\fP (debug) modifier; the internal
|
||||
form is output after compilation.
|
||||
.TP 10
|
||||
\fB-dfa\fP
|
||||
Behave as if each data line contains the \eD escape sequence; this causes the
|
||||
alternative matching function, \fBpcre_dfa_exec()\fP, to be used instead of the
|
||||
standard \fBpcre_exec()\fP function (more detail is given below).
|
||||
.TP 10
|
||||
\fB-i\fP
|
||||
Behave as if each regex has the \fB/I\fP modifier; information about the
|
||||
compiled pattern is given after compilation.
|
||||
.TP 10
|
||||
\fB-m\fP
|
||||
Output the size of each compiled pattern after it has been compiled. This is
|
||||
equivalent to adding \fB/M\fP to each regular expression. For compatibility
|
||||
with earlier versions of pcretest, \fB-s\fP is a synonym for \fB-m\fP.
|
||||
.TP 10
|
||||
\fB-o\fP \fIosize\fP
|
||||
Set the number of elements in the output vector that is used when calling
|
||||
\fBpcre_exec()\fP to be \fIosize\fP. The default value is 45, which is enough
|
||||
for 14 capturing subexpressions. The vector size can be changed for individual
|
||||
matching calls by including \eO in the data line (see below).
|
||||
.TP 10
|
||||
\fB-p\fP
|
||||
Behave as if each regex has the \fB/P\fP modifier; the POSIX wrapper API is
|
||||
used to call PCRE. None of the other options has any effect when \fB-p\fP is
|
||||
set.
|
||||
.TP 10
|
||||
\fB-q\fP
|
||||
Do not output the version number of \fBpcretest\fP at the start of execution.
|
||||
.TP 10
|
||||
\fB-S\fP \fIsize\fP
|
||||
On Unix-like systems, set the size of the runtime stack to \fIsize\fP
|
||||
megabytes.
|
||||
.TP 10
|
||||
\fB-t\fP
|
||||
Run each compile, study, and match many times with a timer, and output
|
||||
resulting time per compile or match (in milliseconds). Do not set \fB-m\fP with
|
||||
\fB-t\fP, because you will then get the size output a zillion times, and the
|
||||
timing will be distorted.
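The default of 45 for \fB-o\fP, "enough for 14 capturing subexpressions", follows from the way \fBpcre_exec()\fP uses its output vector: two offsets per string (string 0 being the whole match), plus a final third of the vector kept as workspace. A minimal Python sketch of that arithmetic, with hypothetical helper names:

```python
def ovector_size(capturing_subexpressions):
    """Vector elements needed to return the whole match plus every capture."""
    # (captures + 1) strings, each taking 2 offsets, plus one third workspace
    return 3 * (capturing_subexpressions + 1)


def max_captures(osize):
    """Capturing subexpressions that fit in a vector of osize elements."""
    if osize < 3:
        raise ValueError("need at least 3 elements to report the whole match")
    return osize // 3 - 1


print(ovector_size(14))  # 45
print(max_captures(45))  # 14
```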
|
||||
.
|
||||
.
|
||||
.SH DESCRIPTION
|
||||
.rs
|
||||
.sp
|
||||
If \fBpcretest\fP is given two filename arguments, it reads from the first and
|
||||
writes to the second. If it is given only one filename argument, it reads from
|
||||
that file and writes to stdout. Otherwise, it reads from stdin and writes to
|
||||
stdout, and prompts for each line of input, using "re>" to prompt for regular
|
||||
expressions, and "data>" to prompt for data lines.
|
||||
.P
|
||||
The program handles any number of sets of input on a single input file. Each
|
||||
set starts with a regular expression, and continues with any number of data
|
||||
lines to be matched against the pattern.
|
||||
.P
|
||||
Each data line is matched separately and independently. If you want to do
|
||||
multi-line matches, you have to use the \en escape sequence (or \er or \er\en,
|
||||
depending on the newline setting) in a single line of input to encode the
|
||||
newline characters. There is no limit on the length of data lines; the input
|
||||
buffer is automatically extended if it is too small.
|
||||
.P
|
||||
An empty line signals the end of the data lines, at which point a new regular
|
||||
expression is read. The regular expressions are given enclosed in any
|
||||
non-alphanumeric delimiters other than backslash, for example:
|
||||
.sp
|
||||
/(a|bc)x+yz/
|
||||
.sp
|
||||
White space before the initial delimiter is ignored. A regular expression may
|
||||
be continued over several input lines, in which case the newline characters are
|
||||
included within it. It is possible to include the delimiter within the pattern
|
||||
by escaping it, for example
|
||||
.sp
|
||||
/abc\e/def/
|
||||
.sp
|
||||
If you do so, the escape and the delimiter form part of the pattern, but since
|
||||
delimiters are always non-alphanumeric, this does not affect its interpretation.
|
||||
If the terminating delimiter is immediately followed by a backslash, for
|
||||
example,
|
||||
.sp
|
||||
/abc/\e
|
||||
.sp
|
||||
then a backslash is added to the end of the pattern. This is done to provide a
|
||||
way of testing the error condition that arises if a pattern finishes with a
|
||||
backslash, because
|
||||
.sp
|
||||
/abc\e/
|
||||
.sp
|
||||
is interpreted as the first line of a pattern that starts with "abc/", causing
|
||||
pcretest to read the next line as a continuation of the regular expression.
|
||||
.
|
||||
.
|
||||
.SH "PATTERN MODIFIERS"
|
||||
.rs
|
||||
.sp
|
||||
A pattern may be followed by any number of modifiers, which are mostly single
|
||||
characters. Following Perl usage, these are referred to below as, for example,
|
||||
"the \fB/i\fP modifier", even though the delimiter of the pattern need not
|
||||
always be a slash, and no slash is used when writing modifiers. Whitespace may
|
||||
appear between the final pattern delimiter and the first modifier, and between
|
||||
the modifiers themselves.
|
||||
.P
|
||||
The \fB/i\fP, \fB/m\fP, \fB/s\fP, and \fB/x\fP modifiers set the PCRE_CASELESS,
|
||||
PCRE_MULTILINE, PCRE_DOTALL, and PCRE_EXTENDED options, respectively, when
|
||||
\fBpcre_compile()\fP is called. These four modifier letters have the same
|
||||
effect as they do in Perl. For example:
|
||||
.sp
|
||||
/caseless/i
|
||||
.sp
|
||||
The following table shows additional modifiers for setting PCRE options that do
|
||||
not correspond to anything in Perl:
|
||||
.sp
|
||||
\fB/A\fP PCRE_ANCHORED
|
||||
\fB/C\fP PCRE_AUTO_CALLOUT
|
||||
\fB/E\fP PCRE_DOLLAR_ENDONLY
|
||||
\fB/f\fP PCRE_FIRSTLINE
|
||||
\fB/J\fP PCRE_DUPNAMES
|
||||
\fB/N\fP PCRE_NO_AUTO_CAPTURE
|
||||
\fB/U\fP PCRE_UNGREEDY
|
||||
\fB/X\fP PCRE_EXTRA
|
||||
\fB/<cr>\fP PCRE_NEWLINE_CR
|
||||
\fB/<lf>\fP PCRE_NEWLINE_LF
|
||||
\fB/<crlf>\fP PCRE_NEWLINE_CRLF
|
||||
.sp
|
||||
Those specifying line endings are literal strings as shown. Details of the
|
||||
meanings of these PCRE options are given in the
|
||||
.\" HREF
|
||||
\fBpcreapi\fP
|
||||
.\"
|
||||
documentation.
|
||||
.
|
||||
.
|
||||
.SS "Finding all matches in a string"
|
||||
.rs
|
||||
.sp
|
||||
Searching for all possible matches within each subject string can be requested
|
||||
by the \fB/g\fP or \fB/G\fP modifier. After finding a match, PCRE is called
|
||||
again to search the remainder of the subject string. The difference between
|
||||
\fB/g\fP and \fB/G\fP is that the former uses the \fIstartoffset\fP argument to
|
||||
\fBpcre_exec()\fP to start searching at a new point within the entire string
|
||||
(which is in effect what Perl does), whereas the latter passes over a shortened
|
||||
substring. This makes a difference to the matching process if the pattern
|
||||
begins with a lookbehind assertion (including \eb or \eB).
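The difference between the two strategies can be imitated with Python's \fBre\fP module, which is not PCRE but behaves similarly here: \fBsearch(subject, pos)\fP lets \eB see the character before \fIpos\fP, much like a start offset, whereas searching a slice hides it. This is a sketch under that assumption; the function names are invented, and empty matches are simply skipped by one character rather than retried as PCRE does.

```python
import re


def find_all_with_offset(pattern, subject):
    """Imitate /g: keep the whole subject and move the start offset."""
    rx = re.compile(pattern)
    found, pos = [], 0
    while pos <= len(subject):
        m = rx.search(subject, pos)
        if m is None:
            break
        found.append(m.group(0))
        # Simplification: step past an empty match instead of retrying it.
        pos = m.end() if m.end() > m.start() else m.end() + 1
    return found


def find_all_with_slices(pattern, subject):
    """Imitate /G: search a shortened substring each time."""
    rx = re.compile(pattern)
    found, rest = [], subject
    while True:
        m = rx.search(rest)
        if m is None:
            break
        found.append(m.group(0))
        cut = m.end() if m.end() > m.start() else m.end() + 1
        if cut > len(rest):
            break
        rest = rest[cut:]
    return found


print(find_all_with_offset(r"\Bi(\w\w)", "Mississippi"))  # ['iss', 'iss', 'ipp']
print(find_all_with_slices(r"\Bi(\w\w)", "Mississippi"))  # ['iss', 'ipp']
```

The sliced search loses the second "iss" because, at the start of the shortened string, \eB can no longer see the preceding "s".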
|
||||
.P
|
||||
If any call to \fBpcre_exec()\fP in a \fB/g\fP or \fB/G\fP sequence matches an
|
||||
empty string, the next call is done with the PCRE_NOTEMPTY and PCRE_ANCHORED
|
||||
flags set in order to search for another, non-empty, match at the same point.
|
||||
If this second match fails, the start offset is advanced by one, and the normal
|
||||
match is retried. This imitates the way Perl handles such cases when using the
|
||||
\fB/g\fP modifier or the \fBsplit()\fP function.
|
||||
.
|
||||
.
|
||||
.SS "Other modifiers"
|
||||
.rs
|
||||
.sp
|
||||
There are yet more modifiers for controlling the way \fBpcretest\fP
|
||||
operates.
|
||||
.P
|
||||
The \fB/+\fP modifier requests that as well as outputting the substring that
|
||||
matched the entire pattern, pcretest should in addition output the remainder of
|
||||
the subject string. This is useful for tests where the subject contains
|
||||
multiple copies of the same substring.
|
||||
.P
|
||||
The \fB/L\fP modifier must be followed directly by the name of a locale, for
|
||||
example,
|
||||
.sp
|
||||
/pattern/Lfr_FR
|
||||
.sp
|
||||
For this reason, it must be the last modifier. The given locale is set,
|
||||
\fBpcre_maketables()\fP is called to build a set of character tables for the
|
||||
locale, and this is then passed to \fBpcre_compile()\fP when compiling the
|
||||
regular expression. Without an \fB/L\fP modifier, NULL is passed as the tables
|
||||
pointer; that is, \fB/L\fP applies only to the expression on which it appears.
|
||||
.P
|
||||
The \fB/I\fP modifier requests that \fBpcretest\fP output information about the
|
||||
compiled pattern (whether it is anchored, has a fixed first character, and
|
||||
so on). It does this by calling \fBpcre_fullinfo()\fP after compiling a
|
||||
pattern. If the pattern is studied, the results of that are also output.
|
||||
.P
|
||||
The \fB/D\fP modifier is a PCRE debugging feature, which also assumes \fB/I\fP.
|
||||
It causes the internal form of compiled regular expressions to be output after
|
||||
compilation. If the pattern was studied, the information returned is also
|
||||
output.
|
||||
.P
|
||||
The \fB/F\fP modifier causes \fBpcretest\fP to flip the byte order of the
|
||||
fields in the compiled pattern that contain 2-byte and 4-byte numbers. This
|
||||
facility is for testing the feature in PCRE that allows it to execute patterns
|
||||
that were compiled on a host with a different endianness. This feature is not
|
||||
available when the POSIX interface to PCRE is being used, that is, when the
|
||||
\fB/P\fP pattern modifier is specified. See also the section about saving and
|
||||
reloading compiled patterns below.
|
||||
.P
|
||||
The \fB/S\fP modifier causes \fBpcre_study()\fP to be called after the
|
||||
expression has been compiled, and the results used when the expression is
|
||||
matched.
|
||||
.P
|
||||
The \fB/M\fP modifier causes the size of the memory block used to hold the compiled
|
||||
pattern to be output.
|
||||
.P
|
||||
The \fB/P\fP modifier causes \fBpcretest\fP to call PCRE via the POSIX wrapper
|
||||
API rather than its native API. When this is done, all other modifiers except
|
||||
\fB/i\fP, \fB/m\fP, and \fB/+\fP are ignored. REG_ICASE is set if \fB/i\fP is
|
||||
present, and REG_NEWLINE is set if \fB/m\fP is present. The wrapper functions
|
||||
force PCRE_DOLLAR_ENDONLY always, and PCRE_DOTALL unless REG_NEWLINE is set.
|
||||
.P
|
||||
The \fB/8\fP modifier causes \fBpcretest\fP to call PCRE with the PCRE_UTF8
|
||||
option set. This turns on support for UTF-8 character handling in PCRE,
|
||||
provided that it was compiled with this support enabled. This modifier also
|
||||
causes any non-printing characters in output strings to be printed using the
|
||||
\ex{hh...} notation if they are valid UTF-8 sequences.
|
||||
.P
|
||||
If the \fB/?\fP modifier is used with \fB/8\fP, it causes \fBpcretest\fP to
|
||||
call \fBpcre_compile()\fP with the PCRE_NO_UTF8_CHECK option, to suppress the
|
||||
checking of the string for UTF-8 validity.
|
||||
.
|
||||
.
|
||||
.SH "DATA LINES"
|
||||
.rs
|
||||
.sp
|
||||
Before each data line is passed to \fBpcre_exec()\fP, leading and trailing
|
||||
whitespace is removed, and it is then scanned for \e escapes. Some of these are
|
||||
pretty esoteric features, intended for checking out some of the more
|
||||
complicated features of PCRE. If you are just testing "ordinary" regular
|
||||
expressions, you probably don't need any of these. The following escapes are
|
||||
recognized:
|
||||
.sp
|
||||
\ea alarm (= BEL)
|
||||
\eb backspace
|
||||
\ee escape
|
||||
\ef formfeed
|
||||
\en newline
|
||||
.\" JOIN
|
||||
\eqdd set the PCRE_MATCH_LIMIT limit to dd
|
||||
(any number of digits)
|
||||
\er carriage return
|
||||
\et tab
|
||||
\ev vertical tab
|
||||
\ennn octal character (up to 3 octal digits)
|
||||
\exhh hexadecimal character (up to 2 hex digits)
|
||||
.\" JOIN
|
||||
\ex{hh...} hexadecimal character, any number of digits
|
||||
in UTF-8 mode
|
||||
.\" JOIN
|
||||
\eA pass the PCRE_ANCHORED option to \fBpcre_exec()\fP
|
||||
or \fBpcre_dfa_exec()\fP
|
||||
.\" JOIN
|
||||
\eB pass the PCRE_NOTBOL option to \fBpcre_exec()\fP
|
||||
or \fBpcre_dfa_exec()\fP
|
||||
.\" JOIN
|
||||
\eCdd call pcre_copy_substring() for substring dd
|
||||
after a successful match (number less than 32)
|
||||
.\" JOIN
|
||||
\eCname call pcre_copy_named_substring() for substring
|
||||
"name" after a successful match (name termin-
|
||||
ated by next non alphanumeric character)
|
||||
.\" JOIN
|
||||
\eC+ show the current captured substrings at callout
|
||||
time
|
||||
\eC- do not supply a callout function
|
||||
.\" JOIN
|
||||
\eC!n return 1 instead of 0 when callout number n is
|
||||
reached
|
||||
.\" JOIN
|
||||
\eC!n!m return 1 instead of 0 when callout number n is
|
||||
reached for the mth time
|
||||
.\" JOIN
|
||||
\eC*n pass the number n (may be negative) as callout
|
||||
data; this is used as the callout return value
|
||||
\eD use the \fBpcre_dfa_exec()\fP match function
|
||||
\eF only shortest match for \fBpcre_dfa_exec()\fP
|
||||
.\" JOIN
|
||||
\eGdd call pcre_get_substring() for substring dd
|
||||
after a successful match (number less than 32)
|
||||
.\" JOIN
|
||||
\eGname call pcre_get_named_substring() for substring
|
||||
"name" after a successful match (name termin-
|
||||
ated by next non-alphanumeric character)
|
||||
.\" JOIN
|
||||
\eL call pcre_get_substringlist() after a
|
||||
successful match
|
||||
.\" JOIN
|
||||
\eM discover the minimum MATCH_LIMIT and
|
||||
MATCH_LIMIT_RECURSION settings
|
||||
.\" JOIN
|
||||
\eN pass the PCRE_NOTEMPTY option to \fBpcre_exec()\fP
|
||||
or \fBpcre_dfa_exec()\fP
|
||||
.\" JOIN
|
||||
\eOdd set the size of the output vector passed to
|
||||
\fBpcre_exec()\fP to dd (any number of digits)
|
||||
.\" JOIN
|
||||
\eP pass the PCRE_PARTIAL option to \fBpcre_exec()\fP
|
||||
or \fBpcre_dfa_exec()\fP
|
||||
.\" JOIN
|
||||
\eQdd set the PCRE_MATCH_LIMIT_RECURSION limit to dd
|
||||
(any number of digits)
|
||||
\eR pass the PCRE_DFA_RESTART option to \fBpcre_dfa_exec()\fP
|
||||
\eS output details of memory get/free calls during matching
|
||||
.\" JOIN
|
||||
\eZ pass the PCRE_NOTEOL option to \fBpcre_exec()\fP
|
||||
or \fBpcre_dfa_exec()\fP
|
||||
.\" JOIN
|
||||
\e? pass the PCRE_NO_UTF8_CHECK option to
|
||||
\fBpcre_exec()\fP or \fBpcre_dfa_exec()\fP
|
||||
\e>dd start the match at offset dd (any number of digits);
|
||||
.\" JOIN
|
||||
this sets the \fIstartoffset\fP argument for \fBpcre_exec()\fP
|
||||
or \fBpcre_dfa_exec()\fP
|
||||
.\" JOIN
|
||||
\e<cr> pass the PCRE_NEWLINE_CR option to \fBpcre_exec()\fP
|
||||
or \fBpcre_dfa_exec()\fP
|
||||
.\" JOIN
|
||||
\e<lf> pass the PCRE_NEWLINE_LF option to \fBpcre_exec()\fP
|
||||
or \fBpcre_dfa_exec()\fP
|
||||
.\" JOIN
|
||||
\e<crlf> pass the PCRE_NEWLINE_CRLF option to \fBpcre_exec()\fP
|
||||
or \fBpcre_dfa_exec()\fP
|
||||
.sp
|
||||
The escapes that specify line endings are literal strings, exactly as shown.
|
||||
A backslash followed by any other character just escapes that character. If the
|
||||
very last character is a backslash, it is ignored. This gives a way of passing
|
||||
an empty line as data, since a real empty line terminates the data input.
|
||||
.P
|
||||
If \eM is present, \fBpcretest\fP calls \fBpcre_exec()\fP several times, with
|
||||
different values in the \fImatch_limit\fP and \fImatch_limit_recursion\fP
|
||||
fields of the \fBpcre_extra\fP data structure, until it finds the minimum
|
||||
numbers for each parameter that allow \fBpcre_exec()\fP to complete. The
|
||||
\fImatch_limit\fP number is a measure of the amount of backtracking that takes
|
||||
place, and checking it out can be instructive. For most simple matches, the
|
||||
number is quite small, but for patterns with very large numbers of matching
|
||||
possibilities, it can become large very quickly with increasing length of
|
||||
subject string. The \fImatch_limit_recursion\fP number is a measure of how much
|
||||
stack (or, if PCRE is compiled with NO_RECURSE, how much heap) memory is needed
|
||||
to complete the match attempt.
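The text does not say how the minimum is searched for, so the following Python sketch is only one plausible strategy: double the limit until the match completes, then bisect. The \fBsucceeds_with\fP callback is hypothetical and stands in for a call to \fBpcre_exec()\fP with a given \fImatch_limit\fP; the sketch assumes that a match which completes under one limit also completes under any larger one.

```python
def minimum_limit(succeeds_with, upper=10_000_000):
    """Return the smallest positive limit for which succeeds_with(limit) is true.

    succeeds_with is a hypothetical stand-in for running the match with
    the given limit; it is assumed to be monotonic in the limit.
    """
    low, high = 0, 1          # low is known (or assumed) to be too small
    while not succeeds_with(high):
        low, high = high, high * 2
        if high > upper:
            raise RuntimeError("no limit up to 'upper' lets the match complete")
    while high - low > 1:     # bisect between a failing and a working limit
        middle = (low + high) // 2
        if succeeds_with(middle):
            high = middle
        else:
            low = middle
    return high


# Pretend the match needs a limit of at least 37 to complete.
print(minimum_limit(lambda limit: limit >= 37))  # 37
```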
|
||||
.P
|
||||
When \eO is used, the value specified may be higher or lower than the size set
|
||||
by the \fB-O\fP command line option (or defaulted to 45); \eO applies only to
|
||||
the call of \fBpcre_exec()\fP for the line in which it appears.
|
||||
.P
|
||||
If the \fB/P\fP modifier was present on the pattern, causing the POSIX wrapper
|
||||
API to be used, the only option-setting sequences that have any effect are \eB
|
||||
and \eZ, causing REG_NOTBOL and REG_NOTEOL, respectively, to be passed to
|
||||
\fBregexec()\fP.
|
||||
.P
|
||||
The use of \ex{hh...} to represent UTF-8 characters is not dependent on the use
|
||||
of the \fB/8\fP modifier on the pattern. It is recognized always. There may be
|
||||
any number of hexadecimal digits inside the braces. The result is from one to
|
||||
six bytes, encoded according to the UTF-8 rules.
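The one-to-six-byte range corresponds to the original UTF-8 scheme, which covers values up to 0x7FFFFFFF. The Python sketch below shows that encoding; the function name is an assumption, and it is not taken from \fBpcretest\fP itself.

```python
def encode_utf8_extended(code_point):
    """Encode a code point using the original 1-to-6-byte UTF-8 scheme."""
    if not 0 <= code_point <= 0x7FFFFFFF:
        raise ValueError("code point out of range")
    if code_point < 0x80:
        return bytes([code_point])
    # (exclusive upper bound, lead-byte marker) for 2- to 6-byte sequences
    forms = [(0x800, 0xC0), (0x10000, 0xE0), (0x200000, 0xF0),
             (0x4000000, 0xF8), (0x80000000, 0xFC)]
    for continuation_count, (limit, lead) in enumerate(forms, start=1):
        if code_point < limit:
            tail = []
            for _ in range(continuation_count):
                # Each continuation byte carries the next 6 low-order bits.
                tail.append(0x80 | (code_point & 0x3F))
                code_point >>= 6
            return bytes([lead | code_point] + tail[::-1])


print(encode_utf8_extended(0xE9).hex())       # c3a9
print(encode_utf8_extended(0x20AC).hex())     # e282ac
print(len(encode_utf8_extended(0x7FFFFFFF)))  # 6
```

For values up to 0x10FFFF (surrogates aside) the result agrees with Python's own \fBstr.encode("utf-8")\fP; the five- and six-byte forms exist only in the older scheme.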
|
||||
.
|
||||
.
|
||||
.SH "THE ALTERNATIVE MATCHING FUNCTION"
|
||||
.rs
|
||||
.sp
|
||||
By default, \fBpcretest\fP uses the standard PCRE matching function,
|
||||
\fBpcre_exec()\fP to match each data line. From release 6.0, PCRE supports an
|
||||
alternative matching function, \fBpcre_dfa_exec()\fP, which operates in a
|
||||
different way, and has some restrictions. The differences between the two
|
||||
functions are described in the
|
||||
.\" HREF
|
||||
\fBpcrematching\fP
|
||||
.\"
|
||||
documentation.
|
||||
.P
|
||||
If a data line contains the \eD escape sequence, or if the command line
|
||||
contains the \fB-dfa\fP option, the alternative matching function is called.
|
||||
This function finds all possible matches at a given point. If, however, the \eF
|
||||
escape sequence is present in the data line, it stops after the first match is
|
||||
found. This is always the shortest possible match.
|
||||
.
|
||||
.
|
||||
.SH "DEFAULT OUTPUT FROM PCRETEST"
|
||||
.rs
|
||||
.sp
|
||||
This section describes the output when the normal matching function,
|
||||
\fBpcre_exec()\fP, is being used.
|
||||
.P
|
||||
When a match succeeds, pcretest outputs the list of captured substrings that
|
||||
\fBpcre_exec()\fP returns, starting with number 0 for the string that matched
|
||||
the whole pattern. Otherwise, it outputs "No match" or "Partial match"
|
||||
when \fBpcre_exec()\fP returns PCRE_ERROR_NOMATCH or PCRE_ERROR_PARTIAL,
|
||||
respectively, and otherwise the PCRE negative error number. Here is an example
|
||||
of an interactive \fBpcretest\fP run.
|
||||
.sp
|
||||
$ pcretest
|
||||
PCRE version 5.00 07-Sep-2004
|
||||
.sp
|
||||
re> /^abc(\ed+)/
|
||||
data> abc123
|
||||
0: abc123
|
||||
1: 123
|
||||
data> xyz
|
||||
No match
|
||||
.sp
|
||||
If the strings contain any non-printing characters, they are output as \e0x
|
||||
escapes, or as \ex{...} escapes if the \fB/8\fP modifier was present on the
|
||||
pattern. If the pattern has the \fB/+\fP modifier, the output for substring 0
|
||||
is followed by the rest of the subject string, identified by "0+" like
|
||||
this:
|
||||
.sp
|
||||
re> /cat/+
|
||||
data> cataract
|
||||
0: cat
|
||||
0+ aract
|
||||
.sp
|
||||
If the pattern has the \fB/g\fP or \fB/G\fP modifier, the results of successive
|
||||
matching attempts are output in sequence, like this:
|
||||
.sp
|
||||
re> /\eBi(\ew\ew)/g
|
||||
data> Mississippi
|
||||
0: iss
|
||||
1: ss
|
||||
0: iss
|
||||
1: ss
|
||||
0: ipp
|
||||
1: pp
|
||||
.sp
|
||||
"No match" is output only if the first match attempt fails.
|
||||
.P
|
||||
If any of the sequences \fB\eC\fP, \fB\eG\fP, or \fB\eL\fP are present in a
|
||||
data line that is successfully matched, the substrings extracted by the
|
||||
convenience functions are output with C, G, or L after the string number
|
||||
instead of a colon. This is in addition to the normal full list. The string
|
||||
length (that is, the return from the extraction function) is given in
|
||||
parentheses after each string for \fB\eC\fP and \fB\eG\fP.
|
||||
.P
|
||||
Note that while patterns can be continued over several lines (a plain ">"
|
||||
prompt is used for continuations), data lines may not. However, newlines can be
|
||||
included in data by means of the \en escape (or \er or \er\en for those newline
|
||||
settings).
|
||||
.
|
||||
.
|
||||
.SH "OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION"
|
||||
.rs
|
||||
.sp
|
||||
When the alternative matching function, \fBpcre_dfa_exec()\fP, is used (by
|
||||
means of the \eD escape sequence or the \fB-dfa\fP command line option), the
|
||||
output consists of a list of all the matches that start at the first point in
|
||||
the subject where there is at least one match. For example:
|
||||
.sp
|
||||
re> /(tang|tangerine|tan)/
|
||||
data> yellow tangerine\eD
|
||||
0: tangerine
|
||||
1: tang
|
||||
2: tan
|
||||
.sp
|
||||
(Using the normal matching function on this data finds only "tang".) The
|
||||
longest matching string is always given first (and numbered zero).
|
||||
.P
|
||||
If \fB/g\fP is present on the pattern, the search for further matches resumes
|
||||
at the end of the longest match. For example:
|
||||
.sp
|
||||
re> /(tang|tangerine|tan)/g
|
||||
data> yellow tangerine and tangy sultana\eD
|
||||
0: tangerine
|
||||
1: tang
|
||||
2: tan
|
||||
0: tang
|
||||
1: tan
|
||||
0: tan
|
||||
.sp
|
||||
Since the matching function does not support substring capture, the escape
|
||||
sequences that are concerned with captured substrings are not relevant.
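The shape of this output can be reproduced with a toy model in Python. It handles only a list of literal alternatives, not real patterns, and is not how \fBpcre_dfa_exec()\fP works internally; the function names are invented for illustration.

```python
def matches_at_first_point(alternatives, subject, start=0):
    """Find the first position where any alternative matches.

    Returns (position, matches sorted longest first), or None.
    """
    for pos in range(start, len(subject) + 1):
        hits = {alt for alt in alternatives if subject.startswith(alt, pos)}
        if hits:
            return pos, sorted(hits, key=len, reverse=True)
    return None


def all_match_groups(alternatives, subject):
    """Imitate /g: resume the search at the end of the longest match."""
    groups, pos = [], 0
    while True:
        result = matches_at_first_point(alternatives, subject, pos)
        if result is None:
            return groups
        where, hits = result
        groups.append(hits)
        pos = where + max(len(hits[0]), 1)  # never stand still


words = ["tang", "tangerine", "tan"]
print(all_match_groups(words, "yellow tangerine and tangy sultana"))
# [['tangerine', 'tang', 'tan'], ['tang', 'tan'], ['tan']]
```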
|
||||
.
|
||||
.
|
||||
.SH "RESTARTING AFTER A PARTIAL MATCH"
|
||||
.rs
|
||||
.sp
|
||||
When the alternative matching function has given the PCRE_ERROR_PARTIAL return,
|
||||
indicating that the subject partially matched the pattern, you can restart the
|
||||
match with additional subject data by means of the \eR escape sequence. For
|
||||
example:
|
||||
.sp
|
||||
re> /^\ed?\ed(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\ed\ed$/
|
||||
data> 23ja\eP\eD
|
||||
Partial match: 23ja
|
||||
data> n05\eR\eD
|
||||
0: n05
|
||||
.sp
|
||||
For further information about partial matching, see the
|
||||
.\" HREF
|
||||
\fBpcrepartial\fP
|
||||
.\"
|
||||
documentation.
|
||||
.
|
||||
.
|
||||
.SH CALLOUTS
|
||||
.rs
|
||||
.sp
|
||||
If the pattern contains any callout requests, \fBpcretest\fP's callout function
|
||||
is called during matching. This works with both matching functions. By default,
|
||||
the called function displays the callout number, the start and current
|
||||
positions in the text at the callout time, and the next pattern item to be
|
||||
tested. For example, the output
|
||||
.sp
|
||||
  --->pqrabcdef
|
||||
    0    ^  ^     \ed
|
||||
.sp
|
||||
indicates that callout number 0 occurred for a match attempt starting at the
|
||||
fourth character of the subject string, when the pointer was at the seventh
|
||||
character of the data, and when the next pattern item was \ed. Just one
|
||||
circumflex is output if the start and current positions are the same.
|
||||
.P
|
||||
Callouts numbered 255 are assumed to be automatic callouts, inserted as a
|
||||
result of the \fB/C\fP pattern modifier. In this case, instead of showing the
|
||||
callout number, the offset in the pattern, preceded by a plus, is output. For
|
||||
example:
|
||||
.sp
|
||||
re> /\ed?[A-E]\e*/C
|
||||
data> E*
|
||||
--->E*
|
||||
+0 ^ \ed?
|
||||
+3 ^ [A-E]
|
||||
+8 ^^ \e*
|
||||
+10 ^ ^
|
||||
0: E*
|
||||
.sp
|
||||
The callout function in \fBpcretest\fP returns zero (carry on matching) by
|
||||
default, but you can use a \eC item in a data line (as described above) to
|
||||
change this.
|
||||
.P
|
||||
Inserting callouts can be helpful when using \fBpcretest\fP to check
|
||||
complicated regular expressions. For further information about callouts, see
|
||||
the
|
||||
.\" HREF
|
||||
\fBpcrecallout\fP
|
||||
.\"
|
||||
documentation.
|
||||
.
|
||||
.
|
||||
.SH "SAVING AND RELOADING COMPILED PATTERNS"
|
||||
.rs
|
||||
.sp
|
||||
The facilities described in this section are not available when the POSIX
|
||||
interface to PCRE is being used, that is, when the \fB/P\fP pattern modifier is
|
||||
specified.
|
||||
.P
|
||||
When the POSIX interface is not in use, you can cause \fBpcretest\fP to write a
|
||||
compiled pattern to a file, by following the modifiers with > and a file name.
|
||||
For example:
|
||||
.sp
|
||||
/pattern/im >/some/file
|
||||
.sp
|
||||
See the
|
||||
.\" HREF
|
||||
\fBpcreprecompile\fP
|
||||
.\"
|
||||
documentation for a discussion about saving and re-using compiled patterns.
|
||||
.P
|
||||
The data that is written is binary. The first eight bytes are the length of the
|
||||
compiled pattern data followed by the length of the optional study data, each
|
||||
written as four bytes in big-endian order (most significant byte first). If
|
||||
there is no study data (either the pattern was not studied, or studying did not
|
||||
return any data), the second length is zero. The lengths are followed by an
|
||||
exact copy of the compiled pattern. If there is additional study data, this
|
||||
follows immediately after the compiled pattern. After writing the file,
|
||||
\fBpcretest\fP expects to read a new pattern.
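The layout described above (two four-byte big-endian lengths, then the compiled pattern, then any study data) can be read back with a few lines of Python. This is a sketch only: the function name is an assumption, and the sample bytes are placeholders rather than a real compiled pattern.

```python
import struct


def split_saved_pattern(data):
    """Split a saved-pattern file image into (pattern bytes, study bytes)."""
    if len(data) < 8:
        raise ValueError("file too short to hold the two length fields")
    # Two unsigned 32-bit integers, most significant byte first.
    pattern_length, study_length = struct.unpack(">II", data[:8])
    body = data[8:]
    if len(body) < pattern_length + study_length:
        raise ValueError("file is truncated")
    pattern = body[:pattern_length]
    study = body[pattern_length:pattern_length + study_length]
    return pattern, study


# Placeholder content: a 5-byte "pattern" and no study data.
sample = struct.pack(">II", 5, 0) + b"\x01\x02\x03\x04\x05"
pattern, study = split_saved_pattern(sample)
print(len(pattern), len(study))  # 5 0
```

A zero in the second length field corresponds to the "No study data" case.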
|
||||
.P
|
||||
A saved pattern can be reloaded into \fBpcretest\fP by specifying < and a file
|
||||
name instead of a pattern. The name of the file must not contain a < character,
|
||||
as otherwise \fBpcretest\fP will interpret the line as a pattern delimited by <
|
||||
characters.
|
||||
For example:
|
||||
.sp
|
||||
re> </some/file
|
||||
Compiled regex loaded from /some/file
|
||||
No study data
|
||||
.sp
|
||||
When the pattern has been loaded, \fBpcretest\fP proceeds to read data lines in
|
||||
the usual way.
|
||||
.P
|
||||
You can copy a file written by \fBpcretest\fP to a different host and reload it
|
||||
there, even if the new host has opposite endianness to the one on which the
|
||||
pattern was compiled. For example, you can compile on an i86 machine and run on
|
||||
a SPARC machine.
|
||||
.P
|
||||
File names for saving and reloading can be absolute or relative, but note that
|
||||
the shell facility of expanding a file name that starts with a tilde (~) is not
|
||||
available.
|
||||
.P
|
||||
The ability to save and reload files in \fBpcretest\fP is intended for testing
|
||||
and experimentation. It is not intended for production use because only a
|
||||
single pattern can be written to a file. Furthermore, there is no facility for
|
||||
supplying custom character tables for use with a reloaded pattern. If the
|
||||
original pattern was compiled with custom tables, an attempt to match a subject
|
||||
string using a reloaded pattern is likely to cause \fBpcretest\fP to crash.
|
||||
Finally, if you attempt to load a file that is not in the correct format, the
|
||||
result is undefined.
|
||||
.
|
||||
.
|
||||
.SH AUTHOR
|
||||
.rs
|
||||
.sp
|
||||
Philip Hazel
|
||||
.br
|
||||
University Computing Service,
|
||||
.br
|
||||
Cambridge CB2 3QG, England.
|
||||
.P
|
||||
.in 0
|
||||
Last updated: 29 June 2006
|
||||
.br
|
||||
Copyright (c) 1997-2006 University of Cambridge.
|
569
libs/pcre/doc/pcretest.txt
Normal file
@ -0,0 +1,569 @@
|
||||
PCRETEST(1) PCRETEST(1)
|
||||
|
||||
|
||||
NAME
|
||||
pcretest - a program for testing Perl-compatible regular expressions.
|
||||
|
||||
|
||||
SYNOPSIS
|
||||
|
||||
pcretest [options] [source] [destination]
|
||||
|
||||
pcretest was written as a test program for the PCRE regular expression
|
||||
library itself, but it can also be used for experimenting with regular
|
||||
expressions. This document describes the features of the test program;
|
||||
for details of the regular expressions themselves, see the pcrepattern
|
||||
documentation. For details of the PCRE library function calls and their
|
||||
options, see the pcreapi documentation.
|
||||
|
||||
|
||||
OPTIONS
|
||||
|
||||
-C Output the version number of the PCRE library, and all
|
||||
available information about the optional features that are
|
||||
included, and then exit.
|
||||
|
||||
-d Behave as if each regex has the /D (debug) modifier; the
|
||||
internal form is output after compilation.
|
||||
|
||||
-dfa Behave as if each data line contains the \D escape sequence;
|
||||
this causes the alternative matching function,
|
||||
pcre_dfa_exec(), to be used instead of the standard
|
||||
pcre_exec() function (more detail is given below).
|
||||
|
||||
-i Behave as if each regex has the /I modifier; information
|
||||
about the compiled pattern is given after compilation.
|
||||
|
||||
-m Output the size of each compiled pattern after it has been
|
||||
compiled. This is equivalent to adding /M to each regular
|
||||
expression. For compatibility with earlier versions of
|
||||
pcretest, -s is a synonym for -m.
|
||||
|
||||
-o osize Set the number of elements in the output vector that is used
|
||||
when calling pcre_exec() to be osize. The default value is
|
||||
45, which is enough for 14 capturing subexpressions. The
|
||||
vector size can be changed for individual matching calls by
|
||||
including \O in the data line (see below).
|
||||
|
||||
-p Behave as if each regex has the /P modifier; the POSIX
|
||||
wrapper API is used to call PCRE. None of the other options has
|
||||
any effect when -p is set.
|
||||
|
||||
-q Do not output the version number of pcretest at the start of
|
||||
execution.
|
||||
|
||||
-S size On Unix-like systems, set the size of the runtime stack to
|
||||
size megabytes.
|
||||
|
||||
-t Run each compile, study, and match many times with a timer,
|
||||
and output resulting time per compile or match (in
|
||||
milliseconds). Do not set -m with -t, because you will then get the
|
||||
size output a zillion times, and the timing will be
|
||||
distorted.
|
||||
|
||||
|
||||
DESCRIPTION
|
||||
|
||||
If pcretest is given two filename arguments, it reads from the first
|
||||
and writes to the second. If it is given only one filename argument, it
|
||||
reads from that file and writes to stdout. Otherwise, it reads from
|
||||
stdin and writes to stdout, and prompts for each line of input, using
|
||||
"re>" to prompt for regular expressions, and "data>" to prompt for data
|
||||
lines.
|
||||
|
||||
The program handles any number of sets of input on a single input file.
|
||||
Each set starts with a regular expression, and continues with any
|
||||
number of data lines to be matched against the pattern.
|
||||
|
||||
Each data line is matched separately and independently. If you want to
|
||||
do multi-line matches, you have to use the \n escape sequence (or \r or
|
||||
\r\n, depending on the newline setting) in a single line of input to
|
||||
encode the newline characters. There is no limit on the length of data
|
||||
lines; the input buffer is automatically extended if it is too small.
|
||||
|
||||
An empty line signals the end of the data lines, at which point a new
|
||||
regular expression is read. The regular expressions are given enclosed
|
||||
in any non-alphanumeric delimiters other than backslash, for example:
|
||||
|
||||
/(a|bc)x+yz/
|
||||
|
||||
White space before the initial delimiter is ignored. A regular
|
||||
expression may be continued over several input lines, in which case the
|
||||
newline characters are included within it. It is possible to include the
|
||||
delimiter within the pattern by escaping it, for example
|
||||
|
||||
/abc\/def/
|
||||
|
||||
If you do so, the escape and the delimiter form part of the pattern,
|
||||
but since delimiters are always non-alphanumeric, this does not affect
|
||||
its interpretation. If the terminating delimiter is immediately
|
||||
followed by a backslash, for example,
|
||||
|
||||
/abc/\
|
||||
|
||||
then a backslash is added to the end of the pattern. This is done to
|
||||
provide a way of testing the error condition that arises if a pattern
|
||||
finishes with a backslash, because
|
||||
|
||||
/abc\/
|
||||
|
||||
is interpreted as the first line of a pattern that starts with "abc/",
|
||||
causing pcretest to read the next line as a continuation of the regular
|
||||
expression.
|
||||
|
||||
|
||||
PATTERN MODIFIERS
|
||||
|
||||
A pattern may be followed by any number of modifiers, which are mostly
|
||||
single characters. Following Perl usage, these are referred to below
|
||||
as, for example, "the /i modifier", even though the delimiter of the
|
||||
pattern need not always be a slash, and no slash is used when writing
|
||||
modifiers. Whitespace may appear between the final pattern delimiter
|
||||
and the first modifier, and between the modifiers themselves.
|
||||
|
||||
The /i, /m, /s, and /x modifiers set the PCRE_CASELESS, PCRE_MULTILINE,
|
||||
PCRE_DOTALL, and PCRE_EXTENDED options, respectively, when
|
||||
pcre_compile() is called. These four modifier letters have the same effect as
|
||||
they do in Perl. For example:
|
||||
|
||||
/caseless/i
|
||||
|
||||
The following table shows additional modifiers for setting PCRE options
|
||||
that do not correspond to anything in Perl:
|
||||
|
||||
/A PCRE_ANCHORED
|
||||
/C PCRE_AUTO_CALLOUT
|
||||
/E PCRE_DOLLAR_ENDONLY
|
||||
/f PCRE_FIRSTLINE
|
||||
/J PCRE_DUPNAMES
|
||||
/N PCRE_NO_AUTO_CAPTURE
|
||||
/U PCRE_UNGREEDY
|
||||
/X PCRE_EXTRA
|
||||
/<cr> PCRE_NEWLINE_CR
|
||||
/<lf> PCRE_NEWLINE_LF
|
||||
/<crlf> PCRE_NEWLINE_CRLF
|
||||
|
||||
Those specifying line endings are literal strings as shown. Details of
|
||||
the meanings of these PCRE options are given in the pcreapi documenta-
|
||||
tion.
|
||||
|
||||
Finding all matches in a string
|
||||
|
||||
Searching for all possible matches within each subject string can be
|
||||
requested by the /g or /G modifier. After finding a match, PCRE is
|
||||
called again to search the remainder of the subject string. The differ-
|
||||
ence between /g and /G is that the former uses the startoffset argument
|
||||
to pcre_exec() to start searching at a new point within the entire
|
||||
string (which is in effect what Perl does), whereas the latter passes
|
||||
over a shortened substring. This makes a difference to the matching
|
||||
process if the pattern begins with a lookbehind assertion (including \b
|
||||
or \B).
|
||||
|
||||
If any call to pcre_exec() in a /g or /G sequence matches an empty
|
||||
string, the next call is done with the PCRE_NOTEMPTY and PCRE_ANCHORED
|
||||
flags set in order to search for another, non-empty, match at the same
|
||||
point. If this second match fails, the start offset is advanced by
|
||||
one, and the normal match is retried. This imitates the way Perl han-
|
||||
dles such cases when using the /g modifier or the split() function.
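
The retry rule described above can be sketched in Python. This is only an illustration of the loop, not pcretest's own code. The names find_all and toy_exec are hypothetical; toy_exec stands in for pcre_exec() by emulating the single pattern /a*/ with Python's re module, and it imitates PCRE_NOTEMPTY by switching to a+. Advancing by one position (rather than by one UTF-8 character) is a simplifying assumption.

```python
import re


def toy_exec(subject, start, anchored=False, notempty=False):
    """Hypothetical stand-in for pcre_exec(), hard-wired to the pattern /a*/.

    notempty is imitated by matching a+ instead of a*; anchored restricts
    the match to begin exactly at `start`.
    """
    pattern = re.compile("a+" if notempty else "a*")
    if anchored:
        match = pattern.match(subject, start)
    else:
        match = pattern.search(subject, start)
    return match.span() if match else None


def find_all(exec_fn, subject):
    """Collect successive (start, end) matches the way the /g text describes."""
    results = []
    start = 0
    retry_nonempty = False  # True right after an empty match
    while start <= len(subject):
        if retry_nonempty:
            # Look for a non-empty match anchored at the same point.
            span = exec_fn(subject, start, anchored=True, notempty=True)
            if span is None:
                # None found: advance by one and go back to a normal match.
                start += 1
                retry_nonempty = False
                continue
        else:
            span = exec_fn(subject, start)
            if span is None:
                break
        results.append(span)
        retry_nonempty = span[0] == span[1]
        start = span[1]
    return results


print(find_all(toy_exec, "baac"))
```

For the subject "baac" this yields an empty match at offset 0, "aa" at offsets 1 to 3, and empty matches at offsets 3 and 4.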
|
||||
|
||||
Other modifiers
|
||||
|
||||
There are yet more modifiers for controlling the way pcretest operates.
|
||||
|
||||
The /+ modifier requests that as well as outputting the substring that
|
||||
matched the entire pattern, pcretest should in addition output the
|
||||
remainder of the subject string. This is useful for tests where the
|
||||
subject contains multiple copies of the same substring.
|
||||
|
||||
The /L modifier must be followed directly by the name of a locale, for
|
||||
example,
|
||||
|
||||
/pattern/Lfr_FR
|
||||
|
||||
For this reason, it must be the last modifier. The given locale is set,
|
||||
pcre_maketables() is called to build a set of character tables for the
|
||||
locale, and this is then passed to pcre_compile() when compiling the
|
||||
regular expression. Without an /L modifier, NULL is passed as the
|
||||
tables pointer; that is, /L applies only to the expression on which it
|
||||
appears.
|
||||
|
||||
The /I modifier requests that pcretest output information about the
|
||||
compiled pattern (whether it is anchored, has a fixed first character,
|
||||
and so on). It does this by calling pcre_fullinfo() after compiling a
|
||||
pattern. If the pattern is studied, the results of that are also out-
|
||||
put.
|
||||
|
||||
The /D modifier is a PCRE debugging feature, which also assumes /I. It
|
||||
causes the internal form of compiled regular expressions to be output
|
||||
after compilation. If the pattern was studied, the information returned
|
||||
is also output.
|
||||
|
||||
The /F modifier causes pcretest to flip the byte order of the fields in
|
||||
the compiled pattern that contain 2-byte and 4-byte numbers. This
|
||||
facility is for testing the feature in PCRE that allows it to execute
|
||||
patterns that were compiled on a host with a different endianness. This
|
||||
feature is not available when the POSIX interface to PCRE is being
|
||||
used, that is, when the /P pattern modifier is specified. See also the
|
||||
section about saving and reloading compiled patterns below.
|
||||
|
||||
The /S modifier causes pcre_study() to be called after the expression
|
||||
has been compiled, and the results used when the expression is matched.
|
||||
|
||||
The /M modifier causes the size of memory block used to hold the com-
|
||||
piled pattern to be output.
|
||||
|
||||
The /P modifier causes pcretest to call PCRE via the POSIX wrapper API
|
||||
rather than its native API. When this is done, all other modifiers
|
||||
except /i, /m, and /+ are ignored. REG_ICASE is set if /i is present,
|
||||
and REG_NEWLINE is set if /m is present. The wrapper functions force
|
||||
PCRE_DOLLAR_ENDONLY always, and PCRE_DOTALL unless REG_NEWLINE is set.
|
||||
|
||||
The /8 modifier causes pcretest to call PCRE with the PCRE_UTF8 option
|
||||
set. This turns on support for UTF-8 character handling in PCRE, pro-
|
||||
vided that it was compiled with this support enabled. This modifier
|
||||
also causes any non-printing characters in output strings to be printed
|
||||
using the \x{hh...} notation if they are valid UTF-8 sequences.
|
||||
|
||||
If the /? modifier is used with /8, it causes pcretest to call
|
||||
pcre_compile() with the PCRE_NO_UTF8_CHECK option, to suppress the
|
||||
checking of the string for UTF-8 validity.
|
||||
|
||||
|
||||
DATA LINES
|
||||
|
||||
Before each data line is passed to pcre_exec(), leading and trailing
|
||||
whitespace is removed, and it is then scanned for \ escapes. Some of
|
||||
these are pretty esoteric features, intended for checking out some of
|
||||
the more complicated features of PCRE. If you are just testing "ordi-
|
||||
nary" regular expressions, you probably don't need any of these. The
|
||||
following escapes are recognized:
|
||||
|
||||
\a alarm (= BEL)
|
||||
\b backspace
|
||||
\e escape
|
||||
\f formfeed
|
||||
\n newline
|
||||
\qdd set the PCRE_MATCH_LIMIT limit to dd
|
||||
(any number of digits)
|
||||
\r carriage return
|
||||
\t tab
|
||||
\v vertical tab
|
||||
\nnn octal character (up to 3 octal digits)
|
||||
\xhh hexadecimal character (up to 2 hex digits)
|
||||
\x{hh...} hexadecimal character, any number of digits
|
||||
in UTF-8 mode
|
||||
\A pass the PCRE_ANCHORED option to pcre_exec()
|
||||
or pcre_dfa_exec()
|
||||
\B pass the PCRE_NOTBOL option to pcre_exec()
|
||||
or pcre_dfa_exec()
|
||||
\Cdd call pcre_copy_substring() for substring dd
|
||||
after a successful match (number less than 32)
|
||||
\Cname call pcre_copy_named_substring() for substring
|
||||
"name" after a successful match (name termin-
|
||||
ated by next non alphanumeric character)
|
||||
\C+ show the current captured substrings at callout
|
||||
time
|
||||
\C- do not supply a callout function
|
||||
\C!n return 1 instead of 0 when callout number n is
|
||||
reached
|
||||
\C!n!m return 1 instead of 0 when callout number n is
|
||||
reached for the mth time
|
||||
\C*n pass the number n (may be negative) as callout
|
||||
data; this is used as the callout return value
|
||||
\D use the pcre_dfa_exec() match function
|
||||
\F only shortest match for pcre_dfa_exec()
|
||||
\Gdd call pcre_get_substring() for substring dd
|
||||
after a successful match (number less than 32)
|
||||
\Gname call pcre_get_named_substring() for substring
|
||||
"name" after a successful match (name termin-
|
||||
ated by next non-alphanumeric character)
|
||||
\L call pcre_get_substringlist() after a
|
||||
successful match
|
||||
\M discover the minimum MATCH_LIMIT and
|
||||
MATCH_LIMIT_RECURSION settings
|
||||
\N pass the PCRE_NOTEMPTY option to pcre_exec()
|
||||
or pcre_dfa_exec()
|
||||
\Odd set the size of the output vector passed to
|
||||
pcre_exec() to dd (any number of digits)
|
||||
\P pass the PCRE_PARTIAL option to pcre_exec()
|
||||
or pcre_dfa_exec()
|
||||
\Qdd set the PCRE_MATCH_LIMIT_RECURSION limit to dd
|
||||
(any number of digits)
|
||||
\R pass the PCRE_DFA_RESTART option to pcre_dfa_exec()
|
||||
\S output details of memory get/free calls during matching
|
||||
\Z pass the PCRE_NOTEOL option to pcre_exec()
|
||||
or pcre_dfa_exec()
|
||||
\? pass the PCRE_NO_UTF8_CHECK option to
|
||||
pcre_exec() or pcre_dfa_exec()
|
||||
\>dd start the match at offset dd (any number of digits);
|
||||
this sets the startoffset argument for pcre_exec()
|
||||
or pcre_dfa_exec()
|
||||
\<cr> pass the PCRE_NEWLINE_CR option to pcre_exec()
|
||||
or pcre_dfa_exec()
|
||||
\<lf> pass the PCRE_NEWLINE_LF option to pcre_exec()
|
||||
or pcre_dfa_exec()
|
||||
\<crlf> pass the PCRE_NEWLINE_CRLF option to pcre_exec()
|
||||
or pcre_dfa_exec()
|
||||
|
||||
The escapes that specify line endings are literal strings, exactly as
|
||||
shown. A backslash followed by anything else just escapes the anything
|
||||
else. If the very last character is a backslash, it is ignored. This
|
||||
gives a way of passing an empty line as data, since a real empty line
|
||||
terminates the data input.
|
||||
|
||||
If \M is present, pcretest calls pcre_exec() several times, with dif-
|
||||
ferent values in the match_limit and match_limit_recursion fields of
|
||||
the pcre_extra data structure, until it finds the minimum numbers for
|
||||
each parameter that allow pcre_exec() to complete. The match_limit num-
|
||||
ber is a measure of the amount of backtracking that takes place, and
|
||||
checking it out can be instructive. For most simple matches, the number
|
||||
is quite small, but for patterns with very large numbers of matching
|
||||
possibilities, it can become large very quickly with increasing length
|
||||
of subject string. The match_limit_recursion number is a measure of how
|
||||
much stack (or, if PCRE is compiled with NO_RECURSE, how much heap)
|
||||
memory is needed to complete the match attempt.
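
The text does not say how the minimum is located, so the following Python sketch is an assumption rather than a description of pcretest. It doubles a trial limit until the match completes and then bisects between the last failing and first succeeding values. The callback completes(limit) is hypothetical: it stands for "pcre_exec() finished without hitting the limit", and the sketch assumes that any limit larger than a working one also works.

```python
def minimum_limit(completes, upper=10_000_000):
    """Return the smallest limit in 1..upper for which completes(limit) is true.

    `completes` is a hypothetical callback; it must be monotonic, that is,
    once it returns True for some limit it returns True for all larger ones.
    """
    lo, hi = 0, 1  # lo is the largest limit known to fail (0 = none tried yet)
    while not completes(hi):
        if hi >= upper:
            raise ValueError("match does not complete within the upper bound")
        lo, hi = hi, min(hi * 2, upper)
    # Invariant: completes(hi) is True and every limit <= lo fails.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if completes(mid):
            hi = mid
        else:
            lo = mid
    return hi


# Toy stand-in: pretend the match needs a limit of at least 37.
print(minimum_limit(lambda limit: limit >= 37))
```

The same routine could be run once for match_limit and once for match_limit_recursion, since the text treats the two parameters independently.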
|
||||
|
||||
When \O is used, the value specified may be higher or lower than the
|
||||
size set by the -O command line option (or defaulted to 45); \O applies
|
||||
only to the call of pcre_exec() for the line in which it appears.
|
||||
|
||||
If the /P modifier was present on the pattern, causing the POSIX wrap-
|
||||
per API to be used, the only option-setting sequences that have any
|
||||
effect are \B and \Z, causing REG_NOTBOL and REG_NOTEOL, respectively,
|
||||
to be passed to regexec().
|
||||
|
||||
The use of \x{hh...} to represent UTF-8 characters is not dependent on
|
||||
the use of the /8 modifier on the pattern. It is recognized always.
|
||||
There may be any number of hexadecimal digits inside the braces. The
|
||||
result is from one to six bytes, encoded according to the UTF-8 rules.
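
As an illustration of the one-to-six-byte encoding mentioned above, here is a minimal Python sketch of the original UTF-8 scheme, which covers values up to 0x7fffffff. The function name encode_utf8_6byte is made up for this example; pcretest's own implementation is in C and is not shown here.

```python
# (exclusive upper bound, lead-byte marker) for 2- to 6-byte sequences
_RANGES = [
    (0x800, 0xC0),
    (0x10000, 0xE0),
    (0x200000, 0xF0),
    (0x4000000, 0xF8),
    (0x80000000, 0xFC),
]


def encode_utf8_6byte(code_point):
    """Encode an integer as UTF-8, allowing the historical 5- and 6-byte forms."""
    if not 0 <= code_point <= 0x7FFFFFFF:
        raise ValueError("value does not fit in six UTF-8 bytes")
    if code_point < 0x80:
        return bytes([code_point])
    # Find how many continuation bytes are needed and which lead marker to use.
    for extra, (limit, lead) in enumerate(_RANGES, start=1):
        if code_point < limit:
            break
    tail = []
    for _ in range(extra):
        # Each continuation byte carries the next six low-order bits.
        tail.append(0x80 | (code_point & 0x3F))
        code_point >>= 6
    return bytes([lead | code_point] + tail[::-1])


print(encode_utf8_6byte(0x20AC).hex())
```

A data-line escape such as \x{20ac} would therefore correspond to the three bytes e2 82 ac.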
|
||||
|
||||
|
||||
THE ALTERNATIVE MATCHING FUNCTION
|
||||
|
||||
By default, pcretest uses the standard PCRE matching function,
|
||||
pcre_exec(), to match each data line. From release 6.0, PCRE supports an
|
||||
alternative matching function, pcre_dfa_exec(), which operates in a
|
||||
different way, and has some restrictions. The differences between the
|
||||
two functions are described in the pcrematching documentation.
|
||||
|
||||
If a data line contains the \D escape sequence, or if the command line
|
||||
contains the -dfa option, the alternative matching function is called.
|
||||
This function finds all possible matches at a given point. If, however,
|
||||
the \F escape sequence is present in the data line, it stops after the
|
||||
first match is found. This is always the shortest possible match.
|
||||
|
||||
|
||||
DEFAULT OUTPUT FROM PCRETEST
|
||||
|
||||
This section describes the output when the normal matching function,
|
||||
pcre_exec(), is being used.
|
||||
|
||||
When a match succeeds, pcretest outputs the list of captured substrings
|
||||
that pcre_exec() returns, starting with number 0 for the string that
|
||||
matched the whole pattern. Otherwise, it outputs "No match" or "Partial
|
||||
match" when pcre_exec() returns PCRE_ERROR_NOMATCH or PCRE_ERROR_PAR-
|
||||
TIAL, respectively, and otherwise the PCRE negative error number. Here
|
||||
is an example of an interactive pcretest run.
|
||||
|
||||
$ pcretest
|
||||
PCRE version 5.00 07-Sep-2004
|
||||
|
||||
re> /^abc(\d+)/
|
||||
data> abc123
|
||||
0: abc123
|
||||
1: 123
|
||||
data> xyz
|
||||
No match
|
||||
|
||||
If the strings contain any non-printing characters, they are output as
|
||||
\0x escapes, or as \x{...} escapes if the /8 modifier was present on
|
||||
the pattern. If the pattern has the /+ modifier, the output for substring
|
||||
0 is followed by the rest of the subject string, identified
|
||||
by "0+" like this:
|
||||
|
||||
re> /cat/+
|
||||
data> cataract
|
||||
0: cat
|
||||
0+ aract
|
||||
|
||||
If the pattern has the /g or /G modifier, the results of successive
|
||||
matching attempts are output in sequence, like this:
|
||||
|
||||
re> /\Bi(\w\w)/g
|
||||
data> Mississippi
|
||||
0: iss
|
||||
1: ss
|
||||
0: iss
|
||||
1: ss
|
||||
0: ipp
|
||||
1: pp
|
||||
|
||||
"No match" is output only if the first match attempt fails.
|
||||
|
||||
If any of the sequences \C, \G, or \L are present in a data line that
|
||||
is successfully matched, the substrings extracted by the convenience
|
||||
functions are output with C, G, or L after the string number instead of
|
||||
a colon. This is in addition to the normal full list. The string length
|
||||
(that is, the return from the extraction function) is given in paren-
|
||||
theses after each string for \C and \G.
|
||||
|
||||
Note that while patterns can be continued over several lines (a plain
|
||||
">" prompt is used for continuations), data lines may not. However new-
|
||||
lines can be included in data by means of the \n escape (or \r or \r\n
|
||||
for those newline settings).
|
||||
|
||||
|
||||
OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION
|
||||
|
||||
When the alternative matching function, pcre_dfa_exec(), is used (by
|
||||
means of the \D escape sequence or the -dfa command line option), the
|
||||
output consists of a list of all the matches that start at the first
|
||||
point in the subject where there is at least one match. For example:
|
||||
|
||||
re> /(tang|tangerine|tan)/
|
||||
data> yellow tangerine\D
|
||||
0: tangerine
|
||||
1: tang
|
||||
2: tan
|
||||
|
||||
(Using the normal matching function on this data finds only "tang".)
|
||||
The longest matching string is always given first (and numbered zero).
|
||||
|
||||
If /g is present on the pattern, the search for further matches
|
||||
resumes at the end of the longest match. For example:
|
||||
|
||||
re> /(tang|tangerine|tan)/g
|
||||
data> yellow tangerine and tangy sultana\D
|
||||
0: tangerine
|
||||
1: tang
|
||||
2: tan
|
||||
0: tang
|
||||
1: tan
|
||||
0: tan
|
||||
|
||||
Since the matching function does not support substring capture, the
|
||||
escape sequences that are concerned with captured substrings are not
|
||||
relevant.
|
||||
|
||||
|
||||
RESTARTING AFTER A PARTIAL MATCH
|
||||
|
||||
When the alternative matching function has given the PCRE_ERROR_PARTIAL
|
||||
return, indicating that the subject partially matched the pattern, you
|
||||
can restart the match with additional subject data by means of the \R
|
||||
escape sequence. For example:
|
||||
|
||||
re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
|
||||
data> 23ja\P\D
|
||||
Partial match: 23ja
|
||||
data> n05\R\D
|
||||
0: n05
|
||||
|
||||
For further information about partial matching, see the pcrepartial
|
||||
documentation.
|
||||
|
||||
|
||||
CALLOUTS
|
||||
|
||||
If the pattern contains any callout requests, pcretest's callout func-
|
||||
tion is called during matching. This works with both matching func-
|
||||
tions. By default, the called function displays the callout number, the
|
||||
start and current positions in the text at the callout time, and the
|
||||
next pattern item to be tested. For example, the output
|
||||
|
||||
--->pqrabcdef
|
||||
0 ^ ^ \d
|
||||
|
||||
indicates that callout number 0 occurred for a match attempt starting
|
||||
at the fourth character of the subject string, when the pointer was at
|
||||
the seventh character of the data, and when the next pattern item was
|
||||
\d. Just one circumflex is output if the start and current positions
|
||||
are the same.
|
||||
|
||||
Callouts numbered 255 are assumed to be automatic callouts, inserted as
|
||||
a result of the /C pattern modifier. In this case, instead of showing
|
||||
the callout number, the offset in the pattern, preceded by a plus, is
|
||||
output. For example:
|
||||
|
||||
re> /\d?[A-E]\*/C
|
||||
data> E*
|
||||
--->E*
|
||||
+0 ^ \d?
|
||||
+3 ^ [A-E]
|
||||
+8 ^^ \*
|
||||
+10 ^ ^
|
||||
0: E*
|
||||
|
||||
The callout function in pcretest returns zero (carry on matching) by
|
||||
default, but you can use a \C item in a data line (as described above)
|
||||
to change this.
|
||||
|
||||
Inserting callouts can be helpful when using pcretest to check compli-
|
||||
cated regular expressions. For further information about callouts, see
|
||||
the pcrecallout documentation.
|
||||
|
||||
|
||||
SAVING AND RELOADING COMPILED PATTERNS
|
||||
|
||||
The facilities described in this section are not available when the
|
||||
POSIX interface to PCRE is being used, that is, when the /P pattern
|
||||
modifier is specified.
|
||||
|
||||
When the POSIX interface is not in use, you can cause pcretest to write
|
||||
a compiled pattern to a file, by following the modifiers with > and a
|
||||
file name. For example:
|
||||
|
||||
/pattern/im >/some/file
|
||||
|
||||
See the pcreprecompile documentation for a discussion about saving and
|
||||
re-using compiled patterns.
|
||||
|
||||
The data that is written is binary. The first eight bytes are the
|
||||
length of the compiled pattern data followed by the length of the
|
||||
optional study data, each written as four bytes in big-endian order
|
||||
(most significant byte first). If there is no study data (either the
|
||||
pattern was not studied, or studying did not return any data), the sec-
|
||||
ond length is zero. The lengths are followed by an exact copy of the
|
||||
compiled pattern. If there is additional study data, this follows imme-
|
||||
diately after the compiled pattern. After writing the file, pcretest
|
||||
expects to read a new pattern.
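
The file layout just described can be read back with a few lines of Python. This is a sketch based only on the description above (two 4-byte big-endian lengths, then the compiled pattern, then any study data); the function name split_saved_pattern is hypothetical, and the two returned blocks are treated as opaque bytes.

```python
import struct


def split_saved_pattern(blob):
    """Split the bytes of a saved-pattern file into (pattern, study) blocks."""
    if len(blob) < 8:
        raise ValueError("file too short to contain the two length fields")
    # Two unsigned 32-bit integers, most significant byte first.
    pattern_len, study_len = struct.unpack(">II", blob[:8])
    pattern_end = 8 + pattern_len
    study_end = pattern_end + study_len
    if len(blob) < study_end:
        raise ValueError("file is shorter than its header claims")
    return blob[8:pattern_end], blob[pattern_end:study_end]


# Build a fake file: a 3-byte "pattern" and no study data.
fake = struct.pack(">II", 3, 0) + b"abc"
print(split_saved_pattern(fake))
```

Because the lengths are always stored big-endian, this header can be parsed the same way on any host, regardless of the byte order of the machine that wrote the file.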
|
||||
|
||||
A saved pattern can be reloaded into pcretest by specifying < and a file
|
||||
name instead of a pattern. The name of the file must not contain a <
|
||||
character, as otherwise pcretest will interpret the line as a pattern
|
||||
delimited by < characters. For example:
|
||||
|
||||
re> </some/file
|
||||
Compiled regex loaded from /some/file
|
||||
No study data
|
||||
|
||||
When the pattern has been loaded, pcretest proceeds to read data lines
|
||||
in the usual way.
|
||||
|
||||
You can copy a file written by pcretest to a different host and reload
|
||||
it there, even if the new host has opposite endianness to the one on
|
||||
which the pattern was compiled. For example, you can compile on an i86
|
||||
machine and run on a SPARC machine.
|
||||
|
||||
File names for saving and reloading can be absolute or relative, but
|
||||
note that the shell facility of expanding a file name that starts with
|
||||
a tilde (~) is not available.
|
||||
|
||||
The ability to save and reload files in pcretest is intended for test-
|
||||
ing and experimentation. It is not intended for production use because
|
||||
only a single pattern can be written to a file. Furthermore, there is
|
||||
no facility for supplying custom character tables for use with a
|
||||
reloaded pattern. If the original pattern was compiled with custom
|
||||
tables, an attempt to match a subject string using a reloaded pattern
|
||||
is likely to cause pcretest to crash. Finally, if you attempt to load
|
||||
a file that is not in the correct format, the result is undefined.
|
||||
|
||||
|
||||
AUTHOR
|
||||
|
||||
Philip Hazel
|
||||
University Computing Service,
|
||||
Cambridge CB2 3QG, England.
|
||||
|
||||
Last updated: 29 June 2006
|
||||
Copyright (c) 1997-2006 University of Cambridge.
|
33
libs/pcre/doc/perltest.txt
Normal file
@ -0,0 +1,33 @@
|
||||
The perltest program
|
||||
--------------------
|
||||
|
||||
The perltest program tests Perl's regular expressions; it has the same
|
||||
specification as pcretest, and so can be given identical input, except that
|
||||
input patterns can be followed only by Perl's lower case modifiers and /+ (as
|
||||
used by pcretest), which is recognized and handled by the program.
|
||||
|
||||
The data lines are processed as Perl double-quoted strings, so if they contain
|
||||
" $ or @ characters, these have to be escaped. For this reason, all such
|
||||
characters in testinput1 and testinput4 are escaped so that they can be used
|
||||
for perltest as well as for pcretest. The special upper case pattern
|
||||
modifiers such as /A that pcretest recognizes, and its special data line
|
||||
escapes, are not used in these files. The output should be identical, apart
|
||||
from the initial identifying banner.
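
When preparing a data line that both programs should accept, the three characters named above can be escaped mechanically. The Python helper below is a sketch with a made-up name, escape_for_perltest; it assumes the input has not been escaped already and leaves backslashes alone, since data lines use them for escapes.

```python
# Map each character that Perl would interpolate or treat as a string
# terminator to its backslash-escaped form.
_PERL_SPECIALS = str.maketrans({'"': '\\"', "$": "\\$", "@": "\\@"})


def escape_for_perltest(data_line):
    """Backslash-escape ", $ and @ so that Perl treats them literally."""
    return data_line.translate(_PERL_SPECIALS)


print(escape_for_perltest('user@example.com costs $5'))
```

Running the helper twice on the same line would double-escape it, so it is meant for raw subject text only.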
|
||||
|
||||
The perltest script can also test UTF-8 features. It works as is for Perl 5.8
|
||||
or higher. It recognizes the special modifier /8 that pcretest uses to invoke
|
||||
UTF-8 functionality. The testinput4 file can be fed to perltest to run
|
||||
compatible UTF-8 tests.
|
||||
|
||||
For Perl 5.6, perltest won't work unmodified for the UTF-8 tests. You need to
|
||||
uncomment the "use utf8" lines that it contains. It is best to do this on a
|
||||
copy of the script, because for non-UTF-8 tests, these lines should remain
|
||||
commented out.
|
||||
|
||||
The other testinput files are not suitable for feeding to perltest, since they
|
||||
make use of the special upper case modifiers and escapes that pcretest uses to
|
||||
test some features of PCRE. Some of these files also contain malformed regular
|
||||
expressions, in order to check that PCRE diagnoses them correctly.
|
||||
|
||||
Philip Hazel
|
||||
September 2004
|
251
libs/pcre/install-sh
Executable file
@ -0,0 +1,251 @@
|
||||
#!/bin/sh
|
||||
#
|
||||
# install - install a program, script, or datafile
|
||||
# This comes from X11R5 (mit/util/scripts/install.sh).
|
||||
#
|
||||
# Copyright 1991 by the Massachusetts Institute of Technology
|
||||
#
|
||||
# Permission to use, copy, modify, distribute, and sell this software and its
|
||||
# documentation for any purpose is hereby granted without fee, provided that
|
||||
# the above copyright notice appear in all copies and that both that
|
||||
# copyright notice and this permission notice appear in supporting
|
||||
# documentation, and that the name of M.I.T. not be used in advertising or
|
||||
# publicity pertaining to distribution of the software without specific,
|
||||
# written prior permission. M.I.T. makes no representations about the
|
||||
# suitability of this software for any purpose. It is provided "as is"
|
||||
# without express or implied warranty.
|
||||
#
|
||||
# Calling this script install-sh is preferred over install.sh, to prevent
|
||||
# `make' implicit rules from creating a file called install from it
|
||||
# when there is no Makefile.
|
||||
#
|
||||
# This script is compatible with the BSD install script, but was written
|
||||
# from scratch. It can only install one file at a time, a restriction
|
||||
# shared with many OS's install programs.
|
||||
|
||||
|
||||
# set DOITPROG to echo to test this script
|
||||
|
||||
# Don't use :- since 4.3BSD and earlier shells don't like it.
|
||||
doit="${DOITPROG-}"
|
||||
|
||||
|
||||
# put in absolute paths if you don't have them in your path; or use env. vars.
|
||||
|
||||
mvprog="${MVPROG-mv}"
|
||||
cpprog="${CPPROG-cp}"
|
||||
chmodprog="${CHMODPROG-chmod}"
|
||||
chownprog="${CHOWNPROG-chown}"
|
||||
chgrpprog="${CHGRPPROG-chgrp}"
|
||||
stripprog="${STRIPPROG-strip}"
|
||||
rmprog="${RMPROG-rm}"
|
||||
mkdirprog="${MKDIRPROG-mkdir}"
|
||||
|
||||
transformbasename=""
|
||||
transformarg=""
|
||||
instcmd="$mvprog"
|
||||
chmodcmd="$chmodprog 0755"
|
||||
chowncmd=""
|
||||
chgrpcmd=""
|
||||
stripcmd=""
|
||||
rmcmd="$rmprog -f"
|
||||
mvcmd="$mvprog"
|
||||
src=""
|
||||
dst=""
|
||||
dir_arg=""
|
||||
|
||||
while [ x"$1" != x ]; do
|
||||
case $1 in
|
||||
-c) instcmd="$cpprog"
|
||||
shift
|
||||
continue;;
|
||||
|
||||
-d) dir_arg=true
|
||||
shift
|
||||
continue;;
|
||||
|
||||
-m) chmodcmd="$chmodprog $2"
|
||||
shift
|
||||
shift
|
||||
continue;;
|
||||
|
||||
-o) chowncmd="$chownprog $2"
|
||||
shift
|
||||
shift
|
||||
continue;;
|
||||
|
||||
-g) chgrpcmd="$chgrpprog $2"
|
||||
shift
|
||||
shift
|
||||
continue;;
|
||||
|
||||
-s) stripcmd="$stripprog"
|
||||
shift
|
||||
continue;;
|
||||
|
||||
-t=*) transformarg=`echo $1 | sed 's/-t=//'`
|
||||
shift
|
||||
continue;;
|
||||
|
||||
-b=*) transformbasename=`echo $1 | sed 's/-b=//'`
|
||||
shift
|
||||
continue;;
|
||||
|
||||
*) if [ x"$src" = x ]
|
||||
then
|
||||
src=$1
|
||||
else
|
||||
# this colon is to work around a 386BSD /bin/sh bug
|
||||
:
|
||||
dst=$1
|
||||
fi
|
||||
shift
|
||||
continue;;
|
||||
esac
|
||||
done
|
||||
|
||||
if [ x"$src" = x ]
|
||||
then
|
||||
echo "install: no input file specified"
|
||||
exit 1
|
||||
else
|
||||
true
|
||||
fi
|
||||
|
||||
if [ x"$dir_arg" != x ]; then
|
||||
dst=$src
|
||||
src=""
|
||||
|
||||
if [ -d $dst ]; then
|
||||
instcmd=:
|
||||
chmodcmd=""
|
||||
else
|
||||
instcmd=mkdir
|
||||
fi
|
||||
else
|
||||
|
||||
# Waiting for this to be detected by the "$instcmd $src $dsttmp" command
|
||||
# might cause directories to be created, which would be especially bad
|
||||
# if $src (and thus $dsttmp) contains '*'.
|
||||
|
||||
if [ -f $src -o -d $src ]
|
||||
then
|
||||
true
|
||||
else
|
||||
echo "install: $src does not exist"
|
||||
exit 1
|
||||
fi
|
||||
|
||||
if [ x"$dst" = x ]
|
||||
then
|
||||
echo "install: no destination specified"
|
||||
exit 1
|
||||
else
|
||||
true
|
||||
fi
|
||||
|
||||
# If destination is a directory, append the input filename; if your system
|
||||
# does not like double slashes in filenames, you may need to add some logic
|
||||
|
||||
if [ -d $dst ]
|
||||
then
|
||||
dst="$dst"/`basename $src`
|
||||
else
|
||||
true
|
||||
fi
|
||||
fi
|
||||
|
||||
## this sed command emulates the dirname command
|
||||
dstdir=`echo $dst | sed -e 's,[^/]*$,,;s,/$,,;s,^$,.,'`
|
||||
|
||||
# Make sure that the destination directory exists.
|
||||
# this part is taken from Noah Friedman's mkinstalldirs script
|
||||
|
||||
# Skip lots of stat calls in the usual case.
|
||||
if [ ! -d "$dstdir" ]; then
|
||||
defaultIFS='
|
||||
'
|
||||
IFS="${IFS-${defaultIFS}}"
|
||||
|
||||
oIFS="${IFS}"
|
||||
# Some sh's can't handle IFS=/ for some reason.
|
||||
IFS='%'
|
||||
set - `echo ${dstdir} | sed -e 's@/@%@g' -e 's@^%@/@'`
|
||||
IFS="${oIFS}"
|
||||
|
||||
pathcomp=''
|
||||
|
||||
while [ $# -ne 0 ] ; do
|
||||
pathcomp="${pathcomp}${1}"
|
||||
shift
|
||||
|
||||
if [ ! -d "${pathcomp}" ] ;
|
||||
then
|
||||
$mkdirprog "${pathcomp}"
|
||||
else
|
||||
true
|
||||
fi
|
||||
|
||||
pathcomp="${pathcomp}/"
|
||||
done
|
||||
fi
|
||||
|
||||
if [ x"$dir_arg" != x ]
|
||||
then
|
||||
$doit $instcmd $dst &&
|
||||
|
||||
if [ x"$chowncmd" != x ]; then $doit $chowncmd $dst; else true ; fi &&
|
||||
if [ x"$chgrpcmd" != x ]; then $doit $chgrpcmd $dst; else true ; fi &&
|
||||
if [ x"$stripcmd" != x ]; then $doit $stripcmd $dst; else true ; fi &&
|
||||
if [ x"$chmodcmd" != x ]; then $doit $chmodcmd $dst; else true ; fi
|
||||
else
|
||||
|
||||
# If we're going to rename the final executable, determine the name now.
|
||||
|
||||
if [ x"$transformarg" = x ]
|
||||
then
|
||||
dstfile=`basename $dst`
|
||||
else
|
||||
dstfile=`basename $dst $transformbasename |
|
||||
sed $transformarg`$transformbasename
|
||||
fi
|
||||
|
||||
# don't allow the sed command to completely eliminate the filename
|
||||
|
||||
if [ x"$dstfile" = x ]
|
||||
then
|
||||
dstfile=`basename $dst`
|
||||
else
|
||||
true
|
||||
fi
|
||||
|
||||
# Make a temp file name in the proper directory.
|
||||
|
||||
dsttmp=$dstdir/#inst.$$#
|
||||
|
||||
# Move or copy the file name to the temp name
|
||||
|
||||
$doit $instcmd $src $dsttmp &&
|
||||
|
||||
trap "rm -f ${dsttmp}" 0 &&
|
||||
|
||||
# and set any options; do chmod last to preserve setuid bits
|
||||
|
||||
# If any of these fail, we abort the whole thing. If we want to
|
||||
# ignore errors from any of these, just make sure not to ignore
|
||||
# errors from the above "$doit $instcmd $src $dsttmp" command.
|
||||
|
||||
if [ x"$chowncmd" != x ]; then $doit $chowncmd $dsttmp; else true;fi &&
|
||||
if [ x"$chgrpcmd" != x ]; then $doit $chgrpcmd $dsttmp; else true;fi &&
|
||||
if [ x"$stripcmd" != x ]; then $doit $stripcmd $dsttmp; else true;fi &&
|
||||
if [ x"$chmodcmd" != x ]; then $doit $chmodcmd $dsttmp; else true;fi &&
|
||||
|
||||
# Now rename the file to the real destination.
|
||||
|
||||
$doit $rmcmd -f $dstdir/$dstfile &&
|
||||
$doit $mvcmd $dsttmp $dstdir/$dstfile
|
||||
|
||||
fi &&
|
||||
|
||||
|
||||
exit 0
|
20
libs/pcre/libpcre.def
Normal file
@ -0,0 +1,20 @@
|
||||
LIBRARY libpcre
|
||||
EXPORTS
|
||||
pcre_malloc
|
||||
pcre_free
|
||||
pcre_config
|
||||
pcre_callout
|
||||
pcre_compile
|
||||
pcre_copy_substring
|
||||
pcre_dfa_exec
|
||||
pcre_exec
|
||||
pcre_get_substring
|
||||
pcre_get_stringnumber
|
||||
pcre_get_substring_list
|
||||
pcre_free_substring
|
||||
pcre_free_substring_list
|
||||
pcre_info
|
||||
pcre_fullinfo
|
||||
pcre_maketables
|
||||
pcre_study
|
||||
pcre_version
|
12
libs/pcre/libpcre.pc.in
Normal file
@ -0,0 +1,12 @@
|
||||
# Package Information for pkg-config
|
||||
|
||||
prefix=@prefix@
|
||||
exec_prefix=@exec_prefix@
|
||||
libdir=@libdir@
|
||||
includedir=@includedir@
|
||||
|
||||
Name: libpcre
|
||||
Description: PCRE - Perl compatible regular expressions C library
|
||||
Version: @PCRE_VERSION@
|
||||
Libs: -L${libdir} -lpcre
|
||||
Cflags: -I${includedir}
|
25
libs/pcre/libpcreposix.def
Normal file
@ -0,0 +1,25 @@
|
||||
LIBRARY libpcreposix
|
||||
EXPORTS
|
||||
pcre_malloc
|
||||
pcre_free
|
||||
pcre_config
|
||||
pcre_callout
|
||||
pcre_compile
|
||||
pcre_copy_substring
|
||||
pcre_dfa_exec
|
||||
pcre_exec
|
||||
pcre_get_substring
|
||||
pcre_get_stringnumber
|
||||
pcre_get_substring_list
|
||||
pcre_free_substring
|
||||
pcre_free_substring_list
|
||||
pcre_info
|
||||
pcre_fullinfo
|
||||
pcre_maketables
|
||||
pcre_study
|
||||
pcre_version
|
||||
|
||||
regcomp
|
||||
regexec
|
||||
regerror
|
||||
regfree
|
6971
libs/pcre/ltmain.sh
Normal file
File diff suppressed because it is too large
Some files were not shown because too many files have changed in this diff