Version: 4.2.2
4 Simple Text Parser
This module provides a simple text parser that can read strings
and turn them into data without first building lexemes (although it
can be used to either lex or parse).
More complex or faster parsers may require the use of the integrated
A parser is given a list of matcher procedures and associated action procedures.
A matcher is generally a regexp; the associated action turns
the matched text into something else.
On the input string, the parser recursively looks
for the matcher that matches the earliest
character and applies its action.
The no-match-proc procedure is applied to the unmatched portion of the string
that precedes the first matched character.
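The matching loop described above can be sketched in plain Racket. This is only an illustration under stated assumptions, not the module's implementation: the names earliest-match and parse-sketch are hypothetical, each action receives only the whole matched string, and matchers are assumed never to match the empty string.

```racket
#lang racket
;; Hypothetical sketch of the "earliest match wins" loop (not the module's code).
;; rules: list of (cons matcher action)
;;   matcher: string -> list of position pairs, or #f (like regexp-match-positions)
;;   action:  matched string -> string

(define (earliest-match rules str)
  ;; Returns (cons positions action) for the rule whose match starts earliest,
  ;; or #f when no rule matches. On a tie, the first rule in the list wins.
  (for/fold ([best #f]) ([rule (in-list rules)])
    (define pos ((car rule) str))
    (if (and pos
             (or (not best)
                 (< (car (first pos)) (car (first (car best))))))
        (cons pos (cdr rule))
        best)))

(define (parse-sketch rules no-match-proc str)
  (let loop ([str str] [acc '()])
    (define best (earliest-match rules str))
    (cond
      [(not best)
       ;; Nothing matches any more: the remainder is unmatched text.
       (string-append* (reverse (cons (no-match-proc str) acc)))]
      [else
       (define start (car (first (car best))))
       (define end (cdr (first (car best))))
       ;; Assumes end > 0, i.e. matchers never match the empty string.
       (loop (substring str end)
             (list* ((cdr best) (substring str start end))
                    (no-match-proc (substring str 0 start))
                    acc))])))

(define rules
  (list (cons (λ (s) (regexp-match-positions #px"[0-9]+" s))
              (λ (m) (string-append "<" m ">")))
        (cons (λ (s) (regexp-match-positions #px"[a-z]+" s))
              string-upcase)))

(parse-sketch rules (λ (s) s) "ab 12 cd")
;; => "AB <12> CD"
```

Here the unmatched spaces go through the identity no-match-proc, while each matched run is rewritten by the action of whichever matcher starts first.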
The parser has an internal state, the "phase", where it is possible
to define local parsers that only work when the parser is in that phase.
Actions can make the parser switch to a given phase.
Automata transitions can then easily be defined.
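As a rough illustration of phase-dependent matchers, the sketch below restricts each rule to one phase and lets a rule name the phase to switch to. It is written in plain Racket and does not use the module; parse-with-phases and the rule layout are hypothetical, and naming the next phase in the rule is a simplification of letting the action switch phase.

```racket
#lang racket
;; Hypothetical sketch of phases (not the module's API).
;; A rule is (list phase regexp action next-phase): it is only tried
;; when the parser is in `phase`, and moves it to `next-phase`.

(define (parse-with-phases rules start-phase str)
  (let loop ([str str] [phase start-phase] [acc '()])
    ;; Matches of the rules that are active in the current phase.
    (define candidates
      (for*/list ([rule (in-list rules)]
                  #:when (equal? (first rule) phase)
                  [pos (in-value (regexp-match-positions (second rule) str))]
                  #:when pos)
        (cons (first pos) rule)))
    (cond
      [(null? candidates)
       ;; No active rule matches: keep the rest of the text unchanged.
       (string-append* (reverse (cons str acc)))]
      [else
       ;; Pick the match that starts earliest, apply its action, switch phase.
       (match-define (cons (cons start end) (list _ _ action next-phase))
         (argmin caar candidates))
       (loop (substring str end)
             next-phase
             (list* (action (substring str start end))
                    (substring str 0 start)
                    acc))])))

;; Two-state automaton: words are upcased only between double quotes.
(define rules
  (list (list 'outside #rx"\""     (λ (m) "[")   'inside)
        (list 'inside  #rx"\""     (λ (m) "]")   'outside)
        (list 'inside  #px"[a-z]+" string-upcase 'inside)))

(parse-with-phases rules 'outside "say \"hi there\" ok")
;; => "say [HI THERE] ok"
```

The word rule never fires in the outside phase, which is why "say" and "ok" are left untouched.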
Instead of switching to another phase, it is also possible to set the
parser into a "sub-parser" mode, and to provide the sub-parser with a callback
that will be applied only once the sub-parser has returned.
The fastest and easiest way to understand how it works is probably to
look at the examples in the "examples" directory.
Some simple examples are also given at the end of this page.
Creates a new parser with default behavior no-match-proc, starting in phase phase.
All the outputs generated by the parser are then appended with appender.
Adds the matcher in and its associated action out
to parser.
The matcher will match only when phase? returns #t
for the parser’s current phase.
If phase? is a procedure, it will be used as is to match the parser’s phase.
If phase? equals #t it will be changed to (λ args #t)
such that it matches any phase.
Any other value of phase? will be turned into a procedure that compares
the parser’s phase to this value with equal?.
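The three cases for phase? can be summarized by a small normalization function. The name ->phase-predicate is hypothetical and the code is only a sketch of the rules stated above, not the module's source.

```racket
#lang racket
;; Hypothetical helper mirroring the three documented cases for phase?.
(define (->phase-predicate phase?)
  (cond
    [(procedure? phase?) phase?]                 ; used as is
    [(eq? phase? #t) (λ args #t)]                ; matches any phase
    [else (λ (phase) (equal? phase phase?))]))   ; compared with equal?

((->phase-predicate 'start) 'start)    ; => #t
((->phase-predicate 'start) 'other)    ; => #f
((->phase-predicate #t) 'anything)     ; => #t
((->phase-predicate symbol?) "string") ; => #f
```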
If in is a string it will be turned into a procedure that matches
the corresponding pregexp.
If in is a symbol, it will be turned into a procedure that matches
the corresponding pregexp with word boundaries on both sides (useful
for matching names or programming language keywords).
If in is a list, then add-item is called recursively on each member
of in with the same parser, phase? and out.
If in equals #t, it will modify the no-match-proc procedure
to add the corresponding action when phase? applies to the parser.
In the end, in returns the same kind of values as regexp-match-positions.
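A sketch of how the string and symbol forms of in could be turned into matcher procedures, using only standard Racket regexp functions. The name ->matcher is hypothetical, and the list and #t cases are left out because they depend on the parser itself.

```racket
#lang racket
;; Hypothetical sketch: normalizing `in` into a matcher procedure that
;; returns the same kind of value as regexp-match-positions.
(define (->matcher in)
  (cond
    [(procedure? in) in]
    [(string? in)
     (define px (pregexp in))
     (λ (str) (regexp-match-positions px str))]
    [(symbol? in)
     ;; Same pregexp, with a word boundary on each side.
     (define px (pregexp (string-append "\\b" (symbol->string in) "\\b")))
     (λ (str) (regexp-match-positions px str))]
    [else (error '->matcher "unsupported matcher: ~e" in)]))

((->matcher "a+") "baab")       ; => '((1 . 3))
((->matcher 'if) "elif x if y") ; => '((7 . 9))
((->matcher 'if) "elif")        ; => #f
```

The symbol form skips the "if" inside "elif" because no word boundary precedes it.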
out must be a procedure that accepts the same number of arguments as
the number of values returned by the matcher in.
For example, if in is "aa(b+)c(d+)e", then out must
take 3 arguments (one for the whole string, and two for the b’s and the d’s).
If out is not a procedure, it will be turned into a procedure that accepts
any number of arguments and returns out.
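The arity rule can be checked with plain regexp-match, assuming the action receives the matched substrings (whole match first, then one per group). The names in-px, out and ->action are illustrative only.

```racket
#lang racket
;; The pattern from the text has two groups, so a match yields three values.
(define in-px (pregexp "aa(b+)c(d+)e"))

;; An action for this matcher therefore takes three arguments.
(define (out whole bs ds)
  (format "~a b's, ~a d's" (string-length bs) (string-length ds)))

(regexp-match in-px "xxaabbbcddexx")
;; => '("aabbbcdde" "bbb" "dd")
(apply out (regexp-match in-px "xxaabbbcddexx"))
;; => "3 b's, 2 d's"

;; Hypothetical helper: a non-procedure `out` becomes a constant action
;; that accepts any number of arguments.
(define (->action out)
  (if (procedure? out) out (λ args out)))

((->action "X") "aabbbcdde" "bbb" "dd")
;; => "X"
```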
4.1 Matcher helpers
Turns s into a pregexp and returns a procedure
that takes an input string and applies
regexp-match-positions to that string with the pregexp.
Same as re but regexp-quotes s beforehand, so that the string
is matched exactly.
Same as txt but adds word-boundaries around
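The difference between matching a string as a regexp, matching it exactly, and matching it as a whole word can be seen with the standard pregexp and regexp-quote functions. The procedures match-regexp, match-exactly and match-word are hypothetical stand-ins written for this illustration; they are not the module's helpers.

```racket
#lang racket
;; Hypothetical stand-ins for the three kinds of helper described above.
(define (match-regexp s)   ; s is interpreted as a pregexp
  (define px (pregexp s))
  (λ (str) (regexp-match-positions px str)))

(define (match-exactly s)  ; s is quoted first, so it is matched literally
  (define px (pregexp (regexp-quote s)))
  (λ (str) (regexp-match-positions px str)))

(define (match-word s)     ; literal match with word boundaries on both sides
  (define px (pregexp (string-append "\\b" (regexp-quote s) "\\b")))
  (λ (str) (regexp-match-positions px str)))

((match-regexp "a+b") "xa+b aab")  ; => '((5 . 8)), matches "aab"
((match-exactly "a+b") "xa+b aab") ; => '((1 . 4)), matches the literal "a+b"
((match-word "in") "print in")     ; => '((6 . 8)), skips the "in" inside "print"
```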
(add-items parser [phase? [search-proc output-proc] ...] ...)
4.2 Actions
Sets the parser in the phase phase and returns "".
Sets the current parser in sub-parse mode and switches to
The result of the sub-parse is appended with appender, which by default
is the same as the parser’s.
When the sub-parser has finished parsing
(it has returned with
callback is called with the result of the sub-parse and the result of
callback is added to the current parser result.
Sub-parsers can be called recursively, once in a sub-parsing mode
or in the callback.
Returns "".
By default, the parser agglomerates the return values
of the action procedures.
The function cons-out can be used to add a value to the parser result
without it being the return value of an action.
This should rarely be needed.
Adds out to the current parser result and returns
from the current sub-parsing mode.
Parses text with parser, starting in phase phase, which is the current phase
by default.
It is thus possible to call the parser inside the parsing phase, i.e.,
once a portion of the text has been parsed, it can be given to the parser
itself in some phase to make further transformations.
This is not the same as sub-parsing because there is no callback.
Examples:
YaïCaïDaï -glitch- CaïDaï -gloutch-
TaïPaïCHaï
(tree: (root (node1 (leaf1 leaf2) leaf3) (node2 leaf4 (node3 leaf5) leaf6) leaf7))