xml-match
syntax
xml-match
provides pattern matching of XML nodes. The pattern notation is based on that of Scheme's syntax-rules/syntax-case macro systems.
The grammar for the xml-match
syntax is given below:
match-form ::=(xml-match
input-expression clause+)
clause ::=[
node-pattern action-expression+]
|[
node-pattern(guard
expression*)
action-expression+]
node-pattern ::= literal-pattern | pat-var-or-cata | element-pattern | list-pattern literal-pattern ::= string | character | number |#t
|#f
attribute-pattern ::= attribute-keyword attr-val-pattern attr-val-pattern ::= literal-pattern | pat-var-or-cata |(
pat-var-or-cata default-value-expr)
element-pattern ::=(
tag-symbol attribute-pattern*)
|(
tag-symbol attribute-pattern* nodeset-pattern)
|(
tag-symbol attribute-pattern* nodeset-pattern?.
pat-var-or-cata)
list-pattern ::=(list
nodeset-pattern)
|(list
nodeset-pattern?.
pat-var-or-cata)
|(list)
nodeset-pattern ::= node-pattern | node-pattern...
| node-pattern nodeset-pattern | node-pattern...
nodeset-pattern pat-var-or-cata ::=(unquote
var-symbol)
|(unquote [
var-symbol*])
|(unquote [
cata-expression->
var-symbol*])
Within a list or element body pattern, ellipses may appear only once, but may be followed by zero or more node patterns.
Guard expressions cannot refer to the return values of catamorphisms.
Ellipses in the output expressions must appear only in an expression context; ellipses are not allowed in a syntactic form.
The sections below illustrate specific aspects of the xml-match
pattern matcher.
The example below illustrates the pattern matching of an XML element:
(xml-match (e i: 1 3 4 5) [(e i: ,d ,a ,b ,c) (list d a b c)] [,otherwise #f])
The element and attribute tags used in patterns are the element and attribute constructors created by define-element
and define-attribute
. Each clause in xml-match
contains two parts: a pattern and one or more expressions which are evaluated if the pattern is successfully match.
Pattern variables are must be "unquoted" in the pattern. The above expression binds d
to 1
, a
to 3
, b
to 4
, and c
to 5
.
A nodeset pattern is designated by a list in the pattern, beginning the identifier list
. The example below illustrates matching a nodeset.
(xml-match '("i" "j" "k" "l" "m") [(list ,a ,b ,c ,d ,e) (list (h4:p a) (h4:p b) (h4:p c) (h4:p d) (h4:p e))])
This example wraps each nodeset item in an HTML paragraph element.
Matching the 'rest' of a nodeset is achieved by using a ". rest)
" pattern at the end of an element or nodeset pattern.
This is illustrated in the example below:
(xml-match (e 3 (f 4 5 6) 7) [(e ,a (f . ,y) ,d) (list a y d)])
The above expression returns (3 (4 5 6) 7)
.
As in syntax-rules
, ellipses may be used to specify a repeated pattern. Note that the pattern "item ...
" specifies zero-or-more matches of the pattern "item
".
The use of ellipses in a pattern is illustrated in the code fragment below, where nested ellipses are used to match the children of repeated instances of an 'a' element, within an element 'd'.
(define x (d (a 1 2 3) (a 4 5) (a 6 7 8) (a 9 10))) (xml-match x [(d (a ,b ...) ...) (list (list b ...) ...)])
The above expression returns a value of ((1 2 3) (4 5) (6 7 8) (9 10))
.
It is possible to specify a default value for an attribute which is used if the attribute is not present in the element being matched. This is illustrated in the following example:
(xml-match (e 3 4 5) [(e z: (,d 1) ,a ,b ,c) (list d a b c)])
The value 1
is used when the attribute 'z' is absent from the element 'e'.
Guards may be added to a pattern clause via the guard
keyword. A guard expression may include zero or more expressions which are evaluated only if the pattern is matched. The body of the clause is only evaluated if the guard expressions evaluate to #t
.
The use of guard expressions is illustrated below:
(xml-match '(a 2 3) ((a ,n) (guard (number? n)) n) ((a ,m ,n) (guard (number? m) (number? n)) (+ m n)))
The example below illustrates the use of explicit recursion within an xml-match form. This example implements a simple calculator for the basic arithmetic operations, which are represented by the XML elements plus, minus, times, and div.
(define simple-eval (lambda (x) (xml-match x [,i (guard (integer? i)) i] [(plus ,x ,y) (+ (simple-eval x) (simple-eval y))] [(times ,x ,y) (* (simple-eval x) (simple-eval y))] [(minus ,x ,y) (- (simple-eval x) (simple-eval y))] [(div ,x ,y) (/ (simple-eval x) (simple-eval y))] [,otherwise (error "simple-eval: invalid expression" x)])))
Using the catamorphism feature of xml-match
, a more concise version of simple-eval
can be written. The pattern ,[x]
recusively invokes the pattern matcher on the value bound in this position.
(define simple-eval (lambda (x) (xml-match x [,i (guard (integer? i)) i] [(plus ,[x] ,[y]) (+ x y)] [(times ,[x] ,[y]) (* x y)] [(minus ,[x] ,[y]) (- x y)] [(div ,[x] ,[y]) (/ x y)] [,otherwise (error "simple-eval: invalid expression" x)])))
It is also possible to explicitly name the operator in the 'cata' position. Where ,[id*]
recurs to the top of the current xml-match
, ,[cata -> id*]
recurs to cata
. cata
must evaluate to a procedure which takes one argument, and returns as many values as there are identifiers following ->
.
Named catamorphism patterns allow processing to be split into multiple, mutually recursive procedures. This is illustrated in the example below: a transformation that formats a "TV Guide" into HTML.
(define (tv-guide->html g) (define (cast-list cl) (xml-match cl [(CastList (CastMember (Character (Name ,ch)) (Actor (Name ,a))) ...) (h4:div (h4:ul (h4:li ch ": " a) ...))])) (define (prog p) (xml-match p [(Program (Start ,start-time) (Duration ,dur) (Series ,series-title) (Description . ,desc)) (h4:div (h4:p start-time (h4:br) series-title (h4:br) desc))] [(Program (Start ,start-time) (Duration ,dur) (Series ,series-title) (Description . ,desc) ,[cast-list -> cl]) (h4:div (h4:p start-time (h4:br) series-title (h4:br) desc) cl)])) (xml-match g [(TVGuide start: ,start-date end: ,end-date (Channel (Name ,nm) ,[prog -> p] ...) ...) (h4:html (h4:head (h4:title "TV Guide")) (h4:body (h4:h1 "TV Guide") (h4:div (h4:h2 nm) p ...) ...))]))
Within the body of an xml-match
form, a slightly extended version of quasiquote
is provided, which allows the use of ellipses. This is illustrated in the example below.
(xml-match '(e 3 4 5 6 7) [(e ,i ... 6 7) `("start" ,(list 'wrap i) ... "end")] [,otherwise #f])
The general pattern is that `(something ,i ...)
is rewritten as `(something ,@i)
.
xml-match-let
and xml-match-let*
syntax
The xml-match-let
and xml-match-let*
forms generalize the let
and let*
forms of Scheme to allow an XML pattern in the binding position, rather than a simple variable.
For example, the expression below:
(xml-match-let ([(a ,i ,j) '(a 1 2)]) (+ i j))
binds the variables i
and j
to 1 and 2 in the XML value given.
The syntax for these forms is given below:
(xml-match-let ([pat expr] ...) expression0 expression ...) (xml-match-let* ([pat expr] ...) expression0 expression ...)