6 Automatically Extracted Comments
The following "documentation" was generated automatically, using a script that I believe is due to Mike Sperber. This documentation has not been read or formatted for scribble, and should be considered only as raw material for use in creating actual documentation.
6.1 ssax.ss
(make-xml-token KIND HEAD) → ??? KIND : KIND HEAD : HEAD
(xml-token? THING) → ??? THING : any/c
(xml-token-kind XML-TOKEN) → ??? XML-TOKEN : symbol?
(xml-token-head XML-TOKEN) → ??? XML-TOKEN : symbol?
(ssax:read-markup-token PORT) → ??? PORT : port?
Here’s a detailed breakdown of the return values and the position in the PORT when each value is returned:
PI-token: only the PI target is read. To finish the processing instruction and disregard it, call ssax:skip-pi. ssax:read-attributes may be useful as well (for PIs whose content is attribute-value pairs).
END-token: the end tag is read completely; the current position is right after the terminating #\> character.
COMMENT: the comment is read and skipped completely. The current position is right after the "-->" that terminates the comment.
CDSECT: the current position is right after "<![CDATA[". Use ssax:read-cdata-body to read the rest.
DECL: we have read the keyword (the one that follows "<!") identifying this declaration markup. The current position is after the keyword (usually a whitespace character).
START-token: we have read the keyword (GI) of this start tag. No attributes are scanned yet. We don’t know if this tag has an empty content either. Use ssax:complete-start-tag to finish parsing of the token.
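The follow-up calls named in this breakdown can be summarized as a dispatch on the token kind. This is only a sketch: follow-up-for is a hypothetical helper, not part of ssax.ss, and it returns the name of the procedure the text recommends instead of calling it.

```scheme
;; Hypothetical helper: map a token kind to the name of the procedure
;; that the description above suggests calling next, or #f when the
;; text names no follow-up for that kind.
(define (follow-up-for kind)
  (case kind
    ((PI) 'ssax:skip-pi)                ; or read the PI body yourself
    ((CDSECT) 'ssax:read-cdata-body)
    ((START) 'ssax:complete-start-tag)
    (else #f)))                         ; END, COMMENT, DECL
```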
(ssax:read-pi-body-as-string PORT) → ??? PORT : port?
(ssax:skip-internal-dtd PORT) → ??? PORT : port?
(ssax:read-cdata-body PORT STR-HANDLER SEED) → ??? PORT : port? STR-HANDLER : procedure? SEED : SEED
STR-HANDLER is a procedure of three arguments: STRING1 STRING2 SEED. The first argument, STRING1, never contains a newline. The second argument, STRING2, often will. On the first invocation of the STR-HANDLER, the seed is the one passed to ssax:read-cdata-body as the third argument. The result of this first invocation will be passed as the seed argument to the second invocation of the STR-HANDLER, and so on. The result of the last invocation of the STR-HANDLER is returned by ssax:read-cdata-body. Note the similarity to the fundamental ’fold’ iterator.
Within a CDATA section all characters are taken at their face value, with only three exceptions:
- CR, LF, and CRLF are treated as line delimiters, and passed as a single #\newline to the STR-HANDLER;
- the "]]>" combination is the end of the CDATA section;
- &gt; is treated as an embedded #\> character.
Note that &lt; and &amp; are not specially recognized (and are not expanded)!
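The seed-threading protocol of a STR-HANDLER can be tried out without a parser. In the sketch below, collect-fragments is an example STR-HANDLER whose seed is a reversed list of fragments; feed-handler is a hypothetical stand-in for the reader, and the fragment pairs are made-up input, not output of ssax:read-cdata-body.

```scheme
;; An example STR-HANDLER: takes two string fragments and a seed and
;; returns a new seed. The seed is a list of fragments, newest first.
(define (collect-fragments string1 string2 seed)
  (if (string=? string2 "")
      (cons string1 seed)
      (cons string2 (cons string1 seed))))

;; Hypothetical driver standing in for the reader: it calls the handler
;; once per (STRING1 . STRING2) pair, passing each result on as the
;; seed of the next call, exactly like a left fold.
(define (feed-handler str-handler fragment-pairs seed)
  (if (null? fragment-pairs)
      seed
      (feed-handler str-handler
                    (cdr fragment-pairs)
                    (str-handler (caar fragment-pairs)
                                 (cdar fragment-pairs)
                                 seed))))

;; Turn the final seed back into one string.
(define (fragments->string reversed-frags)
  (apply string-append (reverse reversed-frags)))

(define demo-text
  (fragments->string
   (feed-handler collect-fragments
                 '(("line one" . "\n") ("line two" . ""))
                 '())))
```

Keeping the fragments in reverse order makes each handler call constant-time; the single reverse happens once at the end.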
(ssax:read-char-ref PORT) → ??? PORT : port?
This procedure must be called after we have read the "&#" that introduces a char reference. The procedure reads this reference and returns the corresponding char. The current position in PORT will be after the ";" that terminates the char reference. Faults detected: WFC: XML-Spec.html#wf-Legalchar
According to Section "4.1 Character and Entity References" of the XML Recommendation: "[Definition: A character reference refers to a specific character in the ISO/IEC 10646 character set, for example one not directly accessible from available input devices.]" Therefore, we use a ucscode->char function to convert a character code into the character – *regardless* of the current character encoding of the input stream.
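The conversion from a character code to a character can be sketched as follows. char-ref-body->char is a hypothetical helper that works on a string instead of a PORT, and it assumes that ucscode->char behaves like the standard integer->char; neither assumption is confirmed by the text above.

```scheme
;; Hypothetical helper: BODY is the text between "&#" and ";",
;; for example "65" (decimal) or "x41" (hexadecimal).
(define (char-ref-body->char body)
  (let* ((hex? (and (> (string-length body) 0)
                    (char=? (string-ref body 0) #\x)))
         (code (if hex?
                   (string->number (substring body 1 (string-length body)) 16)
                   (string->number body 10))))
    ;; string->number returns #f for malformed digits.
    (if (and code (exact? code) (integer? code) (>= code 0))
        (integer->char code)   ; assumed equivalent of ucscode->char
        (error "malformed character reference" body))))
```

The input encoding never enters the computation: the code point alone determines the resulting character.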
(ssax:handle-parsed-entity PORT NAME ENTITIES) → ??? PORT : port? NAME : ??? ENTITIES : ???
Expand and handle a parsed-entity reference.
port - a PORT
name - the name of the parsed entity to expand, a symbol
entities - see ENTITIES
content-handler - procedure PORT ENTITIES SEED that is supposed to return a SEED
str-handler - a STR-HANDLER. It is called if the entity in question turns out to be a pre-declared entity.
The result is the one returned by CONTENT-HANDLER or STR-HANDLER. Faults detected: WFC: XML-Spec.html#wf-entdeclared WFC: XML-Spec.html#norecursion
(make-empty-attlist) → ???
(attlist-add ATTLIST NAME-VALUE-PAIR) → ??? ATTLIST : ??? NAME-VALUE-PAIR : ???
(attlist-null? ATTLIST) → ??? ATTLIST : ???
(attlist-remove-top ATTLIST) → ??? ATTLIST : ???
(attlist->alist) → ???
(attlist-fold) → ???
(ssax:read-attributes PORT ENTITIES) → ??? PORT : port? ENTITIES : ???
The procedure returns an ATTLIST, of Name (as UNRES-NAME), Value (as string) pairs. The current character on the PORT is a non-whitespace character that is not an ncname-starting character.
Note the following rules to keep in mind when reading an ’AttValue’: "Before the value of an attribute is passed to the application or checked for validity, the XML processor must normalize it as follows:
- a character reference is processed by appending the referenced character to the attribute value
- an entity reference is processed by recursively processing the replacement text of the entity [see ENTITIES] [named entities amp lt gt quot apos are assumed pre-declared]
- a whitespace character (#x20, #xD, #xA, #x9) is processed by appending #x20 to the normalized value, except that only a single #x20 is appended for a "#xD#xA" sequence that is part of an external parsed entity or the literal entity value of an internal parsed entity
- other characters are processed by appending them to the normalized value"
Faults detected: WFC: XML-Spec.html#CleanAttrVals WFC: XML-Spec.html#uniqattspec
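The whitespace rule of attribute-value normalization can be sketched in isolation. normalize-attr-whitespace is a hypothetical helper; it deliberately ignores character and entity references and the special case for a "#xD#xA" sequence, so it is not a substitute for ssax:read-attributes.

```scheme
;; Hypothetical helper: replace every whitespace character
;; (#x20, #xD, #xA, #x9) in STR with a single #x20.
;; References and the CR LF special case are NOT handled here.
(define (normalize-attr-whitespace str)
  (list->string
   (map (lambda (c)
          (if (memv c (list #\space #\tab #\newline #\return))
              #\space
              c))
        (string->list str))))
```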
(ssax:uri-string->symbol URI-STR) → ??? URI-STR : string?
(ssax:complete-start-tag TAG PORT ELEMS ENTITIES NAMESPACES) → ??? TAG : symbol? PORT : port? ELEMS : ??? ENTITIES : ??? NAMESPACES : ???
This procedure returns several values:
ELEM-GI: a RES-NAME.
ATTRIBUTES: element’s attributes, an ATTLIST of (RES-NAME . STRING) pairs. The list does NOT include xmlns attributes.
NAMESPACES: the input list of namespaces amended with namespace (re-)declarations contained within the start-tag under parsing.
ELEM-CONTENT-MODEL
On exit, the current position in PORT will be the first character after #\> that terminates the start-tag markup.
Faults detected: VC: XML-Spec.html#enum VC: XML-Spec.html#RequiredAttr VC: XML-Spec.html#FixedAttr VC: XML-Spec.html#ValueType WFC: XML-Spec.html#uniqattspec (after namespaces prefixes are resolved) VC: XML-Spec.html#elementvalid WFC: REC-xml-names/#dt-NSName
Note that, although the XML Recommendation does not explicitly say so, xmlns and xmlns: attributes don’t have to be declared (although they can be declared, to specify their default value).
Procedure: ssax:complete-start-tag tag-head port elems entities namespaces
(ssax:read-external-id PORT) → ??? PORT : port?
[75] ExternalID ::= 'SYSTEM' S SystemLiteral
                  | 'PUBLIC' S PubidLiteral S SystemLiteral
[11] SystemLiteral ::= ('"' [^"]* '"') | ("'" [^']* "'")
[12] PubidLiteral ::= '"' PubidChar* '"' | "'" (PubidChar - "'")* "'"
[13] PubidChar ::= #x20 | #xD | #xA | [a-zA-Z0-9]
                 | [-'()+,./:=?;!*#@$_%]
(ssax:read-char-data PORT EXPECT-EOF? STR-HANDLER SEED) → ??? PORT : port? EXPECT-EOF? : boolean? STR-HANDLER : procedure? SEED : ???
port: a PORT to read
expect-eof?: a boolean indicating if EOF is normal, i.e., the character data may be terminated by the EOF. EOF is normal while processing a parsed entity.
str-handler: a STR-HANDLER
seed: an argument passed to the first invocation of STR-HANDLER.
The procedure returns two results: SEED and TOKEN. The SEED is the result of the last invocation of STR-HANDLER, or the original seed if STR-HANDLER was never called.
TOKEN can be either an eof-object (this can happen only if expect-eof? was #t), or:
- an xml-token describing a START tag or an END tag. For a start token, the caller has to finish reading it;
- an xml-token describing the beginning of a PI. It’s up to the application to read or skip through the rest of this PI;
- an xml-token describing a named entity reference.
CDATA sections and character references are expanded inline and never returned. Comments are silently disregarded.
As the XML Recommendation requires, all whitespace in character data must be preserved. However, a CR character (#xD) must be disregarded if it appears before a LF character (#xA), or replaced by a #xA character otherwise. See Secs. 2.10 and 2.11 of the XML Recommendation. See also the canonical XML Recommendation.
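The end-of-line rule stated above (drop a CR that precedes a LF, turn any other CR into a LF) can be sketched on a plain string. normalize-line-ends is a hypothetical helper for illustration; the parser applies the rule while reading from a PORT.

```scheme
;; Hypothetical helper: apply the XML end-of-line rule to a string.
(define (normalize-line-ends str)
  (let loop ((chars (string->list str)) (acc '()))
    (cond
      ((null? chars) (list->string (reverse acc)))
      ((char=? (car chars) #\return)
       (if (and (pair? (cdr chars)) (char=? (cadr chars) #\newline))
           (loop (cdr chars) acc)                    ; CR before LF: drop the CR
           (loop (cdr chars) (cons #\newline acc)))) ; lone CR: becomes LF
      (else (loop (cdr chars) (cons (car chars) acc))))))
```

All other whitespace passes through untouched, as the Recommendation requires.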
(ssax:assert-token TOKEN KIND GI) → ??? TOKEN : ??? KIND : ??? GI : ???
(ssax:make-pi-parser my-pi-handlers) → ??? my-pi-handlers : ???
my-pi-handlers An association list of pairs (PI-TAG . PI-HANDLER) where PI-TAG is an NCName symbol, the PI target, and PI-HANDLER is a procedure PORT PI-TAG SEED where PORT points to the first symbol after the PI target. The handler should read the rest of the PI up to and including the combination ’?>’ that terminates the PI. The handler should return a new seed. One of the PI-TAGs may be the symbol *DEFAULT*. The corresponding handler will handle PIs that no other handler will. If the *DEFAULT* PI-TAG is not specified, ssax:make-pi-parser will assume the default handler that skips the body of the PI.
The output of the ssax:make-pi-parser is a procedure PORT PI-TAG SEED that will parse the current PI according to the user-specified handlers.
(define-macro ssax:make-pi-parser
  (lambda (my-pi-handlers)
    `(lambda (port target seed)
       (case target
         ,@(let loop ((pi-handlers my-pi-handlers) (default #f))
             (cond
               ((null? pi-handlers)
                (if default
                    `((else (,default port target seed)))
                    '((else
                       (ssax:warn port "Skipping PI: " target nl)
                       (ssax:skip-pi port)
                       seed))))
               ((eq? '*DEFAULT* (caar pi-handlers))
                (loop (cdr pi-handlers) (cdar pi-handlers)))
               (else
                (cons
                 `((,(caar pi-handlers)) (,(cdar pi-handlers) port target seed))
                 (loop (cdr pi-handlers) default)))))))))
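The dispatch rule of ssax:make-pi-parser (use the handler for the matching PI-TAG, else the *DEFAULT* handler, else skip the PI) can also be written as an ordinary procedure. The sketch below is a hypothetical run-time analogue, not the library's macro: make-pi-dispatcher and its skip-handler argument are assumptions, with skip-handler standing in for the built-in behavior of calling ssax:skip-pi.

```scheme
;; Hypothetical procedural analogue of the macro. HANDLERS is an
;; association list of (PI-TAG . PI-HANDLER) pairs; SKIP-HANDLER is
;; used when neither a matching tag nor *DEFAULT* is present.
(define (make-pi-dispatcher handlers skip-handler)
  (lambda (port target seed)
    (cond
      ((assq target handlers)
       => (lambda (entry) ((cdr entry) port target seed)))
      ((assq '*DEFAULT* handlers)
       => (lambda (entry) ((cdr entry) port target seed)))
      (else (skip-handler port target seed)))))

;; Demo handlers that never touch the port, so #f can stand in for it.
(define demo-dispatch
  (make-pi-dispatcher
   (list (cons 'xml-stylesheet
               (lambda (port target seed) (cons target seed))))
   (lambda (port target seed) seed)))
```

The macro makes the same choice at expansion time by generating a case expression, so the handler list must be known when the macro is expanded.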
(ssax:make-elem-parser my-new-level-seed my-finish-element) → ??? my-new-level-seed : ??? my-finish-element : ???
Create a parser to parse and process one element, including its character content or children elements. The parser is typically applied to the root element of a document.
my-new-level-seed procedure ELEM-GI ATTRIBUTES NAMESPACES EXPECTED-CONTENT SEED where ELEM-GI is a RES-NAME of the element about to be processed. This procedure is to generate the seed to be passed to handlers that process the content of the element. This is the function identified as ’fdown’ in the denotational semantics of the XML parser given in the title comments to this file.
my-finish-element procedure ELEM-GI ATTRIBUTES NAMESPACES PARENT-SEED SEED This procedure is called when parsing of ELEM-GI is finished. The SEED is the result from the last content parser (or from my-new-level-seed if the element has the empty content). PARENT-SEED is the same seed as was passed to my-new-level-seed. The procedure is to generate a seed that will be the result of the element parser. This is the function identified as ’fup’ in the denotational semantics of the XML parser given in the title comments to this file.
my-char-data-handler A STR-HANDLER
my-pi-handlers See ssax:make-pi-parser above
The generated parser is a procedure START-TAG-HEAD PORT ELEMS ENTITIES NAMESPACES PRESERVE-WS? SEED. The procedure must be called after the start tag token has been read. START-TAG-HEAD is an UNRES-NAME from the start-element tag. ELEMS is an instance of xml-decl::elems. For PRESERVE-WS?, see ssax:complete-start-tag::preserve-ws?
Faults detected: VC: XML-Spec.html#elementvalid WFC: XML-Spec.html#GIMatch
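How the seeds flow through my-new-level-seed ("fdown"), the character-data handler, and my-finish-element ("fup") can be simulated by hand. The sketch below builds a small tree for the document <a><b/>hi</a> by calling three example handlers in the order a parser would; the empty lists standing in for attributes and namespaces and the symbol ANY for the expected content are placeholders, and attributes are ignored altogether.

```scheme
;; fdown: start each element with an empty seed for its content.
(define (my-new-level-seed elem-gi attributes namespaces expected-content seed)
  '())

;; fup: SEED holds the element's children, newest first. Build the
;; element node and push it onto the parent's seed.
(define (my-finish-element elem-gi attributes namespaces parent-seed seed)
  (cons (cons elem-gi (reverse seed)) parent-seed))

;; STR-HANDLER: push the non-empty fragments onto the seed.
(define (my-char-data-handler string1 string2 seed)
  (if (string=? string2 "")
      (cons string1 seed)
      (cons string2 (cons string1 seed))))

;; Hand-simulated event sequence for <a><b/>hi</a>.
(define demo-tree
  (let* ((top '())
         (a-seed (my-new-level-seed 'a '() '() 'ANY top))
         (b-seed (my-new-level-seed 'b '() '() 'ANY a-seed))
         (a-seed (my-finish-element 'b '() '() a-seed b-seed))
         (a-seed (my-char-data-handler "hi" "" a-seed))
         (top (my-finish-element 'a '() '() top a-seed)))
    (car top)))
```

Because PARENT-SEED is handed back to my-finish-element, the handlers need no mutable state to reattach a finished element to its parent.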
(ssax:make-parser user-handler-tag user-handler-proc ...) → ??? user-handler-tag : ??? user-handler-proc : ???
user-handler-tag is a symbol that identifies a procedural expression that follows the tag. Given below are tags and signatures of the corresponding procedures. Not all tags have to be specified. If some are omitted, reasonable defaults will apply.
tag: DOCTYPE
handler-procedure: PORT DOCNAME SYSTEMID INTERNAL-SUBSET? SEED
If internal-subset? is #t, the current position in the port is right after we have read the #\[ that begins the internal DTD subset. We must finish reading this subset before we return (or must call ssax:skip-internal-dtd if we aren’t interested in reading it). The port at exit must be at the first symbol after the whole DOCTYPE declaration. The handler-procedure must generate four values: ELEMS ENTITIES NAMESPACES SEED. See xml-decl::elems for ELEMS. It may be #f to switch off the validation. NAMESPACES will typically contain USER-PREFIXes for selected URI-SYMBs. The default handler-procedure skips the internal subset, if any, and returns (values #f '() '() seed).
tag: UNDECL-ROOT
handler-procedure: ELEM-GI SEED
where ELEM-GI is an UNRES-NAME of the root element. This procedure is called when the XML document being parsed contains no DOCTYPE declaration. The handler-procedure, like the DOCTYPE handler-procedure above, must generate four values: ELEMS ENTITIES NAMESPACES SEED. The default handler-procedure returns (values #f '() '() seed).
tag: DECL-ROOT
handler-procedure: ELEM-GI SEED
where ELEM-GI is an UNRES-NAME of the root element. This procedure is called when the XML document being parsed does contain a DOCTYPE declaration. The handler-procedure must generate a new SEED (and verify that the name of the root element matches the doctype, if the handler so wishes). The default handler-procedure is the identity function.
tag: NEW-LEVEL-SEED
handler-procedure: see ssax:make-elem-parser, my-new-level-seed
tag: FINISH-ELEMENT
handler-procedure: see ssax:make-elem-parser, my-finish-element
tag: CHAR-DATA-HANDLER
handler-procedure: see ssax:make-elem-parser, my-char-data-handler
tag: PI
handler-procedure: see ssax:make-pi-parser. The default value is '().
(ssax:reverse-collect-str LIST-OF-FRAGS) → ??? LIST-OF-FRAGS : ???
6.2 input-parse.ss
(parser-error PORT MESSAGE SPECIALISING-MSG*) → ??? PORT : port? MESSAGE : ??? SPECIALISING-MSG* : ???
6.3 sxml-tree-trans.ss
The nodes that define a range don’t have to have the same immediate parent, don’t have to be on the same level, and the end node of a range doesn’t even have to exist. A replace-range procedure removes nodes from the beginning node of the range up to (but not including) the end node of the range. In addition, the beginning node of the range can be replaced by a node or a list of nodes. The range of nodes is cut while depth-first traversing the forest. If all branches of the node are cut a node is cut as well. The procedure can cut several non-overlapping ranges from a forest.
replace-range:: BEG-PRED x END-PRED x FOREST -> FOREST
where
type FOREST = (NODE ...)
type NODE = Atom | (Name . FOREST) | FOREST
The range of nodes is specified by two predicates, beg-pred and end-pred.
beg-pred:: NODE -> #f | FOREST
end-pred:: NODE -> #f | FOREST
The beg-pred predicate decides on the beginning of the range. The node for which the predicate yields non-#f marks the beginning of the range. The non-#f value of the predicate replaces the node. The value can be a list of nodes. The replace-range procedure then traverses the tree and skips all the nodes, until the end-pred yields non-#f. The value of the end-pred replaces the end-range node. The new end node and its brothers will be re-scanned. The predicates are evaluated pre-order. We do not descend into a node that is marked as the beginning of the range.
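The interplay of beg-pred and end-pred is easiest to see on a flat list. The sketch below is a deliberate simplification, not the library's replace-range: flat-replace-range is a hypothetical procedure that treats FOREST as a list of sibling nodes and never descends into them, so ranges that cross tree levels are out of its scope.

```scheme
;; Simplified, flat-only sketch of the range-cutting idea.
(define (flat-replace-range beg-pred end-pred forest)
  ;; Outside a range: copy nodes until beg-pred yields non-#f, then
  ;; splice in its value and start skipping.
  (define (scan nodes)
    (cond ((null? nodes) '())
          ((beg-pred (car nodes))
           => (lambda (repl) (append repl (skip (cdr nodes)))))
          (else (cons (car nodes) (scan (cdr nodes))))))
  ;; Inside a range: drop nodes until end-pred yields non-#f. Its value
  ;; replaces the end node and is re-scanned with the remaining siblings.
  (define (skip nodes)
    (cond ((null? nodes) '())
          ((end-pred (car nodes))
           => (lambda (repl) (scan (append repl (cdr nodes)))))
          (else (skip (cdr nodes)))))
  (scan forest))

;; Cut from b up to (but not including) d, replacing b with B.
(define demo-forest
  (flat-replace-range
   (lambda (node) (and (eq? node 'b) '(B)))
   (lambda (node) (and (eq? node 'd) (list node)))
   '(a b c d e)))
```

Since the empty list counts as true in Scheme, a beg-pred that returns '() removes the beginning node instead of replacing it, and a missing end node simply cuts everything to the end of the list.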
6.4 sxml-to-html.ss
(string->goodHTML STRING) → ??? STRING : string?
6.5 sxml-to-html-ext.ss
The universal transformation from SXML to HTML. The following rules work for every version of HTML, present and future.