#lang scribble/doc
@(require scribble/manual
          scribble/extract)
@(require (for-label "parser.rkt"))
@(require (for-label racket/base))

@defmodule[squicky/parser]

@title{Squicky: a scheme-based quick wiki parser}
@author+email["Norman Gray"]{http://nxg.me.uk}

This is a Racket-based parser for a wiki syntax based closely on
@hyperlink["http://www.wikicreole.org/"]{WikiCreole}, as described below.

@section{Usage}

The dialect parsed here is the consensus WikiCreole syntax of
@url{http://www.wikicreole.org/}. It handles all of the WikiCreole
@hyperlink["http://www.wikicreole.org/wiki/Creole1.0TestCases"]{test cases},
except for one test of wiki-internal links (which is in any case
somewhat underspecified).

In particular, the supported syntax is:
@itemlist[
@item{@tt{//italics//}}
@item{@tt{**bold**} : A line which begins with @tt{**}, with possible
whitespace on either side, is a (second-level) bulleted list if the line
before it is a bulleted list, but is a paragraph starting with bold
text otherwise.}
@item{@tt{##monospaced text##} : A line which begins with @tt{##}, with
possible whitespace on either side, is a (second-level) enumerated list if
the line before it is an enumerated list, but is a paragraph starting
with monospace text otherwise. [This is not specified in the
WikiCreole definition, but is clearly compatible with it].}
@item{@tt{ * bulleted list} : (including sublists; the asterisk may or may not be indented)}
@item{@tt{ # numbered list} : (including sublists)}
@item{@tt{>quoted paragraph} : including multiple levels (this appears to be an extension of WikiCreole).}
@item{@tt{[[link to wikipage]]}}
@item{@tt{[[URL|description]]}}
@item{@tt{{{image.png}}} or @tt{{{image.png|alt text}}} or @tt{{{image.png|att=value;att2=value; or more}}}.
In the last case, the @tt{att} indicates any attribute on the HTML
@tt{<img>} element, such as @tt{class}; the @tt{att} must immediately
follow the semicolon (so the last case parses as
@tt{att2='value; or more'}); and if the @tt{att} is omitted, it
defaults to @tt{alt}.}
@item{@tt{== heading}}
@item{@tt{=== subheading}}
@item{@tt{==== subsubheading}}
@item{@tt{line\\break}}
@item{@tt{----} : (four dashes in a row, on a line by themselves) horizontal rule}
@item{@tt{~e}scaped character, and @tt{~http://url}, which isn't linked}
@item{@verbatim|{{{{in-line literal text}}}}|}]

Blocks of verbatim text (which will typically be rendered to @tt{<pre>
} blocks) can be specified with:

@verbatim{
{{{
preformatted text
}}}
}

The opening @tt|{{{{}|, and its closing partner, must be on lines by
themselves. The newline after the opening marker, and the newline
before the closing one, are ignored.

Tables look like this:

@verbatim{
|=Heading Col 1 |=Heading Col 2         |
|Cell 1.1       |Two lines\\in Cell 1.2 |
|Cell 2.1       |Cell 2.2               |
}

To this I add the following syntax:
@itemlist[
@item{@tt{::foo bar baz} : adds, or replaces, the keyword 'foo' with the string 'bar baz'.}
@item{@tt{"quoted"} : corresponds to @tt{<q>quoted</q>} (note that's a double-quote character, not two single quotes).}
@item{@tt{<>} : adds @tt{ content } to the output.}
@item{The @tt{att=value} syntax for @tt{{{}}} is an extension.}]

For example, the following parses some input text and writes it out as XML.

@racketblock[
(require xml squicky/parser)

(define (write-xml-to-port wiki-text output-port)
  (write-xml/content
   (xexpr->xml
    `(top (,@(map (lambda (k) (list k (lookup wiki-text k)))
                  (lookup-keys wiki-text)))
          . ,(body wiki-text)))
   output-port)
  (newline output-port))

(write-xml-to-port (parse (current-input-port)) (current-output-port))
]

Suitable input text would be:

@verbatim{
::date 2010 December 12

== Here is a heading

Here is some text, with a list comprising:

* one
* two.

That's quite //astonishing!//.
}

@section{Reference}

Parse an input source with the @racket[parse] function.

@(include-previously-extracted "squicky-extracts.rkt" #rx"^parse")

@(include-previously-extracted "squicky-extracts.rkt" #rx"^wikitext?")

You can retrieve the body of the parsed text as an xexpr. The
various Creole markup commands are transformed into an HTML-like
xexpr, which can then be processed as desired.
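
As one illustration of such processing, the following sketch flattens an
xexpr to its plain text, discarding element names and attributes. It
assumes that @racket[body] returns a list of xexprs, as the splice in the
usage example suggests. The helper names @tt{attribute-list?} and
@tt{xexpr-text} are hypothetical, as is the hand-written sample, which
stands in for real parser output and may not match its exact element names.

@codeblock|{
#lang racket/base

;; attribute-list? : any -> boolean
;; True for an xexpr attribute list such as '((class "x") (id "y")).
(define (attribute-list? v)
  (and (list? v)
       (andmap (lambda (p)
                 (and (list? p)
                      (= (length p) 2)
                      (symbol? (car p))
                      (string? (cadr p))))
               v)))

;; xexpr-text : xexpr -> string
;; Concatenate the strings in an xexpr, dropping markup and attributes.
(define (xexpr-text x)
  (cond
    [(string? x) x]
    [(and (pair? x) (symbol? (car x)))
     (define tail (cdr x))
     ;; Skip the optional attribute list that may follow the element name.
     (define children
       (if (and (pair? tail) (attribute-list? (car tail)))
           (cdr tail)
           tail))
     (apply string-append (map xexpr-text children))]
    ;; Entities, numbers and other non-string leaves contribute no text.
    [else ""]))

;; Hand-written sample in the assumed shape; not real parser output.
(define sample-body
  '((h2 "Here is a heading")
    (p "That's quite " (em "astonishing!") ".")))

(map xexpr-text sample-body)
}|

Applied to a parsed document, the equivalent call would be
@racket[(map xexpr-text (body wiki-text))].
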

@(include-previously-extracted "squicky-extracts.rkt" #rx"^body")

If there are any keywords in the input text (indicated by
@tt{::keyword value}), these can be retrieved with one of a family
of lookup functions:

@(include-previously-extracted "squicky-extracts.rkt" #rx"^lookup.*")

@(include-previously-extracted "squicky-extracts.rkt" #rx"^set-metadata!")
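
As a small sketch of the lookup functions in use, the following gathers
every keyword of a parsed document into an immutable hash. It relies only
on the two-argument @racket[lookup] and on @racket[lookup-keys] as they
appear in the usage example, and assumes that @racket[parse] accepts any
input port, here a string port. The name @tt{metadata->hash} is
hypothetical and not part of the library.

@codeblock|{
#lang racket/base
(require squicky/parser)

;; metadata->hash : wikitext -> hash
;; Map each keyword found in the document to its value.
(define (metadata->hash wiki-text)
  (for/hash ([k (in-list (lookup-keys wiki-text))])
    (values k (lookup wiki-text k))))

;; Parse a short document from a string port (an assumption: the
;; usage example only shows parsing from (current-input-port)).
(define doc
  (parse (open-input-string
          "::date 2010 December 12\n\n== Here is a heading\n")))

(metadata->hash doc)
}|

Collecting the keywords once is convenient when several of them are needed
together, for example to fill in a page template.
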