Version: 0.2
uri: Web Uniform Resource Identifiers (URI and URL) in Racket
License: LGPL 3
Web: http://www.neilvandyke.org/racket-uri/
1 Introduction
WARNING: This package is being actively developed. A future version is expected to introduce some major non-backward-compatible changes.
uri is a Racket code library for parsing, representing, and transforming Web Uniform Resource Identifiers (URIs), which include Uniform Resource Locators (URLs) and Uniform Resource Names (URNs). It supports absolute and relative URIs and URI references. RFC2396 is the principal reference used for this implementation. Earlier versions were informed by other RFCs, including RFC2396 and RFC2732.
Goals of this package are correctness, efficiency, and power.
2 Escaping and Unescaping
Several procedures are provided to support escaping and unescaping of URI component strings, as described in [RFC2396 sec. 2.4]. Additional escaping and unescaping procedures support + as an encoding of a space character, as is used in some HTTP encodings of HTML forms.
These procedures come in multiple variants that differ in the mutability of the strings they yield, following this naming convention:
foo-i
Always yields an immutable string (or a new string, if the Scheme implementation does not support immutable strings).
foo/new-mutable
Always yields a new, mutable string.
foo/shared-ok
If the output is equal to the input, might yield the input string rather than yielding a copy of it.
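The difference between the three variants can be sketched in plain Racket. The transformation below (upcasing) is only a stand-in, and the names `foo-i`, `foo/new-mutable`, and `foo/shared-ok` are the placeholder names from the convention above, not procedures exported by this library.

```racket
#lang racket/base

;; Hypothetical stand-in for any escaping or unescaping transformation.
(define (transform str)
  (string-upcase str))

;; foo-i: always yields an immutable string.
(define (foo-i str)
  (string->immutable-string (transform str)))

;; foo/new-mutable: always yields a fresh, mutable string.
(define (foo/new-mutable str)
  (string-copy (transform str)))

;; foo/shared-ok: may yield the input itself when nothing changed,
;; which avoids allocating a copy.
(define (foo/shared-ok str)
  (let ([out (transform str)])
    (if (string=? out str) str out)))
```

With these definitions, `(immutable? (foo-i "abc"))` is true, and `(foo/shared-ok s)` is `eq?` to `s` whenever `s` is already uppercase.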
Many applications will not call these procedures directly, since most of this library’s interface automatically escapes and unescapes strings as appropriate.
Yields a URI-escaped encoding of string str. If start and end are given, then they designate the substring of str to use. All characters are escaped, except alphanumerics, minus, underscore, period, and tilde. For example:
(uri-escape "a = b/c + d") ==> "a%20%3D%20b%2Fc%20%2B%20d"
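The escaping rule just described can be sketched in base Racket as follows. This is an illustration, not the library's implementation: the name `sketch-uri-escape` is hypothetical, and encoding non-ASCII characters through their UTF-8 bytes is an assumption of this sketch.

```racket
#lang racket/base

;; Bytes left unescaped, per the rule above: ASCII alphanumerics,
;; minus, underscore, period, and tilde.
(define (unreserved-byte? b)
  (or (<= 48 b 57)                  ; 0-9
      (<= 65 b 90)                  ; A-Z
      (<= 97 b 122)                 ; a-z
      (memv b '(45 95 46 126))))    ; - _ . ~

(define (hex-digit n)
  (string-ref "0123456789ABCDEF" n))

;; Escapes every byte of str (assumed UTF-8) that is not unreserved as %XX.
(define (sketch-uri-escape str)
  (define out (open-output-string))
  (for ([b (in-bytes (string->bytes/utf-8 str))])
    (cond
      [(unreserved-byte? b) (write-char (integer->char b) out)]
      [else
       (write-char #\% out)
       (write-char (hex-digit (quotient b 16)) out)
       (write-char (hex-digit (remainder b 16)) out)]))
  (get-output-string out))
```

Applied to "a = b/c + d", this sketch produces the same string as the uri-escape example above.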
Like uri-escape, except that it encodes space characters as "+" instead of "%20". This should generally be used only to mimic the encoding that some Web browsers apply to HTML form values. For example:
(uri-plusescape "a = b/c + d") ==> "a+%3D+b%2Fc+%2B+d"
Yields a URI-unescaped string from the encoding in string str. If start and end are given, then they designate the substring of str to use. For example:
(uri-unescape "a%20b+c%20d") ==> "a b+c d"
Like uri-unescape, but also decodes the plus (+) character as a space character. For example:
(uri-unplusescape "a%20b+c%20d") ==> "a b c d"
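The difference between the two unescaping behaviors can be sketched with one procedure and a flag. The names `escaped-byte-at` and `sketch-unescape` and the `#:plus-as-space?` keyword are hypothetical, and passing malformed escapes through unchanged is an assumption of this sketch, not documented library behavior.

```racket
#lang racket/base

;; Returns the byte encoded by a "%XX" escape at index i of str,
;; or #f when there is no well-formed escape at that position.
(define (escaped-byte-at str i)
  (and (char=? (string-ref str i) #\%)
       (<= (+ i 3) (string-length str))
       (let ([hex (substring str (+ i 1) (+ i 3))])
         (and (regexp-match? #rx"^[0-9A-Fa-f][0-9A-Fa-f]$" hex)
              (string->number hex 16)))))

;; Decodes %XX escapes; when plus-as-space? is true, "+" also decodes
;; to a space. Decoded bytes are interpreted as UTF-8.
(define (sketch-unescape str #:plus-as-space? [plus-as-space? #f])
  (define out (open-output-bytes))
  (let loop ([i 0])
    (when (< i (string-length str))
      (let ([c (string-ref str i)]
            [b (escaped-byte-at str i)])
        (cond
          [b (write-byte b out) (loop (+ i 3))]
          [(and plus-as-space? (char=? c #\+))
           (write-char #\space out)
           (loop (+ i 1))]
          [else (write-char c out) (loop (+ i 1))]))))
  (bytes->string/utf-8 (get-output-bytes out) #\?))
```

Called on "a%20b+c%20d", the sketch keeps the plus sign by default and turns it into a space when `#:plus-as-space? #t` is given, mirroring the two examples above.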
Yields a URI-escaped string of character chr. For example:
(char->uri-escaped-string #\/) ==> "%2F"
3 URI API
This section describes the “URI string” API, while the next section describes the “URI object,” (uri) API. All procedures in this section yield URIs using immutable strings, and accept URIs as strings (immutable or mutable) or as the opaque objects described in the next section.
3.1 Predicate
!!!
3.2 Converting Strings to URI Objects
!!!
!!!
!!! convenience
3.3 Writing URIs to Ports and Converting URIs to Strings
Displays uri to output port port. For example:
(display-uri "http://s/foo#bar" (current-output-port))
-| http://s/foo#bar
(display-uri/nofragment "http://s/foo#bar" (current-output-port))
-| http://s/foo
Yields the full string representation of URI uri. Of course, this is not needed when URIs are handled only as strings, but calling this procedure in libraries lets them accept uri objects as well. For example:
3.4 URI Schemes
URI schemes are currently represented as lowercase Racket symbols and associated data.
Some common URI scheme symbols, as a convenience for Racket code that must be portable to Racket implementations with case-insensitive readers. For example, in some Racket implementations:
Yields the URI scheme of uri, or #f if none can be determined. For example:
(uri-scheme "Http://www") ==> http
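The representation of schemes as lowercase symbols can be sketched with a regular expression. The name `sketch-uri-scheme` is hypothetical, and the pattern follows the generic scheme syntax (a letter followed by letters, digits, "+", "." or "-"); the library's actual parsing may differ.

```racket
#lang racket/base

;; Returns the scheme of uri-str as a lowercase symbol, or #f when the
;; string does not start with a "scheme:" prefix.
(define (sketch-uri-scheme uri-str)
  (define m (regexp-match #rx"^([A-Za-z][A-Za-z0-9+.-]*):" uri-str))
  (and m (string->symbol (string-downcase (cadr m)))))
```

Because the result is downcased before it is interned, "Http://www" and "http://www" yield the same symbol, so schemes can be compared with `eq?`.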
Registers integer portnum as the default port number for the server authority component of URI scheme sym.
(register-uri-scheme-hierarchical sym) → any/c
sym : any/c
Registers URI scheme sym as having a “hierarchical” form as described in [RFC2396 sec. 3].
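One way to picture scheme registration is as a pair of tables keyed by scheme symbol. The sketch below is an assumption about how such a registry could look; all procedure names in it are hypothetical, and it does not describe the library's internal data structures.

```racket
#lang racket/base

;; Hypothetical registries keyed by scheme symbol.
(define default-ports (make-hasheq))
(define hierarchical-schemes (make-hasheq))

(define (sketch-register-default-portnum! sym portnum)
  (hash-set! default-ports sym portnum))

(define (sketch-register-hierarchical! sym)
  (hash-set! hierarchical-schemes sym #t))

;; Lookups yield #f for schemes that were never registered.
(define (sketch-default-portnum sym)
  (hash-ref default-ports sym #f))

(define (sketch-hierarchical? sym)
  (hash-ref hierarchical-schemes sym #f))

(sketch-register-default-portnum! 'ftp 21)
(sketch-register-default-portnum! 'http 80)
(sketch-register-hierarchical! 'http)
```

Keeping the two properties in separate tables lets a scheme have a default port without being marked hierarchical, and the reverse.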
3.5 URI Reference Fragment Identifiers
Yields the fragment identifier component of URI (or URI reference) uri as a string, or #f if there is no fragment. uri-fragment yields the fragment in unescaped form, and uri-fragment/escaped yields an escaped form in the unusual case that it is desired. For example:
Yields uri without the fragment component. For example:
(uri-without-fragment "http://w/#bar") ==> "http://w/"
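Splitting a URI reference at its fragment can be sketched as below. The name `sketch-split-fragment` is hypothetical, and the fragment is returned still in escaped form; unescaping is left out of this sketch.

```racket
#lang racket/base

;; Yields two values: the URI without its fragment, and the (still
;; escaped) fragment string, or #f when there is no "#".
(define (sketch-split-fragment uri-str)
  (define m (regexp-match #rx"^([^#]*)#(.*)$" uri-str))
  (if m
      (values (cadr m) (caddr m))
      (values uri-str #f)))
```

Note that a URI ending in a bare "#" yields an empty-string fragment rather than #f, which keeps "no fragment" and "empty fragment" distinguishable.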
Yields a URI that is like uri except with the fragment fragment (or no fragment if fragment is #f). For example:
The uri-with-fragment/escaped variant can be used when the desired fragment string is already in uri-escaped form:
3.6 Hierarchical URIs
This and some of the following subsections concern “hierarchical” generic URI syntax as described in RFC2396 sec. 3.
Yields a Boolean value for whether or not the URI scheme of URI uri is known to have a “hierarchical” generic URI layout. For example:
3.7 Server-Based Naming Authorities
Several procedures extract the server authority values from URIs [RFC2396 sec. 3.2.2].
Yields three values for the server authority of URI uri: the userinfo as a string (or #f), the host as a string (or #f), and the effective port number as an integer (or #f). The effective port number of a server authority defaults to the default of the URI scheme unless overridden. For example (note the effective port number is 21, the default for the ftp scheme):
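The defaulting rule for the effective port number can also be sketched independently of this library. In the sketch below, the name `sketch-authority-parts`, the fixed table of default ports, and the host name ftp.example.org are all assumptions made for illustration.

```racket
#lang racket/base

;; Assumed default ports, for illustration only.
(define default-ports (hasheq 'ftp 21 'http 80))

;; Parses an authority of the form "[userinfo@]host[:port]" and yields
;; userinfo, host, and effective port number as three values.
(define (sketch-authority-parts scheme authority)
  ;; Split off "userinfo@" if present.
  (define at (regexp-match #rx"^([^@]*)@(.*)$" authority))
  (define userinfo (and at (cadr at)))
  (define hostport (if at (caddr at) authority))
  ;; Split off ":port" if present.
  (define pm (regexp-match #rx"^(.*):([0-9]+)$" hostport))
  (define host (if pm (cadr pm) hostport))
  (define portnum
    (if pm
        (string->number (caddr pm))
        (hash-ref default-ports scheme #f)))
  (values userinfo
          (and (not (string=? host "")) host)
          portnum))
```

For the scheme `ftp` and the authority "anonymous@ftp.example.org", the three values are "anonymous", "ftp.example.org", and 21; an explicit ":2121" suffix would override the default.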
Yield the respective part of the server authority of uri. See the discussion of uri-server-userinfo+host+portnum.
3.8 Hierarchical Paths
A parsed hierarchical path [RFC2396 sec. 3] is represented in uri as a tuple of a list of path segments and an upcount. The list of path segments does not contain any “.” or “..” relative components, as those are removed during parsing. The upcount is either #f, meaning an absolute path, or an integer 0 or greater, meaning a relative path of that many levels “up.” A path segment without any parameters is represented as either a string or, if empty, #f. For example:
and:
A path segment with parameters is represented as a list, with the first element a string or #f for the path name, and the remaining elements strings for the parameters. For example:
(uri-path-segments "../../a/b;p1/c/d;p2;p3/;p4")
==> ("a" ("b" "p1") "c" ("d" "p2" "p3") (#f "p4"))
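A simplified parser that produces this representation is sketched below. The names `parse-segment` and `sketch-parse-path` are hypothetical, and the handling of "." and ".." is a plain left-to-right pass that may differ from the library's parser in edge cases.

```racket
#lang racket/base

;; Converts one raw segment such as "d;p2;p3" into a string, #f when
;; empty, or a list (name param ...) when it has parameters.
(define (parse-segment raw)
  (define parts (regexp-split #rx";" raw))
  (define name (and (not (string=? (car parts) "")) (car parts)))
  (if (null? (cdr parts))
      name
      (cons name (cdr parts))))

;; Yields two values: the upcount (#f for an absolute path) and the
;; list of parsed segments with "." and ".." components removed.
(define (sketch-parse-path path)
  (define absolute?
    (and (> (string-length path) 0)
         (char=? (string-ref path 0) #\/)))
  (define raw-segs
    (regexp-split #rx"/" (if absolute? (substring path 1) path)))
  (let loop ([segs raw-segs] [rev '()] [up 0])
    (cond
      [(null? segs)
       (values (if absolute? #f up) (reverse rev))]
      [(string=? (car segs) ".")
       (loop (cdr segs) rev up)]
      [(string=? (car segs) "..")
       ;; Cancel the previous segment, or count one more level "up".
       (if (null? rev)
           (loop (cdr segs) rev (+ up 1))
           (loop (cdr segs) (cdr rev) up))]
      [else
       (loop (cdr segs) (cons (parse-segment (car segs)) rev) up)])))
```

For the path in the example above, the sketch yields an upcount of 2 together with the same segment list.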
In the current version of uri, parsed paths are actually represented in reverse, which simplifies path resolution and permits list tails to be shared among potentially large numbers of long paths. For example (uripath is a concept of the “object URI” API):
(("x.html" . #0=("c" . #1=("b" "a")))
("y.html" "y" . #0#)
("z.html" "z" . #1#))
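The tail sharing shown in that printed structure can be reproduced with ordinary pairs. The sketch below builds the same three reversed paths by hand; the variable names are illustrative only.

```racket
#lang racket/base

;; Reversed segment lists for /a/b/c/x.html, /a/b/c/y/y.html, and
;; /a/b/z/z.html, built so that common prefixes are shared tails.
(define ab  (list "b" "a"))
(define abc (cons "c" ab))

(define x (cons "x.html" abc))
(define y (list* "y.html" "y" abc))
(define z (list* "z.html" "z" ab))

;; Forward order is recovered by reversing, which allocates a new list.
(define x-forward (reverse x))
```

Here `(cdr x)` and `(cddr y)` are the very same pairs (`eq?`), which is why the segment lists should be treated as immutable.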
Yields the path upcount and the segments of uri as two values. The segments list should be considered immutable, as it might be shared elsewhere. uri-path-upcount+segments/reverse yields the segments list in reverse order, and is the more efficient of the two procedures.
See the documentation for uri-path-upcount+segments.
Yield the components of a parsed URI segment. The values should be considered immutable. For example:
3.9 Attribute-Value Queries
This library provides support for parsing the URI query component [RFC2396 sec. 3.4] as attribute-value lists, in the manner of http URI scheme queries. Parsed queries are represented as association lists, in which the car of each pair is the attribute name as a string, and the cdr is either the attribute value as a string or #t if no value is given. All strings are uri-unescaped. For example:
(uri-query "?q=fiendish+scheme&case&x=&y=1%2B2")
==> (("q" . "fiendish scheme") ("case" . #t) ("x" . "") ("y" . "1+2"))
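A parser that produces this association-list shape can be sketched with Racket's standard net/uri-codec library, which is a different library from uri. The name `sketch-parse-query` is hypothetical; `form-urlencoded-decode` is the net/uri-codec procedure used here to decode both %XX escapes and "+".

```racket
#lang racket/base
(require net/uri-codec)

;; Parses a query string (with or without the leading "?") into an
;; association list of (name . value), using #t for valueless names.
(define (sketch-parse-query query)
  (define body
    (if (and (> (string-length query) 0)
             (char=? (string-ref query 0) #\?))
        (substring query 1)
        query))
  (for/list ([pair (in-list (regexp-split #rx"&" body))]
             #:unless (string=? pair ""))
    (define m (regexp-match #rx"^([^=]*)=(.*)$" pair))
    (if m
        (cons (form-urlencoded-decode (cadr m))
              (form-urlencoded-decode (caddr m)))
        (cons (form-urlencoded-decode pair) #t))))
```

The distinction between "x=" (empty string value) and "case" (no value, hence #t) falls out of whether the "=" is present in the pair.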
Yields the parsed attribute-value query of uri, or #f if there is no query. For example:
(uri-query "?x=42&y=1%2B2") ==> (("x" . "42") ("y" . "1+2"))
Yields the value of attribute attr in uri’s query, or #f if uri has no query component or no attr attribute. If the attribute appears multiple times in the query, the value of the first occurrence is used. For example:
(uri-query-value "?x=42&y=1%2B2" "y") ==> "1+2"
Yields the value of attribute attr in uriquery, or #f if there is no such attribute. If the attribute appears multiple times in the query, the value of the first occurrence is used.
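Since parsed queries are association lists, the first-occurrence rule corresponds directly to a lookup with `assoc`. The name `sketch-query-value` below is hypothetical and stands in for the library's own lookup procedure.

```racket
#lang racket/base

;; Returns the value paired with the first occurrence of attr in the
;; parsed query alist, or #f when attr does not appear.
(define (sketch-query-value alist attr)
  (cond
    [(assoc attr alist) => cdr]
    [else #f]))
```

A valueless attribute yields #t, so callers that need a string should check the result with `string?` rather than only testing for truth.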
3.10 Resolving Relative URIs
This subsection concerns resolving relative URIs.
Yields a Boolean value for whether or not URI uri is known by the library’s criteria to be absolute.
Yields a URI string that is URI uri possibly resolved with respect to URI base-uri, but not necessarily absolute. As an extension to [RFC2396] rules for resolution, base-uri may be a relative URI.
(resolved-uri "x.html" "http://w/a/b/c.html")
==> "http://w/a/b/x.html"
(resolved-uri "//www:80/" "http:")
==> "http://www/"
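For comparison, the common case of resolution against an absolute base can be reproduced with Racket's standard net/url library. This is a different library with its own rules, so it is shown only to illustrate the concept; the wrapper name `sketch-resolve` is hypothetical, and its argument order is chosen to match the examples above.

```racket
#lang racket/base
(require net/url)

;; Resolves the relative reference rel against base (both strings)
;; using net/url, and yields the result as a string.
(define (sketch-resolve rel base)
  (url->string (combine-url/relative (string->url base) rel)))
```

For "x.html" against "http://w/a/b/c.html", this yields "http://w/a/b/x.html", as in the first example. Behavior for unusual inputs, such as the scheme-only base "http:" in the second example, is specific to this library and is not reproduced by the sketch.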
Yields a URI that may be a variation on uri that has been forced to absolute (by, e.g., dropping relative path components, or supplying a missing path). The result might not be an absolute URI, however, due to limitations of the library or insufficient information in the URI. For example:
4 URI Schemes
!!!
!!!
!!!
!!!
!!!
!!!
5 Hierarchical URIs
!!!
!!!
!!!
!!!
!!!
!!!
!!!
!!!
!!!
!!!
!!!
!!!
!!!
5.1 Hierarchical Paths
!!!
!!!
!!!
!!!
Note: Contrary to [RFC2396], we don’t require base to be absolute.
!!!
!!!
!!!
!!!
!!!
!!!
!!!
5.2 Attribute-Value Queries
!!!
!!!
!!!
6 Antiresolution (In-Progress)
!!!
!!!
7 History
Version 0.2 — 2011-08-23 — PLaneT (1 0)
This is a release of some code-in-progress that has been sitting around unreleased for years. It has been changed heavily since the 2004, non-PLaneT release, including getting rid of the "uriobj"-specific operations, so that all operations work on both string and object forms. A few tests fail. Non-backward-compatible API changes are expected.
Version 0.1 — 2004-08-18
Initial release. Incorporates some code from UriFrame.
8 Legal
Copyright (c) 2003–2011 Neil Van Dyke. This program is Free Software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 3 of the License (LGPL 3), or (at your option) any later version. This program is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose. See http://www.gnu.org/licenses/ for details. For other licenses and consulting, please contact the author.
Standard Documentation Format Note: The API signatures in this documentation are likely incorrect in some regards, such as indicating type any/c for things that are not, and not indicating when arguments are optional. This is due to an ongoing transition from the Texinfo documentation format to Scribble, which the author intends to finish someday.