Version 0.4, 2005-06-07, http://www.neilvandyke.org/csv-scm/
by Neil W. Van Dyke <neil@neilvandyke.org>
Copyright © 2004 - 2005 Neil W. Van Dyke. This program is Free
Software; you can redistribute it and/or modify it under the terms of the
GNU Lesser General Public License as published by the Free Software
Foundation; either version 2.1 of the License, or (at your option) any
later version. This program is distributed in the hope that it will be
useful, but without any warranty; without even the implied warranty of
merchantability or fitness for a particular purpose. See
<http://www.gnu.org/copyleft/lesser.html> for details. For other license
options and consulting, contact the author.
The csv.scm Scheme library provides utilities for reading various kinds of
what are commonly known as “comma-separated value” (CSV) files. Since there
is no standard CSV format [1], this library permits CSV readers to be
constructed from a specification of the peculiarities of a given variant.
A default reader handles the majority of formats.
One of the main uses of this library is to import data from old crusty legacy applications into Scheme for data conversion and other processing. To that end, this library includes various conveniences for iterating over parsed CSV rows, and for converting CSV input to the SXML 3.0 Scheme XML format.
This library requires R5RS, SRFI-6, SRFI-23, and an integer->char
procedure that accepts ASCII values.
Other implementations of some kind of CSV reading for Scheme include
Gauche's text.csv module, and Scsh's record-reader and related procedures.
This library intends to be portable and more comprehensive.
CSV readers are constructed using reader specs, which are sets of attribute-value pairs, represented in Scheme as association lists keyed on symbols. Each attribute has a default value if not specified otherwise. The attributes are:
newline-type
  The newline convention: a fixed character sequence (lf, crlf, or cr, corresponding to combinations of line-feed and carriage-return), any string of one or more line-feed and carriage-return characters (lax), or adaptive (adapt). adapt attempts to detect the newline convention at the start of the input and assume that convention for the remainder of the input. Default: lax
separator-chars
  Default: (#\,) (list of the comma character)
quote-char
  The quote character, or #f if fields cannot be quoted. Note that there can be only one quote character. Default: #\" (double-quote)
quote-doubling-escapes?
  Whether two consecutive quote-char quote characters within a quoted field constitute an escape sequence for including a single quote-char within the string. Default: #t
comment-chars
  Default: () (null list)
whitespace-chars
  The characters treated as whitespace by the strip-leading-whitespace? and strip-trailing-whitespace? attributes described below. Default: (#\space) (list of the space character)
strip-leading-whitespace?
  Default: #f
strip-trailing-whitespace?
  Default: #f
newlines-in-quotes?
  Default: #t
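As an illustration of the association-list representation, here is a minimal sketch of a reader spec and of looking up an attribute with a fallback default. The spec `semicolon-spec` and the helper `spec-ref` are hypothetical names introduced for this example; they are not part of csv.scm, and the library's own handling of defaults may differ.

```scheme
;; Hypothetical reader spec: semicolon-separated fields, no quoting,
;; leading whitespace stripped. Attributes not listed keep their defaults.
(define semicolon-spec
  '((separator-chars           . (#\;))
    (quote-char                . #f)
    (strip-leading-whitespace? . #t)))

;; Hypothetical helper (not part of csv.scm): return the value of KEY in
;; SPEC, or DEFAULT if SPEC has no entry for KEY. The test is on the pair
;; itself, so an attribute explicitly set to #f is not mistaken for a
;; missing one.
(define (spec-ref spec key default)
  (let ((pair (assq key spec)))
    (if pair
        (cdr pair)
        default)))

(spec-ref semicolon-spec 'separator-chars '(#\,))  ; => (#\;)
(spec-ref semicolon-spec 'quote-char #\")          ; => #f
(spec-ref semicolon-spec 'newline-type 'lax)       ; => lax
```

Testing for the pair rather than for a true value matters here because #f is a meaningful setting for several attributes, such as quote-char.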
CSV readers are procedures that are constructed dynamically to close over a
particular CSV input and yield a parsed row value each time the procedure
is applied. For efficiency reasons, the reader procedures are themselves
constructed by another procedure, make-csv-reader-maker, for particular
CSV reader specs.
Constructs a CSV reader constructor procedure from the reader-spec, with unspecified attributes having their default values.
For example, given the input file fruits.csv with the content:

    apples | 2 | 0.42
    bananas | 20 | 13.69

a reader for the file's apparent format can be constructed like:

    (define make-food-csv-reader
      (make-csv-reader-maker
       '((separator-chars . (#\|))
         (strip-leading-whitespace? . #t)
         (strip-trailing-whitespace? . #t))))

The resulting make-food-csv-reader procedure accepts one argument, which is either an input port from which to read, or a string from which to read. Our example input file can then be read by opening an input port on a file and using our new procedure to construct a reader on it:

    (define next-row
      (make-food-csv-reader (open-input-file "fruits.csv")))

This reader, next-row, can then be called repeatedly to yield a parsed representation of each subsequent row. The parsed format is a list of strings, one string for each column. The null list is yielded to indicate that all rows have already been yielded.

    (next-row) => ("apples" "2" "0.42")
    (next-row) => ("bananas" "20" "13.69")
    (next-row) => ()
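The calling convention described above (call the reader until it yields the null list) can be sketched as a loop. So that the sketch runs without the library or an input file, it uses `list->reader`, a hypothetical stand-in that behaves like a reader over a fixed list of rows; `sum-counts` and `demo-next-row` are likewise names invented for this example.

```scheme
;; Hypothetical stand-in for a csv.scm reader: yields each row of ROWS in
;; turn, then the null list on every later call.
(define (list->reader rows)
  (lambda ()
    (if (null? rows)
        '()
        (let ((row (car rows)))
          (set! rows (cdr rows))
          row))))

;; Call READER until it yields the null list, adding up the second column.
;; Fields arrive as strings, so the count is converted with string->number.
(define (sum-counts reader)
  (let loop ((row (reader)) (total 0))
    (if (null? row)
        total
        (loop (reader)
              (+ total (string->number (cadr row)))))))

(define demo-next-row
  (list->reader '(("apples" "2" "0.42")
                  ("bananas" "20" "13.69"))))

(sum-counts demo-next-row)  ; => 22
```

Because every field is a string, any numeric interpretation is left to the caller, as the string->number call shows.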
In addition to being constructed from the result of make-csv-reader-maker, CSV readers can also be constructed using make-csv-reader.
Constructs a CSV reader on the input in, which is an input port or a string. If reader-spec is given, and is not the null list, then a “one-shot” reader constructor is constructed with that spec and used. If reader-spec is not given, or is the null list, then the default CSV reader constructor is used. For example, the reader from the make-csv-reader-maker example could alternatively have been constructed like:

    (define next-row
      (make-csv-reader
       (open-input-file "fruits.csv")
       '((separator-chars . (#\|))
         (strip-leading-whitespace? . #t)
         (strip-trailing-whitespace? . #t))))
Several convenience procedures are provided for iterating over the CSV rows and for converting the CSV into a list. To the dismay of some Scheme purists, each of these procedures accepts a reader-or-in argument, which can be a CSV reader, an input port, or a string. If not a CSV reader, then the default reader constructor is used. For example, all three of the following are equivalent:
    (csv->list string)
    == (csv->list (make-csv-reader string))
    == (csv->list (make-csv-reader (open-input-string string)))
Similar to Scheme's for-each, applies proc, a procedure of one argument, to each parsed CSV row in series. reader-or-in is the CSV reader, input port, or string. The return
Similar to Scheme's map, applies proc, a procedure of one argument, to each parsed CSV row in series, and yields a list of the values of each application of proc, in order. reader-or-in is the CSV reader, input port, or string.
Yields a list of CSV row lists from input reader-or-in, which is a CSV reader, input port, or string.
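To make the behaviour of these iteration conveniences concrete, the following sketch shows how for-each, map, and list conversion could be written for a reader that yields the null list at end of input. It is not the library's implementation: `reader-for-each`, `reader-map`, `reader->list`, and the stand-in `list->reader` are hypothetical names, and the sketch handles only the case where the argument is already a reader procedure, not an input port or a string.

```scheme
;; Hypothetical stand-in for a csv.scm reader over a fixed list of rows.
(define (list->reader rows)
  (lambda ()
    (if (null? rows)
        '()
        (let ((row (car rows)))
          (set! rows (cdr rows))
          row))))

;; Apply PROC to each row in series, for effect only.
(define (reader-for-each proc reader)
  (let loop ((row (reader)))
    (if (not (null? row))
        (begin
          (proc row)
          (loop (reader))))))

;; Apply PROC to each row in series and collect the results in order.
;; PROC is applied before the next row is read, so side effects happen
;; in row order.
(define (reader-map proc reader)
  (let loop ((row (reader)) (acc '()))
    (if (null? row)
        (reverse acc)
        (let ((value (proc row)))
          (loop (reader) (cons value acc))))))

;; Collect all remaining rows into a list.
(define (reader->list reader)
  (reader-map (lambda (row) row) reader))

(reader-map car (list->reader '(("apples" "2") ("bananas" "20"))))
;; => ("apples" "bananas")
```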
The csv->sxml procedure can be used to convert CSV to [SXML] format, for
processing with various XML tools.
Reads CSV from input reader-or-in (which is a CSV reader, input port, or string), and yields an SXML representation. If given, row-element is a symbol for the XML row element. If row-element is not given, the default is the symbol row. If given, col-elements is a list of symbols for the XML column elements. If not given, or there are more columns in a row than given symbols, column element symbols are of the form col-n, where n is the column number (the first column being number 0, not 1).

For example, given a CSV-format file friends.csv that has the contents:

    Binoche,Ste. Brune,33-1-2-3
    Posey,Main St.,555-5309
    Ryder,Cellblock 9,

With elements not given, the result is:

    (csv->sxml (open-input-file "friends.csv"))
    =>
    (*TOP*
     (row (col-0 "Binoche") (col-1 "Ste. Brune") (col-2 "33-1-2-3"))
     (row (col-0 "Posey") (col-1 "Main St.") (col-2 "555-5309"))
     (row (col-0 "Ryder") (col-1 "Cellblock 9") (col-2 "")))

With elements given, the result is like:

    (csv->sxml (open-input-file "friends.csv")
               'friend
               '(name address phone))
    =>
    (*TOP*
     (friend (name "Binoche") (address "Ste. Brune") (phone "33-1-2-3"))
     (friend (name "Posey") (address "Main St.") (phone "555-5309"))
     (friend (name "Ryder") (address "Cellblock 9") (phone "")))
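The naming rule for column elements can be sketched in plain Scheme over rows that have already been parsed into lists of strings. This is an illustration of the rule as described above, not the library's code; `column-name`, `row->sxml`, and `rows->sxml` are hypothetical names, and pass the null list as col-elements to get col-n names throughout.

```scheme
;; Element name for column INDEX: the supplied symbol if COL-ELEMENTS has
;; one at that position, otherwise col-N (numbered from 0).
(define (column-name col-elements index)
  (if (< index (length col-elements))
      (list-ref col-elements index)
      (string->symbol
       (string-append "col-" (number->string index)))))

;; Wrap one row (a list of strings) in ROW-ELEMENT, one child per column.
(define (row->sxml row row-element col-elements)
  (let loop ((fields row) (index 0) (acc '()))
    (if (null? fields)
        (cons row-element (reverse acc))
        (loop (cdr fields)
              (+ index 1)
              (cons (list (column-name col-elements index) (car fields))
                    acc)))))

;; Wrap all rows under the *TOP* element.
(define (rows->sxml rows row-element col-elements)
  (cons '*TOP*
        (map (lambda (row) (row->sxml row row-element col-elements))
             rows)))

;; Two names supplied for three columns: the third falls back to col-2.
(rows->sxml '(("Ryder" "Cellblock 9" "")) 'friend '(name address))
;; => (*TOP* (friend (name "Ryder") (address "Cellblock 9") (col-2 "")))
```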
The csv.scm test suite can be enabled by editing the source code file and
loading Testeez.
case-related bug exhibited under Gauche 0.8 and 0.7.4.2 in
csv-internal:make-portreader/positional. Thanks to Grzegorz Chrupała for
reporting.
[1] “The Comma Separated Value (CSV) File Format: Create or parse data in this popular pseudo-standard format,” Web page, viewed 2004-05-26, http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm