Major Section: MISCELLANEOUS
Below we begin a detailed presentation of ACL2 arrays. ACL2's single-threaded objects (see stobj) provide a similar functionality that is generally more efficient but also more restrictive. Related topics:
  :default -- from the header of a 1- or 2-dimensional array
  :dimensions -- from the header of a 1- or 2-dimensional array
  :maximum-length -- from the header of an array
ACL2 provides relatively efficient 1- and 2-dimensional arrays. Arrays are awkward to provide efficiently in an applicative language because the programmer rightly expects to be able to ``modify'' an array object with the effect of changing the behavior of the element accessing function on that object. This, of course, does not make any sense in an applicative setting. The element accessing function is, after all, a function, and its behavior on a given object is immutable. To ``modify'' an array object in an applicative setting we must actually produce a new array object. Arranging for this to be done efficiently is a challenge to the implementors of the language. In addition, the programmer accustomed to the von Neumann view of arrays must learn how to use immutable applicative arrays efficiently.
In this note we explain 1-dimensional arrays. In particular, we explain briefly how to create, access, and ``modify'' them, how they are implemented, and how to program with them. 2-dimensional arrays are dealt with by analogy.
The Logical Description of ACL2 Arrays
An ACL2 1-dimensional array is an object that associates arbitrary objects with certain integers, called ``indices.'' Every array has a dimension, dim, which is a positive integer. The indices of an array are the consecutive integers from 0 through dim-1. To obtain the object associated with the index i in an array a, one uses (aref1 name a i). Name is a symbol that is irrelevant to the semantics of aref1 but affects the speed with which it computes. We will talk more about array ``names'' later. To produce a new array object that is like a but which associates val with index i, one uses (aset1 name a i val).
An ACL2 1-dimensional array is actually an alist. There is no special ACL2 function for creating arrays; they are generally built with the standard list processing functions list and cons. However, there is a special ACL2 function, called compress1, for speeding up access to the elements of such an alist. We discuss compress1 later.
One element of the alist must be the ``header'' of the array. The header of a 1-dimensional array with dimension dim is of the form:
  (:HEADER :DIMENSIONS (dim)
           :MAXIMUM-LENGTH max
           :DEFAULT obj ; optional
           :NAME name   ; optional
           :ORDER order ; optional; values are < (the default), >, or :none
           )
Obj may be any object and is called the ``default value'' of the array. Max must be an integer greater than dim. Name must be a symbol. The :default and :name entries are optional; if :default is omitted, the default value is nil. The function header, when given a name and a 1- or 2-dimensional array, returns the header of the array. The functions dimensions, maximum-length, and default are similar and return the corresponding fields of the header of the array. The role of the :dimensions field is obvious: it specifies the legal indices into the array. The roles played by the :maximum-length and :default fields are described below.
Aside from the header, the other elements of the alist must each be of the form (i . val), where i is an integer and 0 <= i < dim, and val is an arbitrary object.
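As a concrete illustration of the header and element forms just described, the following sketch writes a small array directly as a quoted alist and reads its header fields back. The constant *a3* and the array name ex are hypothetical, chosen only for this example.

```lisp
; A hypothetical 1-dimensional array of dimension 3, written directly
; as an alist: one header plus pairs of the form (i . val).
(defconst *a3*
  '((:header :dimensions (3)
             :maximum-length 10
             :default 0
             :name ex)
    (0 . a)
    (2 . c)))

(array1p 'ex *a3*)        ; t: the alist is a well-formed 1-dimensional array
(dimensions 'ex *a3*)     ; (3)
(maximum-length 'ex *a3*) ; 10
(default 'ex *a3*)        ; 0
```

Index 1 has no pair in this alist, so logically it holds the default value 0.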
The :order field of the header is ignored for 2-dimensional arrays. For 1-dimensional arrays, it specifies the order of keys (i, above) when the array is compressed with compress1, as described below. An :order of :none specifies no reordering of the alist by compress1, and an :order of > specifies reordering by compress1 so that keys are in descending order. Otherwise, the alist is reordered by compress1 so that keys are in ascending order.
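The effect of the :order field can be sketched with two alists that differ only in their headers. The constants *asc* and *desc* and the array name ord are hypothetical; the expected key orders in the comments follow from the description above and are worth confirming in your own ACL2 session.

```lisp
; Hypothetical alist: index 1 is bound twice, so the pair (1 . old)
; is covered up by the earlier pair (1 . new).  No :order is given,
; so compress1 uses the default, ascending keys.
(defconst *asc*
  '((1 . new)
    (:header :dimensions (3) :maximum-length 10 :default 0 :name ord)
    (0 . a)
    (1 . old)))

; The same pairs, but the header asks for descending keys.
(defconst *desc*
  '((1 . new)
    (:header :dimensions (3) :maximum-length 10 :default 0 :name ord
             :order >)
    (0 . a)
    (1 . old)))

; Keys that follow the header in each compressed alist.
(strip-cars (cdr (compress1 'ord *asc*)))   ; (0 1)
(strip-cars (cdr (compress1 'ord *desc*)))  ; (1 0)
```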
(Aref1 name a i) is guarded so that name must be a symbol, a must be an array and i must be an index into a. The value of (aref1 name a i) is either (cdr (assoc i a)) or else is the default value of a, depending on whether there is a pair in a whose car is i. Note that name is irrelevant to the value of an aref1 expression. You might :pe aref1 to see how simple the definition is.
(Aset1 name a i val) is guarded analogously to the aref1 expression. The value of the aset1 expression is essentially (cons (cons i val) a). Again, name is irrelevant. Note (aset1 name a i val) is an array, a', with the property that (aref1 name a' i) is val and, except for index i, all other indices into a' produce the same value as in a. Note also that if a is viewed as an alist (which it is) the pair ``binding'' i to its old value is in a' but ``covered up'' by the new pair. Thus, the length of an array grows by one when aset1 is done.
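The ``covering up'' behavior can be observed directly. The sketch below uses a hypothetical array named ex and inspects the alist returned by aset1; it assumes the :maximum-length (10 here) is large enough that no compression is triggered.

```lisp
; Hypothetical array named ex, compressed first so that the accesses
; below are made on the current semantic value of the name.
(let* ((a  (compress1 'ex
                      '((:header :dimensions (3) :maximum-length 10
                                 :default 0 :name ex)
                        (0 . a)
                        (2 . c))))
       (a1 (aset1 'ex a 0 'z)))
  (list (car a1)                            ; (0 . z): new pair at the front
        (len a)                             ; 3: header plus two pairs
        (len a1)                            ; 4: one pair longer
        (if (member-equal '(0 . a) a1) t nil) ; t: old pair is still present
        (aref1 'ex a1 0)                    ; z: the new pair covers the old
        (aref1 'ex a1 2)))                  ; c: other indices are unchanged
```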
Because aset1 covers old values with new ones, an array produced by a sequence of aset1 calls may have many irrelevant pairs in it. The function compress1 can remove these irrelevant pairs. Thus, (compress1 name a) returns an array that is equivalent (vis-a-vis aref1) to a but which may be shorter. For technical reasons, the alist returned by compress1 may also list the pairs in a different order than listed in a.
To prevent arrays from growing excessively long due to repeated aset1 operations, aset1 actually calls compress1 on the new alist whenever the length of the new alist exceeds the :maximum-length entry, max, in the header of the array. See the definition of aset1 (for example by using :pe). This is primarily just a mechanism for freeing up cons space consumed while doing aset1 operations. Note however that this compress1 call is replaced by a hard error if the header specifies an :order of :none.
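To see the automatic compression at work, one can choose a deliberately small :maximum-length and watch the length of the alist. This is only a sketch: the array name small and the header values are hypothetical, picked so that the third aset1 pushes the alist past the limit.

```lisp
; Hypothetical array with dimension 2 and :maximum-length 3.
(let* ((a0 (compress1 'small
                      '((:header :dimensions (2) :maximum-length 3
                                 :default nil :name small))))
       (a1 (aset1 'small a0 0 'a))   ; length 2
       (a2 (aset1 'small a1 0 'b))   ; length 3, still within the limit
       (a3 (aset1 'small a2 0 'c)))  ; length would be 4, so it is compressed
  (list (len a1)
        (len a2)
        (len a3)
        (aref1 'small a3 0)))
; => (2 3 2 c)
```

After the third aset1 only the header and the single live pair (0 . c) remain.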
This completes the logical description of 1-dimensional arrays. 2-dimensional arrays are analogous. The :dimensions entry of the header of a 2-dimensional array should be (dim1 dim2). A pair of indices, i and j, is legal iff 0 <= i < dim1 and 0 <= j < dim2. The :maximum-length must be greater than dim1*dim2. Aref2 and aset2 are like their counterparts but take an additional index argument; compress2 is the analogue of compress1. Finally, the pairs in a 2-dimensional array are of the form ((i . j) . val).
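A small 2-dimensional example may help fix the analogy. The array name mat and its contents are hypothetical; note that the :maximum-length, 7, exceeds dim1*dim2 = 6 as required.

```lisp
; Hypothetical 2 x 3 array with one explicit entry at (0, 1).
(let* ((m0 (compress2 'mat
                      '((:header :dimensions (2 3) :maximum-length 7
                                 :default 0 :name mat)
                        ((0 . 1) . x))))
       (m1 (aset2 'mat m0 1 2 'y)))
  (list (aref2 'mat m1 0 1)    ; x, from the original alist
        (aref2 'mat m1 1 2)    ; y, from the aset2
        (aref2 'mat m1 0 0)))  ; 0, the default value
; => (x y 0)
```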
The Implementation of ACL2 Arrays
Very informally speaking, the function compress1 ``creates'' an ACL2 array that provides fast access, while the function aset1 ``maintains'' fast access. We now describe this informal idea more carefully.
Aref1 is essentially assoc. If aref1 were implemented naively the time taken to access an array element would be linear in the dimension of the array and the number of ``assignments'' to it (the number of aset1 calls done to create the array from the initial alist). This is intolerable; arrays are ``supposed'' to provide constant-time access and change.
The apparently irrelevant names associated with ACL2 arrays allow us to provide constant-time access and change when arrays are used in ``conventional'' ways. The implementation of arrays makes it clear what we mean by ``conventional.''
Recall that array names are symbols. Behind the scenes, ACL2 associates two objects with each ACL2 array name. The first object is called the ``semantic value'' of the name and is an alist. The second object is called the ``raw lisp array'' and is a Common Lisp array.
When (compress1 name alist) builds a new alist, a', it sets the semantic value of name to that new alist. Furthermore, it creates a Common Lisp array and writes into it all of the index/value pairs of a', initializing unassigned indices with the default value. This array becomes the raw lisp array of name. Compress1 then returns a', the semantic value, as its result, as required by the definition of compress1.
When (aref1 name a i) is invoked, aref1 first determines whether the semantic value of name is a (i.e., is eq to the alist a). If so, aref1 can determine the ith element of a by invoking Common Lisp's aref function on the raw lisp array associated with name. Note that no linear search of the alist a is required; the operation is done in constant time and involves retrieval of two global variables, an eq test and jump, and a raw lisp array access. In fact, an ACL2 array access of this sort is about 5 times slower than a C array access. On the other hand, if name has no semantic value or if it is different from a, then aref1 determines the answer by linear search of a as suggested by the assoc-like definition of aref1.
Thus, aref1 always returns the axiomatically specified result. It returns in constant time if the array being accessed is the current semantic value of the name used. The ramifications of this are discussed after we deal with aset1.
When (aset1 name a i val) is invoked, aset1 does two conses to create the new array. Call that array a'. It will be returned as the answer. (In this discussion we ignore the case in which aset1 does a compress1.) However, before returning, aset1 determines if name's semantic value is a. If so, it makes the new semantic value of name be a' and it smashes the raw lisp array of name with val at index i, before returning a' as the result. Thus, after doing an aset1 and obtaining a new semantic value a', all aref1s on that new array will be fast. Any aref1s on the old semantic value, a, will be slow.
To understand the performance implications of this design, consider the chronological sequence in which ACL2 (Common Lisp) evaluates expressions: basically inner-most first, left-to-right, call-by-value. An array use, such as (aref1 name a i), is ``fast'' (constant-time) if the alist supplied, a, is the value returned by the most recently executed compress1 or aset1 on the name supplied. In the functional expression of ``conventional'' array processing, all uses of an array are fast.
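A minimal sketch of such ``conventional'' use threads each new array value into the next operation, so that every access is made on the alist most recently returned for the name. The array name vec and the variables v0, v1 and v2 are hypothetical.

```lisp
; Each step uses the alist returned by the previous compress1 or
; aset1 on the name vec, so every use below should be fast.
(let* ((v0 (compress1 'vec
                      '((:header :dimensions (3) :maximum-length 10
                                 :default 0 :name vec))))
       (v1 (aset1 'vec v0 0 'x))
       (v2 (aset1 'vec v1 1 'y)))
  (list (aref1 'vec v2 0)
        (aref1 'vec v2 1)
        (aref1 'vec v2 2)))
; => (x y 0)
```

By contrast, an access such as (aref1 'vec v1 0) placed after v2 has been created would return the same logical value as before, but v1 would no longer be the semantic value of vec, so that access would be slow.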
The :name field of the header of an array is completely irrelevant. Our convention is to store in that field the symbol we mean to use as the name of the raw lisp array. But no ACL2 function inspects :name and its primary value is that it allows the user, by inspecting the semantic value of the array -- the alist -- to recall the name of the raw array that probably holds that value. We say ``probably'' since there is no enforcement that the alist was compressed under the name in the header or that all asets used that name. Such enforcement would be inefficient.
Some Programming Examples
In the following examples we will use ACL2 ``global variables'' to hold several arrays. See @, and see assign.
Let the state global variable a be the 1-dimensional compressed array of dimension 5 constructed below.
ACL2 !>(assign a (compress1 'demo
                            '((:header :dimensions (5)
                                       :maximum-length 15
                                       :default uninitialized
                                       :name demo)
                              (0 . zero))))
Then (aref1 'demo (@ a) 0) is zero and (aref1 'demo (@ a) 1) is uninitialized.
Now execute
ACL2 !>(assign b (aset1 'demo (@ a) 1 'one))
Then (aref1 'demo (@ b) 0) is zero and (aref1 'demo (@ b) 1) is one.
All of the aref1s done so far have been ``fast.''
Note that we now have two array objects, one in the global variable a and one in the global variable b. B was obtained by assigning to a. That assignment does not affect the alist a because this is an applicative language. Thus, (aref1 'demo (@ a) 1) must still be uninitialized. And if you execute that expression in ACL2 you will see that indeed it is. However, a rather ugly comment is printed, namely that this array access is ``slow.'' The reason it is slow is that the raw lisp array associated with the name demo is the array we are calling b. To access the elements of a, aref1 must now do a linear search. Any reference to a as an array is now ``unconventional;'' in a conventional language like Ada or Common Lisp it would simply be impossible to refer to the value of the array before the assignment that produced our b.
Now let us define a function that counts how many times a given object, x, occurs in an array. For simplicity, we will pass in the name and highest index of the array:
ACL2 !>(defun cnt (name a i x)
         (declare (xargs :guard (and (array1p name a)
                                     (integerp i)
                                     (>= i -1)
                                     (< i (car (dimensions name a))))
                         :mode :logic
                         :measure (nfix (+ 1 i))))
         (cond ((zp (1+ i)) 0) ; return 0 if i is at most -1
               ((equal x (aref1 name a i))
                (1+ (cnt name a (1- i) x)))
               (t (cnt name a (1- i) x))))
To determine how many times zero appears in (@ b) we can execute:
ACL2 !>(cnt 'demo (@ b) 4 'zero)
The answer is 1. How many times does uninitialized appear in (@ b)?
ACL2 !>(cnt 'demo (@ b) 4 'uninitialized)
The answer is 3, because positions 2, 3 and 4 of the array contain that default value.
Now imagine that we want to assign 'two to index 2 and then count how many times the 2nd element of the array occurs in the array. This specification is actually ambiguous. In assigning to b we produce a new array, which we might call c. Do we mean to count the occurrences in c of the 2nd element of b or the 2nd element of c? That is, do we count the occurrences of uninitialized or the occurrences of two? If we mean the former the correct answer is 2 (positions 3 and 4 are uninitialized in c); if we mean the latter, the correct answer is 1 (there is only one occurrence of two in c).
Below are ACL2 renderings of the two meanings, which we call [former] and [latter]. (Warning: Our description of these examples, and of an example [fast former] that follows, assumes that only one of these three examples is actually executed; for example, they are not executed in sequence. See ``A Word of Warning'' below for more about this issue.)
  (cnt 'demo (aset1 'demo (@ b) 2 'two) 4 (aref1 'demo (@ b) 2))  ; [former]
  (let ((c (aset1 'demo (@ b) 2 'two)))                           ; [latter]
    (cnt 'demo c 4 (aref1 'demo c 2)))
Note that in [former] we create c in the second argument of the call to cnt (although we do not give it a name) and then refer to b in the fourth argument. This is unconventional because the second reference to b in [former] is no longer the semantic value of demo. While ACL2 computes the correct answer, namely 2, the execution of the aref1 expression in [former] is done slowly.
A conventional rendering with the same meaning is
  (let ((x (aref1 'demo (@ b) 2)))                                ; [fast former]
    (cnt 'demo (aset1 'demo (@ b) 2 'two) 4 x))
which fetches the 2nd element of b before creating c by assignment. It is important to understand that [former] and [fast former] mean exactly the same thing: both count the number of occurrences of uninitialized in c. Both are legal ACL2 and both compute the same answer, 2. Indeed, we can symbolically transform [fast former] into [former] merely by substituting the binding of x for x in the body of the let. But [fast former] can be evaluated faster than [former] because all of the references to demo use the then-current semantic value of demo, which is b in the first line and c throughout the execution of the cnt in the second line. [Fast former] is the preferred form, both because of its execution speed and its clarity. If you were writing in a conventional language you would have to write something like [fast former] because there is no way to refer to the 2nd element of the old value of b after smashing b unless it had been saved first.
We turn now to [latter]. It is both clear and efficient. It creates c by assignment to b and then it fetches the 2nd element of c, two, and proceeds to count the number of occurrences in c. The answer is 1. [Latter] is a good example of typical ACL2 array manipulation: after the assignment to b that creates c, c is used throughout.
It takes a while to get used to this because most of us have grown accustomed to the peculiar semantics of arrays in conventional languages. For example, in raw lisp we might have written something like the following, treating b as a ``global variable'':
  (cnt 'demo (aset 'demo b 2 'two) 4 (aref 'demo b 2))
which sort of resembles [former] but actually has the semantics of [latter] because the b from which aref fetches the 2nd element is not the same b used in the aset! The array b is destroyed by the aset and b henceforth refers to the array produced by the aset, as written more clearly in [latter].
A Word of Warning: Users must exercise care when experimenting with [former], [latter] and [fast former]. Suppose you have just created b with the assignment shown above,
ACL2 !>(assign b (aset1 'demo (@ a) 1 'one))
If you then evaluate [former] in ACL2 it will complain that the aref1 is slow and compute the answer, as discussed. Then suppose you evaluate [latter] in ACL2. From our discussion you might expect it to execute fast -- i.e., issue no complaint. But in fact you will find that it complains repeatedly. The problem is that the evaluation of [former] changed the semantic value of demo so that it is no longer b. To try the experiment correctly you must make b be the semantic value of demo again before the next example is evaluated. One way to do that is to execute
ACL2 !>(assign b (compress1 'demo (@ b)))
before each expression. Because of issues like this it is often hard to experiment with ACL2 arrays at the top-level. We find it easier to write functions that use arrays correctly and efficiently than to use them that way interactively.
This last assignment also illustrates a very common use of compress1. While it was introduced as a means of removing irrelevant pairs from an array built up by repeated assignments, it is actually most useful as a way of ensuring fast access to the elements of an array.
Many array processing tasks can be divided into two parts. During the first part the array is built. During the second part the array is used extensively but not modified. If your programming task can be so divided, it might be appropriate to construct the array entirely with list processing, thereby saving the cost of maintaining the semantic value of the name while few references are being made. Once the alist has stabilized, it might be worthwhile to treat it as an array by calling compress1, thereby gaining constant time access to it.
ACL2's theorem prover uses this technique in connection with its implementation of the notion of whether a rune is disabled or not. Associated with every rune is a unique integer index, called its ``nume.'' When each rule is stored, the corresponding nume is stored as a component of the rule. Theories are lists of runes and membership in the ``current theory'' indicates that the corresponding rule is enabled. But these lists are very long and membership is a linear-time operation. So just before a proof begins we map the list of runes in the current theory into an alist that pairs the corresponding numes with t. Then we compress this alist into an array. Thus, given a rule we can obtain its nume (because it is a component) and then determine in constant time whether it is enabled. The array is never modified during the proof, i.e., aset1 is never used in this example. From the logical perspective this code looks quite odd: we have replaced a linear-time membership test with an apparently linear-time assoc after going to the trouble of mapping from a list of runes to an alist of numes. But because the alist of numes is an array, the ``apparently linear-time assoc'' is more apparent than real; the operation is constant-time.
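The build-then-compress pattern can be sketched in miniature as follows. This is not the theorem prover's actual code: the function flag-alist, the array name enabled, and the indices 1, 4 and 6 are hypothetical stand-ins for the mapping from runes to numes.

```lisp
; Build the alist with ordinary list processing: pair each index in
; the list with t.
(defun flag-alist (indices)
  (if (endp indices)
      nil
    (cons (cons (car indices) t)
          (flag-alist (cdr indices)))))

; Add a header, compress once, and then only read from the array.
(let ((ens (compress1 'enabled
                      (cons '(:header :dimensions (8) :maximum-length 9
                                      :default nil :name enabled)
                            (flag-alist '(1 4 6))))))
  (list (aref1 'enabled ens 4)    ; t: index 4 was in the list
        (aref1 'enabled ens 5)))  ; nil: index 5 falls back to the default
; => (t nil)
```

No aset1 is ever applied to ens, so it remains the semantic value of enabled and every lookup stays fast.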