1 Examples
2 API
2.1 Schemas
schema/ c
2.2 Propositions
prop?
Proposition
2.3 Queries
query?
database-schema/ c
current-database-schema
query-schema
query-relation
query-union
query-difference
query-intersection
query-product
query-projection
query-selection
query-rename
query-rename*
query-natural-join
query-theta-join
query-semi-join
query-anti-join
query-division
query-left-outer-join
query-right-outer-join
query-outer-join
2.4 Tuples
tuple?
tuple-length
tuple-ref
Tuple
2.5 Relations
relation?
relation-schema
relation-tuples
Relation
2.6 Database
database/ c
database-insert
database-delete
with-database
execute-query
NULL
Database
3 Implementation Notes
Version: 4.2.0.4

Functional Relational Algebra

Jay McCarthy <jay at plt-scheme dot org>

 (require (planet jaymccarthy/fra:2:1))

This package provides a purely functional implementation of Relational Algebra in PLT Scheme.

1 Examples

In this example, we will build a relational database for a university grade book. First, we define a database with a few relations:

  (define GradebookDB
    (Database
     [Students
      [StudentId Name Course]
      (0 "Jonah" 'CS330)
      (1 "Aidan" 'CS142)
      (2 "Sarah" 'CS630)]
     [Projects
      [ProjectId Course Title]
      (0 'CS330 "Garbage Collectors")
      (1 'CS330 "Prolog")
      (2 'CS142 "UFO")
      (3 'CS630 "Theorem Prover")]
     [Grades
      [StudentId ProjectId Grade]
      [0 0 2/3]
      [1 2 99]
      [2 3 -inf.0]]))

At this point GradebookDB is bound to a value obeying database/c that contains three relations: 'Students, 'Projects, and 'Grades. The first S-expression after each relation identifier is the schema/c for that relation. Each S-expression after that is a Tuple that is in the relation. As you can see, any Scheme value can be included in a tuple.

Let’s do some queries!

  (require (planet dherman/set:4/set))
  (define (print-relation r)
    (for ([c (in-list (relation-schema r))])
      (printf "~a\t" c))
    (printf "~n")
    (for ([t (in-set (relation-tuples r))])
      (for ([i (in-range 0 (tuple-length t))])
        (printf "~a\t" (tuple-ref t i)))
      (printf "~n")))

  > (with-database GradebookDB
     (print-relation
      (execute-query
       (query-relation 'Students))))

  StudentId Name Course

  2 Sarah CS630

  1 Aidan CS142

  0 Jonah CS330

As you can see from this interaction, a relation is just a set of tuples, which are a vector-like abstraction. Now for some more interesting queries:

  (define (>30 n)
    (> 30 n))

  > (with-database GradebookDB
     (print-relation
      (execute-query
       (query-selection
        (Proposition (>30 Grade))
        (query-relation 'Grades)))))

  StudentId ProjectId Grade

  0 0 2/3

  2 3 -inf.0

Proposition can be any Scheme value that may be applied.

Suppose that we attempted to use that proposition on a relation that did not have 'Grade in its schema?

  > (with-database GradebookDB
     (query-selection
      (Proposition (>30 Grade))
      (query-relation 'Students)))

  query-selection: Proposition must refer to subset of

  query's attributes: >30((Grade))

As you can see, the error is detected before the query is ever run.

Now, let’s have some joins:

  > (with-database GradebookDB
     (print-relation
      (execute-query
       (query-rename
        'Title 'Project
        (query-projection
         '(Name Course Title Grade)
         (query-natural-join
          (query-relation 'Projects)
          (query-natural-join
           (query-relation 'Grades)
           (query-relation 'Students))))))))

  Name Course Project Grade

  Jonah CS330 Garbage Collectors 2/3

  Aidan CS142 UFO 99

  Sarah CS630 Theorem Prover -inf.0

Finally, some functional database modification:

  > (with-database GradebookDB
     (print-relation
      (execute-query (query-relation 'Students))))

  StudentId Name Course

  2 Sarah CS630

  1 Aidan CS142

  0 Jonah CS330

  > (define GradebookDB+1
      (database-insert
       GradebookDB 'Students
       (Tuple 3 "Omega" (lambda () ((lambda (x) (x x)) (lambda (x) (x x)))))))
  > (with-database GradebookDB+1
     (print-relation
      (execute-query (query-relation 'Students))))

  StudentId Name Course

  3 Omega #<procedure>

  2 Sarah CS630

  1 Aidan CS142

  0 Jonah CS330

  > (define GradebookDB+2
      (database-delete
       GradebookDB+1 'Students
       (Tuple 0 "Jonah" 'CS330)))
  > (with-database GradebookDB+2
     (print-relation
      (execute-query (query-relation 'Students))))

  StudentId Name Course

  1 Aidan CS142

  2 Sarah CS630

  3 Omega #<procedure>

2 API

This section documents the APIs of the package.

2.1 Schemas

Schemas are defined as lists of symbols.

schema/c : contract?

Equivalent to (listof symbol?).

2.2 Propositions

Propositions are used by query-selection to compute sub-relations.

(prop? v)  boolean?
  v : any/c

Returns #t if v is a proposition, #f otherwise.

(Proposition p)
 
p = (not p)
  | (and p p)
  | (or p p)
  | (proc attribute ...)
 

  proc : procedure?

  attribute : identifier?

Returns a proposition. The interpretation of not, and, and or is standard. When a procedure is included in a proposition, the values of the named attributes are extracted from the tuple and applied to the procedure value; if it returns #t, then the tuple is selected, otherwise it is rejected.

2.3 Queries

Queries are used by execute-query to run relational queries.

(query? v)  boolean?
  v : any/c

Returns #t if v is a query, #f otherwise.

database-schema/c : contract?

Equivalent to (-> symbol? schema/c).

current-database-schema : (parameter/c database-schema/c)

Used by query-schema to determine the schema of query-relation queries.

(query-schema q)  schema/c
  q : query?

Returns the schema of the relation q computes.

(query-relation rid)  query?
  rid : symbol?

Query of the relation rid.

(query-union q1 q2)  query?
  q1 : query?
  q2 : query?
(query-difference q1 q2)  query?
  q1 : query?
  q2 : query?
(query-intersection q1 q2)  query?
  q1 : query?
  q2 : query?
(query-product q1 q2)  query?
  q1 : query?
  q2 : query?
(query-projection s q)  query?
  s : schema/c
  q : query?
(query-selection p q)  query?
  p : prop?
  q : query?
(query-rename old-attr new-attr q)  query?
  old-attr : symbol?
  new-attr : symbol?
  q : query?
(query-rename* renaming q)  query?
  renaming : (hash/c symbol? symbol?)
  q : query?
(query-natural-join q1 q2 [equal?])  query?
  q1 : query?
  q2 : query?
  equal? : (any/c any/c . -> . boolean?) = equal?
(query-theta-join p q1 q2)  query?
  p : prop?
  q1 : query?
  q2 : query?
(query-semi-join q1 q2)  query?
  q1 : query?
  q2 : query?
(query-anti-join q1 q2)  query?
  q1 : query?
  q2 : query?
(query-division q1 q2)  query?
  q1 : query?
  q2 : query?
(query-left-outer-join q1 q2)  query?
  q1 : query?
  q2 : query?
(query-right-outer-join q1 q2)  query?
  q1 : query?
  q2 : query?
(query-outer-join q1 q2)  query?
  q1 : query?
  q2 : query?

These construct queries represent the basic operations of relational algebra.

query-rename* applies multiple renamings at once using the dictionary to map old names to new names.

query-natural-join takes an optional equal? argument to compare attribute values for equality.

2.4 Tuples

Tuples are the contents of relations.

(tuple? v)  boolean?
  v : any/c

Returns #t if v is a tuple, #f otherwise.

(tuple-length t)  exact-nonnegative-integer?
  t : tuple?

Returns the length of t.

(tuple-ref t i)  any/c
  t : tuple?
  i : exact-nonnegative-integer?

Returns the ith element of t.

(Tuple elem ...)
 
  elem : any/c

Returns a tuple that contains all the elems.

2.5 Relations

Relations are the contents of databases and the results of queries.

(relation? v)  boolean?
  v : any/c

Returns #t if v is a relation, #f otherwise.

(relation-schema r)  schema/c
  r : relation?

Returns r’s schema.

(relation-tuples r)  (set? tuple?)
  r : relation?

Returns the set of tuples comprising the relation r.

(Relation [attribute ...]
          (elem ...)
          ...)
 

  attribute : identifier?

  elem : any/c

Returns a relation with '(attribute ...) as its schema that contains each (Tuple elem ...) as its tuples.

2.6 Database

database/c : contract?

Equivalent to (hash/c symbol? relation? #:immutable #t).

(database-insert db rid t)  database/c
  db : database/c
  rid : symbol?
  t : tuple?

Returns a database that is identical to db, except t is in the relation rid.

(database-delete db rid t)  database/c
  db : database/c
  rid : symbol?
  t : tuple?

Returns a database that is identical to db, except t is not in the relation rid.

(with-database db e ...)
 
  db : database/c

Executes (begin e ...) with db as the current database.

(execute-query q)  relation?
  q : query?

Computes the relation specified by q using the current database (chosen by with-database).

NULL : any/c

The NULL value that is inserted by the evaluation of query-left-outer-join, query-right-outer-join, or query-outer-join.

(Database
 [relation
  [attribute ...]
  (elem ...)
  ...]
 ...)
 

  relation : identifier?

  attribute : identifier?

  elem : any/c

Returns a database with each 'relation specified as

  (Relation [attribute ...]
            (elem ...)
            ...)

3 Implementation Notes

The current implementation uses immutable hashes as relations, vectors as tuples (except that they can be efficient appended), and lists as schemas. Propositions are structures, but are compiled to procedures (with attribute identifiers resolved to tuple positions) prior to query execution.

execute-query uses a simple query optimizer. It has two passes: first it tries to push selection operations to the leaves of the query to reduce relation sizes prior to products, then it pulls selection operations towards the root (but not passed products) to reduce the number of iterations over all the elements of a tuple. During both passes, some simplifications are performed, such as merging adjacent renamings, projections, and identical (or trivial) selections. This optimization happens independent of any statistics about relation sizes, etc.