Data Collection
The purpose of most simulation models is to collect data that can be analyzed to gain insight into the system being simulated. In the PLT Scheme Simulation Collection, (numeric) data subject to automatic data collection is stored in variable structures (i.e., variables).
Data about a variable may be collected either in a time-dependent manner, specified using the accumulate macro, or in a time-independent manner, specified using the tally macro.
Currently, either statistics data or history data may be automatically collected for a variable. (Both may in turn be either time dependent or time independent.) History data allows more sophisticated analysis to be performed on the data using other analysis tools. Also, a function to plot history data is provided.
8.1 Variables
A variable represents a numeric variable in the model for which data can automatically be collected, as specified by the model builder.
8.1.1 The variable Structure
Structure: variable
Contract:
(struct variable
        ((initial-value (union/c (symbols uninitialized) real?))
         (value (union/c (symbols uninitialized) real?))
         (time-last-synchronized real?)
         (statistics (union/c statistics? #f))
         (history (union/c history? #f))
         (continuous? boolean?)
         (state-index (integer-in -1 +inf.0))
         (get-monitors list?)
         (set-monitors list?)))
The variable structure represents variables in the simulation model. Its fields are those listed in the contract above.
Function:
(make-variable initial-value)
(make-variable)
Contract: (case-> (-> real? variable?) (-> variable?))
Creates a new variable with the specified initial-value. If initial-value is not specified, 'uninitialized is used.
By default, all variables accumulate statistics on their values. To turn this off, set the statistics field to #f.
To create continuous variables, see Chapter 10 Continuous Simulation Models.
8.2 Tally and Accumulate
The tally and accumulate macros specify data collection for variables.
8.2.1 Tally
Macro:
(tally (variable-statistics variable))
(tally (variable-history variable))
The form with variable-statistics specifies that statistics are to be tallied for variable. The form with variable-history specifies that a history is to be tallied for variable. Each time a variable value is changed, any tallied data collectors are updated with the new value.
8.2.2 Accumulate
Macro:
(accumulate (variable-statistics variable))
(accumulate (variable-history variable))
The form with variable-statistics specifies that statistics are to be accumulated for variable. The form with variable-history specifies that a history is to be accumulated for variable. Each time a variable data collector is accessed, and before a variable value is changed, any accumulated data collectors are synchronized: the current value is recorded over the time elapsed since the last synchronization.
8.3 Statistics and Histories
8.3.1 Statistics
Structure: statistics
Contract:
(struct statistics
        ((time-dependent? boolean?)
         (minimum real?)
         (maximum real?)
         (n real?)
         (sum real?)
         (sum-of-squares real?)))
The statistics structure maintains statistics for a variable. Table 1 shows the statistics that are gathered and how they are computed for both tally and accumulate.
Table 1: Statistics collected and how they are computed for both tallied and accumulated data collectors.
timeC = current simulation time
timeL = simulation time variable was set to its current value
time0 = simulation time the variable was created
X = variable value before change occurs
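The update rules implied by this notation can be sketched as follows. This is a reconstruction under stated assumptions, not the library's definition: X' (the newly assigned value) is notation introduced here, the accumulate rules are inferred from the description of synchronization in Section 8.2.2 and from the N, Sum, and Mean values printed in the example of Section 8.3.3, and the variance formula (population variance) is an assumption.

```latex
\begin{align*}
\intertext{Tally, applied when a new value $X'$ is assigned:}
n &\leftarrow n + 1 \\
\mathit{sum} &\leftarrow \mathit{sum} + X' \\
\mathit{sum\text{-}of\text{-}squares} &\leftarrow \mathit{sum\text{-}of\text{-}squares} + X'^{2}
\intertext{Accumulate, applied when the variable is synchronized at $\mathit{timeC}$:}
n &\leftarrow n + (\mathit{timeC} - \mathit{timeL}) \\
\mathit{sum} &\leftarrow \mathit{sum} + X\,(\mathit{timeC} - \mathit{timeL}) \\
\mathit{sum\text{-}of\text{-}squares} &\leftarrow \mathit{sum\text{-}of\text{-}squares} + X^{2}\,(\mathit{timeC} - \mathit{timeL})
\intertext{Derived statistics in both cases (variance formula assumed):}
\mathit{mean} &= \mathit{sum} / n \\
\mathit{variance} &= \mathit{sum\text{-}of\text{-}squares} / n - \mathit{mean}^{2}
\end{align*}
```

Under these rules, n for an accumulated collector equals timeC - time0 after synchronization, that is, the total simulated time over which the variable has been observed.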
8.3.2 History
Structure: history
Contract:
(struct history
        ((time-dependent? boolean?)
         (initial-time real?)
         (n real?)
         (values list?)
         (last-value-cell (union/c pair? #f))
         (durations list?)
         (last-duration-cell (union/c pair? #f))))
The history structure maintains a history of the values of a variable. For accumulated histories (i.e., those specified using the accumulate macro), the duration for each value is also recorded.
8.3.2.1 History Graphics
Function:
(history-plot history title)
(history-plot history)
Contract: (case-> (-> history? string? any) (-> history? any))
Plots the specified history. The title "History" is used if title is not specified.
8.3.3 Example - Tally and Accumulate
This example shows how the tally and accumulate macros work. Two variables are created, named tallied and accumulated. Statistics and history data are collected for each, using tally for the variable tallied and accumulate for the variable accumulated. The process test-process iterates through a list of values and durations, setting each of the variables to the specified value for the specified duration of time. Representative statistics (n, sum, and mean) are printed and the histories plotted for each of the variables.
;; Test Tally and Accumulate
(require (planet "simulation-with-graphics.ss" ("williams" "simulation.plt")))

(define tallied #f)
(define accumulated #f)

(define-process (test-process value-duration-list)
  (let loop ((vdl value-duration-list))
    (when (not (null? vdl))
      (let ((value (caar vdl))
            (duration (cadar vdl)))
        (set-variable-value! tallied value)
        (set-variable-value! accumulated value)
        (wait duration)
        (loop (cdr vdl))))))

(define (main value-duration-list)
  (with-new-simulation-environment
    (set! tallied (make-variable))
    (tally (variable-statistics tallied))
    (tally (variable-history tallied))
    (set! accumulated (make-variable))
    (accumulate (variable-statistics accumulated))
    (accumulate (variable-history accumulated))
    (schedule (at 0.0) (test-process value-duration-list))
    (start-simulation)
    (printf "--- Test Tally and Accumulate ---~n")
    (printf "~n--- Tally ---~n")
    (printf "N = ~a~n" (variable-n tallied))
    (printf "Sum = ~a~n" (variable-sum tallied))
    (printf "Mean = ~a~n" (variable-mean tallied))
    (printf "~a~n" (history-plot (variable-history tallied)))
    (printf "~n--- Accumulate ---~n")
    (printf "N = ~a~n" (variable-n accumulated))
    (printf "Sum = ~a~n" (variable-sum accumulated))
    (printf "Mean = ~a~n" (variable-mean accumulated))
    (printf "~a~n" (history-plot (variable-history accumulated)))))
Here are the results of running the program for the following value, duration pairs: ((1 2)(2 1)(3 2)(4 3)). That is, each variable will have a value of 1 for 2 units of time (from time 0 to time 2), a value of 2 for 1 unit of time (from time 2 to time 3), a value of 3 for 2 units of time (from time 3 to time 5), and a value of 4 for 3 units of time (from time 5 to time 8). The simulation ends at time 8.
> (main '((1 2)(2 1)(3 2)(4 3)))
--- Test Tally and Accumulate ---

--- Tally ---
N = 4
Sum = 10.0
Mean = 2.5

--- Accumulate ---
N = 8.0
Sum = 22.0
Mean = 2.75
>
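The difference between the two sets of results can be checked by hand. The following is a minimal sketch in plain Scheme that reproduces the arithmetic only; it does not use the simulation collection, and the names tally-stats and accumulate-stats are hypothetical helpers introduced for illustration. It assumes that tallying counts each value once and that accumulating weights each value by the time it was held, which is consistent with the output above.

```scheme
;; Each entry of pairs is a list (value duration). pairs must be non-empty.

;; Tally: every value counts once, regardless of how long it was held.
;; Returns (list n sum mean).
(define (tally-stats pairs)
  (let loop ((ps pairs) (n 0) (sum 0))
    (if (null? ps)
        (list n sum (exact->inexact (/ sum n)))
        (loop (cdr ps) (+ n 1) (+ sum (caar ps))))))

;; Accumulate: every value is weighted by its duration, so n is the
;; total elapsed time rather than a count of values.
;; Returns (list n sum mean).
(define (accumulate-stats pairs)
  (let loop ((ps pairs) (n 0) (sum 0))
    (if (null? ps)
        (list n sum (exact->inexact (/ sum n)))
        (let ((value (caar ps))
              (duration (cadar ps)))
          (loop (cdr ps)
                (+ n duration)
                (+ sum (* value duration)))))))

(tally-stats '((1 2) (2 1) (3 2) (4 3)))       ; => (4 10 2.5)
(accumulate-stats '((1 2) (2 1) (3 2) (4 3)))  ; => (8 22 2.75)
```

The accumulated sum is 1*2 + 2*1 + 3*2 + 4*3 = 22 over 8 units of time, giving the time-weighted mean of 2.75, whereas the tallied mean of 2.5 is the plain average of the four values.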
8.3.4 Variable Monitors
Variable monitors are discussed in Chapter ?? Monitors.
8.4 Example - Data Collection
The previous examples (Examples 0, 1, and 2) relied on printf statements to print the output of the simulation model. This was sufficient to show how the models worked, but would be impractical for large models. This example is the same simulation model as Example 2 (using with-resource instead of the individual calls to resource-request and resource-relinquish), but with the printf statements removed.
No explicit variables are needed for this example because resources already provide variables for their satisfied and queue fields, which are in turn implemented using sets.
Note that the statement:
(accumulate (variable-statistics (resource-queue-variable-n attendant)))
isn't actually needed, since statistics are accumulated for any variable by default. It is included as an example. Note that the corresponding accumulate is not included for the satisfied field, and the statistics are still available.
; Example 3 - Data Collection
(require (planet "simulation-with-graphics.ss" ("williams" "simulation.plt")))
(require (planet "random-distributions.ss" ("williams" "science.plt")))

(define n-attendants 2)
(define attendant #f)

(define-process (generator n)
  (do ((i 0 (+ i 1)))
      ((= i n) (void))
    (wait (random-exponential 4.0))
    (schedule now (customer i))))

(define-process (customer i)
  (with-resource (attendant)
    (work (random-flat 2.0 10.0))))

(define (run-simulation n)
  (with-new-simulation-environment
    (set! attendant (make-resource n-attendants))
    (schedule (at 0.0) (generator n))
    (accumulate (variable-statistics (resource-queue-variable-n attendant)))
    (accumulate (variable-history (resource-queue-variable-n attendant)))
    (start-simulation)
    (printf "--- Example 3 - Data Collection ---~n")
    (printf "Maximum queue length = ~a~n"
            (variable-maximum (resource-queue-variable-n attendant)))
    (printf "Average queue length = ~a~n"
            (variable-mean (resource-queue-variable-n attendant)))
    (printf "Variance = ~a~n"
            (variable-variance (resource-queue-variable-n attendant)))
    (printf "Utilization = ~a~n"
            (variable-mean (resource-satisfied-variable-n attendant)))
    (printf "Variance = ~a~n"
            (variable-variance (resource-satisfied-variable-n attendant)))
    (print (history-plot (variable-history (resource-queue-variable-n attendant))))))
Here is the output for the example when run for 1000 customers.
> (run-simulation 1000)
--- Example 3 - Data Collection ---
Maximum queue length = 8
Average queue length = 0.9120534884951139
Variance = 2.2420855874934826
Utilization = 1.4320511974417858
Variance = 0.5885107114317054
>
8.5 Data Collection Across Multiple Simulation Runs
Even as simplistic as our example has been, it is still useful in illustrating some advanced data collection techniques. In particular, we will show how to collect statistics across multiple runs.
8.5.1 Open Loop Processing
Open Loop processing is a technique where a resource is considered to have an infinite number of units. That is, no process will ever block waiting for such a resource. Statistics on the demand for such resources can be collected by looking at the resource-satisfied-variable-n variable. Typically, this is done across multiple simulation runs.
In the simulation collection, an open-loop resource is denoted by specifying an infinite number of units when it is created. In PLT Scheme, positive infinity is written +inf.0.
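As a rough illustration of what an open-loop run measures, the following plain Scheme sketch computes the peak number of units in use when no request ever blocks. It does not use the simulation collection; the function names and the (arrival-time service-time) input format are assumptions made for this sketch only.

```scheme
;; Each customer is a list (arrival-time service-time). With unlimited
;; units nobody waits, so a customer holds a unit during the half-open
;; interval [arrival, arrival + service).
(define (in-service-at? customer t)
  (let ((arrival (car customer))
        (service (cadr customer)))
    (and (<= arrival t) (< t (+ arrival service)))))

;; Number of customers holding a unit at time t.
(define (busy-units-at customers t)
  (let loop ((cs customers) (count 0))
    (cond ((null? cs) count)
          ((in-service-at? (car cs) t) (loop (cdr cs) (+ count 1)))
          (else (loop (cdr cs) count)))))

;; The number in service can only rise at an arrival, so the peak is
;; found by checking the count at every arrival time.
(define (peak-demand customers)
  (let loop ((cs customers) (peak 0))
    (if (null? cs)
        peak
        (loop (cdr cs)
              (max peak (busy-units-at customers (caar cs)))))))

(peak-demand '((0 5) (1 3) (2 1) (6 2)))  ; => 3
```

Here three customers overlap at time 2, so three units would have been needed to serve everyone without queueing. This is the kind of demand figure that the maximum of resource-satisfied-variable-n reports for an open-loop resource.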
8.5.1.1 Example - Open Loop Processing
This example collects statistics on the maximum number of attendants required in the system (e.g. a measure of demand) when there is no blocking.
There is an outer simulation environment that exists solely for data collection and a variable max-attendants to gather statistics on the maximum number of attendants required. Note that these statistics must be tallied at this level because (simulated) time does not exist across multiple simulation runs.
The inner loop creates a new simulation environment for each simulation run. This ensures each run is properly initialized. It is in this inner loop that the attendant resource is created with an infinite number of units, using (make-resource +inf.0). When the simulation in the inner loop terminates, the max-attendants variable is updated with the maximum number of attendants from the simulation. This is done with:
(set-variable-value! max-attendants (variable-maximum (resource-satisfied-variable-n attendant)))
Finally, the statistics and histogram of the maximum attendants across all of the simulation runs are printed.
; Open Loop Example
(require (planet "simulation-with-graphics.ss" ("williams" "simulation.plt")))
(require (planet "random-distributions.ss" ("williams" "science.plt")))

(define attendant #f)

; The generator waits and is scheduled, so it is defined as a process,
; as in the other examples.
(define-process (generator n)
  (do ((i 0 (+ i 1)))
      ((= i n) (void))
    (wait (random-exponential 4.0))
    (schedule now (customer i))))

(define-process (customer i)
  (with-resource (attendant)
    (wait/work (random-flat 2.0 10.0))))

(define (run-simulation n1 n2)
  ; Outer environment: exists only to collect data across runs.
  (with-new-simulation-environment
    (let ((max-attendants (make-variable)))
      (tally (variable-statistics max-attendants))
      (tally (variable-history max-attendants))
      (do ((i 1 (+ i 1)))
          ((> i n1) (void))
        ; Inner environment: one fresh simulation per run.
        (with-new-simulation-environment
          (set! attendant (make-resource +inf.0))
          (schedule (at 0.0) (generator n2))
          (start-simulation)
          (set-variable-value! max-attendants
            (variable-maximum (resource-satisfied-variable-n attendant)))))
      (printf "--- Open Loop Example ---~n")
      (printf "Number of experiments = ~a~n" (variable-n max-attendants))
      (printf "Minimum maximum attendants = ~a~n" (variable-minimum max-attendants))
      (printf "Maximum maximum attendants = ~a~n" (variable-maximum max-attendants))
      (printf "Mean maximum attendants = ~a~n" (variable-mean max-attendants))
      (printf "Variance = ~a~n" (variable-variance max-attendants))
      (print (history-plot (variable-history max-attendants) "Maximum Attendants"))
      (newline))))
The following shows the output of the simulation for 1000 runs of 1000 customers each.
> (run-simulation 1000 1000)
--- Open Loop Example ---
Number of experiments = 1000
Minimum maximum attendants = 6
Maximum maximum attendants = 11
Mean maximum attendants = 7.525
Variance = 0.6653749999999903
>
8.5.2 Closed Loop Processing
Closed Loop processing is the "normal" processing where the number of units of a resource is specified and processes are queued (i.e., blocked) when there are not sufficient units of the resource to satisfy a request. Statistics on the utilization of such resources can be collected by looking at the resource-queue-variable-n variable. Typically, this is done across multiple simulation runs.
8.5.2.1 Example - Closed Loop Processing
This example collects statistics on the average attendant queue length in the system (e.g. a measure of utilization) when there is a specified number of attendants.
There is an outer simulation environment that exists solely for data collection and a variable avg-queue-length to gather statistics on the average attendant queue length. Note that these statistics must be tallied at this level because (simulated) time does not exist across multiple simulation runs.
The inner loop creates a new simulation environment for each simulation run. This ensures each run is properly initialized. It is in this inner loop that the attendant resource is created with the specified number of units, using (make-resource n-attendants). When the simulation in the inner loop terminates, the avg-queue-length variable is updated with the average attendant queue length from the simulation. This is done with:
(set-variable-value! avg-queue-length (variable-mean (resource-queue-variable-n attendant)))
Finally, the statistics and histogram of the average attendant queue length across all of the simulation runs are printed.
; Closed Loop Example
(require (planet "simulation-with-graphics.ss" ("williams" "simulation.plt")))
(require (planet "random-distributions.ss" ("williams" "science.plt")))

(define n-attendants 2)
(define attendant #f)

(define-process (generator n)
  (do ((i 0 (+ i 1)))
      ((= i n) (void))
    (wait (random-exponential 4.0))
    (schedule now (customer i))))

(define-process (customer i)
  (with-resource (attendant)
    (work (random-flat 2.0 10.0))))

(define (run-simulation n1 n2)
  (let ((avg-queue-length (make-variable)))
    (tally (variable-statistics avg-queue-length))
    (tally (variable-history avg-queue-length))
    (do ((i 1 (+ i 1)))
        ((> i n1) (void))
      (with-new-simulation-environment
        (set! attendant (make-resource n-attendants))
        (schedule (at 0.0) (generator n2))
        (start-simulation)
        (set-variable-value! avg-queue-length
          (variable-mean (resource-queue-variable-n attendant)))))
    (printf "--- Closed Loop Example ---~n")
    (printf "Number of attendants = ~a~n" n-attendants)
    (printf "Number of experiments = ~a~n" (variable-n avg-queue-length))
    (printf "Minimum average queue length = ~a~n" (variable-minimum avg-queue-length))
    (printf "Maximum average queue length = ~a~n" (variable-maximum avg-queue-length))
    (printf "Mean average queue length = ~a~n" (variable-mean avg-queue-length))
    (printf "Variance = ~a~n" (variable-variance avg-queue-length))
    (print (history-plot (variable-history avg-queue-length) "Average Queue Length"))
    (newline)))
The following shows the output of the simulation for 1000 runs of 1000 customers each.
> (run-simulation 1000 1000)
--- Closed Loop Example ---
Number of attendants = 2
Number of experiments = 1000
Minimum average queue length = 0.5792057912006373
Maximum average queue length = 3.182757214703683
Mean average queue length = 1.1123279920475524
Variance = 0.08869696318792064
>