8 Statistics
This chapter describes the statistical functions provided by the Science Collection. The basic statistical functions include functions to compute the mean, variance, and standard deviation. More advanced functions allow you to calculate absolute deviation, skewness, and kurtosis, as well as the median and arbitrary percentiles. The algorithms use recurrence relations to compute average quantities in a stable way, without large intermediate values that might overflow.
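As an illustration of the recurrence idea, here is a minimal Python sketch (not the Science Collection's own code; the name stable_mean is hypothetical). It updates the mean one value at a time with mean_k = mean_(k-1) + (x_k - mean_(k-1)) / k, so it never forms the full sum of the data.

```python
def stable_mean(data):
    """Mean via the recurrence mean_k = mean_{k-1} + (x_k - mean_{k-1}) / k."""
    mean = 0.0
    for k, x in enumerate(data, start=1):
        # Only the running mean is kept; the total sum is never formed.
        mean += (x - mean) / k
    return mean


if __name__ == "__main__":
    big = [1e308, 1e308, 1e308]
    print(stable_mean(big))      # stays finite
    print(sum(big) / len(big))   # the naive sum overflows to inf first
```

The naive approach adds all values before dividing, which is where the overflow would occur for very large inputs.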
The functions described in this chapter are defined in the "statistics.rkt" file in the Science Collection and are made available using the form:
8.1 Running Statistics
A running statistics object accumulates a minimal set of statistics (n, min, max, mean, variance, and standard deviation) for a set of data values. A running statistics object does not require that a sequence (e.g., list or vector) of the data values be maintained.
Returns #t if x is a running statistics object.
Returns a new, empty running statistics object.
Resets the running statistics object s.
Updates the running statistics object s with the value of x.
Returns the number of values that have been added to the running statistics object s. This value is zero initially and after a call to statistics-reset!.
Returns the minimum value that has been added to the running statistics object s. This value is +inf.0 initially and after a call to statistics-reset!.
Returns the maximum value that has been added to the running statistics object s. This value is -inf.0 initially and after a call to statistics-reset!.
Returns the arithmetic mean of the values that have been added to the running statistics object s. This value is zero initially and after a call to statistics-reset!.
Returns the estimated, or sample, variance of the values that have been added to the running statistics object s. This value is zero initially and after a call to statistics-reset!.
Returns the standard deviation of the values that have been added to the running statistics object s. This is the square root of the value returned by statistics-variance.
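The behavior described above can be sketched in Python as follows. This is an illustrative sketch only: the class name RunningStats and the method name tally are hypothetical, and the update rule (Welford's method) is an assumption, since the text does not say which recurrence the library uses. The initial values match those documented above.

```python
import math


class RunningStats:
    """Accumulates n, min, max, mean and sample variance without storing data."""

    def __init__(self):
        self.reset()

    def reset(self):
        # Initial values mirror the documented ones: 0, +inf, -inf, 0, 0.
        self.n = 0
        self.min = math.inf
        self.max = -math.inf
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def tally(self, x):
        self.n += 1
        self.min = min(self.min, x)
        self.max = max(self.max, x)
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)  # Welford update (assumed)

    @property
    def variance(self):
        # Sample variance; defined as zero until at least two values are seen.
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0

    @property
    def standard_deviation(self):
        return math.sqrt(self.variance)
```

Because only five numbers are stored, memory use is constant no matter how many values are tallied.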
8.2 Running Statistics Example
This example generates 100 random numbers between 0.0 and 10.0 and maintains running statistics on the values.
Produces the following results.
Running Statistics Example
n = 100
min = 0.11100957474903939
max = 9.938914540059452
mean = 5.466640451797567
variance = 8.677003172428925
standard deviation = 2.945675333846031
8.3 Mean, Standard Deviation, and Variance
Returns the arithmetic mean of data.
|
data : sequence-of-real? |
|
data : sequence-of-real? |
Returns the arithmetic mean and the estimated, or sample, variance of data as multiple values. These values are computed in a single pass through data.
Returns the estimated, or sample, variance of data relative to the given value of mean. If mean is not provided, the variance is relative to the arithmetic mean and is computed in a single pass through data.
Returns the estimated, or sample, standard deviation of data relative to the given value of mean. If mean is not provided, the standard deviation is relative to the arithmetic mean and is computed in a single pass through data. The standard deviation is defined as the square root of the variance.
Returns the total sum of squares of data about the mean. If mean is not provided, it is calculated by a call to (mean data).
Returns an unbiased estimate of the variance of data when the population mean, mean, of the underlying distribution is known a priori.
|
data : sequence-of-real? |
mean : real? |
|
→ (>=/c 0.0) |
data : sequence-of-real? |
mean : real? |
Returns the standard deviation of data for a fixed population mean, mean. The result is the square root of the value returned by the variance-with-fixed-mean function.
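The difference between the sample variance and the fixed-mean variance comes down to the divisor. The Python sketch below (hypothetical function names, not the library's code) assumes the standard definitions: when the mean is estimated from the data the sum of squares is divided by n - 1, and when the population mean is known a priori it is divided by n.

```python
def sample_variance(data):
    """Variance about the mean estimated from the data (divisor n - 1)."""
    n = len(data)
    m = sum(data) / n
    return sum((x - m) ** 2 for x in data) / (n - 1)


def variance_with_fixed_mean(data, mean):
    """Unbiased variance when the population mean is known (divisor n)."""
    return sum((x - mean) ** 2 for x in data) / len(data)


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(sample_variance(data))                # 2.5
    print(variance_with_fixed_mean(data, 3.0))  # 2.0
```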
8.4 Absolute Deviation
Returns the absolute deviation of data relative to the given value of the mean, mean. If mean is not provided, it is calculated by a call to (mean data). This function is also useful if you want to calculate the absolute deviation relative to any value other than the mean, such as zero or the median.
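A minimal Python sketch of this quantity, assuming the usual definition (the average of |x - center| over the data). The function and parameter names are hypothetical, and the library's exact normalization is not stated in the text.

```python
def absolute_deviation(data, center=None):
    """Mean absolute deviation of data about center (default: the mean)."""
    if center is None:
        center = sum(data) / len(data)
    return sum(abs(x - center) for x in data) / len(data)


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(absolute_deviation(data))       # about the mean
    print(absolute_deviation(data, 0.0))  # about zero
```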
8.5 Higher Moments (Skewness and Kurtosis)
Returns the skewness of data using the given values of the mean, mean, and standard deviation, sd. The skewness measures the symmetry of the tails of a distribution. If mean and sd are not provided, they are calculated by a call to mean-and-variance.
Returns the kurtosis of data using the given values of the mean, mean, and standard deviation, sd. The kurtosis measures how sharply peaked a distribution is relative to its width. If mean and sd are not provided, they are calculated by a call to mean-and-variance.
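The text does not give the exact normalization, so the following Python sketch assumes a common convention: skewness is the average of ((x - mean) / sd)^3, and kurtosis is the average of ((x - mean) / sd)^4 minus 3, so that a Gaussian distribution has kurtosis near zero. The function name is hypothetical and this is not the library's code.

```python
import math


def skew_and_kurtosis(data):
    """Skewness and excess kurtosis using the sample standard deviation."""
    n = len(data)
    m = sum(data) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in data) / (n - 1))
    skew = sum(((x - m) / sd) ** 3 for x in data) / n
    # Subtracting 3 normalizes the result to zero for a Gaussian (assumed).
    kurt = sum(((x - m) / sd) ** 4 for x in data) / n - 3.0
    return skew, kurt
```

Symmetric data gives a skewness of zero, while data with a long right tail gives a positive value.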
8.6 Autocorrelation
Returns the lag-1 autocorrelation of data using the given value of the mean, mean. If mean is not provided, it is calculated by a call to (mean data).
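A Python sketch of the lag-1 autocorrelation, assuming the common definition: the sum of products of successive deviations from the mean, divided by the sum of squared deviations. The function name is hypothetical, and the library may compute the same quantity with a recurrence.

```python
def lag1_autocorrelation(data):
    """Lag-1 autocorrelation of a sequence about its mean."""
    n = len(data)
    m = sum(data) / n
    num = sum((data[i] - m) * (data[i - 1] - m) for i in range(1, n))
    den = sum((x - m) ** 2 for x in data)
    return num / den


if __name__ == "__main__":
    print(lag1_autocorrelation([1.0, -1.0, 1.0, -1.0]))      # negative
    print(lag1_autocorrelation([1.0, 2.0, 3.0, 4.0, 5.0]))   # positive
```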
8.7 Covariance
(covariance data1 data2 mean1 mean2) → real? |
data1 : nonempty-sequence-of-real? |
data2 : nonempty-sequence-of-real? |
mean1 : real? |
mean2 : real? |
|
data1 : nonempty-sequence-of-real? |
data2 : nonempty-sequence-of-real? |
mean1 : real? |
mean2 : real? |
(covariance data1 data2) → real? |
data1 : nonempty-sequence-of-real? |
data2 : nonempty-sequence-of-real? |
(unchecked-covariance data1 data2) → real? |
data1 : nonempty-sequence-of-real? |
data2 : nonempty-sequence-of-real? |
Returns the covariance of data1 and data2 using the given values of mean1 and mean2. If the values of mean1 and mean2 are not given, they are calculated using calls to (mean data1) and (mean data2), respectively.
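A Python sketch of the covariance, assuming the sample convention (divisor n - 1). The divisor is an assumption, since the text does not state it, and the function name is hypothetical.

```python
def covariance(data1, data2, mean1=None, mean2=None):
    """Sample covariance of two equal-length sequences (divisor n - 1)."""
    if len(data1) != len(data2):
        raise ValueError("data1 and data2 must have the same length")
    n = len(data1)
    if mean1 is None:
        mean1 = sum(data1) / n
    if mean2 is None:
        mean2 = sum(data2) / n
    return sum((x - mean1) * (y - mean2) for x, y in zip(data1, data2)) / (n - 1)


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0, 5.0]
    ys = [2.0, 4.0, 6.0, 8.0, 10.0]
    print(covariance(xs, ys))  # 5.0
```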
8.8 Correlation
(correlation data1 data2) → real? |
data1 : nonempty-sequence-of-real? |
data2 : nonempty-sequence-of-real? |
(unchecked-correlation data1 data2) → real? |
data1 : nonempty-sequence-of-real? |
data2 : nonempty-sequence-of-real? |
Returns the Pearson correlation coefficient between data1 and data2.
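The Pearson coefficient is the covariance of the two sequences divided by the product of their standard deviations, so the normalization factors cancel. A minimal Python sketch (hypothetical function name, not the library's code):

```python
import math


def pearson_correlation(data1, data2):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(data1) != len(data2):
        raise ValueError("data1 and data2 must have the same length")
    n = len(data1)
    m1 = sum(data1) / n
    m2 = sum(data2) / n
    sxy = sum((x - m1) * (y - m2) for x, y in zip(data1, data2))
    sxx = sum((x - m1) ** 2 for x in data1)
    syy = sum((y - m2) ** 2 for y in data2)
    return sxy / math.sqrt(sxx * syy)
```

The result lies between -1 and 1, with the extremes reached when one sequence is an exact linear function of the other.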
8.9 Weighted Samples
Returns the weighted mean of data using weights, weights.
Returns the weighted variance of data using weights, weights, and the given weighted mean, wmean. If wmean is not provided, it is calculated by a call to (weighted-mean weights data).
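A Python sketch of the weighted mean and weighted variance, with the arguments in the same order as (weighted-mean weights data). The weighted mean is sum(w * x) / sum(w). The variance normalization used here, sum(w) / (sum(w)^2 - sum(w^2)), is an assumption; it reduces to the ordinary sample variance when all weights are equal. The function names are hypothetical.

```python
def weighted_mean(weights, data):
    """Weighted mean: sum(w * x) / sum(w)."""
    return sum(w * x for w, x in zip(weights, data)) / sum(weights)


def weighted_variance(weights, data, wmean=None):
    """Weighted sample variance (normalization assumed, see the lead-in)."""
    if wmean is None:
        wmean = weighted_mean(weights, data)
    sum_w = sum(weights)
    sum_w2 = sum(w * w for w in weights)
    factor = sum_w / (sum_w * sum_w - sum_w2)
    return factor * sum(w * (x - wmean) ** 2 for w, x in zip(weights, data))


if __name__ == "__main__":
    print(weighted_mean([1.0, 3.0], [0.0, 4.0]))                 # 3.0
    print(weighted_variance([1.0] * 5, [1.0, 2.0, 3.0, 4.0, 5.0]))  # 2.5
```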
|
weights : sequence-of-real? |
data : sequence-of-real? |
wmean : real? |
|
weights : sequence-of-real? |
data : sequence-of-real? |
wmean : real? |
(weighted-standard-deviation weights data) → (>=/c 0.0) |
weights : sequence-of-real? |
data : sequence-of-real? |
|
weights : sequence-of-real? |
data : sequence-of-real? |
Returns the weighted standard deviation of data using weights, weights. The standard deviation is defined as the square root of the variance, so the result is the square root of the value returned by the corresponding weighted-variance function.
|
weights : sequence-of-real? |
data : sequence-of-real? |
wmean : real? |
|
→ (>=/c 0.0) |
weights : sequence-of-real? |
data : sequence-of-real? |
wmean : real? |
Returns an unbiased estimate of the weighted variance of data using weights, weights, when the weighted population mean, wmean, of the underlying population is known a priori.
|
→ (>=/c 0.0) |
weights : sequence-of-real? |
data : sequence-of-real? |
wmean : real? |
|
→ (>=/c 0.0) |
weights : sequence-of-real? |
data : sequence-of-real? |
wmean : real? |
Returns the weighted standard deviation of data using weights, weights, with a fixed population mean, wmean. The result is the square root of the value returned by the weighted-variance-with-fixed-mean function.
|
weights : sequence-of-real? |
data : sequence-of-real? |
wmean : real? |
|
weights : sequence-of-real? |
data : sequence-of-real? |
wmean : real? |
(weighted-absolute-deviation weights data) → (>=/c 0.0) |
weights : sequence-of-real? |
data : sequence-of-real? |
|
weights : sequence-of-real? |
data : sequence-of-real? |
Returns the weighted absolute deviation of data using weights, weights, relative to the given value of the weighted mean, wmean. If wmean is not provided, it is calculated by a call to (weighted-mean weights data). This function is also useful if you want to calculate the weighted absolute deviation relative to any value other than the mean, such as zero or the weighted median.
(weighted-skew weights data wmean wsd) → (>=/c 0.0) |
weights : sequence-of-real? |
data : sequence-of-real? |
wmean : real? |
wsd : (>=/c 0.0) |
|
weights : sequence-of-real? |
data : sequence-of-real? |
wmean : real? |
wsd : (>=/c 0.0) |
(weighted-skew weights data) → (>=/c 0.0) |
weights : sequence-of-real? |
data : sequence-of-real? |
(unchecked-weighted-skew weights data) → (>=/c 0.0) |
weights : sequence-of-real? |
data : sequence-of-real? |
Returns the weighted skewness of data using weights, weights, using the given values of the weighted mean, wmean, and weighted standard deviation, wsd. The skewness measures the symmetry of the tails of a distribution. If wmean and wsd are not provided, they are calculated by calls to (weighted-mean weights data) and (weighted-standard-deviation weights data wmean).
Returns the weighted kurtosis of data using weights, weights, using the given values of the weighted mean, wmean, and weighted standard deviation, wsd. The kurtosis measures how sharply peaked a distribution is relative to its width. If wmean and wsd are not provided, they are calculated by calls to (weighted-mean weights data) and (weighted-standard-deviation weights data wmean).
8.10 Maximum and Minimum
Returns the maximum value in data.
Returns the minimum value in data.
|
data : nonempty-sequence-of-real? |
|
data : nonempty-sequence-of-real? |
Returns the minimum and maximum values in data as multiple values.
Returns the index of the maximum value in data. When there are several equal maximum elements, the index of the first one is chosen.
Returns the index of the minimum value in data. When there are several equal minimum elements, the index of the first one is chosen.
Returns the indices of the minimum and maximum values in data as multiple values. When there are several equal minimum or maximum elements, the index of the first one is chosen in each case.
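The tie-breaking rule can be sketched in Python as a single pass that updates an index only on a strict improvement, so the first of several equal extremes is kept. The function name is hypothetical and this is not the library's code.

```python
def minmax_index(data):
    """Indices of the first minimum and first maximum in a nonempty sequence."""
    min_i = max_i = 0
    for i, x in enumerate(data):
        # Strict comparisons keep the earliest index when values tie.
        if x < data[min_i]:
            min_i = i
        if x > data[max_i]:
            max_i = i
    return min_i, max_i


if __name__ == "__main__":
    print(minmax_index([3, 1, 4, 1, 5, 9, 2, 6, 5, 9]))  # (1, 5)
```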
8.11 Median and Quantiles
The median and quantile functions described in this section operate on sorted data. The contracts for these functions enforce this. Also, for convenience we use quantiles (measured on a scale of 0 to 1) instead of percentiles (which use a scale of 0 to 100).
Returns the median value of sorted-data. When the dataset has an odd number of elements, the median is the value of element (n - 1)/2. When the dataset has an even number of elements, the median is the mean of the two middle values, elements (n - 1)/2 and n/2. Here the divisions are integer divisions and elements are indexed from zero.
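The index rule for the median can be sketched in Python with integer division and zero-based indexing. The function name is hypothetical, and the input is assumed to be sorted already.

```python
def median_from_sorted(sorted_data):
    """Median of already-sorted data using zero-based integer indices."""
    n = len(sorted_data)
    lo = (n - 1) // 2
    hi = n // 2
    if lo == hi:
        # Odd length: a single middle element.
        return sorted_data[lo]
    # Even length: mean of the two middle elements.
    return (sorted_data[lo] + sorted_data[hi]) / 2.0


if __name__ == "__main__":
    print(median_from_sorted([1, 2, 3]))     # 2
    print(median_from_sorted([1, 2, 3, 4]))  # 2.5
```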
Returns a quantile value of sorted-data. The quantile is determined by the value f, a fraction between 0 and 1. For example, to compute the 75th percentile, f should have the value 0.75.
The quantile is found by interpolation using the formula:
quantile = (1 - delta) × x[i] + delta × x[i + 1]
where i is floor((n - 1) × f) and delta is (n - 1) × f - i.
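The interpolation can be sketched in Python as follows, taking i as the integer part of (n - 1) × f and delta as its fractional part, and blending elements i and i + 1 linearly. The function name is hypothetical, the input is assumed sorted, and the guard for f = 1 is an addition of this sketch.

```python
import math


def quantile_from_sorted(sorted_data, f):
    """Linearly interpolated quantile of sorted data for a fraction f in [0, 1]."""
    n = len(sorted_data)
    position = (n - 1) * f
    i = math.floor(position)
    delta = position - i
    if i >= n - 1:
        # f == 1 lands on the last element; there is no x[i + 1] to blend with.
        return sorted_data[n - 1]
    return (1 - delta) * sorted_data[i] + delta * sorted_data[i + 1]


if __name__ == "__main__":
    data = [0.0, 10.0, 20.0, 30.0, 40.0]
    print(quantile_from_sorted(data, 0.5))   # 20.0, the median
    print(quantile_from_sorted(data, 0.75))  # 30.0
```

With f = 0.5 the result coincides with the median of the same data.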
8.12 Statistics Example
This example generates two vectors from a unit Gaussian distribution and a vector of cosine squared weighting data. All of the vectors are of length 1,000. These data are used to test all of the statistics functions.
Produces the following output:
Statistics Example
mean = 0.03457693091555611
variance = 1.0285343857083435
standard deviation = 1.0141668431320083
variance from 0.0 = 1.028701415474174
standard deviation from 0.0 = 1.014249188056946
absolute deviation = 0.7987180852601665
absolute deviation from 0.0 = 0.7987898146946209
skew = 0.04340293467117837
kurtosis = 0.17722452271702993
lag-1 autocorrelation = 0.0029930889831972143
covariance = 0.005782911085590894
weighted mean = 0.05096139259270008
weighted variance = 1.0500293763787367
weighted standard deviation = 1.0247094107007786
weighted variance from 0.0 = 1.0510513958491579
weighted standard deviation from 0.0 = 1.0252079768755011
weighted absolute deviation = 0.8054378524718832
weighted absolute deviation from 0.0 = 0.8052440544958938
weighted skew = 0.046448729539282155
weighted kurtosis = 0.3050060704791675
maximum = 3.731148814104969
minimum = -3.327265864298485
index of maximum value = 502
index of minimum value = 476
median = 0.019281803306206644
10% quantile = -1.243869878615807
20% quantile = -0.7816243947573505
30% quantile = -0.4708703241429585
40% quantile = -0.2299309332835332
50% quantile = 0.019281803306206644
60% quantile = 0.30022966479982344
70% quantile = 0.5317978807508836
80% quantile = 0.832291888537874
90% quantile = 1.3061151234700463