<!DOCTYPE html>
|
|
<html lang="en">
|
|
<head>
|
|
<meta charset="utf-8"/>
|
|
<title>Descriptive Statistics
|
|
</title>
|
|
<meta name="viewport" content="width=device-width, initial-scale=1.0"/>
|
|
<meta name="description" content="Math.NET Numerics, providing methods and algorithms for numerical computations in science, engineering and every day use. .Net 4, .Net 3.5, SL5, Win8, WP8, PCL 47 and 136, Mono, Xamarin Android/iOS."/>
|
|
<meta name="author" content="Christoph Ruegg, Marcus Cuda, Jurgen Van Gael"/>
|
|
|
|
<script src="https://code.jquery.com/jquery-1.8.0.js"></script>
|
|
<script src="https://code.jquery.com/ui/1.8.23/jquery-ui.js"></script>
|
|
<script src="https://netdna.bootstrapcdn.com/twitter-bootstrap/2.2.1/js/bootstrap.min.js"></script>
|
|
<link href="https://netdna.bootstrapcdn.com/twitter-bootstrap/2.2.1/css/bootstrap-combined.min.css" rel="stylesheet"/>
|
|
|
|
<link type="text/css" rel="stylesheet" href="https://numerics.mathdotnet.com/content/style.css" />
|
|
<style>
|
|
#main table:not(.pre) {
|
|
border: 1px solid #dddddd;
|
|
max-width: 100%;
|
|
border-style: solid;
|
|
border-width: 1px;
|
|
border-color: gray;
|
|
border-collapse: collapse;
|
|
border-right-width: 1px;
|
|
border-bottom-width: 1px;
|
|
margin-top: 15px;
|
|
margin-bottom: 25px;
|
|
}
|
|
#main table:not(.pre) th, #main table:not(.pre) td {
|
|
border: 1px solid #dddddd;
|
|
padding: 6px;
|
|
}
|
|
#main table:not(.pre) th p, #main table:not(.pre) td p {
|
|
margin-bottom: 5px;
|
|
}
|
|
</style>
|
|
<script type="text/javascript" src="https://numerics.mathdotnet.com/content/tips.js"></script>
|
|
<!-- HTML5 shim, for IE6-8 support of HTML5 elements -->
|
|
<!--[if lt IE 9]>
|
|
<script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script>
|
|
<![endif]-->
|
|
|
|
<script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script>
|
|
</head>
|
|
<body>
|
|
<div class="container">
|
|
<div class="masthead">
|
|
<ul class="nav nav-pills pull-right">
|
|
<li><a href="https://www.mathdotnet.com">Math.NET Project</a></li>
|
|
<li><a href="https://numerics.mathdotnet.com">Math.NET Numerics</a></li>
|
|
<li><a href="https://github.com/mathnet/mathnet-numerics">GitHub</a></li>
|
|
</ul>
|
|
<h3 class="muted">Math.NET Numerics</h3>
|
|
</div>
|
|
<hr />
|
|
<div class="row">
|
|
<div class="span9" id="main">
|
|
|
|
<h1><a name="Descriptive-Statistics" class="anchor" href="#Descriptive-Statistics">Descriptive Statistics</a></h1>
|
|
<h2><a name="Initialization" class="anchor" href="#Initialization">Initialization</a></h2>
|
|
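<p>We need to reference Math.NET Numerics and open the statistics namespace. The C# samples on this page
also use LINQ and the distributions namespace:</p>
<table class="pre"><tr><td class="lines"><pre class="fssnip"><span class="l">1: </span>
<span class="l">2: </span>
<span class="l">3: </span>
</pre></td>
<td class="snippet"><pre class="fssnip highlighted"><code lang="csharp"><span class="k">using</span> System.Linq;
<span class="k">using</span> MathNet.Numerics.Distributions;
<span class="k">using</span> MathNet.Numerics.Statistics;
</code></pre></td></tr></table>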
|
|
<h2><a name="Univariate-Statistical-Analysis" class="anchor" href="#Univariate-Statistical-Analysis">Univariate Statistical Analysis</a></h2>
|
|
<p>The primary class for statistical analysis is <code>Statistics</code>, which provides common
descriptive statistics as static extension methods to <code>IEnumerable&lt;double&gt;</code> sequences.
However, various statistics can be computed much more efficiently if the data source
has known properties or structure. For that reason, the following classes provide specialized
static implementations:</p>
|
|
<ul>
|
|
<li>
|
|
<strong>ArrayStatistics</strong> provides routines optimized for single-dimensional arrays. Some
of these routines end with the <code>Inplace</code> suffix, indicating that they reorder the
input array slightly towards being sorted during execution, without fully sorting
it, which could be expensive.
|
|
</li>
|
|
<li>
|
|
<strong>SortedArrayStatistics</strong> provides routines optimized for arrays that are sorted in ascending order.
Order statistics in particular are very efficient this way, some even with constant time complexity.
|
|
</li>
|
|
<li>
|
|
<strong>StreamingStatistics</strong> processes large amounts of data without keeping them in memory.
|
|
Useful if data larger than local memory is streamed directly from a disk or network.
|
|
</li>
|
|
</ul>
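<p>As a rough sketch of how these classes relate, the following C# snippet evaluates the same small data set
through each of them. The data values and variable names are made up for illustration; only the class and
method names come from the library:</p>
<table class="pre"><tr><td class="snippet"><pre class="fssnip highlighted"><code lang="csharp">using System;
using MathNet.Numerics.Statistics;

// Hypothetical sample data, chosen so that mean and median are both 3.
double[] data = { 4.0, 1.0, 3.0, 2.0, 5.0 };

// General purpose: accepts any IEnumerable&lt;double&gt;.
double mean = Statistics.Mean(data);

// Array-optimized; routines without the Inplace suffix leave the array untouched.
double arrayMean = ArrayStatistics.Mean(data);

// Single pass over a sequence that does not need to fit in memory.
double streamMean = StreamingStatistics.Mean(data);

// Requires ascending order, so sort a copy first.
double[] sorted = (double[])data.Clone();
Array.Sort(sorted);
double sortedMedian = SortedArrayStatistics.Median(sorted);

// Inplace variant: may reorder the array it is given, so pass a copy to keep data intact.
double inplaceMedian = ArrayStatistics.MedianInplace((double[])data.Clone());

Console.WriteLine($"{mean} {arrayMean} {streamMean} {sortedMedian} {inplaceMedian}");
</code></pre></td></tr></table>
<p>Copying before calling an <code>Inplace</code> routine is only needed if the original order of the array matters afterwards.</p>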
|
|
<p>Another alternative, in case you need to gather a whole set of statistical characteristics
|
|
in one pass, is provided by the <code>DescriptiveStatistics</code> class:</p>
|
|
<table class="pre"><tr><td class="lines"><pre class="fssnip"><span class="l"> 1: </span>
|
|
<span class="l"> 2: </span>
|
|
<span class="l"> 3: </span>
|
|
<span class="l"> 4: </span>
|
|
<span class="l"> 5: </span>
|
|
<span class="l"> 6: </span>
|
|
<span class="l"> 7: </span>
|
|
<span class="l"> 8: </span>
|
|
<span class="l"> 9: </span>
|
|
<span class="l">10: </span>
|
|
<span class="l">11: </span>
|
|
<span class="l">12: </span>
|
|
<span class="l">13: </span>
|
|
</pre></td>
|
|
<td class="snippet"><pre class="fssnip highlighted"><code lang="csharp"><span class="k">var</span> samples <span class="o">=</span> <span class="k">new</span> ChiSquare(<span class="n">5</span>).Samples().Take(<span class="n">1000</span>);
|
|
<span class="k">var</span> statistics <span class="o">=</span> <span class="k">new</span> DescriptiveStatistics(samples);
|
|
|
|
<span class="k">var</span> largestElement <span class="o">=</span> statistics.Maximum;
|
|
<span class="k">var</span> smallestElement <span class="o">=</span> statistics.Minimum;
|
|
<span class="k">var</span> median <span class="o">=</span> statistics.Median;
|
|
|
|
<span class="k">var</span> mean <span class="o">=</span> statistics.Mean;
|
|
<span class="k">var</span> variance <span class="o">=</span> statistics.Variance;
|
|
<span class="k">var</span> stdDev <span class="o">=</span> statistics.StandardDeviation;
|
|
|
|
<span class="k">var</span> kurtosis <span class="o">=</span> statistics.Kurtosis;
|
|
<span class="k">var</span> skewness <span class="o">=</span> statistics.Skewness;
|
|
</code></pre></td></tr></table>
|
|
<h2><a name="Minimum-amp-Maximum" class="anchor" href="#Minimum-amp-Maximum">Minimum &amp; Maximum</a></h2>
|
|
<p>The minimum and maximum values of a sample set can be evaluated with the <code>Minimum</code> and <code>Maximum</code>
|
|
functions of all four classes: <code>Statistics</code>, <code>ArrayStatistics</code>, <code>SortedArrayStatistics</code>
|
|
and <code>StreamingStatistics</code>. The one in <code>SortedArrayStatistics</code> is the fastest with constant
|
|
time complexity, but expects the array to be sorted ascendingly.</p>
|
|
<p>Both min and max are directly affected by outliers and are therefore not robust statistics.
For a more robust alternative, consider using Quantiles instead.</p>
|
|
<table class="pre"><tr><td class="lines"><pre class="fssnip"><span class="l">1: </span>
|
|
<span class="l">2: </span>
|
|
<span class="l">3: </span>
|
|
</pre></td>
|
|
<td class="snippet"><pre class="fssnip highlighted"><code lang="csharp"><span class="k">var</span> samples <span class="o">=</span> <span class="k">new</span> ChiSquare(<span class="n">5</span>).Samples().Take(<span class="n">1000</span>).ToArray();
|
|
<span class="k">var</span> largestElement <span class="o">=</span> samples.Maximum();
|
|
<span class="k">var</span> smallestElement <span class="o">=</span> samples.Minimum();
|
|
</code></pre></td></tr></table>
|
|
<h2><a name="Mean" class="anchor" href="#Mean">Mean</a></h2>
|
|
<p>The <em>arithmetic mean</em> or <em>average</em> of the provided samples. In statistics, the sample mean is
a measure of the central tendency and estimates the expected value of the distribution.
The mean is affected by outliers, so if you need a more robust estimate, consider using the Median instead.</p>
|
|
<p><code>Statistics.Mean(data)</code>
|
|
<code>StreamingStatistics.Mean(stream)</code>
|
|
<code>ArrayStatistics.Mean(data)</code></p>
|
|
<p><span class="math">\[\overline{x} = \frac{1}{N}\sum_{i=1}^N x_i\]</span></p>
|
|
<table class="pre"><tr><td class="lines"><pre class="fssnip"><span class="l">1: </span>
|
|
<span class="l">2: </span>
|
|
<span class="l">3: </span>
|
|
<span class="l">4: </span>
|
|
<span class="l">5: </span>
|
|
<span class="l">6: </span>
|
|
<span class="l">7: </span>
|
|
<span class="l">8: </span>
|
|
</pre></td>
|
|
<td class="snippet"><pre class="fssnip highlighted"><code lang="fsharp"><span class="k">let</span> <span class="i">whiteNoise</span> <span class="o">=</span> <span class="i">Generate</span><span class="o">.</span><span class="i">Normal</span>(<span class="n">1000</span>, <span class="i">mean</span><span class="o">=</span><span class="n">10.0</span>, <span class="i">standardDeviation</span><span class="o">=</span><span class="n">2.0</span>)
|
|
<span class="fsi">val whiteNoise : float [] = [|12.90021939; 9.631515037; 7.810008046; 14.13301053; ...|] </span>
|
|
<span class="i">Statistics</span><span class="o">.</span><span class="i">Mean</span> <span class="i">whiteNoise</span>
|
|
<span class="fsi">val it : float = 10.02162347</span>
|
|
|
|
<span class="k">let</span> <span class="i">wave</span> <span class="o">=</span> <span class="i">Generate</span><span class="o">.</span><span class="i">Sinusoidal</span>(<span class="n">1000</span>, <span class="i">samplingRate</span><span class="o">=</span><span class="n">100.</span>, <span class="i">frequency</span><span class="o">=</span><span class="n">5.</span>, <span class="i">amplitude</span><span class="o">=</span><span class="n">0.5</span>)
|
|
<span class="i">Statistics</span><span class="o">.</span><span class="i">Mean</span> <span class="i">wave</span>
|
|
<span class="fsi">val it : float = -4.133520783e-17</span>
|
|
</code></pre></td>
|
|
</tr>
|
|
</table>
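<p>To illustrate the sensitivity to outliers, here is a small C# sketch with two hand-picked data sets that
differ only in their last value. The data and variable names are hypothetical:</p>
<table class="pre"><tr><td class="snippet"><pre class="fssnip highlighted"><code lang="csharp">using System;
using MathNet.Numerics.Statistics;

double[] clean = { 1.0, 2.0, 3.0, 4.0, 5.0 };
double[] withOutlier = { 1.0, 2.0, 3.0, 4.0, 100.0 };

// The mean is pulled towards the outlier: 15/5 = 3 versus 110/5 = 22.
double cleanMean = Statistics.Mean(clean);
double outlierMean = Statistics.Mean(withOutlier);

// The median stays at the middle value 3 in both cases.
double cleanMedian = Statistics.Median(clean);
double outlierMedian = Statistics.Median(withOutlier);

Console.WriteLine($"mean: {cleanMean} -> {outlierMean}, median: {cleanMedian} -> {outlierMedian}");
</code></pre></td></tr></table>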
|
|
<h2><a name="Variance-and-Standard-Deviation" class="anchor" href="#Variance-and-Standard-Deviation">Variance and Standard Deviation</a></h2>
|
|
<p>Variance <span class="math">\(\sigma^2\)</span> and the Standard Deviation <span class="math">\(\sigma\)</span> are measures of how far the samples are spread out.</p>
|
|
<p>If the whole population is available, the functions with the Population-prefix
|
|
will evaluate the respective measures with an <span class="math">\(N\)</span> normalizer for a population of size <span class="math">\(N\)</span>.</p>
|
|
<p><code>Statistics.PopulationVariance(population)</code>
|
|
<code>Statistics.PopulationStandardDeviation(population)</code></p>
|
|
<p><span class="math">\[\sigma^2 = \frac{1}{N}\sum_{i=1}^N (x_i - \mu)^2\]</span></p>
|
|
<p>On the other hand, if only a sample of the full population is available, the functions
|
|
without the Population-prefix will estimate unbiased population measures by applying
|
|
Bessel's correction with an <span class="math">\(N-1\)</span> normalizer to a sample set of size <span class="math">\(N\)</span>.</p>
|
|
<p><code>Statistics.Variance(samples)</code>
|
|
<code>Statistics.StandardDeviation(samples)</code></p>
|
|
<p><span class="math">\[s^2 = \frac{1}{N-1}\sum_{i=1}^N (x_i - \overline{x})^2\]</span></p>
|
|
<table class="pre"><tr><td class="lines"><pre class="fssnip"><span class="l">1: </span>
|
|
<span class="l">2: </span>
|
|
<span class="l">3: </span>
|
|
<span class="l">4: </span>
|
|
<span class="l">5: </span>
|
|
<span class="l">6: </span>
|
|
<span class="l">7: </span>
|
|
</pre></td>
|
|
<td class="snippet"><pre class="fssnip highlighted"><code lang="fsharp"><span class="i">Statistics</span><span class="o">.</span><span class="i">Variance</span> <span class="i">whiteNoise</span>
|
|
<span class="fsi">val it : float = 3.819436094</span>
|
|
<span class="i">Statistics</span><span class="o">.</span><span class="i">StandardDeviation</span> <span class="i">whiteNoise</span>
|
|
<span class="fsi">val it : float = 1.954337764</span>
|
|
|
|
<span class="i">Statistics</span><span class="o">.</span><span class="i">Variance</span> <span class="i">wave</span>
|
|
<span class="fsi">val it : float = 0.1251251251</span>
|
|
</code></pre></td>
|
|
</tr>
|
|
</table>
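<p>The difference between the two normalizers is easiest to see on a tiny data set. The following C# sketch
uses eight made-up values with mean 5 and a sum of squared deviations of 32, so the population variance is
32/8 and the sample variance is 32/7:</p>
<table class="pre"><tr><td class="snippet"><pre class="fssnip highlighted"><code lang="csharp">using System;
using MathNet.Numerics.Statistics;

// Hypothetical data: mean 5, squared deviations 9+1+1+1+0+0+4+16 = 32.
double[] data = { 2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0 };

// N normalizer: 32/8 = 4, standard deviation 2.
double populationVariance = Statistics.PopulationVariance(data);
double populationStdDev = Statistics.PopulationStandardDeviation(data);

// N-1 normalizer (Bessel's correction): 32/7, roughly 4.571.
double sampleVariance = Statistics.Variance(data);
double sampleStdDev = Statistics.StandardDeviation(data);

Console.WriteLine($"{populationVariance} {populationStdDev} {sampleVariance} {sampleStdDev}");
</code></pre></td></tr></table>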
|
|
<h4><a name="Combined-Routines" class="anchor" href="#Combined-Routines">Combined Routines</a></h4>
|
|
<p>Since mean and variance are often needed together, there are routines
|
|
that evaluate both in a single pass:</p>
|
|
<p><code>Statistics.MeanVariance(samples)</code>
|
|
<code>ArrayStatistics.MeanVariance(samples)</code>
|
|
<code>StreamingStatistics.MeanVariance(samples)</code></p>
|
|
<table class="pre"><tr><td class="lines"><pre class="fssnip"><span class="l">1: </span>
|
|
<span class="l">2: </span>
|
|
</pre></td>
|
|
<td class="snippet"><pre class="fssnip highlighted"><code lang="fsharp"><span class="i">Statistics</span><span class="o">.</span><span class="i">MeanVariance</span> <span class="i">whiteNoise</span>
|
|
<span class="fsi">val it : float * float = (10.02162347, 3.819436094)</span>
|
|
</code></pre></td>
|
|
</tr>
|
|
</table>
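<p>From C# the combined result comes back as a pair. The sketch below reads it through <code>Item1</code> and
<code>Item2</code>, on the assumption that the installed version returns either a <code>Tuple</code> or a value
tuple, both of which expose these members. The data is made up for illustration:</p>
<table class="pre"><tr><td class="snippet"><pre class="fssnip highlighted"><code lang="csharp">using System;
using MathNet.Numerics.Statistics;

// Hypothetical data with mean 5 and sample variance 32/7.
double[] data = { 2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0 };

// One pass instead of separate Mean and Variance calls.
var meanVariance = Statistics.MeanVariance(data);
double mean = meanVariance.Item1;
double variance = meanVariance.Item2;

Console.WriteLine($"mean = {mean}, variance = {variance}");
</code></pre></td></tr></table>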
|
|
<h2><a name="Covariance" class="anchor" href="#Covariance">Covariance</a></h2>
|
|
<p>The sample covariance is an estimate of the covariance, a measure of how much two random
variables change together. As with the variance above, there are two versions: the sample version
applies Bessel's correction to remove the bias, while the population version does not.</p>
|
|
<p><code>Statistics.Covariance(samples1, samples2)</code></p>
|
|
<p><span class="math">\[q = \frac{1}{N-1}\sum_{i=1}^N (x_i - \overline{x})(y_i - \overline{y})\]</span></p>
|
|
<p><code>Statistics.PopulationCovariance(population1, population2)</code></p>
|
|
<p><span class="math">\[q = \frac{1}{N}\sum_{i=1}^N (x_i - \mu_x)(y_i - \mu_y)\]</span></p>
|
|
<table class="pre"><tr><td class="lines"><pre class="fssnip"><span class="l">1: </span>
|
|
<span class="l">2: </span>
|
|
<span class="l">3: </span>
|
|
<span class="l">4: </span>
|
|
</pre></td>
|
|
<td class="snippet"><pre class="fssnip highlighted"><code lang="fsharp"><span class="i">Statistics</span><span class="o">.</span><span class="i">Covariance</span>(<span class="i">whiteNoise</span>, <span class="i">whiteNoise</span>)
|
|
<span class="fsi">val it : float = 3.819436094</span>
|
|
<span class="i">Statistics</span><span class="o">.</span><span class="i">Covariance</span>(<span class="i">whiteNoise</span>, <span class="i">wave</span>)
|
|
<span class="fsi">val it : float = 0.04397985084</span>
|
|
</code></pre></td>
|
|
</tr>
|
|
</table>
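<p>As a small worked example in C#, consider two made-up series where the second is exactly twice the first.
The sum of the products of deviations is 10, so the sample covariance is 10/3 and the population covariance is 10/4:</p>
<table class="pre"><tr><td class="snippet"><pre class="fssnip highlighted"><code lang="csharp">using System;
using MathNet.Numerics.Statistics;

// Hypothetical data: y = 2x, means 2.5 and 5.
double[] x = { 1.0, 2.0, 3.0, 4.0 };
double[] y = { 2.0, 4.0, 6.0, 8.0 };

// Deviation products: 4.5 + 0.5 + 0.5 + 4.5 = 10.
double sampleCovariance = Statistics.Covariance(x, y);               // 10/3
double populationCovariance = Statistics.PopulationCovariance(x, y); // 10/4

Console.WriteLine($"{sampleCovariance} {populationCovariance}");
</code></pre></td></tr></table>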
|
|
<h2><a name="Order-Statistics" class="anchor" href="#Order-Statistics">Order Statistics</a></h2>
|
|
<h4><a name="Order-Statistic" class="anchor" href="#Order-Statistic">Order Statistic</a></h4>
|
|
<p>The k-th order statistic of a sample set is the k-th smallest value. Note that,
|
|
as an exception to most of Math.NET Numerics, the order k is one-based, meaning
|
|
the smallest value is the order statistic of order 1 (there is no order 0).</p>
|
|
<p><code>Statistics.OrderStatistic(data, order)</code>
|
|
<code>SortedArrayStatistics.OrderStatistic(data, order)</code></p>
|
|
<p>If the samples are sorted ascendingly, this is trivial and can be evaluated in constant time,
|
|
which is what the <code>SortedArrayStatistics</code> implementation does.</p>
|
|
<p>If you have the samples in an array which is not (guaranteed to be) sorted,
and it is acceptable for the array to become incrementally more sorted over multiple calls,
you can also use the following in-place implementation. It is usually faster
than fully sorting the array, unless you need to compute it for more than a handful of orders.</p>
|
|
<p><code>ArrayStatistics.OrderStatisticInplace(data, order)</code></p>
|
|
<p>For convenience there's also an option that returns a function <code>Func&lt;int, double&gt;</code>,
mapping from order to the resulting order statistic. Internally it sorts a copy of the
provided data and then on each invocation uses efficient sorted algorithms:</p>
|
|
<p><code>Statistics.OrderStatisticFunc(data)</code></p>
|
|
<p>Such Inplace and Func variants are a common pattern throughout the Statistics class
|
|
and also the rest of the library.</p>
|
|
<table class="pre"><tr><td class="lines"><pre class="fssnip"><span class="l"> 1: </span>
|
|
<span class="l"> 2: </span>
|
|
<span class="l"> 3: </span>
|
|
<span class="l"> 4: </span>
|
|
<span class="l"> 5: </span>
|
|
<span class="l"> 6: </span>
|
|
<span class="l"> 7: </span>
|
|
<span class="l"> 8: </span>
|
|
<span class="l"> 9: </span>
|
|
<span class="l">10: </span>
|
|
<span class="l">11: </span>
|
|
<span class="l">12: </span>
|
|
</pre></td>
|
|
<td class="snippet"><pre class="fssnip highlighted"><code lang="fsharp"><span class="i">Statistics</span><span class="o">.</span><span class="i">OrderStatistic</span>(<span class="i">whiteNoise</span>, <span class="n">1</span>)
|
|
<span class="fsi">val it : float = 3.633070184</span>
|
|
<span class="i">Statistics</span><span class="o">.</span><span class="i">OrderStatistic</span>(<span class="i">whiteNoise</span>, <span class="n">1000</span>)
|
|
<span class="fsi">val it : float = 16.65183566</span>
|
|
|
|
<span class="k">let</span> <span class="i">os</span> <span class="o">=</span> <span class="i">Statistics</span><span class="o">.</span><span class="i">orderStatisticFunc</span> <span class="i">whiteNoise</span>
|
|
<span class="i">os</span> <span class="n">250</span>
|
|
<span class="fsi">val it : float = 8.645491746</span>
|
|
<span class="i">os</span> <span class="n">500</span>
|
|
<span class="fsi">val it : float = 10.11872428</span>
|
|
<span class="i">os</span> <span class="n">750</span>
|
|
<span class="fsi">val it : float = 11.33170746</span>
|
|
</code></pre></td>
|
|
</tr>
|
|
</table>
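<p>The same three variants can be sketched in C#. The data and variable names below are hypothetical;
note the one-based order in every call:</p>
<table class="pre"><tr><td class="snippet"><pre class="fssnip highlighted"><code lang="csharp">using System;
using MathNet.Numerics.Statistics;

double[] data = { 5.0, 1.0, 4.0, 2.0, 3.0 };

// Order 1 is the smallest value, order N the largest.
double smallest = Statistics.OrderStatistic(data, 1);
double largest = Statistics.OrderStatistic(data, data.Length);

// Func variant: sorts a copy once, then answers repeated queries.
Func&lt;int, double&gt; orderStatistic = Statistics.OrderStatisticFunc(data);
double third = orderStatistic(3);

// Inplace variant: may reorder the array it is given, so work on a copy here.
double[] scratch = (double[])data.Clone();
double second = ArrayStatistics.OrderStatisticInplace(scratch, 2);

Console.WriteLine($"{smallest} {second} {third} {largest}");
</code></pre></td></tr></table>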
|
|
<h4><a name="Median" class="anchor" href="#Median">Median</a></h4>
|
|
<p>The median is a robust indicator of central tendency and much less affected by outliers
than the sample mean. The median is estimated by the value exactly in the middle of
the sorted set of samples, thus separating the higher half of the data from the lower half.</p>
|
|
<p><code>Statistics.Median(data)</code>
|
|
<code>SortedArrayStatistics.Median(data)</code>
|
|
<code>ArrayStatistics.MedianInplace(data)</code></p>
|
|
<p>The median is only unique if the sample size is odd. This implementation internally
|
|
uses the default quantile definition, which is equivalent to mode 8 in R and is approximately
|
|
median-unbiased regardless of the sample distribution. If you need another convention, use
|
|
<code>QuantileCustom</code> instead, see below for details.</p>
|
|
<table class="pre"><tr><td class="lines"><pre class="fssnip"><span class="l">1: </span>
|
|
<span class="l">2: </span>
|
|
<span class="l">3: </span>
|
|
<span class="l">4: </span>
|
|
</pre></td>
|
|
<td class="snippet"><pre class="fssnip highlighted"><code lang="fsharp"><span class="i">Statistics</span><span class="o">.</span><span class="i">Median</span> <span class="i">whiteNoise</span>
|
|
<span class="fsi">val it : float = 10.11872428</span>
|
|
<span class="i">Statistics</span><span class="o">.</span><span class="i">Median</span> <span class="i">wave</span>
|
|
<span class="fsi">val it : float = -2.452600839e-16</span>
|
|
</code></pre></td>
|
|
</tr>
|
|
</table>
|
|
<h4><a name="Quartiles-and-the-5-number-summary" class="anchor" href="#Quartiles-and-the-5-number-summary">Quartiles and the 5-number summary</a></h4>
|
|
<p>Quartiles group the ascendingly sorted data into four equal groups, where each
|
|
group represents a quarter of the data. The lower quartile is estimated by
|
|
the middle number between the first two groups and the upper quartile by the middle
|
|
number between the remaining two groups. The middle number between the two middle groups
|
|
estimates the median as discussed above.</p>
|
|
<p><code>Statistics.LowerQuartile(data)</code>
|
|
<code>Statistics.UpperQuartile(data)</code>
|
|
<code>SortedArrayStatistics.LowerQuartile(data)</code>
|
|
<code>SortedArrayStatistics.UpperQuartile(data)</code>
|
|
<code>ArrayStatistics.LowerQuartileInplace(data)</code>
|
|
<code>ArrayStatistics.UpperQuartileInplace(data)</code></p>
|
|
<table class="pre"><tr><td class="lines"><pre class="fssnip"><span class="l">1: </span>
|
|
<span class="l">2: </span>
|
|
<span class="l">3: </span>
|
|
<span class="l">4: </span>
|
|
</pre></td>
|
|
<td class="snippet"><pre class="fssnip highlighted"><code lang="fsharp"><span class="i">Statistics</span><span class="o">.</span><span class="i">LowerQuartile</span> <span class="i">whiteNoise</span>
|
|
<span class="fsi">val it : float = 8.645491746</span>
|
|
<span class="i">Statistics</span><span class="o">.</span><span class="i">UpperQuartile</span> <span class="i">whiteNoise</span>
|
|
<span class="fsi">val it : float = 11.33213732</span>
|
|
</code></pre></td>
|
|
</tr>
|
|
</table>
|
|
<p>Using that data we can provide a useful set of indicators usually called the five-number summary,
which consists of the minimum value, the lower quartile, the median, the upper quartile and
the maximum value. All these values can be visualized in the popular box plot diagrams.</p>
|
|
<p><code>Statistics.FiveNumberSummary(data)</code>
|
|
<code>SortedArrayStatistics.FiveNumberSummary(data)</code>
|
|
<code>ArrayStatistics.FiveNumberSummaryInplace(data)</code></p>
|
|
<table class="pre"><tr><td class="lines"><pre class="fssnip"><span class="l">1: </span>
|
|
<span class="l">2: </span>
|
|
<span class="l">3: </span>
|
|
<span class="l">4: </span>
|
|
</pre></td>
|
|
<td class="snippet"><pre class="fssnip highlighted"><code lang="fsharp"><span class="i">Statistics</span><span class="o">.</span><span class="i">FiveNumberSummary</span> <span class="i">whiteNoise</span>
|
|
<span class="fsi">val it : float [] = [|3.633070184; 8.645937823; 10.12165054; 11.33213732; 16.65183566|] </span>
|
|
<span class="i">Statistics</span><span class="o">.</span><span class="i">FiveNumberSummary</span> <span class="i">wave</span>
|
|
<span class="fsi">val it : float [] = [|-0.5; -0.3584185509; -2.452600839e-16; 0.3584185509; 0.5|] </span>
|
|
</code></pre></td>
|
|
</tr>
|
|
</table>
|
|
<p>The difference between the upper and the lower quartile is called the interquartile range (IQR)
and is a robust indicator of spread. In box plots the IQR is the total height of the box.</p>
|
|
<p><code>Statistics.InterquartileRange(data)</code>
|
|
<code>SortedArrayStatistics.InterquartileRange(data)</code>
|
|
<code>ArrayStatistics.InterquartileRangeInplace(data)</code></p>
|
|
<p>Just like median, quartiles use the default R8 quantile definition internally.</p>
|
|
<table class="pre"><tr><td class="lines"><pre class="fssnip"><span class="l">1: </span>
|
|
<span class="l">2: </span>
|
|
</pre></td>
|
|
<td class="snippet"><pre class="fssnip highlighted"><code lang="fsharp"><span class="i">Statistics</span><span class="o">.</span><span class="i">InterquartileRange</span> <span class="i">whiteNoise</span>
|
|
<span class="fsi">val it : float = 2.686199498</span>
|
|
</code></pre></td>
|
|
</tr>
|
|
</table>
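<p>How the summary and the IQR fit together can be sketched in C# as follows. The data is made up, and the
snippet deliberately prints the quartiles instead of stating them, since their exact values depend on the quantile definition:</p>
<table class="pre"><tr><td class="snippet"><pre class="fssnip highlighted"><code lang="csharp">using System;
using MathNet.Numerics.Statistics;

// Hypothetical data: the integers 1 to 9.
double[] data = { 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0 };

// Layout: { minimum, lower quartile, median, upper quartile, maximum }.
double[] summary = Statistics.FiveNumberSummary(data);

// The IQR is the distance between the two quartiles of the summary.
double iqr = Statistics.InterquartileRange(data);
double iqrFromSummary = summary[3] - summary[1];

Console.WriteLine(string.Join("; ", summary));
Console.WriteLine($"IQR = {iqr}");
</code></pre></td></tr></table>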
|
|
<h4><a name="Percentiles" class="anchor" href="#Percentiles">Percentiles</a></h4>
|
|
<p>Percentiles extend the concept further by grouping the sorted values into 100
|
|
equal groups and looking at the 101 places (0,1,..,100) between and around them.
|
|
The 0-percentile represents the minimum value, 25 the first quartile, 50 the median,
|
|
75 the upper quartile and 100 the maximum value.</p>
|
|
<p><code>Statistics.Percentile(data, p)</code>
|
|
<code>Statistics.PercentileFunc(data)</code>
|
|
<code>SortedArrayStatistics.Percentile(data, p)</code>
|
|
<code>ArrayStatistics.PercentileInplace(data, p)</code></p>
|
|
<p>Just like median, percentiles use the default R8 quantile definition internally.</p>
|
|
<table class="pre"><tr><td class="lines"><pre class="fssnip"><span class="l">1: </span>
|
|
<span class="l">2: </span>
|
|
<span class="l">3: </span>
|
|
<span class="l">4: </span>
|
|
</pre></td>
|
|
<td class="snippet"><pre class="fssnip highlighted"><code lang="fsharp"><span class="i">Statistics</span><span class="o">.</span><span class="i">Percentile</span>(<span class="i">whiteNoise</span>, <span class="n">5</span>)
|
|
<span class="fsi">val it : float = 6.693373507</span>
|
|
<span class="i">Statistics</span><span class="o">.</span><span class="i">Percentile</span>(<span class="i">whiteNoise</span>, <span class="n">98</span>)
|
|
<span class="fsi">val it : float = 13.97580653</span>
|
|
</code></pre></td>
|
|
</tr>
|
|
</table>
|
|
<h4><a name="Quantiles" class="anchor" href="#Quantiles">Quantiles</a></h4>
|
|
<p>Instead of grouping into 4 or 100 boxes, quantiles generalize the concept to an infinite number
|
|
of boxes and thus to arbitrary real numbers <span class="math">\(\tau\)</span> between 0.0 and 1.0, where 0.0 represents the
|
|
minimum value, 0.5 the median and 1.0 the maximum value. Quantiles are closely related to
|
|
the inverse cumulative distribution function of the sample distribution.</p>
|
|
<p><code>Statistics.Quantile(data, tau)</code>
|
|
<code>Statistics.QuantileFunc(data)</code>
|
|
<code>SortedArrayStatistics.Quantile(data, tau)</code>
|
|
<code>ArrayStatistics.QuantileInplace(data, tau)</code></p>
|
|
<table class="pre"><tr><td class="lines"><pre class="fssnip"><span class="l">1: </span>
|
|
<span class="l">2: </span>
|
|
</pre></td>
|
|
<td class="snippet"><pre class="fssnip highlighted"><code lang="fsharp"><span class="i">Statistics</span><span class="o">.</span><span class="i">Quantile</span>(<span class="i">whiteNoise</span>, <span class="n">0.98</span>)
|
|
<span class="fsi">val it : float = 13.97580653</span>
|
|
</code></pre></td>
|
|
</tr>
|
|
</table>
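<p>Percentiles and quantiles describe the same positions on different scales. A minimal C# sketch on made-up data,
comparing the integer percentile <code>p</code> with the quantile at <code>p/100</code>:</p>
<table class="pre"><tr><td class="snippet"><pre class="fssnip highlighted"><code lang="csharp">using System;
using MathNet.Numerics.Statistics;

// Hypothetical data: the integers 1 to 9.
double[] data = { 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0 };

// The 25th percentile and the 0.25 quantile refer to the same position.
double percentile25 = Statistics.Percentile(data, 25);
double quantile25 = Statistics.Quantile(data, 0.25);

// The end points map to the minimum and the maximum.
double lowest = Statistics.Quantile(data, 0.0);
double highest = Statistics.Percentile(data, 100);

Console.WriteLine($"{percentile25} {quantile25} {lowest} {highest}");
</code></pre></td></tr></table>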
|
|
<h4><a name="Quantile-Conventions-and-Compatibility" class="anchor" href="#Quantile-Conventions-and-Compatibility">Quantile Conventions and Compatibility</a></h4>
|
|
<p>Remember that all these descriptive statistics do not <em>compute</em> but merely <em>estimate</em>
|
|
statistical indicators of the value distribution. In the case of quantiles,
|
|
there is usually not a single number between the two groups specified by <span class="math">\(\tau\)</span>.
|
|
There are multiple ways to deal with this: the R project supports 9 modes and Mathematica
|
|
and SciPy have their own way to parametrize the behavior.</p>
|
|
<p>The <code>QuantileCustom</code> functions support all 9 modes from the R project, which includes the one
used by Microsoft Excel, and also the 4-parameter variant of Mathematica:</p>
|
|
<p><code>Statistics.QuantileCustom(data, tau, definition)</code>
|
|
<code>Statistics.QuantileCustomFunc(data, definition)</code>
|
|
<code>SortedArrayStatistics.QuantileCustom(data, tau, a, b, c, d)</code>
|
|
<code>SortedArrayStatistics.QuantileCustom(data, tau, definition)</code>
|
|
<code>ArrayStatistics.QuantileCustomInplace(data, tau, a, b, c, d)</code>
|
|
<code>ArrayStatistics.QuantileCustomInplace(data, tau, definition)</code></p>
|
|
<p>The <code>QuantileDefinition</code> enumeration has the following options:</p>
|
|
<ul>
|
|
<li><strong>R1</strong>, SAS3, EmpiricalInvCDF</li>
|
|
<li><strong>R2</strong>, SAS5, EmpiricalInvCDFAverage</li>
|
|
<li><strong>R3</strong>, SAS2, Nearest</li>
|
|
<li><strong>R4</strong>, SAS1, California</li>
|
|
<li><strong>R5</strong>, Hydrology, Hazen</li>
|
|
<li><strong>R6</strong>, SAS4, Nist, Weibull, SPSS</li>
|
|
<li><strong>R7</strong>, Excel, Mode, S</li>
|
|
<li><strong>R8</strong>, Median, Default</li>
|
|
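<li><strong>R9</strong>, Normal</li>
</ul>
<table class="pre"><tr><td class="snippet"><pre class="fssnip highlighted"><code lang="fsharp"><span class="i">Statistics</span><span class="o">.</span><span class="i">QuantileCustom</span>(<span class="i">whiteNoise</span>, <span class="n">0.98</span>, <span class="i">QuantileDefinition</span><span class="o">.</span><span class="i">R3</span>)
<span class="fsi">val it : float = 13.97113209</span>
<span class="i">Statistics</span><span class="o">.</span><span class="i">QuantileCustom</span>(<span class="i">whiteNoise</span>, <span class="n">0.98</span>, <span class="i">QuantileDefinition</span><span class="o">.</span><span class="i">Excel</span>)
<span class="fsi">val it : float = 13.97127374</span>
</code></pre></td></tr></table>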
|
|
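<p>The effect of the definition is most visible on a small sample with an even number of values. In the following
C# sketch, which uses made-up data, the median of four values is either one of the samples or an interpolated
value, depending on the chosen definition:</p>
<table class="pre"><tr><td class="snippet"><pre class="fssnip highlighted"><code lang="csharp">using System;
using MathNet.Numerics.Statistics;

// Hypothetical data: no single sample sits exactly in the middle.
double[] data = { 1.0, 2.0, 3.0, 4.0 };

// R1 inverts the empirical CDF and always returns an actual sample: 2.
double r1 = Statistics.QuantileCustom(data, 0.5, QuantileDefinition.R1);

// R7 (the Excel convention) interpolates between the two middle samples: 2.5.
double excel = Statistics.QuantileCustom(data, 0.5, QuantileDefinition.Excel);

// R8 is the default used by Quantile, Median, quartiles and percentiles.
double r8 = Statistics.QuantileCustom(data, 0.5, QuantileDefinition.R8);
double defaultQuantile = Statistics.Quantile(data, 0.5);

Console.WriteLine($"{r1} {excel} {r8} {defaultQuantile}");
</code></pre></td></tr></table>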
<h2><a name="Rank-Statistics" class="anchor" href="#Rank-Statistics">Rank Statistics</a></h2>
|
|
<h4><a name="Ranks" class="anchor" href="#Ranks">Ranks</a></h4>
|
|
<p>Rank statistics are the counterpart to order statistics. The <code>Ranks</code> function evaluates the rank
|
|
of each sample and returns them as an array of doubles. The return type is double instead of int
|
|
in order to deal with ties, if one of the values appears multiple times.
|
|
Similar to <code>QuantileDefinition</code>, the <code>RankDefinition</code> enumeration controls how ties should be handled:</p>
|
|
<ul>
|
|
<li><strong>Average</strong>, Default: Replace ties with their mean (causing non-integer ranks).</li>
|
|
<li><strong>Min</strong>, Sports: Replace ties with their minimum, as typical in sports ranking.</li>
|
|
<li><strong>Max</strong>: Replace ties with their maximum.</li>
|
|
<li><strong>First</strong>: Permutation with increasing values at each index of ties.</li>
|
|
<li><strong>EmpiricalCDF</strong></li>
|
|
</ul>
|
|
<p><code>Statistics.Ranks(data, definition)</code>
|
|
<code>SortedArrayStatistics.Ranks(data, definition)</code>
|
|
<code>ArrayStatistics.RanksInplace(data, definition)</code></p>
|
|
<table class="pre"><tr><td class="lines"><pre class="fssnip"><span class="l">1: </span>
|
|
<span class="l">2: </span>
|
|
<span class="l">3: </span>
|
|
<span class="l">4: </span>
|
|
<span class="l">5: </span>
|
|
<span class="l">6: </span>
|
|
</pre></td>
|
|
<td class="snippet"><pre class="fssnip highlighted"><code lang="fsharp"><span class="i">Statistics</span><span class="o">.</span><span class="i">Ranks</span>(<span class="i">whiteNoise</span>)
|
|
<span class="fsi">val it : float [] = [|634.0; 736.0; 405.0; 395.0; 197.0; 167.0; 722.0; 44.0; ...|] </span>
|
|
<span class="i">Statistics</span><span class="o">.</span><span class="i">Ranks</span>([| <span class="n">13.0</span>; <span class="n">14.0</span>; <span class="n">11.0</span>; <span class="n">12.0</span>; <span class="n">13.0</span> |], <span class="i">RankDefinition</span><span class="o">.</span><span class="i">Average</span>)
|
|
<span class="fsi">val it : float [] = [|3.5; 5.0; 1.0; 2.0; 3.5|] </span>
|
|
<span class="i">Statistics</span><span class="o">.</span><span class="i">Ranks</span>([| <span class="n">13.0</span>; <span class="n">14.0</span>; <span class="n">11.0</span>; <span class="n">12.0</span>; <span class="n">13.0</span> |], <span class="i">RankDefinition</span><span class="o">.</span><span class="i">Sports</span>)
|
|
<span class="fsi">val it : float [] = [|3.0; 5.0; 1.0; 2.0; 3.0|] </span>
|
|
</code></pre></td>
|
|
</tr>
|
|
</table>
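<p>For comparison, here is a C# sketch of the same five values with the tie at 13.0 resolved by three of the
definitions. The variable names are illustrative only:</p>
<table class="pre"><tr><td class="snippet"><pre class="fssnip highlighted"><code lang="csharp">using System;
using MathNet.Numerics.Statistics;

// The two 13.0 values compete for ranks 3 and 4.
double[] data = { 13.0, 14.0, 11.0, 12.0, 13.0 };

double[] average = Statistics.Ranks(data, RankDefinition.Average); // tie gets 3.5
double[] min = Statistics.Ranks(data, RankDefinition.Min);         // tie gets 3
double[] max = Statistics.Ranks(data, RankDefinition.Max);         // tie gets 4

Console.WriteLine(string.Join("; ", average));
Console.WriteLine(string.Join("; ", min));
Console.WriteLine(string.Join("; ", max));
</code></pre></td></tr></table>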
|
|
<h4><a name="Quantile-Rank" class="anchor" href="#Quantile-Rank">Quantile Rank</a></h4>
|
|
<p>The counterpart of the <code>Quantile</code> function: given a value <span class="math">\(x\)</span>, it estimates the
<span class="math">\(\tau\)</span> for which <span class="math">\(x\)</span> is the <span class="math">\(\tau\)</span>-quantile
of the provided samples. The <span class="math">\(\tau\)</span>-quantile is the data value where the cumulative distribution
function crosses <span class="math">\(\tau\)</span>.</p>
|
|
<p><code>Statistics.QuantileRank(data, x, definition)</code>
|
|
<code>Statistics.QuantileRankFunc(data, definition)</code>
|
|
<code>SortedArrayStatistics.QuantileRank(data, x, definition)</code></p>
|
|
<table class="pre"><tr><td class="lines"><pre class="fssnip"><span class="l">1: </span>
|
|
<span class="l">2: </span>
|
|
<span class="l">3: </span>
|
|
<span class="l">4: </span>
|
|
</pre></td>
|
|
<td class="snippet"><pre class="fssnip highlighted"><code lang="fsharp"><span class="i">Statistics</span><span class="o">.</span><span class="i">QuantileRank</span>(<span class="i">whiteNoise</span>, <span class="n">13.0</span>)
|
|
<span class="fsi">val it : float = 0.9370045563</span>
|
|
<span class="i">Statistics</span><span class="o">.</span><span class="i">QuantileRank</span>(<span class="i">whiteNoise</span>, <span class="n">6.7</span>, <span class="i">RankDefinition</span><span class="o">.</span><span class="i">Average</span>)
<span class="fsi">val it : float = 0.04960610389</span>
</code></pre></td>
</tr>
</table>
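<p>As a rough C# sketch of how the two functions relate, the rank returned by <code>QuantileRank</code> can be passed back to <code>Quantile</code>, which should give a value close to the original <span class="math">\(x\)</span>. The sample data is an assumption: it only stands in for the <code>whiteNoise</code> samples used in the F# snippets, with a mean and standard deviation picked for illustration. The sketch also assumes C# top-level statements and a reference to the MathNet.Numerics package.</p>
<table class="pre"><tr><td class="snippet"><pre class="fssnip highlighted"><code lang="csharp">using System;
using MathNet.Numerics;
using MathNet.Numerics.Statistics;

// Assumed stand-in for the whiteNoise samples (mean 10, standard deviation 2).
double[] whiteNoise = Generate.Normal(1000, 10.0, 2.0);

// Estimate the rank tau (a fraction between 0 and 1) of the value 13.0.
double tau = Statistics.QuantileRank(whiteNoise, 13.0, RankDefinition.Average);

// Feed the rank back into Quantile; the result should be close to 13.0,
// up to the spacing between neighbouring samples.
double x = Statistics.Quantile(whiteNoise, tau);

Console.WriteLine($"tau = {tau}, Quantile(tau) = {x}");
</code></pre></td></tr></table>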
<h2><a name="Empirical-Distribution-Functions" class="anchor" href="#Empirical-Distribution-Functions">Empirical Distribution Functions</a></h2>
<p><code>Statistics.EmpiricalCDF(data, x)</code>
<code>Statistics.EmpiricalCDFFunc(data)</code>
<code>Statistics.EmpiricalInvCDF(data, tau)</code>
<code>Statistics.EmpiricalInvCDFFunc(data)</code>
<code>SortedArrayStatistics.EmpiricalCDF(data, x)</code></p>
<table class="pre"><tr><td class="lines"><pre class="fssnip"><span class="l"> 1: </span>
|
|
<span class="l"> 2: </span>
|
|
<span class="l"> 3: </span>
|
|
<span class="l"> 4: </span>
|
|
<span class="l"> 5: </span>
|
|
<span class="l"> 6: </span>
|
|
<span class="l"> 7: </span>
|
|
<span class="l"> 8: </span>
|
|
<span class="l"> 9: </span>
|
|
<span class="l">10: </span>
|
|
<span class="l">11: </span>
|
|
<span class="l">12: </span>
|
|
<span class="l">13: </span>
|
|
</pre></td>
<td class="snippet"><pre class="fssnip highlighted"><code lang="fsharp"><span class="k">let</span> <span class="i">ecdf</span> <span class="o">=</span> <span class="i">Statistics</span><span class="o">.</span><span class="i">EmpiricalCDFFunc</span> <span class="i">whiteNoise</span>
<span class="i">Generate</span><span class="o">.</span><span class="i">LinearSpacedMap</span>(<span class="n">20</span>, <span class="i">start</span><span class="o">=</span><span class="n">3.0</span>, <span class="i">stop</span><span class="o">=</span><span class="n">17.0</span>, <span class="i">map</span><span class="o">=</span><span class="i">ecdf</span>)
<span class="fsi">val it : float [] =</span>
<span class="fsi"> [|0.0; 0.001; 0.002; 0.005; 0.022; 0.05; 0.094; 0.172; 0.278; 0.423; 0.555; </span>
<span class="fsi"> 0.705; 0.843; 0.921; 0.944; 0.983; 0.992; 0.997; 0.999; 1.0|] </span>
|
|
|
|
<span class="k">let</span> <span class="i">eicdf</span> <span class="o">=</span> <span class="i">Statistics</span><span class="o">.</span><span class="i">empiricalInvCDFFunc</span> <span class="i">whiteNoise</span>
[ <span class="k">for</span> <span class="i">tau</span> <span class="k">in</span> <span class="n">0.0</span><span class="o">..</span><span class="n">0.05</span><span class="o">..</span><span class="n">1.0</span> <span class="k">-></span> <span class="i">eicdf</span> <span class="i">tau</span> ]
<span class="fsi">val it : float [] =</span>
<span class="fsi"> [3.633070184; 6.682142043; 7.520000817; 8.040513497; 8.347587493; </span>
<span class="fsi"> 8.645491746; 9.02681611; 9.298987151; 9.522627142; 9.819352699; 10.11872428; </span>
<span class="fsi"> 10.35991046; 10.57530906; 10.8259542; 11.08605473; 11.33170746; 11.54356436; </span>
<span class="fsi"> 11.90973541; 12.4294346; 13.36889423; 16.65183566] </span>
</code></pre></td>
</tr>
</table>
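<p>The empirical CDF at <span class="math">\(x\)</span> is the fraction of samples less than or equal to <span class="math">\(x\)</span>, and the empirical inverse CDF maps a probability <span class="math">\(\tau\)</span> back to a sample value. The following C# sketch calls the functions listed above on a small, made-up data set; the numbers are illustrative only, and the sketch assumes C# top-level statements and a reference to the MathNet.Numerics package.</p>
<table class="pre"><tr><td class="snippet"><pre class="fssnip highlighted"><code lang="csharp">using System;
using MathNet.Numerics.Statistics;

// Hypothetical sample data, chosen only for illustration.
double[] data = { 2.0, 4.0, 4.0, 5.0, 7.0, 9.0 };

// One-off evaluation at a single point.
double p = Statistics.EmpiricalCDF(data, 5.0);

// Reusable functions, convenient when evaluating at many points.
Func&lt;double, double&gt; ecdf = Statistics.EmpiricalCDFFunc(data);
Func&lt;double, double&gt; eicdf = Statistics.EmpiricalInvCDFFunc(data);

Console.WriteLine($"EmpiricalCDF(5.0) = {p}");
foreach (var x in new[] { 1.0, 4.0, 10.0 })
{
    Console.WriteLine($"ecdf({x}) = {ecdf(x)}");
}
Console.WriteLine($"eicdf(0.5) = {eicdf(0.5)}");
</code></pre></td></tr></table>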
<h2><a name="Histograms" class="anchor" href="#Histograms">Histograms</a></h2>
<p>A histogram can be computed using the <a href="https://numerics.mathdotnet.com/api/MathNet.Numerics.Statistics/Histogram.htm">Histogram</a> class. Its constructor takes
an enumerable of samples and the number of buckets to create, plus optionally the range
(lower and upper bound) of the sample data if it is known in advance.</p>
<table class="pre"><tr><td class="lines"><pre class="fssnip"><span class="l">1: </span>
|
|
<span class="l">2: </span>
|
|
</pre></td>
<td class="snippet"><pre class="fssnip highlighted"><code lang="csharp"><span class="k">var</span> histogram <span class="o">=</span> <span class="k">new</span> Histogram(samples, <span class="n">10</span>);
<span class="k">var</span> bucket<span class="n">3</span>count <span class="o">=</span> histogram[<span class="n">2</span>].Count;
</code></pre></td></tr></table>
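<p>A slightly longer sketch, using the constructor overload with an explicit range and walking over all buckets. The sample values and the range of 0 to 10 are made up for illustration, and the sketch assumes C# top-level statements and a reference to the MathNet.Numerics package.</p>
<table class="pre"><tr><td class="snippet"><pre class="fssnip highlighted"><code lang="csharp">using System;
using MathNet.Numerics.Statistics;

// Hypothetical samples, all inside the assumed range of 0 to 10.
double[] samples = { 1.2, 2.5, 2.7, 3.1, 4.8, 5.0, 6.3, 7.7, 8.4, 9.9 };

// Five buckets of equal width covering the range from 0 to 10.
var histogram = new Histogram(samples, 5, 0.0, 10.0);

// Print the bounds and the sample count of each bucket (indices are zero-based).
for (int i = 0; i &lt; histogram.BucketCount; i++)
{
    var bucket = histogram[i];
    Console.WriteLine($"{bucket.LowerBound:F1} .. {bucket.UpperBound:F1}: {bucket.Count}");
}
</code></pre></td></tr></table>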
<h2><a name="Correlation" class="anchor" href="#Correlation">Correlation</a></h2>
<p>The <code>Correlation</code> class supports computing Pearson's product-moment and Spearman's rank
correlation coefficients, as well as the corresponding correlation matrices for a set of vectors.</p>
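<p>Code sample: computing Pearson's correlation coefficient of 1000 samples of <span class="math">\(f(x) = 2x\)</span> and <span class="math">\(g(x) = x^2\)</span>, taken at evenly spaced points on <span class="math">\([0, 100]\)</span>:</p>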
<table class="pre"><tr><td class="lines"><pre class="fssnip"><span class="l">1: </span>
|
|
<span class="l">2: </span>
|
|
<span class="l">3: </span>
|
|
</pre></td>
<td class="snippet"><pre class="fssnip highlighted"><code lang="csharp"><span class="k">double</span>[] dataF <span class="o">=</span> Generate.LinearSpacedMap(<span class="n">1000</span>, <span class="n">0</span>, <span class="n">100</span>, x <span class="o">=</span><span class="o">></span> <span class="n">2</span>*x);
<span class="k">double</span>[] dataG <span class="o">=</span> Generate.LinearSpacedMap(<span class="n">1000</span>, <span class="n">0</span>, <span class="n">100</span>, x <span class="o">=</span><span class="o">></span> x*x);
<span class="k">double</span> correlation <span class="o">=</span> Correlation.Pearson(dataF, dataG);
</code></pre></td></tr></table>
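<p>The same data can be used to compare the two coefficients. In the sketch below, which assumes C# top-level statements and a reference to the MathNet.Numerics package, both series increase strictly on the sampled interval, so Spearman's rank coefficient should come out as 1 (up to rounding). Pearson's coefficient should be somewhat lower, roughly 0.97, because the relationship between the two series is monotonic but not linear.</p>
<table class="pre"><tr><td class="snippet"><pre class="fssnip highlighted"><code lang="csharp">using System;
using MathNet.Numerics;
using MathNet.Numerics.Statistics;

// Same data as in the sample above: f(x) = 2x and g(x) = x^2 on [0, 100].
double[] dataF = Generate.LinearSpacedMap(1000, 0, 100, x =&gt; 2 * x);
double[] dataG = Generate.LinearSpacedMap(1000, 0, 100, x =&gt; x * x);

// Linear (Pearson) and rank-based (Spearman) correlation coefficients.
double pearson = Correlation.Pearson(dataF, dataG);
double spearman = Correlation.Spearman(dataF, dataG);

// Pairwise Pearson coefficients for a set of vectors, returned as a matrix.
var matrix = Correlation.PearsonMatrix(dataF, dataG);

Console.WriteLine($"Pearson = {pearson}, Spearman = {spearman}");
Console.WriteLine($"Off-diagonal matrix entry = {matrix[0, 1]}");
</code></pre></td></tr></table>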
</div>
</div>
</div>
</body>
</html>