<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Descriptive Statistics
</title>
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<meta name="author" content="Christoph Ruegg, Marcus Cuda, Jurgen Van Gael">
<link rel="stylesheet" id="theme_link" href="https://cdnjs.cloudflare.com/ajax/libs/bootswatch/4.6.0/materia/bootstrap.min.css">
<script src="https://code.jquery.com/jquery-3.4.1.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/bootstrap@4.6.0/dist/js/bootstrap.bundle.min.js" integrity="sha384-Piv4xVNRyMGpqkS2by6br4gNJ7DXjqk09RmUpJ8jgGtD7zP9yug3goQfGII0yAns" crossorigin="anonymous"></script>
<script type="text/javascript" async src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.7/MathJax.js?config=TeX-MML-AM_CHTML"></script>
<link rel="shortcut icon" type="image/x-icon" href="/favicon.ico">
<link type="text/css" rel="stylesheet" href="https://numerics.mathdotnet.com/content/navbar-fixed-left.css" />
<link type="text/css" rel="stylesheet" href="https://numerics.mathdotnet.com/content/fsdocs-default.css" />
<link type="text/css" rel="stylesheet" href="https://numerics.mathdotnet.com/content/fsdocs-custom.css" />
<script type="text/javascript" src="https://numerics.mathdotnet.com/content/fsdocs-tips.js"></script>
<!-- HTML5 shim, for IE6-8 support of HTML5 elements -->
<!--[if lt IE 9]>
<script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script>
<![endif]-->
<!-- BEGIN SEARCH BOX: this adds support for the search box -->
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/JavaScript-autoComplete/1.0.4/auto-complete.css" />
<!-- END SEARCH BOX: this adds support for the search box -->
</head>
<body>
<nav class="navbar navbar-expand-md navbar-light bg-secondary fixed-left" id="fsdocs-nav">
<button class="navbar-toggler" type="button" data-toggle="collapse" data-target="#navbarsExampleDefault" aria-controls="navbarsExampleDefault" aria-expanded="false" aria-label="Toggle navigation">
<span class="navbar-toggler-icon"></span>
</button>
<div class="collapse navbar-collapse navbar-nav-scroll" id="navbarsExampleDefault">
<a href="https://numerics.mathdotnet.com/"><img id="fsdocs-logo" src="/logo.png" /></a>
<!-- BEGIN SEARCH BOX: this adds support for the search box -->
<div id="header">
<div class="searchbox" id="fsdocs-searchbox">
<label for="search-by">
<i class="fas fa-search"></i>
</label>
<input data-search-input="" id="search-by" type="search" placeholder="Search..." />
<span data-search-clear="">
<i class="fas fa-times"></i>
</span>
</div>
</div>
<!-- END SEARCH BOX: this adds support for the search box -->
<ul class="navbar-nav">
<li class="nav-header">Math.NET Numerics</li>
<li class="nav-item"><a class="nav-link" href="Packages.html">NuGet & Binaries</a></li>
<li class="nav-item"><a class="nav-link" href="ReleaseNotes.html">Release Notes</a></li>
<li class="nav-item"><a class="nav-link" href="https://github.com/mathnet/mathnet-numerics/blob/master/LICENSE.md">MIT License</a></li>
<li class="nav-item"><a class="nav-link" href="Compatibility.html">Platform Support</a></li>
<li class="nav-item"><a class="nav-link" href="https://numerics.mathdotnet.com/api/">Class Reference</a></li>
<li class="nav-item"><a class="nav-link" href="https://github.com/mathnet/mathnet-numerics/issues">Issues & Bugs</a></li>
<li class="nav-item"><a class="nav-link" href="Users.html">Who is using Math.NET?</a></li>
<li class="nav-header">Contributing</li>
<li class="nav-item"><a class="nav-link" href="Contributors.html">Contributors</a></li>
<li class="nav-item"><a class="nav-link" href="Contributing.html">Contributing</a></li>
<li class="nav-item"><a class="nav-link" href="Build.html">Build & Tools</a></li>
<li class="nav-item"><a class="nav-link" href="https://github.com/mathnet/mathnet-numerics/discussions/categories/ideas">Your Ideas</a></li>
<li class="nav-header">Getting Help</li>
<li class="nav-item"><a class="nav-link" href="https://discuss.mathdotnet.com/c/numerics">Discuss</a></li>
<li class="nav-item"><a class="nav-link" href="https://stackoverflow.com/questions/tagged/mathdotnet">Stack Overflow</a></li>
<li class="nav-header">Getting Started</li>
<li class="nav-item"><a class="nav-link" href="/">Getting started</a></li>
<li class="nav-item"><a class="nav-link" href="Constants.html">Constants</a></li>
<li class="nav-item"><a class="nav-link" href="Matrix.html">Matrices and Vectors</a></li>
<li class="nav-item"><a class="nav-link" href="Euclid.html">Euclid & Number Theory</a></li>
<li class="nav-item">Combinatorics</li>
<li class="nav-header">Evaluation</li>
<li class="nav-item"><a class="nav-link" href="Functions.html">Special Functions</a></li>
<li class="nav-item"><a class="nav-link" href="Integration.html">Integration</a></li>
<li class="nav-header">Statistics/Probability</li>
<li class="nav-item"><a class="nav-link" href="DescriptiveStatistics.html">Descriptive Statistics</a></li>
<li class="nav-item"><a class="nav-link" href="Probability.html">Probability Distributions</a></li>
<li class="nav-header">Generation</li>
<li class="nav-item"><a class="nav-link" href="Generate.html">Generating Data</a></li>
<li class="nav-item"><a class="nav-link" href="Random.html">Random Numbers</a></li>
<li class="nav-header">Solving Equations</li>
<li class="nav-item"><a class="nav-link" href="LinearEquations.html">Linear Equation Systems</a></li>
<li class="nav-header">Optimization</li>
<li class="nav-item"><a class="nav-link" href="Distance.html">Distance Metrics</a></li>
<li class="nav-header">Curve Fitting</li>
<li class="nav-item"><a class="nav-link" href="Regression.html">Regression</a></li>
<li class="nav-header">Native Providers</li>
<li class="nav-item"><a class="nav-link" href="MKL.html">Intel MKL</a></li>
<li class="nav-header">Working Together</li>
<li class="nav-item"><a class="nav-link" href="CSV.html">Delimited Text Files (CSV)</a></li>
<li class="nav-item"><a class="nav-link" href="MatrixMarket.html">NIST MatrixMarket</a></li>
<li class="nav-item"><a class="nav-link" href="MatlabFiles.html">MATLAB</a></li>
<li class="nav-item"><a class="nav-link" href="IFSharpNotebook.html">IF# Notebook</a></li>
</ul>
</div>
</nav>
<div class="container">
<div class="masthead">
<h3 class="muted">
<a href="https://numerics.mathdotnet.com">Math.NET Numerics</a> |
<a href="https://www.mathdotnet.com">Math.NET Project</a> |
<a href="https://github.com/mathnet/mathnet-numerics">GitHub</a>
</h3>
</div>
<hr />
<div class="container" id="fsdocs-content">
<h1><a name="Descriptive-Statistics" class="anchor" href="#Descriptive-Statistics">Descriptive Statistics</a></h1>
<h2><a name="Initialization" class="anchor" href="#Initialization">Initialization</a></h2>
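<p>We need to reference Math.NET Numerics and open the statistics namespace. Some snippets on this page
additionally rely on <code>System.Linq</code> (for <code>Take</code>), <code>MathNet.Numerics.Distributions</code> (for <code>ChiSquare</code>)
and <code>MathNet.Numerics</code> (for <code>Generate</code>). The statistics namespace is opened with:</p>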
<table class="pre"><tr><td class="snippet"><pre class="fssnip highlighted"><code lang="csharp"><span class="k">using</span> MathNet.Numerics.Statistics;
</code></pre></td></tr></table>
<h2><a name="Univariate-Statistical-Analysis" class="anchor" href="#Univariate-Statistical-Analysis">Univariate Statistical Analysis</a></h2>
<p>The primary class for statistical analysis is <code>Statistics</code>, which provides common
descriptive statistics as static extension methods to <code>IEnumerable&lt;double&gt;</code> sequences.
However, various statistics can be computed much more efficiently if the data source
has known properties or structure. For that reason, the following classes provide specialized
static implementations:</p>
<ul>
<li>
<strong>ArrayStatistics</strong> provides routines optimized for single-dimensional arrays. Some
of these routines end with the <code>Inplace</code> suffix, indicating that they partially reorder the
input array towards being sorted during execution, without fully sorting
it, which could be expensive.
</li>
<li>
<strong>SortedArrayStatistics</strong> provides routines optimized for arrays sorted in ascending order.
Order statistics in particular are very efficient this way, some even with constant time complexity.
</li>
<li>
<strong>StreamingStatistics</strong> processes large amounts of data without keeping them in memory.
Useful if data larger than local memory is streamed directly from a disk or network.
</li>
</ul>
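<p>As a minimal sketch of how these classes fit together, the following C# snippet evaluates a few statistics
through each of them. The sample values are made up for illustration, and the snippet assumes top-level
statements with the MathNet.Numerics package referenced:</p>
<table class="pre"><tr><td class="snippet"><pre class="fssnip highlighted"><code lang="csharp">using System;
using MathNet.Numerics.Statistics;

// Hypothetical sample data, chosen only for illustration.
double[] data = { 5.0, 1.0, 4.0, 2.0, 3.0 };

// General-purpose routine that accepts any sequence of doubles.
double mean = Statistics.Mean(data);

// Array-optimized routine; the Inplace suffix means the argument may be
// reordered, so a copy is passed to keep the original order intact.
double[] scratch = (double[])data.Clone();
double median = ArrayStatistics.MedianInplace(scratch);

// Routines that expect an array already sorted in ascending order.
double[] sorted = (double[])data.Clone();
Array.Sort(sorted);
double min = SortedArrayStatistics.Minimum(sorted);
double max = SortedArrayStatistics.Maximum(sorted);

// Streaming routine: consumes the sequence once without storing it.
double streamedMean = StreamingStatistics.Mean(data);

Console.WriteLine($"{mean} {median} {min} {max} {streamedMean}");
</code></pre></td></tr></table>
<p>Cloning before an <code>Inplace</code> call is only needed if the original order of the array still matters afterwards.</p>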
<p>Another alternative, in case you need to gather a whole set of statistical characteristics
in one pass, is provided by the <code>DescriptiveStatistics</code> class:</p>
<table class="pre"><tr><td class="snippet"><pre class="fssnip highlighted"><code lang="csharp"><span class="k">var</span> samples <span class="o">=</span> <span class="k">new</span> ChiSquare(<span class="n">5</span>).Samples().Take(<span class="n">1000</span>);
<span class="k">var</span> statistics <span class="o">=</span> <span class="k">new</span> DescriptiveStatistics(samples);
<span class="k">var</span> largestElement <span class="o">=</span> statistics.Maximum;
<span class="k">var</span> smallestElement <span class="o">=</span> statistics.Minimum;
<span class="k">var</span> median <span class="o">=</span> statistics.Median;
<span class="k">var</span> mean <span class="o">=</span> statistics.Mean;
<span class="k">var</span> variance <span class="o">=</span> statistics.Variance;
<span class="k">var</span> stdDev <span class="o">=</span> statistics.StandardDeviation;
<span class="k">var</span> kurtosis <span class="o">=</span> statistics.Kurtosis;
<span class="k">var</span> skewness <span class="o">=</span> statistics.Skewness;
</code></pre></td></tr></table>
<h2><a name="Minimum-amp-Maximum" class="anchor" href="#Minimum-amp-Maximum">Minimum &amp; Maximum</a></h2>
<p>The minimum and maximum values of a sample set can be evaluated with the <code>Minimum</code> and <code>Maximum</code>
functions of all four classes: <code>Statistics</code>, <code>ArrayStatistics</code>, <code>SortedArrayStatistics</code>
and <code>StreamingStatistics</code>. The one in <code>SortedArrayStatistics</code> is the fastest with constant
time complexity, but expects the array to be sorted ascendingly.</p>
<p>Both min and max are directly affected by outliers and are therefore not robust statistics at all.
For a more robust alternative, consider using Quantiles instead.</p>
<table class="pre"><tr><td class="snippet"><pre class="fssnip highlighted"><code lang="csharp"><span class="k">var</span> samples <span class="o">=</span> <span class="k">new</span> ChiSquare(<span class="n">5</span>).Samples().Take(<span class="n">1000</span>).ToArray();
<span class="k">var</span> largestElement <span class="o">=</span> samples.Maximum();
<span class="k">var</span> smallestElement <span class="o">=</span> samples.Minimum();
</code></pre></td></tr></table>
<h2><a name="Mean" class="anchor" href="#Mean">Mean</a></h2>
<p>The <em>arithmetic mean</em> or <em>average</em> of the provided samples. In statistics, the sample mean is
a measure of the central tendency and estimates the expected value of the distribution.
The mean is affected by outliers, so if you need a more robust estimate, consider using the Median instead.</p>
<p><code>Statistics.Mean(data)</code>
<code>StreamingStatistics.Mean(stream)</code>
<code>ArrayStatistics.Mean(data)</code></p>
<p><span class="math">\[\overline{x} = \frac{1}{N}\sum_{i=1}^N x_i\]</span></p>
<pre class="fssnip highlighted"><code lang="fsharp"><span class="k">let</span> <span onmouseout="hideTip(event, 'fs1', 1)" onmouseover="showTip(event, 'fs1', 1)" class="id">whiteNoise</span> <span class="o">=</span> <span class="id">Generate</span><span class="pn">.</span><span class="id">Normal</span><span class="pn">(</span><span class="n">1000</span><span class="pn">,</span> <span class="id">mean</span><span class="o">=</span><span class="n">10.0</span><span class="pn">,</span> <span class="id">standardDeviation</span><span class="o">=</span><span class="n">2.0</span><span class="pn">)</span>
<span class="fsi">val whiteNoise : float [] = [|12.90021939; 9.631515037; 7.810008046; 14.13301053; ...|] </span>
<span class="id">Statistics</span><span class="pn">.</span><span class="id">Mean</span> <span onmouseout="hideTip(event, 'fs1', 2)" onmouseover="showTip(event, 'fs1', 2)" class="id">whiteNoise</span>
<span class="fsi">val it : float = 10.02162347</span>
<span class="k">let</span> <span onmouseout="hideTip(event, 'fs2', 3)" onmouseover="showTip(event, 'fs2', 3)" class="id">wave</span> <span class="o">=</span> <span class="id">Generate</span><span class="pn">.</span><span class="id">Sinusoidal</span><span class="pn">(</span><span class="n">1000</span><span class="pn">,</span> <span class="id">samplingRate</span><span class="o">=</span><span class="n">100.</span><span class="pn">,</span> <span class="id">frequency</span><span class="o">=</span><span class="n">5.</span><span class="pn">,</span> <span class="id">amplitude</span><span class="o">=</span><span class="n">0.5</span><span class="pn">)</span>
<span class="id">Statistics</span><span class="pn">.</span><span class="id">Mean</span> <span onmouseout="hideTip(event, 'fs2', 4)" onmouseover="showTip(event, 'fs2', 4)" class="id">wave</span>
<span class="fsi">val it : float = -4.133520783e-17</span>
</code></pre>
<h2><a name="Variance-and-Standard-Deviation" class="anchor" href="#Variance-and-Standard-Deviation">Variance and Standard Deviation</a></h2>
<p>Variance <span class="math">\(\sigma^2\)</span> and the Standard Deviation <span class="math">\(\sigma\)</span> are measures of how far the samples are spread out.</p>
<p>If the whole population is available, the functions with the Population-prefix
will evaluate the respective measures with an <span class="math">\(N\)</span> normalizer for a population of size <span class="math">\(N\)</span>.</p>
<p><code>Statistics.PopulationVariance(population)</code>
<code>Statistics.PopulationStandardDeviation(population)</code></p>
<p><span class="math">\[\sigma^2 = \frac{1}{N}\sum_{i=1}^N (x_i - \mu)^2\]</span></p>
<p>On the other hand, if only a sample of the full population is available, the functions
without the Population-prefix will estimate unbiased population measures by applying
Bessel's correction with an <span class="math">\(N-1\)</span> normalizer to a sample set of size <span class="math">\(N\)</span>.</p>
<p><code>Statistics.Variance(samples)</code>
<code>Statistics.StandardDeviation(samples)</code></p>
<p><span class="math">\[s^2 = \frac{1}{N-1}\sum_{i=1}^N (x_i - \overline{x})^2\]</span></p>
<pre class="fssnip highlighted"><code lang="fsharp"><span class="id">Statistics</span><span class="pn">.</span><span class="id">Variance</span> <span onmouseout="hideTip(event, 'fs1', 5)" onmouseover="showTip(event, 'fs1', 5)" class="id">whiteNoise</span>
<span class="fsi">val it : float = 3.819436094</span>
<span class="id">Statistics</span><span class="pn">.</span><span class="id">StandardDeviation</span> <span onmouseout="hideTip(event, 'fs1', 6)" onmouseover="showTip(event, 'fs1', 6)" class="id">whiteNoise</span>
<span class="fsi">val it : float = 1.954337764</span>
<span class="id">Statistics</span><span class="pn">.</span><span class="id">Variance</span> <span onmouseout="hideTip(event, 'fs2', 7)" onmouseover="showTip(event, 'fs2', 7)" class="id">wave</span>
<span class="fsi">val it : float = 0.1251251251</span>
</code></pre>
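<p>To make the difference between the two normalizers concrete, here is a small C# sketch on hand-checkable data.
The data values are an assumption made for illustration: their mean is 5 and their squared deviations sum to 32.</p>
<table class="pre"><tr><td class="snippet"><pre class="fssnip highlighted"><code lang="csharp">using System;
using MathNet.Numerics.Statistics;

// Hypothetical data: mean 5, squared deviations 9+1+1+1+0+0+4+16 = 32.
double[] data = { 2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0 };

// Treating the data as the whole population: normalizer N = 8.
double populationVariance = Statistics.PopulationVariance(data);        // 32 / 8 = 4
double populationStdDev = Statistics.PopulationStandardDeviation(data); // sqrt(4) = 2

// Treating the data as a sample: Bessel's correction, normalizer N - 1 = 7.
double sampleVariance = Statistics.Variance(data);                      // 32 / 7, about 4.571
double sampleStdDev = Statistics.StandardDeviation(data);               // sqrt(32 / 7), about 2.138

Console.WriteLine($"{populationVariance} {populationStdDev} {sampleVariance} {sampleStdDev}");
</code></pre></td></tr></table>
<p>The values in the comments hold up to floating-point rounding.</p>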
<h4><a name="Combined-Routines" class="anchor" href="#Combined-Routines">Combined Routines</a></h4>
<p>Since mean and variance are often needed together, there are routines
that evaluate both in a single pass:</p>
<p><code>Statistics.MeanVariance(samples)</code>
<code>ArrayStatistics.MeanVariance(samples)</code>
<code>StreamingStatistics.MeanVariance(samples)</code></p>
<pre class="fssnip highlighted"><code lang="fsharp"><span class="id">Statistics</span><span class="pn">.</span><span class="id">MeanVariance</span> <span onmouseout="hideTip(event, 'fs1', 8)" onmouseover="showTip(event, 'fs1', 8)" class="id">whiteNoise</span>
<span class="fsi">val it : float * float = (10.02162347, 3.819436094)</span>
</code></pre>
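<p>From C#, the combined routine might be used as sketched below. The data values are made up for illustration, and the
result is read through <code>Item1</code> and <code>Item2</code> on the assumption that this works whether the installed version
returns a <code>Tuple</code> or a value tuple:</p>
<table class="pre"><tr><td class="snippet"><pre class="fssnip highlighted"><code lang="csharp">using System;
using MathNet.Numerics.Statistics;

// Hypothetical data with mean 5 and sample variance 32 / 7.
double[] data = { 2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0 };

// Mean and (sample) variance evaluated together.
var meanVariance = Statistics.MeanVariance(data);
double mean = meanVariance.Item1;
double variance = meanVariance.Item2;

Console.WriteLine($"mean = {mean}, variance = {variance}");
</code></pre></td></tr></table>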
<h2><a name="Covariance" class="anchor" href="#Covariance">Covariance</a></h2>
<p>The sample covariance is an estimate of the covariance, a measure of how much two random
variables change together. As with the variance above, there are two versions: one applies
Bessel's correction to remove the bias when only sample data is available, the other assumes the whole population.</p>
<p><code>Statistics.Covariance(samples1, samples2)</code></p>
<p><span class="math">\[q = \frac{1}{N-1}\sum_{i=1}^N (x_i - \overline{x})(y_i - \overline{y})\]</span></p>
<p><code>Statistics.PopulationCovariance(population1, population2)</code></p>
<p><span class="math">\[q = \frac{1}{N}\sum_{i=1}^N (x_i - \mu_x)(y_i - \mu_y)\]</span></p>
<pre class="fssnip highlighted"><code lang="fsharp"><span class="id">Statistics</span><span class="pn">.</span><span class="id">Covariance</span><span class="pn">(</span><span onmouseout="hideTip(event, 'fs1', 9)" onmouseover="showTip(event, 'fs1', 9)" class="id">whiteNoise</span><span class="pn">,</span> <span onmouseout="hideTip(event, 'fs1', 10)" onmouseover="showTip(event, 'fs1', 10)" class="id">whiteNoise</span><span class="pn">)</span>
<span class="fsi">val it : float = 3.819436094</span>
<span class="id">Statistics</span><span class="pn">.</span><span class="id">Covariance</span><span class="pn">(</span><span onmouseout="hideTip(event, 'fs1', 11)" onmouseover="showTip(event, 'fs1', 11)" class="id">whiteNoise</span><span class="pn">,</span> <span onmouseout="hideTip(event, 'fs2', 12)" onmouseover="showTip(event, 'fs2', 12)" class="id">wave</span><span class="pn">)</span>
<span class="fsi">val it : float = 0.04397985084</span>
</code></pre>
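<p>A small C# sketch can make the two normalizers tangible. The data below is an assumption chosen so that the
result can be checked by hand: <code>y</code> is exactly twice <code>x</code>, so the covariance is twice the variance of <code>x</code>.</p>
<table class="pre"><tr><td class="snippet"><pre class="fssnip highlighted"><code lang="csharp">using System;
using MathNet.Numerics.Statistics;

// Hypothetical paired data with y = 2 * x.
double[] x = { 1.0, 2.0, 3.0, 4.0 };
double[] y = { 2.0, 4.0, 6.0, 8.0 };

// Sum of products of deviations: 2 * (2.25 + 0.25 + 0.25 + 2.25) = 10.
double sampleCovariance = Statistics.Covariance(x, y);               // 10 / 3, about 3.333
double populationCovariance = Statistics.PopulationCovariance(x, y); // 10 / 4 = 2.5

// The covariance of a sequence with itself is its variance.
double selfCovariance = Statistics.Covariance(x, x);
double variance = Statistics.Variance(x);

Console.WriteLine($"{sampleCovariance} {populationCovariance} {selfCovariance} {variance}");
</code></pre></td></tr></table>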
<h2><a name="Order-Statistics" class="anchor" href="#Order-Statistics">Order Statistics</a></h2>
<h4><a name="Order-Statistic" class="anchor" href="#Order-Statistic">Order Statistic</a></h4>
<p>The k-th order statistic of a sample set is the k-th smallest value. Note that,
as an exception to most of Math.NET Numerics, the order k is one-based, meaning
the smallest value is the order statistic of order 1 (there is no order 0).</p>
<p><code>Statistics.OrderStatistic(data, order)</code>
<code>SortedArrayStatistics.OrderStatistic(data, order)</code></p>
<p>If the samples are sorted ascendingly, this is trivial and can be evaluated in constant time,
which is what the <code>SortedArrayStatistics</code> implementation does.</p>
<p>If you have the samples in an array which is not (guaranteed to be) sorted,
and it is acceptable for the array to become incrementally sorted over multiple calls,
you can also use the following in-place implementation. It is usually faster
than fully sorting the array, unless you need to compute it for more than a handful of orders.</p>
<p><code>ArrayStatistics.OrderStatisticInplace(data, order)</code></p>
<p>For convenience there's also an option that returns a function <code>Func&lt;int, double&gt;</code>,
mapping from order to the resulting order statistic. Internally it sorts a copy of the
provided data and then on each invocation uses efficient sorted algorithms:</p>
<p><code>Statistics.OrderStatisticFunc(data)</code></p>
<p>Such Inplace and Func variants are a common pattern throughout the Statistics class
and also the rest of the library.</p>
<pre class="fssnip highlighted"><code lang="fsharp"><span class="id">Statistics</span><span class="pn">.</span><span class="id">OrderStatistic</span><span class="pn">(</span><span onmouseout="hideTip(event, 'fs1', 13)" onmouseover="showTip(event, 'fs1', 13)" class="id">whiteNoise</span><span class="pn">,</span> <span class="n">1</span><span class="pn">)</span>
<span class="fsi">val it : float = 3.633070184</span>
<span class="id">Statistics</span><span class="pn">.</span><span class="id">OrderStatistic</span><span class="pn">(</span><span onmouseout="hideTip(event, 'fs1', 14)" onmouseover="showTip(event, 'fs1', 14)" class="id">whiteNoise</span><span class="pn">,</span> <span class="n">1000</span><span class="pn">)</span>
<span class="fsi">val it : float = 16.65183566</span>
<span class="k">let</span> <span onmouseout="hideTip(event, 'fs3', 15)" onmouseover="showTip(event, 'fs3', 15)" class="fn">os</span> <span class="o">=</span> <span class="id">Statistics</span><span class="pn">.</span><span class="id">orderStatisticFunc</span> <span onmouseout="hideTip(event, 'fs1', 16)" onmouseover="showTip(event, 'fs1', 16)" class="id">whiteNoise</span>
<span onmouseout="hideTip(event, 'fs3', 17)" onmouseover="showTip(event, 'fs3', 17)" class="fn">os</span> <span class="n">250</span>
<span class="fsi">val it : float = 8.645491746</span>
<span onmouseout="hideTip(event, 'fs3', 18)" onmouseover="showTip(event, 'fs3', 18)" class="fn">os</span> <span class="n">500</span>
<span class="fsi">val it : float = 10.11872428</span>
<span onmouseout="hideTip(event, 'fs3', 19)" onmouseover="showTip(event, 'fs3', 19)" class="fn">os</span> <span class="n">750</span>
<span class="fsi">val it : float = 11.33170746</span>
</code></pre>
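<p>The same three variants could be called from C# roughly as follows. This is a sketch on made-up data,
assuming top-level statements and the MathNet.Numerics package:</p>
<table class="pre"><tr><td class="snippet"><pre class="fssnip highlighted"><code lang="csharp">using System;
using MathNet.Numerics.Statistics;

// Hypothetical unsorted data.
double[] data = { 5.0, 1.0, 4.0, 2.0, 3.0 };

// One-off queries; the order is one-based, so 1 selects the smallest value.
double smallest = Statistics.OrderStatistic(data, 1);          // 1
double largest = Statistics.OrderStatistic(data, data.Length); // 5

// Func variant: works on an internal sorted copy and answers repeated queries.
var orderStatistic = Statistics.OrderStatisticFunc(data);
double third = orderStatistic(3);                               // 3

// Inplace variant: may reorder the array it is given, so a copy is used
// here to leave the original data untouched.
double[] scratch = (double[])data.Clone();
double second = ArrayStatistics.OrderStatisticInplace(scratch, 2); // 2

Console.WriteLine($"{smallest} {second} {third} {largest}");
</code></pre></td></tr></table>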
<h4><a name="Median" class="anchor" href="#Median">Median</a></h4>
<p>The median is a robust indicator of central tendency and much less affected by outliers
than the sample mean. It is estimated by the value exactly in the middle of
the sorted set of samples, thus separating the higher half of the data from the lower half.</p>
<p><code>Statistics.Median(data)</code>
<code>SortedArrayStatistics.Median(data)</code>
<code>ArrayStatistics.MedianInplace(data)</code></p>
<p>The median is only unique if the sample size is odd. This implementation internally
uses the default quantile definition, which is equivalent to mode 8 in R and is approximately
median-unbiased regardless of the sample distribution. If you need another convention, use
<code>QuantileCustom</code> instead, see below for details.</p>
<pre class="fssnip highlighted"><code lang="fsharp"><span class="id">Statistics</span><span class="pn">.</span><span class="id">Median</span> <span onmouseout="hideTip(event, 'fs1', 20)" onmouseover="showTip(event, 'fs1', 20)" class="id">whiteNoise</span>
<span class="fsi">val it : float = 10.11872428</span>
<span class="id">Statistics</span><span class="pn">.</span><span class="id">Median</span> <span onmouseout="hideTip(event, 'fs2', 21)" onmouseover="showTip(event, 'fs2', 21)" class="id">wave</span>
<span class="fsi">val it : float = -2.452600839e-16</span>
</code></pre>
<h4><a name="Quartiles-and-the-5-number-summary" class="anchor" href="#Quartiles-and-the-5-number-summary">Quartiles and the 5-number summary</a></h4>
<p>Quartiles group the ascendingly sorted data into four equal groups, where each
group represents a quarter of the data. The lower quartile is estimated by
the middle number between the first two groups and the upper quartile by the middle
number between the remaining two groups. The middle number between the two middle groups
estimates the median as discussed above.</p>
<p><code>Statistics.LowerQuartile(data)</code>
<code>Statistics.UpperQuartile(data)</code>
<code>SortedArrayStatistics.LowerQuartile(data)</code>
<code>SortedArrayStatistics.UpperQuartile(data)</code>
<code>ArrayStatistics.LowerQuartileInplace(data)</code>
<code>ArrayStatistics.UpperQuartileInplace(data)</code></p>
<pre class="fssnip highlighted"><code lang="fsharp"><span class="id">Statistics</span><span class="pn">.</span><span class="id">LowerQuartile</span> <span onmouseout="hideTip(event, 'fs1', 22)" onmouseover="showTip(event, 'fs1', 22)" class="id">whiteNoise</span>
<span class="fsi">val it : float = 8.645491746</span>
<span class="id">Statistics</span><span class="pn">.</span><span class="id">UpperQuartile</span> <span onmouseout="hideTip(event, 'fs1', 23)" onmouseover="showTip(event, 'fs1', 23)" class="id">whiteNoise</span>
<span class="fsi">val it : float = 11.33213732</span>
</code></pre>
<p>Using that data we can provide a useful set of indicators usually named 5-number summary,
which consists of the minimum value, the lower quartile, the median, the upper quartile and
the maximum value. All these values can be visualized in the popular box plot diagrams.</p>
<p><code>Statistics.FiveNumberSummary(data)</code>
<code>SortedArrayStatistics.FiveNumberSummary(data)</code>
<code>ArrayStatistics.FiveNumberSummaryInplace(data)</code></p>
<pre class="fssnip highlighted"><code lang="fsharp"><span class="id">Statistics</span><span class="pn">.</span><span class="id">FiveNumberSummary</span> <span onmouseout="hideTip(event, 'fs1', 24)" onmouseover="showTip(event, 'fs1', 24)" class="id">whiteNoise</span>
<span class="fsi">val it : float [] = [|3.633070184; 8.645937823; 10.12165054; 11.33213732; 16.65183566|] </span>
<span class="id">Statistics</span><span class="pn">.</span><span class="id">FiveNumberSummary</span> <span onmouseout="hideTip(event, 'fs2', 25)" onmouseover="showTip(event, 'fs2', 25)" class="id">wave</span>
<span class="fsi">val it : float [] = [|-0.5; -0.3584185509; -2.452600839e-16; 0.3584185509; 0.5|] </span>
</code></pre>
<p>The difference between the upper and the lower quartile is called the interquartile range (IQR)
and is a robust indicator of spread. In box plots the IQR is the total height of the box.</p>
<p><code>Statistics.InterquartileRange(data)</code>
<code>SortedArrayStatistics.InterquartileRange(data)</code>
<code>ArrayStatistics.InterquartileRangeInplace(data)</code></p>
<p>Just like median, quartiles use the default R8 quantile definition internally.</p>
<pre class="fssnip highlighted"><code lang="fsharp"><span class="id">Statistics</span><span class="pn">.</span><span class="id">InterquartileRange</span> <span onmouseout="hideTip(event, 'fs1', 26)" onmouseover="showTip(event, 'fs1', 26)" class="id">whiteNoise</span>
<span class="fsi">val it : float = 2.686199498</span>
</code></pre>
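<p>The relation between the quartiles, the 5-number summary and the IQR can be sketched in C# as follows.
The data is made up for illustration, and the index positions used for the summary array follow the order
given in the description above (minimum, lower quartile, median, upper quartile, maximum):</p>
<table class="pre"><tr><td class="snippet"><pre class="fssnip highlighted"><code lang="csharp">using System;
using MathNet.Numerics.Statistics;

// Hypothetical data.
double[] data = { 7.0, 1.0, 9.0, 3.0, 5.0, 2.0, 8.0, 4.0, 6.0 };

double lower = Statistics.LowerQuartile(data);
double upper = Statistics.UpperQuartile(data);
double iqr = Statistics.InterquartileRange(data);

// Minimum, lower quartile, median, upper quartile, maximum.
double[] summary = Statistics.FiveNumberSummary(data);

// The IQR is the distance between the two quartiles.
Console.WriteLine($"IQR = {iqr}, upper - lower = {upper - lower}");
Console.WriteLine($"summary = [{string.Join("; ", summary)}]");
</code></pre></td></tr></table>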
<h4><a name="Percentiles" class="anchor" href="#Percentiles">Percentiles</a></h4>
<p>Percentiles extend the concept further by grouping the sorted values into 100
equal groups and looking at the 101 places (0,1,..,100) between and around them.
The 0-percentile represents the minimum value, 25 the lower quartile, 50 the median,
75 the upper quartile and 100 the maximum value.</p>
<p><code>Statistics.Percentile(data, p)</code>
<code>Statistics.PercentileFunc(data)</code>
<code>SortedArrayStatistics.Percentile(data, p)</code>
<code>ArrayStatistics.PercentileInplace(data, p)</code></p>
<p>Just like median, percentiles use the default R8 quantile definition internally.</p>
<pre class="fssnip highlighted"><code lang="fsharp"><span class="id">Statistics</span><span class="pn">.</span><span class="id">Percentile</span><span class="pn">(</span><span onmouseout="hideTip(event, 'fs1', 27)" onmouseover="showTip(event, 'fs1', 27)" class="id">whiteNoise</span><span class="pn">,</span> <span class="n">5</span><span class="pn">)</span>
<span class="fsi">val it : float = 6.693373507</span>
<span class="id">Statistics</span><span class="pn">.</span><span class="id">Percentile</span><span class="pn">(</span><span onmouseout="hideTip(event, 'fs1', 28)" onmouseover="showTip(event, 'fs1', 28)" class="id">whiteNoise</span><span class="pn">,</span> <span class="n">98</span><span class="pn">)</span>
<span class="fsi">val it : float = 13.97580653</span>
</code></pre>
<h4><a name="Quantiles" class="anchor" href="#Quantiles">Quantiles</a></h4>
<p>Instead of grouping into 4 or 100 boxes, quantiles generalize the concept to an infinite number
of boxes and thus to arbitrary real numbers <span class="math">\(\tau\)</span> between 0.0 and 1.0, where 0.0 represents the
minimum value, 0.5 the median and 1.0 the maximum value. Quantiles are closely related to
the inverse cumulative distribution function of the sample distribution.</p>
<p><code>Statistics.Quantile(data, tau)</code>
<code>Statistics.QuantileFunc(data)</code>
<code>SortedArrayStatistics.Quantile(data, tau)</code>
<code>ArrayStatistics.QuantileInplace(data, tau)</code></p>
<pre class="fssnip highlighted"><code lang="fsharp"><span class="id">Statistics</span><span class="pn">.</span><span class="id">Quantile</span><span class="pn">(</span><span onmouseout="hideTip(event, 'fs1', 29)" onmouseover="showTip(event, 'fs1', 29)" class="id">whiteNoise</span><span class="pn">,</span> <span class="n">0.98</span><span class="pn">)</span>
<span class="fsi">val it : float = 13.97580653</span>
</code></pre>
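<p>Since median, percentiles and quantiles share the same default definition, they should agree where they overlap.
The following C# sketch illustrates this on made-up data with an odd number of samples, where the middle value is 3:</p>
<table class="pre"><tr><td class="snippet"><pre class="fssnip highlighted"><code lang="csharp">using System;
using MathNet.Numerics.Statistics;

// Hypothetical data; the sorted middle value is 3.
double[] data = { 5.0, 1.0, 4.0, 2.0, 3.0 };

// Three ways to ask for the same location (equal up to rounding).
double median = Statistics.Median(data);
double percentile50 = Statistics.Percentile(data, 50);
double quantileHalf = Statistics.Quantile(data, 0.5);

// The extremes of the quantile range map to minimum and maximum.
double quantileZero = Statistics.Quantile(data, 0.0);
double quantileOne = Statistics.Quantile(data, 1.0);

Console.WriteLine($"{median} {percentile50} {quantileHalf} {quantileZero} {quantileOne}");
</code></pre></td></tr></table>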
<h4><a name="Quantile-Conventions-and-Compatibility" class="anchor" href="#Quantile-Conventions-and-Compatibility">Quantile Conventions and Compatibility</a></h4>
<p>Remember that all these descriptive statistics do not <em>compute</em> but merely <em>estimate</em>
statistical indicators of the value distribution. In the case of quantiles,
there is usually not a single number between the two groups specified by <span class="math">\(\tau\)</span>.
There are multiple ways to deal with this: the R project supports 9 modes and Mathematica
and SciPy have their own way to parametrize the behavior.</p>
<p>The <code>QuantileCustom</code> functions support all 9 modes from the R project, which include the one
used by Microsoft Excel, and also the 4-parameter variant of Mathematica:</p>
<p><code>Statistics.QuantileCustom(data, tau, definition)</code>
<code>Statistics.QuantileCustomFunc(data, definition)</code>
<code>SortedArrayStatistics.QuantileCustom(data, tau, a, b, c, d)</code>
<code>SortedArrayStatistics.QuantileCustom(data, tau, definition)</code>
<code>ArrayStatistics.QuantileCustomInplace(data, tau, a, b, c, d)</code>
<code>ArrayStatistics.QuantileCustomInplace(data, tau, definition)</code></p>
<p>The <code>QuantileDefinition</code> enumeration has the following options:</p>
<ul>
<li><strong>R1</strong>, SAS3, EmpiricalInvCDF</li>
<li><strong>R2</strong>, SAS5, EmpiricalInvCDFAverage</li>
<li><strong>R3</strong>, SAS2, Nearest</li>
<li><strong>R4</strong>, SAS1, California</li>
<li><strong>R5</strong>, Hydrology, Hazen</li>
<li><strong>R6</strong>, SAS4, Nist, Weibull, SPSS</li>
<li><strong>R7</strong>, Excel, Mode, S</li>
<li><strong>R8</strong>, Median, Default</li>
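<li><strong>R9</strong>, Normal</li>
</ul>
<pre class="fssnip highlighted"><code lang="fsharp"><span class="id">Statistics</span><span class="pn">.</span><span class="id">QuantileCustom</span><span class="pn">(</span><span class="id">whiteNoise</span><span class="pn">,</span> <span class="n">0.98</span><span class="pn">,</span> <span class="id">QuantileDefinition</span><span class="pn">.</span><span class="id">R3</span><span class="pn">)</span>
<span class="fsi">val it : float = 13.97113209</span>
<span class="id">Statistics</span><span class="pn">.</span><span class="id">QuantileCustom</span><span class="pn">(</span><span class="id">whiteNoise</span><span class="pn">,</span> <span class="n">0.98</span><span class="pn">,</span> <span class="id">QuantileDefinition</span><span class="pn">.</span><span class="id">Excel</span><span class="pn">)</span>
<span class="fsi">val it : float = 13.97127374</span>
</code></pre>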
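<p>To see how the choice of definition matters on a small data set, one could compare a few of them from C#.
This is only a sketch on made-up data; no particular output values are claimed here, since they depend on the
interpolation rule of each definition:</p>
<table class="pre"><tr><td class="snippet"><pre class="fssnip highlighted"><code lang="csharp">using System;
using MathNet.Numerics.Statistics;

// Hypothetical data.
double[] data = { 1.0, 2.0, 3.0, 4.0, 5.0 };

// Default definition (R8), once implicitly and once spelled out.
double defaultEstimate = Statistics.Quantile(data, 0.25);
double r8 = Statistics.QuantileCustom(data, 0.25, QuantileDefinition.R8);

// Other conventions may give different estimates on the same data.
double excel = Statistics.QuantileCustom(data, 0.25, QuantileDefinition.Excel);
double nearest = Statistics.QuantileCustom(data, 0.25, QuantileDefinition.Nearest);

Console.WriteLine($"default = {defaultEstimate}, R8 = {r8}");
Console.WriteLine($"Excel = {excel}, Nearest = {nearest}");
</code></pre></td></tr></table>
<p>On small samples the estimates tend to differ more visibly than on the 1000-sample example above.</p>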
<h2><a name="Rank-Statistics" class="anchor" href="#Rank-Statistics">Rank Statistics</a></h2>
<h4><a name="Ranks" class="anchor" href="#Ranks">Ranks</a></h4>
<p>Rank statistics are the counterpart to order statistics. The <code>Ranks</code> function evaluates the rank
of each sample and returns them as an array of doubles. The return type is double instead of int
in order to deal with ties, if one of the values appears multiple times.
Similar to <code>QuantileDefinition</code>, the <code>RankDefinition</code> enumeration controls how ties should be handled:</p>
<ul>
<li><strong>Average</strong>, Default: Replace ties with their mean (causing non-integer ranks).</li>
<li><strong>Min</strong>, Sports: Replace ties with their minimum, as typical in sports ranking.</li>
<li><strong>Max</strong>: Replace ties with their maximum.</li>
<li><strong>First</strong>: Permutation with increasing values at each index of ties.</li>
<li><strong>EmpiricalCDF</strong></li>
</ul>
<p><code>Statistics.Ranks(data, definition)</code>
<code>SortedArrayStatistics.Ranks(data, definition)</code>
<code>ArrayStatistics.RanksInplace(data, definition)</code></p>
<pre class="fssnip highlighted"><code lang="fsharp"><span class="id">Statistics</span><span class="pn">.</span><span class="id">Ranks</span><span class="pn">(</span><span onmouseout="hideTip(event, 'fs1', 30)" onmouseover="showTip(event, 'fs1', 30)" class="id">whiteNoise</span><span class="pn">)</span>
<span class="fsi">val it : float [] = [|634.0; 736.0; 405.0; 395.0; 197.0; 167.0; 722.0; 44.0; ...|] </span>
<span class="id">Statistics</span><span class="pn">.</span><span class="id">Ranks</span><span class="pn">(</span><span class="pn">[|</span> <span class="n">13.0</span><span class="pn">;</span> <span class="n">14.0</span><span class="pn">;</span> <span class="n">11.0</span><span class="pn">;</span> <span class="n">12.0</span><span class="pn">;</span> <span class="n">13.0</span> <span class="pn">|]</span><span class="pn">,</span> <span class="id">RankDefinition</span><span class="pn">.</span><span class="id">Average</span><span class="pn">)</span>
<span class="fsi">val it : float [] = [|3.5; 5.0; 1.0; 2.0; 3.5|] </span>
<span class="id">Statistics</span><span class="pn">.</span><span class="id">Ranks</span><span class="pn">(</span><span class="pn">[|</span> <span class="n">13.0</span><span class="pn">;</span> <span class="n">14.0</span><span class="pn">;</span> <span class="n">11.0</span><span class="pn">;</span> <span class="n">12.0</span><span class="pn">;</span> <span class="n">13.0</span> <span class="pn">|]</span><span class="pn">,</span> <span class="id">RankDefinition</span><span class="pn">.</span><span class="id">Sports</span><span class="pn">)</span>
<span class="fsi">val it : float [] = [|3.0; 5.0; 1.0; 2.0; 3.0|] </span>
</code></pre>
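<p>The tie rules listed above can be illustrated with a small self-contained sketch. It is not Math.NET's implementation: <code>RanksSketch</code> is a hypothetical helper that sorts the samples, finds each run of equal values occupying the 1-based sorted positions <code>lo</code> to <code>hi</code>, and lets a tie rule pick the rank shared by the whole run.</p>
<table class="pre"><tr><td class="snippet"><pre class="fssnip highlighted"><code lang="csharp">using System;
using System.Linq;

// Hypothetical helper (not part of Math.NET): ranks with a pluggable tie rule.
// Tied values occupy consecutive 1-based sorted positions lo..hi;
// tieRule(lo, hi) chooses the single rank assigned to all of them.
static double[] RanksSketch(double[] data, Func&lt;int, int, double&gt; tieRule)
{
    // Indices of the samples, ordered by sample value.
    int[] order = Enumerable.Range(0, data.Length).OrderBy(i =&gt; data[i]).ToArray();
    var ranks = new double[data.Length];
    int start = 0;
    while (start &lt; order.Length)
    {
        // Extend the run while the next sorted value equals the current one.
        int end = start;
        while (end + 1 &lt; order.Length &amp;&amp; data[order[end + 1]] == data[order[start]]) end++;
        double rank = tieRule(start + 1, end + 1);
        for (int k = start; k &lt;= end; k++) ranks[order[k]] = rank;
        start = end + 1;
    }
    return ranks;
}

double[] data = { 13.0, 14.0, 11.0, 12.0, 13.0 };
double[] average = RanksSketch(data, (lo, hi) =&gt; (lo + hi) / 2.0); // 3.5, 5, 1, 2, 3.5
double[] min = RanksSketch(data, (lo, hi) =&gt; lo);                  // 3, 5, 1, 2, 3
double[] max = RanksSketch(data, (lo, hi) =&gt; hi);                  // 4, 5, 1, 2, 4
Console.WriteLine(string.Join("; ", average));
</code></pre></td></tr></table>
<p>The two values of 13.0 occupy sorted positions 3 and 4, so the sketch assigns them 3.5 under the Average rule and 3 under the Min rule, which agrees with the <code>Statistics.Ranks</code> output shown above.</p>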
<h4><a name="Quantile-Rank" class="anchor" href="#Quantile-Rank">Quantile Rank</a></h4>
<p>The counterpart of the <code>Quantile</code> function: given a value <span class="math">\(x\)</span>, it estimates the <span class="math">\(\tau\)</span> for which
<span class="math">\(x\)</span> is the <span class="math">\(\tau\)</span>-quantile of the provided samples. The <span class="math">\(\tau\)</span>-quantile is the data value where the cumulative distribution
function crosses <span class="math">\(\tau\)</span>.</p>
<p><code>Statistics.QuantileRank(data, x, definition)</code>
<code>Statistics.QuantileRankFunc(data, definition)</code>
<code>SortedArrayStatistics.QuantileRank(data, x, definition)</code></p>
<pre class="fssnip highlighted"><code lang="fsharp"><span class="id">Statistics</span><span class="pn">.</span><span class="id">QuantileRank</span><span class="pn">(</span><span onmouseout="hideTip(event, 'fs1', 31)" onmouseover="showTip(event, 'fs1', 31)" class="id">whiteNoise</span><span class="pn">,</span> <span class="n">13.0</span><span class="pn">)</span>
<span class="fsi">val it : float = 0.9370045563</span>
<span class="id">Statistics</span><span class="pn">.</span><span class="id">QuantileRank</span><span class="pn">(</span><span onmouseout="hideTip(event, 'fs1', 32)" onmouseover="showTip(event, 'fs1', 32)" class="id">whiteNoise</span><span class="pn">,</span> <span class="n">6.7</span><span class="pn">,</span> <span class="id">RankDefinition</span><span class="pn">.</span><span class="id">Average</span><span class="pn">)</span>
<span class="fsi">val it : float = 0.04960610389</span>
</code></pre>
<h2><a name="Empirical-Distribution-Functions" class="anchor" href="#Empirical-Distribution-Functions">Empirical Distribution Functions</a></h2>
<p><code>Statistics.EmpiricalCDF(data, x)</code>
<code>Statistics.EmpiricalCDFFunc(data)</code>
<code>Statistics.EmpiricalInvCDF(data, tau)</code>
<code>Statistics.EmpiricalInvCDFFunc(data)</code>
<code>SortedArrayStatistics.EmpiricalCDF(data, x)</code></p>
<pre class="fssnip highlighted"><code lang="fsharp"><span class="k">let</span> <span onmouseout="hideTip(event, 'fs4', 33)" onmouseover="showTip(event, 'fs4', 33)" class="id">ecdf</span> <span class="o">=</span> <span class="id">Statistics</span><span class="pn">.</span><span class="id">EmpiricalCDFFunc</span> <span onmouseout="hideTip(event, 'fs1', 34)" onmouseover="showTip(event, 'fs1', 34)" class="id">whiteNoise</span>
<span class="id">Generate</span><span class="pn">.</span><span class="id">LinearSpacedMap</span><span class="pn">(</span><span class="n">20</span><span class="pn">,</span> <span class="id">start</span><span class="o">=</span><span class="n">3.0</span><span class="pn">,</span> <span class="id">stop</span><span class="o">=</span><span class="n">17.0</span><span class="pn">,</span> <span class="id">map</span><span class="o">=</span><span onmouseout="hideTip(event, 'fs4', 35)" onmouseover="showTip(event, 'fs4', 35)" class="id">ecdf</span><span class="pn">)</span>
<span class="fsi">val it : float [] =</span>
<span class="fsi"> [|0.0; 0.001; 0.002; 0.005; 0.022; 0.05; 0.094; 0.172; 0.278; 0.423; 0.555; </span>
<span class="fsi"> 0.705; 0.843; 0.921; 0.944; 0.983; 0.992; 0.997; 0.999; 1.0|] </span>
<span class="k">let</span> <span onmouseout="hideTip(event, 'fs5', 36)" onmouseover="showTip(event, 'fs5', 36)" class="fn">eicdf</span> <span class="o">=</span> <span class="id">Statistics</span><span class="pn">.</span><span class="id">EmpiricalInvCDFFunc</span> <span onmouseout="hideTip(event, 'fs1', 37)" onmouseover="showTip(event, 'fs1', 37)" class="id">whiteNoise</span>
<span class="pn">[</span> <span class="k">for</span> <span onmouseout="hideTip(event, 'fs6', 38)" onmouseover="showTip(event, 'fs6', 38)" class="fn">tau</span> <span class="k">in</span> <span class="n">0.0</span><span class="o">..</span><span class="n">0.05</span><span class="o">..</span><span class="n">1.0</span> <span class="k">-&gt;</span> <span onmouseout="hideTip(event, 'fs5', 39)" onmouseover="showTip(event, 'fs5', 39)" class="fn">eicdf</span> <span onmouseout="hideTip(event, 'fs6', 40)" onmouseover="showTip(event, 'fs6', 40)" class="fn">tau</span> <span class="pn">]</span>
<span class="fsi">val it : float list =</span>
<span class="fsi"> [3.633070184; 6.682142043; 7.520000817; 8.040513497; 8.347587493; </span>
<span class="fsi"> 8.645491746; 9.02681611; 9.298987151; 9.522627142; 9.819352699; 10.11872428; </span>
<span class="fsi"> 10.35991046; 10.57530906; 10.8259542; 11.08605473; 11.33170746; 11.54356436; </span>
<span class="fsi"> 11.90973541; 12.4294346; 13.36889423; 16.65183566] </span>
</code></pre>
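<p>As a point of reference, the textbook empirical CDF is the fraction of samples that do not exceed <span class="math">\(x\)</span>, that is <span class="math">\(\hat{F}_n(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}[x_i \le x]\)</span>. The following minimal sketch implements that definition directly. <code>EmpiricalCdfSketch</code> is a hypothetical helper, not a Math.NET function, and it is an assumption that the library treats ties and boundary values in exactly the same way.</p>
<table class="pre"><tr><td class="snippet"><pre class="fssnip highlighted"><code lang="csharp">using System;
using System.Linq;

// Hypothetical helper (not part of Math.NET): textbook empirical CDF,
// the fraction of samples that are less than or equal to x.
static double EmpiricalCdfSketch(double[] samples, double x)
{
    return samples.Count(s =&gt; s &lt;= x) / (double)samples.Length;
}

double[] samples = { 2.0, 4.0, 4.0, 7.0 };
double below = EmpiricalCdfSketch(samples, 1.0);  // 0:    no sample is at or below 1
double middle = EmpiricalCdfSketch(samples, 4.0); // 0.75: three of four samples are at or below 4
double above = EmpiricalCdfSketch(samples, 9.0);  // 1:    every sample is at or below 9
Console.WriteLine($"{below} {middle} {above}");
</code></pre></td></tr></table>
<p>The resulting step function rises from 0 to 1, consistent with the first and last values of the <code>ecdf</code> output above.</p>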
<h2><a name="Histograms" class="anchor" href="#Histograms">Histograms</a></h2>
<p>A histogram can be computed using the <a href="https://numerics.mathdotnet.com/api/MathNet.Numerics.Statistics/Histogram.htm">Histogram</a> class. Its constructor takes
an enumerable of samples, the number of buckets to create, and optionally the range
(minimum, maximum) of the sample data if it is known.</p>
<table class="pre"><tr><td class="snippet"><pre class="fssnip highlighted"><code lang="csharp"><span class="k">var</span> histogram <span class="o">=</span> <span class="k">new</span> Histogram(samples, <span class="n">10</span>);
<span class="k">var</span> bucket<span class="n">3</span>count <span class="o">=</span> histogram[<span class="n">2</span>].Count;
</code></pre></td></tr></table>
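<p>To make the roles of the three constructor inputs concrete, here is a rough sketch of equal-width bucketing over an explicit range. <code>BucketCountsSketch</code> is a hypothetical helper rather than the <code>Histogram</code> class itself. It assumes every sample lies within the given range, and its boundary convention (a value on an inner boundary goes to the higher bucket) is an assumption that may differ from Math.NET's.</p>
<table class="pre"><tr><td class="snippet"><pre class="fssnip highlighted"><code lang="csharp">using System;

// Hypothetical helper (not part of Math.NET): counts samples in equal-width
// buckets over [lower, upper]. Assumes every sample lies inside that range.
static int[] BucketCountsSketch(double[] samples, int buckets, double lower, double upper)
{
    var counts = new int[buckets];
    double width = (upper - lower) / buckets;
    foreach (double x in samples)
    {
        // A sample equal to the upper bound is placed in the last bucket.
        int index = Math.Min((int)((x - lower) / width), buckets - 1);
        counts[index]++;
    }
    return counts;
}

double[] samples = { 1.0, 2.0, 2.0, 3.0, 9.0 };
int[] counts = BucketCountsSketch(samples, 4, 0.0, 10.0);
Console.WriteLine(string.Join(", ", counts)); // 3, 1, 0, 1
</code></pre></td></tr></table>
<p>Indices are zero-based, so <code>counts[2]</code> in the sketch plays the same role as <code>histogram[2].Count</code> above: both refer to the third bucket.</p>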
<h2><a name="Correlation" class="anchor" href="#Correlation">Correlation</a></h2>
<p>The <code>Correlation</code> class supports computing Pearson's product-moment and Spearman's rank
correlation coefficients, as well as their correlation matrices for a set of vectors.</p>
<p>Code Sample: Computing the correlation coefficient of 1000 samples of <span class="math">\(f(x) = 2x\)</span> and <span class="math">\(g(x) = x^2\)</span>:</p>
<table class="pre"><tr><td class="snippet"><pre class="fssnip highlighted"><code lang="csharp"><span class="k">double</span>[] dataF <span class="o">=</span> Generate.LinearSpacedMap(<span class="n">1000</span>, <span class="n">0</span>, <span class="n">100</span>, x <span class="o">=</span><span class="o">&gt;</span> <span class="n">2</span>*x);
<span class="k">double</span>[] dataG <span class="o">=</span> Generate.LinearSpacedMap(<span class="n">1000</span>, <span class="n">0</span>, <span class="n">100</span>, x <span class="o">=</span><span class="o">&gt;</span> x*x);
<span class="k">double</span> correlation <span class="o">=</span> Correlation.Pearson(dataF, dataG);
</code></pre></td></tr></table>
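<p>Pearson's coefficient is the covariance of the two samples divided by the product of their standard deviations: <span class="math">\(r = \frac{\sum_i (f_i - \bar{f})(g_i - \bar{g})}{\sqrt{\sum_i (f_i - \bar{f})^2 \, \sum_i (g_i - \bar{g})^2}}\)</span>. The sketch below evaluates this formula for the same <span class="math">\(f\)</span> and <span class="math">\(g\)</span> without the library. <code>PearsonSketch</code> is a hypothetical helper, and the sample points are rebuilt by hand on the assumption that they are 1000 evenly spaced values on [0, 100] with both endpoints included.</p>
<table class="pre"><tr><td class="snippet"><pre class="fssnip highlighted"><code lang="csharp">using System;
using System.Linq;

// Hypothetical helper (not part of Math.NET): Pearson's r from its definition.
static double PearsonSketch(double[] a, double[] b)
{
    double meanA = a.Average(), meanB = b.Average();
    double cov = 0.0, varA = 0.0, varB = 0.0;
    for (int i = 0; i &lt; a.Length; i++)
    {
        double da = a[i] - meanA, db = b[i] - meanB;
        cov += da * db;
        varA += da * da;
        varB += db * db;
    }
    // The normalisation factors of covariance and variance cancel,
    // so the raw sums are sufficient.
    return cov / Math.Sqrt(varA * varB);
}

// Assumption: 1000 evenly spaced points on [0, 100], endpoints included.
double[] xs = Enumerable.Range(0, 1000).Select(i =&gt; 100.0 * i / 999).ToArray();
double[] dataF = xs.Select(x =&gt; 2 * x).ToArray();
double[] dataG = xs.Select(x =&gt; x * x).ToArray();
double r = PearsonSketch(dataF, dataG);    // roughly 0.968
double linear = PearsonSketch(dataF, xs);  // 1 up to rounding, since f is linear in x
Console.WriteLine($"{r:F3} {linear:F3}");
</code></pre></td></tr></table>
<p>Pearson's coefficient stays below 1 here because <span class="math">\(g\)</span> is not a linear function of <span class="math">\(f\)</span>. Spearman's coefficient is Pearson's coefficient applied to the ranks, and both series increase strictly with <span class="math">\(x\)</span>, so their ranks coincide and the rank correlation is 1.</p>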
<div class="fsdocs-tip" id="fs1">val whiteNoise : obj</div>
<div class="fsdocs-tip" id="fs2">val wave : obj</div>
<div class="fsdocs-tip" id="fs3">val os : (int -&gt; obj)</div>
<div class="fsdocs-tip" id="fs4">val ecdf : obj</div>
<div class="fsdocs-tip" id="fs5">val eicdf : (float -&gt; obj)</div>
<div class="fsdocs-tip" id="fs6">val tau : float</div>
</div>
<!-- BEGIN SEARCH BOX: this adds support for the search box -->
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/JavaScript-autoComplete/1.0.4/auto-complete.css" />
<script type="text/javascript">var fsdocs_search_baseurl = 'https://numerics.mathdotnet.com/';</script>
<script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/lunr.js/2.3.8/lunr.min.js"></script>
<script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/JavaScript-autoComplete/1.0.4/auto-complete.min.js"></script>
<script type="text/javascript" src="https://numerics.mathdotnet.com/content/fsdocs-search.js"></script>
<!-- END SEARCH BOX: this adds support for the search box -->
</div>
</body>
</html>