<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Descriptive Statistics</title>
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<meta name="author" content="Christoph Ruegg, Marcus Cuda, Jurgen Van Gael">

<link rel="stylesheet" id="theme_link" href="https://cdnjs.cloudflare.com/ajax/libs/bootswatch/4.6.0/materia/bootstrap.min.css">
<script src="https://code.jquery.com/jquery-3.4.1.min.js"></script>
<script src="https://cdn.jsdelivr.net/npm/bootstrap@4.6.0/dist/js/bootstrap.bundle.min.js" integrity="sha384-Piv4xVNRyMGpqkS2by6br4gNJ7DXjqk09RmUpJ8jgGtD7zP9yug3goQfGII0yAns" crossorigin="anonymous"></script>

<script type="text/javascript" async src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.7/MathJax.js?config=TeX-MML-AM_CHTML"></script>

<link rel="shortcut icon" type="image/x-icon" href="/favicon.ico">
<link type="text/css" rel="stylesheet" href="https://numerics.mathdotnet.com/content/navbar-fixed-left.css" />
<link type="text/css" rel="stylesheet" href="https://numerics.mathdotnet.com/content/fsdocs-default.css" />
<link type="text/css" rel="stylesheet" href="https://numerics.mathdotnet.com/content/fsdocs-custom.css" />
<script type="text/javascript" src="https://numerics.mathdotnet.com/content/fsdocs-tips.js"></script>
<!-- HTML5 shim, for IE6-8 support of HTML5 elements -->
<!--[if lt IE 9]>
<script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script>
<![endif]-->
<!-- BEGIN SEARCH BOX: this adds support for the search box -->
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/JavaScript-autoComplete/1.0.4/auto-complete.css" />
<!-- END SEARCH BOX: this adds support for the search box -->

</head>

<body>
<nav class="navbar navbar-expand-md navbar-light bg-secondary fixed-left" id="fsdocs-nav">
<button class="navbar-toggler" type="button" data-toggle="collapse" data-target="#navbarsExampleDefault" aria-controls="navbarsExampleDefault" aria-expanded="false" aria-label="Toggle navigation">
<span class="navbar-toggler-icon"></span>
</button>
<div class="collapse navbar-collapse navbar-nav-scroll" id="navbarsExampleDefault">
<a href="https://numerics.mathdotnet.com/"><img id="fsdocs-logo" src="/logo.png" /></a>
<!-- BEGIN SEARCH BOX: this adds support for the search box -->
<div id="header">
<div class="searchbox" id="fsdocs-searchbox">
<label for="search-by">
<i class="fas fa-search"></i>
</label>
<input data-search-input="" id="search-by" type="search" placeholder="Search..." />
<span data-search-clear="">
<i class="fas fa-times"></i>
</span>
</div>
</div>
<!-- END SEARCH BOX: this adds support for the search box -->
<ul class="navbar-nav">
<li class="nav-header">Math.NET Numerics</li>
<li class="nav-item"><a class="nav-link" href="Packages.html">NuGet &amp; Binaries</a></li>
<li class="nav-item"><a class="nav-link" href="ReleaseNotes.html">Release Notes</a></li>
<li class="nav-item"><a class="nav-link" href="https://github.com/mathnet/mathnet-numerics/blob/master/LICENSE.md">MIT License</a></li>
<li class="nav-item"><a class="nav-link" href="Compatibility.html">Platform Support</a></li>
<li class="nav-item"><a class="nav-link" href="https://numerics.mathdotnet.com/api/">Class Reference</a></li>
<li class="nav-item"><a class="nav-link" href="https://github.com/mathnet/mathnet-numerics/issues">Issues &amp; Bugs</a></li>
<li class="nav-item"><a class="nav-link" href="Users.html">Who is using Math.NET?</a></li>

<li class="nav-header">Contributing</li>
<li class="nav-item"><a class="nav-link" href="Contributors.html">Contributors</a></li>
<li class="nav-item"><a class="nav-link" href="Contributing.html">Contributing</a></li>
<li class="nav-item"><a class="nav-link" href="Build.html">Build &amp; Tools</a></li>
<li class="nav-item"><a class="nav-link" href="https://github.com/mathnet/mathnet-numerics/discussions/categories/ideas">Your Ideas</a></li>

<li class="nav-header">Getting Help</li>
<li class="nav-item"><a class="nav-link" href="https://discuss.mathdotnet.com/c/numerics">Discuss</a></li>
<li class="nav-item"><a class="nav-link" href="https://stackoverflow.com/questions/tagged/mathdotnet">Stack Overflow</a></li>

<li class="nav-header">Getting Started</li>
<li class="nav-item"><a class="nav-link" href="/">Getting started</a></li>
<li class="nav-item"><a class="nav-link" href="Constants.html">Constants</a></li>
<li class="nav-item"><a class="nav-link" href="Matrix.html">Matrices and Vectors</a></li>
<li class="nav-item"><a class="nav-link" href="Euclid.html">Euclid &amp; Number Theory</a></li>
<li class="nav-item">Combinatorics</li>

<li class="nav-header">Evaluation</li>
<li class="nav-item"><a class="nav-link" href="Functions.html">Special Functions</a></li>
<li class="nav-item"><a class="nav-link" href="Integration.html">Integration</a></li>

<li class="nav-header">Statistics/Probability</li>
<li class="nav-item"><a class="nav-link" href="DescriptiveStatistics.html">Descriptive Statistics</a></li>
<li class="nav-item"><a class="nav-link" href="Probability.html">Probability Distributions</a></li>

<li class="nav-header">Generation</li>
<li class="nav-item"><a class="nav-link" href="Generate.html">Generating Data</a></li>
<li class="nav-item"><a class="nav-link" href="Random.html">Random Numbers</a></li>

<li class="nav-header">Solving Equations</li>
<li class="nav-item"><a class="nav-link" href="LinearEquations.html">Linear Equation Systems</a></li>

<li class="nav-header">Optimization</li>
<li class="nav-item"><a class="nav-link" href="Distance.html">Distance Metrics</a></li>

<li class="nav-header">Curve Fitting</li>
<li class="nav-item"><a class="nav-link" href="Regression.html">Regression</a></li>

<li class="nav-header">Native Providers</li>
<li class="nav-item"><a class="nav-link" href="MKL.html">Intel MKL</a></li>

<li class="nav-header">Working Together</li>
<li class="nav-item"><a class="nav-link" href="CSV.html">Delimited Text Files (CSV)</a></li>
<li class="nav-item"><a class="nav-link" href="MatrixMarket.html">NIST MatrixMarket</a></li>
<li class="nav-item"><a class="nav-link" href="MatlabFiles.html">MATLAB</a></li>
<li class="nav-item"><a class="nav-link" href="IFSharpNotebook.html">IF# Notebook</a></li>
</ul>
</div>
</nav>

<div class="container">
<div class="masthead">
<h3 class="muted">
<a href="https://numerics.mathdotnet.com">Math.NET Numerics</a> |
<a href="https://www.mathdotnet.com">Math.NET Project</a> |
<a href="https://github.com/mathnet/mathnet-numerics">GitHub</a>
</h3>
</div>
<hr />
<div class="container" id="fsdocs-content">
<h1><a name="Descriptive-Statistics" class="anchor" href="#Descriptive-Statistics">Descriptive Statistics</a></h1>
<h2><a name="Initialization" class="anchor" href="#Initialization">Initialization</a></h2>
<p>We need to reference Math.NET Numerics and open the statistics namespace. The C# samples below also use LINQ and the probability distributions namespace:</p>
<table class="pre"><tr><td class="snippet"><pre class="fssnip highlighted"><code lang="csharp"><span class="k">using</span> System.Linq;
<span class="k">using</span> MathNet.Numerics.Distributions;
<span class="k">using</span> MathNet.Numerics.Statistics;
</code></pre></td></tr></table>

<h2><a name="Univariate-Statistical-Analysis" class="anchor" href="#Univariate-Statistical-Analysis">Univariate Statistical Analysis</a></h2>
<p>The primary class for statistical analysis is <code>Statistics</code>, which provides common
descriptive statistics as static extension methods on <code>IEnumerable&lt;double&gt;</code> sequences.
However, various statistics can be computed much more efficiently if the data source
has known properties or structure, so the following classes provide specialized
static implementations:</p>
<ul>
<li>
<strong>ArrayStatistics</strong> provides routines optimized for single-dimensional arrays. Some
of these routines end with the <code>Inplace</code> suffix, indicating that they partially reorder the
input array towards sorted order during execution, without sorting it fully (which could be expensive).
</li>
<li>
<strong>SortedArrayStatistics</strong> provides routines optimized for arrays sorted in ascending order.
Order statistics in particular are very efficient this way, some even with constant time complexity.
</li>
<li>
<strong>StreamingStatistics</strong> processes large amounts of data without keeping them in memory.
This is useful if data larger than local memory is streamed directly from a disk or network.
</li>
</ul>

<p>Another alternative, in case you need to gather a whole set of statistical characteristics
in one pass, is provided by the <code>DescriptiveStatistics</code> class:</p>
<table class="pre"><tr><td class="snippet"><pre class="fssnip highlighted"><code lang="csharp"><span class="c">// Materialize the samples once: Samples() generates random values lazily,</span>
<span class="c">// so enumerating the sequence twice would yield two different data sets.</span>
<span class="k">var</span> samples <span class="o">=</span> <span class="k">new</span> ChiSquare(<span class="n">5</span>).Samples().Take(<span class="n">1000</span>).ToArray();
<span class="k">var</span> statistics <span class="o">=</span> <span class="k">new</span> DescriptiveStatistics(samples);

<span class="c">// Order statistics</span>
<span class="k">var</span> largestElement <span class="o">=</span> statistics.Maximum;
<span class="k">var</span> smallestElement <span class="o">=</span> statistics.Minimum;
<span class="c">// The median requires sorting, so it is taken from the Statistics extension method</span>
<span class="k">var</span> median <span class="o">=</span> samples.Median();

<span class="c">// Central tendency and dispersion</span>
<span class="k">var</span> mean <span class="o">=</span> statistics.Mean;
<span class="k">var</span> variance <span class="o">=</span> statistics.Variance;
<span class="k">var</span> stdDev <span class="o">=</span> statistics.StandardDeviation;

<span class="c">// Shape of the distribution</span>
<span class="k">var</span> kurtosis <span class="o">=</span> statistics.Kurtosis;
<span class="k">var</span> skewness <span class="o">=</span> statistics.Skewness;
</code></pre></td></tr></table>

<h2><a name="Minimum-amp-Maximum" class="anchor" href="#Minimum-amp-Maximum">Minimum &amp; Maximum</a></h2>
<p>The minimum and maximum values of a sample set can be evaluated with the <code>Minimum</code> and <code>Maximum</code>
functions of all four classes: <code>Statistics</code>, <code>ArrayStatistics</code>, <code>SortedArrayStatistics</code>
and <code>StreamingStatistics</code>. The <code>SortedArrayStatistics</code> version is the fastest, with constant
time complexity, but expects the array to be sorted in ascending order.</p>
<p>Both minimum and maximum are directly affected by outliers and are therefore not robust statistics at all.
For a more robust alternative, consider using quantiles instead.</p>
<table class="pre"><tr><td class="snippet"><pre class="fssnip highlighted"><code lang="csharp"><span class="k">var</span> samples <span class="o">=</span> <span class="k">new</span> ChiSquare(<span class="n">5</span>).Samples().Take(<span class="n">1000</span>).ToArray();
<span class="k">var</span> largestElement <span class="o">=</span> samples.Maximum();
<span class="k">var</span> smallestElement <span class="o">=</span> samples.Minimum();
</code></pre></td></tr></table>
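<p>The constant-time claim for sorted data can be illustrated with a minimal F# sketch that does not use Math.NET: on data in arbitrary order the extremes need a full scan, while on an array already sorted in ascending order they are simply the first and last elements. The function names <code>minMaxScan</code> and <code>minMaxSorted</code> are hypothetical, and the library's own implementations may differ.</p>
<pre class="fssnip highlighted"><code lang="fsharp">// O(N): one pass over data in arbitrary order.
let minMaxScan (xs: float[]) =
    Array.fold (fun (lo, hi) x -> (min lo x, max hi x)) (infinity, -infinity) xs

// O(1): for data sorted in ascending order the extremes are the first and
// last elements (assumes a non-empty array).
let minMaxSorted (sorted: float[]) =
    (sorted.[0], sorted.[sorted.Length - 1])

let unsortedData = [| 4.0; -1.5; 9.25; 3.0 |]
printfn "%A" (minMaxScan unsortedData)               // (-1.5, 9.25)
printfn "%A" (minMaxSorted (Array.sort unsortedData)) // (-1.5, 9.25)
</code></pre>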

<h2><a name="Mean" class="anchor" href="#Mean">Mean</a></h2>
<p>The <em>arithmetic mean</em> or <em>average</em> of the provided samples. In statistics, the sample mean is
a measure of central tendency and estimates the expected value of the distribution.
The mean is affected by outliers, so if you need a more robust estimate, consider using the median instead.</p>
<p><code>Statistics.Mean(data)</code>
<code>StreamingStatistics.Mean(stream)</code>
<code>ArrayStatistics.Mean(data)</code></p>
<p><span class="math">\[\overline{x} = \frac{1}{N}\sum_{i=1}^N x_i\]</span></p>
<pre class="fssnip highlighted"><code lang="fsharp"><span class="k">open</span> <span class="id">MathNet</span><span class="pn">.</span><span class="id">Numerics</span>
<span class="k">open</span> <span class="id">MathNet</span><span class="pn">.</span><span class="id">Numerics</span><span class="pn">.</span><span class="id">Statistics</span>

<span class="k">let</span> <span onmouseout="hideTip(event, 'fs1', 1)" onmouseover="showTip(event, 'fs1', 1)" class="id">whiteNoise</span> <span class="o">=</span> <span class="id">Generate</span><span class="pn">.</span><span class="id">Normal</span><span class="pn">(</span><span class="n">1000</span><span class="pn">,</span> <span class="id">mean</span><span class="o">=</span><span class="n">10.0</span><span class="pn">,</span> <span class="id">standardDeviation</span><span class="o">=</span><span class="n">2.0</span><span class="pn">)</span>
<span class="fsi">val whiteNoise : float [] = [|12.90021939; 9.631515037; 7.810008046; 14.13301053; ...|] </span>
<span class="id">Statistics</span><span class="pn">.</span><span class="id">Mean</span> <span onmouseout="hideTip(event, 'fs1', 2)" onmouseover="showTip(event, 'fs1', 2)" class="id">whiteNoise</span>
<span class="fsi">val it : float = 10.02162347</span>

<span class="k">let</span> <span onmouseout="hideTip(event, 'fs2', 3)" onmouseover="showTip(event, 'fs2', 3)" class="id">wave</span> <span class="o">=</span> <span class="id">Generate</span><span class="pn">.</span><span class="id">Sinusoidal</span><span class="pn">(</span><span class="n">1000</span><span class="pn">,</span> <span class="id">samplingRate</span><span class="o">=</span><span class="n">100.</span><span class="pn">,</span> <span class="id">frequency</span><span class="o">=</span><span class="n">5.</span><span class="pn">,</span> <span class="id">amplitude</span><span class="o">=</span><span class="n">0.5</span><span class="pn">)</span>
<span class="id">Statistics</span><span class="pn">.</span><span class="id">Mean</span> <span onmouseout="hideTip(event, 'fs2', 4)" onmouseover="showTip(event, 'fs2', 4)" class="id">wave</span>
<span class="fsi">val it : float = -4.133520783e-17</span>
</code></pre>
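<p>The formula above translates directly into code. The following minimal F# sketch, independent of Math.NET (<code>mean</code> is a hypothetical helper), also shows how a single outlier shifts the mean, which is why the median is suggested as a more robust alternative.</p>
<pre class="fssnip highlighted"><code lang="fsharp">// Arithmetic mean: the sum of all samples divided by their count N.
// For an empty array this evaluates 0.0 / 0.0 and returns NaN.
let mean (xs: float[]) = Array.sum xs / float xs.Length

let typical = [| 9.0; 10.0; 11.0; 10.0 |]
let withOutlier = Array.append typical [| 100.0 |]

printfn "%f" (mean typical)     // 10.000000
printfn "%f" (mean withOutlier) // 28.000000: one outlier moves the mean far from 10
</code></pre>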

<h2><a name="Variance-and-Standard-Deviation" class="anchor" href="#Variance-and-Standard-Deviation">Variance and Standard Deviation</a></h2>
<p>The variance <span class="math">\(\sigma^2\)</span> and the standard deviation <span class="math">\(\sigma\)</span> measure how far the samples are spread out.</p>
<p>If the whole population is available, the functions with the <code>Population</code> prefix
evaluate the respective measures with an <span class="math">\(N\)</span> normalizer for a population of size <span class="math">\(N\)</span>.</p>
<p><code>Statistics.PopulationVariance(population)</code>
<code>Statistics.PopulationStandardDeviation(population)</code></p>
<p><span class="math">\[\sigma^2 = \frac{1}{N}\sum_{i=1}^N (x_i - \mu)^2\]</span></p>
<p>On the other hand, if only a sample of the full population is available, the functions
without the <code>Population</code> prefix estimate the population measures by applying
Bessel's correction, with an <span class="math">\(N-1\)</span> normalizer for a sample set of size <span class="math">\(N\)</span>.
This makes the variance estimate unbiased; the standard deviation, being its square root, remains slightly biased.</p>
<p><code>Statistics.Variance(samples)</code>
<code>Statistics.StandardDeviation(samples)</code></p>
<p><span class="math">\[s^2 = \frac{1}{N-1}\sum_{i=1}^N (x_i - \overline{x})^2\]</span></p>
<pre class="fssnip highlighted"><code lang="fsharp"><span class="id">Statistics</span><span class="pn">.</span><span class="id">Variance</span> <span onmouseout="hideTip(event, 'fs1', 5)" onmouseover="showTip(event, 'fs1', 5)" class="id">whiteNoise</span>
<span class="fsi">val it : float = 3.819436094</span>
<span class="id">Statistics</span><span class="pn">.</span><span class="id">StandardDeviation</span> <span onmouseout="hideTip(event, 'fs1', 6)" onmouseover="showTip(event, 'fs1', 6)" class="id">whiteNoise</span>
<span class="fsi">val it : float = 1.954337764</span>

<span class="id">Statistics</span><span class="pn">.</span><span class="id">Variance</span> <span onmouseout="hideTip(event, 'fs2', 7)" onmouseover="showTip(event, 'fs2', 7)" class="id">wave</span>
<span class="fsi">val it : float = 0.1251251251</span>
</code></pre>
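<p>To make the difference between the two normalizers concrete, here is a minimal F# sketch of both formulas, independent of Math.NET (the helper names are hypothetical). On the eight values below, the sum of squared deviations is 32; the population variance divides it by 8 and the sample variance by 7.</p>
<pre class="fssnip highlighted"><code lang="fsharp">// Sum of squared deviations from the arithmetic mean of the data.
let sumSquaredDeviations (xs: float[]) =
    let m = Array.average xs
    xs |> Array.sumBy (fun x -> let d = x - m in d * d)

// Population variance: N normalizer.
let populationVariance (xs: float[]) =
    sumSquaredDeviations xs / float xs.Length

// Sample variance: Bessel's correction, N - 1 normalizer.
let sampleVariance (xs: float[]) =
    sumSquaredDeviations xs / float (xs.Length - 1)

let spread = [| 2.0; 4.0; 4.0; 4.0; 5.0; 5.0; 7.0; 9.0 |]
printfn "%f" (populationVariance spread)    // 4.000000 (= 32 / 8)
printfn "%f" (sampleVariance spread)        // 4.571429 (= 32 / 7)
printfn "%f" (sqrt (sampleVariance spread)) // sample standard deviation
</code></pre>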

<h4><a name="Combined-Routines" class="anchor" href="#Combined-Routines">Combined Routines</a></h4>
<p>Since mean and variance are often needed together, there are routines
that evaluate both in a single pass:</p>
<p><code>Statistics.MeanVariance(samples)</code>
<code>ArrayStatistics.MeanVariance(samples)</code>
<code>StreamingStatistics.MeanVariance(samples)</code></p>
<pre class="fssnip highlighted"><code lang="fsharp"><span class="id">Statistics</span><span class="pn">.</span><span class="id">MeanVariance</span> <span onmouseout="hideTip(event, 'fs1', 8)" onmouseover="showTip(event, 'fs1', 8)" class="id">whiteNoise</span>
<span class="fsi">val it : float * float = (10.02162347, 3.819436094)</span>
</code></pre>
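<p>One well-known way to obtain both values in a single pass, which also works for streamed data, is Welford's online algorithm. The following F# sketch illustrates that technique, assuming the sample (<span class="math">\(N-1\)</span>) variance is wanted; it is not necessarily how Math.NET implements <code>MeanVariance</code>, and the names <code>welfordStep</code> and <code>meanVariance</code> are hypothetical.</p>
<pre class="fssnip highlighted"><code lang="fsharp">// One update step: the state is (count, running mean, sum of squared deviations m2).
let welfordStep (n: int, mean: float, m2: float) (x: float) =
    let n' = n + 1
    let delta = x - mean
    let mean' = mean + delta / float n'
    (n', mean', m2 + delta * (x - mean'))

// Visits every sample exactly once, so it also works on a lazy or streamed seq.
let meanVariance (xs: float seq) =
    let n, mean, m2 = Seq.fold welfordStep (0, 0.0, 0.0) xs
    // Sample variance with Bessel's correction; NaN for fewer than two samples.
    let variance = if n > 1 then m2 / float (n - 1) else nan
    (mean, variance)

let wm, wv = meanVariance [| 2.0; 4.0; 4.0; 4.0; 5.0; 5.0; 7.0; 9.0 |]
printfn "%f %f" wm wv // 5.000000 4.571429
</code></pre>
<p>Updating the mean incrementally instead of accumulating a raw sum of squares avoids the cancellation problems of the naive one-pass formula.</p>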

<h2><a name="Covariance" class="anchor" href="#Covariance">Covariance</a></h2>
<p>The sample covariance is an estimate of the covariance, a measure of how much two random
variables change together. As with the variance above, there are two versions: one applies Bessel's
correction to remove the bias for sample data, the other uses an <span class="math">\(N\)</span> normalizer for a full population.</p>
<p><code>Statistics.Covariance(samples1, samples2)</code></p>
<p><span class="math">\[q = \frac{1}{N-1}\sum_{i=1}^N (x_i - \overline{x})(y_i - \overline{y})\]</span></p>
<p><code>Statistics.PopulationCovariance(population1, population2)</code></p>
<p><span class="math">\[q = \frac{1}{N}\sum_{i=1}^N (x_i - \mu_x)(y_i - \mu_y)\]</span></p>
<pre class="fssnip highlighted"><code lang="fsharp"><span class="id">Statistics</span><span class="pn">.</span><span class="id">Covariance</span><span class="pn">(</span><span onmouseout="hideTip(event, 'fs1', 9)" onmouseover="showTip(event, 'fs1', 9)" class="id">whiteNoise</span><span class="pn">,</span> <span onmouseout="hideTip(event, 'fs1', 10)" onmouseover="showTip(event, 'fs1', 10)" class="id">whiteNoise</span><span class="pn">)</span>
<span class="fsi">val it : float = 3.819436094</span>
<span class="id">Statistics</span><span class="pn">.</span><span class="id">Covariance</span><span class="pn">(</span><span onmouseout="hideTip(event, 'fs1', 11)" onmouseover="showTip(event, 'fs1', 11)" class="id">whiteNoise</span><span class="pn">,</span> <span onmouseout="hideTip(event, 'fs2', 12)" onmouseover="showTip(event, 'fs2', 12)" class="id">wave</span><span class="pn">)</span>
<span class="fsi">val it : float = 0.04397985084</span>
</code></pre>
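<p>The two covariance formulas differ only in the normalizer. The following minimal F# sketch, independent of Math.NET and with hypothetical helper names, computes both. It also shows that the covariance of a series with itself equals its variance, which is why <code>Covariance(whiteNoise, whiteNoise)</code> above returns the same value as <code>Variance whiteNoise</code>.</p>
<pre class="fssnip highlighted"><code lang="fsharp">// Sum of the products of paired deviations from the respective means.
let crossDeviationSum (xs: float[]) (ys: float[]) =
    if not (xs.Length = ys.Length) then
        invalidArg "ys" "Both sample arrays must have the same length."
    let mx = Array.average xs
    let my = Array.average ys
    Array.fold2 (fun acc x y -> acc + (x - mx) * (y - my)) 0.0 xs ys

// Sample covariance: N - 1 normalizer (Bessel's correction).
let sampleCovariance (xs: float[]) ys = crossDeviationSum xs ys / float (xs.Length - 1)

// Population covariance: N normalizer.
let populationCovariance (xs: float[]) ys = crossDeviationSum xs ys / float xs.Length

let cx = [| 1.0; 2.0; 3.0; 4.0 |]
let cy = [| 2.0; 4.0; 6.0; 8.0 |]
printfn "%f" (sampleCovariance cx cy)     // 3.333333 (= 10 / 3)
printfn "%f" (populationCovariance cx cy) // 2.500000 (= 10 / 4)
printfn "%f" (sampleCovariance cx cx)     // 1.666667, the sample variance of cx
</code></pre>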

<h2><a name="Order-Statistics" class="anchor" href="#Order-Statistics">Order Statistics</a></h2>
<h4><a name="Order-Statistic" class="anchor" href="#Order-Statistic">Order Statistic</a></h4>
<p>The k-th order statistic of a sample set is the k-th smallest value. Note that,
as an exception to most of Math.NET Numerics, the order k is one-based, meaning
the smallest value is the order statistic of order 1 (there is no order 0).</p>
<p><code>Statistics.OrderStatistic(data, order)</code>
<code>SortedArrayStatistics.OrderStatistic(data, order)</code></p>
<p>If the samples are sorted in ascending order, this is trivial and can be evaluated in constant time,
which is what the <code>SortedArrayStatistics</code> implementation does.</p>
<p>If you have the samples in an array that is not (guaranteed to be) sorted,
and it is acceptable for the array to be reordered incrementally towards sorted order over multiple calls,
you can also use the following in-place implementation. It is usually faster
than fully sorting the array, unless you need to compute more than a handful of orders.</p>
<p><code>ArrayStatistics.OrderStatisticInplace(data, order)</code></p>
<p>For convenience there is also an option that returns a function <code>Func&lt;int, double&gt;</code>,
mapping from order to the resulting order statistic. Internally it sorts a copy of the
provided data and then uses the efficient sorted-array algorithms on each invocation:</p>
<p><code>Statistics.OrderStatisticFunc(data)</code></p>
<p>Such Inplace and Func variants are a common pattern throughout the Statistics class
and the rest of the library.</p>
<pre class="fssnip highlighted"><code lang="fsharp"><span class="id">Statistics</span><span class="pn">.</span><span class="id">OrderStatistic</span><span class="pn">(</span><span onmouseout="hideTip(event, 'fs1', 13)" onmouseover="showTip(event, 'fs1', 13)" class="id">whiteNoise</span><span class="pn">,</span> <span class="n">1</span><span class="pn">)</span>
<span class="fsi">val it : float = 3.633070184</span>
<span class="id">Statistics</span><span class="pn">.</span><span class="id">OrderStatistic</span><span class="pn">(</span><span onmouseout="hideTip(event, 'fs1', 14)" onmouseover="showTip(event, 'fs1', 14)" class="id">whiteNoise</span><span class="pn">,</span> <span class="n">1000</span><span class="pn">)</span>
<span class="fsi">val it : float = 16.65183566</span>

<span class="k">let</span> <span onmouseout="hideTip(event, 'fs3', 15)" onmouseover="showTip(event, 'fs3', 15)" class="fn">os</span> <span class="o">=</span> <span class="id">Statistics</span><span class="pn">.</span><span class="id">orderStatisticFunc</span> <span onmouseout="hideTip(event, 'fs1', 16)" onmouseover="showTip(event, 'fs1', 16)" class="id">whiteNoise</span>
<span onmouseout="hideTip(event, 'fs3', 17)" onmouseover="showTip(event, 'fs3', 17)" class="fn">os</span> <span class="n">250</span>
<span class="fsi">val it : float = 8.645491746</span>
<span onmouseout="hideTip(event, 'fs3', 18)" onmouseover="showTip(event, 'fs3', 18)" class="fn">os</span> <span class="n">500</span>
<span class="fsi">val it : float = 10.11872428</span>
<span onmouseout="hideTip(event, 'fs3', 19)" onmouseover="showTip(event, 'fs3', 19)" class="fn">os</span> <span class="n">750</span>
<span class="fsi">val it : float = 11.33170746</span>
</code></pre>
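<p>The idea behind an in-place order statistic can be sketched with quickselect: partition the array around a pivot and continue only in the part that contains the requested position, which leaves the array partially sorted. The F# sketch below illustrates that classic technique with a one-based order <code>k</code>, as above. It assumes a non-empty array without NaN values, the name <code>orderStatisticInplace</code> is hypothetical, and it is not necessarily the algorithm used by <code>ArrayStatistics.OrderStatisticInplace</code>.</p>
<pre class="fssnip highlighted"><code lang="fsharp">// Returns the k-th smallest value (k is one-based) and partially reorders data.
let orderStatisticInplace (data: float[]) (k: int) =
    if 1 > k || k > data.Length then
        invalidArg "k" "The order k must be between 1 and data.Length."
    let target = k - 1 // zero-based index of the requested value
    let swap i j =
        let tmp = data.[i]
        Array.set data i data.[j]
        Array.set data j tmp
    let rec scanUp pivot i = if pivot > data.[i] then scanUp pivot (i + 1) else i
    let rec scanDown pivot j = if data.[j] > pivot then scanDown pivot (j - 1) else j
    // Hoare-style partition: afterwards data.[left..j] holds values no larger than
    // the pivot, data.[i..right] values no smaller, and anything in between equals it.
    let rec partition pivot i j =
        if i > j then (i, j)
        else
            let i = scanUp pivot i
            let j = scanDown pivot j
            if j >= i then
                swap i j
                partition pivot (i + 1) (j - 1)
            else (i, j)
    // Narrow the range [left, right] until it pins down the target index.
    let rec select left right =
        if left >= right then data.[target]
        else
            let pivot = data.[(left + right) / 2]
            let i, j = partition pivot left right
            if j >= target then select left j
            elif target >= i then select i right
            else data.[target]
    select 0 (data.Length - 1)

let unordered = [| 7.0; 2.0; 9.0; 4.0; 4.0; 1.0 |]
printfn "%f" (orderStatisticInplace (Array.copy unordered) 1) // 1.000000 (minimum)
printfn "%f" (orderStatisticInplace (Array.copy unordered) 3) // 4.000000
printfn "%f" (orderStatisticInplace (Array.copy unordered) 6) // 9.000000 (maximum)
</code></pre>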

<h4><a name="Median" class="anchor" href="#Median">Median</a></h4>
<p>The median is a robust indicator of central tendency and much less affected by outliers
than the sample mean. The median is estimated by the value exactly in the middle of
the sorted set of samples, and thus separates the higher half of the data from the lower half.</p>
<p><code>Statistics.Median(data)</code>
<code>SortedArrayStatistics.Median(data)</code>
<code>ArrayStatistics.MedianInplace(data)</code></p>
<p>The median is only unique if the sample size is odd. This implementation internally
uses the default quantile definition, which is equivalent to mode 8 in R and is approximately
median-unbiased regardless of the sample distribution. If you need another convention, use
<code>QuantileCustom</code> instead (see below for details).</p>
<pre class="fssnip highlighted"><code lang="fsharp"><span class="id">Statistics</span><span class="pn">.</span><span class="id">Median</span> <span onmouseout="hideTip(event, 'fs1', 20)" onmouseover="showTip(event, 'fs1', 20)" class="id">whiteNoise</span>
<span class="fsi">val it : float = 10.11872428</span>
<span class="id">Statistics</span><span class="pn">.</span><span class="id">Median</span> <span onmouseout="hideTip(event, 'fs2', 21)" onmouseover="showTip(event, 'fs2', 21)" class="id">wave</span>
<span class="fsi">val it : float = -2.452600839e-16</span>
</code></pre>
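<p>With the R8 definition the median reduces to the familiar rule: for odd <span class="math">\(N\)</span> it is the middle element of the sorted samples, for even <span class="math">\(N\)</span> the average of the two middle elements. The following minimal F# sketch, independent of Math.NET (<code>medianSorted</code> is a hypothetical helper that expects a non-empty array sorted in ascending order), shows that rule and how little a single outlier affects the result.</p>
<pre class="fssnip highlighted"><code lang="fsharp">// Median of samples sorted in ascending order (assumes a non-empty array).
let medianSorted (sorted: float[]) =
    let n = sorted.Length
    if n % 2 = 1 then sorted.[n / 2]                 // odd N: the middle element
    else (sorted.[n / 2 - 1] + sorted.[n / 2]) / 2.0 // even N: mean of the two middle ones

printfn "%f" (medianSorted [| 1.0; 3.0; 5.0 |])      // 3.000000
printfn "%f" (medianSorted [| 1.0; 3.0; 5.0; 7.0 |]) // 4.000000
// A single large outlier barely moves the median:
printfn "%f" (medianSorted (Array.sort [| 9.0; 10.0; 11.0; 10.0; 100.0 |])) // 10.000000
</code></pre>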

<h4><a name="Quartiles-and-the-5-number-summary" class="anchor" href="#Quartiles-and-the-5-number-summary">Quartiles and the 5-number summary</a></h4>
<p>Quartiles split the data, sorted in ascending order, into four equal groups, where each
group represents a quarter of the data. The lower quartile is estimated by
the middle number between the first two groups and the upper quartile by the middle
number between the remaining two groups. The middle number between the two middle groups
estimates the median as discussed above.</p>
<p><code>Statistics.LowerQuartile(data)</code>
<code>Statistics.UpperQuartile(data)</code>
<code>SortedArrayStatistics.LowerQuartile(data)</code>
<code>SortedArrayStatistics.UpperQuartile(data)</code>
<code>ArrayStatistics.LowerQuartileInplace(data)</code>
<code>ArrayStatistics.UpperQuartileInplace(data)</code></p>
<pre class="fssnip highlighted"><code lang="fsharp"><span class="id">Statistics</span><span class="pn">.</span><span class="id">LowerQuartile</span> <span onmouseout="hideTip(event, 'fs1', 22)" onmouseover="showTip(event, 'fs1', 22)" class="id">whiteNoise</span>
<span class="fsi">val it : float = 8.645491746</span>
<span class="id">Statistics</span><span class="pn">.</span><span class="id">UpperQuartile</span> <span onmouseout="hideTip(event, 'fs1', 23)" onmouseover="showTip(event, 'fs1', 23)" class="id">whiteNoise</span>
<span class="fsi">val it : float = 11.33213732</span>
</code></pre>

<p>Using these values we can provide a useful set of indicators, usually called the five-number summary,
which consists of the minimum value, the lower quartile, the median, the upper quartile and
the maximum value. All these values can be visualized in the popular box plot diagram.</p>
<p><code>Statistics.FiveNumberSummary(data)</code>
<code>SortedArrayStatistics.FiveNumberSummary(data)</code>
<code>ArrayStatistics.FiveNumberSummaryInplace(data)</code></p>
<pre class="fssnip highlighted"><code lang="fsharp"><span class="id">Statistics</span><span class="pn">.</span><span class="id">FiveNumberSummary</span> <span onmouseout="hideTip(event, 'fs1', 24)" onmouseover="showTip(event, 'fs1', 24)" class="id">whiteNoise</span>
<span class="fsi">val it : float [] = [|3.633070184; 8.645937823; 10.12165054; 11.33213732; 16.65183566|] </span>
<span class="id">Statistics</span><span class="pn">.</span><span class="id">FiveNumberSummary</span> <span onmouseout="hideTip(event, 'fs2', 25)" onmouseover="showTip(event, 'fs2', 25)" class="id">wave</span>
<span class="fsi">val it : float [] = [|-0.5; -0.3584185509; -2.452600839e-16; 0.3584185509; 0.5|] </span>
</code></pre>

<p>The difference between the upper and the lower quartile is called the interquartile range (IQR)
and is a robust indicator of spread. In box plots the IQR is the total height of the box.</p>
<p><code>Statistics.InterquartileRange(data)</code>
<code>SortedArrayStatistics.InterquartileRange(data)</code>
<code>ArrayStatistics.InterquartileRangeInplace(data)</code></p>
<p>Just like the median, quartiles use the default R8 quantile definition internally.</p>
<pre class="fssnip highlighted"><code lang="fsharp"><span class="id">Statistics</span><span class="pn">.</span><span class="id">InterquartileRange</span> <span onmouseout="hideTip(event, 'fs1', 26)" onmouseover="showTip(event, 'fs1', 26)" class="id">whiteNoise</span>
<span class="fsi">val it : float = 2.686199498</span>
</code></pre>

<h4><a name="Percentiles" class="anchor" href="#Percentiles">Percentiles</a></h4>
<p>Percentiles extend the concept further by grouping the sorted values into 100
equal groups and looking at the 101 places (0, 1, ..., 100) between and around them.
The 0-percentile represents the minimum value, 25 the lower quartile, 50 the median,
75 the upper quartile and 100 the maximum value.</p>
<p><code>Statistics.Percentile(data, p)</code>
<code>Statistics.PercentileFunc(data)</code>
<code>SortedArrayStatistics.Percentile(data, p)</code>
<code>ArrayStatistics.PercentileInplace(data, p)</code></p>
<p>Just like the median, percentiles use the default R8 quantile definition internally.</p>
<pre class="fssnip highlighted"><code lang="fsharp"><span class="id">Statistics</span><span class="pn">.</span><span class="id">Percentile</span><span class="pn">(</span><span onmouseout="hideTip(event, 'fs1', 27)" onmouseover="showTip(event, 'fs1', 27)" class="id">whiteNoise</span><span class="pn">,</span> <span class="n">5</span><span class="pn">)</span>
<span class="fsi">val it : float = 6.693373507</span>
<span class="id">Statistics</span><span class="pn">.</span><span class="id">Percentile</span><span class="pn">(</span><span onmouseout="hideTip(event, 'fs1', 28)" onmouseover="showTip(event, 'fs1', 28)" class="id">whiteNoise</span><span class="pn">,</span> <span class="n">98</span><span class="pn">)</span>
<span class="fsi">val it : float = 13.97580653</span>
</code></pre>

<h4><a name="Quantiles" class="anchor" href="#Quantiles">Quantiles</a></h4>
<p>Instead of grouping into 4 or 100 boxes, quantiles generalize the concept to an infinite number
of boxes and thus to arbitrary real numbers <span class="math">\(\tau\)</span> between 0.0 and 1.0, where 0.0 represents the
minimum value, 0.5 the median and 1.0 the maximum value. Quantiles are closely related to
the inverse cumulative distribution function of the sample distribution.</p>
<p><code>Statistics.Quantile(data, tau)</code>
<code>Statistics.QuantileFunc(data)</code>
<code>SortedArrayStatistics.Quantile(data, tau)</code>
<code>ArrayStatistics.QuantileInplace(data, tau)</code></p>
<pre class="fssnip highlighted"><code lang="fsharp"><span class="id">Statistics</span><span class="pn">.</span><span class="id">Quantile</span><span class="pn">(</span><span onmouseout="hideTip(event, 'fs1', 29)" onmouseover="showTip(event, 'fs1', 29)" class="id">whiteNoise</span><span class="pn">,</span> <span class="n">0.98</span><span class="pn">)</span>
<span class="fsi">val it : float = 13.97580653</span>
</code></pre>
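<p>The default R8 definition can be written down explicitly. As in R's <code>quantile(x, probs, type = 8)</code>, the one-based position is <span class="math">\(h = (N + 1/3)\tau + 1/3\)</span>, clamped to <span class="math">\([1, N]\)</span>, and the result interpolates linearly between the two neighboring sorted samples. The F# sketch below is a minimal illustration of that definition, independent of Math.NET; <code>quantileR8</code> is a hypothetical name, and the library may treat floating-point edge cases differently.</p>
<pre class="fssnip highlighted"><code lang="fsharp">// Quantile of samples sorted in ascending order, R8 definition (tau in [0, 1]).
let quantileR8 (sorted: float[]) (tau: float) =
    let n = float sorted.Length
    let h = (n + 1.0 / 3.0) * tau + 1.0 / 3.0 // one-based fractional position
    let hc = h |> max 1.0 |> min n             // clamp to the range [1, N]
    let lower = int (floor hc)                 // one-based index at or below hc
    let frac = hc - floor hc
    if frac = 0.0 then sorted.[lower - 1]
    else sorted.[lower - 1] + frac * (sorted.[lower] - sorted.[lower - 1])

let fourValues = [| 1.0; 2.0; 3.0; 4.0 |]
printfn "%f" (quantileR8 fourValues 0.0)  // 1.000000 (minimum)
printfn "%f" (quantileR8 fourValues 0.25) // 1.416667 (= 17/12)
printfn "%f" (quantileR8 fourValues 0.5)  // 2.500000 (median)
printfn "%f" (quantileR8 fourValues 1.0)  // 4.000000 (maximum)
</code></pre>
<p>A percentile <code>p</code> corresponds to <code>tau = p / 100</code> in this formulation.</p>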

<h4><a name="Quantile-Conventions-and-Compatibility" class="anchor" href="#Quantile-Conventions-and-Compatibility">Quantile Conventions and Compatibility</a></h4>
<p>Remember that all these descriptive statistics do not <em>compute</em> but merely <em>estimate</em>
statistical indicators of the value distribution. In the case of quantiles,
there is usually no single number between the two groups specified by <span class="math">\(\tau\)</span>.
There are multiple ways to deal with this: the R project supports 9 modes, while Mathematica
and SciPy each have their own way to parametrize the behavior.</p>
<p>The <code>QuantileCustom</code> functions support all 9 modes from the R project, which include the one
used by Microsoft Excel, and also the 4-parameter variant of Mathematica:</p>
<p><code>Statistics.QuantileCustom(data, tau, definition)</code>
<code>Statistics.QuantileCustomFunc(data, definition)</code>
<code>SortedArrayStatistics.QuantileCustom(data, tau, a, b, c, d)</code>
<code>SortedArrayStatistics.QuantileCustom(data, tau, definition)</code>
<code>ArrayStatistics.QuantileCustomInplace(data, tau, a, b, c, d)</code>
<code>ArrayStatistics.QuantileCustomInplace(data, tau, definition)</code></p>
<p>The <code>QuantileDefinition</code> enumeration has the following options:</p>
<ul>
<li><strong>R1</strong>, SAS3, EmpiricalInvCDF</li>
<li><strong>R2</strong>, SAS5, EmpiricalInvCDFAverage</li>
<li><strong>R3</strong>, SAS2, Nearest</li>
<li><strong>R4</strong>, SAS1, California</li>
<li><strong>R5</strong>, Hydrology, Hazen</li>
<li><strong>R6</strong>, SAS4, Nist, Weibull, SPSS</li>
<li><strong>R7</strong>, Excel, Mode, S</li>
<li><strong>R8</strong>, Median, Default</li>
<li><strong>R9</strong>, Normal</li>
</ul>
<pre class="fssnip highlighted"><code lang="fsharp"><span class="id">Statistics</span><span class="pn">.</span><span class="id">QuantileCustom</span><span class="pn">(</span><span class="id">whiteNoise</span><span class="pn">,</span> <span class="n">0.98</span><span class="pn">,</span> <span class="id">QuantileDefinition</span><span class="pn">.</span><span class="id">R3</span><span class="pn">)</span>
<span class="fsi">val it : float = 13.97113209</span>
<span class="id">Statistics</span><span class="pn">.</span><span class="id">QuantileCustom</span><span class="pn">(</span><span class="id">whiteNoise</span><span class="pn">,</span> <span class="n">0.98</span><span class="pn">,</span> <span class="id">QuantileDefinition</span><span class="pn">.</span><span class="id">Excel</span><span class="pn">)</span>
<span class="fsi">val it : float = 13.97127374</span>
</code></pre>
|
|
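<p>To see why the definitions disagree, the following F# sketch (written for this page, not Math.NET's implementation) computes the continuous sample quantiles R4 to R9 in the Hyndman and Fan formulation. With sorted samples <span class="math">\(x_1 \le \dots \le x_n\)</span> it sets <span class="math">\(h = n\tau + m\)</span>, <span class="math">\(j = \lfloor h \rfloor\)</span> and <span class="math">\(g = h - j\)</span>, then returns <span class="math">\((1-g)\,x_j + g\,x_{j+1}\)</span> with indices clamped to <span class="math">\(1..n\)</span>. Only the offset <span class="math">\(m\)</span> depends on the definition: <span class="math">\(m = 1/2\)</span> for R5 and <span class="math">\(m = 1 - \tau\)</span> for R7. The names <code>quantileHF</code>, <code>r5</code> and <code>r7</code> are hypothetical.</p>
<pre class="fssnip highlighted"><code lang="fsharp">/// Continuous sample quantile in the Hyndman-Fan form (definitions R4 to R9).
/// 'm' is the definition-specific offset; 'sorted' must be in ascending order.
let quantileHF (m: float -> float) (sorted: float[]) (tau: float) : float =
    let n = sorted.Length
    let h = float n * tau + m tau
    let j = int (floor h)
    let g = h - float j
    // 1-based order statistic x_k, with k clamped into 1..n
    let x k = sorted.[max 1 (min n k) - 1]
    (1.0 - g) * x j + g * x (j + 1)

/// Offsets for two of the definitions listed above.
let r5 (_: float) = 0.5          // R5 (Hazen)
let r7 (tau: float) = 1.0 - tau  // R7 (Excel)

let data = [| 1.0; 2.0; 3.0; 4.0 |]
printfn "R5: %g  R7: %g" (quantileHF r5 data 0.25) (quantileHF r7 data 0.25)
</code></pre>
<p>For these four samples both definitions agree on the median (2.5), but the 0.25-quantile is 1.5 under R5 and 1.75 under R7.</p>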
<h2><a name="Rank-Statistics" class="anchor" href="#Rank-Statistics">Rank Statistics</a></h2>
|
|
<h4><a name="Ranks" class="anchor" href="#Ranks">Ranks</a></h4>
|
|
<p>Rank statistics are the counterpart to order statistics. The <code>Ranks</code> function evaluates the rank
|
|
of each sample and returns them as an array of doubles. The return type is double instead of int
|
|
in order to deal with ties, if one of the values appears multiple times.
|
|
Similar to <code>QuantileDefinition</code>, the <code>RankDefinition</code> enumeration controls how ties should be handled:</p>
|
|
<ul>
<li><strong>Average</strong>, Default: Replace ties with their mean rank (which can produce non-integer ranks).</li>
<li><strong>Min</strong>, Sports: Replace ties with their minimum rank, as is typical in sports rankings.</li>
<li><strong>Max</strong>: Replace ties with their maximum rank.</li>
<li><strong>First</strong>: Give tied values increasing ranks in the order in which they appear, so the ranks form a permutation.</li>
<li><strong>EmpiricalCDF</strong></li>
</ul>
<p><code>Statistics.Ranks(data, definition)</code>
<code>SortedArrayStatistics.Ranks(data, definition)</code>
<code>ArrayStatistics.RanksInplace(data, definition)</code></p>
<pre class="fssnip highlighted"><code lang="fsharp"><span class="id">Statistics</span><span class="pn">.</span><span class="id">Ranks</span><span class="pn">(</span><span onmouseout="hideTip(event, 'fs1', 30)" onmouseover="showTip(event, 'fs1', 30)" class="id">whiteNoise</span><span class="pn">)</span>
|
|
<span class="fsi">val it : float [] = [|634.0; 736.0; 405.0; 395.0; 197.0; 167.0; 722.0; 44.0; ...|] </span>
|
|
<span class="id">Statistics</span><span class="pn">.</span><span class="id">Ranks</span><span class="pn">(</span><span class="pn">[|</span> <span class="n">13.0</span><span class="pn">;</span> <span class="n">14.0</span><span class="pn">;</span> <span class="n">11.0</span><span class="pn">;</span> <span class="n">12.0</span><span class="pn">;</span> <span class="n">13.0</span> <span class="pn">|]</span><span class="pn">,</span> <span class="id">RankDefinition</span><span class="pn">.</span><span class="id">Average</span><span class="pn">)</span>
|
|
<span class="fsi">val it : float [] = [|3.5; 5.0; 1.0; 2.0; 3.5|] </span>
|
|
<span class="id">Statistics</span><span class="pn">.</span><span class="id">Ranks</span><span class="pn">(</span><span class="pn">[|</span> <span class="n">13.0</span><span class="pn">;</span> <span class="n">14.0</span><span class="pn">;</span> <span class="n">11.0</span><span class="pn">;</span> <span class="n">12.0</span><span class="pn">;</span> <span class="n">13.0</span> <span class="pn">|]</span><span class="pn">,</span> <span class="id">RankDefinition</span><span class="pn">.</span><span class="id">Sports</span><span class="pn">)</span>
|
|
<span class="fsi">val it : float [] = [|3.0; 5.0; 1.0; 2.0; 3.0|] </span>
|
|
</code></pre>
|
|
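<p>The tie-handling rules listed above can be stated in terms of two counts: for a sample <span class="math">\(v\)</span>, let <span class="math">\(b\)</span> be the number of samples strictly below it and <span class="math">\(t\)</span> the number equal to it (including itself). Then Min gives <span class="math">\(b+1\)</span>, Max gives <span class="math">\(b+t\)</span> and Average gives <span class="math">\(b+(t+1)/2\)</span>. The F# sketch below is a hypothetical, quadratic-time illustration of the Average, Min, Max and First definitions, not Math.NET's implementation; <code>TieMode</code> and <code>ranks</code> are made-up names.</p>
<pre class="fssnip highlighted"><code lang="fsharp">/// Hypothetical tie-handling modes mirroring RankDefinition's Average, Min, Max and First.
type TieMode =
    | Average
    | Min
    | Max
    | First

/// 1-based rank of every sample; quadratic time, written for clarity rather than speed.
let ranks (mode: TieMode) (data: float[]) : float[] =
    data
    |> Array.mapi (fun i v ->
        // samples strictly below v, and samples equal to v (including v itself)
        let below = data |> Array.filter (fun x -> v > x) |> Array.length
        let tied = data |> Array.filter (fun x -> x = v) |> Array.length
        match mode with
        | Average -> float below + float (tied + 1) / 2.0
        | Min -> float (below + 1)
        | Max -> float (below + tied)
        | First ->
            // earlier occurrences of the same value receive the lower ranks
            let earlier = Array.take i data |> Array.filter (fun x -> x = v) |> Array.length
            float (below + earlier + 1))

let sample = [| 13.0; 14.0; 11.0; 12.0; 13.0 |]
printfn "%A" (ranks First sample)
</code></pre>
<p>Applied to these samples, the Average and Min modes reproduce the <code>RankDefinition.Average</code> and <code>RankDefinition.Sports</code> outputs shown above, while First yields <code>[|3.0; 5.0; 1.0; 2.0; 4.0|]</code>.</p>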
<h4><a name="Quantile-Rank" class="anchor" href="#Quantile-Rank">Quantile Rank</a></h4>
|
|
<p>Counterpart of the <code>Quantile</code> function, estimates <span class="math">\(\tau\)</span> of the provided <span class="math">\(\tau\)</span>-quantile value
|
|
<span class="math">\(x\)</span> from the provided samples. The <span class="math">\(\tau\)</span>-quantile is the data value where the cumulative distribution
|
|
function crosses <span class="math">\(\tau\)</span>.</p>
|
|
<p><code>Statistics.QuantileRank(data, x, definition)</code>
<code>Statistics.QuantileRankFunc(data, definition)</code>
<code>SortedArrayStatistics.QuantileRank(data, x, definition)</code></p>
<pre class="fssnip highlighted"><code lang="fsharp"><span class="id">Statistics</span><span class="pn">.</span><span class="id">QuantileRank</span><span class="pn">(</span><span onmouseout="hideTip(event, 'fs1', 31)" onmouseover="showTip(event, 'fs1', 31)" class="id">whiteNoise</span><span class="pn">,</span> <span class="n">13.0</span><span class="pn">)</span>
|
|
<span class="fsi">val it : float = 0.9370045563</span>
|
|
<span class="id">Statistics</span><span class="pn">.</span><span class="id">QuantileRank</span><span class="pn">(</span><span onmouseout="hideTip(event, 'fs1', 32)" onmouseover="showTip(event, 'fs1', 32)" class="id">whiteNoise</span><span class="pn">,</span> <span class="n">6.7</span><span class="pn">,</span> <span class="id">RankDefinition</span><span class="pn">.</span><span class="id">Average</span><span class="pn">)</span>
|
|
<span class="fsi">val it : float = 0.04960610389</span>
|
|
</code></pre>
|
|
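<p>One way to make the "counterpart of <code>Quantile</code>" relationship concrete is to invert a specific quantile definition. The F# sketch below inverts the linear interpolation of the R7 (Excel) definition on sorted samples. It is an assumption made for illustration, not necessarily how <code>QuantileRank</code> and its <code>RankDefinition</code> argument behave, and <code>quantileRankR7</code> is a hypothetical name.</p>
<pre class="fssnip highlighted"><code lang="fsharp">/// Hypothetical inverse of the R7 quantile: returns tau such that the R7
/// tau-quantile of 'sorted' equals x. 'sorted' must be ascending and non-empty.
let quantileRankR7 (sorted: float[]) (x: float) : float =
    let n = sorted.Length
    if sorted.[0] >= x then 0.0
    elif x >= sorted.[n - 1] then 1.0
    else
        // last position whose value does not exceed x; sorted.[k + 1] is then above x
        let k = sorted |> Array.findIndexBack (fun v -> x >= v)
        let frac = (x - sorted.[k]) / (sorted.[k + 1] - sorted.[k])
        (float k + frac) / float (n - 1)

let data = [| 1.0; 2.0; 3.0; 4.0 |]
printfn "%g" (quantileRankR7 data 2.5)
</code></pre>
<p>For <span class="math">\(x = 2.5\)</span> this gives 0.5, consistent with 2.5 being the R7 median of these samples. With repeated values the inverse is not unique, which is one reason a separate tie-handling definition is needed.</p>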
<h2><a name="Empirical-Distribution-Functions" class="anchor" href="#Empirical-Distribution-Functions">Empirical Distribution Functions</a></h2>
|
|
<p><code>Statistics.EmpiricalCDF(data, x)</code>
|
|
<code>Statistics.EmpiricalCDFFunc(data)</code>
|
|
<code>Statistics.EmpiricalInvCDF(data, tau)</code>
|
|
<code>Statistics.EmpiricalInvCDFFunc(data)</code>
|
|
<code>SortedArrayStatistics.EmpiricalCDF(data, x)</code></p>
|
|
<pre class="fssnip highlighted"><code lang="fsharp"><span class="k">let</span> <span onmouseout="hideTip(event, 'fs4', 33)" onmouseover="showTip(event, 'fs4', 33)" class="id">ecdf</span> <span class="o">=</span> <span class="id">Statistics</span><span class="pn">.</span><span class="id">EmpiricalCDFFunc</span> <span onmouseout="hideTip(event, 'fs1', 34)" onmouseover="showTip(event, 'fs1', 34)" class="id">whiteNoise</span>
|
|
<span class="id">Generate</span><span class="pn">.</span><span class="id">LinearSpacedMap</span><span class="pn">(</span><span class="n">20</span><span class="pn">,</span> <span class="id">start</span><span class="o">=</span><span class="n">3.0</span><span class="pn">,</span> <span class="id">stop</span><span class="o">=</span><span class="n">17.0</span><span class="pn">,</span> <span class="id">map</span><span class="o">=</span><span onmouseout="hideTip(event, 'fs4', 35)" onmouseover="showTip(event, 'fs4', 35)" class="id">ecdf</span><span class="pn">)</span>
|
|
<span class="fsi">val it : float [] =</span>
|
|
<span class="fsi"> [|0.0; 0.001; 0.002; 0.005; 0.022; 0.05; 0.094; 0.172; 0.278; 0.423; 0.555; </span>
|
|
<span class="fsi"> 0.705; 0.843; 0.921; 0.944; 0.983; 0.992; 0.997; 0.999; 1.0|] </span>
|
|
|
|
<span class="k">let</span> <span onmouseout="hideTip(event, 'fs5', 36)" onmouseover="showTip(event, 'fs5', 36)" class="fn">eicdf</span> <span class="o">=</span> <span class="id">Statistics</span><span class="pn">.</span><span class="id">empiricalInvCDFFunc</span> <span onmouseout="hideTip(event, 'fs1', 37)" onmouseover="showTip(event, 'fs1', 37)" class="id">whiteNoise</span>
|
|
<span class="pn">[</span> <span class="k">for</span> <span onmouseout="hideTip(event, 'fs6', 38)" onmouseover="showTip(event, 'fs6', 38)" class="fn">tau</span> <span class="k">in</span> <span class="o">0.0</span><span class="o">..</span><span class="n">0.05</span><span class="o">..</span><span class="n">1.0</span> <span class="k">-></span> <span onmouseout="hideTip(event, 'fs5', 39)" onmouseover="showTip(event, 'fs5', 39)" class="fn">eicdf</span> <span onmouseout="hideTip(event, 'fs6', 40)" onmouseover="showTip(event, 'fs6', 40)" class="fn">tau</span> <span class="pn">]</span>
|
|
<span class="fsi">val it : float [] =</span>
|
|
<span class="fsi"> [3.633070184; 6.682142043; 7.520000817; 8.040513497; 8.347587493; </span>
|
|
<span class="fsi"> 8.645491746; 9.02681611; 9.298987151; 9.522627142; 9.819352699; 10.11872428; </span>
|
|
<span class="fsi"> 10.35991046; 10.57530906; 10.8259542; 11.08605473; 11.33170746; 11.54356436; </span>
|
|
<span class="fsi"> 11.90973541; 12.4294346; 13.36889423; 16.65183566] </span>
|
|
</code></pre>
|
|
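<p>For reference, here is a minimal F# sketch of what the two empirical functions estimate. It assumes the inverse follows the R1 (EmpiricalInvCDF) convention from the <code>QuantileDefinition</code> list: the ECDF at <span class="math">\(x\)</span> is the fraction of samples not greater than <span class="math">\(x\)</span>, and the inverse returns the <span class="math">\(k\)</span>-th smallest sample with <span class="math">\(k = \lceil n\tau \rceil\)</span>. The function names are hypothetical and this is not Math.NET's implementation.</p>
<pre class="fssnip highlighted"><code lang="fsharp">/// Empirical CDF: fraction of samples that are not greater than x.
let ecdf (samples: float[]) (x: float) : float =
    let atOrBelow = samples |> Array.filter (fun v -> x >= v) |> Array.length
    float atOrBelow / float samples.Length

/// Inverse empirical CDF (R1 convention): the k-th smallest sample with
/// k = ceil(n * tau), using the smallest sample when tau = 0.
let empiricalInvCdf (samples: float[]) (tau: float) : float =
    let sorted = Array.sort samples
    let n = sorted.Length
    let k = max 1 (int (ceil (float n * tau)))
    sorted.[min n k - 1]

let samples = [| 3.0; 1.0; 4.0; 2.0 |]
printfn "%g %g" (ecdf samples 2.5) (empiricalInvCdf samples 0.5)
</code></pre>
<p>Here <code>ecdf samples 2.5</code> is 0.5 and <code>empiricalInvCdf samples 0.5</code> is 2.0, the smallest sample at which the ECDF reaches 0.5.</p>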
<h2><a name="Histograms" class="anchor" href="#Histograms">Histograms</a></h2>
|
|
<p>A histogram can be computed using the <a href="https://numerics.mathdotnet.com/api/MathNet.Numerics.Statistics/Histogram.htm">Histogram</a> class. Its constructor takes
|
|
the samples enumerable, the number of buckets to create, plus optionally the range
|
|
(minimum, maximum) of the sample data if available.</p>
|
|
<table class="pre"><tr><td class="snippet"><pre class="fssnip highlighted"><code lang="csharp"><span class="k">var</span> histogram <span class="o">=</span> <span class="k">new</span> Histogram(samples, <span class="n">10</span>);
|
|
<span class="k">var</span> bucket<span class="n">3</span>count <span class="o">=</span> histogram[<span class="n">2</span>].Count;
|
|
</code></pre></td></tr></table>
|
|
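<p>As a rough illustration of what the constructor computes, the F# sketch below sorts samples into equal-width buckets. The boundary rules (half-open buckets, the upper bound counted in the last bucket, samples outside the range ignored) are assumptions for this example and may differ from Math.NET's <code>Histogram</code>; <code>histogram</code> and <code>histogramAuto</code> are hypothetical names. As with <code>histogram[2]</code> above, bucket indices start at zero.</p>
<pre class="fssnip highlighted"><code lang="fsharp">/// Hypothetical equal-width histogram over [lower, upper] (assumes upper > lower).
/// Each bucket is half-open [start, end), except the last one, which also holds 'upper'.
/// Samples outside the range are ignored.
let histogram (buckets: int) (lower: float) (upper: float) (samples: float seq) : int[] =
    let width = (upper - lower) / float buckets
    let bucketOf x = min (buckets - 1) (int (floor ((x - lower) / width)))
    let tally =
        samples
        |> Seq.filter (fun x -> x >= lower)
        |> Seq.filter (fun x -> upper >= x)
        |> Seq.countBy bucketOf
        |> Map.ofSeq
    Array.init buckets (fun b -> tally |> Map.tryFind b |> Option.defaultValue 0)

/// Without an explicit range, fall back to the sample minimum and maximum.
let histogramAuto (buckets: int) (samples: float[]) : int[] =
    histogram buckets (Array.min samples) (Array.max samples) samples

printfn "%A" (histogram 2 0.0 10.0 [ 1.0; 4.0; 5.0; 9.0; 10.0 ])
</code></pre>
<p>Here the result is <code>[|2; 3|]</code>: the value 10.0 equals the upper bound and is therefore counted in the last bucket.</p>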
<h2><a name="Correlation" class="anchor" href="#Correlation">Correlation</a></h2>
|
|
<p>The <code>Correlation</code> class supports computing Pearson's product-momentum and Spearman's ranked
|
|
correlation coefficient, as well as their correlation matrix for a set of vectors.</p>
|
|
<p>Code Sample: Computing the correlation coefficient of 1000 samples of f(x) = 2x and g(x) = x^2:</p>
|
|
<table class="pre"><tr><td class="snippet"><pre class="fssnip highlighted"><code lang="csharp"><span class="k">double</span>[] dataF <span class="o">=</span> Generate.LinearSpacedMap(<span class="n">1000</span>, <span class="n">0</span>, <span class="n">100</span>, x <span class="o">=</span><span class="o">></span> <span class="n">2</span>*x);
|
|
<span class="k">double</span>[] dataG <span class="o">=</span> Generate.LinearSpacedMap(<span class="n">1000</span>, <span class="n">0</span>, <span class="n">100</span>, x <span class="o">=</span><span class="o">></span> x*x);
|
|
<span class="k">double</span> correlation <span class="o">=</span> Correlation.Pearson(dataF, dataG);
|
|
</code></pre></td></tr></table>
|
|
|
|
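<p>To make the two coefficients concrete, here is a minimal F# sketch with hypothetical helper names, not Math.NET's code. Pearson's coefficient is the covariance divided by the product of the standard deviations, and Spearman's coefficient is Pearson's coefficient computed on ranks (here average ranks, as with <code>RankDefinition.Average</code>). For the sample above the relation is monotonic but not linear, so Spearman's coefficient is 1 while Pearson's is somewhat below 1.</p>
<pre class="fssnip highlighted"><code lang="fsharp">/// Pearson's product-moment correlation coefficient of two equally long arrays.
let pearson (xs: float[]) (ys: float[]) : float =
    let mx = Array.average xs
    let my = Array.average ys
    let cov = Array.map2 (fun x y -> (x - mx) * (y - my)) xs ys |> Array.sum
    let vx = xs |> Array.sumBy (fun x -> (x - mx) * (x - mx))
    let vy = ys |> Array.sumBy (fun y -> (y - my) * (y - my))
    cov / sqrt (vx * vy)

/// Average ranks: tied values share the mean of the positions they occupy.
let averageRanks (data: float[]) : float[] =
    data
    |> Array.map (fun v ->
        let below = data |> Array.filter (fun x -> v > x) |> Array.length
        let tied = data |> Array.filter (fun x -> x = v) |> Array.length
        float below + float (tied + 1) / 2.0)

/// Spearman's rank correlation: Pearson's coefficient applied to the ranks.
let spearman (xs: float[]) (ys: float[]) : float =
    pearson (averageRanks xs) (averageRanks ys)

// Same setup as the C# sample: f(x) = 2x and g(x) = x^2 on 1000 points in [0, 100].
let grid = Array.init 1000 (fun i -> 100.0 * float i / 999.0)
let dataF = grid |> Array.map (fun x -> 2.0 * x)
let dataG = grid |> Array.map (fun x -> x * x)
printfn "Pearson: %f, Spearman: %f" (pearson dataF dataG) (spearman dataF dataG)
</code></pre>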
<div class="fsdocs-tip" id="fs1">val whiteNoise : obj</div>
|
|
<div class="fsdocs-tip" id="fs2">val wave : obj</div>
|
|
<div class="fsdocs-tip" id="fs3">val os : (int -> obj)</div>
|
|
<div class="fsdocs-tip" id="fs4">val ecdf : obj</div>
|
|
<div class="fsdocs-tip" id="fs5">val eicdf : (float -> obj)</div>
|
|
<div class="fsdocs-tip" id="fs6">val tau : float</div>
|
|
|
|
</div>
<!-- BEGIN SEARCH BOX: this adds support for the search box -->
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/JavaScript-autoComplete/1.0.4/auto-complete.css" />
<script type="text/javascript">var fsdocs_search_baseurl = 'https://numerics.mathdotnet.com/';</script>
<script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/lunr.js/2.3.8/lunr.min.js"></script>
<script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/JavaScript-autoComplete/1.0.4/auto-complete.min.js"></script>
<script type="text/javascript" src="https://numerics.mathdotnet.com/content/fsdocs-search.js"></script>
<!-- END SEARCH BOX: this adds support for the search box -->
</div>
</body>
</html>