Statistical Analysis

This help file describes operations and functions for statistical analysis together with some general guidelines for their use. This is not a statistics tutorial; for that you can consult one of the references at the end of this help file or the references listed in the documentation of a particular operation or function. The material below assumes that you are familiar with techniques and methods of statistical analysis.

Most statistics operations and functions are named with the prefix "Stats". Naming exceptions include the random noise functions which have traditionally been named based on the distribution they represent.

There are six natural groups of statistics operations and functions. They include:

Test operations
Noise functions
Probability distribution functions (PDFs)
Cumulative distribution functions (CDFs)
Inverse cumulative distribution functions
General purpose statistics operations and functions

Statistical Test Operations

Test operations analyze the input data to examine the validity of a specific hypothesis. The common test involves a computation of some numeric value (also known as "test statistic") which is usually compared with a critical value in order to determine if you should accept or reject the test hypothesis (H₀). Most tests compute a critical value for the given significance alpha which has the default value 0.05 or a user-provided value via the /ALPH flag. Some tests directly compute the P-value which you can compare to the desired significance value.

In the past, critical values have been published in tables for various significance levels and tails of distributions. They are by far the most difficult technical aspect in implementating statistical tests. The critical values are usually obtained from the inverse of the CDF for the particular distribution, i.e., from solving the equation

\displaystyle { cdf }({ criticalValue })=1- { alpha },

where alpha is the significance. In some distributions (e.g., Friedman's) the calculation of the CDF is so computationally intensive that it is impractical (using desktop computers in 2006) to compute for very large parameters. Fortunately, large parameters usually imply that the distributions can be approximated using simpler expressions. Igor's tests provide whenever possible exact critical values as well as the common relevant approximations.

Comparison of critical values with published table values can sometimes be interesting as there does not appear to be a standard for determining the published critical value when the CDF takes a finite number of discrete values (step-like). In this case the CDF attains the value (1-alpha) in a vertical transition so one could use the x-value for the vertical transition as a critical value or the x-value of the subsequent vertical transition. Some tables reflect a "convervative" approach and print the x-value of subsequent transitions.

Statistical test operations can print their results to the history area of the command window and save them in a wave in the current data folder. Result waves have a fixed name associated with the operation. Elements in the wave are designated by dimension labels. You can use the /T flag to display the results of the operation in a table with dimension labels. The argument for this flag determines what happens when you kill the table. You can use /Q in all test operations to prevent printing information in the history area and you can use the /Z flag to make sure that the operations do not report errors except by setting the V_Flag variable to -1.

Statistical test operations tend to include several variations of the named test. You can usually choose to execute one or more variations by specifying the appropriate flags. The following table can be used as a guide for identifying the operation associated with a given test name.

Statistical Test Operations by Name

Test Name	Where to find it
Angular Distance	StatsAngularDistanceTest
Bartlett's test for variances	StatsVariancesTest
BootStrap	StatsResample
Brown and Forsythe	StatsANOVA1Test
Chi-squared test for means	StatsChiTest
Cochran's test	StatsCochranTest
Dunn-Holland-Wolfe	StatsNPMCTest
Dunnette multicomparison test	StatsDunnettTest, StatsLinearRegression
Fisher's Exact Test	StatsContingencyTable
Fixed Effect Model	StatsANOVA1Test
Friedman test on randomized block	StatsFriedmanTest
F-test on two distributions	StatsFTest
Hartigan test for unimodality	StatsDIPTest
Hodges-Ajne (Batschelet)	StatsHodgesAjneTest
Hotelling	StatsCircularTwoSampleTest, StatsCircularMeans
Jackknife	StatsResample
Jarque-Bera Test	StatsJBTest
Kolmogorov-Smirnov	StatsKSTest
Kruskal-Wallis	StatsKWTest
Kuiper Test	StatsCircularMoments
Levene's test for variances	StatsVariancesTest
Linear Correlation Test	StatsLinearCorrelationTest
Linear Order Statistic	StatsCircularMoments
Mann-Kendall	StatsKendallTauTest
Moore test	StatsCircularTwoSampleTest, StatsCircularMeans
Non-parametric multiple contrasts	StatsNPMCTest
Non-parametric angular-angular correlation	StatsCircularCorrelationTest
Non-parametric second order circular analysis	StatsCircularMeans
Non-parametric serial randomness (nominal)	StatsNPNominalSRTest
Parametric angular-angular correlation	StatsCircularCorrelationTest
Parametric angular-Linear correlation	StatsCircularCorrelationTest
Parametric second order circular analysis	StatsCircularMeans
Parametric serial randomness test	StatsSRTest
Rayleigh	StatsCircularMoments
Repeated Measures	StatsANOVA2RMTest
Scheffe equality of means	StatsScheffeTest
Shapiro-Wilk test for normality	StatsShapiroWilkTest
Spearman	StatsRankCorrelationTest
Student-Newman-Keuls	StatsNPMCTest
Tukey Test	StatsTukeyTest, StatsLinearRegression, StatsMultiCorrelationTest, StatsNPMCTest
Two-Factor ANOVA	StatsANOVA2NRTest
T-test	StatsTTest
Watson's nonparametric two-sample U^2</sup>	StatsWatsonUSquaredTest, StatsCircularTwoSampleTest
Watson-Williams	StatsWatsonWilliamsTest
Weighted -rank correlation test	StatsWRCorrelationTest
Wheeler-Watson nonparametric test	StatsWheelerWatsonTest
Wilcoxon-Mann-Whitney two-sample	StatsWilcoxonRankTest
Wilcoxon signed rank	StatsWilcoxonRankTest

Statistical Test Operations by Data Format

The following tables group statistical operations and functions according to the format of the input data.

Tests For Single Waves

Analysis Method	Comment
StatsChiTest	Compares with known binned values
StatsCircularMoments	WaveStats for circular data
StatsKendallTauTest	Similar to Spearman's correlation
StatsMedian	Returns the median
StatsNPNominalSRTest	Non-parametric serial randomness test
StatsQuantiles	Computes quantiles and more
StatsResample	Bootstrap analysis
StatsSRTest	Serial randomness test
StatsTrimmedMean	Returns the trimmed mean
StatsTTest	Compares with known mean
Sort	Reorders the data
WaveStats	Basic statistical description
StatsJBTest	Jarque-Bera test for normality
StatsKSTest	Limited scope test for normality
StatsDIPTest	Hartigan test for unimodality
StatsShapiroWilkTest	Shapiro-Wilk test for normality

Tests For Two Waves

Analysis Method	Comment
StatsChiTest	Chi-squared statistic for comparing two distributions
StatsCochranTest_</sub>	Randomized block or repeated measures test
StatsCircularTwoSampleTest	Second order analysis of angles
StatsDunnettTest	Compares multiple groups to a control
StatsFTest	Computes ratio of variances
StatsFriedmanTest	Non-parametric ANOVA
StatsKendallTauTest	Similar to Spearman's correlation
StatsTTest	Compares the means of two distributions
StatsANOVA1Test	One-way analysis of variances
StatsLinearRegression	Linear regression analysis
StatsLinearCorrelationTest	Linear correlation coefficient and its error
StatsRankCorrelationTest	Computes Spearman's rank correlation
StatsVariancesTest	Compares variances of waves
StatsWilcoxonRankTest	Two-sample or signed rank test
StatsWatsonUSquaredTest	Compares two populations of circular data
StatsWatsonWilliamsTest	Compares mean values of angular distributions
StatsWheelerWatsonTest	Compares two angular distributions

Tests For Multiple or Multidimensional Waves

Analysis Method	Comment
StatsANOVA1Test	One-way analysis of variances
StatsANOVA2Test	Two-factor analysis of variances
StatsANOVA2RMTest	Two-factor repeated measure ANOVA
StatsCochranTest	Randomized block or repeated measures test
StatsContingencyTable	Contingency table analysis
StatsDunnettTest	Compares multiple groups to a control
StatsFriedmanTest	Non-parametric ANOVA
StatsNPMCTest	Non-parametric multiple comparison tests
StatsScheffeTest	Tests equality of means
StatsTukeyTest	Multiple comparisons based on means
StatsWatsonWilliamsTest	Compares mean values of angular distributions
StatsWheelerWatsonTest	Compares two angular distributions

Statistical Test Operations for Angular/Circular Data


StatsAngularDistanceTest
StatsCircularMoments
StatsCircularMeans
StatsCircularTwoSampleTest
StatsCircularCorrelationTest
StatsHodgesAjneTest
StatsWatsonUSquaredTest
StatsWatsonWilliamsTest
StatsWheelerWatsonTest

Statistical Test Operations: Non-Parametric Tests


StatsAngularDistanceTest
StatsFriedmanTest
StatsCircularTwoSampleTest	Parametric or nonparametric
StatsCircularCorrelationTest	Parametric or nonparameteric
StatsCircularMeans	Parametric or nonparameteric
StatsHodgesAjneTest
StatsKendallTauTest
StatsKWTest
StatsNPMCTest
StatsNPNominalSRTest
StatsRankCorrelationTest
StatsWatsonUSquaredTest
StatsWheelerWatsonTest
StatsWilcoxonRankTest

Noise Functions

The following functions return numbers from a pseudo-random distribution of the specified shapes and parameters. Except for enoise and gnoise where you have an option to select a random number generator, the remaining noise functions use a Mersenne Twister algorithm for the initial uniform pseudo-random distribution.

note

Whenever you need repeatable results you should use SetRandomSeed prior to executing any of the noise functions.

The following noise generation functions are available:


binomialNoise	logNormalNoise
enoise	lorentzianNoise
expNoise	poissonNoise
gammaNoise	StatsPowerNoise
gnoise	StatsVonMisesNoise
HyperGNoise	wnoise

Cumulative Distribution Functions

A Cumulative Distribution Function (CDF) is the integral of its respective probability distribution function (PDF). CDFs are usually well behaved functions with values in the range [0,1]. CDFs are important in computing critical values, P-values and power of statistical tests.

Many CDFs are computed directly from closed form expressions. Others can be difficult to compute because they involve evaluating a very large number of states, e.g., Friedman or USquared distributions. In these cases you have the following options:

Use a built-in table that consists of exact, pre-computed values.
Compute an approximate CDF based on the prevailing approximation method or using a Monte-Carlo approach.
Compute the exact CDF.

Built-in tables are ideal if they cover the range of the parameters that you need. Monte-Carlo methods can be tricky in the sense that repeated application may return small variations in values. Computing the exact CDF may be desirable, but it is often impractical. In most situations the range of parameters that is practical to compute on a desktop machine is already covered in the built-in tables. Larger parameters have not been considered because they take days to compute or because they require 64 bit processors. In addition, most of the approximations tend to improve with increasing size of the parameters.

The functions to calculate values from CDFs are as follows:


StatsBetaCDF	StatsHyperGCDF	StatsQCDF
StatsBinomialCDF	StatsKuiperCDF	StatsRayleighCDF
StatsCauchyCDF	StatsLogisticCDF	StatsRectangularCDF
StatsChiCDF	StatsLogNormalCDF	StatsRunsCDF
StatsCMSSDCDF	StatsMaxwellCDF	StatsSpearmanRhoCDF
StatsDExpCDF	StatsMooreCDF	StatsStudentCDF
StatsErlangCDF	StatsNBinomialCDF	StatsTopDownCDF
StatsEValueCDF	StatsNCFCDF	StatsTriangularCDF
StatsExpCDF	StatsNCTCDF	StatsUSquaredCDF
StatsFCDF	StatsNormalCDF	StatsVonMisesCDF
StatsFriedmanCDF	StatsParetoCDF	StatsQCDF
StatsGammaCDF	StatsPoissonCDF	StatsWaldCDF
StatsGeometricCDF	StatsPowerCDF	StatsWeibullCDF

Probablility Distribution Functions

Probability distribution functions (PDF) are sometimes known as probability densities. In the case of continuous distributions, the area under the curve of the PDF for each interval equals the probability for the random variable to fall within that interval. The PDFs are useful in calculating event probabilities, characteristic functions and moments of a distribution.

The functions to calculate values from PDFs are as follows:


StatsBetaPDF	StatsGammaPDF	StatsParetoPDF
StatsBinomialPDF	StatsGeometricPDF	StatsPoissonPDF
StatsCauchyPDF	StatsHyperGPDF	StatsPowerPDF
StatsChiPDF	StatsLogNormalPDF	StatsRayleighPDF
StatsDExpPDF	StatsMaxwellPDF	StatsRectangularPDF
StatsErlangPDF	StatsNBinomialPDF	StatsStudentPDF
StatsErrorPDF	StatsNCChiPDF	StatsTriangularPDF
StatsEValuePDF	StatsNCFPDF	StatsVonMisesPDF
StatsExpPDF	StatsNCTPDF	StatsWaldPDF
StatsFPDF	StatsNormalPDF	StatsWeibullPDF

Inverse Cumulative Distribution Functions

The inverse cumulative distribution functions return the values at which their respective CDFs attain a given level. This value is typically used as a critical test value. There are very few functions for which the inverse CDF can be written in closed form. In most situations the inverse is computed iteratively from the CDF.

The functions to calculate values from inverse CDFs are as follows:


StatsInvBetaCDF	StatsInvKuiperCDF	StatsInvQpCDF
StatsInvBinomialCDF	StatsInvLogisticCDF	StatsInvRayleighCDF
StatsInvCauchyCDF	StatsInvLogNormalCDF	StatsInvRectangularCDF
StatsInvChiCDF	StatsInvMaxwellCDF	StatsInvSpearmanCDF
StatsInvCMSSDCDF	StatsInvMooreCDF	StatsInvStudentCDF
StatsInvDExpCDF	StatsInvNBinomialCDF	StatsInvTopDownCDF
StatsInvEValueCDF	StatsInvNCFCDF	StatsInvTriangularCDF
StatsInvExpCDF	StatsInvNormalCDF	StatsInvUSquaredCDF
StatsInvFCDF	StatsInvParetoCDF	StatsInvVonMisesCDF
StatsInvFriedmanCDF	StatsInvPoissonCDF	StatsInvWeibullCDF
StatsInvGammaCDF	StatsInvPowerCDF
StatsInvGeometricCDF	StatsInvQCDF

General Purpose Statistics Operations and Functions

This group includes operations and functions that existed before IGOR Pro 6.0 and some general purpose operations and functions that do not belong to the main groups listed above.


binomial	Sort	StatsTrimmedMean
binomialln	StatsCircularMoments	StudentA
erf	StatsCorrelation	StudentT
erfc	StatsMedian	WaveStats
inverseERF	StatsQuantiles	StatsPermute
inverseERFC	StatsResample

Hazard and Survival Functions

Igor does not provide built-in functions to calculate the Survival or Hazard functions. They can be calculated easily from the Probability Density Functions and Cumulative Density Functions provided by Igor.

In the following, the Cumulative Distribution Functions are denoted by F(x) and the Probablility Distribution Functions are denoted by p(x).

The Survival Function S(x) is given by

\displaystyle S(x)=1-F(x) .

The Hazard function h(x) is given by

\displaystyle h(x)=\frac{p(x)}{S(x)}=\frac{p(x)}{1-F(x)} .

The cumulative hazard function H(x)

\displaystyle H(x)=\int_{-\infty}^{x} h(u) d u,

\displaystyle H(x)=-\ln [1-F(x)] .

Inverse Survival Function Z(α)

\displaystyle Z(\alpha)=G(1-\alpha),

where G() is the inverse CDF (see Inverse Cumulative Distribution Functions).

Statistics Procedures

Several procedure files are provided to extend the built-in statistics capability described in this help file. Some of these procedure files provide control-panel user interfaces to the built-in statistics funtionality. Others extend the functionality.

In the Analysis menu you will find a Statistics item that brings up a submenu. Selecting any item in the submenu will cause all the statistics-related procedure files to be loaded, making them ready to use. Alternativelly, you can load all the statistics procedures by adding the following include statement to the top of your procedure window:

#include <AllStatsProcedures>

Functionality provided by the statistics procedure files includes the 1D Statistics Report package for automatic analysis of single 1D waves, and the ANOVA Power Calculations Panel, as well as functions to create specialized graphs:


StatsAutoCorrPlot()	StatsPlotLag()
StatsBoxPlot()	StatsProbPlot()
StatsPlotHistogram()

Also included are these of convenience functions:


WM_2MeanConfidenceIntervals()	WM_MCPointOnRegressionLines()
WM_2MeanConfidenceIntervals2()	WM_MeanConfidenceInterval()
WM_BernoulliCdf()	WM_OneTailStudentA()
WM_BinomialPdf()	WM_OneTailStudentT()
WM_CIforPooledMean()	WM_PlotBiHistogram()
WM_CompareCorrelations()	WM_RankForTies()
WM_EstimateMinDetectableDiff()	WM_RankLetterGradesWithTies()
WM_EstimateReqSampleSize()	WM_RegressionInversePrediction()
WM_EstimateReqSampleSize2()	WM_SSEstimatorFunc()
WM_EstimateSampleSizeForDif()	WM_SSEstimatorFunc2()
WM_GetANOVA1Power()	WM_SSEstimatorFunc3()
WM_GetGeometricAverage()	WM_VarianceConfidenceInterval()
WM_GetHarmonicMean()	WM_WilcoxonPairedRanks()
WM_GetPooledMean()	WM_StatsKaplanMeier()
WM_GetPooledVariance()

Statistics References

Ajne, B., A simple test for uniformity of a circular distribution, Biometrica, 55, 343-354, 1968.

Bradley, J.V., Distribution-Free Statistical Tests, Prentice Hall, Englewood Cliffs, New Jersey, 1968.

Cheung, Y.K., and J.H. Klotz, The Mann Whitney Wilcoxon distribution using linked lists, Statistica Sinica, 7, 805-813, 1997.

Copenhaver, M.D., and B.S. Holland, Multiple comparisons of simple effects in the two-way analysis of variance with fixed effects, Journal of Statistical Computation and Simulation, 30, 1-15, 1988.

Evans, M., N. Hastings, and B. Peacock, Statistical Distributions, 3rd ed., Wiley, New York, 2000.

Fisher, N.I., Statistical Analysis of Circular Data, 295pp., Cambridge University Press, New York, 1995.

Iman, R.L., and W.J. Conover, A measure of top-down correlation, Technometrics, 29, 351-357, 1987.

Kendall, M.G., Rank Correlation Methods, 3rd ed., Griffin, London, 1962.

Klotz, J.H., Computational Approach to Statistics.

Moore, B.R., A modification of the Rayleigh test for vector data, Biometrica, 67, 175-180, 1980.

Press, William H., et al., Numerical Recipes in C, 2nd ed., 994 pp., Cambridge University Press, New York, 1992.

van de Wiel, M.A., and A. Di Bucchianico, Fast computation of the exact null distribution of Spearman's rho and Page's L statistic for samples with and without ties, J. of Stat. Plan. and Inference, 92, 133-145, 2001..

Wallace, D.L., Simplified Beta-Approximation to the Kruskal-Wallis H Test, Jour. Am. Stat. Assoc., 54, 225-230.

Zar, J.H., Biostatistical Analysis, 4th ed., 929 pp., Prentice Hall, Englewood Cliffs, New Jersey, 1999.

Statistical Test Operations​

Statistical Test Operations by Name​

Statistical Test Operations by Data Format​

Tests For Single Waves​

Tests For Two Waves​

Tests For Multiple or Multidimensional Waves​

Statistical Test Operations for Angular/Circular Data​

Statistical Test Operations: Non-Parametric Tests​

Noise Functions​

Cumulative Distribution Functions​

Probablility Distribution Functions​

Inverse Cumulative Distribution Functions​

General Purpose Statistics Operations and Functions​

Hazard and Survival Functions​

Statistics Procedures​

Statistics References​