Statistical Analysis

This help file describes operations and functions for statistical analysis together with some general guidelines for their use. This is not a statistics tutorial; for that you can consult one of the references at the end of this help file or the references listed in the documentation of a particular operation or function. The material below assumes that you are familiar with techniques and methods of statistical analysis.

Most statistics operations and functions are named with the prefix "Stats". The main exceptions are the random noise functions, which have traditionally been named after the distribution they represent.

Statistics operations and functions fall into six natural groups:

  • Test operations

  • Noise functions

  • Probability distribution functions (PDFs)

  • Cumulative distribution functions (CDFs)

  • Inverse cumulative distribution functions

  • General purpose statistics operations and functions

Statistical Test Operations

Test operations analyze the input data to examine the validity of a specific hypothesis. A typical test computes a numeric value (the "test statistic"), which is compared with a critical value to determine whether you should accept or reject the test hypothesis (H0). Most tests compute a critical value for the given significance level alpha, which defaults to 0.05 and can be set with the /ALPH flag. Some tests directly compute the P-value, which you can compare to the desired significance level.
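For example, a one-sample t-test against a known mean might be run as follows (a minimal sketch: the wave sampleData and its values are hypothetical, and the one-sample form via the /MEAN flag is assumed):

    SetRandomSeed 0.37                           // make the example reproducible
    Make/O/N=50 sampleData = 100 + gnoise(5)     // hypothetical data: mean ~100, sdev ~5
    StatsTTest/MEAN=100 sampleData               // test H0: mean = 100 at the default alpha = 0.05
    StatsTTest/MEAN=100/ALPH=0.01 sampleData     // same test at the 0.01 significance level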

In the past, critical values were published in tables for various significance levels and distribution tails. They are by far the most difficult technical aspect of implementing statistical tests. Critical values are usually obtained from the inverse of the CDF for the particular distribution, i.e., by solving the equation

cdf(criticalValue) = 1 - alpha,

where alpha is the significance level. For some distributions (e.g., Friedman's) the calculation of the CDF is so computationally intensive that it is impractical (using desktop computers in 2006) to compute for very large parameters. Fortunately, large parameters usually imply that the distribution can be approximated using simpler expressions. Whenever possible, Igor's tests provide exact critical values as well as the common relevant approximations.
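For example, the two-tailed critical value at alpha = 0.05 for a standard normal distribution can be computed directly from the inverse CDF (a minimal sketch):

    Print StatsInvNormalCDF(1 - 0.05/2, 0, 1)    // solves cdf(x) = 1 - alpha/2; prints 1.95996...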

Comparing critical values with published table values can sometimes be interesting, as there does not appear to be a standard for determining the published critical value when the CDF takes a finite number of discrete values (i.e., is step-like). In this case the CDF attains the value (1-alpha) somewhere in a vertical transition, so one could take as the critical value either the x-value of that transition or the x-value of the subsequent one. Some tables reflect a "conservative" approach and print the x-value of the subsequent transition.

Statistical test operations can print their results to the history area of the command window and save them in a wave in the current data folder. Result waves have a fixed name associated with the operation, and individual elements in the wave are designated by dimension labels. You can use the /T flag to display the results of the operation in a table with dimension labels; the argument for this flag determines what happens when you kill the table. You can use /Q in any test operation to prevent it from printing information in the history area, and you can use the /Z flag to ensure that the operation does not report errors except by setting the V_Flag variable to -1.
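For example, the following sketch runs a quiet one-sample t-test on the hypothetical sampleData wave from above and lists every labeled element of the results wave (assumed here to be named W_StatsTTest, the fixed name associated with StatsTTest):

    StatsTTest/Q/MEAN=100 sampleData    // /Q suppresses the history printout

    Function PrintTestResults()
        WAVE wr = W_StatsTTest          // fixed-name results wave in the current data folder
        Variable ii
        for(ii = 0; ii < numpnts(wr); ii += 1)
            Print GetDimLabel(wr, 0, ii), "=", wr[ii]    // dimension label and its value
        endfor
    End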

Statistical test operations tend to include several variations of the named test. You can usually choose to execute one or more variations by specifying the appropriate flags. The following table can be used as a guide for identifying the operation associated with a given test name.

Statistical Test Operations by Name

Test Name                                       Where to find it
Angular Distance                                StatsAngularDistanceTest
Bartlett's test for variances                   StatsVariancesTest
Bootstrap                                       StatsResample
Brown and Forsythe                              StatsANOVA1Test
Chi-squared test for means                      StatsChiTest
Cochran's test                                  StatsCochranTest
Dunn-Holland-Wolfe                              StatsNPMCTest
Dunnett multicomparison test                    StatsDunnettTest, StatsLinearRegression
Fisher's Exact Test                             StatsContingencyTable
Fixed Effect Model                              StatsANOVA1Test
Friedman test on randomized block               StatsFriedmanTest
F-test on two distributions                     StatsFTest
Hartigan test for unimodality                   StatsDIPTest
Hodges-Ajne (Batschelet)                        StatsHodgesAjneTest
Hotelling                                       StatsCircularTwoSampleTest, StatsCircularMeans
Jackknife                                       StatsResample
Jarque-Bera Test                                StatsJBTest
Kolmogorov-Smirnov                              StatsKSTest
Kruskal-Wallis                                  StatsKWTest
Kuiper Test                                     StatsCircularMoments
Levene's test for variances                     StatsVariancesTest
Linear Correlation Test                         StatsLinearCorrelationTest
Linear Order Statistic                          StatsCircularMoments
Mann-Kendall                                    StatsKendallTauTest
Moore test                                      StatsCircularTwoSampleTest, StatsCircularMeans
Non-parametric multiple contrasts               StatsNPMCTest
Non-parametric angular-angular correlation      StatsCircularCorrelationTest
Non-parametric second order circular analysis   StatsCircularMeans
Non-parametric serial randomness (nominal)      StatsNPNominalSRTest
Parametric angular-angular correlation          StatsCircularCorrelationTest
Parametric angular-linear correlation           StatsCircularCorrelationTest
Parametric second order circular analysis       StatsCircularMeans
Parametric serial randomness test               StatsSRTest
Rayleigh                                        StatsCircularMoments
Repeated Measures                               StatsANOVA2RMTest
Scheffe equality of means                       StatsScheffeTest
Shapiro-Wilk test for normality                 StatsShapiroWilkTest
Spearman                                        StatsRankCorrelationTest
Student-Newman-Keuls                            StatsNPMCTest
Tukey Test                                      StatsTukeyTest, StatsLinearRegression, StatsMultiCorrelationTest, StatsNPMCTest
Two-Factor ANOVA                                StatsANOVA2NRTest
T-test                                          StatsTTest
Watson's nonparametric two-sample U2            StatsWatsonUSquaredTest, StatsCircularTwoSampleTest
Watson-Williams                                 StatsWatsonWilliamsTest
Weighted-rank correlation test                  StatsWRCorrelationTest
Wheeler-Watson nonparametric test               StatsWheelerWatsonTest
Wilcoxon-Mann-Whitney two-sample                StatsWilcoxonRankTest
Wilcoxon signed rank                            StatsWilcoxonRankTest

Statistical Test Operations by Data Format

The following tables group statistical operations and functions according to the format of the input data.

Tests For Single Waves

Analysis Method           Comment
StatsChiTest              Compares with known binned values
StatsCircularMoments      WaveStats for circular data
StatsKendallTauTest       Similar to Spearman's correlation
StatsMedian               Returns the median
StatsNPNominalSRTest      Non-parametric serial randomness test
StatsQuantiles            Computes quantiles and more
StatsResample             Bootstrap analysis
StatsSRTest               Serial randomness test
StatsTrimmedMean          Returns the trimmed mean
StatsTTest                Compares with known mean
Sort                      Reorders the data
WaveStats                 Basic statistical description
StatsJBTest               Jarque-Bera test for normality
StatsKSTest               Limited scope test for normality
StatsDIPTest              Hartigan test for unimodality
StatsShapiroWilkTest      Shapiro-Wilk test for normality

Tests For Two Waves

Analysis Method               Comment
StatsChiTest                  Chi-squared statistic for comparing two distributions
StatsCochranTest              Randomized block or repeated measures test
StatsCircularTwoSampleTest    Second order analysis of angles
StatsDunnettTest              Compares multiple groups to a control
StatsFTest                    Computes ratio of variances
StatsFriedmanTest             Non-parametric ANOVA
StatsKendallTauTest           Similar to Spearman's correlation
StatsTTest                    Compares the means of two distributions
StatsANOVA1Test               One-way analysis of variances
StatsLinearRegression         Linear regression analysis
StatsLinearCorrelationTest    Linear correlation coefficient and its error
StatsRankCorrelationTest      Computes Spearman's rank correlation
StatsVariancesTest            Compares variances of waves
StatsWilcoxonRankTest         Two-sample or signed rank test
StatsWatsonUSquaredTest       Compares two populations of circular data
StatsWatsonWilliamsTest       Compares mean values of angular distributions
StatsWheelerWatsonTest        Compares two angular distributions

Tests For Multiple or Multidimensional Waves

Analysis Method               Comment
StatsANOVA1Test               One-way analysis of variances
StatsANOVA2Test               Two-factor analysis of variances
StatsANOVA2RMTest             Two-factor repeated measures ANOVA
StatsCochranTest              Randomized block or repeated measures test
StatsContingencyTable         Contingency table analysis
StatsDunnettTest              Compares multiple groups to a control
StatsFriedmanTest             Non-parametric ANOVA
StatsNPMCTest                 Non-parametric multiple comparison tests
StatsScheffeTest              Tests equality of means
StatsTukeyTest                Multiple comparisons based on means
StatsWatsonWilliamsTest       Compares mean values of angular distributions
StatsWheelerWatsonTest        Compares two angular distributions

Statistical Test Operations for Angular/Circular Data

StatsAngularDistanceTest
StatsCircularMoments
StatsCircularMeans
StatsCircularTwoSampleTest
StatsCircularCorrelationTest
StatsHodgesAjneTest
StatsWatsonUSquaredTest
StatsWatsonWilliamsTest
StatsWheelerWatsonTest

Statistical Test Operations: Non-Parametric Tests

StatsAngularDistanceTest
StatsFriedmanTest
StatsCircularTwoSampleTest (parametric or nonparametric)
StatsCircularCorrelationTest (parametric or nonparametric)
StatsCircularMeans (parametric or nonparametric)
StatsHodgesAjneTest
StatsKendallTauTest
StatsKWTest
StatsNPMCTest
StatsNPNominalSRTest
StatsRankCorrelationTest
StatsWatsonUSquaredTest
StatsWheelerWatsonTest
StatsWilcoxonRankTest

Noise Functions

The following functions return numbers from pseudo-random distributions of the specified shape and parameters. Except for enoise and gnoise, where you have the option to select a random number generator, the noise functions use a Mersenne Twister algorithm for the initial uniform pseudo-random distribution.

Note: Whenever you need repeatable results, use SetRandomSeed prior to executing any of the noise functions.
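For example (a minimal sketch; the wave name is arbitrary):

    SetRandomSeed 0.5                      // fix the seed so the sequence repeats
    Make/O/N=100 noiseWave = gnoise(1)     // Gaussian noise with standard deviation 1
    // Executing these two lines again regenerates exactly the same noiseWave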

The following noise generation functions are available:

binomialNoise     logNormalNoise
enoise            lorentzianNoise
expNoise          poissonNoise
gammaNoise        StatsPowerNoise
gnoise            StatsVonMisesNoise
HyperGNoise       wnoise

Cumulative Distribution Functions

A Cumulative Distribution Function (CDF) is the integral of its respective probability distribution function (PDF). CDFs are usually well behaved functions with values in the range [0,1]. CDFs are important in computing critical values, P-values and power of statistical tests.

Many CDFs are computed directly from closed form expressions. Others can be difficult to compute because they involve evaluating a very large number of states, e.g., Friedman or USquared distributions. In these cases you have the following options:

  1. Use a built-in table that consists of exact, pre-computed values.

  2. Compute an approximate CDF based on the prevailing approximation method or using a Monte-Carlo approach.

  3. Compute the exact CDF.

Built-in tables are ideal if they cover the range of parameters that you need. Monte-Carlo methods can be tricky in that repeated applications may return slightly different values. Computing the exact CDF may be desirable, but it is often impractical. In most situations the range of parameters that is practical to compute on a desktop machine is already covered by the built-in tables. Larger parameters have not been considered because they take days to compute or because they require 64-bit processors. In addition, most of the approximations tend to improve as the parameters grow.
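For example, a closed-form CDF such as the normal CDF is evaluated directly (a minimal sketch):

    Print StatsNormalCDF(1.96, 0, 1)    // P(X <= 1.96) for a standard normal; prints ~0.975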

The functions to calculate values from CDFs are as follows:

StatsBetaCDF          StatsHyperGCDF        StatsQCDF
StatsBinomialCDF      StatsKuiperCDF        StatsRayleighCDF
StatsCauchyCDF        StatsLogisticCDF      StatsRectangularCDF
StatsChiCDF           StatsLogNormalCDF     StatsRunsCDF
StatsCMSSDCDF         StatsMaxwellCDF       StatsSpearmanRhoCDF
StatsDExpCDF          StatsMooreCDF         StatsStudentCDF
StatsErlangCDF        StatsNBinomialCDF     StatsTopDownCDF
StatsEValueCDF        StatsNCFCDF           StatsTriangularCDF
StatsExpCDF           StatsNCTCDF           StatsUSquaredCDF
StatsFCDF             StatsNormalCDF        StatsVonMisesCDF
StatsFriedmanCDF      StatsParetoCDF        StatsQpCDF
StatsGammaCDF         StatsPoissonCDF       StatsWaldCDF
StatsGeometricCDF     StatsPowerCDF         StatsWeibullCDF

Probability Distribution Functions

Probability distribution functions (PDFs) are sometimes known as probability densities. In the case of continuous distributions, the area under the curve of the PDF over an interval equals the probability that the random variable falls within that interval. PDFs are useful in calculating event probabilities, characteristic functions, and moments of a distribution.
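For example, you can evaluate a PDF at a point or fill a wave with its values for graphing (a minimal sketch; the wave name normPDF is arbitrary):

    Print StatsNormalPDF(0, 0, 1)       // standard normal density at x = 0; prints ~0.39894
    Make/O/N=200 normPDF
    SetScale/I x -4, 4, "", normPDF     // set the x range to [-4, 4]
    normPDF = StatsNormalPDF(x, 0, 1)   // wave assignment evaluates the density at each x
    Display normPDF                     // graph the density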

The functions to calculate values from PDFs are as follows:

StatsBetaPDF          StatsGammaPDF         StatsParetoPDF
StatsBinomialPDF      StatsGeometricPDF     StatsPoissonPDF
StatsCauchyPDF        StatsHyperGPDF        StatsPowerPDF
StatsChiPDF           StatsLogNormalPDF     StatsRayleighPDF
StatsDExpPDF          StatsMaxwellPDF       StatsRectangularPDF
StatsErlangPDF        StatsNBinomialPDF     StatsStudentPDF
StatsErrorPDF         StatsNCChiPDF         StatsTriangularPDF
StatsEValuePDF        StatsNCFPDF           StatsVonMisesPDF
StatsExpPDF           StatsNCTPDF           StatsWaldPDF
StatsFPDF             StatsNormalPDF        StatsWeibullPDF

Inverse Cumulative Distribution Functions

The inverse cumulative distribution functions return the values at which their respective CDFs attain a given level. This value is typically used as a critical test value. There are very few functions for which the inverse CDF can be written in closed form. In most situations the inverse is computed iteratively from the CDF.
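For example, the upper critical value of Student's t distribution with 10 degrees of freedom at the 0.05 level might be computed as follows (a sketch assuming the (probability, degrees of freedom) argument order):

    Print StatsInvStudentCDF(0.95, 10)    // one-tailed critical t value; prints ~1.812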

The functions to calculate values from inverse CDFs are as follows:

StatsInvBetaCDF           StatsInvKuiperCDF         StatsInvQpCDF
StatsInvBinomialCDF       StatsInvLogisticCDF       StatsInvRayleighCDF
StatsInvCauchyCDF         StatsInvLogNormalCDF      StatsInvRectangularCDF
StatsInvChiCDF            StatsInvMaxwellCDF        StatsInvSpearmanCDF
StatsInvCMSSDCDF          StatsInvMooreCDF          StatsInvStudentCDF
StatsInvDExpCDF           StatsInvNBinomialCDF      StatsInvTopDownCDF
StatsInvEValueCDF         StatsInvNCFCDF            StatsInvTriangularCDF
StatsInvExpCDF            StatsInvNormalCDF         StatsInvUSquaredCDF
StatsInvFCDF              StatsInvParetoCDF         StatsInvVonMisesCDF
StatsInvFriedmanCDF       StatsInvPoissonCDF        StatsInvWeibullCDF
StatsInvGammaCDF          StatsInvPowerCDF
StatsInvGeometricCDF      StatsInvQCDF

General Purpose Statistics Operations and Functions

This group includes operations and functions that existed before IGOR Pro 6.0 and some general purpose operations and functions that do not belong to the main groups listed above.

binomial         Sort                      StatsTrimmedMean
binomialln       StatsCircularMoments      StudentA
erf              StatsCorrelation          StudentT
erfc             StatsMedian               WaveStats
inverseERF       StatsQuantiles            StatsPermute
inverseERFC      StatsResample
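For example, StatsMedian returns the median of a wave directly (a minimal sketch; the wave and its values are arbitrary):

    Make/O dataWave = {3, 1, 4, 1, 5, 9, 2}
    Print StatsMedian(dataWave)    // prints 3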

Hazard and Survival Functions

Igor does not provide built-in functions to calculate survival or hazard functions, but they are easily calculated from the probability distribution functions and cumulative distribution functions that Igor does provide.

In the following, the cumulative distribution function is denoted by F(x) and the probability distribution function is denoted by p(x).

The survival function S(x) is given by

S(x) = 1 - F(x).

The hazard function h(x) is given by

h(x) = p(x)/S(x) = p(x)/(1 - F(x)).

The cumulative hazard function H(x) is given by

H(x) = ∫ h(u) du, with the integral running from -∞ to x,

which evaluates to

H(x) = -ln[1 - F(x)].

The inverse survival function Z(α) is given by

Z(α) = G(1 - α),

where G() is the inverse CDF (see Inverse Cumulative Distribution Functions).
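The following sketch implements these definitions for the normal distribution; the function names are arbitrary, and the PDF/CDF pair of any other distribution can be substituted:

    Function Survival(x, mu, sigma)    // S(x) = 1 - F(x)
        Variable x, mu, sigma
        return 1 - StatsNormalCDF(x, mu, sigma)
    End

    Function Hazard(x, mu, sigma)      // h(x) = p(x) / (1 - F(x))
        Variable x, mu, sigma
        return StatsNormalPDF(x, mu, sigma) / (1 - StatsNormalCDF(x, mu, sigma))
    End

    Function CumHazard(x, mu, sigma)   // H(x) = -ln[1 - F(x)]
        Variable x, mu, sigma
        return -ln(1 - StatsNormalCDF(x, mu, sigma))
    End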

Statistics Procedures

Several procedure files are provided to extend the built-in statistics capability described in this help file. Some of these procedure files provide control-panel user interfaces to the built-in statistics functionality. Others extend the functionality.

In the Analysis menu you will find a Statistics item that brings up a submenu. Selecting any item in the submenu loads all the statistics-related procedure files, making them ready to use. Alternatively, you can load all the statistics procedures by adding the following include statement at the top of your procedure window:

#include <AllStatsProcedures>

Functionality provided by the statistics procedure files includes the 1D Statistics Report package for automatic analysis of single 1D waves, the ANOVA Power Calculations Panel, and functions to create specialized graphs:

StatsAutoCorrPlot()       StatsPlotLag()
StatsBoxPlot()            StatsProbPlot()
StatsPlotHistogram()

Also included are the following convenience functions:

WM_2MeanConfidenceIntervals()       WM_MCPointOnRegressionLines()
WM_2MeanConfidenceIntervals2()      WM_MeanConfidenceInterval()
WM_BernoulliCdf()                   WM_OneTailStudentA()
WM_BinomialPdf()                    WM_OneTailStudentT()
WM_CIforPooledMean()                WM_PlotBiHistogram()
WM_CompareCorrelations()            WM_RankForTies()
WM_EstimateMinDetectableDiff()      WM_RankLetterGradesWithTies()
WM_EstimateReqSampleSize()          WM_RegressionInversePrediction()
WM_EstimateReqSampleSize2()         WM_SSEstimatorFunc()
WM_EstimateSampleSizeForDif()       WM_SSEstimatorFunc2()
WM_GetANOVA1Power()                 WM_SSEstimatorFunc3()
WM_GetGeometricAverage()            WM_VarianceConfidenceInterval()
WM_GetHarmonicMean()                WM_WilcoxonPairedRanks()
WM_GetPooledMean()                  WM_StatsKaplanMeier()
WM_GetPooledVariance()

Statistics References

Ajne, B., A simple test for uniformity of a circular distribution, Biometrika, 55, 343-354, 1968.

Bradley, J.V., Distribution-Free Statistical Tests, Prentice Hall, Englewood Cliffs, New Jersey, 1968.

Cheung, Y.K., and J.H. Klotz, The Mann Whitney Wilcoxon distribution using linked lists, Statistica Sinica, 7, 805-813, 1997.

Copenhaver, M.D., and B.S. Holland, Multiple comparisons of simple effects in the two-way analysis of variance with fixed effects, Journal of Statistical Computation and Simulation, 30, 1-15, 1988.

Evans, M., N. Hastings, and B. Peacock, Statistical Distributions, 3rd ed., Wiley, New York, 2000.

Fisher, N.I., Statistical Analysis of Circular Data, 295pp., Cambridge University Press, New York, 1995.

Iman, R.L., and W.J. Conover, A measure of top-down correlation, Technometrics, 29, 351-357, 1987.

Kendall, M.G., Rank Correlation Methods, 3rd ed., Griffin, London, 1962.

Klotz, J.H., Computational Approach to Statistics.

Moore, B.R., A modification of the Rayleigh test for vector data, Biometrika, 67, 175-180, 1980.

Press, W.H., et al., Numerical Recipes in C, 2nd ed., 994 pp., Cambridge University Press, New York, 1992.

van de Wiel, M.A., and A. Di Bucchianico, Fast computation of the exact null distribution of Spearman's rho and Page's L statistic for samples with and without ties, J. of Stat. Plan. and Inference, 92, 133-145, 2001.

Wallace, D.L., Simplified Beta-Approximation to the Kruskal-Wallis H Test, Jour. Am. Stat. Assoc., 54, 225-230, 1959.

Zar, J.H., Biostatistical Analysis, 4th ed., 929 pp., Prentice Hall, Englewood Cliffs, New Jersey, 1999.