public class GofFormat
extends java.lang.Object
Strictly speaking, applying several tests simultaneously makes the p-values "invalid" in the sense that the probability of having at least one p-value less than 0.01, say, is larger than 0.01. One must therefore be careful with the interpretation of these p-values (one could use, e.g., the Bonferroni inequality). Applying simultaneous tests is convenient in some situations, such as in screening experiments for detecting statistical deficiencies in random number generators. In that context, rejection of the null hypothesis typically occurs with extremely small p-values (e.g., less than 10^-15), and the interpretation is quite obvious in this case.
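As an illustration of the Bonferroni inequality mentioned above, here is a minimal sketch. The class BonferroniSketch and its methods are hypothetical and are not part of this package. With k simultaneous tests, each p-value is compared against alpha / k, which keeps the probability of at least one false rejection at or below alpha.

```java
// Hypothetical helper for illustration only; not part of GofFormat.
final class BonferroniSketch {

    // Per-test level that keeps the overall error probability <= alpha.
    static double adjustedLevel(double alpha, int k) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        return alpha / k;
    }

    // True if at least one p-value falls below the adjusted level.
    static boolean anyRejected(double[] pValues, double alpha) {
        double level = adjustedLevel(alpha, pValues.length);
        for (double p : pValues) {
            if (p < level) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        double[] pValues = {0.004, 0.20, 0.65};
        // 0.004 is below 0.01 but not below 0.01 / 3.
        System.out.println(anyRejected(pValues, 0.01));
    }
}
```

The correction is conservative: it ignores any dependence between the test statistics, which is typically strong for EDF tests computed on the same sample.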
The class also provides tools to plot an empirical or
theoretical distribution function, by creating a data file that
contains a graphic plot in a format compatible with the software
specified by the environment variable graphSoft.
Note: This class uses the Colt library.
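The kind of plot data file described above can be sketched as follows. This is a standalone illustration, not the code of this class: PlotDataSketch is a hypothetical name, a DoubleUnaryOperator stands in for a ContinuousDistribution, and the two-column text layout with a leading comment line is an assumption about what a plotting program such as gnuplot can read. The actual output of the plotting methods may differ.

```java
import java.util.Locale;
import java.util.function.DoubleUnaryOperator;

// Hypothetical helper for illustration only; not part of GofFormat.
final class PlotDataSketch {

    // Evaluates f at m + 1 equally spaced points of [a, b] and returns
    // one "x y" pair per line, preceded by a caption line.
    static String gridData(DoubleUnaryOperator f, double a, double b,
                           int m, String desc) {
        if (m < 1 || !(a < b)) {
            throw new IllegalArgumentException("need m >= 1 and a < b");
        }
        StringBuilder sb = new StringBuilder();
        sb.append("# ").append(desc).append('\n');
        for (int i = 0; i <= m; i++) {
            double x = a + i * (b - a) / m;
            sb.append(String.format(Locale.ROOT, "%.6f %.6f",
                                    x, f.applyAsDouble(x)));
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Distribution function of the exponential law with mean 1.
        System.out.print(gridData(x -> 1.0 - Math.exp(-x), 0.0, 5.0, 10,
                                  "exponential cdf"));
    }
}
```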
| Modifier and Type | Field and Description |
|---|---|
| static boolean[] | activeTests: The set of EDF tests that are to be performed when calling the methods activeTests, formatActiveTests, etc. |
| static int | AD: Anderson-Darling test |
| static int | CM: Cramér-von Mises test |
| static int | COR: Correlation |
| static double | EPSILONP: Environment variable used in formatp0 to determine which p-values are too close to 0 or 1 to be printed explicitly. |
| static int | GNUPLOT: Data file format used for plotting functions with Gnuplot. |
| static int | graphSoft: Environment variable that selects the type of software to be used for plotting the graphs of functions. |
| static int | KS: Kolmogorov-Smirnov test |
| static int | KSM: Kolmogorov-Smirnov- test |
| static int | KSP: Kolmogorov-Smirnov+ test |
| static int | MATHEMATICA: Data file format used for creating graphics with Mathematica. |
| static int | MEAN: Mean |
| static int | NTESTTYPES: Total number of test types |
| static double | SUSPECTP: Environment variable used in formatp1 to determine which p-values should be marked as suspect when printing test results. |
| static java.lang.String[] | TESTNAMES: Name of each testType test. |
| static int | WG: Watson G test |
| static int | WU: Watson U test |
| Modifier and Type | Method and Description |
|---|---|
| static void | activeTests(DoubleArrayList data, ContinuousDistribution dist, double[] sVal, double[] pVal): The observations are in data, not necessarily sorted, and we want to compare their empirical distribution with the distribution dist. |
| static void | activeTests(DoubleArrayList sortedData, double[] sVal, double[] pVal): Computes the EDF test statistics by calling tests, then computes the p-values of those that currently belong to activeTests, and returns these quantities in sVal and pVal, respectively. |
| static java.lang.String | drawCdf(ContinuousDistribution dist, double a, double b, int m, java.lang.String desc): Formats data to plot the graph of the distribution function F over the interval [a, b], and returns the result as a String. |
| static java.lang.String | drawDensity(ContinuousDistribution dist, double a, double b, int m, java.lang.String desc): Formats data to plot the graph of the density f(x) over the interval [a, b], and returns the result as a String. |
| static java.lang.String | formatActiveTests(int n, double[] sVal, double[] pVal): Gets the p-values of the active EDF test statistics, which are in activeTests. |
| static java.lang.String | formatChi2(int k, int d, double chi2): Computes the p-value of the chi-square statistic chi2 for a test with k intervals. |
| static java.lang.String | formatKS(DoubleArrayList data, ContinuousDistribution dist): Computes the KS test statistics to compare the empirical distribution of the observations in data with the theoretical distribution dist and formats the results. |
| static java.lang.String | formatKS(int n, double dp, double dm, double d): Computes the p-values of the three Kolmogorov-Smirnov statistics D_N^+, D_N^-, and D_N, whose values are in dp, dm, d, respectively, assuming a sample of size n. |
| static java.lang.String | formatKSJumpOne(DoubleArrayList data, ContinuousDistribution dist, double a): Similar to formatKS, but for D_N^+(a). |
| static java.lang.String | formatKSJumpOne(int n, double a, double dp): Similar to formatKS, but for the KS statistic D_N^+(a). |
| static java.lang.String | formatp0(double p): Returns the significance level (or p-value) p of a test, in the format "1 - p" if p is close to 1, and p otherwise. |
| static java.lang.String | formatp1(double p): Returns the string "Significance level of test : ", then calls formatp0 to print p, and adds the marker "****" if p is considered suspect (uses the environment variable SUSPECTP for this). |
| static java.lang.String | formatp2(double x, double p): Returns x on a single line, then goes to the next line and calls formatp1. |
| static java.lang.String | formatp3(java.lang.String testName, double x, double p): Formats the test statistic x for a test named testName with p-value p. |
| static java.lang.String | graphDistUnif(DoubleArrayList data, java.lang.String desc): Formats data to plot the empirical distribution of U(1), ..., U(N), which are assumed to be in data[0...N-1], and to compare it with the uniform distribution. |
| static java.lang.String | graphFunc(ContinuousDistribution dist, double a, double b, int m, int mono, java.lang.String desc): Deprecated. |
| static java.lang.String | iterPowRatioTests(DoubleArrayList sortedData, int k, boolean printval, boolean graph, java.io.PrintWriter f): Similar to iterSpacingsTests, but with the GofStat.powerRatios transformation. |
| static java.lang.String | iterSpacingsTests(DoubleArrayList sortedData, int k, boolean printval, boolean graph, java.io.PrintWriter f): Repeats the following k times: applies the GofStat.iterateSpacings transformation to the U(0), ..., U(N-1), assuming that these observations are in sortedData, then computes the EDF test statistics and calls activeTests after each transformation. |
| static void | tests(DoubleArrayList data, ContinuousDistribution dist, double[] sVal): The observations V are in data, not necessarily sorted, and their empirical distribution is compared with the continuous distribution dist. |
| static void | tests(DoubleArrayList sortedData, double[] sVal): Computes all EDF test statistics to compare the empirical distribution of U(0), ..., U(N-1) with the uniform distribution, assuming that these sorted observations are in sortedData. |
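public static final int GNUPLOT
Data file format used for plotting functions with Gnuplot.

public static final int MATHEMATICA
Data file format used for creating graphics with Mathematica.

public static int graphSoft
Environment variable that selects the type of software to be used for plotting the graphs of functions. The output of graphFunc and graphDistUnif will be in a format suitable for this selected software.
The default value is GNUPLOT.
To display a graphic in file f using gnuplot, for example, one can use the command "plot f with steps, x with lines" in gnuplot.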
graphSoft can take the values GNUPLOT or MATHEMATICA.

public static double EPSILONP
Environment variable used in formatp0 to determine which p-values are too close to 0 or 1 to be printed explicitly.
If EPSILONP = ε, then any p-value (or significance level) less than ε or larger than 1 - ε is not written explicitly; the program simply writes "eps" or "1-eps".
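The threshold rule just described can be sketched as follows. EpsFormatSketch is a hypothetical class written for illustration. It reproduces only the "eps" / "1-eps" substitution and prints other values with Double.toString, which is an assumption; the number format used by formatp0 is not specified here.

```java
// Hypothetical helper for illustration only; not part of GofFormat.
final class EpsFormatSketch {

    // Replaces p-values too close to 0 or 1 by a symbolic marker.
    static String format(double p, double eps) {
        if (p < eps) {
            return "eps";
        }
        if (p > 1.0 - eps) {
            return "1-eps";
        }
        // Assumed fallback: plain decimal representation of p.
        return Double.toString(p);
    }

    public static void main(String[] args) {
        double eps = 1.0e-15;
        System.out.println(format(1.0e-20, eps));
        System.out.println(format(0.25, eps));
        System.out.println(format(1.0, eps));
    }
}
```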
The default value is 10^-15.

public static double SUSPECTP
Environment variable used in formatp1 to determine which p-values should be marked as suspect when printing test results.
If SUSPECTP = α, then any p-value (or significance level) less than α or larger than 1 - α is considered suspect and is "singled out" by formatp1.
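A minimal sketch of this two-sided rule follows. SuspectSketch is a hypothetical class, and the label "p-value: " is an assumption made for the example; the exact text layout produced by formatp1 is not reproduced.

```java
// Hypothetical helper for illustration only; not part of GofFormat.
final class SuspectSketch {

    // A p-value is suspect when it is too close to 0 or to 1.
    static boolean isSuspect(double p, double alpha) {
        return p < alpha || p > 1.0 - alpha;
    }

    // Appends the "****" marker to suspect p-values.
    static String mark(double p, double alpha) {
        String s = "p-value: " + p;
        return isSuspect(p, alpha) ? s + "    ****" : s;
    }

    public static void main(String[] args) {
        System.out.println(mark(0.5, 0.01));
        System.out.println(mark(0.005, 0.01));
        System.out.println(mark(0.995, 0.01));
    }
}
```

Flagging values near 1 as well as near 0 matters for random number testing, where a fit that is "too good" is also evidence against the generator.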
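The default value is 0.01.

public static final int KSP
Kolmogorov-Smirnov+ test

public static final int KSM
Kolmogorov-Smirnov- test

public static final int KS
Kolmogorov-Smirnov test

public static final int AD
Anderson-Darling test

public static final int CM
Cramér-von Mises test

public static final int WG
Watson G test

public static final int WU
Watson U test

public static final int MEAN
Mean

public static final int COR
Correlation

public static final int NTESTTYPES
Total number of test types

public static final java.lang.String[] TESTNAMES
Name of each testType test.

public static boolean[] activeTests
The set of EDF tests that are to be performed when calling the methods activeTests, formatActiveTests, etc.
By default, this set contains KSP, KSM, and AD. Note: MEAN and COR are always excluded from this set of active tests.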
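The idea of a boolean array indexed by test-type constants can be sketched as below. ActiveTestsSketch is a hypothetical class, and the integer values given to the constants are assumptions chosen for the example. Real code should refer to the constants of this class by name rather than rely on particular values.

```java
// Hypothetical helper for illustration only; not part of GofFormat.
final class ActiveTestsSketch {

    // Assumed index values; only the names come from the documentation.
    static final int KSP = 0, KSM = 1, KS = 2, AD = 3, CM = 4,
                     WG = 5, WU = 6, MEAN = 7, COR = 8, NTESTTYPES = 9;

    // Builds the documented default set: KSP, KSM and AD.
    static boolean[] defaultActive() {
        boolean[] active = new boolean[NTESTTYPES];
        active[KSP] = true;
        active[KSM] = true;
        active[AD] = true;
        return active;
    }

    // Counts how many test types are currently selected.
    static int countActive(boolean[] active) {
        int count = 0;
        for (boolean b : active) {
            if (b) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        boolean[] active = defaultActive();
        active[CM] = true; // additionally request the Cramér-von Mises test
        System.out.println(countActive(active));
    }
}
```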
The valid indices for this array are KSP, KSM, KS, AD, CM, WG, WU, MEAN, and COR.

@Deprecated public static java.lang.String graphFunc(ContinuousDistribution dist, double a, double b, int m, int mono, java.lang.String desc)
Deprecated. Use drawCdf instead.
Formats data to plot the graph of the distribution function F (or bar(F)) over the interval [a, b], and returns the result as a String.
The method assumes that dist.cdf(x) (or dist.barF(x)) returns the value of F (or bar(F)) at x, and that F is either non-decreasing or non-increasing.
If mono = 1, the method will verify that F is non-decreasing; if mono = -1, it will verify that bar(F) is non-increasing. (This is useful to verify whether F is effectively a sensible approximation to a distribution function or its complement in the given interval.)
The String desc gives a short caption for the graphic plot.
The method computes the m + 1 points (x_i, F(x_i)), where x_i = a + i(b - a)/m for i = 0, 1, …, m, and formats these points into a String in a format suitable for the software specified by graphSoft.
Parameters:
dist - continuous distribution function to plot
a - lower bound of the interval to plot
b - upper bound of the interval to plot
m - number of points in the plot minus one
mono - 1 for plotting a distribution function, -1 for a complementary distribution function
desc - short caption describing the plot

public static java.lang.String drawCdf(ContinuousDistribution dist, double a, double b, int m, java.lang.String desc)
Formats data to plot the graph of the distribution function F over the interval [a, b], and returns the result as a String.
The method dist.cdf(x) returns the value of F at x.
The String desc gives a short caption for the graphic plot.
The method computes the m + 1 points (x_i, F(x_i)), where x_i = a + i(b - a)/m for i = 0, 1, …, m, and formats these points into a String in a format suitable for the software specified by graphSoft.
Parameters:
dist - continuous distribution function to plot
a - lower bound of the interval to plot
b - upper bound of the interval to plot
m - number of points in the plot minus one
desc - short caption describing the plot

public static java.lang.String drawDensity(ContinuousDistribution dist, double a, double b, int m, java.lang.String desc)
Formats data to plot the graph of the density f(x) over the interval [a, b], and returns the result as a String. The method dist.density(x) returns the value of f(x) at x.
The String desc gives a short caption for the graphic plot. The method computes the m + 1 points (x_i, f(x_i)), where x_i = a + i(b - a)/m for i = 0, 1, …, m, and formats these points into a String in a format suitable for the software specified by graphSoft.
Parameters:
dist - continuous density function to plot
a - lower bound of the interval to plot
b - upper bound of the interval to plot
m - number of points in the plot minus one
desc - short caption describing the plot

public static java.lang.String graphDistUnif(DoubleArrayList data, java.lang.String desc)
Formats data to plot the empirical distribution of U(1), ..., U(N), which are assumed to be in data[0...N-1], and to compare it with the uniform distribution. The result is in a format suitable for the software specified by graphSoft.
Parameters:
data - array of observations to plot
desc - short caption describing the plot

public static java.lang.String formatp0(double p)
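Returns the significance level (or p-value) p of a test, in the format "1 - p" if p is close to 1, and p otherwise. Uses the environment variable EPSILONP and replaces p by ε when it is too small.
Parameters:
p - the p-value or significance level to be formatted

public static java.lang.String formatp1(double p)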
Returns the string "Significance level of test : ", then calls formatp0 to print p, and adds the marker "****" if p is considered suspect (uses the environment variable SUSPECTP for this).
Parameters:
p - the p-value or significance level to be formatted

public static java.lang.String formatp2(double x, double p)
Returns x on a single line, then goes to the next line and calls formatp1.
Parameters:
x - value of the statistic for which the significance level is formatted
p - the p-value or significance level to be formatted

public static java.lang.String formatp3(java.lang.String testName, double x, double p)
Formats the test statistic x for a test named testName with p-value p.
Parameters:
testName - name of the test that was performed
x - value of the test statistic
p - significance level (or p-value) of the test

public static java.lang.String formatChi2(int k, int d, double chi2)
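Computes the p-value of the chi-square statistic chi2 for a test with k intervals. See pDisc.
Parameters:
k - number of subintervals for the chi-square test
chi2 - chi-square statistic

public static java.lang.String formatKS(int n, double dp, double dm, double d)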
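Computes the p-values of the three Kolmogorov-Smirnov statistics D_N^+, D_N^-, and D_N, whose values are in dp, dm, d, respectively, assuming a sample of size n, and calls formatp2 for each one.
Parameters:
n - sample size
dp - value of the D_N^+ statistic
dm - value of the D_N^- statistic
d - value of the D_N statistic

public static java.lang.String formatKS(DoubleArrayList data, ContinuousDistribution dist)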
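Computes the KS test statistics to compare the empirical distribution of the observations in data with the theoretical distribution dist and formats the results.
Parameters:
data - array of observations to be tested
dist - assumed distribution of the observations

public static java.lang.String formatKSJumpOne(int n, double a, double dp)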
Similar to formatKS, but for the KS statistic D_N^+(a). Writes a header, computes the p-value and calls formatp2.
Parameters:
n - sample size
a - size of the jump
dp - value of D_N^+(a)

public static java.lang.String formatKSJumpOne(DoubleArrayList data, ContinuousDistribution dist, double a)
Similar to formatKS, but for D_N^+(a).
Parameters:
data - array of observations to be tested
dist - assumed distribution of the data
a - size of the jump

public static void tests(DoubleArrayList sortedData, double[] sVal)
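Computes all EDF test statistics to compare the empirical distribution of U(0), ..., U(N-1) with the uniform distribution, assuming that these sorted observations are in sortedData. See GofStat.
Parameters:
sortedData - array of sorted observations
sVal - array that will be filled with the results of the tests

public static void tests(DoubleArrayList data, ContinuousDistribution dist, double[] sVal)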
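The observations V are in data, not necessarily sorted, and their empirical distribution is compared with the continuous distribution dist.
Parameters:
data - array of observations to test
dist - assumed distribution of the observations
sVal - array that will be filled with the results of the tests

public static void activeTests(DoubleArrayList sortedData, double[] sVal, double[] pVal)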
Computes the EDF test statistics by calling tests, then computes the p-values of those that currently belong to activeTests, and returns these quantities in sVal and pVal, respectively.
Assumes that U(0), ..., U(N-1) are in sortedData and that we want to compare their empirical distribution with the uniform distribution.
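To make the comparison with the uniform distribution concrete, here is a sketch that computes only the three Kolmogorov-Smirnov statistics from their standard definitions. KsUniformSketch is a hypothetical class that works on a plain double[] rather than a DoubleArrayList; it does not compute p-values or the other EDF statistics, and it is not the implementation used by tests.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of GofFormat.
final class KsUniformSketch {

    // Returns {D+, D-, D} for observations in [0, 1] compared with the
    // uniform distribution: D+ = max_j (j/N - U_(j)),
    // D- = max_j (U_(j) - (j-1)/N), D = max(D+, D-), for j = 1..N.
    static double[] ks(double[] u) {
        if (u.length == 0) {
            throw new IllegalArgumentException("empty sample");
        }
        double[] s = u.clone();
        Arrays.sort(s); // work on a sorted copy, leave the input untouched
        int n = s.length;
        double dPlus = 0.0;
        double dMinus = 0.0;
        for (int j = 0; j < n; j++) {
            dPlus = Math.max(dPlus, (j + 1.0) / n - s[j]);
            dMinus = Math.max(dMinus, s[j] - (double) j / n);
        }
        return new double[] {dPlus, dMinus, Math.max(dPlus, dMinus)};
    }

    public static void main(String[] args) {
        double[] r = ks(new double[] {0.875, 0.125, 0.625, 0.375});
        System.out.println(Arrays.toString(r));
    }
}
```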
If N = 1, only puts 1 - sortedData.get(0) in sVal[KSP], pVal[KSP], and pVal[MEAN].
Parameters:
sortedData - array of sorted observations
sVal - array that will be filled with the results of the tests
pVal - array that will be filled with the p-values

public static void activeTests(DoubleArrayList data, ContinuousDistribution dist, double[] sVal, double[] pVal)
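The observations are in data, not necessarily sorted, and we want to compare their empirical distribution with the distribution dist.
Parameters:
data - array of observations to test
dist - assumed distribution of the observations
sVal - array that will be filled with the results of the tests
pVal - array that will be filled with the p-values

public static java.lang.String formatActiveTests(int n, double[] sVal, double[] pVal)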
Gets the p-values of the active EDF test statistics, which are in activeTests, and calls formatp2 for each one.
If n = 1, prints only pVal[KSP] using formatp1.
Parameters:
n - sample size
sVal - array containing the results of the tests
pVal - array containing the p-values

public static java.lang.String iterSpacingsTests(DoubleArrayList sortedData, int k, boolean printval, boolean graph, java.io.PrintWriter f)
Repeats the following k times: applies the GofStat.iterateSpacings transformation to the U(0), ..., U(N-1), assuming that these observations are in sortedData, then computes the EDF test statistics and calls activeTests after each transformation.
The original array sortedData is left unchanged (the transformations are applied on a copy of sortedData).
If printval = true, stores all the values into the returned String after each iteration.
If graph = true, calls graphDistUnif after each iteration to print to stream f the data for plotting the distribution function of the U_i.
Parameters:
sortedData - array containing the sorted observations
k - number of times the tests are applied
printval - if true, stores all the values of the observations at each iteration
graph - if true, the distribution of the U_i will be plotted after each iteration
f - stream where the plots are written to

public static java.lang.String iterPowRatioTests(DoubleArrayList sortedData, int k, boolean printval, boolean graph, java.io.PrintWriter f)
Similar to iterSpacingsTests, but with the GofStat.powerRatios transformation.
Parameters:
sortedData - array containing the sorted observations
k - number of times the tests are applied
printval - if true, stores all the values of the observations at each iteration
graph - if true, the distribution of the U_i will be plotted after each iteration
f - stream where the plots are written to

To submit a bug or ask questions, send an e-mail to Pierre L'Ecuyer.