Shapiro–Francia test explained
The Shapiro–Francia test is a statistical test for the normality of a population, based on sample data. It was introduced by S. S. Shapiro and R. S. Francia in 1972 as a simplification of the Shapiro–Wilk test.[1]
Theory
Let $x_{(i)}$ be the $i$-th ordered value from our size-$n$ sample. For example, if the sample consists of the values $\left\{5.6,-1.2,7.8,3.4\right\}$, then $x_{(2)}=3.4$, because that is the second-lowest value. Let $m_{i:n}$ be the mean of the $i$-th order statistic when making $n$ independent draws from a standard normal distribution. For example, $m_{2:4}\approx-0.297$, meaning that the second-lowest value in a sample of four draws from a normal distribution is typically about 0.297 standard deviations below the mean.[2] Form the Pearson correlation coefficient between the $x_{(i)}$ and the $m_{i:n}$ (written $m_i$ for short):
$$W'=\frac{\operatorname{cov}(x,m)}{\sigma_x\sigma_m}=\frac{\sum_{i=1}^{n}(x_{(i)}-\bar{x})(m_i-\bar{m})}{\sqrt{\left(\sum_{i=1}^{n}(x_{(i)}-\bar{x})^2\right)\left(\sum_{i=1}^{n}(m_i-\bar{m})^2\right)}}$$
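As a rough illustration of the definitions above, the following Python sketch approximates the mean of an order statistic by numerically integrating against the order-statistic density, and then evaluates the correlation for the four-value example sample. The function names (`expected_order_stat`, `shapiro_francia_w`), the integration range and the step count are illustrative assumptions, not part of the published method.

```python
import math

def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_order_stat(i, n, lo=-10.0, hi=10.0, steps=4000):
    """Approximate the mean of the i-th smallest of n standard normal draws.

    Integrates z times the order-statistic density with composite
    Simpson's rule (steps must be even). Range and step count are assumptions.
    """
    coef = math.factorial(n) / (math.factorial(i - 1) * math.factorial(n - i))

    def integrand(z):
        p = normal_cdf(z)
        return z * coef * p ** (i - 1) * (1.0 - p) ** (n - i) * normal_pdf(z)

    h = (hi - lo) / steps
    total = integrand(lo) + integrand(hi)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(lo + k * h)
    return total * h / 3.0

def shapiro_francia_w(sample):
    """Pearson correlation between sorted data and expected normal order statistics."""
    x = sorted(sample)
    n = len(x)
    m = [expected_order_stat(i, n) for i in range(1, n + 1)]
    x_bar = sum(x) / n
    m_bar = sum(m) / n
    num = sum((xi - x_bar) * (mi - m_bar) for xi, mi in zip(x, m))
    den = math.sqrt(sum((xi - x_bar) ** 2 for xi in x)
                    * sum((mi - m_bar) ** 2 for mi in m))
    return num / den

sample = [5.6, -1.2, 7.8, 3.4]
m_2_4 = expected_order_stat(2, 4)   # should be close to the -0.297 quoted above
w = shapiro_francia_w(sample)
print(m_2_4, w)
```

Direct integration is practical only for small samples; for larger ones an approximation to the order-statistic means is usually preferable.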
Under the null hypothesis that the data are drawn from a normal distribution, this correlation will be strong, so $W'$ values will cluster just under 1, with the peak becoming narrower and closer to 1 as $n$ increases. If the data deviate strongly from a normal distribution, $W'$ will be smaller.
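A small experiment can make this behaviour concrete. The sketch below compares the statistic for a simulated normal sample and a simulated exponential (strongly skewed) sample. It is only an illustration: the helper name `w_prime`, the sample size, the seed and the use of Blom's formula $\Phi^{-1}\left((i-3/8)/(n+1/4)\right)$ in place of the exact order-statistic means are all assumptions.

```python
import math
import random
from statistics import NormalDist

def w_prime(sample):
    """Correlation between the sorted sample and approximate normal scores."""
    x = sorted(sample)
    n = len(x)
    # Assumption: Blom's formula stands in for the exact order-statistic means.
    std_normal = NormalDist()
    m = [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    x_bar = sum(x) / n
    m_bar = sum(m) / n
    num = sum((xi - x_bar) * (mi - m_bar) for xi, mi in zip(x, m))
    den = math.sqrt(sum((xi - x_bar) ** 2 for xi in x)
                    * sum((mi - m_bar) ** 2 for mi in m))
    return num / den

rng = random.Random(1)  # fixed seed so the run is repeatable
normal_sample = [rng.gauss(10.0, 2.0) for _ in range(200)]
skewed_sample = [rng.expovariate(1.0) for _ in range(200)]

w_normal = w_prime(normal_sample)
w_skewed = w_prime(skewed_sample)
print(w_normal, w_skewed)  # the skewed sample should give the smaller value
```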
This test is a formalization of the older practice of forming a Q–Q plot to compare two distributions, with the $x_{(i)}$ playing the role of the quantile points of the sample distribution and the $m_{i:n}$ playing the role of the corresponding quantile points of a normal distribution.
Compared to the Shapiro–Wilk test statistic $W$, the Shapiro–Francia test statistic $W'$ is easier to compute, because it does not require that we form and invert the matrix of covariances between order statistics.
Practice
There is no known closed-form analytic expression for the values of $m_{i:n}$ required by the test. There are, however, several approximations that are adequate for most practical purposes.
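One widely used approximation of this kind, not named in the text above and shown here only as an example, is Blom's formula, which replaces the exact mean of the $i$-th order statistic by $\Phi^{-1}\left((i-3/8)/(n+1/4)\right)$, where $\Phi^{-1}$ is the standard normal quantile function. A minimal Python sketch, with the hypothetical helper name `blom_scores`:

```python
from statistics import NormalDist

def blom_scores(n):
    """Approximate expected standard normal order statistics for sample size n.

    Uses Blom's plotting positions (i - 3/8) / (n + 1/4); this is one possible
    approximation, chosen here for illustration.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]

scores = blom_scores(4)
print(scores)  # the second entry should be close to the exact value of about -0.297
```

The scores are symmetric about zero, so only half of them need to be computed for large samples.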
The exact form of the null distribution of $W'$ is known only for $n=3$.
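When no exact null distribution is available, it can be approximated by simulation. The sketch below draws repeated normal samples, computes the correlation statistic for each, and reads off an empirical 5% critical value. The names `blom_scores`, `correlation_w` and `simulate_null_w`, the sample size, the number of replications, the seed and the use of Blom's approximation for the order-statistic means are all assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist

def blom_scores(n):
    """Approximate expected normal order statistics (Blom's formula, an assumption)."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]

def correlation_w(sample, scores):
    """Pearson correlation between the sorted sample and the given scores."""
    x = sorted(sample)
    n = len(x)
    x_bar = sum(x) / n
    m_bar = sum(scores) / n
    num = sum((xi - x_bar) * (mi - m_bar) for xi, mi in zip(x, scores))
    den = math.sqrt(sum((xi - x_bar) ** 2 for xi in x)
                    * sum((mi - m_bar) ** 2 for mi in scores))
    return num / den

def simulate_null_w(n, reps=2000, seed=0):
    """Sorted statistic values for `reps` simulated normal samples of size n."""
    rng = random.Random(seed)
    scores = blom_scores(n)
    return sorted(
        correlation_w([rng.gauss(0.0, 1.0) for _ in range(n)], scores)
        for _ in range(reps)
    )

w_values = simulate_null_w(20)
# Small values are evidence against normality, so the lower tail matters.
critical_5pct = w_values[int(0.05 * len(w_values))]
print(critical_5pct)
```

An observed statistic below `critical_5pct` would lead to rejection of normality at roughly the 5% level for this sample size, subject to simulation error.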
Monte Carlo simulations have shown that the transformed statistic $\ln(1-W')$ is nearly normally distributed, with values of the mean and standard deviation that vary slowly with $n$ in an easily parameterized form.[3]
Power
Comparison studies have concluded that order statistic correlation tests such as Shapiro–Francia and Shapiro–Wilk are among the most powerful of the established statistical tests for normality.[4] One might assume that the covariance-adjusted weighting of different order statistics used by the Shapiro–Wilk test should make it slightly better, but in practice the Shapiro–Wilk and Shapiro–Francia variants are about equally good. In fact, the Shapiro–Francia variant exhibits more power against some alternative hypotheses.[5]
References
- Shapiro, S. S.; Francia, R. S. (1972-03-01). "An Approximate Analysis of Variance Test for Normality". Journal of the American Statistical Association. 67 (337): 215–216. doi:10.2307/2284728. ISSN 1537-274X. 1480864. JSTOR 2284728.
- Arnold, Barry C.; Balakrishnan, Narayanaswamy; Nagaraja, Haikady N. (2008) [1992]. A First Course in Order Statistics. Classics in Applied Mathematics. Vol. 54. Philadelphia, PA. ISBN 978-0-89871-648-1. LCCN 2008061100.
- Royston, Patrick (1993). "A Toolkit for Testing for Non-Normality in Complete and Censored Samples". The Statistician. 42 (1): 37–43. doi:10.2307/2348109. JSTOR 2348109.
- Razali, Nornadiah Mohd; Wah, Yap Bee (2011). "Power comparisons of Shapiro–Wilk, Kolmogorov–Smirnov, Lilliefors, and Anderson–Darling Tests". Journal of Statistical Modeling and Analytics. 2 (1): 21–33. Kuala Lumpur: Institut Statistik Malaysia. ISBN 978-967-363-157-5.
- Ahmad, Fiaz; Khan, Rehan Ahmad (2015). "A power comparison of various normality tests". Pakistan Journal of Statistics and Operation Research. 11 (3): 331–345. Lahore, Pakistan. doi:10.18187/pjsor.v11i3.845. ISSN 2220-5810.