Data Screening
(EDA) Interpretation Guide

Measures of Homogeneity of Variance

The **Levene test** is
a homogeneity-of-variance test that is less dependent on the assumption of normality
than most such tests and thus is particularly useful with analysis of variance. It
is obtained by computing, for each case, the absolute difference from its cell
mean and performing a one-way analysis of variance on these differences. If
the Levene test statistic is significant, the group variances are not homogeneous,
and we may need to consider transforming the original data or using a non-parametric
test.
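The two steps described above (absolute deviations from the cell mean, then a one-way analysis of variance on those deviations) can be sketched in a few lines of standard-library Python. This is a minimal illustration, not a replacement for a statistics package: the function name `levene_statistic` is hypothetical, and the sketch returns only the test statistic. Judging significance still requires comparing it with an F distribution on (k - 1, N - k) degrees of freedom.

```python
from statistics import mean

def levene_statistic(groups):
    """Return Levene's W: the one-way ANOVA F ratio computed on the
    absolute deviations of each case from its own group (cell) mean."""
    # Step 1: absolute deviation of every case from its group mean.
    abs_dev = [[abs(x - mean(g)) for x in g] for g in groups]

    k = len(abs_dev)                           # number of groups
    n_total = sum(len(z) for z in abs_dev)     # total number of cases
    grand_mean = mean(v for z in abs_dev for v in z)

    # Step 2: one-way ANOVA on the deviations.
    ss_between = sum(len(z) * (mean(z) - grand_mean) ** 2 for z in abs_dev)
    ss_within = sum((v - mean(z)) ** 2 for z in abs_dev for v in z)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

# Illustrative data only: the second group is far more spread out.
w = levene_statistic([[1, 2, 3], [0, 10, 20]])
```

A larger W means the average absolute deviation differs more between groups than within them, which is the pattern that signals unequal variances.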

A **variance ratio analysis**
can be obtained by dividing the larger of the two group variances by the
smaller. Concern arises if the
resulting ratio is about 4 to 5 or more, which indicates that the largest variance is at least 4 to 5
times the smallest variance.
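As a quick illustration of the ratio, the sketch below divides the largest group variance by the smallest. The function name `variance_ratio` and the sample data are made up for this example, and the sketch assumes every group has at least two cases and a non-zero variance.

```python
from statistics import variance

def variance_ratio(groups):
    """Largest sample variance divided by the smallest sample variance."""
    variances = [variance(g) for g in groups]  # sample (n - 1) variances
    return max(variances) / min(variances)

# Illustrative data only: variances are 1 and 4.
ratio = variance_ratio([[1, 2, 3], [2, 4, 6]])
```

With these two groups the ratio is 4, which sits at the lower edge of the range the guide treats as a concern.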

Also, you can eyeball the groups’ box plots side by side and compare the heights of the boxes (each box spans the middle 50% of cases); similar heights suggest similar spread. Additionally, you can check whether the group standard deviations are close in value.
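The same visual check can be approximated numerically by listing each group's interquartile range (the height of its box) next to its standard deviation. The sketch below is only an approximation of what a plotting package draws: the helper name `spread_summary` is hypothetical, and `statistics.quantiles` uses its default "exclusive" method, which may differ slightly from the hinges a given box plot uses.

```python
from statistics import quantiles, stdev

def spread_summary(groups):
    """Map each group name to its interquartile range and standard deviation."""
    summary = {}
    for name, values in groups.items():
        q1, _median, q3 = quantiles(values, n=4)  # quartile cut points
        summary[name] = {"iqr": q3 - q1, "sd": stdev(values)}
    return summary

# Illustrative data only.
summary = spread_summary({
    "control": [1, 2, 3, 4, 5, 6, 7],
    "treated": [2, 4, 6, 8, 10, 12, 14],
})
```

Groups whose `iqr` and `sd` values are close to one another are the ones whose boxes would look similar in height.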

Measures of Normality

In a **normal probability
plot**, each observed value is paired with its expected value from the normal
distribution. The expected value from the normal distribution is based on the
number of cases in the sample and the rank order of the case in the sample.
If the sample is from a normal distribution, we expect that the points will
fall more or less on a straight line.
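To make the pairing concrete, the sketch below computes an expected normal value for each case from its rank and the sample size. It assumes Blom's plotting position, (rank - 3/8) / (n + 1/4), which is one common convention; other packages may use a different formula. The function name `normal_scores` is hypothetical.

```python
from statistics import NormalDist

def normal_scores(sample):
    """Pair each observed value with its expected standard-normal value.

    Assumption: Blom's plotting position (rank - 0.375) / (n + 0.25).
    """
    n = len(sample)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [
        (x, std_normal.inv_cdf((rank - 0.375) / (n + 0.25)))
        for rank, x in enumerate(sorted(sample), start=1)
    ]

# Illustrative data only.
pairs = normal_scores([3, 1, 2])
```

Plotting the first element of each pair against the second gives the normal probability plot; points that bend away from a straight line indicate departure from normality.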

A **detrended normal plot**
shows the actual deviations of the points from a straight line. If the sample
is from a normal population, the points should cluster around a horizontal line
through 0, and there should be no pattern. A striking pattern suggests a departure
from normality.
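One way to compute such deviations is to standardize each observed value and subtract the value expected under normality for its rank. The sketch below does this under two assumptions that are not stated in the guide: expected values use Blom's plotting position, and observed values are standardized with the sample mean and standard deviation. The function name `detrended_deviations` is hypothetical.

```python
from statistics import NormalDist, mean, stdev

def detrended_deviations(sample):
    """Return (observed value, deviation from the normal expectation) pairs."""
    n = len(sample)
    m, s = mean(sample), stdev(sample)
    std_normal = NormalDist()
    result = []
    for rank, x in enumerate(sorted(sample), start=1):
        # Expected z-score for this rank (assumed: Blom's plotting position).
        expected = std_normal.inv_cdf((rank - 0.375) / (n + 0.25))
        # Deviation = standardized observed value minus expected value.
        result.append((x, (x - m) / s - expected))
    return result

# Illustrative data only.
deviations = detrended_deviations([1, 2, 3])
```

Plotting the deviations against the observed values gives the detrended plot; a curve or a trend in the points, rather than a random scatter around 0, is the pattern to look for.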

The **Shapiro-Wilk**
**test** and the **Lilliefors test** are statistical tests of the
hypothesis that the data are from a normal distribution. If either test is significant,
the hypothesis of normality is rejected. It is important to remember that
whenever the sample size is large, almost any goodness-of-fit test will result
in rejection of the null hypothesis, since it is almost impossible to find data
that are exactly normally distributed. For most statistical tests, it is sufficient
that the data are approximately normally distributed.
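For intuition about what the Lilliefors test measures, the sketch below computes its statistic: the largest gap between the sample's cumulative distribution and a normal distribution fitted with the sample mean and standard deviation. It returns the statistic only; a p-value needs Lilliefors tables or a statistics package. The Shapiro-Wilk statistic is not sketched here because it depends on tabulated coefficients. The function name `lilliefors_statistic` is hypothetical.

```python
from statistics import NormalDist, mean, stdev

def lilliefors_statistic(sample):
    """Largest distance between the empirical CDF and a fitted normal CDF."""
    n = len(sample)
    fitted = NormalDist(mean(sample), stdev(sample))  # parameters estimated from the data
    d = 0.0
    for i, x in enumerate(sorted(sample), start=1):
        cdf = fitted.cdf(x)
        # Check the gap just after and just before the step at x.
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d

# Illustrative data only: evenly spaced values versus a sample with one extreme case.
d_even = lilliefors_statistic([1, 2, 3, 4, 5])
d_outlier = lilliefors_statistic([1, 1, 1, 1, 100])
```

The statistic lies between 0 and 1, and larger values indicate a poorer fit to the normal distribution.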

A distribution that is not
symmetric but has more cases (more of a "tail") toward one end of
the distribution than the other is said to be **skewed**.

Value of 0 = symmetric (as in a normal distribution)

Positive Value = positive skew

Negative Value = negative skew

Concern arises when the skewness value falls beyond plus or minus 2.50 to 3.00.
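The sign convention above can be checked with a small calculation. The sketch below uses the simple moment-based coefficient (third central moment divided by the 1.5 power of the second); this is an assumption, since statistical packages often report a sample-size-adjusted version whose values differ somewhat, especially in small samples. The function name `skewness` is hypothetical.

```python
def skewness(sample):
    """Moment-based skewness: m3 / m2 ** 1.5 (no small-sample adjustment)."""
    n = len(sample)
    m = sum(sample) / n
    m2 = sum((x - m) ** 2 for x in sample) / n  # second central moment
    m3 = sum((x - m) ** 3 for x in sample) / n  # third central moment
    return m3 / m2 ** 1.5

# Illustrative data only.
symmetric = skewness([1, 2, 3, 4, 5])    # no tail on either side
right_tail = skewness([1, 1, 1, 10])     # tail toward high values
left_tail = skewness([-10, -1, -1, -1])  # tail toward low values
```

Here `symmetric` is 0, `right_tail` is positive, and `left_tail` is negative, matching the list above.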

**Kurtosis** is the relative
concentration of scores in the center, the upper and lower ends (tails) and
the shoulders (between the center and the tails) of a distribution.

Value of 0 = normal (mesokurtic)

Positive Value = leptokurtic

Negative Value = platykurtic

Concern arises when the kurtosis value falls beyond plus or minus 2.50 to 3.00.
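Because the scale above treats 0 as normal, it corresponds to excess kurtosis, that is, the fourth standardized moment minus 3. The sketch below uses the simple moment-based version as an assumption; packages often report a sample-size-adjusted estimate that differs somewhat in small samples. The function name `excess_kurtosis` is hypothetical.

```python
def excess_kurtosis(sample):
    """Moment-based excess kurtosis: m4 / m2 ** 2 - 3 (no small-sample adjustment)."""
    n = len(sample)
    m = sum(sample) / n
    m2 = sum((x - m) ** 2 for x in sample) / n  # second central moment
    m4 = sum((x - m) ** 4 for x in sample) / n  # fourth central moment
    return m4 / m2 ** 2 - 3

# Illustrative data only.
flat = excess_kurtosis([1, 2, 3, 4, 5])              # evenly spread values
peaked = excess_kurtosis([-10, 0, 0, 0, 0, 0, 10])   # heavy center plus extreme tails
```

Here `flat` is negative (platykurtic) and `peaked` is positive (leptokurtic).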