Call it chance, but whenever these kinds of tasks come up, I hear people talking only about t-tests. That is fine as long as you want to compare means, that is, when your target variable is continuous. But why do people reach for the t-test when they want to compare ratios or proportions? Whatever happened to the chi-square test or the z-test for a difference in proportions?

I did a bit of research on the net, a bit of calculation using pen and paper [very good exercise for the brain in this age of calculators and spreadsheets :-) ], read a very good article by Gerard E. Dallal, and I found the answers.

Going back to our introductory class in statistics, let’s check out the formulae for the t-tests.

**1. Assuming that the population variances are equal,**

$$T = \frac{X_1 - X_2}{\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \qquad \text{(Equation 1)}$$

where

$X_1, X_2$ = means of samples 1 and 2

$n_1, n_2$ = sizes of samples 1 and 2

$S_p^2$ = pooled variance $= \dfrac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$
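As a rough illustration of Equation 1, here is a minimal Python sketch that uses only the standard library. The function name `pooled_t` and the two small samples are made up for this example; they are not from any real dataset.

```python
import math
from statistics import mean, variance


def pooled_t(sample1, sample2):
    """Return (T, degrees of freedom) for the equal-variance t-test (Equation 1)."""
    n1, n2 = len(sample1), len(sample2)
    # statistics.variance uses the (n - 1) divisor, i.e. the sample variance S^2.
    s1_sq = variance(sample1)
    s2_sq = variance(sample2)
    # Pooled variance Sp^2: the two sample variances weighted by their degrees of freedom.
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    t = (mean(sample1) - mean(sample2)) / math.sqrt(sp_sq * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


# Hypothetical measurements, for illustration only.
a = [5.1, 4.9, 5.6, 5.8, 6.0]
b = [4.2, 4.8, 5.0, 4.5, 4.7]
t_stat, df = pooled_t(a, b)
print(f"T = {t_stat:.3f} with {df} degrees of freedom")
```

The p-value would then come from a t distribution with `df` degrees of freedom, which the standard library does not provide.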

**2. Assuming that the population variances are not equal,**

$$T = \frac{X_1 - X_2}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}} \qquad \text{(Equation 2)}$$

We have also been taught that the test statistic Z is used to determine the difference between two population proportions based on the difference between the two sample proportions.

And the formula for the Z statistic is given by

$$Z = \frac{P_1 - P_2}{\sqrt{P(1 - P)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \qquad \text{(Equation 3)}$$

where

$P_1, P_2$ = proportions of successes (or target category) in samples 1 and 2

$S_1^2, S_2^2$ = variances of samples 1 and 2 (these appear in Equations 1 and 2)

$n_1, n_2$ = sizes of samples 1 and 2

$P$ = pooled estimate of the sample proportion of successes $= \dfrac{X_1 + X_2}{n_1 + n_2}$

$X_1, X_2$ = number of successes (or target category) in samples 1 and 2 (here these are counts, not the sample means of Equations 1 and 2)
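To make Equation 3 concrete, here is a small Python sketch using only the standard library. The function name `two_proportion_z` and the counts (45 of 100 versus 30 of 100) are hypothetical. The two-sided p-value is computed from the standard normal distribution via `math.erfc`.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Return (Z, two-sided p-value) for the pooled two-proportion z-test (Equation 3)."""
    p1, p2 = x1 / n1, x2 / n2
    # Pooled proportion of successes across both samples.
    p = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts, for illustration only.
z, p_value = two_proportion_z(45, 100, 30, 100)
print(f"Z = {z:.3f}, two-sided p-value = {p_value:.4f}")
```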

The test statistic Z (Equation 3) is equivalent to the chi-square test of homogeneity of proportions applied to the corresponding 2×2 table.

But how different are proportions from means? The proportion with the desired outcome is the number of individuals/observations with the outcome divided by the total number of individuals/observations. Suppose we create a variable that equals 1 if the subject has the outcome and 0 if not. The proportion of individuals/observations with the outcome is the mean of this variable, because the sum of these 0s and 1s is the number of individuals/observations with the outcome.

Let's suppose there are $m$ 1s and $(n - m)$ 0s among the $n$ observations. Then $X_{\text{Mean}} (= P) = m/n$, and $X_i - X_{\text{Mean}}$ is equal to $(1 - m/n)$ for $m$ observations and $(0 - m/n)$ for $(n - m)$ observations. When these results are combined, the final result is

$$
\begin{aligned}
\sum (X_i - X_{\text{Mean}})^2 &= m\left(1 - \frac{m}{n}\right)^2 + (n - m)\left(0 - \frac{m}{n}\right)^2 \\
&= m\left(1 - \frac{2m}{n} + \frac{m^2}{n^2}\right) + (n - m)\frac{m^2}{n^2} \\
&= m - \frac{2m^2}{n} + \frac{m^3}{n^2} + \frac{m^2}{n} - \frac{m^3}{n^2} \\
&= m - \frac{m^2}{n} \\
&= m\left(1 - \frac{m}{n}\right) \\
&= nP(1 - P)
\end{aligned}
$$

So, variance $= \sum (X_i - X_{\text{Mean}})^2 / n = P(1 - P)$.
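The identity is easy to check numerically. The sketch below builds a 0/1 variable with arbitrarily chosen values n = 20 and m = 7 (any 0 < m < n would do) and compares the sum of squared deviations with nP(1 − P).

```python
from statistics import pvariance, variance

n, m = 20, 7                      # arbitrary illustrative values
x = [1] * m + [0] * (n - m)       # m ones and (n - m) zeros

p = sum(x) / n                    # the mean of a 0/1 variable is the proportion P
sum_sq = sum((xi - p) ** 2 for xi in x)

print(sum_sq, n * p * (1 - p))    # sum of squared deviations vs. nP(1 - P)
print(pvariance(x), p * (1 - p))  # variance with divisor n vs. P(1 - P)

# With the (n - 1) divisor used in the t-test formulas, the sample variance
# is slightly larger: S^2 = n / (n - 1) * P(1 - P).
sample_var = variance(x)
```

The last line is a reminder that the t-test formulas use the (n − 1) divisor, so the two variances differ by a factor of n/(n − 1), which is close to 1 for large samples.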

Substituting this into Equation 3 (for the Z statistic), we get

$$Z = \frac{P_1 - P_2}{\sqrt{\frac{\text{Variance}}{n_1} + \frac{\text{Variance}}{n_2}}}$$

which is not so different from Equation 2 (the formula for the "equal variances not assumed" version of the t-test).

As long as the sample size is relatively large, the distributional assumptions are met, and the response is binomial – the t test and the z test will give p-values that are very close to one another.
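To see how close the two tests get, here is a sketch that codes two hypothetical samples as 0/1 values (45 successes out of 100 versus 30 out of 100) and computes both the Equation 2 t statistic and the Equation 3 z statistic. It compares the statistics rather than the p-values, since Python's standard library has no t distribution. With roughly 200 observations, the t distribution is already very close to the normal.

```python
import math
from statistics import mean, variance

# Hypothetical 0/1 data, for illustration only.
group1 = [1] * 45 + [0] * 55
group2 = [1] * 30 + [0] * 70
n1, n2 = len(group1), len(group2)
p1, p2 = mean(group1), mean(group2)   # means of 0/1 data are the proportions

# Equation 2: unequal-variance t statistic applied directly to the 0/1 values.
t_stat = (p1 - p2) / math.sqrt(variance(group1) / n1 + variance(group2) / n2)

# Equation 3: z statistic with the pooled proportion.
p = (sum(group1) + sum(group2)) / (n1 + n2)
z_stat = (p1 - p2) / math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))

print(f"t = {t_stat:.4f}, z = {z_stat:.4f}")
```

For these counts the two statistics come out at about 2.21 and 2.19. The small gap comes from the (n − 1) divisor and from using separate rather than pooled variance estimates.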

And in the case where we have only two categories, the z test and the chi-square test turn out to be exactly equivalent, though the chi-square is by nature a two-tailed test. The chi-square distribution for 1 df is just the square of the z distribution.
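The equivalence can be checked on a 2×2 table. The sketch below computes the Pearson chi-square statistic (without continuity correction, which is an assumption of this example) and compares it with the square of the Equation 3 z statistic. The function names and counts are hypothetical.

```python
import math


def pearson_chi_square_2x2(x1, n1, x2, n2):
    """Pearson chi-square (no continuity correction) for successes/failures in two samples."""
    observed = [[x1, n1 - x1], [x2, n2 - x2]]
    row_totals = [n1, n2]
    col_totals = [x1 + x2, (n1 - x1) + (n2 - x2)]
    total = n1 + n2
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


def pooled_z(x1, n1, x2, n2):
    """Z statistic of Equation 3."""
    p = (x1 + x2) / (n1 + n2)
    return (x1 / n1 - x2 / n2) / math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))


# Hypothetical counts, for illustration only.
chi2 = pearson_chi_square_2x2(45, 100, 30, 100)
z = pooled_z(45, 100, 30, 100)

# For 1 degree of freedom, P(chi-square > c) = erfc(sqrt(c / 2)),
# which matches the two-sided normal p-value erfc(|z| / sqrt(2)).
p_chi2 = math.erfc(math.sqrt(chi2 / 2))
p_z = math.erfc(abs(z) / math.sqrt(2))
print(chi2, z ** 2, p_chi2, p_z)
```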

The various tests and their assumptions as listed in Wikipedia are given below:

**1. Two-sample pooled t-test, equal variances**

(Normal populations or n1 + n2 > 40) and independent observations and σ1 = σ2 and (σ1 and σ2 unknown)

**2. Two-sample unpooled t-test, unequal variances**

(Normal populations or n1 + n2 > 40) and independent observations and σ1 ≠ σ2 and (σ1 and σ2 unknown)

**3. Two-proportion z-test, equal variances**

n1 p1 > 5 and n1(1 − p1) > 5 and n2 p2 > 5 and n2(1 − p2) > 5 and independent observations

**4. Two-proportion z-test, unequal variances**

n1 p1 > 5 and n1(1 − p1) > 5 and n2 p2 > 5 and n2(1 − p2) > 5 and independent observations
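The sample-size conditions for the z-test are simple to check before running it. In the sketch below, the helper name `z_test_conditions_met` is made up, and the sample proportions stand in for the unknown p1 and p2. That stand-in is an assumption: it makes n1·p1 simply the observed count of successes.

```python
def z_test_conditions_met(x1, n1, x2, n2, threshold=5):
    """Check n*p > threshold and n*(1 - p) > threshold in both samples.

    With p estimated by x / n, n*p is the success count and n*(1 - p) is the
    failure count, so the counts are compared directly.
    """
    counts = [x1, n1 - x1, x2, n2 - x2]
    return all(count > threshold for count in counts)


# Hypothetical counts, for illustration only.
large_ok = z_test_conditions_met(45, 100, 30, 100)   # plenty of successes and failures
small_ok = z_test_conditions_met(2, 20, 10, 20)      # only 2 successes in sample 1
print(large_ok, small_ok)
```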
