We have a single binary categorical variable. We want to perform inference on the true proportion of the population, \(p\), that falls into one category (e.g., “yes” on a survey). We cannot know this value with certainty, so we estimate it by drawing a sample of size \(n\) and recording the number of successes, \(X\). Let \(\hat{p}=\frac{X}{n}\) be our sample proportion; \(\hat{p}\) is an unbiased estimator of \(p\).
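As an informal illustration of the unbiasedness claim, the following sketch simulates many samples and averages the resulting sample proportions. The population proportion, sample size, and number of repetitions are hypothetical values chosen only for the demonstration.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

p_true = 0.3      # hypothetical population proportion (assumption)
n = 50            # hypothetical sample size (assumption)
n_samples = 2000  # number of simulated samples (assumption)

def sample_proportion(n, p):
    """Draw n Bernoulli(p) observations and return the fraction of successes."""
    successes = sum(random.random() < p for _ in range(n))
    return successes / n

p_hats = [sample_proportion(n, p_true) for _ in range(n_samples)]

# The average of the simulated sample proportions should sit close to p_true.
mean_p_hat = sum(p_hats) / n_samples
print(mean_p_hat)
```

A simulation like this only suggests the property; the exact statement follows from \(E[X] = np\), so \(E[\hat{p}] = E[X]/n = p\).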
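This technique relies upon the normal approximation to the binomial distribution. For the approximation to be reasonable in a hypothesis test, the following conditions must hold:
\[ np_0 \ge 10 \quad \text{and} \quad n(1-p_0) \ge 10 \]
where \(p_0\) is the hypothesized value of \(p\) under the null hypothesis.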
If we are constructing a confidence interval for \(p\), the preconditions are slightly different. The normal approximation to the binomial distribution still needs to hold, but it is checked against our sample proportion, \(\hat{p}\), rather than \(p_0\). The following conditions must hold:
\[ n\hat{p} \ge 10 \quad \text{and} \quad n(1-\hat{p}) \ge 10 \]
Note that this is equivalent to having at least 10 observations corresponding to either value (e.g., at least 10 “yes” and at least 10 “no”).
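The "at least 10 of each outcome" rule is easy to check in code. The sketch below uses a hypothetical helper, `normal_approx_ok`, and assumes the same threshold of 10 is applied with \(p_0\) for a hypothesis test and with \(\hat{p}\) for a confidence interval; the counts are made-up example values.

```python
def normal_approx_ok(n, p, threshold=10):
    """Return True if both expected counts, n*p and n*(1-p), reach the threshold."""
    return n * p >= threshold and n * (1 - p) >= threshold

# Hypothesis test: check against the null value p_0 (here 0.5, an example value).
test_ok = normal_approx_ok(n=50, p=0.5)  # expected counts are 25 and 25

# Confidence interval: check against the sample proportion x / n.
x, n = 8, 50                             # hypothetical data: 8 "yes" out of 50
ci_ok = normal_approx_ok(n=n, p=x / n)   # only 8 successes, so the check fails

print(test_ok, ci_ok)
```

For the interval check, `n * (x / n)` is just the number of successes, which is why the rule reduces to counting observations in each category.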
Null hypothesis, \(H_0\): \(p = p_0\)
Alternative hypothesis, \(H_a\): \(p \ne p_0\) (two-sided), \(p > p_0\) (right-tailed), or \(p < p_0\) (left-tailed)
\[ z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}} \]
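The p-value depends on the alternative hypothesis. With \(Z\) a standard normal random variable and \(z\) the test statistic above:
from scipy import stats
p = stats.norm.cdf(z)  # H_a: p < p_0, i.e. P(Z < z)
p = 1 - stats.norm.cdf(z)  # H_a: p > p_0, i.e. P(Z > z)
p = 2 * stats.norm.cdf(-abs(z))  # H_a: p != p_0, i.e. 2 * P(Z > |z|)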
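Putting the test statistic and the three p-value cases together, a minimal end-to-end sketch might look like the following. The function name `one_sample_prop_ztest` and its `alternative` argument are hypothetical, and the standard library's `statistics.NormalDist` is swapped in for `stats.norm` so the example runs without extra dependencies; the data values are made up.

```python
import math
from statistics import NormalDist

def one_sample_prop_ztest(x, n, p0, alternative="two-sided"):
    """One-sample z-test for a proportion; returns (z, p_value)."""
    p_hat = x / n
    # The standard error is computed under the null hypothesis, using p0.
    se0 = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se0

    cdf = NormalDist().cdf  # standard normal CDF
    if alternative == "less":         # H_a: p < p0
        p_value = cdf(z)
    elif alternative == "greater":    # H_a: p > p0
        p_value = 1 - cdf(z)
    elif alternative == "two-sided":  # H_a: p != p0
        p_value = 2 * cdf(-abs(z))
    else:
        raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")
    return z, p_value

# Hypothetical data: 60 successes in 100 trials, testing p0 = 0.5.
z, p_value = one_sample_prop_ztest(x=60, n=100, p0=0.5)
print(z, p_value)
```

Swapping `NormalDist().cdf` for `stats.norm.cdf` should give the same results if SciPy is preferred.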
\[ C\% \text{ confidence interval} = \hat{p} \pm z^\star \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \]
Choose \(z^\star\) such that the area under the standard normal curve between \(-z^\star\) and \(z^\star\) equals \(C\).
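The interval can be computed directly from the formula. This is a sketch with a hypothetical function name, `proportion_ci`, and made-up data; it uses `statistics.NormalDist` from the standard library to find \(z^\star\).

```python
import math
from statistics import NormalDist

def proportion_ci(x, n, confidence=0.95):
    """Normal-approximation confidence interval for a proportion; returns (low, high)."""
    p_hat = x / n
    # z* leaves (1 - confidence) / 2 in each tail of the standard normal.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # The standard error here uses the sample proportion, not a null value.
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Hypothetical data: 60 successes in 100 trials.
low, high = proportion_ci(x=60, n=100)
print(low, high)
```

Nothing in this sketch clips the endpoints to \([0, 1]\); that is not a concern when the preconditions hold, but it is worth knowing if the function is reused on small samples.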
See the preconditions for the confidence interval.
\[ n = \left(\frac{z^\star}{m^\star}\right)^2 p^\star (1-p^\star) \]
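Here \(m^\star\) is the desired margin of error and \(z^\star\) is the critical value for the chosen confidence level. \(p^\star\) is an “educated guess” about the value of \(p\). If you have access to a sample proportion, \(\hat{p}\), set \(p^\star=\hat{p}\). Otherwise, a conservative approach is to set \(p^\star = 0.5\), which maximizes \(p^\star(1-p^\star)\) and therefore the required \(n\). Round the result up to the next whole number.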
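As a worked example of the sample size formula, the sketch below computes the sample size needed for a given margin of error. The function name `required_sample_size` is hypothetical, and rounding up with `math.ceil` is the conventional choice assumed here so that the margin is not exceeded.

```python
import math
from statistics import NormalDist

def required_sample_size(margin, confidence=0.95, p_guess=0.5):
    """Smallest n whose normal-approximation margin of error is at most `margin`."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = (z_star / margin) ** 2 * p_guess * (1 - p_guess)
    return math.ceil(n)  # round up to a whole number of observations

# Conservative guess p* = 0.5, 95% confidence, margin of error of 3 percentage points.
n_needed = required_sample_size(margin=0.03)
print(n_needed)
```

Supplying a `p_guess` further from 0.5 lowers the required sample size, which is why 0.5 is the conservative default.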
If you want to compare proportions across two independent populations, a two-sample z-test for proportions is appropriate.
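A two-sided one-sample z-test for proportions conducted at significance level \(\alpha\) will generally reject the null hypothesis exactly when the value corresponding to the null hypothesis, \(p_0\), lies completely outside of the \(C = 1-\alpha\) confidence interval for the true proportion. The correspondence is only approximate, because the test's standard error uses \(p_0\) whereas the interval's uses \(\hat{p}\), so the two can disagree in borderline cases.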
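Neither the z-test nor the confidence interval uses the true standard deviation of the sample proportion, \(\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}\), because \(p\) is unknown. The test substitutes \(p_0\) for \(p\), and the confidence interval substitutes \(\hat{p}\).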
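Because the test and the interval plug different values into the standard error, their conclusions usually agree but can differ near the boundary. The sketch below, using made-up data and hypothetical variable names, computes both standard errors side by side and shows one such borderline case.

```python
import math
from statistics import NormalDist

# Hypothetical data: 16 successes in 100 trials, testing p0 = 0.10 at alpha = 0.05.
x, n, p0, alpha = 16, 100, 0.10, 0.05
p_hat = x / n
z_star = NormalDist().inv_cdf(1 - alpha / 2)

se_null = math.sqrt(p0 * (1 - p0) / n)       # used by the z-test
se_hat = math.sqrt(p_hat * (1 - p_hat) / n)  # used by the confidence interval

# Two-sided test: reject when |z| exceeds the critical value.
z = (p_hat - p0) / se_null
rejects = abs(z) > z_star

# Confidence interval at level 1 - alpha.
low, high = p_hat - z_star * se_hat, p_hat + z_star * se_hat
contains_p0 = low <= p0 <= high

print(se_null, se_hat, rejects, contains_p0)
```

With these numbers the test rejects \(H_0\) while the interval still contains \(p_0\), since the interval's standard error is the larger of the two.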