We have a single binary categorical variable. We want to perform inference on the true proportion of the population, \(p\), that falls into one category (e.g., “yes” on a survey). We cannot know this value with certainty, so we estimate it from a sample of size \(n\), in which we record \(X\) successes. Let \(\hat{p}=\frac{X}{n}\) be our sample proportion; \(\hat{p}\) is an *unbiased estimator* of \(p\).

This technique relies upon the normal approximation to the binomial distribution. For the approximation to be reasonable, the following conditions must be met:

- \(np_0 \ge 10\)
- \(n(1-p_0) \ge 10\)

Here, \(p_0\) is the value of \(p\) assumed under the null hypothesis.

If we are constructing a confidence interval for \(p\), the preconditions are slightly different. The normal approximation to the binomial distribution still needs to hold, but there is no null value to check against, so the conditions are checked with our sample proportion, \(\hat{p}\), rather than \(p_0\):

- \(n \hat{p} \ge 10\)
- \(n(1-\hat{p}) \ge 10\)

Note that, because \(n\hat{p} = X\), this is equivalent to having at least 10 observations of each value (e.g., at least 10 “yes” and at least 10 “no”).
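The two sets of conditions differ only in which proportion is plugged in, which a small helper can make explicit. This is a minimal sketch; the function name `normal_approx_ok`, the `threshold` parameter, and the example numbers are illustrative assumptions.

```python
def normal_approx_ok(n, p, threshold=10):
    """Return True if both n*p and n*(1-p) reach the threshold."""
    return n * p >= threshold and n * (1 - p) >= threshold


# Hypothesis test: check the conditions with the null value p_0.
test_ok = normal_approx_ok(n=200, p=0.5)   # expected counts are 100 and 100

# Confidence interval: check the conditions with p_hat = X / n.
x, n = 8, 200
ci_ok = normal_approx_ok(n=n, p=x / n)     # only 8 successes were observed

print(test_ok, ci_ok)
```

With these made-up numbers the test conditions are met, but the confidence-interval conditions are not, because fewer than 10 successes were observed.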

**Null hypothesis, \(H_0\):** \(p = p_0\)

**Alternative hypothesis, \(H_a\):** \(p \ne p_0\), \(p > p_0\), or \(p < p_0\)

\[ z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}} \]

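Where \(Z\) is a standard normal random variable, the p-value (not to be confused with the proportion \(p\)) is:

- For \(H_a : p < p_0\): \(\text{p-value} = P(Z \le z)\)
- For \(H_a : p > p_0\): \(\text{p-value} = P(Z \ge z)\)
- For \(H_a : p \ne p_0\): \(\text{p-value} = 2P(Z \le -|z|)\)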

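In Python, using SciPy (`from scipy import stats`) and with `z` holding the test statistic:

- For \(H_a : p < p_0\):
`p_value = stats.norm.cdf(z)`

- For \(H_a : p > p_0\):
`p_value = 1 - stats.norm.cdf(z)`

- For \(H_a : p \ne p_0\):
`p_value = 2 * stats.norm.cdf(-abs(z))`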
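Putting the test statistic and the three alternatives together, a minimal sketch might look like the following. The function name `one_sample_prop_ztest`, its `alternative` argument, and the example counts (60 successes out of 100, \(p_0 = 0.5\)) are illustrative assumptions rather than part of any library.

```python
from math import sqrt

from scipy import stats


def one_sample_prop_ztest(x, n, p0, alternative="two-sided"):
    """Return (z, p_value) for a one-sample z-test for a proportion."""
    p_hat = x / n
    # Standard deviation of p_hat under the null hypothesis.
    sd_null = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / sd_null

    if alternative == "less":
        p_value = stats.norm.cdf(z)
    elif alternative == "greater":
        # sf(z) is the survival function, i.e. 1 - cdf(z).
        p_value = stats.norm.sf(z)
    elif alternative == "two-sided":
        p_value = 2 * stats.norm.cdf(-abs(z))
    else:
        raise ValueError("alternative must be 'less', 'greater' or 'two-sided'")
    return z, p_value


# Hypothetical data: 60 "yes" answers out of 100, testing H0: p = 0.5.
z, p_value = one_sample_prop_ztest(x=60, n=100, p0=0.5)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```

For these numbers \(z\) is about 2 and the two-sided p-value is about 0.0455.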

\[ C\% \text{ confidence interval} = \hat{p} \pm z^\star \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \]

Choose \(z^\star\) such that the area under the standard normal curve between \(-z^\star\) and \(z^\star\) equals \(C\).

See the preconditions for the confidence interval.
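The interval can be computed in a few lines. This is a sketch under stated assumptions: the function name `proportion_ci`, its `confidence` parameter (given as a fraction, e.g. 0.95), and the example counts are hypothetical. `stats.norm.ppf` is SciPy's inverse CDF of the standard normal distribution.

```python
from math import sqrt

from scipy import stats


def proportion_ci(x, n, confidence=0.95):
    """Return (low, high), the normal-approximation CI for a proportion."""
    p_hat = x / n
    # z_star leaves (1 - confidence) / 2 in each tail of the standard normal.
    z_star = stats.norm.ppf(1 - (1 - confidence) / 2)
    # Standard error of p_hat, estimated from the sample itself.
    se = sqrt(p_hat * (1 - p_hat) / n)
    margin = z_star * se
    return p_hat - margin, p_hat + margin


# Hypothetical data: 60 "yes" answers out of 100.
low, high = proportion_ci(x=60, n=100, confidence=0.95)
print(f"95% CI: ({low:.3f}, {high:.3f})")
```

For these numbers the interval is roughly (0.504, 0.696).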

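\[ n = \left(\frac{z^\star}{m^\star}\right)^2 p^\star (1-p^\star) \]

Here, \(n\) is the sample size needed for a confidence interval with margin of error \(m^\star\), \(z^\star\) is the critical value for the chosen confidence level, and \(p^\star\) is an “educated guess” about the value of \(p\). If you have access to a sample proportion, \(\hat{p}\), set \(p^\star=\hat{p}\). Otherwise, a *conservative approach* is to set \(p^\star = 0.5\), which maximizes \(p^\star(1-p^\star)\) and therefore the required \(n\).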
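As a rough illustration of the sample-size formula, the sketch below computes \(n\) for a given margin of error. The function name `required_sample_size` and its parameters are hypothetical, and rounding up to the next whole observation is an assumption added here so that the margin of error is not exceeded.

```python
from math import ceil

from scipy import stats


def required_sample_size(margin, confidence=0.95, p_star=0.5):
    """Return the sample size for a desired margin of error, rounded up."""
    z_star = stats.norm.ppf(1 - (1 - confidence) / 2)
    n = (z_star / margin) ** 2 * p_star * (1 - p_star)
    # Assumption: round up, since a fractional observation is not possible.
    return ceil(n)


# Conservative guess p_star = 0.5, margin of error of 3 percentage points.
n_conservative = required_sample_size(margin=0.03)

# With a prior estimate of p_star = 0.2, fewer observations are needed.
n_informed = required_sample_size(margin=0.03, p_star=0.2)

print(n_conservative, n_informed)
```

With a 95% confidence level and a 0.03 margin of error, the conservative choice gives 1068 observations, while the guess \(p^\star = 0.2\) gives 683.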

If you want to compare proportions across two independent populations, a two-sample z-test for proportions is appropriate.

A two-sided one-sample z-test for proportions conducted at significance level \(\alpha\) rejects the null hypothesis when the value corresponding to the null hypothesis, \(p_0\), lies outside the \(C = 1-\alpha\) confidence interval for the true proportion. This correspondence is approximate rather than exact, because the test and the interval use different standard deviations for \(\hat{p}\).

Neither the z-test nor the confidence interval uses the true standard deviation of the sample proportion, \(\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}\), because it depends on the unknown \(p\).

- The hypothesis test uses the standard deviation of the sample proportion under the null hypothesis, \(\sigma^{H_0}_{\hat{p}} = \sqrt{\frac{p_0(1-p_0)}{n}}\).
- The confidence interval uses the standard error of the sample proportion, \(SE_{\hat{p}} = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\).
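To see how the two quantities differ in practice, the short sketch below evaluates both for the same sample. The data (60 successes out of 100, \(p_0 = 0.5\)) and the variable names are made up for illustration.

```python
from math import sqrt

# Hypothetical data: 60 successes out of 100, with null value p_0 = 0.5.
x, n, p0 = 60, 100, 0.5
p_hat = x / n

# Used by the hypothesis test: standard deviation of p_hat assuming p = p_0.
sd_null = sqrt(p0 * (1 - p0) / n)

# Used by the confidence interval: standard error estimated from p_hat.
se_hat = sqrt(p_hat * (1 - p_hat) / n)

print(f"sd under H0 = {sd_null:.4f}, standard error = {se_hat:.4f}")
```

Here the null standard deviation is 0.05 and the standard error is about 0.049. The two coincide only when \(\hat{p} = p_0\) or \(\hat{p} = 1 - p_0\).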