Margin of error explained

The margin of error is a statistic expressing the amount of random sampling error in the results of a survey. The larger the margin of error, the less confidence one should have that a poll result would reflect the result of a census of the entire population. The margin of error will be positive whenever a population is incompletely sampled and the outcome measure has positive variance, which is to say, whenever the measure varies.

The term margin of error is often used in non-survey contexts to indicate observational error in reporting measured quantities.

Concept

Consider a simple yes/no poll P as a sample of n respondents drawn from a population N (n \ll N), reporting the percentage p of yes responses. We would like to know how close p is to the true result of a survey of the entire population N, without having to conduct one. If, hypothetically, we were to conduct the poll P over subsequent samples of n respondents (newly drawn from N), we would expect those subsequent results p_1, p_2, \ldots to be normally distributed about \overline{p}, the true but unknown percentage of the population. The margin of error describes the distance within which a specified percentage of these results is expected to vary from \overline{p}.

The margin of error rests on the central limit theorem: as the sample size increases, the distribution of sample means (or, in this case, of the percentage of yes responses) approximates a normal distribution. When this applies, it speaks to the sampling being unbiased, but not to the inherent distribution of the data.[1]

According to the 68-95-99.7 rule, we would expect that 95% of the results p_1, p_2, \ldots will fall within about two standard deviations (\pm 2\sigma_P) either side of the true mean \overline{p}. This interval is called the confidence interval, and its radius (half the interval) is called the margin of error, corresponding to a 95% confidence level.

Generally, at a confidence level \gamma, a sample of size n from a population having expected standard deviation \sigma has a margin of error

MOE_\gamma = z_\gamma \times \sqrt{\frac{\sigma^2}{n}}

where z_\gamma denotes the quantile (also, commonly, a z-score), and \sqrt{\frac{\sigma^2}{n}} is the standard error.
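As a sketch of this formula in Python (a hypothetical helper, not part of the article; it assumes the two-sided convention used in the examples below, where z_\gamma is the standard normal quantile at (1+\gamma)/2):

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(confidence: float, sigma: float, n: int) -> float:
    """MOE_gamma = z_gamma * sqrt(sigma^2 / n)."""
    # Two-sided z-score: the central interval holding probability
    # `confidence` runs between the (1 - confidence)/2 and
    # (1 + confidence)/2 quantiles of the standard normal.
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    return z * sqrt(sigma**2 / n)

print(round(margin_of_error(0.95, 0.5, 1013), 4))  # 0.0308
```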

Standard deviation and standard error

We would expect the average of normally distributed values p_1, p_2, \ldots to have a standard deviation that varies with n: the smaller n, the wider the margin. This is called the standard error \sigma_{\overline{p}}.

For the single result from our survey, we assume that p = \overline{p}, and that all subsequent results p_1, p_2, \ldots together would have a variance \sigma_P^2 = P(1-P).

\text{Standard error} = \sigma_{\overline{p}} \approx \sqrt{\frac{\sigma_P^2}{n}} \approx \sqrt{\frac{p(1-p)}{n}}

Note that p(1-p) corresponds to the variance of a Bernoulli distribution.
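The standard error of a sample proportion follows directly from the Bernoulli variance p(1-p); a minimal sketch:

```python
from math import sqrt

def standard_error(p: float, n: int) -> float:
    """sqrt(p * (1 - p) / n): standard error of a sample proportion p
    over n respondents, where p * (1 - p) is the variance of a
    Bernoulli(p) distribution."""
    return sqrt(p * (1 - p) / n)

print(round(standard_error(0.5, 1013), 4))  # 0.0157
```

Because p(1-p) is maximized at p = 0.5, the standard error (and hence the margin of error) is widest for results near 50%.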

Maximum margin of error at different confidence levels

For a confidence level \gamma, there is a corresponding confidence interval about the mean \mu \pm z_\gamma\sigma, that is, the interval [\mu - z_\gamma\sigma, \mu + z_\gamma\sigma], within which values of P should fall with probability \gamma. Precise values of z_\gamma are given by the quantile function of the normal distribution (which the 68-95-99.7 rule approximates).

Note that z_\gamma is undefined for |\gamma| \ge 1; that is, z_{1.00} is undefined, as is z_{1.10}.

\gamma     z_\gamma           \gamma          z_\gamma
0.84       0.994              0.9995          3.291
0.95       1.645              0.99995         3.891
0.975      1.959963984540     0.999995        4.417
0.99       2.326              0.9999995       4.892
0.995      2.576              0.99999995      5.327
0.9975     2.807              0.999999995     5.731
0.9985     2.968              0.9999999995    6.109
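The z_\gamma values in the table come from the quantile function (inverse CDF) of the standard normal distribution, available in Python's standard library as `statistics.NormalDist.inv_cdf`:

```python
from statistics import NormalDist

quantile = NormalDist().inv_cdf  # standard normal quantile function

print(round(quantile(0.975), 6))   # 1.959964 (the familiar "1.96")
print(round(quantile(0.9995), 3))  # 3.291
# inv_cdf raises StatisticsError outside (0, 1), matching the note
# above that z_gamma is undefined for |gamma| >= 1.
```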

Since \max \sigma_P^2 = \max P(1-P) = 0.25 at p = 0.5, we can arbitrarily set p = \overline{p} = 0.5, calculate \sigma_P, \sigma_{\overline{p}}, and z_\gamma\sigma_{\overline{p}} to obtain the maximum margin of error for P at a given confidence level \gamma and sample size n, even before having actual results. With p = 0.5 and n = 1013:

MOE_{95}(0.5) = z_{0.95}\,\sigma_{\overline{p}} \approx z_{0.95}\sqrt{\frac{\sigma_P^2}{n}} = 1.96\sqrt{\frac{0.25}{1013}} = 0.98/\sqrt{1013} \approx \pm 3.1\%

MOE_{99}(0.5) = z_{0.99}\,\sigma_{\overline{p}} \approx z_{0.99}\sqrt{\frac{\sigma_P^2}{n}} = 2.58\sqrt{\frac{0.25}{1013}} = 1.29/\sqrt{1013} \approx \pm 4.1\%

Also, usefully, for any reported MOE_{95},

MOE_{99} = \frac{z_{0.99}}{z_{0.95}}\,MOE_{95} \approx 1.3 \times MOE_{95}
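These two worked examples, and the roughly 1.3 ratio between them, can be checked numerically using the rounded z-scores 1.96 and 2.58 from the text:

```python
from math import sqrt

n = 1013
se = sqrt(0.25 / n)   # sigma_pbar at p = 0.5, where the variance is maximal

moe95 = 1.96 * se     # z_0.95 ~ 1.96
moe99 = 2.58 * se     # z_0.99 ~ 2.58
print(f"{moe95:.1%}")           # 3.1%
print(f"{moe99:.1%}")           # 4.1%
print(round(moe99 / moe95, 2))  # 1.32
```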

Specific margins of error

If a poll has multiple percentage results (for example, a poll measuring a single multiple-choice preference), the result closest to 50% will have the highest margin of error. Typically, it is this number that is reported as the margin of error for the entire poll. Imagine poll P reports p_a, p_b, p_c as 71%, 27%, 2%, with n = 1013:

MOE_{95}(P_a) = z_{0.95}\,\sigma_{\overline{p_a}} \approx 1.96\sqrt{\frac{p_a(1-p_a)}{n}} = 0.89/\sqrt{1013} \approx \pm 2.8\% (as in the figure above)

MOE_{95}(P_b) = z_{0.95}\,\sigma_{\overline{p_b}} \approx 1.96\sqrt{\frac{p_b(1-p_b)}{n}} = 0.87/\sqrt{1013} \approx \pm 2.7\%

MOE_{95}(P_c) = z_{0.95}\,\sigma_{\overline{p_c}} \approx 1.96\sqrt{\frac{p_c(1-p_c)}{n}} = 0.27/\sqrt{1013} \approx \pm 0.8\%

As a given percentage approaches the extremes of 0% or 100%, its margin of error approaches ±0%.
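A quick check of the three margins above (which match the article's figures to rounding), and of the observation that the margin shrinks toward the extremes:

```python
from math import sqrt

n = 1013
moes = {p: 1.96 * sqrt(p * (1 - p) / n) for p in (0.71, 0.27, 0.02)}
for p, moe in moes.items():
    print(f"p = {p:.0%}: MOE95 ~ ±{moe:.2%}")
# p = 71% is closest to 50% and has the largest margin;
# p = 2% is nearest the extreme and has the smallest.
```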

Comparing percentages

Imagine multiple-choice poll P reports p_a, p_b, p_c as 46%, 42%, 12%, with n = 1013. As described above, the margin of error reported for the poll would typically be MOE_{95}(P_a), as p_a is closest to 50%. The popular notion of a statistical tie or statistical dead heat, however, concerns itself not with the accuracy of the individual results, but with that of the ranking of the results: which is in first place?

If, hypothetically, we were to conduct the poll P over subsequent samples of n respondents (newly drawn from N), and report the result p_w = p_a - p_b, we could use the standard error of the difference to understand how p_{w_1}, p_{w_2}, p_{w_3}, \ldots are expected to fall about \overline{p_w}. For this, we need to apply the sum of variances to obtain a new variance \sigma_{P_w}^2:

\sigma_{P_w}^2 = \sigma_{P_a - P_b}^2 = \sigma_{P_a}^2 + \sigma_{P_b}^2 - 2\sigma_{P_a,P_b} = p_a(1 - p_a) + p_b(1 - p_b) + 2 p_a p_b

where \sigma_{P_a,P_b} = -p_a p_b is the covariance of P_a and P_b.

Thus (after simplifying),

\text{Standard error of difference} = \sigma_{\overline{w}} \approx \sqrt{\frac{\sigma_{P_w}^2}{n}} = \sqrt{\frac{0.8784}{1013}} \approx 0.029, \quad P_w = P_a - P_b

MOE_{95}(P_a) = z_{0.95}\,\sigma_{\overline{p_a}} \approx \pm 3.1\%

MOE_{95}(P_w) = z_{0.95}\,\sigma_{\overline{w}} \approx \pm 5.8\%

Note that this assumes that P_c is close to constant, that is, respondents choosing either A or B would almost never choose C (making P_a and P_b close to perfectly negatively correlated). With three or more choices in closer contention, choosing a correct formula for \sigma_{P_w}^2 becomes more complicated.
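The variance-of-a-difference calculation above can be checked numerically for p_a = 46%, p_b = 42%, n = 1013:

```python
from math import sqrt

n = 1013
pa, pb = 0.46, 0.42

# Sum of variances with the covariance term cov(Pa, Pb) = -pa * pb:
# var(Pa - Pb) = pa(1-pa) + pb(1-pb) + 2*pa*pb
var_w = pa * (1 - pa) + pb * (1 - pb) + 2 * pa * pb
se_w = sqrt(var_w / n)        # standard error of the difference

print(round(se_w, 3))         # 0.029
print(f"±{1.96 * se_w:.1%}")  # ±5.8%, roughly double the ±3.1% for pa alone
```

The margin on the 4-point lead is about ±5.8%, which is why a lead smaller than the individual margins can still be, or fail to be, statistically meaningful only after this separate calculation.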

Effect of finite population size

The formulae above for the margin of error assume that there is an infinitely large population and thus do not depend on the size of population N, but only on the sample size n. According to sampling theory, this assumption is reasonable when the sampling fraction is small. The margin of error for a particular sampling method is essentially the same regardless of whether the population of interest is the size of a school, city, state, or country, as long as the sampling fraction is small.

In cases where the sampling fraction is larger (in practice, greater than 5%), analysts might adjust the margin of error using a finite population correction to account for the added precision gained by sampling a much larger percentage of the population. FPC can be calculated using the formula[2]

\operatorname{FPC} = \sqrt{\frac{N-n}{N-1}}

...and so, if poll P were conducted over 24% of, say, an electorate of 300,000 voters (n = 72{,}000),

MOE_{95}(0.5) = z_{0.95}\,\sigma_{\overline{p}} \approx \frac{0.98}{\sqrt{72{,}000}} \approx \pm 0.4\%

MOE_{95,\text{FPC}}(0.5) = z_{0.95}\,\sigma_{\overline{p}}\sqrt{\frac{N-n}{N-1}} \approx \frac{0.98}{\sqrt{72{,}000}}\sqrt{\frac{228{,}000}{299{,}999}} \approx \pm 0.3\%
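The finite population correction in this example can be verified directly:

```python
from math import sqrt

N, n = 300_000, 72_000         # 24% of the electorate sampled
fpc = sqrt((N - n) / (N - 1))  # finite population correction, ~0.872

moe = 1.96 * sqrt(0.25 / n)    # maximum MOE95 before correction
print(f"±{moe:.1%}")           # ±0.4%
print(f"±{moe * fpc:.1%}")     # ±0.3%
```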

Intuitively, for appropriately large N,

\lim_{n \to 0} \sqrt{\frac{N-n}{N-1}} \approx 1

\lim_{n \to N} \sqrt{\frac{N-n}{N-1}} = 0

In the former case, n is so small as to require no correction. In the latter case, the poll effectively becomes a census and sampling error becomes moot.


Notes and References

  1. Siegfried, Tom (2014-07-03). "Scientists' grasp of confidence intervals doesn't inspire confidence". Science News. Retrieved 2024-08-06.
  2. Isserlis, L. (1918). "On the value of a mean as calculated from a sample". Journal of the Royal Statistical Society. Blackwell Publishing. 81 (1): 75–81. doi:10.2307/2340569. JSTOR 2340569. (Equation 1)