In statistics, a sampling frame is the source material or device from which a sample is drawn.[1] It is a list of all those within a population who can be sampled, and may include individuals, households or institutions.[1]
Importance of the sampling frame is stressed by Jessen[2] and Salant and Dillman.[3]
A slightly more general concept of sampling frame includes area sampling frames, whose elements have a geographic nature. Area sampling frames can be useful for example in agricultural statistics when a suitable and updated agricultural census is not available. In environmental surveys, area sampling frames may be the only option.
In the most straightforward cases, such as when dealing with a batch of material from a production run, or using a census, it is possible to identify and measure every single item in the population and to include any one of them in our sample; this is known as direct element sampling.[1] However, in many other cases this is not possible; either because it is cost-prohibitive (reaching every citizen of a country) or impossible (reaching all humans alive).
Having established the frame, there are a number of ways for organizing it to improve efficiency and effectiveness. It's at this stage that the researcher should decide whether the sample is in fact to be the whole population and would therefore be a census.
This list should also facilitate access to the selected sampling units. A frame may also provide additional 'auxiliary information' about its elements; when this information is related to variables or groups of interest, it may be used to improve survey design. While not necessary for simple sampling, a sampling frame used for more advanced sample techniques, such as stratified sampling, may contain additional information (such as demographic information).[1] For instance, an electoral register might include name and sex; this information can be used to ensure that a sample taken from that frame covers all demographic categories of interest. (Sometimes the auxiliary information is less explicit; for instance, a telephone number may provide some information about location.
An ideal sampling frame will have the following qualities:[1]
The most straightforward type of frame is a list of elements of the population (preferably the entire population) with appropriate contact information. For example, in an opinion poll, possible sampling frames include an electoral register or a telephone directory. Other sampling frames can include employment records, school class lists, patient files in a hospital, organizations listed in a thematic database, and so on.[1] [5] On a more practical levels, sampling frames have the form of computer files.[1]
Not all frames explicitly list population elements; some list only 'clusters'. For example, a street map can be used as a frame for a door-to-door survey; although it doesn't show individual houses, we can select streets from the map and then select houses on those streets. This offers some advantages: such a frame would include people who have recently moved and are not yet on the list frames discussed above, and it may be easier to use because it doesn't require storing data for every unit in the population, only for a smaller number of clusters.
The sampling frame must be representative of the population and this is a question outside the scope of statistical theory demanding the judgment of experts in the particular subject matter being studied. All the above frames omit some people who will vote at the next election and contain some people who will not; some frames will contain multiple records for the same person. People not in the frame have no prospect of being sampled.
Because a cluster-based frame contains less information about the population, it may place constraints on the sample design, possibly requiring the use of less efficient sampling methods and/or making it harder to interpret the resulting data.
Statistical theory tells us about the uncertainties in extrapolating from a sample to the frame. It should be expected that sample frames, will always contain some mistakes.[5] In some cases, this may lead to sampling bias.[1] Such bias should be minimized, and identified, although avoiding it completely in a real world is nearly impossible.[1] One should also not assume that sources which claim to be unbiased and representative are such.[1]
In defining the frame, practical, economic, ethical, and technical issues need to be addressed. The need to obtain timely results may prevent extending the frame far into the future. The difficulties can be extreme when the population and frame are disjoint. This is a particular problem in forecasting where inferences about the future are made from historical data. In fact, in 1703, when Jacob Bernoulli proposed to Gottfried Leibniz the possibility of using historical mortality data to predict the probability of early death of a living man, Gottfried Leibniz recognized the problem in replying:[6]
Leslie Kish posited four basic problems of sampling frames:[7]
Problems like those listed can be identified by the use of pre-survey tests and pilot studies.