Sampling and Sample Design
When you collect any sort of data, especially quantitative data, whether observational, through surveys or from secondary data, you need to decide which data to collect and from whom.
The set of people or things that you collect data from is called your sample.
There are a variety of ways to select your sample, and to make sure that it gives you results that will be reliable and credible.
The difference between population and sample
Ideally, research would collect information from every single member of the population that you are studying. However, most of the time that would take too long and so you have to select a suitable sample: a subset of the population.
Principles Behind Choosing a Sample
The idea behind selecting a sample is to be able to generalise your findings to the whole population, which means that your sample must be:
- Representative of the population. In other words, it should contain similar proportions of subgroups as the whole population, and not exclude any particular groups, either by method of sampling or by design, or by who chooses to respond.
- Large enough to give you enough information to avoid errors. It does not need to be a specific proportion of your population, but it does need to be at least a certain size so that you know that your answers are likely to be broadly correct.
If your sample is not representative, you can introduce bias into the study. If it is not large enough, the study will be imprecise.
However, if you get the relationship between sample and population right, then you can draw strong conclusions about the nature of the population.
Sample size: how long is a piece of string?
How large should your sample be? It depends on how precise you want the answer to be. Larger samples generally give more precise answers.
Your desired sample size depends on what you are measuring and the size of the error that you’re prepared to accept. For example:
To estimate a proportion in a population:
Sample size = [ (z-score)² × p × (1 − p) ] ÷ (margin of error)²
- The margin of error is what you are prepared to accept (usually between 1% and 10%, written as a proportion such as 0.05 when used in the formula);
- The z-score, also called the z value, is found from statistical tables and depends on the confidence level chosen (90%, 95% and 99% are commonly used, giving z-scores of about 1.645, 1.96 and 2.576 respectively, so choose which one you want);
- p is your estimate of what the proportion is likely to be. You can often estimate p from previous research, but if you can’t do that then use 0.5.
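The formula for a proportion can be worked through in a few lines of code. This is a minimal sketch rather than a definitive tool: the function name and its defaults (95% confidence, p = 0.5) are assumptions chosen for illustration, and the z-score is computed from the standard normal distribution instead of being read from a table.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(margin_of_error, confidence=0.95, p=0.5):
    """Smallest sample size for estimating a proportion (hypothetical helper)."""
    # Two-sided z-score for the chosen confidence level (about 1.96 at 95%).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = (z ** 2) * p * (1 - p) / margin_of_error ** 2
    # Round up, because you cannot sample a fraction of a person.
    return math.ceil(n)

# A 5% margin of error at 95% confidence, with no prior estimate of p.
print(sample_size_for_proportion(0.05))  # prints 385
```

Using p = 0.5 gives the largest possible value of p × (1 − p), so it is the cautious choice when you have no earlier research to draw on.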
To estimate a population mean:
Margin of error = t × (s ÷ square root of the sample size), which rearranges to:
Sample size = [ (t × s) ÷ (margin of error) ]²
- The margin of error is what you are prepared to accept (usually between 1% and 10%);
- As long as the sample size is larger than about 30, t is approximately equal to the z-score, and available from statistical tables as before;
- s is the standard deviation, which is usually guessed, based on previous experience or other research.
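Solving the margin-of-error relationship for the sample size gives a similar calculation for a mean. The sketch below assumes the sample will be larger than about 30, so that the z-score can stand in for t. It also assumes the margin of error is expressed in the same units as the standard deviation. The function name and the example figures (a guessed standard deviation of 15 and a margin of 2) are illustrative only.

```python
import math
from statistics import NormalDist

def sample_size_for_mean(margin_of_error, std_dev, confidence=0.95):
    """Smallest sample size for estimating a mean (hypothetical helper)."""
    # Assumption: n will exceed about 30, so z is used in place of t.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Rearranged from: margin of error = z * (s / sqrt(n)).
    n = (z * std_dev / margin_of_error) ** 2
    return math.ceil(n)

# Guessed standard deviation of 15 units, acceptable error of 2 units.
print(sample_size_for_mean(margin_of_error=2, std_dev=15))  # prints 217
```

If the result comes out well below 30, the z-for-t shortcut no longer holds and the answer should be treated as a rough lower bound.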
If you’re not very confident about this kind of thing, then the best way to deal with it is to find a friendly statistician and ask for some help. Most of them will be delighted to help you make sense of their specialty.
It is better to be imprecisely right than precisely wrong.
How bias and precision interact:
|Precision|High bias|Low bias|
|High|Precisely wrong|Precisely right|
|Low|Imprecisely wrong|Imprecisely right|
Source: Management Research (4th Edition), Easterby-Smith, Thorpe and Jackson
Imprecisely right means that you know broadly what the correct answer is. Precisely wrong means that you think you know the answer, but you don’t. In other words, if you can only worry about one, worry about bias.
Selecting a Sample
Probability sampling is where the probability of each person or thing being part of the sample is known. Non-probability sampling is where it is not.
Probability sampling methods allow the researcher to be precise about the relationship between the sample and the population.
This means that you can judge how well your sample is likely to represent the population, and you can also put a number on how certain you are about your findings (this number is called the significance, and is discussed further in our page on Statistical Analysis).
In simple random sampling, every member of the population has an equal chance of being chosen. The drawback is that the sample may not be genuinely representative. Small but important sub-sections of the population may not be included.
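As a rough illustration, simple random sampling from a list of population members might look like the sketch below. The population of 500 made-up identifiers, the sample size of 20 and the fixed seed are all assumptions for the example.

```python
import random

def simple_random_sample(population, sample_size, rng):
    """Draw a simple random sample without replacement (hypothetical helper)."""
    # rng.sample gives every member the same chance and never picks anyone twice.
    return rng.sample(population, sample_size)

# A seeded generator makes the draw repeatable, which helps when documenting methods.
rng = random.Random(42)
population = [f"person_{i}" for i in range(1, 501)]  # made-up identifiers
sample = simple_random_sample(population, 20, rng)
print(len(sample), "members drawn")
```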
Researchers therefore developed an alternative method called stratified random sampling. This method divides the population into smaller homogeneous groups, called strata, and then takes a random sample from each stratum.
Proportional stratified random sampling takes the same proportion from each stratum, but again suffers from the disadvantage that rare groups will be badly represented. Non-proportional stratified sampling therefore takes a larger sample from the smaller strata, to ensure that there is a large enough sample from each stratum.
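The difference between the two stratified approaches can be sketched in code. This is an illustration under stated assumptions, not a standard implementation: the helper names, the invented population of 900 urban and 100 rural members, the 5% sampling fraction and the fixed size of 30 per stratum are all hypothetical.

```python
import random
from collections import defaultdict

def group_by_stratum(population, stratum_of):
    """Split the population into strata using a caller-supplied function."""
    strata = defaultdict(list)
    for member in population:
        strata[stratum_of(member)].append(member)
    return strata

def proportional_stratified_sample(population, stratum_of, fraction, rng):
    """Take the same fraction from every stratum."""
    sample = []
    for members in group_by_stratum(population, stratum_of).values():
        k = max(1, round(fraction * len(members)))
        sample.extend(rng.sample(members, k))
    return sample

def fixed_size_stratified_sample(population, stratum_of, per_stratum, rng):
    """Non-proportional: take the same number from every stratum."""
    sample = []
    for members in group_by_stratum(population, stratum_of).values():
        k = min(per_stratum, len(members))
        sample.extend(rng.sample(members, k))
    return sample

def region(member):
    return member[0]

rng = random.Random(0)
population = [("urban", i) for i in range(900)] + [("rural", i) for i in range(100)]

proportional = proportional_stratified_sample(population, region, 0.05, rng)
fixed = fixed_size_stratified_sample(population, region, 30, rng)

# The rare "rural" stratum gets 5 members proportionally, but 30 with fixed sizes.
print(sum(1 for m in proportional if region(m) == "rural"))
print(sum(1 for m in fixed if region(m) == "rural"))
```

Note that a non-proportional sample over-represents the smaller strata, so results usually need to be weighted back to the true stratum sizes before you generalise to the whole population.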
Systematic random sampling relies on having a list of the population, which should ideally be randomly ordered. The researcher then picks a random starting point and takes every nth name from the list.
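A minimal sketch of systematic sampling, assuming a list of 100 made-up names and a step of 10. The random starting point within the first interval is a common refinement and is an assumption here, as are the function and variable names.

```python
import random

def systematic_sample(ordered_list, step, rng):
    """Take every step-th item, beginning at a random offset (hypothetical helper)."""
    # A random start within the first interval avoids always picking the first name.
    start = rng.randrange(step)
    return ordered_list[start::step]

rng = random.Random(7)
names = [f"name_{i:03d}" for i in range(100)]  # made-up list of the population
chosen = systematic_sample(names, step=10, rng=rng)
print(len(chosen), "names selected")  # prints: 10 names selected
```

If the list has a repeating pattern that lines up with the step (for example, every tenth entry is a team leader), the sample will be biased, which is why a randomly ordered list is preferable.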
There are many different methods of selecting ‘random samples’. If you are the lead researcher for a project and instructing others to ‘take a random sample’, or indeed asked to take a ‘random sample’, make sure you are all using the same method!
Cluster sampling is designed to address problems of a widespread geographical population. Random sampling from a large population is likely to lead to high costs of access. This can be overcome by dividing the population into clusters, selecting only two or three clusters, and sampling from within those. For example, if you wished to find out about the use of transport in urban areas in the UK, you could randomly select just two or three cities, and then sample fully from within these.
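The two stages of cluster sampling, choosing clusters and then sampling within them, can be sketched as follows. The city labels, resident identifiers and sample sizes are invented for the example, and the function name is hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, per_cluster, rng):
    """Randomly pick clusters, then randomly sample within each chosen cluster."""
    # Stage 1: choose which clusters to visit (sorted for a repeatable order).
    chosen = rng.sample(sorted(clusters), n_clusters)
    # Stage 2: sample members only from the chosen clusters.
    result = {}
    for name in chosen:
        members = clusters[name]
        k = min(per_cluster, len(members))
        result[name] = rng.sample(members, k)
    return result

rng = random.Random(1)
# Made-up clusters: five cities with 200 residents each.
cities = {
    f"City {letter}": [f"{letter}-resident-{i}" for i in range(200)]
    for letter in "ABCDE"
}
selection = cluster_sample(cities, n_clusters=2, per_cluster=25, rng=rng)
for city, residents in selection.items():
    print(city, len(residents))
```

Because only a few clusters are visited, the result depends heavily on which ones happen to be chosen, so clusters that differ a lot from each other make the estimate less precise.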
It is, of course, possible to combine all these in several stages, which is often done for large-scale studies.
With non-probability sampling methods, it is not possible to state the probability of any particular member of the population being sampled. Although this does not make the sample ‘bad’, researchers using such samples cannot be as confident in drawing conclusions about the whole population.
Convenience sampling selects a sample on the basis of how easy it is to access. Such samples are extremely easy to organise, but there is no way to guarantee that they are representative.
Quota sampling divides the population into categories, and then selects from within each category until a sample of the chosen size is obtained for that category. Some market research is of this type, which is why researchers often ask for your age: they are checking whether you will help them meet their quotas for particular age groups.
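The quota-filling logic can be sketched as a simple loop over people as they are encountered. The age bands, quota sizes and the stream of passers-by are all invented for illustration, and the function name is hypothetical.

```python
def quota_sample(respondents, category_of, quotas):
    """Accept respondents in arrival order until every quota is filled."""
    remaining = dict(quotas)
    accepted = []
    for person in respondents:
        category = category_of(person)
        # Only accept someone if their category still has room.
        if remaining.get(category, 0) > 0:
            accepted.append(person)
            remaining[category] -= 1
        # Stop as soon as all quotas are met.
        if not any(remaining.values()):
            break
    return accepted

def age_band(person):
    name, age = person
    return "under 40" if age < 40 else "40 and over"

# Made-up stream of passers-by as (name, age) pairs.
passers_by = [("P1", 25), ("P2", 31), ("P3", 22), ("P4", 58), ("P5", 35), ("P6", 44)]
sample = quota_sample(passers_by, age_band, {"under 40": 2, "40 and over": 2})
print(sample)  # prints [('P1', 25), ('P2', 31), ('P4', 58), ('P6', 44)]
```

Notice that nothing in the loop is random: who ends up in the sample depends on who happens to walk past first, which is why the probability of selection cannot be stated.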
Purposive sampling is where the researcher only approaches people who meet certain criteria, and then checks whether they meet other criteria. Again, market researchers out and about with clipboards often use this approach: for example, if they are looking to examine the shopping habits of men aged between 20 and 40, they would only approach men, and then ask their age.
Snowball sampling is where the researcher starts with one person who meets their criteria, and then uses that person to identify others. This works well when your sample has very specific criteria: for example, if you want to talk to workers with a particular set of responsibilities, you might approach one person with that set, and ask them to introduce you to others.
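Snowball recruitment can be pictured as following introductions outward from the first contact. The sketch below assumes a made-up table of who would introduce whom. In real research this information only emerges as you go, and the names and function are hypothetical.

```python
from collections import deque

def snowball_sample(seed, referrals, target_size):
    """Recruit outward from one starting person via their introductions."""
    recruited = [seed]
    seen = {seed}
    queue = deque([seed])
    while queue and len(recruited) < target_size:
        person = queue.popleft()
        for contact in referrals.get(person, []):
            if contact not in seen:
                seen.add(contact)
                recruited.append(contact)
                queue.append(contact)
                if len(recruited) == target_size:
                    break
    return recruited

# Made-up referral network: each person lists the people they could introduce.
referrals = {
    "Ana": ["Ben", "Caro"],
    "Ben": ["Caro", "Dev"],
    "Caro": ["Eli"],
    "Dev": [],
    "Eli": ["Ana"],
}
print(snowball_sample("Ana", referrals, target_size=4))
# prints ['Ana', 'Ben', 'Caro', 'Dev']
```

Everyone recruited is connected to the first contact, so people outside that network can never be reached, however large the sample grows.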
Non-probability sampling methods have generally been developed to address very specific problems. For example, snowball sampling deals with hard-to-find populations, and convenience sampling allows for speed and ease.
However, although some non-probability sampling methods, particularly quota and purposive sampling, ensure the sample draws from all categories in the population, samples taken using these methods may not be representative.
A Word in Conclusion
Almost all research is a compromise between the ideal and the possible.
Ideally, you would study the whole population; in practice, you don’t have time or capacity. But care in your sample selection, both size and method, will help to ensure that your research does not fall into the traps of either introducing bias, or lacking precision. This, in turn, will give it that vital credibility.