Stratified sampling is a sampling method in scientific research that involves ensuring your sample group has fair representation of sub-groups (strata) of a population you’re studying. To do this, you ensure each sub-group of the population is proportionately represented in the sample group.
For example, if you have a population of 50,000 people in a city and you want to survey them, you probably can’t survey them all. You might just survey 100 and extrapolate the results. But you’ll want to ensure the sample group (those 100 people you survey) truly represents the city population as a whole. So, if 20% of the city is Asian, then you’d want to make sure 20% of the sample group (20 people) is also Asian.
This, in essence, is stratified sampling: ensuring each sub-group (or ‘stratum’) is represented in your sample in the same proportion as in the entire population.
Stratified Sampling Examples
- Ensuring students from all grades are represented in a school study: Let’s say you need a sample of 100 from 1000 students who were asked about their preferred subject. To avoid selection bias due to different grades having different subjects, the students can be grouped according to the grade, and students are chosen from each grade. This way, you won’t have an over-representation of the Grade 7 students and under-representation of the Grade 8 students, for example.
- Ensuring people of all income brackets are represented in a taxation study: To study the impact of recent tax reforms, the population can be stratified into various segments according to their income levels as the tax laws and economic incentives are different for each income level. You don’t want to just survey rich people or the results might be skewed!
- Making sure a political focus group has proportionate numbers of people who vote for each party: While surveying the opinion of people on specific policies or political decisions, it can be beneficial to stratify based on the political affiliation of the respondents. If all the people in the survey were liberal, the results would surely be skewed and not representative of the whole population!
- Grouping shampoo users based on brand, to ensure you correct for over-representation of one brand in the study: For a market survey conducted to analyze people’s thoughts on the effectiveness of anti-hairfall shampoos, the respondent population can be grouped based on the brand of the product they use.
- Stratifying a sample population according to ethnicity: A study analyzed social determinants of health and their correlation with obesity prevalence among a sample of Hispanics and non-Hispanics. Based on the percentage of Hispanic households, the sampling area was separated into non-overlapping strata. – Carrie R. Howell
- Using neighborhood characteristics as proportionate strata: Research conducted to determine the travel characteristics like frequency and mode of travel of a population used the characteristics of the residential areas as different strata. – Fei Shi
- Using geography as strata: The long-term psychological impact of abuse was studied among the general population where the population was stratified according to their geographical location. – John Briere
- Internal company surveys need to include proportionate amounts of people from each department: A company might want to stratify according to various departments while surveying the relationship between income levels and job satisfaction.
- A study of fish might stratify the groups according to size: The relationship between maturity and age of fish was studied by dividing the fish according to their lengths. – M. Joanne Morgan
- Stratification based on types of consumers: Government agencies have used stratified random sampling to estimate water use, where the strata were large industrial water consumers vs small agricultural water consumers.
1. Hypothetical Study of Students
Consider a hypothetical study in which 100 students serve as the sample out of 1000 students who were asked about their preferred subject.
Let’s treat each grade as a separate stratum so that the survey can produce accurate answers, because students from different grades may have different subjects and/or preferences.
|Grade|Number of students (n)|
|---|---|
|6|180|
|7|210|
|8|280|
|9|160|
|10|170|
|Total|1000|
We can calculate the sample of each grade using the stratified random sampling formula:
Sample for each grade = (Sample Size / Population Size) × Population of each grade
- Sample for grade 6 = 100 / 1000 * 180 = 18
- Sample for grade 7 = 100 / 1000 * 210 = 21
- Sample for grade 8 = 100 / 1000 * 280 = 28
- Sample for grade 9 = 100 / 1000 * 160 = 16
- Sample for grade 10 = 100 / 1000 * 170 = 17
These allocations add up to 100, so we draw that many students at random from each grade and use the combined sample of 100 students for our analysis.
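The allocation arithmetic for the grades can also be sketched in a few lines of Python. This is only an illustration: the function name `proportional_allocation` and the largest-remainder rounding rule are assumptions added here (the example's shares happen to be whole numbers, so no rounding is actually needed for it).

```python
# Grade populations taken from the worked example in the text.
grade_populations = {6: 180, 7: 210, 8: 280, 9: 160, 10: 170}


def proportional_allocation(strata_sizes, sample_size):
    """Return how many units to sample from each stratum.

    Each stratum gets sample_size * stratum_size / population_size units.
    Largest-remainder rounding (an assumption of this sketch) keeps the
    total equal to sample_size when the shares are not whole numbers.
    """
    population_size = sum(strata_sizes.values())
    allocation, remainders = {}, {}
    for name, size in strata_sizes.items():
        # Whole part of the share, plus the remainder left over.
        allocation[name], remainders[name] = divmod(
            sample_size * size, population_size
        )
    # Hand any leftover units to the strata with the largest remainders.
    leftover = sample_size - sum(allocation.values())
    for name in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        allocation[name] += 1
    return allocation


allocation = proportional_allocation(grade_populations, 100)
print(allocation)  # {6: 18, 7: 21, 8: 28, 9: 16, 10: 17}
```

The printed allocation matches the hand calculation for grades 6 to 10.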
2. Social determinants correlation with obesity
Howell et al. conducted a community-based survey to determine the correlation between social determinants and obesity and how it differs in the Hispanic population from the non-Hispanic population.
For the analysis, researchers decided to stratify the populations based on different geographical locations according to the ratio of Hispanics to Non-Hispanics.
Dividing the population into these specific strata allowed the researchers to reduce sampling bias due to the varying size of the Hispanic population across different areas.
After dividing the population into multiple non-overlapping strata, a random sample of Census blocks was taken from each stratum.
The study concluded that there was a stronger correlation between the social determinants and obesity among Hispanics than among the Caucasian population.
The study also found that the impact was proportional to the ratio of Hispanics to Non-Hispanics as different results were found for different strata (7% for stratum 1, 30% for stratum 2, 58% for stratum 3, and 83% for stratum 4).
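Drawing a random sample of Census blocks within each stratum could look roughly like the sketch below. Everything concrete in it is hypothetical: the block IDs, the stratum sizes, the choice of 10 blocks per stratum, and the helper name `draw_stratified_sample` are invented for illustration and are not taken from the study.

```python
import random

# Hypothetical sampling frame: stratum label -> list of Census block IDs.
# The IDs and the stratum sizes are made up for this sketch.
frame = {
    "stratum_1": [f"S1-{i:03d}" for i in range(120)],
    "stratum_2": [f"S2-{i:03d}" for i in range(80)],
    "stratum_3": [f"S3-{i:03d}" for i in range(60)],
    "stratum_4": [f"S4-{i:03d}" for i in range(40)],
}


def draw_stratified_sample(frame, n_per_stratum, seed=None):
    """Draw a simple random sample, without replacement, inside each stratum."""
    rng = random.Random(seed)  # a seeded generator makes the draw repeatable
    return {
        stratum: rng.sample(units, min(n_per_stratum, len(units)))
        for stratum, units in frame.items()
    }


# Assumed design for illustration: 10 blocks from every stratum.
sample = draw_stratified_sample(frame, n_per_stratum=10, seed=42)
for stratum, blocks in sample.items():
    print(stratum, len(blocks), blocks[:3])
```

Because the strata do not overlap, every block can be selected at most once, and each stratum is guaranteed to appear in the final sample.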
3. Travel characteristics of a population
Fei Shi conducted a survey to determine the travel characteristics of people in Kunshan City in Jiangsu province, China.
The research focused on finding patterns in travel characteristics such as travel frequency and mode of travel.
The researcher also wanted to see whether there was any relationship between these travel characteristics and the characteristics of the respondents’ residential areas.
For this reason, the sample was stratified based on the different characteristics of residential areas.
The findings demonstrated that stratifying resident travel surveys in line with residential area characteristics produces samples with significantly smaller differences and allows lower sampling rates.
4. Maturity of fish according to age
The relationship between maturity and age of commercial fish was studied in two different species (Cod and American Plaice).
Often, the proportion mature at a given age is estimated from the observed fish at that age without taking into account the size-frequency distribution from which the specimens are drawn.
However, the researchers contended that a particular age might span numerous length classes for the majority of fish species and that the likelihood of maturation at a given age is determined by the length at that age.
They, therefore, suggested a stratified sampling technique in which the evaluation of maturity at a given age is done from a stratification based on the lengths of the fish.
By allocating weight to observations based on the abundances of the size groups, the technique took the size distribution at age into account. There were noticeable changes in the findings when the results were compared to those obtained using the unweighted approach.
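The difference between the weighted and unweighted approaches can be illustrated with a small sketch. The numbers are invented and are not from the fish study; the sketch assumes that each length group's abundance is known and weights each group's observed proportion mature by that abundance.

```python
# Hypothetical data for fish of one age: each length stratum lists its
# abundance in the population, how many fish were examined, and how many
# of those were mature. All numbers are invented for illustration.
length_strata = [
    {"length_cm": "30-39", "abundance": 600, "examined": 20, "mature": 2},
    {"length_cm": "40-49", "abundance": 300, "examined": 20, "mature": 10},
    {"length_cm": "50-59", "abundance": 100, "examined": 20, "mature": 18},
]


def unweighted_proportion_mature(strata):
    """Pool all examined fish and ignore how common each length group is."""
    examined = sum(s["examined"] for s in strata)
    mature = sum(s["mature"] for s in strata)
    return mature / examined


def weighted_proportion_mature(strata):
    """Weight each stratum's proportion mature by that stratum's abundance."""
    total_abundance = sum(s["abundance"] for s in strata)
    weighted_sum = sum(
        s["abundance"] * s["mature"] / s["examined"] for s in strata
    )
    return weighted_sum / total_abundance


print(f"unweighted: {unweighted_proportion_mature(length_strata):.2f}")  # 0.50
print(f"weighted:   {weighted_proportion_mature(length_strata):.2f}")    # 0.30
```

In this made-up case the small fish are the most abundant but are examined in the same numbers as the large ones, so the pooled estimate overstates the proportion mature at that age.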
5. Government study of Water Usage
The Department of Agriculture and many other government departments use stratified sampling methods for their surveys.
An example of such a survey is the one done to estimate the acreage of irrigated land in different areas of the country based on water usage.
The water usage was divided into two distinct groups based on historic records of water usage: large use for industries or power plants and smaller use for irrigation only.
Using substrata boundaries and splitting irrigation withdrawals into two subcategories significantly outperformed random selection within the irrigation category.
A statewide sampling strategy using 330 random samples yielded a standard error of 16% for the irrigation category, while the stratified sampling approach drastically decreased the standard error to 4%.
Furthermore, it was noted that stratified random sampling has the potential to significantly reduce the workload associated with data collection, and reducing the amount of data collected makes it possible to pay more attention to the quality of the collected data.
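To see why stratification can shrink the standard error, it may help to compare the standard textbook variance formulas for a sample mean. The notation is an assumption introduced here and does not come from the water-use survey: N is the population size, n the sample size, S² the population variance, and for each of the H strata, N_h, n_h and S_h² are the stratum's size, sample size and variance.

```latex
% Simple random sampling without replacement
\operatorname{Var}(\bar{y}_{\mathrm{srs}})
  = \left(1 - \frac{n}{N}\right) \frac{S^2}{n}

% Stratified random sampling, with stratum weights W_h = N_h / N
\operatorname{Var}(\bar{y}_{\mathrm{st}})
  = \sum_{h=1}^{H} W_h^2 \left(1 - \frac{n_h}{N_h}\right) \frac{S_h^2}{n_h}
```

The standard error is the square root of the variance. When units within a stratum are similar, each S_h² is much smaller than the overall S², so the stratified variance can be far lower for the same total sample size.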
Strengths and Weaknesses of Stratified Sampling
- It allows more precision than simple random sampling or cluster sampling if we can ensure that the members of each group are very similar and there is a clear distinction between the groups.
- It reduces the probability of sampling bias significantly.
- It ensures a proportional representation of sub-groups in our sample irrespective of their numbers.
- A conclusion can be drawn for specific subcategories which allows comparison between the subgroups as well.
- Due to the researchers’ complete control over the stratum division, this sampling strategy covers the largest possible population.
- Researchers can better manage a sample that would otherwise be too vast to evaluate by classifying a population into strata which also shortens the time and decreases the cost needed for data gathering.
- If there are too many disparities among the population or not enough data available, it is impossible to specify strata. With hidden populations, you may need to use other methods such as snowball sampling.
- Incorrect stratum selection can provide research results that do not truly represent the population.
- This involves extra planning and information gathering that simple random sampling does not require.
- If the sample fails to precisely reflect the population as a whole, sampling errors might still happen.
The stratified sampling method can be used on large data sets that can be divided into sub-groups that are internally homogeneous while the sub-groups themselves differ from one another. This allows the analysis to account for smaller deviations in the data and can represent the data better than random sampling or cluster sampling, ultimately reducing selection bias. Moreover, stratified sampling also allows comparison between the sub-groups, which can produce more insights from the analysis.
However, the strata must be selected carefully, as a poorly thought-out selection may lead to even more problems and can render the analysis useless. Also, to avoid unequal representation of the samples from different strata, the method of sampling has to be predetermined (either proportionate or disproportionate).
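The choice between proportionate and disproportionate sampling can be made concrete with the grade example from earlier. The sketch below is illustrative only. It assumes an equal allocation of 20 students per grade as the disproportionate design, and it uses design weights (stratum size divided by stratum sample size), a common survey convention that the article itself does not discuss. The name `design_weights` is made up here.

```python
# Grade populations from the hypothetical student study.
grade_populations = {6: 180, 7: 210, 8: 280, 9: 160, 10: 170}

# Proportionate design: the allocation computed in the worked example.
proportionate = {6: 18, 7: 21, 8: 28, 9: 16, 10: 17}

# Disproportionate design (assumed for illustration): 20 students per grade.
equal = {grade: 20 for grade in grade_populations}


def design_weights(strata_sizes, allocation):
    """Number of population members each sampled unit stands for."""
    return {
        name: strata_sizes[name] / allocation[name] for name in strata_sizes
    }


print(design_weights(grade_populations, proportionate))
# {6: 10.0, 7: 10.0, 8: 10.0, 9: 10.0, 10: 10.0}
print(design_weights(grade_populations, equal))
# {6: 9.0, 7: 10.5, 8: 14.0, 9: 8.0, 10: 8.5}
```

Under the proportionate design every sampled student stands for the same number of students, so no reweighting is needed. Under the equal allocation the weights differ by grade, which is why the design has to be fixed in advance and accounted for in the analysis.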
Dr. Chris Drew is the founder of the Helpful Professor. He holds a PhD in education and has published over 20 articles in scholarly journals. He is the former editor of the Journal of Learning Development in Higher Education.