Selection bias occurs when the sample being studied is not representative of the population from which the sample was drawn, leading to skewed or misleading results (Walliman, 2021).
In these situations, the sample under study deviates from a fair, random, and equitable selection process. This influences the outcomes and interpretations of a research study.
A common situation where selection bias affects results is in electoral polling. If the sample a pollster interviews skews older than the general population, or contains a disproportionately high number of men or women, then the poll’s estimates will be off. As a result, we might get a shock on election day!
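To see how this plays out, here is a toy Python simulation (every number in it is made up for illustration) in which older voters both favor one candidate more strongly and are more likely to be reached by the pollster:

```python
import random

random.seed(0)

# Toy population of 100,000 voters. Suppose older voters favor candidate A
# more often than younger voters (all rates here are invented).
population = []
for _ in range(100_000):
    is_older = random.random() < 0.40          # 40% of voters are older
    p_support_a = 0.60 if is_older else 0.40   # support differs by age group
    population.append((is_older, random.random() < p_support_a))

true_support = sum(vote for _, vote in population) / len(population)

# A biased poll: it reaches every older voter, but only 20% of younger ones.
biased_sample = [v for (older, v) in population if older or random.random() < 0.2]
polled_support = sum(biased_sample) / len(biased_sample)

print(f"true support:   {true_support:.3f}")   # close to 0.48
print(f"polled support: {polled_support:.3f}")  # noticeably higher
```

Because the over-sampled group supports candidate A more, the polled figure lands well above the true rate, even though every individual answer is honest.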
Selection Bias Examples
1. Sampling Bias
Sampling bias occurs when a researcher selects sampling methods that aren’t representative of the entire population, thereby introducing bias in the representation (Atkinson et al., 2021).
A common example is convenience sampling, where individuals are chosen based on their proximity or accessibility, rather than considering the characteristics of the larger population.
In behavioral science, for example, heavy reliance on undergraduate students as subjects limits how far many theories can be generalized to other age groups (Liamputtong, 2020). Because such a sample does not represent the wider population, it introduces sampling bias.
This form of bias undermines the generalizability and external validity of the results (Busetto et al., 2020). It is therefore crucial to balance representativeness against accessibility when designing the sampling strategy (Walliman, 2021).
2. Self-selection Bias
Self-selection bias arises when participants are given the choice to participate in a study and the ones who opt in or out are systematically different from the others (Suter, 2011).
The study findings may not accurately represent the entire population, as those who self-selected may have specific characteristics or behaviors that influence the research outcome (Creswell, 2013). For instance, individuals who agree to be part of a weight loss study might already be motivated to lose weight.
This bias skews the resulting data and weakens inferences to the broader population (Bryman, 2015). Hence, while self-selection grants participants autonomy, it calls for cautious interpretation of the data.
3. Exclusion Bias
Exclusion bias refers to the systematic exclusion of certain individuals from the sample.
It could be due to specific criteria defined in the study design, or to the unintentional or deliberate exclusion of groups by the recruitment strategy (Walliman, 2021). For example, a study on work productivity that excludes night-shift workers will suffer from exclusion bias.
This form of bias threatens the internal validity of the study as it implies a differential selection of subjects into study groups (Atkinson et al., 2021). Thus, researchers should ensure their selection criteria do not create an undue bias.
4. Berkson’s Bias
Berkson’s Bias, named after American statistician Joseph Berkson, is a form of selection bias seen commonly in medical research.
This bias takes place when the selection of subjects into a study is related to their exposure and outcome (Barker et al., 2016). For example, if a study is conducted in a hospital setting, it would more likely attract people who are ill and seeking treatment than healthy individuals, leading to an overrepresentation of one group.
Because this bias undermines the generalizability of the results, ensuring a diverse sample becomes integral (Liamputtong, 2020).
5. Undercoverage Bias
Undercoverage bias happens when some groups of the population are inadequately represented in the sample (Creswell, 2013).
Similar to exclusion bias, it can emerge because researchers did not reach out to certain groups, or because those groups faced barriers to responding (Bryman, 2015). An example would be a telephone survey that only includes landline numbers, thereby excluding a large segment of the population, notably the tech-savvy younger generation.
Acknowledging and factoring in such underrepresentation ensures a more accurate result (Suter, 2011).
6. Cherry Picking
Cherry-picking is a type of selection bias that involves selectively presenting, emphasizing, or excluding data that support a particular conclusion while neglecting significant data that may contradict it (Bryman, 2015).
It can lead to inaccurate or misleading findings because the research results have been skewed deliberately. An example could be climate change deniers who selectively focus on particular periods to argue that global warming isn’t happening or isn’t serious.
Researchers must be explicit about their selection process and should refrain from selectively highlighting or suppressing data (Walliman, 2021).
7. Survivorship Bias
Survivorship bias is a bias that can occur when the focus is solely on the subjects that “survived” or succeeded, dismissing those that failed or dropped out (Atkinson et al., 2021).
This can clearly skew the results as important factors contributing to failure or dropout might be overlooked. An example is in entrepreneurship where stories of successful founders are commonly told while ignoring the much larger number of entrepreneurs who failed.
To avoid this bias, researchers need to consider the whole spectrum of outcomes (Busetto, Wick, & Gumbinger, 2020).
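A quick illustrative simulation (the “quality” scores and survival threshold are invented) shows how averaging only over the survivors inflates the estimate:

```python
import random

random.seed(1)

# Toy cohort of 10,000 startups: each gets a random "quality" score drawn
# from a standard normal distribution, and only the stronger ones survive.
startups = [random.gauss(0, 1) for _ in range(10_000)]
survivors = [q for q in startups if q > 1.0]   # roughly the top ~16%

mean_all = sum(startups) / len(startups)
mean_survivors = sum(survivors) / len(survivors)

print(f"mean quality, full cohort: {mean_all:.2f}")        # near 0
print(f"mean quality, survivors:   {mean_survivors:.2f}")  # well above 0
```

A researcher who only studies the survivors would conclude the average venture is far stronger than it actually is.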
Read More: Survivorship Bias Examples
8. Time Interval Bias
Time Interval Bias arises when the time between measurements or observations is inconsistent, thereby inflating or reducing the observed effects or associations in a study (Atkinson et al., 2021).
The choice of time intervals is vital, and different intervals can lead to different results. For example, tracking a group of patients’ recovery weekly might lead to a different outcome than if the analysis was done monthly.
While varying the intervals might capture more detail, it may also lead to an overestimation of some results (Suter, 2011).
Researchers need to choose the most accurate and reasonable intervals to mitigate such bias (Bryman, 2015). Understanding the implications of time intervals is therefore critical for accurate representation and valid interpretation of data (Walliman, 2021).
9. Attrition Bias
Attrition bias, also known as dropout bias, comes into play when participants exit a study before its completion, skewing the final results (Bryman, 2015).
This departure can be associated with certain characteristics or responses to the study, thus altering the distribution of variables within the remaining sample (Walliman, 2021). An example would be participants dropping out of a drug efficacy study due to intense side effects.
If many of these participants belonged to the group that received the new drug, the remaining participants would likely show results biased in favor of the new drug’s efficacy (Atkinson et al., 2021).
To control for attrition bias, strategies such as bolstering participant engagement and using intention-to-treat analysis should be considered (Suter, 2011). Therefore, attention to withdrawal reasons and early identification of potential dropout factors are critical aspects of research design and execution (Creswell, 2013).
10. Non-response Bias
Non-response bias arises when the characteristics of those who choose to participate in a study differ significantly from those who do not respond.
For instance, in a survey about personal health habits, individuals with poor health habits may be less likely to respond. Hence, the data would underestimate the prevalence of poor health habits in the population (Walliman, 2021).
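A small toy simulation (all the rates are hypothetical) shows how differential response rates push the survey estimate below the true prevalence:

```python
import random

random.seed(3)

# Toy survey: 30% of the population has a poor health habit (invented rate),
# but such individuals respond to the survey only half as often as others.
population = [random.random() < 0.30 for _ in range(100_000)]
responses = [h for h in population if random.random() < (0.25 if h else 0.50)]

true_rate = sum(population) / len(population)
observed_rate = sum(responses) / len(responses)

print(f"true prevalence:     {true_rate:.2f}")      # close to 0.30
print(f"surveyed prevalence: {observed_rate:.2f}")  # noticeably lower
```

Even with a very large sample, the gap does not shrink, because it is driven by who responds rather than by how many respond.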
To mitigate this bias, researchers can adopt strategies such as contacting non-responders repeatedly or offering incentives to improve response rates (Barker, Pistrang, & Elliott, 2016).
11. Volunteer Bias
Volunteer bias occurs when individuals who volunteer for a research study are fundamentally different from those who decline to participate (Atkinson et al., 2021).
Their eagerness to participate often reflects strong opinions or experiences related to the research topic. This creates a skewed representation, as volunteers may be more educated, affluent, or health-conscious than the broader population (Creswell, 2013). For instance, in a study of alcohol consumption patterns, non-drinkers or moderate drinkers may be less inclined to respond.
Therefore, caution must be exercised when making inferences from volunteer-based data, as volunteers’ experiences may be atypical (Suter, 2011). Purposive sampling strategies can be employed to ensure a more balanced representation (Bryman, 2015).
12. Healthy User Effect
The healthy user effect, or a health-conscious bias, arises when participants who voluntarily engage in health behavior or treatment studies are generally healthier, more educated, and compliant than the average populace (Walliman, 2021).
This participation can cause an overestimation of the benefits of the health behavior or treatment being studied (Atkinson et al., 2021). A classic example would be a study of the impacts of a healthy diet, where individuals already conscious about their food choices are more likely to participate (Bryman, 2015).
Such selective participation skews the outcomes towards favorable results (Liamputtong, 2020). So it’s paramount that researchers control for health consciousness in their analysis to ensure the effects being studied are indeed due to the intervention and not related to healthier behaviors (Barker et al., 2016).
13. Exposure Bias
Exposure bias arises when there are inconsistencies or errors in measuring an individual’s exposure to a certain factor or condition in a research study (Suter, 2011).
This might occur when a study measures participants’ sun exposure levels without considering their sunscreen usage, leading to an overestimation of sun exposure and its effects (Bryman, 2015).
Such flawed measurement can consequently undermine the validity and reliability of the research findings (Walliman, 2021). As a result, it’s crucial to consider and control for confounding variables that might affect exposure levels (Barker et al., 2016).
Importantly, employing consistent and objective methods of measurement helps to minimize exposure bias (Liamputtong, 2020).
14. Location Bias
Location bias is a sample distortion that emerges when the setting for data collection influences the research results, making them unrepresentative of the wider population (Atkinson et al., 2021).
If a study on physical fitness is conducted solely in a gym, the results would most likely present a fitness level higher than the general population (Suter, 2011). This location-specific data might falsely represent the overall fitness levels because a gym environment already attracts more physically active people (Creswell, 2013).
To avoid this bias, researchers should aim to diversify the settings for data collection, ensuring they are reflective of various environments where the target population might be found (Liamputtong, 2020). Therefore, an understanding of the potential influence of the study location is crucial to reduce location bias (Bryman, 2015).
15. Referral Bias
Referral bias appears in studies when the sampled population has been specifically referred from another source, creating potential unrepresentativeness (Barker et al., 2016).
This type of bias is common in healthcare research, in which patients referred for specialized care are studied (Walliman, 2021). Enrolling these patients in a study could misrepresent the condition’s severity, as they have already been pre-selected based on their need for specialized care (Creswell, 2013).
Consequently, the outcomes of such studies could overestimate disease severity or the effectiveness of specialized treatment (Atkinson et al., 2021). Thus, understanding and considering referral patterns and their implications is a crucial step in mitigating referral bias in research (Suter, 2011).
16. Pre-screening of Subjects
Pre-screening of subjects happens when researchers follow a vetting process to determine whether potential participants are suitable for a study (Walliman, 2021).
This process could inadvertently exclude certain individuals or groups, leading to a biased, non-representative sample (Atkinson et al., 2021).
An example of pre-screening bias is when a study on heart diseases excludes individuals with a history of hypertension. As a result, it could potentially understate the severity of heart conditions as it does not account for such overlapping conditions (Bryman, 2015).
Thus, careful balancing must be undertaken during pre-screening to ensure the sample reflects the wider research context whilst adhering to study-specific needs (Creswell, 2013). Importantly, the implications of pre-screening should be acknowledged in any resulting data interpretations (Liamputtong, 2020).
What’s Wrong with Selection Bias?
Selection bias can and does skew results. This is an overarching issue in both qualitative and quantitative research, as biases may emerge from the chosen selection methods, either intentionally or unintentionally (Busetto, Wick, & Gumbinger, 2020).
Diverse factors such as geography, socioeconomic status, or personal preferences can influence participant choice and thereby introduce bias.
Selection bias ultimately reduces both external and internal validity:
- External validity is compromised as the biased sample is not representative of the larger population, making it hard to generalize the findings. (See: Threats to external validity).
- Internal validity is compromised because the bias introduces additional variables, making it challenging to confirm whether the observed effect is due to the experiment itself or the bias (See: Threats to internal validity).
Overall, selection bias contravenes scientific research principles because it potentially leads to inaccurate findings and breaks the trust between the researcher and the public or scientific community.
Combatting Selection Bias: Specialized Methodologies
Addressing selection bias is vital for maintaining the integrity of research outcomes. By combining careful planning, methodological rigor, statistical expertise, and transparency, significant strides can be made in reducing this type of bias (Walliman, 2021).
Specifically, here are four techniques:
1. Stratified Sampling
Stratified sampling is a method in which the larger population is first divided into distinct, non-overlapping subgroups or “strata” based on specific characteristics or variables (Atkinson et al., 2021).
These could be attributes like age range, geographic location, or socio-economic groups. The next step is to randomly select samples from each stratum.
The benefit of the stratified sampling technique is that it yields a sample more representative of the population’s diversity (Bryman, 2015). Instead of treating the population as homogeneous, it respects its heterogeneity and reduces the risk of under-representation.
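As a rough sketch (the strata labels and sizes here are hypothetical), proportional stratified sampling can be implemented like this:

```python
import random
from collections import Counter

random.seed(42)

# Hypothetical population with unequal age strata.
population = (["18-34"] * 3000) + (["35-54"] * 5000) + (["55+"] * 2000)

def stratified_sample(pop, strata_key=lambda x: x, n=100):
    """Draw from each stratum in proportion to its share of the population."""
    strata = {}
    for item in pop:
        strata.setdefault(strata_key(item), []).append(item)
    sample = []
    for key, members in strata.items():
        k = round(n * len(members) / len(pop))  # proportional allocation
        sample.extend(random.sample(members, k))
    return sample

sample = stratified_sample(population, n=100)
print(Counter(sample))  # Counter({'35-54': 50, '18-34': 30, '55+': 20})
```

Each stratum appears in the sample in exactly the proportion it holds in the population, which a single simple random draw only achieves on average.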
2. Randomization
Randomization is the process of assigning individuals to groups randomly within a study. It ensures that each participant has an equal chance of being assigned to any group, thereby minimizing the risk of selection bias (Creswell, 2013).
Importantly, it also helps to distribute the features of participants evenly across groups. As the distribution is random, differences in outcome can be more confidently attributed to differing interventions rather than underlying differences in the groups.
Its key strength is that it supports causal inferences by balancing both known and unknown confounds (Suter, 2011).
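A minimal sketch of simple randomization (the participant IDs are hypothetical):

```python
import random

random.seed(7)

participants = [f"P{i:02d}" for i in range(1, 21)]  # 20 hypothetical IDs

def randomize(ids, groups=("treatment", "control")):
    """Shuffle participants, then deal them out evenly across the groups."""
    shuffled = ids[:]
    random.shuffle(shuffled)
    return {g: shuffled[i::len(groups)] for i, g in enumerate(groups)}

assignment = randomize(participants)
print(len(assignment["treatment"]), len(assignment["control"]))  # 10 10
```

Because the shuffle ignores every participant characteristic, known and unknown traits end up spread evenly across the groups on average.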
3. Propensity Score Matching
Propensity score matching (PSM) is a statistical method that attempts to estimate the effect of an intervention, treatment, or policy by accounting for the covariates that predict receiving the treatment (Busetto et al., 2020).
Essentially, it matches individuals in the treated group with individuals in the control group with similar “propensity scores” or predicted probabilities of receiving the treatment.
By balancing the observed characteristics between treated and control groups in this manner, PSM helps to mimic a randomized controlled trial and minimize selection bias in non-experimental studies (Barker et al., 2016).
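The matching step can be sketched as greedy nearest-neighbour matching within a caliper (the IDs and scores below are invented; in practice the scores would come from a model such as logistic regression of treatment status on the covariates):

```python
# Pre-computed propensity scores for treated and control units (hypothetical).
treated = {"T1": 0.72, "T2": 0.35, "T3": 0.58}
controls = {"C1": 0.30, "C2": 0.55, "C3": 0.70, "C4": 0.90}

def match(treated, controls, caliper=0.1):
    """Greedily pair each treated unit with the closest unused control,
    skipping pairs whose score gap exceeds the caliper."""
    available = dict(controls)
    pairs = {}
    for t_id, t_score in sorted(treated.items()):
        best = min(available, key=lambda c: abs(available[c] - t_score), default=None)
        if best is not None and abs(available[best] - t_score) <= caliper:
            pairs[t_id] = best
            del available[best]
    return pairs

print(match(treated, controls))  # {'T1': 'C3', 'T2': 'C1', 'T3': 'C2'}
```

Note that C4 goes unmatched: a caliper deliberately discards controls with no close counterpart rather than forcing a poor match.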
4. Instrumental Variable Methods
Instrumental variable (IV) methods are used in situations where random assignment is not feasible, and there’s potential for uncontrolled confounding (Atkinson et al., 2021).
An instrument is a variable that affects the treatment status but does not independently affect the outcome, except through its effect on treatment status (Walliman, 2021).
The goal of IV methods is to remove bias in the estimated treatment effects by isolating the variability in treatment that is not due to confounding (Bryman, 2015). It’s a powerful tool addressing selection bias in observational studies, but finding a valid instrument can be challenging (Liamputtong, 2020).
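A simulated sketch (all coefficients are illustrative) comparing a naive OLS slope, which is biased by an unobserved confounder, with a simple IV (Wald) estimator:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# Simulated data: u is an unobserved confounder; z is an instrument that
# shifts treatment x but affects the outcome y only through x.
u = rng.normal(size=n)
z = rng.normal(size=n)
x = 0.8 * z + u + rng.normal(size=n)
y = 2.0 * x + u + rng.normal(size=n)   # true causal effect of x on y is 2.0

def slope(a, b):
    """OLS slope of b on a (with an intercept)."""
    A = np.column_stack([np.ones_like(a), a])
    return np.linalg.lstsq(A, b, rcond=None)[0][1]

ols = slope(x, y)                              # biased upward by u
iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]   # simple IV (Wald) estimator

print(f"OLS estimate: {ols:.2f}")  # noticeably above 2.0
print(f"IV estimate:  {iv:.2f}")   # close to 2.0
```

The OLS slope absorbs the confounder’s influence, while the IV ratio uses only the variation in x that comes from z, recovering an estimate near the true effect.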
Overcoming selection bias requires meticulous planning, proper sample selection, and unbiased data analysis (Bryman, 2015). Responsible research and commitment to ethical guidelines will significantly reduce cases of selection bias.
To conclude, selection bias, as emphasized by Liamputtong (2020), is one of the significant forms of bias in research. Its influence can markedly distort research outcomes, and considerable efforts must be made to identify, control, and mitigate its impact on research findings.
Atkinson, P., Delamont, S., Cernat, A., Sakshaug, J., & Williams, R. A. (2021). SAGE Research Methods Foundations. London: Sage Publications.
Barker, C., Pistrang, N., & Elliott, R. (2016). Research Methods in Clinical Psychology: An Introduction for Students and Practitioners. London: John Wiley & Sons.
Bryman, A. (2015). The SAGE Handbook of Qualitative Research. London: Sage Publications.
Busetto, L., Wick, W., & Gumbinger, C. (2020). How to use and assess qualitative research methods. Neurological Research and Practice, 2, 1-10.
Creswell, J. W. (2013). Research Design: Qualitative, Quantitative and Mixed Methods Approaches. New York: Sage Publications.
Liamputtong, P. (2020). Qualitative Research Methods. New York: Sage.
Suter, W. N. (2011). Introduction to Educational Research: A Critical Thinking Approach. London: Sage Publications.
Walliman, N. (2021). Research Methods: The Basics. Los Angeles: Routledge.
Dr. Chris Drew is the founder of the Helpful Professor. He holds a PhD in education and has published over 20 articles in scholarly journals. He is the former editor of the Journal of Learning Development in Higher Education.