External validity refers to the extent to which the findings and conclusions of a study can be applied beyond the context of that particular study.
It signifies how far the outcomes of a study are generalizable or transferable to other settings or contexts (Kam & Meyer, 2015).
Generally, external validity is highest when the participants of a study closely represent the population to which the findings are meant to generalize.
In a clinical trial of a new medicine, for example, external validity would be strong if the participants included varied age groups, different genders, and people from a wide range of economic and cultural backgrounds.
Threats to External Validity
1. Hawthorne Effect
The Hawthorne Effect refers to individuals altering their performance or behavior because they are aware of being observed (Adair, 2010).
This affects external validity because participants may behave differently when observed than they otherwise would, skewing results and jeopardizing generalizability.
In a workplace efficiency study, for example, employees who know they’re being observed might work harder than usual.
Consequently, the study’s findings may not accurately indicate their typical productivity levels.
2. Placebo Effect
The Placebo Effect is a psychological phenomenon wherein a person experiences a perceived improvement in conditions due to their belief in the effectiveness of a treatment, even when it’s inert (Price, Finniss, & Benedetti, 2010).
This can challenge external validity, as any observed effects may be attributed to participants’ beliefs instead of the treatment itself.
For example, in a medicine efficacy experiment, if participants who receive a placebo report improvement, that improvement stems from their expectations rather than from any active ingredient.
This confounds the study’s results, making generalization to larger populations tricky.
3. Sampling Bias
Sampling Bias arises when the study participants aren’t representative of the larger population intended for generalization (Bethlehem, 2010).
This threat to external validity may occur due to non-random selection or self-selection of participants.
For instance, a study on childhood nutrition may lose external validity if only children from high-income families are sampled. The findings, in this case, may not be generalizable to children from other socio-economic backgrounds.
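The childhood nutrition example can be illustrated with a small simulation. This is only a sketch under invented assumptions: the population size, the 20% share of high-income families, and the score distributions are hypothetical numbers chosen to make the effect visible, not data from any study.

```python
import random
import statistics

def simulate_population(n=10_000, seed=0):
    """Build a hypothetical population of (income_group, nutrition_score) pairs."""
    rng = random.Random(seed)
    population = []
    for _ in range(n):
        # Assumption: 20% of children are in the high-income group.
        group = "high" if rng.random() < 0.2 else "other"
        # Assumption: high-income children score higher on average (70 vs 55).
        mean = 70 if group == "high" else 55
        population.append((group, rng.gauss(mean, 10)))
    return population

def mean_score(rows):
    """Average nutrition score across a list of (group, score) pairs."""
    return statistics.fmean(score for _, score in rows)

population = simulate_population()
rng = random.Random(1)

# Biased sample: only high-income children are eligible.
high_income_only = [row for row in population if row[0] == "high"]
biased_sample = rng.sample(high_income_only, 500)

# Simple random sample: every child is equally likely to be chosen.
random_sample = rng.sample(population, 500)

print(f"population mean:    {mean_score(population):.1f}")
print(f"biased sample mean: {mean_score(biased_sample):.1f}")
print(f"random sample mean: {mean_score(random_sample):.1f}")
```

Under these assumptions the high-income-only sample overstates the population average, while the random sample lands close to it.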
4. Observer Bias
Observer Bias occurs when researchers subconsciously project their expectations or beliefs onto the participants or interpret findings based on their predispositions, thereby skewing results.
This bias compromises external validity as it hinders the accurate and objective recording of data.
For example, in a study on student behavior, if the researcher assumes boys are more disruptive, they might pay more attention to boys’ misbehavior and over-report it. Consequently, the generalizability of the study’s findings is compromised.
5. Measurement Validity
Measurement Validity concerns how accurately a study measures what it intends to measure, and so relates directly to external validity (Gall, Gall & Borg, 2014).
If research lacks measurement validity, it’s less likely that the results will apply in other situations, contexts, or populations.
For instance, if a test measuring reading comprehension is heavily skewed towards complicated science texts, it might not accurately measure a person’s ability to understand other types of texts. Consequently, the generalizability of the findings to other genres would be doubtful.
6. Selection Bias
Selection Bias emerges when the individuals chosen for the study aren’t chosen randomly or aren’t representative of the broader population (Rothman, Greenland, & Lash, 2012).
This jeopardizes external validity because findings drawn from an unrepresentative group may not generalize beyond it.
For example, if a study on the effects of a new diabetes medication only selects relatively young, fit participants, the results may not extend to elderly or sicker individuals.
Hence, selection bias can impede the applicability of study findings to broader populations.
7. Contamination
Contamination refers to the unwanted influence of outside variables affecting the outcomes of a study (Dal Bó & Dal Bó, 2014).
It poses a risk to external validity by affecting the study conditions and subsequently its generalizability.
In a research study comparing two educational strategies, for example, contamination could occur if participants in one group discuss techniques with the other group.
Therefore, the observed effects might not solely reflect the impact of the educational strategies, limiting their generalizability.
8. Ecological Validity
Ecological Validity relates to the degree to which the findings of a study can be generalized to real-life settings or conditions (Gall, Gall & Borg, 2014).
A challenge to ecological validity results in reduced external validity as study findings may not transfer well to everyday environments.
For instance, successful negotiation techniques tested in laboratory conditions might not be effective in real-world negotiations, such as business deals or politics. Hence, it might be challenging to apply the study’s findings beyond the laboratory setting.
9. Population Validity
Population Validity concerns the degree to which a study’s results can be generalized to the broader population (Kratochwill & Levin, 2010).
A threat to population validity limits the study’s generalizability.
For example, a study on smartphone usage with a sample composed solely of university students might not accurately portray usage patterns across broader age groups. Thus, the findings may not extend to older adults or to teenagers.
10. Temporal Validity
Temporal Validity involves the extent to which the findings of a study hold true over time.
If temporal validity is low, the results may be time-specific, restricting the overall external validity.
For instance, a study conducted in a city in the years following a major societal event, like an international sporting event hosted there, might not yield the same results if replicated a decade later.
Hence, the generalizability of the findings may be temporally limited.
11. Situation Specificity
Situation Specificity refers to the uniqueness of a situation that might limit the generalizability of research findings (Gall, Gall, & Borg, 2014).
The more specific the situation or condition under which the study was conducted, the narrower the scope of its external validity.
An investigation into team dynamics in a military context, for example, may yield different results if applied in a corporate setting. Therefore, the findings may not extend beyond military teams, limiting their applicability.
12. Cross-Cultural Validity
Cross-Cultural Validity represents the degree to which findings of a study are applicable across various cultural groups (Matsumoto & Van de Vijver, 2011).
If cultural differences aren’t considered, the generalizability of the results across diverse cultures is compromised.
For instance, a mental health study conducted within a Western cultural framework might not maintain validity when applied to different cultural contexts. Thus, results might not be generalized beyond the original cultural context, affecting the study’s external validity.
13. Construct Underrepresentation
Construct Underrepresentation occurs when the research does not sufficiently capture the theoretical concept it purports to measure (Messick, 2011).
This compromises the external validity, as the narrow focus may not adequately represent the broader concept.
For example, in a study assessing intelligence using only math-based tests, the scope is likely too narrow. Therefore, the findings might not be generalized to broader constructs of intelligence, like verbal or social intelligence.
14. Treatment Variation
Treatment Variation refers to inconsistencies or deviations in the administration of experimental treatments.
This threatens external validity as the variations may obscure the actual effects of the intervention, affecting its applicability.
In a study testing a teaching strategy, if the approach varies significantly across different classrooms, it undermines the ability to generalize the results to other settings. Thus, inconsistencies in treatment delivery can significantly limit a study’s generalizability.
15. Lack of Replication
Lack of Replication concerns the inability to duplicate the results of a study in other settings or with different participants (Cumming, 2012).
This limitation hinders the external validity by restricting the generalizability of the findings.
For instance, if a study on diet and weight loss lacks replication across diverse participant samples, the findings might remain questionable and less applicable to a larger demographic. Hence, replication is a cornerstone of robust research and its absence can significantly limit external validity.
16. Interaction of Time and Treatment
Interaction of Time and Treatment refers to a situation where the effects of treatment vary based on the timing of its administration (Cook & Campbell, 2018).
If the effects of a treatment depend heavily on when it is administered, the ability to generalize findings is diminished.
For example, a summer reading program might be more successful when implemented at the beginning rather than at the end of summer. Here, the interaction of time and treatment can limit the study’s ability to generalize across different timeframes.
17. Experimenter Effects
Experimenter Effects refer to the influence the experimenter’s behavior, characteristics or expectations may have on the study outcomes (Levitt & List, 2011).
These effects can distort results and threaten their generalizability.
For instance, a researcher’s expectations might subtly influence their scoring of participant responses, leading to biased outcomes. As a result, the true external validity of the study might be compromised.
18. Multiple-Treatment Interference
Multiple-Treatment Interference occurs when several treatments are administered in the same study and influence one another’s outcomes.
This interferes with the external validity by complicating the isolation of effects due to individual treatments.
For instance, in a health study, if participants are exposed to both a new exercise regimen and a diet plan, it can be challenging to determine which is primarily responsible for observed health improvements.
Consequently, the generalization of the study’s findings to other settings where only one intervention is applied becomes tricky.
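One way to see the problem is to compare what each study arm would show if the two interventions had also been tested separately. The sketch below assumes a 2×2 factorial layout, which the example above does not describe, and uses made-up improvement scores purely for illustration.

```python
# Hypothetical mean health-score improvements for four study arms.
# Keys are (exercise, diet); the numbers are illustrative only.
cell_means = {
    (False, False): 1.0,  # neither intervention
    (True, False): 4.0,   # exercise only
    (False, True): 3.0,   # diet only
    (True, True): 9.0,    # both interventions
}

baseline = cell_means[(False, False)]
exercise_alone = cell_means[(True, False)] - baseline
diet_alone = cell_means[(False, True)] - baseline
combined = cell_means[(True, True)] - baseline

# If the treatments did not interfere, the combined effect would equal the sum
# of the two separate effects; any difference is the interaction.
interaction = combined - (exercise_alone + diet_alone)

print(f"exercise alone: {exercise_alone}")
print(f"diet alone:     {diet_alone}")
print(f"combined:       {combined}")
print(f"interaction:    {interaction}")
```

With these invented numbers the combined effect exceeds the sum of the separate effects, so a study that only ran the combined arm would say little about settings where one intervention is used alone.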
19. The Rosenthal Effect
The Rosenthal Effect, or experimenter expectancy effect, arises when a researcher’s preconceived expectations or biases unknowingly influence participant performance (Rosenthal, 2012).
By subtly cueing participants towards the expected outcome, the researcher can skew results, which threatens external validity.
In a study investigating a new teaching method’s efficacy, if a researcher subtly communicates high expectations to certain students, this could unfairly inflate their performance.
Thus, the actual effectiveness of the teaching method might be distorted, impeding its generalizability.
20. Nonresponse Bias
Nonresponse Bias refers to the skewing of study results due to differential responses between participants and nonparticipants (Groves, 2011).
This bias poses a serious threat to external validity, as the findings may not accurately represent the broader population.
For example, in a community health survey, if only health-conscious individuals respond, the results may overstate the overall community’s health practices. Consequently, the generalizability of the results is compromised.
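The community health survey example can be sketched as a simulation. Everything here is an assumption made for illustration: the distribution of weekly exercise hours and the response model (more active residents are more likely to reply) are hypothetical.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical community: weekly exercise hours for 5,000 residents.
community = [max(0.0, rng.gauss(3.0, 2.0)) for _ in range(5_000)]

def responds(hours, rng):
    """Assumed response model: more active residents are likelier to reply."""
    probability = min(0.9, 0.1 + 0.1 * hours)
    return rng.random() < probability

# Only the residents who reply end up in the survey data.
respondents = [hours for hours in community if responds(hours, rng)]

true_mean = statistics.fmean(community)
observed_mean = statistics.fmean(respondents)
response_rate = len(respondents) / len(community)

print(f"response rate:           {response_rate:.0%}")
print(f"true mean hours:         {true_mean:.2f}")
print(f"mean among respondents:  {observed_mean:.2f}")
```

Because the likelihood of responding rises with the trait being measured, the respondents' average overstates the community average under this model.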
21. Overgeneralization
Overgeneralization occurs when a researcher applies the findings of a study to a population or context far beyond the sample used (Malterud, 2012).
This threatens external validity by overestimating the applicability of results.
If a study on stress management techniques in a local company is overgeneralized to all corporate settings worldwide, the conclusions might not suit companies with different cultures or structures. Overgeneralization thus claims more generalizability than the findings can support.
22. Subject Effects (Volunteer Bias or Participation Bias)
Subject Effects encompass changes in participant behavior due to awareness of being studied. They include volunteer bias and participation bias, which refer to potential differences between those who choose to participate in a study and those who do not (Darlington & Scott, 2015).
Such bias threatens external validity by potentially making the sample unrepresentative of the larger target population.
For instance, in a study on exercise habits, if only consistently active people volunteer, the results may not accurately reflect the broader population’s exercise behaviors.
Consequently, the findings become harder to generalize.
How to Enhance External Validity
Enhancing external validity is a major objective in research (Cumming, 2012). One popular way to strengthen external validity is by random selection, which makes a study’s findings applicable to a larger population.
Some strategies include:
1. Randomized Sampling
Drawing a random sample from a larger population, for instance when doing market research, helps ensure that the conclusions can be generalized across a wider demographic (Gall, Gall & Borg, 2014).
Simple random sampling gives each unit in the population an equal probability of being selected for the study, reducing the likelihood of sampling bias (Calder, Phillips, & Tybout, 1981).
Moreover, it supports generalizing the results to other individuals in the population.
For example, if you are conducting a study on the impact of an intervention on reading achievement among elementary school pupils, drawing a random sample of students across various schools, grade levels, and socio-economic backgrounds will strengthen the external validity.
This process makes it more likely that your sample is representative of the larger population, thus enhancing the applicability of your findings (Koralov & Sinai, 2007).
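As a rough sketch of how such a sample might be drawn, the code below shows simple random sampling alongside proportional stratified sampling, a related technique that samples the same fraction from each subgroup. The sampling frame, the school names, and the helper functions are hypothetical and exist only for this illustration.

```python
import random
from collections import Counter, defaultdict

def simple_random_sample(units, k, seed=0):
    """Every unit has the same chance of selection."""
    return random.Random(seed).sample(units, k)

def stratified_sample(units, key, fraction, seed=0):
    """Sample the same fraction from every stratum (proportional allocation)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[key(unit)].append(unit)
    sample = []
    for members in strata.values():
        k = round(len(members) * fraction)
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical sampling frame: (student_id, school) pairs.
students = (
    [(i, "school_a") for i in range(600)]
    + [(i, "school_b") for i in range(600, 900)]
    + [(i, "school_c") for i in range(900, 1000)]
)

srs = simple_random_sample(students, 100)
strat = stratified_sample(students, key=lambda s: s[1], fraction=0.1)

print(Counter(school for _, school in srs))    # varies around 60/30/10
print(Counter(school for _, school in strat))  # exactly 60/30/10 by construction
```

A simple random sample matches the population's school mix only on average, whereas the stratified version fixes the mix in advance. Which is preferable depends on how much is known about the population beforehand.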
2. Controlling for Threats
Another crucial way to enhance external validity is to carefully consider and control potential threats (Cumming, 2012).
For example, to mitigate the Hawthorne effect or observer bias, the research could be designed to minimize the participants’ awareness of being observed or ensure the observers are unaware of the experiment’s objectives.
To deal with treatment variation, meticulous standardization of intervention administration is essential. In the case of a study investigating the efficacy of a therapy technique, ensuring all therapists follow the same protocol can enhance external validity.
This meticulous attention to potential threats helps ensure that the outcomes are due to the variables being studied and not external influences, thereby enhancing the study’s external validity.
External Validity vs Internal Validity
External validity isn’t the only aspect to consider in evaluating a study’s quality.
Equally important is the complementary concept of internal validity, which concerns how well a study is designed and carried out, and therefore how confidently its results can be attributed to the variables under study.
Aspects like data collection methods, the use of control groups, and manipulation checks all contribute to a study’s internal validity.
To build a comprehensive understanding of the study’s quality, both internal and external validity should be considered (Gall, Gall & Borg, 2014).
| |Internal Validity|External Validity|
|---|---|---|
|Definition|The degree to which a study accurately shows a cause-and-effect relationship between variables.|The degree to which a study’s results can be generalized to other situations, people, or time periods.|
|Threats|Pretesting effects, maturation, experimental mortality, experimenter’s stereotype, etc.|Hawthorne effect, ecological validity, sampling bias, population validity, situation specificity, etc.|
|Focus|Accuracy and reliability of the study’s results.|Applicability and generalizability of the study’s results.|
|Importance|Vital for drawing accurate conclusions about the studied variables within the experiment.|Vital for applying the study’s results to broader contexts, situations, or populations.|
|Mitigation|Use of control groups, randomization, careful experimental design, etc.|Careful selection of participants, replication of studies, use of real-world settings, etc.|
|Examples|If a study claims a new medicine causes weight loss, internal validity ensures that it is indeed the medicine, not other factors like diet or exercise, causing this effect.|If a study finds a teaching method improves student grades in a single class, external validity ensures this method can also work in other classes, schools, or countries.|
References
Adair, J. G. (2010). The Hawthorne effect: A reconsideration of the methodological artifact. Journal of Applied Psychology, 69(2), 334. Doi: https://psycnet.apa.org/doi/10.1037/0021-9010.69.2.334
Bethlehem, J. (2010). Selection bias in web surveys. International Statistical Review, 78(2), 161-188.
Calder, B. J., Phillips, L. W., & Tybout, A. M. (1981). Designing research for application. Journal of Consumer Research, 8(2), 197-207.
Calder, B. J., Phillips, L. W., & Tybout, A. M. (1982). The concept of external validity. Journal of Consumer Research, 9(3), 240-244.
Cook, T. D., & Campbell, D. T. (2018). Quasi-experimentation: Design & analysis issues for field settings (Vol. 351). Boston: Houghton Mifflin.
Cumming, G. (2012). Understanding the new statistics: Effect sizes, confidence intervals, and meta-analysis. London: Routledge.
Dal Bó, E., & Dal Bó, P. (2014). “Do the right thing:” The effects of moral suasion on cooperation. Journal of Public Economics, 117, 28-38. Doi: https://doi.org/10.1016/j.jpubeco.2014.05.002
Darlington, Y., & Scott, D. (2015). Understanding qualitative research and ethnomethodology. London: Sage.
Gall, M. D., Gall, J. P., & Borg, W. R. (2014). Applying educational research: How to read, do, and use research to solve problems of practice. Sydney: Pearson.
Groves, R. M. (2011). Nonresponse rates and nonresponse bias in household surveys. Public Opinion Quarterly, 70(5), 646-675. Doi: https://doi.org/10.1093/poq/nfl033
Kam, C. C. S., & Meyer, J. P. (2015). How careless responding and acquiescence response bias can influence construct dimensionality: The case of job satisfaction. Organizational Research Methods, 18(3), 512-541. Doi: https://doi.org/10.1177/1094428115571894
Koralov, L., & Sinai, Y. G. (2007). Theory of probability and random processes. Springer Science & Business Media.
Kratochwill, T. R., & Levin, J. R. (2010). Enhancing the scientific credibility of single-case intervention research: Randomization to the rescue. Psychological Methods, 15(2), 124. Doi: https://psycnet.apa.org/doi/10.1037/14376-003
Levitt, S. D., & List, J. A. (2011). Was there really a Hawthorne effect at the Hawthorne plant? An analysis of the original illumination experiments. American Economic Journal: Applied Economics, 3(1), 224-238.
Malterud, K. (2012). Systematic text condensation: A strategy for qualitative analysis. Scandinavian Journal of Public Health, 40(8), 795-805. Doi: https://doi.org/10.1177/1403494812465030
Matsumoto, D., & Van de Vijver, F. J. (2011). Cross-cultural research methods in psychology. Cambridge: Cambridge University Press.
Messick, S. (2011). Standards of validity and the validity of standards in performance assessment. Educational Measurement: Issues and Practice, 14(4), 5-8. Doi: https://doi.org/10.1111/j.1745-3992.1995.tb00881.x
Price, D. D., Finniss, D. G., & Benedetti, F. (2010). A comprehensive review of the placebo effect: Recent advances and current thought. Annual Review of Psychology, 59, 565-590. Doi: https://doi.org/10.1146/annurev.psych.59.113006.095941
Rosenthal, R. (2012). Covert communication in classrooms, clinics, courtrooms, and cubicles. American Psychologist, 54(11), 839.
Rothman, K. J., Greenland, S., & Lash, T. L. (2012). Modern epidemiology. Wolters Kluwer Health/Lippincott Williams & Wilkins.
Dr. Chris Drew is the founder of the Helpful Professor. He holds a PhD in education and has published over 20 articles in scholarly journals. He is the former editor of the Journal of Learning Development in Higher Education.