Internal validity refers to making sure the results of your experiment are truly a result of what you’re testing, and not caused by other factors or mistakes in your study’s design.
An experiment with high internal validity allows researchers and professionals to link cause and effect confidently.
However, maintaining high internal validity isn’t always easy. Internal validity can be threatened by a range of research design factors, including poorly calibrated instrumentation, observer bias, and selection bias.
As a result, researchers need to minimize confounding variables, maintain a constant research environment, and ensure that the participants are randomly selected or assigned to control and experimental groups (Kumar, 2010).
Threats to Internal Validity
1. Pretesting Effect
Pretesting Effect occurs when the act of preliminary testing influences the participant’s performance in subsequent testing.
An earlier test may familiarize participants with the material or procedures, influencing their later performance. This effect threatens internal validity as it complicates determining whether outcomes result from the intervention or the pretest (Christenfeld, Sloan, Carroll, & Greenland, 2004).
It’s critical to distinguish the influences of pretesting from other variables to ensure this doesn’t confound the results.
Depending on your study design, strategies like using control groups can help manage pretesting effects.
2. Pygmalion Effect
The Pygmalion Effect, named after the Greek myth of Pygmalion, describes the phenomenon where higher expectations lead to an increase in performance.
In research, this effect can manifest when the investigators’ expectations unknowingly influence the participants’ attitudes and behavior (Friedrich et al., 2015).
It compromises internal validity as the observed effects may not stem solely from the tested variables.
To mitigate this, researchers should employ ‘blind study’ designs whenever possible, concealing group assignments and expected outcomes from both participants and experimenters. Importantly, ensuring objective performance assessments can also reduce the risk of this threat.
3. Experimenter’s Stereotype
Experimenter’s Stereotype is a threat to internal validity that happens when a researcher’s stereotype about a particular group influences their behavior during the study.
This, in turn, can affect the study’s results (Rosenthal & Jacobson, 2003). This inadvertent behavior may lead to misinterpretation of the cause-effect relationship.
To minimize this threat, it’s advisable to use blinding techniques and consistent protocol for all study participants. Thorough staff training and debriefing can also help curtail experimenter-related biases.
4. Mortality and Attrition
Mortality or attrition threat happens when participants drop out from a longitudinal study before completion (Shaughnessy, Zechmeister, & Zechmeister, 2011).
Attrition especially compromises internal validity when the drop-out rate varies significantly between groups. This risks skewing results, since the remaining sample may no longer be representative of the initial population.
Consideration of participant comfort, appropriate incentives, and frequent follow-up can mitigate this.
Post hoc, researchers can also conduct an intent-to-treat analysis or treat dropouts as a separate comparison group.
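The difference between a completers-only analysis and an intent-to-treat analysis can be sketched with hypothetical scores. The data below, and the baseline-carried-forward imputation used for dropouts, are illustrative assumptions only, not a prescribed method:

```python
import statistics

# Hypothetical outcome data: None marks a participant who dropped out.
# Under intent-to-treat, every randomized participant stays in the
# analysis; here we carry the baseline score forward for dropouts
# (one common, conservative imputation) instead of discarding them.
baseline = [50, 52, 48, 51, 49, 53]
outcome  = [60, 63, None, 59, None, 62]   # None = dropped out

completers = [o for o in outcome if o is not None]
itt = [o if o is not None else b for b, o in zip(baseline, outcome)]

print(statistics.mean(completers))  # completers-only estimate: 61.0
print(round(statistics.mean(itt), 2))  # intent-to-treat estimate: 56.83
```

Because dropouts tend to be the participants for whom the intervention worked least well, the completers-only estimate is inflated relative to the intent-to-treat estimate.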
5. Order or Sequencing Effects
Sequencing Effects occur when the order in which experimental tasks are presented influences the outcome (Aron, Coups, & Aron, 2013).
For example, participants might do better on later tasks because of learned skills or worse because of fatigue. This threatens the internal validity as the change might wrongly get attributed to the manipulated variable.
Counterbalancing, where you alter the sequence of tasks for different participants, helps mitigate order effects. It ensures that the observed effects aren’t solely because of the task order.
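Counterbalancing can be sketched as a simple assignment routine. The function below is a hypothetical illustration (not a standard library API): it cycles participants through all possible task orders so that no single sequence dominates the design:

```python
import itertools
import random

def counterbalanced_orders(tasks, n_participants, seed=0):
    """Assign each participant a task order, cycling through every
    permutation so practice and fatigue effects are spread evenly."""
    rng = random.Random(seed)
    orders = list(itertools.permutations(tasks))
    rng.shuffle(orders)  # randomize which participant gets which order
    return [list(orders[i % len(orders)]) for i in range(n_participants)]

orders = counterbalanced_orders(["A", "B", "C"], n_participants=12)
# With 3 tasks there are 6 possible orders, so across 12 participants
# each order is used exactly twice.
```

For designs with many tasks, researchers often use a Latin square instead of full permutation, since the number of permutations grows factorially.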
6. Compensatory Rivalry
Compensatory Rivalry arises when participants in a control group work harder or change their behavior upon realizing they are not in the experimental group.
This phenomenon can obscure the true effects of the intervention, thereby threatening internal validity (Cook & Campbell, 1979). Any increase in performance might be due to the control group’s competitive spirit, not the experimental manipulation.
To mitigate this, researchers should disguise group allocations whenever possible. Equally distributing desirable elements across groups can also tone down perceived inequalities.
7. Compensatory Equalization
Compensatory Equalization happens when researchers, aware of the deprived condition of the control group, treat that group differently in an attempt to level the field (Cook & Campbell, 1979).
Any progress made might then stem from the extra attention or resources the control group receives, rather than from the intervention itself. This significantly threatens the cause-and-effect attribution crucial to maintaining internal validity.
To counteract this pitfall, researchers should use blind study designs whenever possible. Adhering to a strict protocol, regardless of group assignment, can help ensure consistent treatment.
8. Resentful Demoralization
This issue arises when the control group becomes aware of its deprived status, which interferes with participants’ motivation levels.
Resentful demoralization can threaten the internal validity of the study as it alters the behavior of the control group (Yeaton & Wortman, 1993). Through this dissatisfaction, the control group might underperform, falsely inflating the differences between groups.
Mitigation strategies include maintaining blind conditions or taking steps to minimize perceived inequalities. Not discussing the treatment conditions with the participants can also limit potential demoralization.
9. John Henry Effect
The John Henry Effect describes a phenomenon where individuals in the control group work harder to compete with those in the experimental group.
This is another type of compensatory rivalry that threatens the internal validity because it alters the control group’s behavior (Peters, Langbein & Roberts, 2016). As interventions are compared against standard practice, improvements in the control group will undermine the interpretation of results.
Applying blinding methods and minimizing interaction between groups can help eliminate this effect.
10. Novelty Effect
The Novelty Effect is a temporary change in participant behavior in response to a new or novel intervention (Mellers, Hertwig, & Kahneman, 2001).
The new experience may inflate the measured effects above what they would be in a regular setting. It can mislead researchers into wrongly concluding that the changes are due to the tested variables and not the novelty.
Employing longer follow-up periods can help detect the leveling off of this initial spike. It’s also useful to make comparisons over extended periods, to ensure the results are steady and not novelty-induced.
11. Testing Threat

Testing threat to internal validity occurs when a participant’s score on a posttest is influenced by having taken a pretest (Goldstein, David, & Wallis, 2014).
The problem arises when the change between tests is due to the testing experience rather than the effect of an intervention. For instance, participants may better understand the test requirements on subsequent takes, producing apparent gains that reflect familiarity rather than genuine learning.
Using a control group that is not pretested, and comparing score changes against it, can check for this artefact. Alternatively, using different testing tools that measure the same construct can help isolate genuine learning.
12. Instrumentation Calibration
Instrumentation threat refers to changes in the measurement of a variable over time (Lavrakas, 2008).
Changes in the measures’ definition, form, observer, or instrument calibration can lead to variances. This threat undermines our ability to attribute detected change to the intervention and not the measurement technique itself.
Strategies like staff training or applying the same measures through the study period provide protection. Ensuring consistent scoring strategies and data collection aids also minimize instrumentation threats.
13. Regression Toward the Mean
Regression Toward the Mean occurs when an unusually extreme outcome is followed by an outcome closer to the mean.
Specifically, it happens when participants with extreme scores on a pretest show less extreme scores on a posttest, not because of any experimental variable, but because of natural statistical variations (Kline, 2013).
This can impact experimental results, leading researchers to attribute changes to the intervention when they’re actually the result of statistical phenomenon. Maintaining large sample sizes, using repeated measures, and taking the possible effect of regression into account when analyzing data can help address this threat.
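A small simulation (with illustrative parameters only) makes regression toward the mean concrete: participants selected for extreme pretest scores score closer to the average on a posttest even though no intervention occurs between tests:

```python
import random
import statistics

random.seed(42)

# Simulate test scores as true ability plus independent measurement
# noise on each occasion. No intervention is applied between tests.
n = 10_000
true_ability = [random.gauss(100, 10) for _ in range(n)]
pretest  = [a + random.gauss(0, 10) for a in true_ability]
posttest = [a + random.gauss(0, 10) for a in true_ability]

# Select the "extreme" group: roughly the top 10% of pretest scorers.
cutoff = sorted(pretest)[int(0.9 * n)]
extreme = [i for i in range(n) if pretest[i] >= cutoff]

pre_mean  = statistics.mean(pretest[i] for i in extreme)
post_mean = statistics.mean(posttest[i] for i in extreme)
print(f"extreme group pretest mean:  {pre_mean:.1f}")
print(f"extreme group posttest mean: {post_mean:.1f}")
# The posttest mean falls back toward 100 even though nothing changed:
# the extreme group was partly selected for lucky measurement noise.
```

A naive researcher who gave this group an intervention between the two tests would see scores "change" purely because of this statistical artifact, which is why untreated control groups drawn from the same extreme range are so valuable.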
14. Maturation

Maturation is a threat to internal validity that occurs when natural changes or developments within the participants over time affect the results (Cook & Campbell, 1979).
These changes can range from physical growth and aging to changes in social attitudes or personal circumstances. If these changes aren’t accounted for, researchers might erroneously attribute effects seen to the experimental intervention.
To mitigate this, multiple measurements over time can help distinguish between maturation and intervention effects. Further, the study design could incorporate a control group that undergoes the same maturation process but without the intervention.
15. Diffusion or Imitation of Treatments
Diffusion or Imitation of Treatments is a threat to internal validity that occurs when members of the control group adopt or mimic the experimental treatment (Cook & Campbell, 1979).
This can inadvertently reduce the differences between the experimental and control groups, leading to underestimation of the treatment effect. For instance, if two classrooms are part of a study on innovative teaching methods and the teachers start sharing their techniques, diffusion is occurring.
This threat can be lessened by minimizing communication between groups about the intervention. Isolating groups is ideal, if possible.
16. Experimenter Expectancy Effect
The Experimenter Expectancy Effect is a form of bias that occurs when the experimenter’s expectations about the outcome of the study influence the participants’ behavior, thereby potentially skewing the results (Allen, 2015).
For example, if investigators are aware of their research hypotheses, they may subconsciously communicate their expectations to the participants. This can lead to false-positive results where researchers think their experimental manipulation worked when it might not have.
Utilizing a double-blind structure, where both the participants and the experimenters are blind to group assignments, minimizes this bias. Standardized instructions and automated data collection can also play a role in minimizing this threat.
17. Temporal Precedence
Temporal Precedence refers to the sequence of events in causal relationships, where the cause must precede the effect.
When studying cause-effect relationships, misinterpretations may occur if the temporal order is not clear (Bickman, 2009). This threatens the validity of any claim about cause and effect relationships.
Precise experimental design ensuring that the presumed cause precedes the effect can mitigate this threat. Adequate record-keeping and time-stamped data also ensure that the timing of various events is tracked correctly.
18. Treatment Diffusion
Treatment Diffusion is a potential threat to internal validity when components of an experimental treatment unintentionally ‘diffuse’ or ‘spillover’ to the control group, distorting the outcome (Cook & Campbell, 1979).
This can occur in various ways, such as through participant communication or tester inadvertence. It might undermine the impact of the treatment, as the control group no longer provides a valid benchmark.
Methods for avoiding such diffusion include isolating groups, using blind designs, and instructing participants not to discuss the study with others. In addition, careful monitoring can detect any diffusion, allowing for post hoc adjustments.
19. Testing Sensitization
Testing Sensitization occurs when the act of taking a pretest affects how participants respond to the experimental treatment or posttest.
The experience can potentially make subjects more sensitive to the treatment, leading to an overestimation of its effects. The threat lies in the possibility of wrongly attributing changes to the intervention instead of the predisposition created by the pretest.
Strategies to overcome this issue include blind designs and varied testing methods. Alternatively, the study can be designed without a pretest where practical.
20. Interpretation Bias
Interpretation bias is a threat to internal validity that arises when researchers interpret findings to fit their expectations or hypotheses (i.e., confirmation bias), rather than based on what the data actually indicate.
For example, an experimenter could overemphasize elements supporting their hypothesis while downplaying contradictory evidence (Hergovich, Schott, & Burger, 2010).
Notably, this threat could lead to misleading conclusions about an intervention’s effectiveness.
To minimize interpretation bias, it’s crucial for researchers to remain objective and use statistical analyses that compare findings to chance levels. A blind analysis, where researchers don’t know the group assignments while analyzing the data, can also be a helpful safeguard.
See Also: Self-Fulfilling Prophecy
21. Selection Bias
Selection bias, often discussed as a threat to external validity, also threatens internal validity. It occurs when participant selection or group assignment isn’t random, creating pre-existing differences between groups that affect study results (Armstrong, 2010).
For instance, if groups are self-selected rather than randomly assigned, members of one group might share a characteristic influencing their response to a treatment. This leads to difficulty in isolating treatment effects from the effect of these characteristics.
To mitigate selection bias, use random assignment when possible. Utilizing matching or covariate analysis to control for any non-equivalent characteristics can also be effective.
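Random assignment itself can be sketched in a few lines. The helper below is a hypothetical illustration, not a standard library routine: it shuffles the participant pool and deals participants into groups round-robin so group sizes differ by at most one:

```python
import random

def randomly_assign(participants, groups=("control", "treatment"), seed=None):
    """Shuffle participants, then deal them into groups round-robin,
    giving a random assignment with near-equal group sizes."""
    rng = random.Random(seed)
    pool = list(participants)
    rng.shuffle(pool)
    assignment = {g: [] for g in groups}
    for i, p in enumerate(pool):
        assignment[groups[i % len(groups)]].append(p)
    return assignment

assignment = randomly_assign(range(21), seed=7)
# 21 participants split 11/10 between the two groups; which participant
# lands where depends only on the random shuffle, not on any trait.
```

Because assignment depends only on chance, any pre-existing participant characteristics are, in expectation, balanced across groups.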
See Also: Selection Bias Examples
22. Failure to Adhere to Protocol
Failure to Adhere to Protocol refers to a threat which occurs when a treatment isn’t executed as intended, creating deviations from the schedule or manipulation.
For example, a researcher might unintentionally deviate from the instructions for administering a treatment or test. Consequently, the experimental manipulation might not reflect the ideal implementation, causing misinterpretation of results.
Training researchers, using a detailed protocol, and cross-checking adherence can reduce this threat. Applying standardized and automated procedures can also be beneficial, if possible.
23. History

History as a threat to internal validity refers to external events that occur between measurements which may influence outcomes (Heppner, Kivlighan Jr, & Wampold, 2007).
For example, a change in societal norms or a major global event during a longitudinal study might affect participants’ responses. This threat makes it difficult to determine if observed changes are due to the intervention or these external events.
Including a control group that experiences the same historical events but does not receive the intervention may offset these effects. Additionally, monitoring and documenting potential historical influences is crucial.
24. Demand Characteristics
Demand Characteristics signify the cues in an experiment that lead participants to believe they know what the researcher is looking for, thus affecting their behavior (Orne, 2009).
Demand characteristics are often discussed as a threat to external validity, but they also affect the internal validity of a study.
Participants might change their behavior based on what they perceive the study’s purpose to be, perhaps to align with the apparent expectations. This threat can produce false positives, where researchers think their experimental manipulation led to the observed results when it could be participants’ response to perceived expectations.
Blind designs and ensuring clarity that there are no “right or wrong” answers can help mitigate this threat.
25. Impact of Data Collection Method
This threat to internal validity occurs when changes in data collection methods, instruments, or data collectors create inconsistencies (Foddy, 1993).
For example, if a survey’s format changes midway through a study, or different interviewers collect data, this could introduce variance unrelated to the experiment’s manipulation. Researchers might mistakenly attribute this variance to their experimental manipulation, producing false positive results.
To guard against this, maintain consistency in data collection methods throughout the study. Regular meetings and training of data collectors are also essential to ensure consistency.
26. Social Desirability
Social Desirability is a type of response bias where participants tend to answer questions in a manner that will be viewed favorably by others.
It can take the form of over-reporting “good” behavior or under-reporting “bad” or undesirable behavior. The threat to internal validity arises when results are influenced more by participants’ desire to give socially desirable responses than by the variables under study. This primarily affects studies that use self-reported data.
Mitigation strategies can include guaranteeing anonymity, emphasizing the importance of honesty, and using indirect questioning.
27. Confounding Variables
Confounding variables are external variables that affect both the independent and dependent variables, producing an association that may mislead the researcher about the true relationship between them.
This creates a threat to internal validity as it distorts the understanding of the relationships between variables. Not carefully controlling for these in an experiment could lead to the erroneous attribution of outcomes to the manipulated variable, while they are actually effects of the confounding variable.
Confounding variables are best handled through good experimental design that includes control for potential confounders, random assignment to conditions, and statistical control in the analysis phase.
Read Next: Threats to External Validity
Internal Validity vs External Validity
Many of the above threats can be threats to both internal and external validity.
Here’s the difference between the two concepts:
| | Internal Validity | External Validity |
|---|---|---|
| Definition | The degree to which a study accurately shows a cause-and-effect relationship between variables. | The degree to which a study’s results can be generalized to other situations, people, or time periods. |
| Threats | Pretesting effects, maturation, experimental mortality, experimenter’s stereotype, etc. | Hawthorne effect, ecological validity, sampling bias, population validity, situation specificity, etc. |
| Focus | Accuracy and reliability of the study’s results. | Applicability and generalizability of the study’s results. |
| Importance | Vital for drawing accurate conclusions about the studied variables within the experiment. | Vital for applying the study’s results to broader contexts, situations, or populations. |
| Mitigation | Use of control groups, randomization, careful experimental design, etc. | Careful selection of participants, replication of studies, use of real-world settings, etc. |
| Examples | If a study claims a new medication causes weight loss, internal validity ensures that it is indeed the drug, not other factors like diet or exercise, causing this effect. | If a study finds a teaching method improves student grades in a single class, external validity ensures this method can also work in other classes, schools, or countries. |
Allen, M. (2015). The experimenter expectancy effect: an inevitable component of school science? Research in Education, 94(1), 13-29. doi: https://doi.org/10.7227/RIE.0014
Armstrong, R. A. (2014). When to use the Bonferroni correction. Ophthalmic and Physiological Optics, 34(5), 502-508. doi: https://doi.org/10.1111/opo.12131
Aron, A., Coups, E. J., & Aron, E. N. (2013). Statistics for psychology. New York: Pearson.
Bickman, L. (2009). Research design: Donald Campbell’s legacy. New York, NY: SAGE Publications.
Christenfeld, N. J. S., Sloan, R. P., Carroll, D., & Greenland, S. (2004). Risk factors, confounding, and the illusion of statistical control. Psychosomatic Medicine, 66(6), 868-875.
Cook, T. D., & Campbell, D. T. (1979). Quasi-experimentation: Design & analysis issues for field settings. Chicago: Rand McNally College Pub. Co.
Foddy, W. (1993). Constructing questions for interviews and questionnaires: Theory and practice in social research. Cambridge: Cambridge University Press.
Friedrich, A., Flunger, B., Nagengast, B., Jonkmann, K., & Trautwein, U. (2015). Pygmalion effects in the classroom: Teacher expectancy effects on students’ math achievement. Contemporary Educational Psychology, 41, 1-12.
Heppner, P.P., Kivlighan Jr, D.M., & Wampold, B.E. (2007). Research design in counseling. Cengage Learning.
Kline, R. B. (2013). Beyond significance testing: Statistics reform in the behavioral sciences. Washington, DC: American Psychological Association.
Kumar, R. (2010). Research Methodology: A step-by-step guide for beginners. Thousand Oaks, CA: SAGE Publications Inc.
Mellers, B. A., Hertwig, R., & Kahneman, D. (2001). Do frequency representations eliminate conjunction effects? An exercise in adversarial collaboration. Psychological Science, 12(4), 269-275. doi: https://doi.org/10.1111/1467-9280.00350
Orne, M. T. (2009). Demand characteristics and the concept of quasi-controls. In Artifacts in Behavioral Research (pp. 110-137). Oxford University Press. doi: http://dx.doi.org/10.1093/acprof:oso/9780195385540.003.0005
Peters, J., Langbein, J., & Roberts, G. (2016). Policy evaluation, randomized controlled trials, and external validity—A systematic review. Economics Letters, 147, 51-54.
Shaughnessy, J. J., Zechmeister, E. B., & Zechmeister, J. S. (2011). Research methods in psychology. New York: McGraw-Hill.
Trochim, W. M. (2015). Research methods: The concise knowledge base. New York: Atomic Dog/Cengage Learning.
Yeaton, W. H., & Wortman, P. M. (1993). On the reliability of meta-analytic reviews: The role of intercoder agreement. Evaluation Review, 17(3), 292-309. doi: https://doi.org/10.1177/0193841X9301700303
Dr. Chris Drew is the founder of the Helpful Professor. He holds a PhD in education and has published over 20 articles in scholarly journals. He is the former editor of the Journal of Learning Development in Higher Education.