Content validity is a term used to describe whether a study fully examines the construct it is designed to measure. It is important that a test is thorough and covers the entire domain of the construct.
For example, if a researcher wants to study emotional intelligence, and there are five dimensions of emotional intelligence, then the scale should have multiple questions that assess each dimension.
If it only explores one aspect of emotional intelligence and ignores the other four, then it would not have content validity.
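The coverage idea above can be checked mechanically once each item has been assigned to a dimension. The following is a minimal sketch under stated assumptions: the dimension labels ("A" to "E"), the item names, the `coverage_gaps` helper, and the minimum of two items per dimension are all hypothetical choices for illustration, not part of any published scale.

```python
from collections import Counter

def coverage_gaps(item_dimensions, required_dimensions, min_items=2):
    """Return the dimensions that have fewer than `min_items` items.

    item_dimensions: dict mapping item name -> dimension it assesses
    required_dimensions: every dimension the construct is assumed to have
    min_items: hypothetical threshold for "multiple questions"
    """
    counts = Counter(item_dimensions.values())
    return {
        dim: counts[dim]
        for dim in required_dimensions
        if counts[dim] < min_items
    }

# Hypothetical scale: three items on dimension A, one on B, none on C, D, E
scale = {"q1": "A", "q2": "A", "q3": "A", "q4": "B"}
gaps = coverage_gaps(scale, ["A", "B", "C", "D", "E"])
print(gaps)
```

Any dimension that shows up in `gaps` is under-represented, which is the situation described above where a scale explores one aspect and neglects the rest.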
Difference Between Content and Construct Validity
Sometimes there can be some confusion between content and construct validity.
The difference between the two concepts is subtle:
- Construct validity is concerned with whether the test measures what it is supposed to measure.
- Content validity is more focused on whether the test items cover the entire conceptual domain of what is being measured.
Examples of Content Validity
1. Final Exam in a College Course
Suppose you are taking a course on European history. The textbook consists of 15 chapters.
The professor informs the students that the final exam will be comprehensive, meaning that it will cover all of the material from the beginning to the end of the course.
As the final exam date approaches, you start re-reading the book and studying your lecture notes. Of course, you devote considerable time to preparing for the final and when the exam date arrives, you feel fully prepared.
To your surprise, however, the exam only covers the odd-numbered chapters. There are no questions on chapters 2, 4, 6, etc.
In this example, the final exam obviously lacks content validity.
2. Driver’s License Test
Driver’s tests often only test you in good conditions. They don’t assess whether you can drive in rain, in snow, or on busy streets. Most driver’s tests therefore lack content validity.
Driving is more dangerous than most people realize. Getting behind the wheel of a 2,000-pound mass of steel traveling 50 mph can lead to serious injury if the driver is inattentive, lacks adequate hand-eye coordination, or gets excited too easily.
However, the typical state driver’s license test will include driving a simple, stress-free route around a parking lot, maybe a try at parallel parking, and an exam assessing knowledge of basic traffic rules and regulations.
That assessment procedure hardly covers the entire domain of skills it takes to actually drive in the real world. Therefore, the state driver’s license exam may be one of the most important tests in a person’s life that has the least content validity.
3. Job Interview Questions
Often, you study everything there is to study about a job before the interview, only to get to the interview and be asked very few questions, and certainly not enough to assess how good you will be at the job!
We all know that one question employers love to ask during a job interview is, “Where do you see yourself in 5 years?”
There are a lot of other questions too, such as: what are your strengths and weaknesses, how do you handle stress, and what would your friends say about you? You may even be given a Likert scale of your strengths and weaknesses which clumsily attempts to operationalize abstract ideas.
Although those questions might give an employer some information about your verbal skills and ability to create a favorable impression, none of them may actually relate to the specific job requirements.
Not only that, but scoring is completely subjective and probably more a function of the employer’s mood at that particular moment than the applicant’s responses.
A more accurate assessment, based on content validity, would involve designing job simulation tasks that mirror those duties actually performed on the job.
Each applicant would go through the simulation and their performance would be observed and evaluated by a panel of subject matter experts (SMEs).
4. The NFL Combine
The NFL combine is a test of how prepared college footballers would be for the NFL. Unfortunately, it’s not very accurate because the tests aren’t comprehensive enough.
Every year, college athletes participate in the NFL combine in the hopes of making it into the pros (and substantially higher tax brackets). Head coaches, managers, and scouts from every team travel from around the country to attend.
The event involves athletes participating in several drills and physical tests that are supposed to predict future performance on the field. A sample of these assessments includes the 40-yard dash, bench pressing 225 lbs., the vertical jump, and something called the three-cone drill.
Unfortunately, the content of these tests is a far cry from being representative of job requirements. For example, offensive and defensive linemen will probably go an entire season and never need to run 40 yards at full sprint.
The research on the combine also reveals that performance at this event has little similarity with job demands on gameday (Kuzmits & Adams, 2008).
5. IQ Tests
Perhaps one of the best examples of a test that lacks content validity is the standard IQ test. A 1-hour test can’t even scratch the surface of identifying how intelligent a person is.
Although there are numerous versions, such as the Wechsler Adult Intelligence Scale and the Stanford-Binet Intelligence Scales, there is considerable debate as to whether the questions on the test actually represent intelligence in all of its many forms.
There are many different aspects of intelligence, different types of intelligence, and different ways in which intelligence can be expressed. It’s hard to imagine that a paper and pencil test could possibly be comprehensive enough to cover the entire content domain.
For this reason, the content validity of standardized tests is always open for debate.
6. Head Chef
To test if someone is ready to be head chef, they need to be skilled in both cooking and leadership. Testing just one of these and not the other would lead to content validity failure.
With a job as skills-oriented as being the Head Chef of an exclusive restaurant, one would think that the content of the hiring process would consist of a demonstration of practical skills. In fact, this is often the case.
An applicant may be invited to the kitchen and asked to prepare a variety of dishes, including entrees and desserts. There may be a paper and pencil test as well that assesses knowledge of various health codes and regulations, as well as issues regarding business aspects of running a kitchen such as handling the budget.
In this case, in order for the assessment to have content validity, it should contain two components, one practical and one knowledge-based.
7. Hiring an Economics Teacher
Teaching requires two things: subject knowledge and the ability to teach it. You can be highly knowledgeable and a terrible teacher. To hire the teacher, you’d need to be able to test both aspects.
Economics can be a difficult subject to teach. There are a lot of technical concepts that can involve sophisticated statistical formulas.
In this example, a school is going to hire a new economics teacher. They develop a comprehensive test that covers both micro- and macroeconomic principles. The school wants to make sure their teacher knows their stuff.
The next step of the hiring process involves interviewing each applicant. The applicants are asked a wide range of questions regarding economics and members of the hiring panel rate each applicant’s overall performance.
The hiring panel then makes a decision and hires one of the applicants. Although the person the school hired is very knowledgeable of economics, as it turns out, they are not very good at teaching it to others.
The hiring process in this example lacked content validity because it did not cover all the domains of teaching economics: it assessed knowledge of the subject but not the ability to teach it.
8. Panel of Experts
One way of determining if a test has content validity or not is by having it examined by a panel of experts.
The panel can consist of anywhere from 2 to over 20 experts in the domain of interest. Each member of the panel is given a copy of the test or questionnaire and asked to rate each item in terms of whether it represents the domain of study or not.
Those ratings are then examined by the developers of the test. Individual test items that are given low ratings by the panel are then omitted from the final version of the test. After the bad items have been removed, the remaining items should represent the domain well, provided they still cover all of it.
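As a rough illustration of the screening step described here, the sketch below averages each item's panel ratings and separates items that fall below a cutoff. The 1-to-4 relevance scale, the cutoff of 3.0, the item names, and the `filter_items` helper are all assumptions made for the example; real panels may use different scales and decision rules.

```python
from statistics import mean

def filter_items(ratings_by_item, min_mean_rating=3.0):
    """Split items into (kept, dropped) by their mean expert rating.

    ratings_by_item: dict mapping item name -> list of expert ratings
    min_mean_rating: hypothetical cutoff on an assumed 1-4 relevance scale
    """
    kept, dropped = [], []
    for item, ratings in ratings_by_item.items():
        if mean(ratings) >= min_mean_rating:
            kept.append(item)
        else:
            dropped.append(item)
    return kept, dropped

# Hypothetical ratings from a four-person panel
panel_ratings = {
    "Q1": [4, 4, 3, 4],
    "Q2": [2, 1, 2, 3],
    "Q3": [3, 4, 4, 3],
}
kept, dropped = filter_items(panel_ratings)
print("kept:", kept)
print("dropped:", dropped)
```

Dropping weak items only removes irrelevant content. It is still worth checking afterwards that the items left over span the whole domain.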
9. Lawshe’s Content Validity Ratio
The content validity ratio (CVR) is a statistical method for gauging the degree of agreement among subject matter experts (SMEs) regarding the suitability of individual items of a test or personality scale.
Lawshe (1975, p. 567) stated that, after being given a full copy of the measurement tool, each SME should respond to the question: “Is the skill or knowledge measured by this item ‘essential,’ ‘useful, but not essential,’ or ‘not necessary’ to the performance of the job?”
The data are put into a statistical program such as SPSS and the CVR is calculated. The CVR value will range from -1 (no SME rates the item essential) to +1 (every SME rates the item essential), with 0 meaning that exactly half of the panel rates it essential. The closer the CVR value is to +1, the stronger the content validity.
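The calculation itself is simple enough to do without a statistics package. Lawshe's formula is CVR = (n_e - N/2) / (N/2), where n_e is the number of SMEs who rate the item "essential" and N is the total number of SMEs. The sketch below applies that formula to made-up ratings; the function name, item names, and panel data are hypothetical.

```python
def content_validity_ratio(n_essential, n_experts):
    """Lawshe's CVR for one item: (n_e - N/2) / (N/2)."""
    if n_experts <= 0:
        raise ValueError("n_experts must be positive")
    if not 0 <= n_essential <= n_experts:
        raise ValueError("n_essential must be between 0 and n_experts")
    half = n_experts / 2
    return (n_essential - half) / half

# Hypothetical responses from a panel of 10 SMEs for three items
ratings = {
    "item_1": ["essential"] * 10,
    "item_2": ["essential"] * 5 + ["useful, but not essential"] * 5,
    "item_3": ["not necessary"] * 10,
}

# Count the "essential" votes per item and convert them to a CVR
cvr_by_item = {
    item: content_validity_ratio(votes.count("essential"), len(votes))
    for item, votes in ratings.items()
}
print(cvr_by_item)
```

With these made-up ratings, the item everyone calls essential scores +1, the item that splits the panel evenly scores 0, and the item nobody calls essential scores -1.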
10. Getting Married too Early
The problem with finding a romantic partner is that we never have enough information to tell if they’re really “the one” until we’ve invested a lot of time and effort into it. This is why many couples date for a long period of time before getting married!
When in the dating world, a lot of singles are solely interested in one thing: finding their soulmate. It can be an arduous journey. A person may have to go on a lot of dates, for years, before they find the man/woman of their dreams.
Because no one really wants to waste a lot of time, a lot of people in the dating pool have developed their own assessment criteria for determining if their date is Mr./Mrs. Right. This can include carefully observing how their date treats the wait staff in a restaurant, grooming habits, or estimating lifetime earnings potential.
Unfortunately, most of these criteria fail to capture the true essence of marriage. The attributes needed for a successful marriage are not going to be assessed in a 90-minute dinner date.
Perhaps if people in the dating game were to take a course on content validity, they could develop assessment procedures more applicable to marriage.
Conclusion
Content validity refers to whether the items on a test or personality scale assess the entire subject domain of interest. A good test will have questions that cover all aspects of the construct being measured.
We can see many examples of content validity in the real world. Driver’s license tests usually lack content validity because the driving portion takes place in a parking lot and fails to mimic real driving conditions.
The NFL combine may be one of the most famous examples of a testing process with lots of individual challenges that bear virtually no resemblance to the requirements of the job on gameday.
The same can be said of employment interviews. Questions asked during the hiring process often fail to represent aspects of the job.
Although the concept of content validity is fairly simple, obtaining it in real-life situations can be quite challenging.
References
Cronbach, L. J. (1970). Essentials of Psychological Testing. New York: Harper & Row.
Haynes, S. N., Richard, D. C. S., & Kubany, E. S. (1995). Content validity in psychological assessment: A functional approach to concepts and methods. Psychological Assessment, 7, 238–247.
Hinkin, T. R., & Tracey, J. B. (1999). An analysis of variance approach to content validation. Organizational Research Methods, 2(2), 175–186. https://doi.org/10.1177/109442819922004
Johnston, M., Dixon, D., Hart, J., Glidewell, L., Schröder, C., & Pollard, B. (2014). Discriminant content validity: A quantitative methodology for assessing content of theory‐based measures, with illustrative applications. British Journal of Health Psychology, 19(2), 240-257. https://doi.org/10.1111/bjhp.12095
Koller, I., Levenson, M. R., & Glück, J. (2017). What do you think you are measuring? A mixed-methods procedure for assessing the content validity of test items and theory-based scaling. Frontiers in Psychology, 8, 126. https://doi.org/10.3389/fpsyg.2017.00126
Kuzmits, F. E., & Adams, A. J. (2008). The NFL combine: Does it predict performance in the National Football League? Journal of Strength and Conditioning Research, 22(6), 1721–1727. https://doi.org/10.1519/JSC.0b013e318185f09d
Lawshe, C. H. (1975). A quantitative approach to content validity. Personnel Psychology, 28, 563-575.
Mason, J., Classen, S., Wersal, J., & Sisiopiku, V. P. (2020). Establishing Face and Content Validity of a Survey to Assess Users’ Perceptions of Automated Vehicles. Transportation Research Record: Journal of the Transportation Research Board, 2674(9), https://doi.org/10.1177/0361198120930225
Mussio, S. J. & Smith, M. K. (1973). Content validity: A procedural manual. International Personnel Association, Chicago.
Polit, D. F., Beck, C. T., & Owen, S. V. (2007). Is the CVI an acceptable indicator of content validity? Appraisal and recommendations. Research in Nursing & Health, 30(4), 459-467. https://doi.org/10.1002/nur.20199
Yao, G., Wu, C. H., & Yang, C. T. (2008). Examining the content validity of the WHOQOL-BREF from respondents’ perspective by quantitative methods. Social Indicators Research, 85(3), 483-498.