Construct validity is a type of validity that looks at whether a test is measuring what it is supposed to measure.
The social sciences involve the study of many phenomena that cannot be directly observed, like emotional intelligence and self-esteem. You can’t put these abstract constructs on a scale and measure their weight!
Researchers must get very creative in developing ways to validly measure these abstract constructs.
To assess construct validity, social scientists commonly use the three types of tests outlined below.
3 Types of Construct Validity Tests
1. Convergent Validity Testing
Self-esteem rating scales examine a construct that’s quite abstract: self-esteem. To determine the construct validity of a self-esteem rating scale, you could compare it to other established self-esteem rating scales to see if they correlate. We call this a convergent validity test.
Construct: Self-Esteem
Construct Validity Measure: Convergent validity test
Generally speaking, self-esteem is defined as whether a person likes themselves and feels that they are a person of worth. There are many self-esteem rating scales available (Rosenberg et al., 1995; Coopersmith, 2002).
Each self-esteem rating scale defines self-esteem slightly differently. However, most of the scales are conceptually very similar.
Establishing the construct validity of any one of these scales involves administering the scale of interest (self-esteem scale #1) together with one of the others (self-esteem scale #2) to the same group of people. The researcher then calculates the correlation between the scores on the two tests.
If the correlation is high (close to 1), this is evidence that self-esteem scale #1 has construct validity. This type of construct validity is called convergent validity: it assesses the degree of similarity between two scales that measure the same construct.
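The correlation step can be sketched in Python using only the standard library. The scores are hypothetical, invented for illustration, and `pearson_r` is a helper defined here rather than a library function.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scale scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length score lists with at least 2 entries")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of deviations (covariance without dividing by n).
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)


# Hypothetical total scores for 8 respondents who completed both scales.
scale_1 = [12, 15, 18, 20, 22, 25, 27, 30]  # self-esteem scale #1 (scale of interest)
scale_2 = [14, 16, 17, 21, 23, 24, 28, 29]  # self-esteem scale #2 (established scale)

r = pearson_r(scale_1, scale_2)
print(f"convergent validity r = {r:.2f}")
```

With these made-up scores the script prints a correlation of about 0.98. A real study would use a much larger sample and, typically, a statistics package.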
2. Divergent Validity Testing
You could also determine the construct validity of a self-esteem rating scale by comparing it to a rating scale for a different construct. Here, we want to see a low correlation, because the two scales are measuring different things. We call this a divergent validity test.
Construct: Self-Esteem
Construct Validity Measure: Divergent validity test
To establish the construct validity of a self-esteem scale, a researcher could also set out to show that it does not measure the same thing as a scale that measures a different construct. This type of construct validity is referred to as divergent validity (also called discriminant validity).
For this example, we could compare a self-esteem rating scale to an introversion/extraversion rating scale. If the self-esteem scale is valid, scores on the two scales should not be strongly correlated.
Conducting this assessment simply requires administering both scales to the same sample and then calculating the correlation between the scores on each scale.
Ideally, there would be a very low correlation (close to 0) between the two because they are measuring two theoretically distinct constructs.
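The divergent check uses the same correlation as the convergent one; only the expected result differs. The Python sketch below simulates hypothetical data in which two self-esteem scales share one latent trait while an extraversion scale reflects a separate trait. The independence of the two traits, the error SD of 0.5, and the sample size are all assumptions made for illustration; under them, the population correlations are 0.8 for the two self-esteem scales and 0 for self-esteem versus extraversion.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)


random.seed(1)
n = 2000  # hypothetical sample size

# Hypothetical latent traits, assumed independent of each other for this illustration.
self_esteem = [random.gauss(0, 1) for _ in range(n)]
extraversion = [random.gauss(0, 1) for _ in range(n)]

# Each observed scale score = latent trait + measurement error (error SD 0.5 is assumed).
esteem_scale_1 = [t + random.gauss(0, 0.5) for t in self_esteem]
esteem_scale_2 = [t + random.gauss(0, 0.5) for t in self_esteem]
extraversion_scale = [t + random.gauss(0, 0.5) for t in extraversion]

r_conv = pearson_r(esteem_scale_1, esteem_scale_2)     # same construct: expect high
r_div = pearson_r(esteem_scale_1, extraversion_scale)  # different construct: expect near 0
print(f"convergent r = {r_conv:.2f}, divergent r = {r_div:.2f}")
```

Putting both correlations side by side is the point of the design: a scale with good construct validity correlates strongly with measures of the same construct and weakly with measures of unrelated constructs.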
3. Exploratory Factor Analysis (EFA)
Exploratory factor analysis (EFA) is a statistical procedure for examining the individual questions (items) on a measurement scale. It checks whether the questions intended to measure the same component correlate strongly with one another; if they do, that is evidence of construct validity.
Construct: Extraversion
Construct Validity Measure: Exploratory Factor Analysis
For example, if a researcher wants to develop a measure of extraversion, they would start by identifying the theoretical components of that construct. They may be:
- Friendliness
- Agreeableness
- Sociability
- Cheerfulness
Since the construct has four components, the scale should contain multiple questions that assess each component, perhaps 8 questions per component for a total of 32 questions.
Respondents’ answers to the 8 ‘friendliness’ questions should be strongly correlated: a person who scores high on one friendliness question should tend to score high on the others. In that case, the questions have a high degree of relatedness with each other and likely have high construct validity.
If the scale has good construct validity, then this pattern should hold for each component.
However, some questions may not be as related as they should be, which means they are not measuring the same thing that the others are measuring. This weakens construct validity.
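The EFA logic above can be sketched from scratch in Python with NumPy. This is an illustration under stated assumptions, not a substitute for a statistics package such as SPSS. The answers are simulated (32 items, 8 per component, each loading 0.8 on its component), and one friendliness item is deliberately replaced with noise to play the role of a weak question. Factors are extracted with principal axis factoring and rotated with varimax, which is one common combination among several. Items whose largest absolute loading is below 0.4 are flagged; that cutoff is a rule-of-thumb assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
n_respondents = 1000  # hypothetical sample size
components = ["friendliness", "agreeableness", "sociability", "cheerfulness"]
items_per_component = 8  # 4 x 8 = 32 items, as in the example above

# Simulated answers: each item = 0.8 * its component's latent score + noise (assumed values).
latent = rng.standard_normal((n_respondents, len(components)))
columns = []
for k in range(len(components)):
    for _ in range(items_per_component):
        columns.append(0.8 * latent[:, k] + 0.6 * rng.standard_normal(n_respondents))
X = np.column_stack(columns)
# Replace the last friendliness item (column 7) with noise: a poorly written question.
X[:, 7] = rng.standard_normal(n_respondents)


def principal_axis(R, n_factors, n_iter=100):
    """Principal axis factoring: eigen-decompose R with communalities on its diagonal."""
    h2 = 1 - 1 / np.diag(np.linalg.inv(R))  # start from squared multiple correlations
    for _ in range(n_iter):
        R_reduced = R.copy()
        np.fill_diagonal(R_reduced, h2)
        vals, vecs = np.linalg.eigh(R_reduced)
        top = np.argsort(vals)[::-1][:n_factors]  # largest eigenvalues first
        loadings = vecs[:, top] * np.sqrt(np.clip(vals[top], 0, None))
        h2 = np.sum(loadings ** 2, axis=1)  # updated communalities
    return loadings


def varimax(L, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation, pushing each item toward a single factor."""
    p, k = L.shape
    rotation = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        d_old = d
        Lr = L @ rotation
        target = Lr ** 3 - Lr @ np.diag(np.sum(Lr ** 2, axis=0)) / p
        u, s, vt = np.linalg.svd(L.T @ target)
        rotation = u @ vt
        d = np.sum(s)
        if d_old != 0 and d / d_old < 1 + tol:
            break
    return L @ rotation


R = np.corrcoef(X, rowvar=False)  # 32 x 32 item correlation matrix
loadings = varimax(principal_axis(R, n_factors=len(components)))

primary = np.argmax(np.abs(loadings), axis=1)  # factor each item loads on most strongly
strength = np.max(np.abs(loadings), axis=1)    # size of that loading
weak_items = [i for i in range(X.shape[1]) if strength[i] < 0.4]  # 0.4 cutoff is a rule of thumb
print("items with weak loadings:", weak_items)
```

If the scale behaves as intended, the items of each component share one dominant factor, and only the noise item (column 7) is flagged as not measuring what the other friendliness items measure.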
Examples of Studies that Establish Construct Validity
1. Purchase Intention and Purchase Behavior
We can assess construct validity by following up later to see whether the answers to a questionnaire correlate with actual behavior. For example, after completing a questionnaire indicating you’re interested in movies, did you end up purchasing DVDs or going to the cinema?
Construct: Purchase Intention
Construct Validity Measure: Subsequent Consumer Behavior
A marketing department wants to be able to measure whether consumers will purchase their company’s products or not. So, they develop a questionnaire that asks questions such as: “Do you plan to buy this product?” The staff are able to generate a total of 10 similar questions.
To determine whether the questionnaire has construct validity, and really does assess whether consumers intend to buy a product, they conduct a simple test. First, they create an online survey that pops up after internet users view a short video ad for a given product. They then use cookies to connect each user to credit card purchases over the next two weeks.
By correlating the survey responses with the sales data, the marketing department can assess the construct validity of the survey: if survey scores correlate highly with actual purchasing behavior (a closely related outcome), the survey is likely measuring purchase intention.
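Because the purchase outcome is yes/no, correlating it with survey totals gives a point-biserial correlation, which is simply a Pearson correlation in which one variable is coded 0/1. The Python sketch below computes it both ways on ten hypothetical viewers to show that the two formulas agree. The data, and the assumption that each of the 10 questions is rated 1-5, are invented for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)


def point_biserial(scores, outcome):
    """Point-biserial r: (M1 - M0) / SD * sqrt(p * q), using the population SD of scores."""
    bought = [s for s, o in zip(scores, outcome) if o == 1]
    not_bought = [s for s, o in zip(scores, outcome) if o == 0]
    n = len(scores)
    mean = sum(scores) / n
    sd = sqrt(sum((s - mean) ** 2 for s in scores) / n)
    p = len(bought) / n  # proportion who purchased
    q = 1 - p
    mean_diff = sum(bought) / len(bought) - sum(not_bought) / len(not_bought)
    return mean_diff / sd * sqrt(p * q)


# Hypothetical data for 10 viewers: summed answers to the 10 intention questions
# (each assumed to be rated 1-5, so totals range from 10 to 50) ...
intention = [48, 45, 41, 39, 35, 30, 27, 22, 18, 12]
# ... and whether the linked purchase data show a purchase within two weeks (1 = yes).
purchased = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

r_pearson = pearson_r(intention, purchased)
r_pb = point_biserial(intention, purchased)
print(f"Pearson r = {r_pearson:.2f}, point-biserial r = {r_pb:.2f}")
```

Both functions print 0.76 for this made-up data, which illustrates the kind of strong positive relationship between stated intention and purchasing that would support the survey’s construct validity.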
2. New Math Aptitude Test
When implementing a new school exam, researchers can compare students’ results on the new exam with their results on the old exam. If students get similar results on each exam, then the new exam likely has strong construct validity.
Construct: Aptitude at Math
Construct Validity Measure: Convergent Validity Test
The SAT contains multiple sections, including one on mathematics. It is a well-established assessment tool that has been used for decades. However, suppose that educators and college admissions boards have come to consider the current version of the math section outdated.
So, the owners of the test (the College Board) decide to develop a new version of the mathematics section. After an experienced team of teachers and test developers design a new test, they assess its construct validity.
They administer the new version, the older version, and the same section from the ACT to a large sample of randomly selected university math majors.
Construct validity is determined by correlating the scores among the three tests. If the new version has construct validity, then it should show very high correlations with the other two math tests. In this example, convergent validity is used to establish construct validity.
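The three-way comparison amounts to a correlation matrix, sketched below in Python with NumPy. The scores are simulated, not real SAT or ACT data. The sketch assumes that each test measures the same latent math aptitude plus its own independent error (error SD 0.4), so the population correlation between any two tests is about 0.86. The variable names are hypothetical labels.

```python
import numpy as np

rng = np.random.default_rng(42)
n_students = 500  # hypothetical sample size
aptitude = rng.standard_normal(n_students)  # hypothetical latent math aptitude

# Each test score = aptitude + test-specific error (error SD of 0.4 is an assumption).
scores = {
    "new_sat_math": aptitude + 0.4 * rng.standard_normal(n_students),
    "old_sat_math": aptitude + 0.4 * rng.standard_normal(n_students),
    "act_math": aptitude + 0.4 * rng.standard_normal(n_students),
}
names = list(scores)
r = np.corrcoef(np.vstack([scores[k] for k in names]))  # 3 x 3 correlation matrix

for i, name in enumerate(names):
    print(name, np.round(r[i], 2))

# Convergent evidence for the new version: its correlations with the two established tests.
new_idx = names.index("new_sat_math")
others = [j for j in range(len(names)) if j != new_idx]
print("new vs. established tests:", np.round(r[new_idx, others], 2))
```

The row for the new version is the part that matters: high correlations with both established tests are the convergent evidence described above.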
3. Self-Esteem vs Self-Love
To demonstrate that two similar concepts are, in fact, different from one another, you can administer a scale for each concept and examine the correlation between the results. Ideally, there is a low correlation, which demonstrates divergence between the concepts.
Constructs: Self-Esteem and Self-Love
Construct Validity Measure: Divergent Validity Test
A therapist and practitioner of positive psychology has devised a new theory of self-love which they believe is at the heart of depression and anxiety. Their formulation for personal and spiritual growth involves deep self-reflection and exercises in self-affirmation.
They devise a personality questionnaire to assess a person’s degree of self-love. However, the construct of self-love has been criticized as being too similar to self-esteem. So, they decide to conduct a divergent validity study to demonstrate that the two constructs are distinct.
Both scales are distributed to a very large, randomly selected sample of university students and working adults. The results indicate that the correlation between the two scales is around .29, which means that the two constructs are related, but not identical.
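The text does not give the exact sample size, so one way to make “related, but not identical” concrete is a confidence interval for r. The Python sketch below uses Fisher’s z transformation and assumes, hypothetically, n = 1000 respondents.

```python
from math import atanh, sqrt, tanh


def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for a Pearson r via Fisher's z transformation."""
    z = atanh(r)            # transform r to an approximately normal scale
    se = 1 / sqrt(n - 3)    # standard error of z
    return tanh(z - z_crit * se), tanh(z + z_crit * se)  # back-transform the bounds


low, high = fisher_ci(0.29, n=1000)  # n = 1000 is a hypothetical sample size
print(f"95% CI for r: [{low:.2f}, {high:.2f}]")
```

With n = 1000 this prints an interval of roughly [0.23, 0.35]: clearly above 0, so the scales are related, and far below 1, so they are not interchangeable measures of a single construct.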
4. Dimensions of Math Skills
The construct validity of a math test that covers arithmetic, algebra, and calculus can be assessed through an exploratory factor analysis. The test designers would single out all questions related to calculus and check that students’ answers to these questions are strongly related, indicating that each question does appear to be testing the same concept.
Constructs: Arithmetic, Algebra, Geometry, Calculus, Statistics
Construct Validity Measure: Exploratory Factor Analysis
Math consists of basic arithmetic, algebra, geometry, calculus, and statistics and probability, just to name a few areas.
In order to develop a comprehensive math test, all types of math should be assessed. For the sake of explanation, let’s just start with the five domains mentioned above. To demonstrate that our math test has construct validity, we generate 25 questions for each domain.
After administering the test to a large sample of randomly selected university seniors majoring in math, we conduct an exploratory factor analysis (EFA). If our test has construct validity, then all the questions that are supposed to assess algebra skills will show a high degree of relatedness with each other.
If one or two questions on the algebra portion of the test are not related to the other questions, then something is wrong with those items; they are likely measuring something else. The more strongly the items in each domain are related to each other, the stronger the construct validity.
5. Driving Course Performance for Bus Drivers
When a VR driving test is developed, its construct validity can be established by comparing results on the VR test with results on the real-life driving test. If people who take both tests score similarly on each (that is, their scores are highly correlated), that is evidence of construct validity.
Construct: Driving skill
Construct Validity Measure: Convergent Validity Test
A large city has decided to update its hiring procedures by developing a VR version of its road test. Previously, prospective hires were given a robust driving challenge in a large parking lot. Performance was observed by trained professionals and each applicant received a score on a scale of 0-100.
The new version of the driving course uses a VR headset and a special seat hooked up to sensors and hydraulics to make it move and shake like a real bus.
To establish the construct validity of the new VR test, the city uses a convergent validity methodology: all current bus drivers are asked to take both driving tests, and their scores on the two tests are then correlated using SPSS.
The two tests have a correlation of .79 with each other, which is quite high and suggests that both tests are measuring the same construct. This supports the construct validity of the VR test.
6. Personal Integrity and Egocentrism
To check that a test measures personal integrity, we can compare it to a test that measures egocentrism. The personal integrity test should have a very low correlation with the egocentrism test to show construct validity. This is another example of the divergent validity testing method.
Constructs: Personal integrity and Egocentrism
Construct Validity Measure: Divergent Validity Test
Personal integrity can be defined as consisting of honesty, fairness, and keeping promises. Egocentrism can be defined as consisting of self-centeredness and a concern for taking advantage of situations regardless of the consequences for others.
So, it would appear that these two constructs consist of very different characteristics. This is a good situation for assessing the construct validity of the scales using the divergent validity methodology.
After a team of researchers has generated a set of questions that assess both constructs, the two scales are given to a large sample of randomly selected corporate executives and charity workers.
A correlation is calculated between the two scales, which reveals a correlation of .05. This indicates that the two scales are measuring distinct, essentially unrelated constructs.
7. Emotional Intelligence Scale
To test the construct validity of an emotional intelligence (EQ) test, an exploratory factor analysis can be conducted to see if the EQ questions about social skills correlate.
Construct: Emotional Intelligence
Construct Validity Measure: Exploratory Factor Analysis
In the model used for this example, emotional intelligence (EQ) consists of five dimensions: self-awareness, social skills, decision-making, self-regulation, and empathy. Although these concepts are fairly abstract, it is possible to develop an app that can assess these traits.
So, a small research team brainstorms for a couple of hours and generates a list of 10 questions for each dimension. They then place the survey online and send it to all of their friends and relatives. Each one of those individuals is asked to forward the survey’s link to at least three of their friends or relatives.
After receiving responses from over 200 individuals, the researchers enter the data into SPSS and perform an exploratory factor analysis (EFA).
The results reveal that items related to the social skills dimension are all related to each other, but less related to the other dimensions. This pattern holds true for all the dimensions. This is considered solid evidence for construct validity.
Conclusion
Construct validity is primarily concerned with whether a measurement tool is measuring what it is intended to measure. There are three main ways to assess construct validity: convergent validity, divergent validity, and exploratory factor analysis (EFA).
Assessing the validity of a measurement device is of paramount concern for psychologists and other researchers in the social sciences. Because the phenomena under study are abstract, measurement is difficult and prone to error.
However, by using a variety of methods, researchers can develop and refine their measurement techniques and eventually achieve a reasonable level of confidence in the validity of those tools.
References
Cohen, R. J., & Swerdlik, M. E. (2005). Psychological testing and assessment: An introduction to tests and measurement (6th ed.). New York: McGraw-Hill.
Coopersmith, S. (2002). Revised Coopersmith self-esteem inventory manual. Redwood City: Mind Garden.
Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52, 281–302.
Fabrigar, L. R., Wegener, D. T., MacCallum, R. C., & Strahan, E. J. (1999). Evaluating the use of exploratory factor analysis in psychological research. Psychological Methods, 4, 272–299.
Rosenberg, M., Schooler, C., Schoenbach, C., & Rosenberg, F. (1995). Global self-esteem and specific self-esteem: Different concepts, different outcomes. American Sociological Review, 60(1), 141–156.
Simms, L. (2007). Classical and modern methods of psychological scale construction. Social and Personality Psychology Compass, 2(1), 414–433. https://doi.org/10.1111/j.1751-9004.2007.00044.x