Concurrent validity is a type of validity evidence used in social sciences research. It offers a way of establishing a test’s validity by comparing it to another, similar test that is already known to be valid. If the two tests correlate, the new test is also believed to be valid.
The term “concurrent” means ‘simultaneous’. Both the new test and the validated test are done concurrently, or ‘at the same time’.
The degree of concurrent validity is determined by the correlation between scores on the new test and the established test.
The stronger the correlation, the better the concurrent validity. Correlation coefficients can range from -1 to +1; for concurrent validity, the closer the value is to +1, the better.
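To make the idea concrete, here is a minimal sketch in Python of how that correlation is typically computed. The score lists and variable names are invented purely for illustration (they do not come from any study discussed below), and the calculation uses SciPy’s pearsonr function.

```python
# Minimal sketch: quantify concurrent validity as Pearson's r between
# scores on a new test and scores on an established (criterion) test.
# All scores below are hypothetical, invented only for illustration.
from scipy.stats import pearsonr

new_test_scores = [12, 18, 25, 31, 22, 15, 28, 35, 19, 24]      # new measure
established_scores = [14, 20, 27, 30, 21, 17, 29, 36, 18, 26]   # validated criterion

r, p_value = pearsonr(new_test_scores, established_scores)
print(f"Pearson's r = {r:.2f} (p = {p_value:.3f})")
# The closer r is to +1, the stronger the evidence of concurrent validity.
```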
Why Conduct a Concurrent Validity Test?
There are two reasons to conduct a concurrent validity test. The first is to ensure that your measure is measuring the construct that you think it’s measuring.
The second is to determine whether the new test can replace (supersede) the established test.
If the new test has excellent concurrent validity with an already accepted criterion test, then it may be used as a substitute.
For example, if the concurrent test is shorter, simpler, or less expensive than the criterion test, then it may be beneficial to use it instead.
Concurrent validity is a sub-type of criterion validity.
Examples of Concurrent Validity
1. Student Ratings of Self-esteem and Teacher’s Judgements
Summary: The old student self-esteem test required teacher input. The new one only requires student input, which will save teachers’ time. To see if this new, simpler test is valid, the researchers conduct both tests at once and see if the new test’s results correlate with the old test’s results. If so, the old test can be scrapped and the new one becomes the standard. It has concurrent validity.
A researcher wants to establish the concurrent validity of a self-esteem scale for 8th graders. In previous research, studies have typically asked teachers to rate the level of self-esteem of their students. This is viewed as an acceptable practice and considered a valid assessment.
However, it takes a long time for teachers to provide these ratings and many teachers are reluctant due to their incredibly busy schedules. Therefore, it may be of value to develop a test for students to take themselves.
So, the researcher spends considerable time writing questions that are age appropriate and comprehensive. Now he is ready to administer his self-esteem scale to students and respectfully ask their teachers to provide ratings as well.
When the ratings are collected, he inputs the data into SPSS and calculates the correlation between the two measures. The results show a correlation of .89. This means that his new scale has strong concurrent validity.
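For readers who do not use SPSS, the same calculation can be reproduced in a few lines of Python with pandas. The ratings below are invented simply to mimic the scenario; they are not the researcher’s data and will not reproduce the .89 reported above.

```python
# Hypothetical illustration of the self-esteem analysis using pandas.
# The ratings are invented for demonstration only.
import pandas as pd

data = pd.DataFrame({
    "student_self_report": [32, 41, 27, 38, 45, 30, 36, 40],  # new self-report scale
    "teacher_rating":      [30, 43, 25, 36, 47, 31, 35, 42],  # established teacher ratings
})

r = data["student_self_report"].corr(data["teacher_rating"])  # Pearson's r by default
print(f"Concurrent validity correlation: r = {r:.2f}")
```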
2. Job Simulation and Nursing Competence
Summary: The old way of assessing nursing competence was to ask the nurse’s supervisor. A new method is established in which experienced outside professionals observe nurses performing simulated scenarios instead of relying on supervisor input. The two assessments run concurrently, but the results show that the supervisors and the outside experts come to different conclusions about the nurses’ competence. The test is found to lack concurrent validity.
Hinton et al. (2017) conducted an interesting concurrent validity study involving nurses.
First, each nurse participated in various simulated medical scenarios with manikins in a highly realistic laboratory. Their performance was observed and rated by more experienced professionals.
To assess concurrent validity, scores on the simulation were then correlated with supervisors’ ratings of the nurses’ performance on the job.
Unfortunately, scores on the simulated scenarios “…were not well correlated with self-assessment and supervisor assessment surveys” (p. 455).
This indicates that the simulated test does not have concurrent validity with supervisors’ ratings of the nurses’ job performance. Sometimes the results of a study are disappointing and fail to support the researchers’ goals.
3. Biology Test and Grades
Summary: A new biology test is created. Once the test is administered, the researchers compare the test results to the students’ current GPAs in biology classes. If the biology GPAs correlate with this test’s results, concurrent validity is established.
A researcher has developed a comprehensive test of knowledge in biology. The goal is to have an efficient way to assess students’ knowledge, which can then be used for identifying areas in the curriculum that need improvement.
So, the test is administered to all recent graduates of the biology program at the university. At the same time, the GPAs of those graduates are also obtained. By calculating the correlation between the test and GPA, concurrent validity can be assessed.
The closer the correlation is to 1, the stronger the concurrent validity. However, if the correlation is close to 0, then the test has no concurrent validity.
4. Observed Leadership and On-the-job Ratings
Summary: To establish the validity of a leadership aptitude test, a company compares test results to supervisors’ assessments of the research participants.
Because it can take several years for a company to identify which of their employees have leadership potential, the HR department is interested in finding a quicker, more efficient method.
They develop a set of experiential activities that simulate various job scenarios that involve leadership skills. A randomly selected group of employees participate in the scenarios and their leadership traits are then rated by trained observers.
The HR department then compares ratings of their performance in the experiential activities with their leadership potential as rated by their supervisor. The results reveal a correlation of .45, which indicates a moderately strong association between the two measures. Therefore, the HR department concludes that the simulated scenarios have acceptable concurrent validity with supervisors’ ratings.
5. Programming Talent
Summary: A computer programming challenge is created for job applicants. To see if it’s valid, the company gets current staff to do the challenge and compares results to supervisors’ assessments of each staff member’s performance.
A cybersecurity firm has just been awarded a very large contract that will last for years. So, the company will need to hire approximately 150 programmers.
Since interviewing hundreds of applicants is inefficient and potentially very inaccurate, they develop a series of programming challenges that mimic the kinds of tasks required in the contract.
To determine if the programming challenges will help identify good programmers, they conduct a concurrent validity assessment.
First, all of the company’s existing programmers attempt the challenges. Their performance is scored objectively and then compared with their yearly performance evaluations from supervisors.
If scores on the challenges are highly correlated with supervisors’ evaluations, then the company can save a lot of time and money by asking job applicants to take the programming challenges online. Those who pass will be contacted for an interview.
6. Neuroimaging and Anxiety
Summary: A self-administered anxiety scale is created. To see if it’s valid, the researchers scan the amygdala of each research participant. The people with high anxiety scores on the scale should also have the most active amygdala, indicating that the test does, in fact, assess a person’s level of anxiety.
The brain is a pretty amazing structure. It’s not very large, but it sure does a lot. Research using neuroimaging has revealed that one area of the brain, the amygdala, is linked to anxiety.
For people that suffer from anxiety, sometimes the amygdala is over-reactive. It has an exaggerated response to situations that can create feelings of anxiety.
Rather than relying on time-consuming and expensive neuroimaging testing, it would be better to develop a short and simple paper and pencil measure of anxiety.
To accomplish this goal, researchers could use an existing anxiety scale, administer it to a sample of participants, and also perform a neuroimaging analysis of their amygdala.
If scores on the scale are correlated with activity levels of the amygdala, then the scale has good concurrent validity. This means that in some situations, the scale can be used instead of expensive neuroimaging analysis.
7. Mobility Tests for Aging Adults
Summary: A new mobility test for aging adults is constructed. To ensure it’s valid, the results on this test are compared to another more established mobility test that we know is valid.
Unfortunately, one of the drawbacks of getting older is losing mobility. It happens to all of us. However, healthcare workers need an accurate and objective method of assessing mobility. If you just ask a person to rate themselves, they may give themselves a higher rating than they deserve.
According to Weber et al. (2018), many tests that are currently in use have the fundamental flaw of being too easy. That means a lot of people score very high.
“This makes the currently available balance and mobility tests less suitable when the aim is to determine intervention eligibility aimed at preventing decline in balance and mobility at an early stage” (Weber et al., 2018, p. 2).
So, Weber and colleagues decided to develop a more challenging mobility test. They conducted a study that involved older adults taking several different mobility tests. The scores on all of those tests were then correlated with each other.
The results indicated that the Weber test was correlated with the other tests, but was also more challenging. Therefore, the Weber test has concurrent validity and assesses a wider range of mobility than the other tests.
8. Driving Course Performance for Bus Drivers
Summary: When a VR driving test is developed, concurrent validity needs to be established by comparing scores on the VR test with an accepted real-world measure of driving performance. If drivers’ scores on the two measures correlate, then we have established concurrent validity.
A large city has decided that it needs to improve its hiring process to better identify drivers that will be safe and cautious. So, they hire an IT company to design a VR simulation of a challenging driving course.
The simulation looks very realistic and contains many potentially dangerous scenarios that occur throughout the city, including icy roads, pedestrians, and careless automobile drivers.
Performance on the simulation is scored automatically by the program, so the assessment is objective and standardized.
The city then requires all of its current bus drivers to take the simulation test. Scores on the test are then compared with the actual safety records of the drivers found in their personnel files.
In this example, data for the simulation tests are collected at the same time as data from the personnel files to assess concurrent validity. If the driving scores and actual safety records are highly correlated, then the VR test has concurrent validity.
9. The Ainsworth Strange Situation Test
Summary: A new attachment styles test is created that is easier than the Ainsworth Strange Situation test. To ensure it’s valid, people who did the Strange Situation test also do the new paper test. If the results correlate, then from now on we can use the easier test and not bother with the longer one.
The Strange Situation test is a series of 8 episodes involving a parent (usually the mother) and child. By observing the child’s behavior in each episode, trained observers identify the child’s attachment style.
The test is somewhat artificial, requires extensively trained observers, and requires the caregiver and child to travel to a testing laboratory, which can be time-consuming and inconvenient.
Wouldn’t it be better if there were an easier way to assess the child’s attachment? Fortunately, Deneault et al. (2020) have been developing the Preschool Attachment Rating Scales (PARS). The PARS is a paper and pencil measure of a child’s attachment style which is much easier to use and score.
To determine the suitability of the scale’s use as a substitute assessment tool, it would need to be administered to caregivers at nearly the same time as they participated in the Strange Situation test. If the PARS has concurrent validity, then scores on both assessments should be highly correlated.
10. Fitness Tracker Shoe Implants
Summary: To see if a new shoe-implanted step counter works, a shoe company has research participants wear the step-counting shoes while an observer counts their steps at the same time. Afterwards, they look at the two sets of counts to see if they correlate.
A shoe company wants to build a fitness tracker into its shoes. Instead of people having to strap a phone to their arm when they go for their daily jog, all they have to do is press a button on their shoe.
The shoe company hires three tech companies to develop the implants. Each company’s tracker is then implanted into different shoes. Test subjects are then recruited to run on a treadmill for 5 minutes while wearing the shoes with the implants. At the same time, an observer uses a digital counter to tally the number of strides.
Data from the trackers is then compared to the digital counter tallies; a correlation is calculated to determine concurrent validity.
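As a rough illustration, that comparison might look like the sketch below, which assumes each tracker’s step counts and the observer’s tallies are stored as NumPy arrays. All numbers and tracker names are invented for demonstration.

```python
# Hypothetical sketch: correlate each shoe tracker's step counts with the
# observer's manual tallies to assess concurrent validity. Data are invented.
import numpy as np

observer_tally = np.array([620, 580, 701, 655, 690, 610, 640, 675])

trackers = {
    "tracker_A": np.array([615, 585, 695, 660, 688, 605, 642, 670]),
    "tracker_B": np.array([600, 560, 710, 640, 700, 590, 630, 660]),
    "tracker_C": np.array([650, 540, 680, 700, 650, 640, 600, 700]),
}

for name, counts in trackers.items():
    r = np.corrcoef(counts, observer_tally)[0, 1]  # Pearson's r
    print(f"{name}: r = {r:.2f}")
# The tracker whose counts correlate most strongly with the observer's tally
# shows the best concurrent validity.
```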
Conclusion
Concurrent validity involves administering two tests at the same time: the new test and an established criterion test. The higher the correlation between the two, the better. A strong correlation means the new test has concurrent validity and may be used as a substitute for the criterion.
This is desired if the usual method of assessing the criterion is flawed, time-consuming, or expensive.
For example, developing a paper and pencil measure of anxiety may be a better option than performing an expensive neuroimaging analysis. Using a VR-based driving course may be more realistic and accurate than a driving course on a parking lot.
There are many ways of assessing the validity of a test; concurrent validity is just one. Ideally, over a period of time, a test will undergo many types of validity testing.
References
Ainsworth, M. D. S., & Bell, S. M. (1970). Attachment, exploration, and separation: Illustrated by the behavior of one-year-olds in a strange situation. Child Development, 41, 49–67.
Cohen, R. J., & Swerdlik, M. E. (2005). Psychological testing and assessment: An introduction to tests and measurement (6th ed.). New York: McGraw-Hill.
Deneault, A. A., Bureau, J. F., Yurkowski, K., & Moss, E. (2020). Validation of the Preschool Attachment Rating Scales with child-mother and child-father dyads. Attachment & Human Development, 22(5), 491–513. https://doi.org/10.1080/14616734.2019.1589546
Drevets, W. (2001). Neuroimaging and neuropathological studies of depression: Implications for the cognitive-emotional features of mood disorders. Current Opinion in Neurobiology, 11(2), 240–249. https://doi.org/10.1016/S0959-4388(00)00203-8
Hinton, J., Mays, M., Hagler, D., Randolph, P., Brooks, R., DeFalco, N., Kastenbaum, B., & Miller, K. (2017). Testing nursing competence: Validity and reliability of the nursing performance profile. Journal of Nursing Measurement, 25(3), 431. https://doi.org/10.1891/1061-3749.25.3.431
Weber, M., Van Ancum, J., Bergquist, R., Taraldsen, K., Gordt, K., Mikolaizak, A. S., Nerz, C., Pijnappels, M., Jonkman, N. H., Maier, A. B., Helbostad, J. L., Vereijken, B., Becker, C., & Schwenk, M. (2018). Concurrent validity and reliability of the Community Balance and Mobility scale in young-older adults. BMC Geriatrics, 18(1), 156. https://doi.org/10.1186/s12877-018-0845-9