Internal consistency reliability is a way to determine if all the questions on a survey, test, or personality scale are measuring the same thing.
For example, if designing a test on geometry, then all questions on the test should be about geometry.
There are two main ways to measure internal consistency. The first is by using Cronbach’s alpha. The second is by using split-half reliability testing. Examples of each are shown below.
How to Measure Internal Consistency
1. Cronbach’s Alpha
The most popular method of assessing internal consistency reliability is called Cronbach’s alpha (Cronbach, 1951).
Yes, the procedure was created a long time ago. It is professional practice to cite the person who first came up with an idea rather than to worry about appearing “up-to-date,” so we give credit where credit is due.
If a scale has good internal consistency, then the value of Cronbach’s alpha (α) will be close to 1. The lower the value of α, the lower the internal consistency.
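As a rough illustration, alpha can be computed by hand from the standard formula: α = (k / (k − 1)) × (1 − sum of the item variances / variance of the total scores), where k is the number of items. The sketch below assumes a small, made-up data set (5 respondents, 4 items rated 1 to 5); the function name `cronbach_alpha` and the scores are hypothetical and only meant to show the arithmetic.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(responses[0])  # number of items on the scale
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    # Sample variance of each item (column) across respondents.
    item_variances = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: 5 respondents answering 4 items on a 1-5 scale.
scores = [
    [4, 5, 4, 5],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
]

alpha = cronbach_alpha(scores)
print(f"Cronbach's alpha = {alpha:.2f}")
```

Because respondents who score high on one of these made-up items also score high on the others, alpha comes out close to 1. Statistical packages report the same coefficient, so in practice this calculation is rarely done by hand.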
Cronbach’s Alpha Examples
- Assessing how related to each other all the questions are on a customer satisfaction survey that is specifically about quality of the product
- Calculating the Cronbach’s alpha score for a 20-item scale that measures empathy as a unidimensional construct
- Administering a questionnaire that assesses “overall marital satisfaction” that does not take into account the many different aspects of marriage and calculating the Cronbach’s alpha
- Assessing only one attachment style of young children in primary school based on the ratings of their teacher
- Defining intelligence as a single construct that involves the ability to solve practical problems and then assessing how well each question correlates with the others
- Calculating the Cronbach’s alpha score for each dimension of an emotional intelligence personality test
2. Split-Half Reliability
Another common method of determining internal consistency is to calculate the split-half reliability. To do this, the researcher splits the scale into two halves, commonly the odd-numbered and the even-numbered questions, and correlates respondents’ total scores on one half with their total scores on the other half.
If all of the questions on the scale are measuring the same thing, then the two halves of the scale should be highly correlated (r) with each other. Again, the closer the value of r is to 1, the better the internal consistency.
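The odd-even procedure can be sketched in a few lines. The example below assumes the same kind of made-up data (5 respondents, 4 items); the function names and scores are hypothetical. It also applies the Spearman-Brown correction, which is not discussed above but is commonly reported alongside the raw correlation because each half is only half as long as the full scale.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = mean(x), mean(y)
    covariation = sum((a - mx) * (b - my) for a, b in zip(x, y))
    spread_x = sum((a - mx) ** 2 for a in x) ** 0.5
    spread_y = sum((b - my) ** 2 for b in y) ** 0.5
    return covariation / (spread_x * spread_y)


def split_half_odd_even(responses):
    """Correlate each respondent's odd-item total with their even-item total."""
    # Questions are numbered from 1, so indices 0, 2, 4, ... are the odd-numbered items.
    odd_totals = [sum(row[0::2]) for row in responses]
    even_totals = [sum(row[1::2]) for row in responses]
    return pearson_r(odd_totals, even_totals)


def spearman_brown(r):
    """Estimate full-length reliability from a half-length correlation (an added step)."""
    return 2 * r / (1 + r)


# Hypothetical data: 5 respondents answering 4 items on a 1-5 scale.
scores = [
    [4, 5, 4, 5],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
]

r_half = split_half_odd_even(scores)
r_full = spearman_brown(r_half)
print(f"split-half r = {r_half:.2f}, Spearman-Brown corrected = {r_full:.2f}")
```

A different way of splitting the items (for example, a random split) will usually give a somewhat different r, which is one reason Cronbach’s alpha is often preferred.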
Split-Half Reliability Examples
- Determining the internal consistency of a construct that consists of only one dimension by correlating the even-numbered items with the odd-numbered items
- Determining the internal consistency of a 10-item measure of shyness as measured by teacher’s ratings and then randomly selecting half of the questions to correlate with the other half of the questions
- Designing a questionnaire about charismatic leadership that contains 10 questions about the ability to inspire and motivate others and then correlating the total scores of the odd and even-numbered items with each other
- Narrowly defining social support as only regarding perceived trustworthiness of friends and then correlating one half of the questions with the other half
Internal Consistency Reliability Examples in Real-Life Studies
1. The Social Desirability Scale
Social desirability refers to a person’s tendency to try to create a favorable impression. The most frequently used scale of social desirability was developed by Crowne and Marlowe (1960).
As Stöber (2001) pointed out, “After 40 years, however, it is questionable if all the items of the Marlowe-Crowne Scale are still up to date” (p. 3). For example, the Marlowe-Crowne Scale contains questions such as: “I am always courteous, even to people who are disagreeable.”
Therefore, Stöber created a modern version of this scale, called the Social Desirability Scale (SDS-17). Sample questions include: “I always eat a healthy diet,” and “I take out my bad mood on others now and then.”
Both scales were administered to a sample of university students. Convergent validity was then assessed by calculating a correlation between scores on the two measures. As stated by Stöber (2001):
“A correlation of .74 with the Marlowe-Crowne Scale demonstrated substantial convergent validity” (p. 3).
2. The Servant Leadership Style and Concern for Others
A servant leadership style can be defined as one in which a leader puts the needs of their staff above their own. This can mean sacrificing personal time to help employees or going the extra mile to help team members fulfill their career dreams.
As stated by Greenleaf (1977), the servant leadership style emphasizes “increased service to others; a holistic approach to work; promoting a sense of community; and the sharing of power in decision making” (p. 4).
This leadership style has several components that are similar to a scale called Concern for Others (Welburn, 2015). This scale includes items such as: “I care about what happens to the people around me,” and “I am a compassionate person.”
To assess the convergent validity of a newly developed measure of the servant leadership style, we would simply administer both scales to a large sample of participants. The closer the correlation between the two scales is to 1, the stronger the convergent validity.
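The calculation itself is a single Pearson correlation between each participant’s total score on the two scales. The sketch below uses invented totals for six participants; the variable names and numbers are hypothetical, and a real study would use a much larger sample.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = mean(x), mean(y)
    covariation = sum((a - mx) * (b - my) for a, b in zip(x, y))
    spread_x = sum((a - mx) ** 2 for a in x) ** 0.5
    spread_y = sum((b - my) ** 2 for b in y) ** 0.5
    return covariation / (spread_x * spread_y)


# Hypothetical total scores: one entry per participant, in the same order in both lists.
servant_leadership_totals = [32, 25, 40, 18, 29, 36]
concern_for_others_totals = [30, 22, 38, 20, 27, 33]

r = pearson_r(servant_leadership_totals, concern_for_others_totals)
print(f"convergent validity r = {r:.2f}")
```

With these made-up totals the two scales rank participants almost identically, so r is close to 1. Real data rarely line up this neatly.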
3. Emotional Intelligence (EQ) Measures
Emotional intelligence refers to a person’s ability to perceive, understand, and cope with the emotions of others and themselves. This personality construct has been widely researched in the area of leadership styles and effectiveness.
Brackett and Mayer (2003) conducted a convergent validity study on several measures of EQ: the Mayer-Salovey-Caruso Emotional Intelligence Test (MSCEIT) and two self-report measures of EI, the Emotional Quotient Inventory (EQ-i) and the self-report EI test (SREIT).
The scales were administered to approximately 200 U.S. undergraduate college students. Correlations were calculated among all the tests to assess the degree of convergent validity between each test and the other two.
The results showed that “…the MSCEIT was most distinct among EI measures (r = .21, .18, with the EQ-i and SREIT, respectively). The SREIT and EQ-i, however, were moderately interrelated (r=.43)” (p.1153).
Unfortunately, these somewhat low correlations mean that these scales have low convergent validity with one another.
4. Adult Attachment and Close Relationships
Ever since Bowlby’s and Ainsworth’s independent lines of research on infant attachment, researchers have been interested in how these early bonding experiences manifest in adulthood.
Adult attachment style has been studied primarily through the use of scores on personality scales and questions about behavior and perceptions of adult romantic relationships.
Two measures related to this area of study are the Social Support Scale (SSS) and the Experiences in Close Relationships Scale (ECR).
Questions on the SSS include (Zimet et al., 1988): “My friends really try to help me,” and “I can talk about my problems with my family.” Questions on the ECR include (Wei et al., 2007): “I don’t mind asking romantic partners for comfort, advice, or help,” and “I tell my partner just about everything.”
Frías et al. (2015) found that the two scales have convergent validity with one another, which means that there is a strong correlation between scores on one measure and scores on the other.
5. Confirmatory Factor Analysis (CFA)
Confirmatory factor analysis (CFA) is a way to assess both convergent and discriminant validity of various scales. After administering all of the scales of interest to a sample, the data is put into a statistical program such as SPSS. By clicking on various options, the program will then produce an output that shows how all of the questions on the scales relate to one another.
For example, if a researcher wants to assess the convergent validity of a math IQ test they have developed with the math subscales of other, more established IQ tests, then CFA is a suitable option.
First, a large group of students with varied levels of math skills takes both tests. Then, each student’s score on each item of both tests is entered into the program.
The CFA will display a number for each question, called a factor loading, that can be read much like a correlation. Scores on the two tests should be highly correlated with each other. This means that the questions on both tests are measuring very similar constructs, in this case, very similar math skills.
If there are some questions on the new test that do not correlate well with other items on the established test, or other items on the new test, then it means there may be something odd about that particular item. The fewer items that show a pattern like that, the better the internal consistency.
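A full CFA requires specialized software, but the idea of spotting an odd item can be illustrated with a simpler stand-in: the corrected item-total correlation, which correlates each item with the sum of all the other items. This is not a CFA. The data, the function names, and the 0.3 cutoff used to flag items below are all assumptions made for illustration.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = mean(x), mean(y)
    covariation = sum((a - mx) * (b - my) for a, b in zip(x, y))
    spread_x = sum((a - mx) ** 2 for a in x) ** 0.5
    spread_y = sum((b - my) ** 2 for b in y) ** 0.5
    return covariation / (spread_x * spread_y)


def corrected_item_total(responses):
    """Correlate each item with the total of the remaining items."""
    k = len(responses[0])
    correlations = []
    for i in range(k):
        item = [row[i] for row in responses]
        rest = [sum(row) - row[i] for row in responses]  # total without item i
        correlations.append(pearson_r(item, rest))
    return correlations


# Hypothetical data: items 1-3 move together, item 4 does not.
scores = [
    [5, 4, 5, 2],
    [2, 2, 1, 4],
    [4, 5, 4, 3],
    [1, 1, 2, 3],
    [3, 3, 3, 1],
]

item_total = corrected_item_total(scores)
# Assumed rule of thumb: flag items whose correlation falls below 0.3.
flagged = [i + 1 for i, r in enumerate(item_total) if r < 0.3]
print("item-total correlations:", [round(r, 2) for r in item_total])
print("items to review:", flagged)
```

In this made-up example only the fourth item is flagged, because its scores run against the pattern shared by the other three.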
Conclusion
Internal consistency refers to the degree to which all items on a scale are measuring the same construct. There are two primary ways of assessing internal consistency: Cronbach’s alpha and split-half reliability.
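Both methods produce a single coefficient. A split-half correlation can range from -1 to +1, while Cronbach’s alpha usually falls between 0 and 1, although it can turn negative when items are negatively related to one another. The closer the value is to +1, the greater the internal consistency.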
If there are some questions on a scale that are measuring something different from all of the other items, then internal consistency will be lowered. The more dissimilar questions, the lower the consistency.
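This effect is easy to see with a small, invented data set. The sketch below computes Cronbach’s alpha for three items that move together and then again after a fourth, unrelated item is added; the scores and names are hypothetical.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(responses[0])
    item_variances = [variance([row[i] for row in responses]) for i in range(k)]
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: the first three items agree with each other, the fourth does not.
scores = [
    [5, 4, 5, 2],
    [2, 2, 1, 4],
    [4, 5, 4, 3],
    [1, 1, 2, 3],
    [3, 3, 3, 1],
]

alpha_three_items = cronbach_alpha([row[:3] for row in scores])
alpha_four_items = cronbach_alpha(scores)
print(f"alpha with items 1-3: {alpha_three_items:.2f}")
print(f"alpha with items 1-4: {alpha_four_items:.2f}")
```

Adding the single dissimilar item pulls alpha down noticeably in this example, even though the other three items are unchanged.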
References
Brackett, M. A., & Mayer, J. D. (2003). Convergent, discriminant, and incremental validity of competing measures of emotional intelligence. Personality and Social Psychology Bulletin, 29(9), 1147-1158. https://doi.org/10.1177/0146167203254596
Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56(2), 81.
Cohen, R. J., & Swerdlik, M. E. (2005). Psychological testing and assessment: An introduction to tests and measurement (6th ed.). New York: McGraw-Hill.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16(3), 297-334.
Cronbach, L. J. (2004). My current thoughts on coefficient alpha and successor procedures. Educational and Psychological Measurement, 64(3), 391-418.
Crowne, D. P., & Marlowe, D. (1960). A new scale of social desirability independent of psychopathology. Journal of Consulting Psychology, 24, 349-354.
Frías, T. M., Shaver, P. R., & Mikulincer, M. (2015). Measures of adult attachment and related constructs. In G. J. Boyle, D. H. Saklofske, & G. Matthews (Eds.), Measures of personality and social psychological constructs (pp. 417-447). Academic Press. https://doi.org/10.1016/B978-0-12-386915-9.00015-2
Greenleaf, R. K. (1977). Servant Leadership: A Journey into the Nature of Legitimate Power and Greatness. New York, NY: Paulist Press.
Stöber, J. (2001). The Social Desirability Scale-17 (SDS-17): Convergent validity, discriminant validity, and relationship with age. European Journal of Psychological Assessment, 17(3), 222.
Wei, M., Russell, D., Mallinckrodt, B., & Vogel, D. (2007). The Experiences in Close Relationship Scale (ECR)-Short Form: Reliability, validity, and factor structure. Journal of Personality Assessment, 88, 187-204. https://doi.org/10.1080/00223890701268041
Welburn, K. (2015). Welburn Empathic Concern Scale. https://doi.org/10.13140/RG.2.1.2790.1521
Zimet, G. D., Dahlem, N. W., Zimet, S. G., & Farley, G. K. (1988). The Multidimensional Scale of Perceived Social Support. Journal of Personality Assessment, 52, 30-41.