February 2018: Interpreting Reading Comprehension Test Results

Collins, A. A., Lindström, E. R., & Compton, D. L. (2018). Comparing students with and without reading difficulties on reading comprehension assessments: A meta-analysis. Journal of Learning Disabilities, 51, 108–123.

Summary by Dr. Nancy Scammacca


Assessing reading comprehension is a complex undertaking, in part due to the multiple cognitive processes required to make meaning from text and in part due to the difficulty of measuring the mix of observable and unobservable processes involved. Many different assessments have been developed, but recent research has shown that the correlations among them are not as large as would be expected for measures of the same construct. As a result, researchers have explored factors that might explain why these measures are not correlated more strongly. Two such factors are the types of items used to assess reading comprehension across different assessments and key differences in the characteristics of the students being assessed. Different item formats may tap slightly different sets of cognitive abilities. Differences in reader characteristics may contribute to variation in the gap between the average scores of typical readers and those with reading difficulties on different assessments.

Collins, Lindström, and Compton (2018) synthesized research that compared the performance of students with reading difficulties and typical readers on reading comprehension assessments. Their goal was to determine how item response format and reader characteristics affect the difference in scores between students who do and do not struggle with reading.

Reading Comprehension Item Formats

Among the differences between reading comprehension assessments that are likely responsible for lower-than-expected correlations between measures, the format of items used in each test is thought to be important. Collins et al. (2018) examined the following six most common item formats:

  • Multiple choice, in which a question is followed by a set of possible answers and students must select the best answer choice. This format is often used in large-scale assessments such as annual state reading tests due to ease of scoring. It is thought that the need to evaluate each answer choice in this type of item places high cognitive demands on readers.
  • Cloze, in which students are presented with a sentence with a word missing and must select the word that correctly completes the sentence. As with multiple-choice items, cloze items are easy to score and therefore are used often in assessments involving large groups of students. These items have been found to tap into decoding and word-reading skills in addition to comprehension skills.
  • Sentence verification, in which students must determine whether a sentence accurately reflects information presented in a passage. These true/false items are found more frequently in comprehension measures designed by researchers than in those designed by classroom teachers.
  • Open ended, in which students provide oral or written responses to questions about a passage. These items tap into language skills such as vocabulary knowledge to a greater degree than some other item types. Open-ended items typically are found on standardized measures in addition to classroom tests.
  • Retell, in which students provide an oral or written account of every detail they recall from a passage. Similar to open-ended items, retell tasks reflect vocabulary and other language skills in addition to reading comprehension.
  • Picture selection, in which students must determine which picture from a set of pictures best represents the meaning of a passage. Small differences between pictures require students to recall details about the passage. This format is used in some standardized assessments.

Achievement Gaps Between Typically Developing Students and Students With Reading Difficulties

Collins et al. (2018) noted that previous research has found that the average difference in reading comprehension scores between struggling readers and typical readers varies across tests. They suggested that differences in the skills required to respond correctly to different types of items are at least partially responsible for variations in this achievement gap. Students with reading difficulties have been shown to perform more like typically developing peers on items requiring sentence-level comprehension, such as cloze tasks, than on those that require more complex cognitive processing, such as open-ended items.

In addition, other features of an assessment may contribute to variation in the size of the achievement gap. For example, some assessments have time limits (requiring faster cognitive processing). Some assessments require students to read a passage silently before responding to comprehension questions, and other assessments require oral reading. These differences may alter the comprehension task enough to change the magnitude of the gap in scores between students with and without reading difficulties. It is important to determine whether test characteristics contribute meaningfully to the difference in performance between typical and struggling readers because of the role that reading comprehension scores often play in determining whether a student has a reading disability and in evaluating the efficacy of interventions for students with reading difficulties.

Study Purpose and Methodology

The primary purpose of Collins et al.’s (2018) study was to determine whether the reading comprehension achievement gap between students with and without reading problems varies depending on the format of the items on the assessment. They also explored differences in this gap based on other test features (e.g., text genre, timed vs. untimed, standardized vs. unstandardized) and student characteristics (e.g., grade level, how students were identified as having reading difficulties).

To address their research questions, Collins et al. conducted a meta-analysis, which is a method for synthesizing the results of many studies within a topic area. Often referred to as a “study of studies,” a meta-analysis takes a systematic, comprehensive approach to combining data from a set of prior studies to determine what the research literature at large says about a particular issue. 

In conducting their meta-analysis, Collins et al. (2018) included 82 studies conducted between 1975 and 2014. All studies met the following criteria: 

  • Students were in kindergarten to grade 12.
  • Data were reported on a measure of reading comprehension separately for at least 10 students with reading difficulties and 10 average readers similar in age.
  • No more than 20% of students were English language learners.
  • The students with reading difficulties included students with dyslexia, poor word- or text-level reading skills, or low achievement in reading, and students at risk for developing a reading disability or a learning disability in reading that was not accompanied by additional disabilities such as attention deficit hyperactivity disorder or a math learning disability.
  • The comparison group did not consist of gifted, high-achieving, or significantly above- or below-average readers.
  • Data for both groups were reported at the same time-point (for intervention studies, the time-point had to be prior to start of the intervention).
  • Reading comprehension was measured in English, and at least 85% of the items used a single one of the six item formats listed above.

Across the 82 studies that met these criteria, results from more than 5,000 students with reading difficulties were compared to nearly 6,500 typically developing students.

Collins et al. (2018) quantified the size of the achievement gap between typical and struggling readers in each study using the Hedges’ g effect size statistic. Hedges’ g represents the number of standard deviation units that separate the two groups. It provides a common metric that can be averaged across the set of studies in the meta-analysis to determine whether factors such as item format, test features, and student characteristics are associated with larger or smaller gaps between students with reading difficulties and typically developing students. This average gives more weight to studies that involved larger numbers of students than those with smaller numbers of students because larger studies typically produce more precise results. In reporting the average effect size in a meta-analysis, the 95% confidence interval is also reported to give the range of values that likely includes the true effect size.
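To make the effect size calculation concrete, the following is a minimal sketch of how Hedges' g can be computed for a single study and how several effect sizes can be averaged so that larger studies count for more. It uses the standard textbook formulas (a pooled standard deviation, a small-sample correction, and inverse-variance weights under a fixed-effect model). The summary above does not specify the exact weighting scheme or statistical model that Collins et al. used, so the weighting shown here is an assumption, and the two studies and their scores are hypothetical values chosen only for illustration.

```python
import math


def hedges_g(mean_rd, sd_rd, n_rd, mean_typical, sd_typical, n_typical):
    """Return (g, variance) for the difference: reading difficulties minus typical.

    A negative g means the group with reading difficulties scored lower.
    """
    df = n_rd + n_typical - 2
    # Pooled standard deviation across the two groups
    pooled_sd = math.sqrt(
        ((n_rd - 1) * sd_rd ** 2 + (n_typical - 1) * sd_typical ** 2) / df
    )
    d = (mean_rd - mean_typical) / pooled_sd
    # Small-sample correction that turns Cohen's d into Hedges' g
    correction = 1 - 3 / (4 * df - 1)
    g = correction * d
    # Common large-sample approximation to the sampling variance
    var_d = (n_rd + n_typical) / (n_rd * n_typical) + d ** 2 / (2 * (n_rd + n_typical))
    return g, correction ** 2 * var_d


def weighted_mean_effect(effects):
    """Inverse-variance weighted mean of (g, variance) pairs with a 95% interval.

    Assumption: a simple fixed-effect average, not necessarily the model
    used in the original meta-analysis.
    """
    weights = [1 / variance for _, variance in effects]
    total_weight = sum(weights)
    mean_g = sum(w * g for w, (g, _) in zip(weights, effects)) / total_weight
    se = math.sqrt(1 / total_weight)
    return mean_g, (mean_g - 1.96 * se, mean_g + 1.96 * se)


# Hypothetical studies: (mean, SD, n) for each group on a standard-score scale
small_study = hedges_g(85, 15, 20, 100, 15, 25)
large_study = hedges_g(90, 15, 200, 100, 15, 220)

mean_g, (ci_lower, ci_upper) = weighted_mean_effect([small_study, large_study])
print(f"small study g = {small_study[0]:.2f}")
print(f"large study g = {large_study[0]:.2f}")
print(f"weighted mean g = {mean_g:.2f}, 95% CI ({ci_lower:.2f}, {ci_upper:.2f})")
```

Because the larger hypothetical study has a much smaller variance, the weighted mean falls closer to its effect size than to that of the smaller study, which is the behavior the paragraph above describes.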

Key Findings by Item Response Format

The results of the meta-analysis indicated that students with reading difficulties scored lower on average than typically developing students on all reading comprehension item types (meaning that the effect sizes were negative). The difference was statistically significant for all item formats except for sentence verification. The average number of standard deviations separating the two groups of students differed by item response format. The authors found the following average effect sizes by item format (the 95% confidence intervals are listed in parentheses):

  • Multiple choice: –1.55 (–1.81, –1.30)
  • Cloze: –1.26 (–1.56, –0.95)
  • Sentence verification: –1.09 (–2.37, 0.19)
  • Open ended: –1.50 (–1.87, –1.14)
  • Retell: –0.60 (–0.74, –0.46)
  • Picture selection: –1.80 (–2.61, –1.00)
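As a reading aid for intervals like these, the sketch below shows two quick checks: whether an interval excludes zero (the usual indication that an effect is statistically significant at the .05 level) and how wide the interval is, expressed as an approximate standard error. It uses three of the rows above as inputs. The conversion assumes a symmetric interval built from a normal critical value of 1.96. That is an assumption made for illustration, and the original analysis may have constructed its intervals differently.

```python
Z_95 = 1.96  # normal critical value for a two-sided 95% interval (assumed)


def standard_error_from_ci(lower, upper):
    """Approximate standard error implied by a symmetric 95% interval."""
    return (upper - lower) / (2 * Z_95)


def excludes_zero(lower, upper):
    """True when the whole interval lies on one side of zero."""
    return lower > 0 or upper < 0


# (average effect size, lower bound, upper bound) for three of the formats
reported = {
    "Multiple choice": (-1.55, -1.81, -1.30),
    "Retell": (-0.60, -0.74, -0.46),
    "Picture selection": (-1.80, -2.61, -1.00),
}

for item_format, (g, lower, upper) in reported.items():
    se = standard_error_from_ci(lower, upper)
    print(
        f"{item_format}: g = {g:.2f}, approx. SE = {se:.2f}, "
        f"excludes zero: {excludes_zero(lower, upper)}"
    )
```

A wider interval, such as the one for picture selection, implies a less precise estimate than a narrow one, such as the one for retell.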

For two item types, other characteristics of the assessment were associated with differences in the size of the gap between struggling and typical readers. Multiple-choice assessments with time limits showed significantly smaller differences between groups, and multiple-choice tests in which the passage was removed before the students answered the questions showed significantly larger differences. For open-ended items, reading comprehension tests showed significantly larger gaps between groups if the tests used only expository texts, were administered in a group setting, and were administered by researchers. However, tests with open-ended items were associated with significantly smaller gaps when the tests included items that gradually increased in difficulty and used basal and ceiling rules for administration and scoring.

No student characteristics were associated with significant differences in the size of the gap between struggling and typical readers. The characteristics examined included grade level, how students had been identified as having reading difficulties, and whether they had been identified as having learning disabilities or reading disabilities.

Implications and Recommendations for Research and Practice

Results of the meta-analysis shed additional light on the nature of the reading comprehension deficits seen in students with reading difficulties. In particular, these students performed closer to average on retell, sentence-verification, and cloze assessments and further below average on open-ended, multiple-choice, and picture-selection assessments. According to Collins et al. (2018), this finding indicates that reading comprehension item types that tap into skills such as decoding and sentence-level comprehension seem to be less difficult for struggling readers than those requiring higher-level cognitive processing. Further, the authors indicated that their results show that students with reading difficulties have specific deficits in constructing complete and accurate mental models of the meaning of a text, which is reflected in poorer performance on certain item types.

Protocols for identifying students who have reading difficulties must account for the finding that the size of the achievement gap between typical and struggling readers depends on the format of the items used to assess reading comprehension. Given the variation found across different assessment formats, using multiple measures of reading comprehension merits consideration. Assessing a student for a reading disability with items that measure sentence-level comprehension (i.e., cloze and sentence verification) and items that measure the ability to form more complex mental representations of text, such as open-ended items, would inform educators about the nature and extent of the student’s reading difficulty. Additionally, doing so would help educators make the best determination about the student’s need for special education services.

The findings of this meta-analysis support those of other researchers who reported lower-than-expected correlations between different measures of reading comprehension. As a result, it appears that different item formats measure either somewhat different reading constructs or different aspects of the construct of reading comprehension. Therefore, those who develop reading comprehension assessments (whether for classroom, state accountability, or research purposes) should consider the impact of item format and make careful choices about the type or types of items to include. Further research is needed to gain additional insight into the constructs measured by existing reading comprehension assessments. Such research should involve both students with reading difficulties and typical readers to shed additional light on the differences between these groups in their performance on reading comprehension assessments.