May 2017: Interpreting Reading Comprehension Test Results

Hua, A. N., & Keenan, J. M. (2017). Interpreting Reading Comprehension Test Results: Quantile Regression Shows that Explanatory Factors Can Vary with Performance Level. Scientific Studies of Reading, 21(3), 225-238. DOI: 10.1080/10888438.2017.1280675

Summary by Dr. Philip Capin


A growing number of studies show that reading comprehension tests vary in the underlying skills they assess. Previous studies show that decoding and linguistic comprehension, the components of reading identified in the simple view of reading (Gough & Tunmer, 1986), account for substantial variance in reading comprehension (e.g., Catts, Adlof, & Weismer, 2006). However, the relative contribution of these components varies significantly based on the reading comprehension test used (Collins, 2015; Cutting & Scarborough, 2006; Francis, Fletcher, Catts, & Tomblin, 2005; Keenan, Betjemann, & Olson, 2008). In other words, different reading comprehension tests measure different skills. Previous findings also show that reading comprehension tests vary in whom they identify as the highest and lowest performers (Keenan & Meenan, 2014). For instance, Keenan and Meenan (2014) demonstrated that a student with poor word reading skills but strong listening comprehension skills will appear to be a poor comprehender on tests that use a cloze response format because word recognition skills more strongly predict performance on this item type (Francis, Fletcher, Catts, & Tomblin, 2005). Conversely, the same student will appear to be a good comprehender when administered a multiple-choice test, which provides students with greater context to facilitate word recognition. Because students’ academic services and identification decisions are affected by their results on reading comprehension tests, it is important that policymakers and practitioners understand how the specific reading comprehension tests used may influence the conclusions drawn.

Examining How the Component Skills of Reading Vary Based on Students’ Performance Level

Previous studies examining differences in the underlying skills that reading comprehension tests assess estimated a single, uniform relation between the component skills of reading comprehension and reading comprehension itself using a statistical technique called ordinary least squares (OLS) regression. Results from these analyses do not allow researchers to examine how differences between tests may vary as a function of students’ reading comprehension performance level. In the study under review, Hua and Keenan (2017) used an approach called quantile regression, which allows researchers to evaluate whether the relations of word recognition and listening comprehension to reading comprehension differ for students at different skill levels.
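The contrast between the two techniques can be illustrated with a small sketch. The example below is not the authors' analysis: it uses made-up scores, a single predictor (a hypothetical word recognition score) rather than the two predictors in the study, and a brute-force search in place of the specialized software normally used for quantile regression. It relies on the standard result that, with one predictor, a best-fitting quantile line can always be found among the lines passing through two data points. All function and variable names are illustrative assumptions.

```python
from itertools import combinations


def pinball_loss(residual, tau):
    """Check (pinball) loss: under-predictions cost tau, over-predictions cost 1 - tau."""
    return tau * residual if residual >= 0 else (tau - 1) * residual


def quantile_line(xs, ys, tau):
    """Brute-force quantile regression with one predictor.

    Tries every line through two data points and keeps the one with the
    smallest total pinball loss. Only practical for small samples.
    """
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # two points with the same x do not define a usable line
        slope = (ys[j] - ys[i]) / (xs[j] - xs[i])
        intercept = ys[i] - slope * xs[i]
        loss = sum(pinball_loss(y - (intercept + slope * x), tau)
                   for x, y in zip(xs, ys))
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    return best[1], best[2]


def ols_line(xs, ys):
    """Ordinary least squares with one predictor (closed-form solution)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Synthetic, made-up scores (NOT data from the study): at every word
# recognition score x there are three students whose comprehension score
# rises with x at a low, middle, or high rate.
rates = (0.1, 0.5, 0.9)
xs = [float(x) for x in range(1, 11) for _ in rates]
ys = [1.0 + rate * x for x in range(1, 11) for rate in rates]

# OLS summarizes the relation with one slope for the whole sample.
ols_intercept, ols_slope = ols_line(xs, ys)
print(f"OLS slope: {ols_slope:.2f}")

# Quantile regression estimates a separate slope at each performance level.
quantile_slopes = {}
for tau in (0.1, 0.5, 0.9):
    _, quantile_slopes[tau] = quantile_line(xs, ys, tau)
    print(f"Slope at quantile {tau}: {quantile_slopes[tau]:.2f}")
```

In this toy sample, OLS reports a single slope of about 0.5, while the slopes at the 10th, 50th, and 90th percentiles come out at about 0.1, 0.5, and 0.9. The single average slope hides the fact that the predictor matters much more for high scorers than for low scorers, which is the kind of pattern quantile regression is designed to reveal.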

Study Purpose and Methodology

Hua and Keenan sought to extend the existing literature on reading comprehension assessments by investigating whether the relative importance of listening comprehension and word reading skills varied based on students’ reading comprehension performance level. The authors used quantile regression analyses to examine performance-level differences across five reading comprehension tests: the Woodcock-Johnson Passage Comprehension test (WJPC-3; Woodcock, McGrew, & Mather, 2001), the Peabody Individual Achievement Test Reading Comprehension subtest (PIAT; Dunn & Markwardt, 1970), the Gray Oral Reading Test (GORT-3; Wiederholt & Bryant, 1992), and the Retellings and Comprehension Questions from the Qualitative Reading Inventory (QRI-3; Leslie & Caldwell, 2001), which were treated as two separate tests. The reasons for using multiple reading comprehension tests were twofold: (1) multiple reading comprehension tests allowed for a more rigorous test of the presence of performance-level differences, and (2) Hua and Keenan sought to understand whether the features of these tests (e.g., response format and length of passage) were related to performance-level differences in the contributions of the component skills. Hua and Keenan also examined whether the patterns present in the OLS and quantile regression results for the full sample were the same for younger and older students. To address their research questions, Hua and Keenan analyzed a sample of 834 students, all native English speakers, who ranged in age from 8 to 18 (M = 11.51, SD = 2.54).

Study Findings

  1. The OLS regression results replicated the between-test differences previously reported for these same reading comprehension tests (Keenan et al., 2008). This finding provides further evidence that these reading comprehension tests are not equivalent in the component skills they measure when examining average effects for students across performance levels.
  2. Quantile regression results showed there were performance-level differences in the relative contributions of word recognition and listening comprehension to reading comprehension for three of the five reading comprehension tests (GORT, QRI–Questions, QRI–Retells). For example, results from the GORT showed word recognition was less predictive of reading comprehension for students at the 10th percentile (.07) than for students at the 50th and 90th percentiles (.30 and .29, respectively). These results suggest previous OLS regression results about the relative contributions of component skills to reading comprehension may not hold constant for readers of varying skill levels.
  3. Results for younger and older students revealed the same pattern of findings that was present for the full sample (i.e., between-test and performance-level differences). Additionally, results with younger and older students aligned with previous findings showing that as children become older, listening comprehension plays a larger role than word recognition in explaining reading comprehension.

Interpretation of Study Findings

Hua and Keenan provided an explanation for why significant differences between performance levels were present on some reading comprehension tests but not others. First, the authors addressed differences in the role of word recognition. In light of past research showing the role of word recognition diminishes over time as students become more proficient readers, one may expect the role of word recognition in reading comprehension to be smaller for students who demonstrate higher levels of reading performance. However, the authors found there were no performance-level differences on the PIAT and WJPC. The authors reasoned that the PIAT and the WJPC did not show differences because these tests include passages that become increasingly difficult to read (e.g., more multisyllabic words, complex vocabulary) as students progress through the test. Thus, the contribution of word recognition remains approximately constant across performance levels. Conversely, the texts presented in the QRI subtests do not increase in difficulty. The authors hypothesized that word recognition made less of a contribution to reading comprehension among higher performing students on these subtests because these students were better able to read the passages. Although the GORT involves an increasing number of unfamiliar words in higher-level passages, Hua and Keenan argued that performance-level differences between lower and higher performing students may be attributed to a problem in the GORT that disproportionately affects performance on lower-level passages. Specifically, previous research shows test takers can correctly answer some of the multiple-choice questions by guessing, without reading the passages, and that these problematic items are more often found in the lower-level passages (Keenan & Betjemann, 2006).

Hua and Keenan provided two explanations for why performance-level differences were present in the contribution of listening comprehension to reading comprehension on only the GORT and QRI subtests. For one, the authors posited that the role of listening comprehension might have also been affected by the increase in the number of multisyllabic words and complex vocabulary found in higher-level passages. Because vocabulary also influences listening comprehension, Hua and Keenan reasoned that the tests that involve increasingly complex vocabulary in later passages (the WJPC and PIAT) maintain the contribution of listening comprehension to reading comprehension across performance levels. However, the authors note that the results showing differences in the contributions of listening comprehension on the GORT (a test that includes increasingly complex vocabulary in more advanced passages) did not align with this reasoning, which leads to the authors’ second explanation. Quantile regression results from the GORT and the QRI-Retells showed listening comprehension accounted for more variance in reading comprehension for higher performing students. Hua and Keenan suggested the increasing length of the passages in these tests may require more skilled readers to use higher-level comprehension skills (e.g., make inferences and integrate information across several sentences). Thus, listening comprehension would contribute more strongly to reading comprehension for higher performing students on the GORT and QRI-Retells, whereas the amount of variance would remain unchanged across performance levels on the tests that include only single-sentence passages (WJPC and PIAT).


These findings have important implications for practitioners. The findings provide further evidence that educational assessments are imperfect and that tests that attempt to assess the same construct often vary. In this case, the findings show there is variation in the underlying skills that reading comprehension tests measure. This may be unsurprising given that reading comprehension is a particularly complex construct. Policymakers, practitioners, and researchers would be prudent to consider test features (e.g., how vocabulary and text length vary across passages) when selecting reading comprehension assessments and interpreting results. Moreover, it may benefit educators to incorporate multiple measures of reading comprehension, especially when using these tests to make high-stakes decisions. Furthermore, the finding that there were differences in the relative contributions of word recognition and listening comprehension as a function of students’ performance level on some reading comprehension measures suggests that interpretations of reading comprehension results may need to take into account students’ performance level. The information presented in this paper about the five commonly used reading comprehension measures can inform researchers and practitioners who use these measures.