Rodgers, W. J., Morris-Matthews, H., Romig, J. E., & Bettini, E. (2021). Observation systems in special education: A synthesis of validity for observation systems. Review of Educational Research. Advance online publication. http://dx.doi.org/10.3102/00346543211042419
Summary by Jeremy Miciak, Ph.D.
Context for the Study
Learning and learning difficulties result from a complex interaction between individual factors (e.g., language, cognition, and background knowledge) and environmental factors (e.g., exposure to language and literacy, instruction at school). Historically, special education research has disproportionately focused on the individual traits that predict learning and/or learning difficulties. As a result, there is a robust literature on the reliability and validity of instruments that measure individual traits associated with educational outcomes. However, relatively few studies have employed classroom observation systems to identify promising instructional practices and the environmental factors that shape students’ classroom experiences. Fewer still have systematically evaluated the reliability and validity of classroom observation instruments. This is a significant gap in special education observation research, and it is the gap addressed by the current study.
Purpose of the Study
All scientific research relies on the collection of reliable and valid data. This is true whether a study employs a test of reading comprehension to evaluate an intervention or an observation instrument to characterize classroom instruction. In this study, Rodgers and colleagues (2021) conducted a systematic review of the validity evidence for observation instruments used in special education classroom observation research. Their goal was to determine the extent to which authors conducting observational research report evidence for the validity of the instruments they use to collect observational data, as well as to summarize how those instruments are used in observational research.
What is Test Validity?
Test validity is a poorly understood construct in educational research and practice. Many consumers of research frame test validity as a binary decision: Is this test valid for the purpose for which I am using it? This framing is misleading because it suggests that there is a definitive criterion by which a test can be deemed valid or invalid. No such criterion exists.
Rodgers and colleagues adopt an argument-based approach to validity, in which validity is defined as the extent to which evidence supports the inference one wishes to draw from test-generated data. This formulation is preferable to binary framings because it allows researchers and practitioners to consider the specific inference to be made, as well as the evidence that has accrued for that inference.
In this study, Rodgers and colleagues evaluated three key inferences that researchers may wish to make based on classroom observation data:
- Scoring inference refers to the inference that the score obtained with the instrument reflects the observation of the target behavior without bias or the introduction of extraneous factors that might affect scores (see the illustrative sketch below).
- Generalization inference refers to the inference that the observed score is associated with, or can be generalized to, performance across a larger universe of observations.
- Extrapolation inference refers to the inference that scores obtained with the observation instrument relate to a broader construct of empirical or theoretical interest.
Studies that report evidence for each of these inferences allow for stronger inferential claims—consumers of this research can have greater confidence in the study’s findings.
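The review itself does not prescribe particular statistics, but evidence for the scoring inference is commonly reported as inter-observer agreement: two observers code the same observation independently and their codes are compared. The minimal sketch below is hypothetical (the observers, codes, and interval data are invented for illustration, not drawn from the review); it simply computes percent agreement and Cohen's kappa in Python.

```python
# Hypothetical sketch: inter-observer agreement as one common form of evidence
# for the scoring inference. The observers and interval codes are invented for
# illustration; they are not drawn from Rodgers et al. (2021).
from collections import Counter

observer_a = ["on", "on", "off", "on", "off", "on", "on", "off", "on", "on"]
observer_b = ["on", "off", "off", "on", "off", "on", "on", "off", "on", "off"]
n = len(observer_a)

# Percent agreement: proportion of intervals the two observers coded identically.
percent_agreement = sum(a == b for a, b in zip(observer_a, observer_b)) / n

# Cohen's kappa: agreement corrected for chance, using each observer's
# marginal code frequencies.
counts_a, counts_b = Counter(observer_a), Counter(observer_b)
p_chance = sum(
    (counts_a[code] / n) * (counts_b[code] / n)
    for code in set(observer_a) | set(observer_b)
)
kappa = (percent_agreement - p_chance) / (1 - p_chance)

print(f"Percent agreement: {percent_agreement:.2f}")  # 0.80
print(f"Cohen's kappa:     {kappa:.2f}")              # 0.60
```

A single-occasion agreement statistic like this one speaks mainly to scoring; supporting the generalization inference typically requires additional evidence, such as reliability estimated across multiple observation occasions or observers.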
Methods Employed in This Study
A systematic review includes several features that distinguish it from a narrative review or an essay:
- A focus on a pre-identified question
- A comprehensive literature search
- Explicit criteria by which articles are included and/or excluded
- A structured process by which study data and findings are extracted
- Transparent reporting so that the study can be replicated
Rodgers and colleagues followed this process closely. Their review addressed two research questions:
Research Question 1: What evidence have researchers provided to support scoring, generalization, and extrapolation inferences in classroom observation studies?
Research Question 2: How has the evidence provided by researchers for these inferences changed from 1975 to 2020?
Literature Search and Inclusion/Exclusion Criteria: The authors included peer-reviewed studies published between 1975 and 2020. To be included, a study had to observe instruction delivered by teachers of students with disabilities, use an observation instrument with coding schemes identified prior to observation, and focus on classroom activities.
Obtaining Study Data: Study data were systematically coded by the research team, who worked together to define the key constructs to capture and to ensure coding accuracy across the team. Of particular interest, the authors captured information about the evidence provided for the different types of inferences one might make from observational data.
Key Findings
Across the three inferences of interest, a clear pattern emerged: few studies reported robust validity evidence for the observation instrument employed, and no study reported robust, comprehensive evidence for all of the inferences of interest. There were also particularly glaring gaps in reporting: (1) very few studies reported on potential bias; (2) most studies did not provide evidence to support generalization from the sample of observations in the study to a broader universe of observations; and (3) most studies did not provide strong evidence supporting extrapolation to the constructs of interest.
Implications of Findings
The results of this literature review of 102 studies highlight several shortcomings in special education research. First, the large number of available instruments (more than 35) likely impedes the accumulation of the kind of converging validity evidence observed for measures of academic or cognitive skills. Researchers who use bespoke observation instruments cannot draw on a large body of prior research on an instrument's validity to report in their own studies.
It is therefore incumbent upon researchers to generate and report such evidence in their own studies. In this review, the authors noted that most study authors did not take (or did not report) the steps necessary to examine the appropriateness of the observation instruments they employed, even when an instrument was used for the first time in that very study.
This state of practice represents a significant challenge for observational research. Few would accept measures of student learning that had not been thoroughly evaluated for evidence of validity; consumers of observational research should demand a similar standard of evidence.
Importance of Study
This study highlights a critical challenge in special education research. Observational research is meant to identify promising instructional practices that may be evaluated in subsequent intervention studies. For this purpose, observational studies may be particularly powerful because they most often occur in authentic education contexts. However, to meet this purpose, special education researchers must pay careful attention to the instruments they use to generate data. The findings of this review suggest there is work to be done in better characterizing instructional and environmental factors that influence student learning.