
Learning Disabilities Research & Practice, 25(2), 60–75. © 2010 The Division for Learning Disabilities of the Council for Exceptional Children

Creating a Progress-Monitoring System in Reading for Middle-School Students: Tracking Progress Toward Meeting High-Stakes Standards

Christine Espin, Teri Wallace, Erica Lembke, Heather Campbell, and Jeffrey D. Long
University of Minnesota

In this study, we examined the reliability and validity of curriculum-based measures (CBM) in reading for indexing the performance of secondary-school students. Participants were 236 eighth-grade students (134 females and 102 males) in the classrooms of 17 English teachers. Students completed 1-, 2-, and 3-minute reading-aloud and 2-, 3-, and 4-minute maze-selection tasks. The relation between performance on the CBMs and the state reading test was examined. Results revealed that both reading aloud and maze selection were reliable and valid predictors of performance on the state standards tests, with validity coefficients above .70. An exploratory follow-up study was conducted in which the growth curves produced by the reading-aloud and maze-selection measures were compared for a subset of 31 students from the original study. For these 31 students, maze selection reflected change over time whereas reading aloud did not. This pattern of results was found for both lower- and higher-performing students. Results suggest that it is important to consider both performance and progress when examining the technical adequacy of CBMs. Implications for the use of the measures with secondary-level students for progress monitoring are discussed.

In recent years, much attention has been directed to early intervention and prevention in reading. An alternative to a singular focus on early intervention is an approach in which early intervention is combined with continuous, long-term, intensive interventions for struggling readers. "Long term" in this approach refers to reading instruction that extends into the high school years. The goal of such an approach would be to diminish the magnitude of reading difficulties experienced by struggling readers and increase the likelihood of postgraduation success. Supporting the notion that long-term, intensive reading interventions may be needed for a select group of students are two sources of data: (1) results of early intervention studies and (2) results of secondary-school studies for students with learning disabilities.

Need for Long-Term, Intensive Intervention Efforts

Recent research on the effects of early identification and intervention programs has produced promising outcomes and demonstrated reductions in the magnitude and prevalence of reading failure (O'Connor, Fulmer, Harty, & Bell, 2005; O'Connor, Harty, & Fulmer, 2005; Vaughn, Linan-Thompson, & Hickman, 2003). However, these studies also have uncovered a small group of children who "fail to thrive" (Vaughn et al., 2003), even when given intensive and potentially powerful interventions. Such children either do not reach a level of performance that warrants placement into a typical instructional setting or do not maintain satisfactory levels of performance without continued intensive interventions. These students have reading difficulties that seem to be especially resistant to change (see Torgesen, 2000) and are often considered to have learning disabilities (LD).

Requests for reprints should be sent to Christine Espin, Wassenaarseweg 52, PO Box 9555, 2300 RB Leiden, The Netherlands. Electronic inquiries should be sent to

Research at the secondary-school level reveals that students with LD continue to experience reading difficulties well into their high school years. Secondary-school students with LD experience difficulties with phonological, language comprehension, and reading fluency skills (Fuchs, Fuchs, Mathes, & Lipsey, 2000; Vellutino, Fletcher, Snowling, & Scanlon, 2004; Vellutino, Scanlon, & Tanzman, 1994; Vellutino, Tunmer, Jaccard, & Chen, 2007). They typically perform at levels 4–6 years behind non-LD peers in reading and score in the lowest decile on reading achievement tests (Deshler, Schumaker, Alley, Warner, & Clark, 1982; Levin, Zigmond, & Birch, 1985; Warner, Schumaker, Alley, & Deshler, 1980). For example, on the 2007 National Assessment of Educational Progress (Lee, Grigg, & Donahue, 2007), 66 percent of students with disabilities in public schools scored below a Basic Level, compared to only 24 percent of students without disabilities. (A Basic Level implies partial mastery of the knowledge and skills needed for proficient work at a given grade level.)


Taken together, research on younger and older children with reading difficulties produces a picture of students whose reading difficulties begin early and persist throughout their school career. For such students, a program of intervention that begins early—and then continues throughout their school careers—is needed.

Reading Interventions at the Secondary-School Level

Two questions arise when considering reading interventions for secondary-school students with LD. The first is: At what level do students need to read to be successful after high school graduation? In recent years, this question often has been addressed through the development of state standards tests in reading. Such tests define, by design or default, the level of reading considered to be necessary for students to be successful at the secondary-school level—this despite the fact that the extent to which many state tests reflect the type of reading necessary for success either in school or in postsecondary settings is unknown. However, given the high-stakes nature of state tests for schools in terms of meeting No Child Left Behind standards, and for students who are required to pass reading tests to graduate (as is the case in 23 states; Center on Education Policy, 2008), the tests are an important outcome for students and schools at the secondary-school level.

The second question is: How can we determine whether our reading interventions are effective? The reading progress of secondary-school students with LD might prove to be slow and incremental—but not necessarily unimportant. For example, improvement of even one grade level (to use a typical metric) in reading over the course of 4 years in high school might translate into large advantages in post–high school settings. Yet are there instruments that are sensitive to such slow and incremental growth? Are those instruments reliable and valid, and can they be tied to success on tasks of importance, such as performance on state reading tests or performance in postsecondary educational settings? One instrument that might potentially fulfill these requirements is curriculum-based measurement (CBM).


Curriculum-Based Measurement

CBM is a system of measurement designed to allow teachers to monitor student progress and evaluate the effectiveness of instructional programs (Deno, 1985). The success of CBM relies on two key characteristics: practicality and technical adequacy (Deno, 1985). With respect to practicality, if the measures are to be given on a frequent basis, they must be time efficient and easy to develop, administer, and score and must allow for the creation of multiple equivalent forms. With respect to technical adequacy, if the measures are to provide educationally useful information, they must be valid and reliable indicators of performance in an academic area. For a measure to be considered a valid indicator of performance, evidence must demonstrate that performance on the measure relates to performance in the academic domain more broadly.

In reading, the number of words read correctly in 1 minute is often used as a CBM indicator of general reading performance at the elementary-school level (Wayman, Wallace, Wiley, Ticha, & Espin, 2007). One-minute reading-aloud measures are time efficient and easy to develop, administer, and score, and they allow for the creation of multiple equivalent forms. Further, a large body of research supports the relation between the number of words read aloud in 1 minute and other measures of reading proficiency, including reading comprehension (see reviews by Marston, 1989; Wayman et al., 2007). Although most CBM reading research has focused on a reading-aloud measure, support also has been found for the technical adequacy of a maze-selection measure (see Wayman et al., 2007). In a maze-selection measure, every seventh word of a passage is deleted and replaced with a multiple-choice item consisting of the correct word plus two distracters. Students read through the text and choose the correct word for each multiple-choice item. Specific to the present study, both reading-aloud (Crawford, Tindal, & Stieber, 2001; Hintze & Silberglitt, 2005; McGlinchey & Hixson, 2004; Silberglitt & Hintze, 2005; Stage & Jacobsen, 2001) and maze-selection measures (Wiley & Deno, 2005) have been shown to predict performance on state standards tests.

Although research supports the technical adequacy of both reading aloud and maze selection, the majority of that research has been done at the elementary-school level (Wayman et al., 2007). Far less research has been conducted in reading at the secondary-school level, even though the results of cross-age studies suggest that the nature and type of CBM in reading might need to change as students become older and more proficient readers (Jenkins & Jewell, 1993; MacMillan, 2000; Yovanoff, Duesbery, Alonzo, & Tindal, 2005). Many of the studies that have been conducted in reading at the secondary-school level have focused on reading as it relates to learning in the content areas (e.g., Espin & Deno, 1993a, 1993b; Espin & Deno, 1994–1995; Fewster & MacMillan, 2002) rather than on the development of general reading proficiency. However, a small group of studies has focused on general reading proficiency.

Fuchs, Fuchs, and Maxwell (1988) examined the validity of reading aloud for students with mild disabilities across grades 4–8. Across-grade correlations between words read correctly (WRC) in 1 minute and scores on comprehension and word study subtests of a standardized achievement test were .91 and .80, respectively; however, because the study was not specifically focused on the secondary-school level, correlations were not reported separately for the secondary-school students in the study.

Three subsequent studies focused specifically on secondary-school students. Espin and Foegen (1996) examined the validity of three CBMs—reading aloud, maze selection, and vocabulary matching—on the comprehension, acquisition, and retention of expository text for students in grades 6–8. Comprehension, acquisition, and retention were measured with researcher-designed, multiple-choice questions given immediately after reading (comprehension), immediately after instruction on the text (acquisition), and a week or more following instruction (retention). Correlations ranged from .54 to .65 and were similar for comprehension, acquisition, and retention measures. Brown-Chidsey, Davis, and Maya (2003) examined the reliability and validity of a 10-minute maze task—a somewhat long task by CBM standards—as an indicator of reading for students in grades 5–8. They found that scores generally differentiated students by grade level and special education status. Rasinski et al. (2005), in discussing the importance of reading fluency for high school students, reported correlations between WRC in 1 minute and scores on a state standards test of .53 for ninth-grade students. Descriptive data and methods were not reported in the article.

In sum, little research has been conducted at the secondary-school level on the development of CBM reading measures as indicators of general reading proficiency, and that which has been done has been limited in terms of measures and methodology, or has not focused specifically on secondary-school students. What is more, the research to date has focused on the characteristics of the measures as performance or static measures, not as progress or growth measures. The validity and reliability of the measures may differ based on their intended use.

In this article, we examine the technical adequacy of CBM reading measures for secondary-school students. Specifically, the reliability and validity of CBMs as predictors of performance on a state standards test in reading are examined, as are differences related to time frame and scoring procedure. Reading-aloud and maze-selection measures were selected because of previous research demonstrating their practical and technical adequacy at the elementary-school level and their potential promise at the secondary-school level. Time frames are examined because longer samples of work might be needed at the middle-school level to obtain a distribution of student scores. For example, reading-aloud scores might bunch together at 1 minute but spread out at 3 minutes. Finally, scoring procedures are examined to determine the influence of errors on the reliability and validity of students' scores. For example, counting the number of correct selections on a maze task is less time consuming than counting the number of correct minus incorrect selections, but using a correct minus incorrect score may help to control for guessing.

Two research questions are addressed in the study:

(1) What are the reliability and validity of reading aloud and maze selection for predicting performance on a state standards test in reading?

(2) Do reliability and validity vary with time frame and scoring procedures?

Our primary focus was on the technical adequacy of CBMs as static measures or indicators of performance at a single point in time. However, we were also able to collect progress measures on a small subsample of the original sample. Thus, we conducted an exploratory study in which we compared the growth rates produced by reading-aloud and maze-selection measures for this subsample of students.



Method

Setting and Participants

The study took place in two middle schools in an urban district of a large, midwestern metropolitan area. The district enrolled over 47,000 students. Seventy-five percent of the students were from diverse cultural backgrounds, 24 percent received ESL services, 67 percent were eligible for free and reduced lunches, and 13 percent were in special education. The first school had 669 students in grades 6–8. Eighty-three percent of the students were from diverse cultural backgrounds, 35 percent received ESL services, 83 percent were eligible for free and reduced lunches, and 15 percent were in special education. The second school had 778 students in grades 6–8. Sixty-two percent of the students were from diverse cultural backgrounds, 18 percent received ESL services, 56 percent were eligible for free or reduced lunches, and 16 percent were in special education.

All eighth-grade students were invited to participate in the study to ensure a range of student performance levels. Participants were 236 eighth-grade students (134 females and 102 males) in the classrooms of 17 English teachers from the two schools. Fifty-eight percent of the participants were eligible for free or reduced lunches. Students were Caucasian (34 percent), Asian American (24 percent), African American (20 percent), Hispanic (19 percent), and Native American (3 percent). Nine percent of the students were receiving special education services for learning disabilities or mild disabilities (4 percent), speech and language (3 percent), emotional and behavior disorders (1 percent), or other health impairments (1 percent). Fifty-eight percent of the students spoke English at home. The rest spoke Spanish (18.5 percent), Hmong (16 percent), Laotian (4 percent), Vietnamese (1 percent), Cambodian (1 percent), Amharic (0.5 percent), Chinese (0.5 percent), and Somali (0.5 percent). The mean standard score on the state standards reading test for Sample 1 was 626.9. This compared to a state-wide mean score of 640.6 and a district-wide mean score of 607.3.

Note that the sample did not consist of struggling readers only, even though the primary purpose of the study was to identify performance and progress measures for struggling readers. To establish the reliability and validity of CBM, it was necessary to have a sample that represented a range of student ability levels, because validity and reliability coefficients could be negatively affected by a truncated distribution of scores. We had two options. One was to select students who were struggling readers across a range of grade levels, similar to the approach taken by Fuchs et al. (1988). A second was to work within one grade level, but to include students across a range of performance levels within that grade. Given that the purpose of the study was to tie the CBM to performance on a state standards test, and given that the state standards test was given in only one grade, we chose the latter approach. This approach is not unique. In a review of the CBM research in reading (Wayman et al., 2007), 28 of the 29 technical adequacy studies conducted at the elementary-school level used general education samples (13 studies) or mixed samples of general and special education (15 studies). Only 1 used an exclusively special education sample.


Measures

Predictor variables. Predictor variables were scores on two CBM tasks: reading aloud and maze selection. The reading-aloud and maze-selection tasks were drawn from human-interest stories published in the local daily newspaper and were selected on the basis of content, readability level, length, and scores on a pilot test conducted with four students who were not involved in the study. Passages whose content was determined to be too technical or culturally specific were not used. To ensure that students would not complete the CBM tasks before time expired, only passages that were longer than 800 words were selected. Readability was calculated using the Flesch-Kincaid formula (Kincaid, Fishburne, Rogers, & Chissom, 1975) via Microsoft Word, and the Degrees of Reading Power (DRP; Touchstone Applied Science and Associates, 2006). Readability levels for the selected passages ranged from fifth to seventh grade, and DRP levels ranged from 51 to 61. Means (number of words read aloud in 3 minutes) and standard deviations from the pilot study for selected passages were: 421.5 (SD = 80.5), 489.5 (SD = 117), 432.5 (SD = 140.5), and 401.7 (SD = 75).

The reading-aloud task was administered to students on an individual basis using standardized administration procedures. Students read aloud from the passage while the examiner followed along on a numbered copy of the same passage, making a slash through words read incorrectly or words supplied for the student. The examiner timed for 3 minutes using a stopwatch, marking progress at 1, 2, and 3 minutes. Reading aloud was scored for total words read (TWR) and WRC at 1, 2, and 3 minutes.

Maze-selection passages were created from the same stories used for reading aloud. Every seventh word was deleted and replaced by the correct choice and two distracters. The distracters were within one letter in length of the correct word but started with different letters of the alphabet and comprised different parts of speech (see Fuchs, Fuchs, Hamlett, & Ferguson, 1992, for maze-construction procedures). The three word choices were underlined in bold print and were not split at the end of the sentence in order to preserve continuity for the reader.
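As an illustration, the item-construction procedure described above might be sketched as follows. The function name and the distracter pool are hypothetical, and the length and part-of-speech constraints on distracters are simplified here to a different-first-letter check:

```python
import random

def make_maze_items(text, distracter_pool, step=7, seed=0):
    """Replace every `step`-th word of a passage with a multiple-choice
    item: the correct word plus two distracters. The study additionally
    matched distracters to within one letter of the correct word's
    length and required different parts of speech; this sketch only
    enforces a different first letter."""
    rng = random.Random(seed)
    words = text.split()
    items = []
    for i in range(step - 1, len(words), step):
        correct = words[i]
        # Simplified distracter selection: any pool word whose first
        # letter differs from the correct word's first letter.
        candidates = [w for w in distracter_pool
                      if w[0].lower() != correct[0].lower()]
        distracters = rng.sample(candidates, 2)
        choices = [correct] + distracters
        rng.shuffle(choices)
        items.append({"position": i, "correct": correct,
                      "choices": choices})
    return items
```

A full implementation would also format the three choices in underlined bold print and keep each item on a single line, as described above.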

The maze-selection task was administered to students in a group setting using standardized administration procedures. Students read silently for 4 minutes, making selections for each multiple-choice item. Examiners timed for 4 minutes and instructed students to mark their progress with a slash at 2, 3, and 4 minutes. Examiners monitored to ensure that students made the slashes. Maze selection was scored for correct maze choices (CMC) and correct minus incorrect choices (CMI) in 2, 3, and 4 minutes. As a control for guessing, and following the procedures used in previous research on maze selection (Espin, Deno, Maruyama, & Cohen, 1989; Fuchs et al., 1992), maze scoring was stopped when three consecutive incorrect choices were made. A recent investigation comparing different maze-selection scoring procedures revealed no differences in criterion-related validity associated with using a two-in-a-row versus three-in-a-row incorrect rule (Wayman et al., 2009).
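A minimal sketch of this scoring procedure, under one reading of the stop rule (the errors in the final three-in-a-row run still count toward CMI; the function name is hypothetical):

```python
def score_maze(responses):
    """Score a sequence of maze responses (True = correct choice,
    False = incorrect) for correct maze choices (CMC) and correct
    minus incorrect choices (CMI). Scoring stops once three
    consecutive incorrect choices are made."""
    cmc = 0
    errors = 0
    consecutive_wrong = 0
    for is_correct in responses:
        if is_correct:
            cmc += 1
            consecutive_wrong = 0
        else:
            errors += 1
            consecutive_wrong += 1
            if consecutive_wrong == 3:
                break  # stop scoring: three incorrect in a row
    return {"CMC": cmc, "CMI": cmc - errors}
```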

Criterion variables. The criterion variable in this study was performance on the Minnesota Basic Standards Test (MBST) in reading, a high-stakes test required for graduation. The MBST was designed by the state of Minnesota to test the minimum level of reading skills needed for survival (MN Department of Education, 2001) and, at the time of the study, was administered annually in the winter to all eighth-grade students in Minnesota. The untimed test comprised four or more passages of 500 words or more selected from newspaper and magazine articles. Passages were both narrative and expository and had average DRP levels ranging from 64 to 67. Each passage was followed by multiple-choice questions, with approximately 40 questions per test. The test was constructed so that 60 percent of the questions on the test were literal, 30 percent inferential, and 10 percent could be either. The test was machine-scored on a scale from 0 to 40, and then the raw score was converted to a scale score between 375 and 750. A passing scale score was 600, which corresponded to 75 percent correct (MN Department of Education, 2001). Students who did not pass the test were permitted to retake it two times each year. Students had to pass the test in order to graduate from high school.

The MBST Technical Manual (MN Department of Education, 2001) reported reliability and validity information for the MBST reading test. Internal consistency measures for reliability were based on the Rasch model index of person separation. The Kuder–Richardson 20 internal consistency reliability estimate was .90. No alternate-form reliability was calculated. Content validity, according to the manual, was determined by the relationship of the reading test items to statewide content standards as verified by educators, item developers, and experts in the field. Construct validity was measured by item point-biserial correlations (the correlation between students' raw scores on the MBST and their scores on individual test items). The mean point-biserial correlation was .38. No criterion-related validity statistics were noted.


Procedures

In the fall, students completed two maze passages in a group setting in their classrooms. On a subsequent day in the same week, students completed two reading-aloud passages individually. Type of measure (reading aloud vs. maze selection) and passage were counterbalanced across students, as was the order in which the students completed the passages within reading aloud or maze selection. Examples of each task were given to students prior to administration. The MBST was administered by teachers to students in February.

Sixteen graduate students administered and scored the reading-aloud and maze-selection measures. Prior to data collection, the graduate students were interviewed by members of the research team to ascertain their ability to work with students and to accurately score reading samples. Following this initial screening, the graduate students participated in two 2-hour training sessions on administration and scoring. During training, the graduate students administered and scored three samples. Inter-scorer agreement on the three passages between the data collectors and the trainer was calculated by dividing the smaller by the larger score and multiplying by 100. Inter-scorer agreement exceeded 95 percent on maze selection and 90 percent on reading aloud for all scorers. During data collection and scoring, 33 percent of the reading-aloud and 10 percent of the maze-selection probes were randomly selected to be checked for accuracy of scoring. Inter-scorer agreement exceeded 90 percent for all measures.
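The agreement computation described above is straightforward; a minimal sketch (the function name is hypothetical):

```python
def interscorer_agreement(score_a, score_b):
    """Percent agreement between two scorers, computed as in the
    study: the smaller of the two scores divided by the larger,
    multiplied by 100."""
    if score_a == score_b:
        return 100.0  # also covers the case where both scores are 0
    return min(score_a, score_b) / max(score_a, score_b) * 100
```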


Results

Means and standard deviations for reading-aloud and maze-selection scores for each time frame are reported in Table 1. Examination of mean scores reveals that students worked at a steady pace across the duration of the passages. Students read aloud approximately 125 words with 6 errors per minute across the 3 minutes and made approximately 6 correct maze choices with 0.5 errors per minute across the 4 minutes of maze. The mean score for study participants on the MBST in reading was a standard score of 626.90 (SD = 65.66), with a range of 475–750.

To determine alternate-form reliability, correlations between scores on the two forms of the maze-selection and reading-aloud measures were calculated for each time frame and scoring procedure (see Table 2). Reliabilities for both reading aloud and maze were generally above .80. Reliabilities for reading aloud ranged from .93 to .96 and were similar across scoring method and sample duration. Reliabilities for maze ranged from .79 to .96 and were generally similar across scoring method, but increased somewhat with time frame. The highest obtained reliability coefficient was for the 4-minute maze passages scored for CMI (r = .96); however, reliabilities for the 3-minute maze selection were above .85, regardless of scoring method.
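Alternate-form reliability here is simply the Pearson correlation between students' scores on the two parallel forms of a measure; a minimal sketch (the function name is hypothetical):

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient, used as an alternate-form
    reliability estimate: x holds students' scores on form A, y their
    scores on form B of the same measure."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```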

TABLE 1
Means and Standard Deviations for Reading Aloud and Maze Selection by Scoring Procedure and Time Frame

Reading aloud                1 minute    2 minutes   3 minutes
  Total words read           125.88      250.46      373.31
                             (43.75)     (85.05)     (125.95)
  Words read correct         119.82      238.54      355.27
                             (47.29)     (92.14)     (136.92)

Maze selection               2 minutes   3 minutes   4 minutes
  Correct choices            12.33       18.76       25.24
                             (7.12)      (10.87)     (14.53)
  Correct minus incorrect    11.18       17.17       23.10
    choices                  (7.53)      (11.40)     (15.17)

Note: Standard deviations are in parentheses.

TABLE 2
Alternate-Form Reliability for Reading Aloud and Maze Selection by Scoring Procedure and Time Frame

Reading aloud                1 minute    2 minutes   3 minutes
  Total words read           .93         .96         .95
  Words read correct         .94         .96         .94

Maze selection               2 minutes   3 minutes   4 minutes
  Correct choices            .80         .86         .88
  Correct minus incorrect    .79         .86         .96
    choices

Note: All correlations significant at p < .01.

TABLE 3
Predictive Validity Coefficients for Reading Aloud and Maze Selection with MBST by Scoring Procedure and Time Frame

Reading aloud                1 minute    2 minutes   3 minutes
  Total words read           .76         .77         .76
  Words read correct         .78         .79         .78

Maze selection               2 minutes   3 minutes   4 minutes
  Correct choices            .75         .77         .80
  Correct minus incorrect    .77         .78         .81
    choices

Note: All correlations significant at p < .01. MBST = Minnesota Basic Standards Test.

To examine the predictive validity of the measures, correlations between mean scores on the two forms of reading-aloud and maze-selection measures and scores on the MBST were calculated (see Table 3). Correlations ranged from .75 to .81. The magnitude of the correlations was similar across type of measure (reading aloud and maze) and method of scoring. For reading aloud, correlations for 1, 2, and 3 minutes were virtually identical. For maze selection, a consistent but small increase in correlations was seen across time frames, with correlations of .75 (CMC) and .77 (CMI) for the 2-minute measure and .80 (CMC) and .81 (CMI) for the 4-minute measure.

In summary, results revealed that both maze selection and reading aloud produced respectable alternate-form reliabilities, although reading aloud yielded consistently larger reliability coefficients than maze. Few differences in reliabilities were seen for scoring procedure or time frame, with the exception that reliabilities for maze selection increased somewhat with time. Predictive validity coefficients were similar for the two types of measures. Correlations were similar across scoring procedures for both measures. With regard to time frame, small but consistent increases in correlations were seen for maze selection.


Discussion

In this study, we examined the reliability and validity of reading aloud and maze selection as indicators of performance on a state standards test. Differences in technical characteristics related to time frame and scoring procedure were examined.

Both reading aloud and maze selection showed reasonable alternate-form reliabilities at all time frames, with most coefficients at or above .80. In general, reading aloud resulted in higher alternate-form reliability coefficients (ranging from .93 to .96) than did maze selection (ranging from .79 to .96), but reliability for maze selection was in the range typical for CBM. Time frame did not influence reliability coefficients for reading aloud but had some influence on maze selection. Obtained reliability coefficients for maze increased with time frame, with coefficients for the 2-minute time frame hovering around .80, but increasing for 3-minute (r's = .86) and 4-minute (r = .88 and .96) time frames. Finally, scoring procedure had little effect on reliability, with the exception that when 4-minute maze selection was scored for CMI, reliability was somewhat larger (r = .96) than when it was scored for CMC (r = .88).

Like reliability coefficients, validity coefficients were quite similar across type of measure, time frame, and scoring procedure. Validity coefficients for reading aloud ranged between .76 and .79 and were similar across scoring procedure and time frames. Maze-selection coefficients ranged between .75 and .81 and also were similar across scoring procedure. A systematic increase in validity coefficients was seen with an increase in time for maze, but the differences were small.

We wish to make two observations regarding the magnitude of the validity coefficients found in the performance study. First, the correlations obtained in our study were larger than those found in previous research at the middle-school level. For example, Yovanoff et al. (2005) reported correlations of .51 and .52 between WRC in 1 minute and scores on a reading comprehension task for eighth-grade students. Espin and Foegen (1996) reported correlations of .57 and .56, respectively, between WRC in 1 minute and CMC in 2 minutes and scores on a reading comprehension task.

One might hypothesize that the differences in correlations are related to the materials used to develop the CBMs, although no consistent pattern of differences can be seen across studies. Yovanoff et al. (2005) used grade-level prose material, Espin and Foegen (1996) used fifth-grade-level expository material, and we used fifth- to seventh-grade human-interest stories from the newspaper—material that might be considered to be both narrative and expository. Moreover, previous research conducted at the elementary-school level has revealed few differences in reliability and validity for CBMs drawn from material of different difficulty levels or from various sources (see Wayman et al., 2007, for a review).

It is possible that the differences are related to the criterion variable used. Both Yovanoff et al. (2005) and Espin and Foegen (1996) used a limited number of researcher-designed multiple-choice questions as an outcome, whereas in our study we used a broad-based measure of comprehension designed to scale student performance across a range of levels. Supporting this hypothesis are data from two studies demonstrating correlations nearly identical to ours (in the .70s) between the CBM reading-aloud and maze-selection measures and the MBST (Muyskens & Marston, 2006; Ticha, Espin, & Wayman, 2009). In addition, Ticha et al. (2009) found high correlations between maze-selection scores and a standardized achievement test.

Second, the state standards test used in the current study was designed to test the minimal reading competency for students in eighth grade. Thus, one might question whether the CBMs would predict reading competence as well if the criterion measures were measures of broader reading competence. Results of Ticha et al. (2009) indicate that the reading measures predict performance on a standardized reading test as well as (or better than) they predict performance on the state standards test. Perhaps the nature of the state test serves to reduce the overall variability in scores and thus serves to reduce the correlations. Replication of the current study with other outcome measures of reading proficiency is in order.

In summary, the results supported the reliability and validity of both reading aloud and maze selection as indicators of performance on a state standards reading test for middle-school students. For reading aloud, our data, combined with practical considerations, would suggest use of a 1-minute sample scored for TWR or WRC as a valid and reliable indicator of performance. Little was gained in technical adequacy by increasing the reading time. Given that reading aloud is typically scored for WRC, and given that this scoring procedure is no more time consuming than scoring TWR, we would recommend scoring the sample for WRC rather than TWR.

For maze selection, our data, combined with practical considerations, would suggest use of a 3-minute selection task scored for CMC as a valid and reliable indicator of performance. Although reliability and validity coefficients were the strongest for 4 minutes, the differences between 3- and 4-minute coefficients were small in magnitude, and both data collectors and teachers reported anecdotally that a 4-minute maze task was tedious for the students to complete.

Although our data support the use of both WRC in 1 minute and CMC in 3 minutes as predictors of performance on a state standards test, one might ask how teachers can use such data in their decision making. A common approach is to create a district-wide cutoff score on the CBM that is associated with a high probability of passing the state standards test. For example, district-wide data may show that, of students who read 145 WRC in 1 minute, 80 percent pass the state standards test. Teachers might then set a goal of 145 WRC in 1 minute for their students. The disadvantage of a cutoff score for students who struggle in reading is that these students often perform well below the cutoff score. An alternative approach is to present the relationship between performance on the CBM measures and the likelihood of passing the state standards test along the entire performance continuum. For example, district-wide data may show that, of students who read 100 WRC in 1 minute, 26 percent pass the state standards test, but of students who read 126 WRC in 1 minute, 57 percent pass. Teachers may choose to set an annual goal of 126 WRC for a student who begins the year reading only 100 WRC. This goal would move the student closer to a level of likely success. A method that can be used to create these Tables of Probable Success using CBM data is explained and illustrated in Espin et al. (2008).
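A minimal sketch of how such a table of pass percentages by CBM score range might be computed from district data (the bin width, function name, and all data values here are hypothetical illustrations, not the method of Espin et al., 2008):

```python
from collections import defaultdict

def probable_success_table(records, bin_width=25):
    """Group students by CBM score bin and report the percentage
    passing the state standards test within each bin."""
    bins = defaultdict(lambda: [0, 0])  # bin start -> [n passed, n total]
    for wrc, passed in records:
        b = (wrc // bin_width) * bin_width
        bins[b][1] += 1
        if passed:
            bins[b][0] += 1
    return {b: round(100 * p / n) for b, (p, n) in sorted(bins.items())}

# Hypothetical district data: (WRC in 1 minute, passed state test)
records = [(100, False), (105, False), (110, True),
           (130, True), (135, False), (140, True),
           (150, True), (160, True), (165, True)]
print(probable_success_table(records))  # {100: 33, 125: 67, 150: 100}
```

With real district data, a teacher could read off the pass likelihood for a student's current score and for candidate goal scores along the whole continuum, rather than relying on a single cutoff.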


In conclusion, our study supports the technical adequacy of both reading aloud and maze selection as indicators of performance on a state standards reading test. In the past, technical adequacy research would stop here, with the assumption that, if both measures were shown to be valid and reliable with respect to performance on a criterion measure, then both measures would reflect growth as progress measures. However, the development of more advanced statistical techniques such as Hierarchical Linear Modeling now allows for examination of the characteristics of CBM measures as progress as well as performance measures (e.g., see Shin, Espin, Deno, & McConnell, 2004). From our original sample, we had access to one classroom with 31 students for weekly progress monitoring. Although this sample size was too small to produce generalizable results about typical growth rates on the measures, it was large enough to conduct an exploratory, within-subject comparison of the growth rates produced by the two measures for a sample of students. Specifically, we examined differences in the sensitivity of the two measures to growth and their relation to performance on the state standards test. Results of this exploratory study could help us to generate hypotheses for future research.




Participants in the exploratory progress study were selected from the original sample and were 31 students (10 male; 21 female) from one classroom in the first school described above. Fifty-five percent of the students were eligible for free or reduced lunches. Students were Caucasian (42 percent), Asian American (26 percent), African American (16 percent), Hispanic (10 percent), and Native American (6 percent). Ten percent of the students received special education services for emotional disturbance or speech-language difficulty. Sixteen percent of the students were identified as English language learners (ELL) but did not receive ESL services. The mean standard score on the state standards reading test for the students was 646.17.


Students were monitored weekly on both a maze-selection task, administered in a group setting by the classroom teacher, and a reading-aloud task, administered on an individual basis

TABLE 4
Alternate-Form Reliability for Reading-Aloud and Maze-Selection Progress-Monitoring Passages

Passages                                  1-2    2-3    3-4    4-5    5-6    6-7    7-8    8-9    9-10
Reading aloud, words correct, 1 minute    .92    .91    .85    .88    .88    .86    .79    .84    .83
Maze, correct choices, 3 minutes          .72    .84    .69    .80    .80    .85    .90    .83    .74

n = 25 to 31. Note: All correlations significant at p < .01.

by a member of the research team. The maze-selection and reading-aloud tasks were created from the same passages each week. Sixteen passages were selected from human-interest stories from the newspaper. Passages that required specific background knowledge (e.g., knowledge of the game of baseball) were eliminated from consideration. For the remaining passages, readability levels were calculated using both the DRP (Touchstone Applied Science and Associates, 2006) and Flesch-Kincaid (Kincaid et al., 1975). In addition, teachers were consulted regarding appropriateness of the passages for secondary-school students. A final set of 10 passages was selected based on readability formula and teacher input. DRP scores ranged from 51 to 61, representing approximately a sixth-grade level, and Flesch-Kincaid readability levels were between the fifth- and seventh-grade levels. Passages were on average 750 words long. Alternate-form reliabilities between adjacent pairs of passages are reported in Table 4. All reliabilities were statistically significant, all but one were above .70, and all but three were above .80. For reading aloud, reliabilities ranged from .79 to .92, and for maze selection from .69 to .90.
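Each alternate-form reliability in Table 4 is simply a Pearson correlation between students' scores on two adjacent passages. A self-contained sketch of that computation (all scores are hypothetical):

```python
import math

def pearson_r(x, y):
    """Pearson correlation between scores on two adjacent passages."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical WRC scores for the same five students on passages 1 and 2
p1 = [120, 145, 160, 98, 133]
p2 = [118, 150, 155, 102, 130]
print(round(pearson_r(p1, p2), 2))  # 0.98
```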

Maze selection was administered first, usually on a Monday, and reading aloud was administered on a subsequent day within the same week, usually on a Friday. Progress data were collected over a period of approximately 3 months, yielding an average of 10 data points per student (note that during vacation weeks, no data were collected).


Maze selection was administered by the classroom teacher using a standard script. Maze-selection probes were scored by graduate students. Prior to administering the measure the first time, the teacher observed one of the members of the research team administering the maze task to her class. Fidelity-of-treatment checks were conducted at equal intervals three times during the course of the study to assess the accuracy of the administration and timing of the maze. For each fidelity check, the teacher was found to read the directions and complete the timings correctly. Reading aloud was administered and scored by 11 of the data collectors from the original study. Every week, 10 of the reading-aloud samples were tape-recorded and checked for fidelity and reliability, and 10 of the maze-selection passages were checked for accuracy of scoring. On all occasions, data collectors read the directions and timed correctly for the reading-aloud samples. Accuracy of scoring for reading aloud and maze selection was checked by the two graduate students involved in the study.


Percentage agreement between the graduate students and scorers was calculated by dividing agreements by agreements plus disagreements. Accuracy of scoring across the study for both maze selection and reading aloud ranged from 97 percent to 100 percent.
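The agreement computation just described can be sketched directly (the item-level scores below are hypothetical):

```python
def percent_agreement(scorer_a, scorer_b):
    """Interscorer agreement: agreements / (agreements + disagreements)."""
    agreements = sum(1 for a, b in zip(scorer_a, scorer_b) if a == b)
    return 100 * agreements / len(scorer_a)

# Hypothetical item-by-item maze scoring (1 = correct) from two scorers
a = [1, 1, 0, 1, 1, 0, 1, 1, 1, 1]
b = [1, 1, 0, 1, 0, 0, 1, 1, 1, 1]
print(percent_agreement(a, b))  # 90.0 (one disagreement out of 10)
```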


The sensitivity of the measures to growth and the validity of the growth rates produced by the measures were examined. Growth curve analyses were carried out using the MIXED procedure of SAS 9.1. Linear mixed-effects growth curves were used (see Fitzmaurice, Laird, & Ware, 2004, ch. 8), with different models specified for each measure. Similar patterns of results were found for each time frame and scoring procedure within both reading aloud and maze selection; thus, we report results selectively. First, we report on the measures with the best reliability, validity, and efficiency from the performance study: WRC in a 1-minute reading-aloud task and CMC in a 3-minute maze-selection task. Second, to provide a direct comparison of reading aloud and maze selection with time held constant, we also report results for WRC in a 3-minute reading-aloud task. Finally, taking into consideration practicality of use, we report results for CMC in a 2-minute maze selection. A 2-minute maze selection is more practical than a 3-minute maze selection for ongoing and frequent progress monitoring, and we considered the reliability and validity coefficients for 2-minute maze selection to be within an acceptable range for progress monitoring.

Sensitivity to growth for reading aloud. The growth curve model for reading aloud was a simple linear growth curve,

Y_ij = β0 + β1 t_ij + ε_ij,  (1)

where i is the participant subscript, i = 1, …, N, and j is the wave subscript, j = 1, …, n_i. In Equation (1), β0 is the intercept (status at wave 1), and β1 is the linear slope with t_ij = j − 1, and ε_ij = b_0i + b_1i t_ij + e_ij, which is the random effects structure, with b_0i being the deviation of an individual's intercept from the mean intercept, b_1i being the deviation of an individual's slope from the mean slope, and e_ij being random error (Fitzmaurice et al., 2004, ch. 8). Restricted maximum likelihood was used for parameter estimation, and degrees of freedom (df) for the t tests of the parameter estimates were estimated using the method of Kenward and Roger (1997).
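The full analysis used linear mixed-effects models fit by REML in SAS; as a simplified illustration of the fixed-effects portion of Equation (1), the slope and intercept for a single student's weekly scores can be computed by ordinary least squares (a sketch with hypothetical data, not the study's mixed-model estimation):

```python
def ols_slope_intercept(y):
    """Least-squares slope and intercept for equally spaced waves
    t = 0, 1, ..., n - 1 (i.e., t = wave - 1, as in Equation 1)."""
    n = len(y)
    t = list(range(n))
    mt, my = sum(t) / n, sum(y) / n
    num = sum((ti - mt) * (yi - my) for ti, yi in zip(t, y))
    den = sum((ti - mt) ** 2 for ti in t)
    slope = num / den
    return slope, my - slope * mt

# Hypothetical weekly WRC-in-1-minute scores for one student
scores = [138, 139, 141, 140, 142, 143, 144, 145, 146, 147]
slope, intercept = ols_slope_intercept(scores)
print(round(slope, 2), round(intercept, 2))  # 0.99 138.05
```

The mixed model additionally pools information across students and models each student's deviation from the mean intercept and slope, which this per-student sketch does not attempt.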

The observed means and predicted means based on the linear model for 1-minute reading aloud are presented at the top of Figure 1. Detailed results of all the growth curve analyses are in Table 5. Results reveal that the linear slope was significant, β̂1 = .84, t(245) = 2.63, p = .009, and the intercept was significant, β̂0 = 139.01, t(28.8) = 24.87, p < .0001. The variance of the linear slopes was estimated to be zero, so the df were based on the sum of all the time points over all participants, Σ_i n_i, rather than the number of participants (N). This resulted in relatively high df (i.e., 245) for the t test of the slopes. However, the result is still significant when based on the same df used for testing the intercept (i.e., t(28.8) = 2.63, p = .014). The mean linear slope indicates that the number of WRC in 1 minute tended to increase at a rate of .84 per wave.

The observed means and predicted means based on the linear model for 3-minute reading aloud are presented at the bottom of Figure 1. The linear slope for WRC3 was not significant, β̂1 = −0.41, t(29.4) = −0.55, p = .58, but the intercept was significant, β̂0 = 420.04, t(29) = 26.71, p < .001. The linear slope indicates that WRC in 3 minutes did not show a significant increase over time. (Although not reported here, the 2-minute reading-aloud measure scored for WRC also showed no significant change over time.)

Sensitivity to growth for maze selection. The observed and predicted means for 3- and 2-minute maze selection scored for CMC are presented in Figure 2. As illustrated in Figure 2, there was a change of direction at wave 8 for the maze-selection scores. This presented a problem for the analyses, as the major goal was to estimate linear growth over time. After a close examination of the data and discussions with the teacher, it was determined that this shift might be due to a passage effect. To address this problem, a piecewise or spline model was used to fit an additional linear predictor starting at wave 8 to account for the observed nonlinearity (see Ruppert, Wand, & Carroll, 2003, ch. 3). That is, we decided to model the near-linear growth apparent before wave 8 without deleting any of the data. The spline growth curve model was

Y_ij = β0 + β1 t_ij + β2 t*_ij + ε_ij.  (2)

In Equation (2), β0 is the intercept (constant across all 10 waves), β1 is the linear slope over wave 1 to wave 7 (with t_ij = j − 1), and β2 is the linear slope starting at wave 8, with t*_ij = 0 for wave 1 through wave 7 and t*_ij = (t_ij − 7) starting at wave 8. Random effects terms were specified for each linear slope and the intercept, that is, ε_ij = b_0i + b_1i t_ij + b_2i t*_ij + e_ij. Primary interest was on β1, as this was the linear slope for waves 1–7.

The observed and predicted means based on the spline model for 3-minute maze selection are shown at the top of Figure 2. The results show that each parameter estimate of the spline model was significant, β̂0 = 21.22, t(25.3) = 17.22, p < .0001, β̂1 = 2.88, t(32.7) = 12.79, p < .0001, and β̂2 = −7.26, t(227) = −10.82, p < .0001. (The random effects component of the linear slope starting at wave 8 was estimated to be zero, accounting for the higher df for the test of H0: β2 = 0; also see Table 5.) The latter two estimates indicate there was an overall rate of increase of 2.88 CMC in 3 minutes per wave, but a decrease of 7.26 per wave at wave 8.
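The spline time terms of Equation (2) can be sketched in code. This is an illustrative sketch only: the function name is our own, the knot is placed at wave 8 as described above, and the plugged-in coefficients are the reported CMC3 estimates:

```python
def spline_predictors(wave, knot=8):
    """Time terms for the piecewise model of Equation (2):
    t = wave - 1, and t* = 0 before the knot, (t - (knot - 1)) after."""
    t = wave - 1
    t_star = max(0, t - (knot - 1))
    return t, t_star

# Predicted mean CMC3 by wave, using the reported estimates
b0, b1, b2 = 21.22, 2.88, -7.26
for wave in (1, 7, 8, 10):
    t, t_star = spline_predictors(wave)
    print(wave, round(b0 + b1 * t + b2 * t_star, 2))
```

Note that t* is zero through wave 8 itself (t = 7 gives t* = 0), so the additional β2 slope only pulls predictions down from wave 9 onward, matching the observed drop after the wave-8 spike.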

The observed and predicted means based on the spline model for 2-minute maze selection scored for CMC are presented at the bottom of Figure 2. Each parameter estimate of the spline model was significant, β̂0 = 13.58, t(24.00) = 16.93, p < .0001, β̂1 = 2.17, t(32.7) = 12.85, p < .0001, and β̂2 = −5.75, t(225) = −10.19, p < .0001. Thus, for the 2-minute maze, there was an overall rate of increase of 2.17 CMC in 2 minutes per wave, but a decrease of 5.75 per wave at wave 8. (Although not reported here, a similar pattern of results was found for 4-minute maze-selection measures.)

FIGURE 1 One- and 3-minute reading aloud (words read correctly) observed and predicted means by wave.

TABLE 5
Detailed Results of the Growth Curve Analysis

Parameter       Estimate     SE        df     t value   p value
WRC1   β0       138.1        5.5538    28.8    24.87    <.0001
       β1         0.8391     0.319    245       2.63     .0091
WRC3   β0       420.04      15.7232    29      26.71    <.0001
       β1        −0.4057     0.7333    29.4    −0.55     .5842
CMC2   β0        13.5786     0.8022    24      16.93    <.0001
       β1         2.1674     0.1686    32.7    12.85    <.0001
       β2        −5.747      0.5642   225     −10.19    <.0001
CMC3   β0        21.22       1.2321    25.3    17.22    <.0001
       β1         2.8808     0.2253    32.7    12.79    <.0001
       β2        −7.2639     0.6716   227     −10.82    <.0001

Note: WRC1, WRC3: Words read correctly in 1 and 3 minutes. CMC2, CMC3: Correct maze choices in 2 and 3 minutes.

In summary, the results of the growth curve analysis for the reading-aloud and maze-selection measures demonstrated that reading aloud showed minimal or no growth over time, whereas maze selection showed significant and substantial growth except after week 7. This pattern of results held generally across scoring procedure and time frame. Thus, there were no differences in patterns of growth found for the different scoring methods for either reading aloud or maze selection. With regard to time frame, maze selection demonstrated significant and substantial growth for both 2-minute (2.17 CMC per week) and 3-minute (2.88 CMC per week) time frames (although recall that these growth rates were obtained following a correction for a score shift). However, results for reading aloud revealed a statistically significant but minimal growth rate for the 1-minute (.84 words per week) reading-aloud measure, but no significant growth for the 3-minute (or 2-minute) measure.

One might conjecture that reading aloud would be more sensitive to growth for lower-performing than for higher-performing students. However, examination of Figure 3, which presents individual student data across time, reveals that growth across time for reading aloud (top graph) was fairly flat for those at both the lower and higher levels of CBM performance. In contrast, growth across time for maze selection (bottom graph) reflects a fanning out of scores over time, with students at the higher levels of CBM performance reflecting larger gains than those at the lower levels of performance. The meaningfulness of the growth rates produced by the maze-selection measures was examined in the following analysis. Reading aloud was not included in this analysis due to the lack of interindividual variability in growth rates, which would mean that slopes would not be correlated with other variables.

Relation between growth rates and performance on the MBST. To examine the validity of the growth rates, we investigated the extent to which the growth rates produced by the 3-minute maze-selection (CMC) measure were related to performance on the MBST. Specifically, we used MBST scores as predictors of linear slopes and intercepts, but were primarily interested in the former. A random effect was associated with each fixed effect in the same manner as in the growth curves above (i.e., ε_ij = b_0i + b_1i t_ij + e_ij, or ε_ij = b_0i + b_1i t_ij + b_2i t*_ij + e_ij). The mean score on the MBST for participants in Study 2 was a standard score of 646.17 (SD = 38.11), with a range of 587 to 750. Let m_i = the MBST score for the ith participant, which is a static predictor (not varying over time). For the CMC, the static predictor was incorporated into the spline model,

Y_ij = β0 + β1 t_ij + β2 t*_ij + β3 m_i + β4 m_i t_ij + ε_ij.  (3)

In Equation (3), β4 represents the association between the MBST and the slopes for waves 1–7.
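To illustrate how the static MBST predictor enters Equation (3), a hypothetical design row and prediction function can be sketched. The β4 estimate below is the reported value; β3 was not reported, so a placeholder of zero is used purely for illustration (only the β4 interaction affects the slope comparison):

```python
def design_row(wave, mbst, knot=8):
    """Fixed-effects covariates (t, t*, m, m*t) for Equation (3)."""
    t = wave - 1
    t_star = max(0, t - (knot - 1))
    return (t, t_star, mbst, mbst * t)

# (b0, b1, b2) from the CMC3 spline fit; b3 = 0 is illustrative only
b = (21.22, 2.88, -7.26, 0.0, 0.009)

def predict(wave, mbst):
    t, ts, m, mt = design_row(wave, mbst)
    return b[0] + b[1] * t + b[2] * ts + b[3] * m + b[4] * mt

# beta4 = 0.009 means a student scoring 100 MBST points higher gains
# 0.9 more CMC per wave; after 5 waves the gap in predicted gain is 4.5
print(round(predict(6, 700) - predict(6, 600), 2))  # 4.5
```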

The number of CMC in 3 minutes showed significant growth over time, and this growth was related to performance on the MBST, with students passing the MBST obtaining higher rates of growth over time than those not passing the MBST (β̂4 = 0.009, t(31.3) = 2.35, p = .025). Figure 4 shows the predicted curves based on the estimated parameters of Equation (3) for MBST scores of 500 (not passing) and 700 (passing). Note that the students passing the MBST start higher and increase at a faster rate of change. Although not reported here, this same pattern of results also was found for the 2-minute maze-selection measure.


The characteristics of the measures as progress measures were compared for a small subset of our original participant sample. For these students, only maze selection resulted in substantial and significant growth over time (2.88 selections per week for a 3-minute sample), while the 1-minute reading-aloud measure revealed statistically significant but minimal growth over time (.84 WRC per week). Controlling for time frame and using a 3-minute reading-aloud task did not increase the amount of growth. In fact, a 3-minute sample of reading aloud resulted in no significant growth over time. Further, the growth rates produced by maze selection were significantly related to performance on the MBST: students with higher scores on the MBST also grew more on the maze-selection measures. Growth on the 1-minute reading-aloud measure was not related to performance on the MBST.

The differences in growth for the reading-aloud and maze-selection measures are surprising and difficult to explain, especially in light of the good technical adequacy of both measures as performance measures. One obvious explanation might be differences in the materials used to construct the measures, but recall that the reading-aloud and maze-selection tasks were created from the same passages each week. Another obvious explanation might be a bunching of scores over time because of a ceiling effect on the reading aloud; however, inspection of the data reveals no ceiling effect for the reading-aloud scores in the original study. In addition, examination of the individual growth rates over time, as illustrated in Figure 3, reveals no bunching of scores over time. Both higher- and lower-performing students (in terms of CBM scores) tended to maintain their relative levels of performance over time. When viewing a similar picture of the maze-selection 3-minute task (Figure 3), one sees a fairly steady increase in scores for all students, with a fanning out of the scores over time. This fanning out is due to higher-performing students showing greater growth than the lower-performing students, an observation confirmed by the subsequent analysis with the state standards test. A final possible explanation is that the maze-selection task was relatively novel to the students while the reading-aloud task was not, and that growth on the maze selection was due to practice on the task rather than improvements in reading performance. However, both the reading-aloud and maze-selection tasks were novel to the students (i.e., the district did not regularly collect reading-aloud data on the students), and the relation between growth on the maze and the criterion variable would argue against simple practice effects.

FIGURE 2 Three- and 2-minute maze selection (correct maze choices) observed and predicted means by wave.

FIGURE 3 Individual growth rates for 1-minute reading aloud (words read correctly) and 3-minute maze selection (correct maze choices).

A more plausible reason for differences in growth rates produced by the measures might be the order in which the measures were administered. In the current study, maze always was administered first and reading aloud second. Perhaps completing the maze task diminished the sensitivity of the reading-aloud measure to change over time. Although the effect of order must be considered, we would note that, in a follow-up study in which the reading-aloud measure was administered first and maze selection second (Ticha et al., 2009), a similar pattern of results was obtained, with maze selection reflecting growth over time but reading aloud not.²

FIGURE 4 Relationship between 3-minute maze selection (correct maze choices) and Minnesota Basic Standards Test (MBST) scores.

Another plausible reason for differences in the measures may be related to the small convenience sample used in this exploratory study. Compared to the larger sample, students in the exploratory study had relatively higher mean MBST scores (646.17 vs. 626.90). Correlations between the CBM and MBST scores for this subsample were lower than for the larger sample (see Table 6), ranging from .30 to .34 for reading aloud, and from .48 to .57 for maze selection (despite no restriction in the range of scores on the predictor and criterion variables). In addition, correlations for reading aloud were consistently and substantially lower than for maze selection. Perhaps for this particular sample of students, reading aloud did not function as a reasonable indicator of growth, but maze selection did. However, in the follow-up study referred to earlier (Ticha et al., 2009), similar results were obtained regarding growth rates produced by reading aloud and maze selection. It is important to note that, in the follow-up study, performance-related correlations between the CBM and the MBST were virtually identical to those obtained in this study.

Our results suggest that there may be differences in the characteristics of the measures when used as predictors of performance versus measures of progress, and that both should be considered when examining the technical adequacy of the measures. Our sample is too small to draw conclusions regarding the best measure for monitoring the progress of secondary-school students, but it does suggest the need for further research examining the characteristics of the two measures for reflecting growth over time for older students. It may be that, although students as a group do not reach a ceiling in scores, each individual student reaches a "natural" level of reading fluency that, when compared to others, reveals a general level of reading proficiency but does not change with time. In addition, despite the fact that in our study neither lower- nor higher-performing students showed growth on the reading-aloud measure, it would be important to replicate the findings with a large, cross-grade sample of just struggling readers.

TABLE 6
Correlations for Reading Aloud and Maze Selection with MBST for Study 2 Participants at Time of MBST

CBM measure and scoring procedure          Time
Reading aloud                       1 minute   2 minutes   3 minutes
  Total words read                    .30        .32         .33
  Words read correct                  .32        .33         .34
Maze selection                      2 minutes  3 minutes   4 minutes
  Correct choices                     .52**      .50**       .48**
  Correct minus incorrect choices     .57**      .55**       .51**

n = 31. **p < .01. Note: MBST: Minnesota Basic Standards Test.

Although not the original intent of the study, this exploratory study also provides us with data regarding between-passage variability. The order in which the measures were administered was not counterbalanced across students, preventing us from drawing conclusions regarding typical growth rates for lower- and higher-performing readers. However, the design does allow us to examine characteristics of the passages themselves as growth measures. As is evident in Figure 2, students displayed a steady rate of growth on the maze-selection measure until week 8, when there was a sudden spike in scores for nearly all students, followed by a fairly low score in week 9 for nearly all students. Even more interesting, if one examines the reading-aloud graphs in Figure 1, this same spike in scores in week 8 is not evident (recall that the same passages were used for reading aloud and maze selection each week). Perhaps the simplest explanation for this pattern would be administration error on the particular days that maze passages 8 and 9 were given. Although fidelity checks on teacher administration of maze selection revealed that the teacher administered the passages correctly, those checks were conducted only three times during the course of the study and were not done on the days that passages 8 and 9 were administered. However, when asked, the teacher reported no particular problems with administration on those days (although one must still consider administration error a potential explanation for the pattern of results).

There is, however, a potential, somewhat troubling explanation for the pattern seen at points 8 and 9, related to between-passage variability. As discussed in Wayman et al. (2007), determination of passage difficulty is an important, but complicated, task for progress monitoring. The task is important because passages must be equivalent if we are to attribute growth over time to change in student performance as opposed to passage variability. The task is complicated because it is difficult to predict what may make a passage easy or difficult. The most common technique for determining passage equivalence, use of readability formulas, is not reliable (see Ardoin, Suldo, Witt, Aldrich, & McDonald, 2005; Compton, Appleton, & Hosp, 2004). Especially troubling with respect to our data is that whatever affected the scores in week 8 on maze selection did not affect the scores on reading aloud. Thus, it would not be merely the characteristics of the passage itself, but the characteristics of the passage as a maze passage, that produced the variability for this particular sample.³

We suggest further examination of the effects of passage variability on growth and emphasize the need for careful and systematic approaches to developing equivalent passages for CBM progress monitoring, especially when that progress monitoring is to be used as a part of a high-stakes decision-making process. Perhaps the best guarantee of passage equivalence is to administer the passages to a group of students and examine whether the group as a whole obtains higher or lower scores on particular passages, or to use the same passage for repeated testing (see Griffiths, VanDerHeyden, Skokut, & Lilles, 2009, for a discussion of these approaches). If using parallel forms, it would be wise to counterbalance the order in which the passages are administered, especially if the goal is to establish normative growth rates on the measures.
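The group-administration check described above can be sketched as a comparison of passage-level means: a passage whose group mean departs sharply from the overall mean is a candidate for revision or replacement. A minimal sketch with hypothetical scores (function name and data layout are our own):

```python
def passage_mean_deviations(scores_by_passage):
    """Deviation of each passage's group mean from the mean of all
    passage means; large deviations suggest non-equivalent difficulty."""
    means = {p: sum(s) / len(s) for p, s in scores_by_passage.items()}
    grand = sum(means.values()) / len(means)
    return {p: round(m - grand, 2) for p, m in means.items()}

# Hypothetical group CMC scores on three maze passages
scores = {
    "passage_7": [28, 30, 26, 32],
    "passage_8": [40, 43, 39, 42],  # suspicious group-wide spike
    "passage_9": [27, 29, 25, 31],
}
print(passage_mean_deviations(scores))
```

In this hypothetical example, passage 8 sits roughly eight points above the grand mean while its neighbors sit below it, the kind of group-wide anomaly that readability formulas alone would not catch.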


We examined the technical characteristics of two CBM reading measures as indicators of performance for middle-school students. We also conducted an exploratory study to examine the characteristics of the measures as progress measures. Our goal was to develop measures that could be used to monitor the progress of students with reading difficulties; however, to examine technical characteristics of the measures, we needed to include students across a range of performance levels. Our results supported the use of both reading aloud and maze selection as indicators of performance on a state standards test representing survival levels of reading performance. Reliability and validity were good for both measures, and within the range of levels found in previous research at the elementary-school level. Further, few differences were found related to scoring procedure or time frame, although reliability did increase somewhat for maze with an increase in time frame. Given the results of the performance study, and taking into account practical considerations, we would recommend use of WRC on a 1-minute reading-aloud task or CMC on a 3-minute maze-selection task as indicators of performance.

Results of the exploratory progress study implied that it is important to consider the technical adequacy of the measures as both performance and progress measures. In our study, only maze selection revealed growth over time; reading aloud did not. Further, growth on the maze-selection measure was related to performance on the MBST.

The studies represent only the first step in the development of CBM for monitoring the progress of students with reading difficulties at the secondary-school level. First, our results apply only to the sample used in the study, and replication with other samples is necessary. Second, our research addressed students at the middle-school level. There is a need for research at the high school level; we do not know whether our results would generalize to older students. Third, our progress study was a pilot study. Results must be replicated with a larger, representative sample. Fourth, once reliable and valid measures are developed, it will be important to examine whether teacher use of the measures leads to improvement for struggling readers. That, of course, is the ultimate goal of the research program, and one that will need to be examined directly at both the middle- and high-school levels, because one cannot assume that the positive results for use of the measures at the elementary-school level (see Stecker, Fuchs, & Fuchs, 2005, for a review) will necessarily replicate at the secondary-school level. Finally, although our results support the use of the CBM reading measures as indicators of performance in reading, we would support the use of multiple measures for determination of students' need for additional intensive reading instruction.


The research reported here was funded by the Office of Special Education Programs, U.S. Department of Education, Field Initiated Research Projects, CFDA 84.324C. We thank Mary Pickart for her contributions to this research and Stanley Deno for his insights. We also thank the Netherlands Institute for Advanced Study in the Humanities and Social Sciences for its support in the preparation of this manuscript.


1. The MBST in reading is being replaced by the Minnesota Comprehensive Assessment, a more broad-based reading test that is given annually in 3rd through 8th grade and again in 10th grade.

2. We would like to note that there was an error in the Ticha et al. (2009) article. In the methods section, the maze selection is said to be given first and the reading aloud second; later, in the discussion section, the reading aloud is said to be given first and the maze selection second. In fact, the reading-aloud measures were given first, and the maze-selection measures second.

3. We hypothesize that the difficulty of the passage in week 8 had to do with a fairly infrequent word appearing in the very first maze-selection item that created difficulties for all students. In the reading-aloud measure, this word was supplied after 3 seconds and, thus, may have had less of an effect on the overall scores of the students.


References

Ardoin, S. P., Suldo, S. M., Witt, J., Aldrich, S., & McDonald, E. (2005). Accuracy of readability estimates' predictions of CBM performance. School Psychology Review, 20, 1–22.

Brown-Chidsey, R., Davis, L., & Maya, C. (2003). Sources of variance in curriculum-based measures of silent reading. Psychology in the Schools, 40, 363–377.

Center on Education Policy. (2008). State high school exit exams: A move toward end-of-course exams. Washington, DC: U.S. Government Printing Office. Retrieved on March 25, 2009, from

Compton, D. L., Appleton, A., & Hosp, M. K. (2004). Exploring the relationship between text-leveling systems and reading accuracy and fluency in second grade students who are average and poor decoders. Learning Disabilities Research & Practice, 19, 176–184.

Crawford, L., Tindal, G., & Stieber, S. (2001). Using oral reading rate to predict student performance on statewide achievement tests. Educational Assessment, 7, 303–323.

Deno, S. L. (1985). Curriculum-based measurement: The emerging alternative. Exceptional Children, 52, 219–232.

Deshler, D. D., Schumaker, J. B., Alley, G. B., Warner, M. M., & Clark, F. L. (1982). Learning disabilities in adolescent and young adult populations: Research implications. Focus on Exceptional Children, 15(1), 1–12.

Espin, C. A., Deno, S. L., Maruyama, G., & Cohen, C. (1989). The Basic Academic Skills Samples (BASS): An instrument for the screening and identification of children at risk for failure in regular education classrooms. Paper presented at the National Convention of the American Educational Research Association, March.

Espin, C. A., & Deno, S. L. (1993a). Performance in reading from content area text as an indicator of achievement. Remedial and Special Education, 14, 47–59.

Espin, C. A., & Deno, S. L. (1993b). Content-specific and general reading disabilities of secondary-level students: Identification and educational relevance. The Journal of Special Education, 27, 321–337.

Espin, C. A., & Deno, S. L. (1994–95). Curriculum-based measures for secondary students: Utility and task specificity of text-based reading and vocabulary measures for predicting performance on content-area tasks. Diagnostique, 20, 121–142.

Espin, C. A., & Foegen, A. (1996). Validity of general outcome measures for predicting secondary students' performance on content-area tasks. Exceptional Children, 62, 497–514.

Espin, C. A., Wallace, T., Campbell, H., Lembke, E. S., Long, J. D., & Ticha, R. (2008). Curriculum-based measurement in writing: Predicting the success of high-school students on state standards tests. Exceptional Children, 74, 174–193.

Fewster, A., & MacMillan, P. D. (2002). School-based evidence for the validity of curriculum-based measurement of reading and writing. Remedial and Special Education, 23, 149–156.

Fitzmaurice, G. M., Laird, N. M., & Ware, J. H. (2004). Applied longitudinal analysis. New York: Wiley.

Fuchs, D., Fuchs, L. S., Mathes, P. G., & Lipsey, M. W. (2000). Reading differences between low-achieving students with and without learning disabilities: A meta-analysis. In R. Gersten, E. Schiller, & S. Vaughn (Eds.), Research syntheses in special education (pp. 81–104). Mahwah, NJ: Erlbaum.

Fuchs, L. S., Fuchs, D., Hamlett, C. L., & Ferguson, C. (1992). Effects of expert system consultation within curriculum-based measurement, using a reading maze task. Exceptional Children, 58, 436–450.

Fuchs, L. S., Fuchs, D., & Maxwell, L. (1988). The validity of informal reading measures. Remedial and Special Education, 9, 20–28.

Griffiths, A. J., VanDerHeyden, A. M., Skokut, M., & Lilles, E. (2009). Progress monitoring in oral reading fluency within the context of RTI. School Psychology Quarterly, 24, 13–23.

Hintze, J. M., & Silberglitt, B. (2005). A longitudinal examination of the diagnostic accuracy and predictive validity of R-CBM and high-stakes testing. School Psychology Review, 34, 372–386.

Jenkins, J. R., & Jewell, M. (1993). Examining the validity of two measures for formative teaching: Reading aloud and maze. Exceptional Children, 59, 429–432.

Kenward, M. G., & Roger, J. H. (1997). Small sample inference for fixed effects from restricted maximum likelihood. Biometrics, 53, 983–997.

Kincaid, J. P., Fishburne, R. P., Rogers, R. L., & Chissom, B. S. (1975). Derivation of new readability formulas (Automated Readability Index, Fog Count, and Flesch Reading Ease Formula) for Navy enlisted personnel (Research Branch Report 8–75). Memphis, TN: Naval Air Station.

Lee, J., Grigg, W., & Donahue, P. (2007). The Nation's Report Card: Reading 2007 (NCES 2007–496). Washington, DC: National Center for Education Statistics, Institute of Education Sciences, U.S. Department of Education. Retrieved January 16, 2008, from:

Levin, E. K., Zigmond, N., & Birch, J. W. (1985). A follow-up study of 52 learning disabled adolescents. Journal of Learning Disabilities, 18, 2–7.

MacMillan, P. (2000). Simultaneous measurement of reading growth, gender, and relative-age effects: Many-faceted Rasch applied to CBM reading scores. Journal of Applied Measurement, 1, 393–408.

Marston, D. (1989). A curriculum-based measurement approach to assessing academic performance: What it is and why do it. In M. Shinn (Ed.), Curriculum-based measurement: Assessing special children (pp. 18–78). New York: Guilford.

McGlinchey, M. T., & Hixson, M. D. (2004). Using curriculum-based measurement to predict performance on state assessments in reading. School Psychology Review, 33, 193–203.

Minnesota Department of Education. (2001). Minnesota Basic Skills Test Technical Manual. Accountability_Programs/Assessment_and_Testing/Assessments/BST/BST_Technical_Reports/index.html

Muyskens, P., & Marston, D. (2006). The relationship between Curriculum-Based Measurement and outcomes on high-stakes tests with secondary students. Minneapolis Public Schools. Unpublished manuscript.

O'Connor, R. E., Fulmer, D., Harty, K. R., & Bell, K. M. (2005). Layers of reading intervention in kindergarten through third grade: Changes in teaching and student outcomes. Journal of Learning Disabilities, 38, 440–445.

O'Connor, R. E., Harty, K. R., & Fulmer, D. (2005). Tiers of intervention in kindergarten through third grade. Journal of Learning Disabilities, 38, 532–538.

Rasinski, T. V., Padak, N. D., McKeon, C. A., Wilfong, L. G., Friedauer, J. A., & Heim, P. (2005). Is reading fluency a key for successful high school reading? Journal of Adolescent and Adult Literacy, 48, 22–27.

Ruppert, D., Wand, M. P., & Carroll, R. J. (2003). Semiparametric regression. New York: Cambridge University Press.

Shin, J., Espin, C. A., Deno, S. L., & McConnell, S. (2004). Use of hierarchical linear modeling and curriculum-based measurement for assessing academic growth and instructional factors for students with learning difficulties. Asia Pacific Education Review, 5, 136–148.

Silberglitt, B., & Hintze, J. (2005). Formative assessment using CBM-R cut scores to track progress toward success on state-mandated achievement tests: A comparison of methods. Journal of Psychoeducational Assessment, 23, 304–325.

Stage, S. A., & Jacobsen, M. A. (2001). Predicting student success on a state-mandated performance-based assessment using oral reading fluency. School Psychology Review, 30, 407–419.

Stecker, P. M., Fuchs, L. S., & Fuchs, D. (2005). Using curriculum-based measurement to improve student achievement: Review of research. Psychology in the Schools, 42, 795–819.

Ticha, R., Espin, C. A., & Wayman, M. M. (2009). Reading progress monitoring for secondary-school students: Reliability, validity, and sensitivity to growth of reading aloud and maze selection measures. Learning Disabilities Research & Practice, 24, 132–142.

Torgesen, J. K. (2000). Individual differences in response to early interventions in reading: The lingering problem of treatment resisters. Learning Disabilities Research & Practice, 15, 55–64.

Touchstone Applied Science and Associates. (2006). Degrees of reading power. Brewster, NY: Author.

Vaughn, S., Linan-Thompson, S., & Hickman, P. (2003). Response to instruction as a means of identifying students with reading/learning disabilities. Exceptional Children, 69, 391–409.

Vellutino, F. R., Fletcher, J. M., Snowling, M. J., & Scanlon, D. (2004). Specific reading disability (dyslexia): What have we learned in the past four decades? Journal of Child Psychology and Psychiatry, 45, 2–40.

Vellutino, F. R., Scanlon, D. M., & Tanzman, M. S. (1994). Components of reading ability: Issues and problems in operationalizing word identification, phonological coding, and orthographic coding. In G. R. Lyon (Ed.), Frames of reference for the assessment of learning disabilities: New views on measurement issues (pp. 279–332). Baltimore: Brookes Publishing.

Vellutino, F. R., Tunmer, W. E., Jaccard, J. J., & Chen, R. (2007). Components of reading ability: Multivariate evidence for a convergent skills model of reading development. Scientific Studies of Reading, 11, 3–32.

Warner, M. M., Schumaker, J. B., Alley, G. R., & Deshler, D. D. (1980). Learning disabled adolescents in the public schools: Are they different from other low achievers? Exceptional Education Quarterly, 1(2), 27–36.

Wayman, M. M., Ticha, R., Wallace, T., Espin, C. A., Wiley, H. I., Du, X., & Long, J. (2009). Comparison of different scoring procedures for CBM maze selection measures (Technical Report No. 10). Minneapolis, MN: University of Minnesota, Research Institute on Progress Monitoring.

Wayman, M. M., Wallace, T., Wiley, H. I., Ticha, R., & Espin, C. A. (2007). Literature synthesis on curriculum-based measurement in reading. Journal of Special Education, 41, 85–120.

Wiley, H. I., & Deno, S. L. (2005). Oral reading and maze measures as predictors of success for English learners on a state standards assessment. Remedial and Special Education, 26, 207–214.

Yovanoff, P., Duesbery, L., Alonzo, J., & Tindal, G. (2005). Grade-level invariance of a theoretical causal structure predicting reading comprehension with vocabulary and oral reading fluency. Educational Measurement: Issues and Practice, 24, 4–12.

About the Authors

Christine Espin is a professor in Education and Child Studies at Leiden University, Leiden, the Netherlands. She is also an adjunct professor in Cognitive Sciences at the University of Minnesota. Her research interests focus on the development of curriculum-based measurement (CBM) procedures in reading, written expression, and content-area learning for secondary students with learning disabilities, and on teachers' use of CBM data.

Teri Wallace is an associate professor of Special Education at Minnesota State University in Mankato. Her research focuses on the development of general outcome measures for students with significant cognitive disabilities, implementation of response to intervention, and utilization of data in decision making at the student, classroom, school, and district levels.

Heather Campbell is an assistant professor of Education at St. Olaf College in Northfield, Minnesota. She works with educational opportunity programs at St. Olaf, and her research interests include the development of CBM procedures in written expression for English language learners.

Erica Lembke is an associate professor in the Department of Special Education at the University of Missouri. Her research interests focus on the development of CBM procedures in reading, written expression, and mathematics for students in early elementary grades, as well as implementation of response to intervention in classrooms.

Jeffrey D. Long is an associate professor of Educational Statistics in the Quantitative Methods in Education program in the Department of Educational Psychology, University of Minnesota. His research interest is longitudinal data analysis.

Copyright of Learning Disabilities Research & Practice (Blackwell Publishing Limited) is the property of Wiley-Blackwell, and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use.
