Timothy Shanahan
Shanahan on Literacy

What About the Textbook Reviews?

Are third-party evaluations of commercial reading programs aligned with research? Shanahan identifies six problems with these reviews and suggests solutions to each.

Parent question

Our school district is using a program that has received many bad reviews, including by EdReports. We raised that with our School Superintendent, and she indicated that EdReports is revamping its review process so their evidence doesn’t mean anything. What do you think?


Teacher question

EdReports, the Knowledge Matters Campaign, and others are requiring that high-quality texts build background knowledge — a good thing. However, they are expecting it to happen through a topical approach, not a broader thematic approach. One curriculum that is touted as strong in this area addresses one topic for 18 weeks! So the question I am asking is: what is the difference between a topical approach and a thematic approach, and which is preferred?

Literacy Experts Say Some EdReports Ratings Are Misleading


Shanahan’s response

Over the past few weeks, I’ve been inundated by emails and phone calls about EdReports and a couple of other textbook review protocols (those issued by Knowledge Matters Campaign and Reading League).

I usually stay away from this kind of topic since I help design commercial programs and try to avoid conflicts of interest. At this point, though, the problems with these reviews have gotten so broad and so general that I can discuss them without any danger of conflict.

I’ve noticed six problems with these reviews and have suggested solutions to each.

1. Educators are placing way too much trust in these reviews.

These review processes have been undertaken by groups who want to make sure that curriculum materials are up to snuff. But what that means differs from review process to review process. Each review organization has different beliefs, goals, and methodologies. Accordingly, reliance upon any one of these reviews may be misleading.

The federal government requires non-profits to file 990 forms. These forms require a declaration of purposes and description of their activities. For instance, the Reading League says that it encourages “evidence-aligned instruction to improve literacy outcomes” and EdReports aims to provide “evidence-based reviews of instructional material.”

The Knowledge Matters Campaign is a bit different. They are not a free-standing entity but part of another nonprofit, “Standards Work”. In their 990, the Knowledge Matters Campaign is described as an “advocacy effort” that showcases “high-quality, knowledge-building curriculum.” There is nothing wrong with advocating for favorite commercial programs. That just isn’t the best basis for providing objective reviews or sound review tools. They can provide such reviews, but I think their documents should transparently state those prior commitments. One can only wonder about a review process that starts with the programs they like and then formulates review criteria based on that.

The Reading League and EdReports both explicitly claim to support “evidence-aligned” curricula. This is a bit of a shift for EdReports since originally its goal was to ensure alignment with the Common Core State Standards. The Knowledge Matters Campaign does not seem to make that “evidence-based” assertion, though their review protocol mimics the others in how it uses research citations.

The point here is that these kinds of reviews often have motives other than those of the educators who use them. Unless you can be certain of their motives — both in terms of declared purposes and in their alignment with those claims — buyer beware! It’s one thing to try to ensure that school practices are in accord with what we know; it is quite another to establish review criteria based on other considerations, no matter how well-meaning those considerations may be.

All these reviews are spotty at best when it comes to this alignment, so I would discourage curriculum selection processes that depend entirely or mainly on any of these reviews. I wouldn’t ignore them entirely; I would just independently verify each demerit they assign — including determining whether that criterion even matters.

A cool thing that the Reading League does is that it shares the publisher’s responses to their “red flag” warnings. That kind of transparency is good because it should help school districts consider both sides of an issue. For instance, in a program I’m involved in, the Reading League flagged a particular practice their reviewers didn’t like. The fact that this device was in three lessons out of about 900 in the program, or that we could provide substantial research support for the specific practice, made no difference to them. In such instances, having both the review claim and the publisher response should help districts examine such issues and decide for themselves whether that is really a problem or a big enough problem to matter.

That’s how it should work. When districts surrender all judgment to these organizations — refusing to consider or purchase any program that gets dinged no matter the evidence — then the game is lost. Instead of getting the best programs available, schools will end up with the programs that best meet some groups’ ideological positions.

2. What constitutes evidence?

Despite the rhetoric of these groups, the term “evidence aligned” is meaningless. Often there is no direct evidence that what is being required has ever benefited children’s learning in a research study.

By contrast, the National Reading Panel and the What Works Clearinghouse have required — before they will say anything works — that studies have directly evaluated the effectiveness of that something and found it to be advantageous to learners.

The review documents cite research studies for each criterion. This may seem convincing to some district administrators. However, if you look more closely, you’ll find that the evidence is woefully uneven. In some cases, the research is substantial; in others, there is no direct evidence — only poorly controlled correlational studies, or evidence that a particular topic is important but no proof about how pedagogy might best address it.

I wish they would all take the approach that the What Works Clearinghouse (WWC) takes. The Clearinghouse allows their guest experts to make any claims they want to, but then it checks out the nature and quality of the evidence supporting those claims. Their reports tell you what the experts say, but then report how strong that case is in terms of actual research support.

That kind of reporting would allow districts to know that the phonics requirements for Grades K-2 were supported by substantial research, but that the phonics claims for the upper grades are proffered with little evidence.

What Works would allow the encouragement of decodable texts and of favored approaches to teaching background knowledge. But they would require an admission that those criteria are not really evidence aligned.

Sadly, too many district administrators assume that the opposite must be true. They are wrong. These reviews adopt review criteria and then seek any kind of evidence to support those choices — no matter how unconvincing and uneven that evidence may be.

3. Grain-size problems

Not all the criteria included in these reviews appear to be of equal importance.  

For example, the Reading League requires that reading programs include explicit teaching of both phonics and handwriting. A program that lacks either can be smacked for the omission, and some districts, by policy, will then prohibit the consideration of such programs.

Don’t get me wrong. It makes sense for schools to explicitly teach both phonics and handwriting. Core reading programs should include phonics, given that such instruction contributes fundamentally to early reading development, and it seems prudent to align the decoding lessons with the rest of the program (though, admittedly, there is no direct research evidence supporting that concern).

The benefits of teaching handwriting, however, do not accrue directly to reading, and I am aware of no data that shows either necessity or benefit of aligning such instruction with the rest of the reading lessons. It would not be perverse for a district to purchase separate reading and handwriting programs.

To me these criteria are not equivalent. A low mark in one should be a real concern. A low mark in the other may be informative but it should not be determinative. There are many such false equivalences throughout these evaluation schemes: some criteria flag essentials and some could safely be ignored.

4. Measurement problems

Even when a criterion is spot on, one might wonder how to determine whether that criterion has been sufficiently addressed. The Knowledge Matters Campaign encourages the teaching of comprehension strategies — a reasonable thing to do given the extensive research supporting their benefits — and yet, how much strategy teaching will be sufficient to meet the standard? It is easy to see how two well-meaning and careful reviewers could disagree about an issue like that.

The teacher letter included above points out a reading program that devotes 18 weeks to one content topic. Such a program would surely meet the Knowledge Matters Campaign criteria, though to me that sounds like overkill — certainly not something that possesses research support. If I reviewed it, I’d be critical that the program is too narrow in focus, while other reviewers might conclude that it addresses the knowledge-building criteria appropriately.

The more specific the review criteria are, the more reliable the reviews should be. However, the more specific they are, the harder it is to justify them given the nature of the research. For my part, I’d prefer that everyone have all the information available:

We reviewed this program and judged that it met our knowledge building criteria. That’s a plus. However, it is so narrowly focused that we wondered if that is best (and we know of no direct research evidence on this matter). Students taught from this program are likely to know more about electricity than any group of fourth graders in the history of mankind. If they are ever again asked to read about electricity, they will likely achieve the highest reading comprehension ever recorded — if they do not run screaming from the testing room.

There is a body of research on sentries and their ability to protect military installations. These studies find that the more specific and exacting the criteria for entering a camp (e.g., how exactly someone needs to know the password), the more likely that friendly troops will be shot. The more liberal those entry procedures, the more likely an enemy will gain entrance. The Knowledge Matters Campaign criteria look to me to be the most general and the Reading League ones seem most specific. That probably means that if you go with one, you will be more likely to reject sound programs and with the other, weaker programs may sneak through.

My preference would be for districts to appreciate the limitations of these reviews. That doesn’t mean ignoring their information, but considering their claims with the same gimlet eye that should be used with any of the claims made for the commercial programs. Do they just say there is a problem, or do they specifically document their concerns, perhaps like this:

We did not think programs should include lessons that encouraged this-or-that kind of an activity. We reviewed six grade levels of this program and found it included 420 fluency lessons. It earned a demerit because twice in the second-grade program it encouraged the this-or-that activity.

That way, a district could decide whether such an inclusion mattered much to them, either in terms of how serious the infraction is or how extensive it is. It would also be good if the producer of that curriculum weighed in, either to admit they screwed up or to defend their approach. By reporting not just that there was an infraction of the review criteria but also the extent of the problem, districts would be better able to use the reviews appropriately.

5. Effectiveness versus potential effectiveness

Product reviews don’t tell us what works in terms of improving reading achievement. No, they only reveal the degree to which the program designs are consistent with research, standards, or someone’s ideology.

The National Reading Panel reported that fluency instruction improved reading achievement in grades 1-4, and for struggling readers in grades 1-9. These program reviews all require that programs address fluency, and in some cases they even specify some preferred details about that instruction.

The idea is that a program that includes fluency teaching like the fluency teaching delivered in the studies is going to be advantageous. That is a hope and not a fact, because most core programs have no data showing that their fluency lessons boost reading achievement.

We are aiming for a possibility. The idea is that when research proves that an approach can be effective, we should encourage schools to replicate such instruction in the hopes that they will obtain the same results.

This is nothing like the standards that we have for medical and pharmacological research. They must show that their version of a treatment works; it is not enough to show that they are trying to do something like what has worked elsewhere.

This is an important distinction.

The Bookworms program apparently received low reviews from EdReports, despite having rigorous, refereed research studies showing its actual effectiveness — not that it was designed to look like what was done in the studies, but that its design really did pay off in more student learning.

I’m flabbergasted that EdReports (and the other reviews) don’t leave themselves an out here: If there is sound, high-quality research showing the effectiveness of a specific program, who cares whether it matches your predictive review criteria? Program A looks like the research, Program B doesn’t look as much like the research, but it is very effective in teaching children.

The review agencies should have a provision saying that they will give an automatic pass to any program with solid direct research support concerning its actual effectiveness.

I would still review their program. However, my purpose here would be to try to figure out how a program that failed to meet my criteria did so well. Perhaps the reviews were sloppy, which might require more rigorous training of reviewers. Another possibility is that the criteria themselves may be the problem. Maybe some of the “non-negotiables” should be a lot more negotiable after all.

6. Usability

I’m a bit thrown by the usability requirements in some of these reviews. I agree with them in one sense: if teachers struggle to use a program, then it’s unlikely to be effective no matter what non-negotiables it addresses.

However, I know of no research that can be used as the basis of evaluating usability, so what constitutes it is more of an act of reason than of evidence alignment. Knowledge Matters Campaign wants programs to include not just what it is that teachers are supposed to do, but explanations for why those things should be done. I love that, but have no idea whether that would improve practice.

I think the reason for this emphasis on usability may come from the fidelity evaluations that are now common in instructional research studies. Researchers, to ensure that it is their instruction that is making the difference, do many things to try to guarantee fidelity to their plan. This includes teaching the lessons themselves, using video teachers, scripting the lessons, observing their delivery, and so on. That kind of thing makes great sense in a 12-week research study that didn’t include any second-language students or kids below the 40th percentile and was only taught to children whose parents granted approval.

It is a lot harder to argue for especially narrow, prescriptive, non-adjustable approaches — lessons aimed at making certain the teachers don’t screw it up by varying from what’s in the teacher’s guide — in real classrooms. The idea of teaching everyone the same lesson, no matter what they already know, may make sense to some “reading advocates.” Nevertheless, it is a lousy idea for kids and reading achievement.

Many districts, in their selection procedures, require tryouts of new programs — or at least they used to. Some of their teachers try to deliver the lessons for several weeks to see how workable the product may be. This makes a lot more sense to me than the armchair usability criteria in these reviews. Again, districts make a big mistake in ceding their responsibility to these reviews. Some things should be done carefully in-house.


What are the big take-aways here?

  1. The development of commercial programs for teaching reading is a serious endeavor that can provide valuable supports to teachers. However, such program designs are fallible. They require the contributions of dozens, perhaps hundreds, of people whose knowledge and efforts can vary greatly. It is sensible for districts to purchase such programs, and essential that they take great care in doing so to end up with supports that really help teachers and students succeed.
  2. There are benefits to having third-party reviews of these kinds of programs, both by government agencies (e.g., What Works Clearinghouse) and by nonprofits that are not commercially entangled with the corporations that sell these programs. These external reviews can warn consumers (the school districts) of egregious problems, and they can push publishers to do better.
  3. These kinds of reviews are likely to be most useful when they depend substantially on high-quality research — approving programs that have been proven to provide learning advantages to students and encouraging the close alignment of programs with existing research data.
  4. Just as school districts need to be skeptical of the undocumented claims of commercial companies (the folks who sell the programs), they must be just as skeptical of the claims of those who critique them. The more transparent, specific, and well-documented these critiques are, the better. Districts should be wary of simply accepting any negative judgments by reviewers — requiring evidence that the criteria are truly essential to quality and that research really rejects what a program is doing.
  5. Districts should adopt sound procedures for choosing programs. These procedures should include consideration of these reviews. However, no district should adopt policies that automatically accept or reject programs based on these reviews.



Selected comments

Comment from Matt

Tim, what do you think about the IES review process?

Response from Shanahan

Matt,

As for those IES criteria, unlike the ones I wrote about, I agree with all of the criteria that they include (they have been careful to stick to the research, rather than making up stuff just because they like it). However, I think they are incomplete, particularly when one considers the role of cognitive strategies and language in comprehension development, and they do not connect reading and writing at any level (spelling and phonics, writing about text and comprehension). If I were director of reading for a school district (again) and wanted to create text selection criteria, I would start with these and then look at the others to see if something defensible was in them that was not here.

Comment from Donald

Would you also agree with the following statement? “It would not be perverse for a district to purchase separate reading and spelling programs.” Quite frankly, I think dedicated spelling programs such as Zaner-Bloser Spelling Connections (which I have taught with outstanding success), and others like it, are quite superior to the embedded spelling programs I have seen in recent reading adoptions.

Response from Shanahan

Donald,

That isn’t unreasonable, and as with my handwriting example, it would be foolish for a district going that way to reject a reading program because one of these reviews marked down its spelling program or its lack of a spelling program (why would it matter if you’re not going to use that anyway?).

Comment from Christy

I was hoping you would address the concern about topical vs. thematic units. Our district is currently reviewing ELA textbooks for 6-12 and most of the units are thematic, not topical. I don’t think this will result in the same vocabulary and background knowledge building as a topical unit based on my understanding of the science of reading, but I am still learning, so I’m not sure.

Response from Shanahan

Christy,

There is no research showing that either of these necessarily does any better than the other. These days there are folks promoting reading programs with a content emphasis (reading about photosynthesis, planets, Civil Rights movement, etc.). There is no evidence that these programs improve students’ reading ability any more than traditional programs have — though they very well might increase student knowledge of some of those topics. Given that, either approach might make some sense.

My take is that the books you are considering are aimed (or should be aimed) at teaching literature. With literature, it is very reasonable that textbooks might be focused on literary themes. Indeed, a high school unit on some topic drawn from science or history may expose students to very different language than one usually gains from literature — but that is a plus, and it is why it’s essential that English classes put great emphasis on literature. The “content” of literature includes things like human development, relationships, and emotions — love, loyalty, trust, embarrassment, greed, and conflict are more likely to come up in a literature class than a science class.

There tend to be two different takes on theme — one that treats it like topical subject matter (the one-word list above is an example of that), and another (the preferred approach) that expresses themes more as a stance that someone takes on a topic. A unit on trust, for example, might expose students to several stories, poems, or essays that get into themes like these: it takes time to gain trust; it is difficult to trust adults during one’s loss of innocence; it is possible to regain trust.

There is a content to literature, but as you can see it is very different from the kinds of information that secondary students should be reading about in their other classes.

Another way that these programs may organize instruction is by genre (reading different genres of literary fiction — romance, adventure, and so on — along with poetry and literary nonfiction) or by literary elements (theme, characterization, plot, meter, etc.). That is a reasonable choice also.

I think you need to think hard about what it is that you are trying to teach students in an English class (what they should know and be able to do by the end of the year or years) and choose accordingly. You can’t depend on the science of reading to sort this out.

Comment from Sabrina

I was hoping the same as Christy. We are currently going through a K-5 adoption. Following our screening process, we have two finalists and they differ in this way exactly. One claims to be knowledge building, and it may build a bit of knowledge about some things, but is broad and thematic, and heavily skills/strategies focused, much like what we are currently using. The other definitely takes a more topic-based approach where students spend longer on that topic and learn more specifics, with skills/strategies embedded and taught to K-2 somewhat in a separate capacity as well. We’re not finding clear research that helps us with this decision. Both programs did just fine on EdReports. One was not included in the first round of The Reading League reports, while the other received a good evaluation. Based on the work the adoption steering committee has done (I’m mostly referring to a book study of The Knowledge Gap), we feel like the more obvious knowledge-building program is where we should be going, rather than a program that looks an awful lot like what we already use that isn’t serving our students very well. But we also wonder if we’re just drinking the Kool-Aid, so to speak, and buying into an idea that may or may not be as sound as we think.

Response from Shanahan

Sabrina,

The reading instruction provided in Grades K-5 needs to be a bit broader than what secondary English teachers are responsible for — in the elementary grades we kind of do everything (we aren’t as specialized as the middle and high school teachers).

The Knowledge Gap is a very interesting and well-argued book, but it is certainly not the science of reading. Natalie Wexler isn’t a scientist and she uses research to support her ideas rather than drawing her ideas from the research (that means she can ignore studies that don’t advance her point of view, etc.).

At this time, it is clear from the research that if content-oriented text is to contribute to more powerful reading ability, that will require much more than just including informational text. Those texts or units need to be organized and taught in ways that allow that content to be more than inert information — it has to be generalizable to other content. For the time being, there is no reason to believe that there would be any difference in the learning outcomes derived from those two types of programs. Make your decision more on how the programs may differ in decoding, fluency, reading comprehension strategies, written language (vocabulary, morphology, syntax, cohesion, text structure), and writing, and on the quality of the literature and content included — including its appeal to the teachers and students.

Comment from Mark

Don’t you think that the Knowledge Matters Campaign should also disclose that their parent organization, Standards Work, received money from ARC, Core Knowledge, and Open Up Resources for undisclosed reasons? The programs that these companies/organizations produce are all reviewed favorably on the Knowledge Matters website.

This information is freely available online.

Response from Shanahan

Mark,

I wasn’t aware of that conflict of interest. At one time I was on the Advisory Board of the Knowledge Matters Campaign. I was happy to help folks who were encouraging that we teach kids more content. However, recently they decided to change direction a bit, and it appeared to me that they were out to get certain commercial programs, so I dropped off. Conflicts of interest are a serious problem, and at least some of the made-up stuff in their criteria may have landed there if they were trying to float their own boat.

Of course, I could mention something similar with Reading League. Their standards promote the use of decodables to an extent that far exceeds the conclusions of any scientist who has written about this or the results of any of the published studies. Maybe it is not surprising that they are marketing decodables and draw income from that.

Whether or not there is any real conflict in either of these cases (perhaps the individuals who worked on these criteria were not aware of where the money comes from), the appearance of it should caution school districts about their use. I’m aware of no such conflicts with EdReports. I just hope they improve their process without knuckling under to the pressure groups that have an axe to grind.

See all comments here

About the Author

Literacy expert Timothy Shanahan shares best practices for teaching reading and writing. Dr. Shanahan is an internationally recognized professor of urban education and reading researcher who has extensive experience with children in inner-city schools and children with special needs. All posts are reprinted with permission from Shanahan on Literacy.

Publication Date
May 29, 2024