What is the Evidence?

Deborah Meier, in collaboration with her faculty at Central Park East Secondary School, developed five habits of mind that were at the heart of their school. One of those habits of mind was to ask “What is the evidence?”

I was rereading an article on Direct Instruction(1) that I have my teaching credential students read. The article ends with the claim that Direct Instruction, unlike discovery approaches to learning, has research evidence demonstrating its effectiveness. However, as educational reformer Deborah Meier keeps reminding us about such claims, we always have to ask: What counts as evidence? How is achievement defined? Effective at what?

In educational research, test scores almost always constitute the evidence, and increasingly those are the scores on the standardized tests each state mandates to meet the rules of the No Child Left Behind legislation.

However, we must look at all the assumptions that are built into using such test scores as evidence of learning. The assumption that test scores are meaningful and accurate is one that many educational experts have questioned (see, for example, Alfie Kohn’s The Case against Standardized Testing(2), or the FairTest website for more in-depth information on this topic).

One assumption is that such tests actually test what they claim to test. If what we really want to know is how well people can use a skill in an authentic situation, how close to that performance are their results on a multiple-choice, paper-and-pencil test? Can you imagine if we used only the written test to decide whether someone could drive? When researchers have looked at how people do at using math algorithms in school, and then at how they try to solve real problems that require the same math in their daily lives, they see little connection between the two.

Even in something that seems as basic as reading, where one does read a passage in the test and then answer questions about it, researchers have found that the reason students get the answer right or wrong often has as much to do with their prior knowledge and cultural assumptions about the content as it does with being able to read the passage(3). And often, in the case of so-called reading tests, it is not reading at all that is tested, but what are called reading subskills, such as recognizing certain sound or spelling patterns, which are believed by some to be precursors to skilled reading. However, doing well on such subskills has not been shown to be connected to comprehension of what one reads (see my article on Reading First for more on this(4)). Typical standardized reading tests also test other aspects of knowledge of language, such as recognizing synonyms and homonyms. While these and others may be good terms to understand, does knowing the terms make one a better reader, or just more knowledgeable about linguistics?

The next major assumption I want to challenge is that short-term results on such tests predict long-term results. This is often not the case. If early learning is sped up in order to improve short-term test results, it can leave students with a shaky foundation, actually leading to poorer long-term results. There is a parallel in business. When financial institutions and businesses go for short-term profits to please stockholders, it is often at the risk of the long-term stability and interest of the company, as we have seen with our recent economic collapse. In math, teaching the rote memorization of algorithms may help students pass the next test, where each problem is presented just as you taught it. But in the following years, without a foundation in the concepts that underlie those algorithms, such students will not have the ability to understand more complex concepts and solve the more complex problems that go with them, and their scores will collapse like a house of cards. This sort of short-sightedness exists in many areas of the curriculum, especially when there is heavy pressure to get those short-term results.

Another aspect I want to challenge is whether the possible side effects have been looked at. When pharmaceutical companies test new drugs, they are required to look not just at whether the drug cures the ailment, but also at the possible side effects on other aspects of health. This never seems to be done in educational research. In the pursuit of raising test scores, might the new methods create other problems? We act as if the child is made up of discrete skills and knowledge, each of which can be taught and measured separately, without an effect on anything else, rather than looking at the child as a whole being. For instance, are we increasing obesity as schools cut recess and other activities in which students are more active in order to spend more time studying the tested subjects?

Even in terms of the activity we are testing, might the way we teach have an effect not just on how well one does it, but on whether one wants to do it? Stephen Krashen pointed out in his book on whole language(5) that studies comparing free reading time to direct instruction of reading found the test scores were similar. However, which is more likely to lead to a love of reading: students choosing what they read, or students reading decontextualized texts over which they have no say, and then getting tested regularly on those passages? Yet this love of and desire for reading is not assessed.

The last assumption I want to examine is that what we are testing is what matters most. No one questions that students should be able to read, write and do arithmetic. But if you ask parents and teachers what they mean by a well-educated person, and what they want their children to get out of school, these generally are not the first things they mention. How do the students treat others? How motivated are they for further learning? Do they like school? Do they have empathy for others? Are they likely to be civic minded and civically active?

Other questions we might ask are: How persistent is a student in the face of difficult tasks? What is their ability to put together knowledge and abilities from a variety of areas and use them in novel ways? Can they express their ideas effectively? Do they listen to the ideas of others? How and what we teach can and does have effects on these as well. There are many other questions that each of us might think are equally or more important. Yet these almost never get asked or taken seriously in educational research, particularly not in the research that is used for policy. The very question of what is most important to assess is not even asked.

There have been a few exceptions to this trend. In the area of progressive education, for instance, I can name several. In the 1930s, there was the Eight Year Study(6), which matched students who went to high schools implementing progressive methodologies with those in traditional high schools, and then followed them through college. This study looked at a wide variety of definitions of success, finding that those who attended the more progressive schools showed better results.

David Bensman did a study of the progressive Central Park East schools (a group of public schools in New York City serving predominantly low-income African-American and Latino students) that looked not just at the graduates' test scores, but also at college, employment, civic involvement and their impressions of the impact of the school on their lives(7). He also found that these students did much better than their counterparts who went to neighboring schools.

A friend just sent me a recent master’s thesis on the Peninsula School, a progressive independent K-6 school. It compared the graduates' high school achievement to that of a random sample of their high school classmates who had gone to other elementary schools, and found that the students from the progressive school did better academically. Not only that, but the study also found they had better attitudes toward school and their learning experiences(8).

A study of types of programs for second language learners, while not going beyond test scores, was at least longitudinal, using a very large sample and following students throughout the grades. It found that programs that used more of the primary language, and those that used methodologies where language was taught in context-embedded ways, had better results(9). This was despite the fact that in the early grades the students with more English instruction and less primary language did better. Short-term results were negatively correlated with long-term results in this case.

Whenever someone says that the evidence proves that a certain method is better, one must ask: What is that evidence? Did the assessment really match your definition of what it means to be able to do or know that? Were the results short or long term, and if short term, what is the evidence that these short-term results will add up to long-term success? It is also important to ask what the effects are on other aspects of learning or the life of the student. And most importantly, are they assessing what really matters?


1. Tarver, Sarah G. “Direct Instruction: Teaching for Generalization, Application and Integration of Knowledge.” Learning Disabilities 10, no. 4 (2000): 201-07.

2. Kohn, Alfie. The Case against Standardized Testing: Raising the Scores, Ruining the Schools. Portsmouth, NH: Heinemann, 2000.

3. Meier, Deborah. “Why Reading Tests Don’t Test Reading.” Dissent, Fall 1981, 457-66. http://deborahmeier.files.wordpress.com/2012/02/1981_whyreading.pdf; and Meier, Deborah. “The Fatal Defects of Reading Tests.” In The Open Classroom Reader, edited by Charles Silberman. New York: Random House, 1973.

4. Meier, Nicholas. “Reading First.” Critical Literacy 3, no. 2 (2009): 69-83. http://www.criticalliteracyjournal.org

5. Krashen, Stephen D. Three Arguments against Whole Language & Why They Are Wrong. Portsmouth, NH: Heinemann, 1999.

6. Aiken, Wilford M. The Story of the Eight-Year Study. New York: Harper and Row, 1942.

7. Bensman, David. Central Park East and Its Graduates: Learning by Heart. New York: Teachers College Press, 2000.

8. Dinwiddie, James, and Anne M. Young. “Comparative Outcomes for Progressive School and Non-Progressive School Students.” Master’s thesis, San Jose State University, 2010.

9. Thomas, Wayne, and Virginia Collier. “School Effectiveness for Language Minority Students.” 97. Washington, DC: National Clearinghouse for Bilingual Education, 1997. http://www.escholarship.org/uc/item/65j213pt
