Schools and the Business Model

Policy makers and the current so-called reformers talk about the need to run our public educational system more like a business. From what I see, that is being done, with the same disastrous results.

We see businesses paying their CEOs exorbitant salaries while reducing their labor forces and cutting the wages of those who actually do the work. Often these CEOs are hired irrespective of their knowledge or experience in the field or of whatever product or service that company is engaged in. Then when their policies fail, they are paid huge sums to buy out their contracts, while a replacement CEO is paid even more.

In education we see the same rise in wages for superintendents, especially of large school districts, while cuts are being made everywhere else. Often these new leaders come with no educational background at all, their only experience being in the business world. And when they fail, they too see their contracts paid off while a new superintendent is hired at a higher salary.

In business, more attention is being paid to short-term profits to give investors a quick return on their money, often at the cost of long-term quality or stability.

In education we see schools forced to find ways to get short-term test score gains per Federal and State mandates, gains that often come at the cost of building a solid educational foundation and understanding.

In business, companies engage in cooking the books to make their profits look better than they might actually be, and we read about these scandals almost daily. In education we see districts and schools cooking the books and engaging in practices to make their test scores and other data look better, and we act just as surprised when the same scandals appear in education.

In business quick profits are the goal, at whatever cost, legal or illegal. In public schools better test scores are the legal tender to be attained at whatever cost.

Schools are learning to act more like contemporary businesses. In business such practices have taken the world’s economies to the brink of disaster and brought us the worst recession since the Great Depression, while the gap between rich and poor continues to widen. Such practices and mindset are doing the same to our public educational system, widening the gap between the quality of education for rich and poor.

One might notice that it is practically unheard of for the private schools where the elite send their children to use the rhetoric of business practices to describe their own schools.

Standards-Based Education

Currently in education there is a lot of talk about standards-based education and the need for high standards. I will discuss in this column where that concept came from and how it has been distorted from its original use.

The idea of a standards-based educational system came from the work of Ted Sizer (1932–2009). In the early 1980s he was involved in a nationwide study of high schools that resulted in his book Horace’s Compromise (and later Horace’s School and Horace’s Hope). In Horace’s Compromise, Sizer describes the work of a typical teacher, and how, no matter how willing, well-meaning, and hardworking, the teacher cannot meet the needs of the more than one hundred students he sees every day. By the same token, students cannot do deep, quality work while jumping from one subject to another, each with a different teacher, mostly sitting there and being expected to soak up facts and concepts. In other words, the lack of real quality learning going on in schools was not the fault of teachers or students, but of the design of the institution and the compromises teachers and students made with each other to survive in such an institution.

Sizer proposed that instead of students being rewarded for successfully passing a certain number of courses, and being in school a certain amount of time, they be required to demonstrate the knowledge and abilities of a successful high school student through some sort of performance assessment where students actually showed they could apply what they had learned. He also posited certain attributes that schools would need in order to carry out such an education. What came out of that directly from Sizer and like-minded educators was an organization, the Coalition of Essential Schools, which holds a set of ten common principles that schools doing such work adhere to. This organization supports schools in trying to make the changes needed to apply these ideas. According to Sizer, how schools would measure this success, and how each school would carry out those principles in practice, needed to be locally decided.

This idea of Sizer’s that students should graduate by being measured against a set of standards rather than just seat time became popularized in the 1990s. However, in many ways the concept got turned on its head. For one thing, the term “standards” took on a new meaning from its usual everyday meaning of a level of quality. Instead “standards” became laundry lists of facts and concepts, both broad and discrete, to be learned, as well as levels of performance. These standards, rather than being locally decided as Sizer proposed, have been mandated by State authorities (and now we are moving to National mandates). In most states these laundry lists of standards are typically so long that one expert declared that it would take over 20 years if students were just exposed to the material for each standard, and much more if they were really expected to master them.

The other distortion is that meeting these standards is measured by standardized tests. Performance has come to mean not what Sizer had in mind—the ability to carry out real world tasks that used the knowledge and abilities that schools decided were important—but how one “performs” on a standardized test. These standardized tests are designed to test students’ recall of a random sample of what is on that laundry list of facts and concepts. High standards have come to mean high scores on such tests.

Sizer’s idea was that graduation by standards should free up schools to look and act differently, and free up students to demonstrate their knowledge in a variety of ways. The current practice of “standards” has meant the standardization of curriculum, pedagogy, and assessment of the students, as well as their teachers, schools, districts and states by the use of standardized tests.

So far there is no evidence that the current use of standardized curriculum and the high-stakes use of standardized tests has improved the quality of education. The achievement gaps these so-called reforms were meant to close are as great as or greater than before these changes. Graduation rates are overall no better, and we do not hear high school teachers claiming that students are coming in more prepared than they used to be. So far the only response the education establishment has offered to this lack of results is that we need more standardization, more tests, and higher stakes.

On the other hand, Sizer’s ideas of standards without standardization have also been tried out, at times with astounding success. One of the first schools to implement Sizer’s ideas was Central Park East Secondary School (CPESS), a public school of choice in New York’s East Harlem. Deborah Meier, building on her work at Central Park East Elementary School, collaborated with Ted Sizer on how to meld his ideas and hers to develop a secondary school on the Coalition principles. They came up with a school where students studied fewer topics, and worked with fewer teachers more intensively. All faculty and administrators worked as advisors who stayed with students over time and met with their advisory group daily. Students took part in internships in real-world professional settings. The standards of the school were upheld through a series of portfolios and defenses of those portfolios in front of a committee. Students graduated, not after a prescribed number of years or prescribed number of completed courses, but when they had successfully passed and defended those portfolios. The standards of Central Park East were built around certain “Habits of Mind” that the faculty believed were important in all facets of life and in all disciplines. To a large extent, the demonstration of the use of those habits was the rubric used to decide if the portfolio or defense of the work met the school’s standards. The students of CPESS succeeded at graduating high school, going on to college, and succeeding there, at rates far beyond those of their demographic equivalents in other public high schools in New York (see David Bensman’s fascinating book Central Park East and its Graduates, which documents his study of CPESS alumni).

After CPESS, a whole network of such schools sprang up all over New York City, and to some extent nationwide. Schools such as Urban Academy, the International High Schools, the Met schools, High Tech High, and Boston Arts Academy, to name just a few, continue in this tradition of high standards without standardization, of depth of knowledge over coverage, and of the importance of relationships with students as essential to successful education. While each of these schools looks very different, in each school one will see students who are passionately following their own interests while being held to a common set of high standards in a non-standardized curriculum. These schools have shown that they help students beat the odds in terms of graduation and getting into college. Even more importantly, these schools produce graduates with positive attitudes toward learning and toward their ability to shape their own futures and contribute to the larger society.

Assessment in California Teacher Education

(This column is adapted from a talk I gave at the University of Kyoto in January, 2012)

Those in the field of assessment often refer to two important standards that assessments are expected to meet: reliability and validity. Reliability means that the same results would be obtained if the assessment were given again, or if a different person were scoring the assessment.

Validity means that the assessment actually measures what it claims to measure, and that it predicts how one will perform in the future (Ormrod, 2005).

One type of validity is “face validity.” That is, it is accepted that the assessment actually does measure what it claims to measure, without needing statistical proof that it does. The road test portion of the driving test might be an example of that: We can easily agree that if we want to know if someone knows how to drive, we can sit in a car with them and watch them drive. Now, what constitutes good enough driving to pass the test is where things might get more difficult to agree on. Both how good is good enough, and which things matter most (e.g., how well the student parked, used turn signals, and obeyed signs, and how much each should count), can be controversial.

Other tests need to have their validity demonstrated. The paper-and-pencil portion of the driving test might be one of those. Do we have any evidence that those who do better on the written portion are actually better drivers?

However, while we accept that the road test has more face validity, we might wonder about its reliability, given its possibly subjective nature. The written portion is more reliable: you either filled in the correct bubble or answer, or you did not. On the driving portion, however, maybe the traffic conditions were more difficult when you took it than when your friend did, or maybe one instructor is a tougher grader than another. Maybe he had a fight with his spouse that morning! Despite these shortcomings, we accept the trade-offs as worth the advantages of such an authentic assessment. A built-in safeguard is the opportunity to retake the test a second time, a third time, or as many times as needed.

It is easy to create paper-and-pencil assessments that are reliable and easy to administer. However, how one’s score correlates to real-life application of the knowledge or skill that the assessment is designed to measure is harder to determine. Some, such as myself, argue that there is a built-in tension between reliability and authenticity. Real-life tasks and situations are by their nature not standardizable: Conditions vary, there is ambiguity, and there is more than one right way to approach a situation or problem. Creativity, a very important human trait, cannot be measured, and one’s ability to act effectively in novel situations is also by its nature not standardizable. Therefore, any attempt to assess one’s ability to use one’s knowledge and skills in real-life situations is likely to have a degree of unreliability and unpredictability.

Furthermore, what one person views as good enough, as quality, in most real-life applications also varies. A movie I thought was well acted and crafted, my best friend thought was poorly acted and rang false. And that is in movies made by highly paid, seasoned professionals! Multiple publishers initially turned down a number of best-selling classics in literature.

Compulsory public schooling in the United States was instituted at a particular point in history, alongside other changes and advances. Part of that was the belief in scientific experts and the new field of psychology as a science rather than philosophy, and the invention of standardized intelligence tests. Americans often want to find the one right way (Smith, 1988). Americans are known for their obsession with measuring everything, and putting numbers to everything. This has played into schools in the form of tests that can be reduced to numerical scores, and a belief that if everyone takes the same test at the same time in the same way, and the test is designed by outside experts, it is therefore objective.

Critics of the standardized tests of today point out the shortcomings of such tests: they don’t really have reliability at the individual level, they are culturally biased, and they are inauthentic, lacking actual validity in terms of measuring any important, useful skill, ability, or knowledge beyond the schoolhouse walls. They also object to the indirect influence of these tests in encouraging the teaching of discrete skills and rote knowledge that is quickly forgotten once the test is over (Hursh, 2005; Kohn, 2000; Meier, 2002; Ohanian, 1999).

However, it must be remembered that standardized tests were put in place in part as a seemingly fairer alternative to an aristocratic system, where social position and money were what decided who got into the best schools and got the best jobs. Standardized tests were seen as scientifically objective, and therefore as giving an equal chance to all. One could rise by one’s merit, not relying on family name or wealth (Smith, 1988).

What authentic assessment proposes is to let people show what they know and can do based on merit, and to reflect the skills and abilities the person should have more accurately than standardized tests do, by seeing how they apply that knowledge in a realistic situation.

Of course even “authentic assessment” is always a matter of degree. Authentic assessments are generally applied in somewhat contrived or hypothetical situations. In school situations it is rarely practical or even possible to have students demonstrate in the real life situation, and even authentic assessments give us just a sample of the full skill being assessed. To go back to the driving test example, even on the road test, not nearly every possible driving situation is encountered. The driver is asked to carry out a predetermined set of maneuvers at the direction of the tester over a relatively short period.

A large issue for authentic assessment is overcoming “bias,” which is really an issue of reliability: would a different scorer give that person the same score? One way to address this is through multiple assessors. For instance, some high schools that use portfolios or exhibitions for graduation, following the system developed at Central Park East Secondary School, use multiple assessors, while also having outside experts examine their system and watch it in practice to help them improve and refine it (Gold, 1993; Meier, 1995; Meier, 2002).

Another common way to obtain more reliability in authentic or performance-based assessment systems is to calibrate the scorers. A set of benchmarks is set up (examples of the performance assessment carried out at different levels). The scorers are first trained on what qualities to look for, and then they are asked to score these benchmark examples to see if they give them the expected scores. In theory, only when they can consistently give the expected scores are they considered calibrated, and therefore the scores are considered reliable.
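For readers who like to see such a procedure spelled out, the calibration check described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function name, the benchmark labels, the 1 to 4 score scale, and the 80 percent agreement threshold are all hypothetical and are not taken from any actual scoring system.

```python
from typing import Dict


def is_calibrated(expected: Dict[str, int],
                  given: Dict[str, int],
                  tolerance: int = 0,
                  min_agreement: float = 0.8) -> bool:
    """Return True if the scorer matched enough of the benchmark scores.

    expected: benchmark sample -> score agreed on by the trainers
    given: benchmark sample -> score the trainee scorer assigned
    tolerance: how far off a score may be and still count as a match
    min_agreement: share of benchmarks that must match (assumed value)
    """
    matches = sum(
        1 for sample, score in expected.items()
        if abs(given[sample] - score) <= tolerance
    )
    return matches / len(expected) >= min_agreement


# Hypothetical benchmark portfolios scored on a 1-4 scale.
benchmarks = {"sample_a": 1, "sample_b": 2, "sample_c": 3,
              "sample_d": 4, "sample_e": 2}
trainee = {"sample_a": 1, "sample_b": 2, "sample_c": 3,
           "sample_d": 3, "sample_e": 2}

# The trainee matched 4 of 5 benchmarks, which meets the 0.8 threshold.
print(is_calibrated(benchmarks, trainee))
```

Note that even this simple sketch requires someone to decide how close is close enough and how many matches are enough. Those decisions are themselves judgments, not measurements.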

I will now discuss efforts in California to bring a more authentic, yet standardized, assessment in a systematic way to credential teachers.

California teachers are given their credential based on a variety of factors. Some have been (and still are) standardized paper-and-pencil tests. However, as we have discussed, there is a sense that these are not good indicators of how well candidates would actually teach. These tests are used as measures of minimum knowledge of basic skills. On the more authentic side, these candidates are placed in classrooms to learn to teach alongside practicing teachers. In most teacher education programs in California, this is a semester-long placement. In some, such as where I currently teach, we require two semesters of student teaching. However, some worry about the standards of those assessing that experience. Were they tough enough? Are they consistent? There is no standard set of measures for that experience. The same could be said of the other criterion, that candidates pass their college courses to become a teacher. Were the standards from one program to another, even one class to another, consistent (Chung, 2005)?

The legislature of the State of California decided to institute a performance-based assessment system on top of the other criteria to provide an authentic, yet valid and reliable, way to measure whether a candidate was ready to become a teacher.

Linda Darling-Hammond of Stanford University led a consortium of universities with foundation support to develop such a system, called the Performance Assessment for California Teachers (PACT) (another similar system, the CalTPA, was also developed by the Educational Testing Service). In the PACT assessment, teacher candidates develop a 3-5 day lesson plan in mathematics or reading; they carry out the lessons in their placement and videotape those lessons. They document all of this, providing a detailed description of the context where they taught the lesson, describing the school, the classroom, and what they know about the students. They provide the lesson plans, and some discussion about those lesson plans. They reflect on what happened when they gave the lessons, what changes they made along the way, and what changes they might make if they were to give these lessons again. They select a 20-minute portion of the video for the portfolio, and discuss what is in that portion. They also provide examples of the assessment used in the lesson from three students of varying abilities. They discuss what they saw overall in reviewing the student assessment, and what they learned about the three students in particular.

This portfolio is then read and scored on a set of 12 rubrics. Several rubrics address issues of planning, several look at the execution of the lessons, and several others look at the issue of assessment. The issue of how the lessons helped students access and learn “academic language” is also assessed by two of the rubrics. The people who score these assessments go through a two-day scoring and calibration training, and must re-calibrate every year.

In practice, despite the training and calibration, there are still sometimes disagreements. (If a candidate fails, the portfolio is automatically scored by a second scorer, and a random ten percent of portfolios get two scorers to check reliability.) While in the large majority of cases we probably score the candidates similarly, there are cases where we have scored them quite differently. In such a system, there is room for interpretation. If the rubric asks us whether the lesson was appropriate for the students, or whether the teacher gave clear feedback, what one of us interprets as appropriate or clear may not be the same as what another does.
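The double-scoring safeguard can also be sketched in Python. Again, this is a sketch under my own assumptions, not the actual PACT procedure: the function names, the cut score of 2, and the use of simple exact agreement between two scorers are all hypothetical choices made for illustration.

```python
import random
from typing import Dict, List, Tuple

PASSING_SCORE = 2   # hypothetical cut score, assumed for illustration
AUDIT_RATE = 0.10   # the random ten percent that get a second scorer


def select_for_second_scoring(first_scores: Dict[str, int],
                              passing_score: int = PASSING_SCORE,
                              audit_rate: float = AUDIT_RATE,
                              seed: int = 0) -> List[str]:
    """Pick every failing portfolio plus a random sample of all portfolios."""
    rng = random.Random(seed)
    candidates = sorted(first_scores)
    failed = {c for c in candidates if first_scores[c] < passing_score}
    audit_size = max(1, round(len(candidates) * audit_rate))
    audited = set(rng.sample(candidates, audit_size))
    return sorted(failed | audited)


def agreement_rate(pairs: List[Tuple[int, int]]) -> float:
    """Share of double-scored portfolios where both scorers gave the same score."""
    return sum(1 for first, second in pairs if first == second) / len(pairs)


# Twenty hypothetical candidates; two of them fail on the first scoring.
scores = {f"cand_{i}": 3 for i in range(18)}
scores["cand_18"] = 1
scores["cand_19"] = 1

print(select_for_second_scoring(scores))

# Four double-scored portfolios; the scorers disagree on one of them.
print(agreement_rate([(2, 2), (3, 2), (1, 1), (4, 4)]))  # 0.75
```

An agreement rate like this tells us how often two scorers landed on the same number. It does not tell us which of them was right when they differed.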

These are the trade-offs for a more authentic system. For everything we do, everything we add, something is also lost or traded. On the positive side, in my institution it has meant that we have had dialog among the faculty about creating a more cohesive experience for the student. However, as with many high-stakes assessment systems, preparing our students for the assessment itself has taken significant university class time, time that used to be spent on content. In that way students may be losing out. Some also wonder to what extent the ability to write well and to theorize is being assessed, rather than the actual ability to teach. Though assessors are told that the writing itself is not being assessed, it is for the most part a written assessment, albeit of a performance (along with the short video clip).

It is certainly a system that is more uniform than what was in place before. From my experience with the system, it does appear that the stakes have been raised for student teachers. Are the teachers who have now gone through this system better prepared? Are we better at keeping out unprepared teachers, while not excluding prepared ones, through this system? Those are much more difficult questions to answer, and ones for which there are no solid “facts.”

The problem in the United States is that people are looking for a foolproof “fair” system. The attempt is to avoid human judgment, which is by its nature full of biases and, well, judgment! Standardized tests, paper-and-pencil tests, offer us the illusion of avoiding judgment, but they just move such judgment to the creator of the test. They offer reliability, often at the cost of meaningfulness.

In the United States we rely on human judgment for our criminal justice system and our courts, where very important high-stakes decisions are made, and while mistakes are made, maybe even often, it is seen as better than the alternative. Authentic assessment systems at heart require the same faith: a faith that the trade-off of allowing for human judgment is better than the reductionism required to assess in a standardized form. I believe we need to bring more of such human judgment back to our educational system.

References:

Chung, R. R. (2005). The performance assessment for California teachers (PACT) and beginning teacher development: Can a performance assessment promote expert teaching practice? (Unpublished doctoral dissertation). Stanford University. ProQuest Dissertations and Theses, 598 pp. Retrieved from http://search.Proquest.Com/docview/305434959?Accountid=10355

Gold, J. (Producer & Director), & Lanzoni, M. (Ed). (1993). Graduation by portfolio: Central Park East Secondary School [Videotape]. New York: Post Production, 29th Street Video Inc. http://vimeo.com/13992931

Hursh, D. (2005). The growth of high-stakes testing in the USA: Accountability, markets and the decline in educational equality. British Educational Research Journal, 31(5), 605-622.

Kohn, A. (2000). The case against standardized testing: Raising the scores, ruining the schools. Portsmouth, NH: Heinemann.

Meier, D. (1995). The power of their ideas: Lessons for America from a small school in Harlem. Boston: Beacon Press.

Meier, D. (2002). In schools we trust: Creating communities of learning in an era of testing and standardization. Boston: Beacon Press.

Ohanian, S. (1999). One size fits few: The folly of educational standards. Portsmouth, NH: Heinemann.

Ormrod, J. E. (2005). Educational psychology: Developing learners (4th ed.). Prentice Hall.

Smith, F. (1988). Joining the literacy club: Further essays into education. Portsmouth, NH: Heinemann.