In resolving the recent teacher strike in Chicago, the district and union agreed that student achievement gains will be used for the first time in teacher evaluations, counting for 30 percent of a teacher's overall evaluation rating. This reflects a national trend in teacher evaluation, but it raises a question that was ignored in the public discussion: What about the other 70 percent?
In fact, even as student growth measures are being incorporated into teacher evaluations in Chicago, Pittsburgh and across the country, big changes to the other components of teacher evaluation systems are underway. Away from the heated public debate about using student achievement growth to measure teachers' "value-added" contributions, school districts like Pittsburgh's are working to replace moribund, mechanical, infrequent classroom observations with rigorous new measures of professional practice. Implemented alongside student growth measures, better measures of professional practice can make evaluation fairer, inform the improvement of teaching practice and ultimately promote better student achievement.
The fact that it is possible to measure teachers' contributions to student learning reflects the development of sophisticated statistical methods. Value-added statistical models have undergone extensive scrutiny by scholars -- and appropriately so. They need to be fair, so that a teacher's rating is determined by his or her own contributions rather than by the advantages or disadvantages of the students he or she happens to teach. And they need to be sufficiently precise to avoid misclassifying large numbers of teachers simply because of random chance.
Fortunately, the best value-added methods address both of these challenges. By accounting for student background characteristics and multiple measures of each student's prior achievement, and by averaging a teacher's rating across multiple years of teaching, the methods level the playing field and substantially reduce rates of random error.
But even the smartest number-crunchers cannot solve one inherent limitation of value-added methods: The standardized tests they rely on are imperfect. Recognizing the limitations of state assessments, the Pittsburgh Public Schools are pioneering the use (alongside state assessments) of locally developed assessments that are designed to align with the curriculum of specific courses. Applying value-added statistical methods to multiple measures of student growth can provide a richer picture of a teacher's contribution to student learning.
Even so, no assessments can measure everything we expect students to learn and teachers to teach. So student achievement growth will never provide a complete measure of teacher performance. Good measures of teachers' professional practice are therefore critical alongside value-added measures.
Teaching is difficult to evaluate. Most of it occurs behind classroom doors outside the presence of colleagues, and administrators typically do not spend much time in classrooms observing teachers. A common result is that more than 95 percent of teachers are rated as "satisfactory," while no one is recognized as exemplary. In consequence, lacking better information, school districts sometimes end up leaning too heavily on value-added data. The absence of information about individual teachers is mirrored in the field's woefully shallow knowledge about the instructional practices used by the best teachers, reflecting a century of national underinvestment in educational R&D.
Fortunately, this situation is rapidly changing. Like many other districts across the country, the Pittsburgh Public Schools are working hard -- in close collaboration with the Pittsburgh Federation of Teachers -- to develop, pilot and implement measures of professional practice that aim to be more comprehensive, rigorous and useful than reports derived from cursory annual drop-ins by a principal.
In policy statements of the American Federation of Teachers as well as in the halls of the U.S. Department of Education, the mantra of "multiple measures" of teacher performance echoes loudly. Multiple measures are relevant in several domains. Value-added models can be applied to locally developed curriculum-based assessments as well as to state standardized tests. Classroom practice can be observed by teacher peers as well as by principals. And the perspectives of students -- who observe their teachers more than anyone else does -- can be taken into account.
Like value-added measures, these new teacher-performance measures must be scrutinized to ensure their fairness, validity and reliability. The cost of multiple observations is also a real challenge. But the development and implementation of improved measures of professional practice is critical, because they have the potential to do much more than provide better ways to identify the strongest and weakest teachers.
As union leaders like Randi Weingarten of the AFT have recognized, these measures present unprecedented opportunities to help teachers learn to improve their practices by identifying the instructional strategies associated with high achievement growth. In the long term, improving teacher productivity and applying rigorous standards may not only improve the achievement of students, but also raise the esteem in which the teaching profession is held.
Brian Gill is a senior fellow and Duncan Chaplin is a senior researcher at Mathematica Policy Research, a think tank headquartered in Princeton, N.J. Mr. Gill is leading a team of Mathematica researchers who are assisting in developing value-added measures for the Pittsburgh Public Schools.