After studying the classrooms of 3,000 teachers, including some in Pittsburgh Public Schools, the Bill & Melinda Gates Foundation has concluded the best way to determine teacher effectiveness is to use a combination of state test results, observations and student surveys.
The foundation Tuesday released the final findings from its three-year Measuring Effective Teaching project.
The MET study -- which focused on math and English language arts teachers -- builds on its earlier reports by fleshing out the best ways to use the measures.
It concludes that using the measures correctly identifies the more effective teachers, those who cause students to do better on state tests as well as on other, more cognitively challenging tests.
It also concludes that testing data give teachers better feedback for improving their practice.
Many teachers do not have state test data for their students because the tests are given only in a limited number of disciplines. In Pittsburgh, only about 35 to 40 percent of teachers have state test scores that can be used for evaluation.
In a phone news conference, Vicki Phillips, director of education, College Ready - U.S. Program for the foundation, said more work needs to be done on how to evaluate teachers who lack state test data.
"I think you'll hear us have more to say about it going forward," she said.
When test data are used, the report on reliable measures "unambiguously" recommends that the scores be adjusted to account for the students' prior performance.
Traditionally, Pennsylvania teachers have been evaluated based on classroom observation -- often a single visit by a principal -- which resulted in 99 percent of them being judged satisfactory.
The MET study concluded that classroom observation alone -- even when done twice by one trained observer and two more times by another -- "performed far worse than any of our multiple measures composites."
Starting in 2013-14, Pennsylvania will require school districts to base half of the teacher evaluation on observation and the rest on "multiple measures of student achievement."
For the observation portion, the Pennsylvania districts will be using Charlotte Danielson's Framework for Teaching, which was used in the MET study and is used in Pittsburgh Public Schools.
In addition to 50 percent for observation, under the state's formula for multiple measures, 15 percent will be based on building-level data, 15 percent on teacher-specific data and 20 percent on elective data. The building-level and teacher-specific data are based on state test results.
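As a rough illustration of how those percentages might combine, the sketch below computes a weighted composite in Python. Only the four weights come from the state formula; the component names, the shared rating scale and the sample scores are assumptions made for the example, since the formula as described here does not say how each component is scaled.

```python
# Weights from the state formula described above (they sum to 100 percent).
PA_WEIGHTS = {
    "observation": 0.50,
    "building_level": 0.15,
    "teacher_specific": 0.15,
    "elective": 0.20,
}


def composite_score(scores, weights=PA_WEIGHTS):
    """Return the weighted sum of component scores that share one scale."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    return sum(weights[name] * scores[name] for name in weights)


# Hypothetical teacher; every score below is invented for illustration.
example_scores = {
    "observation": 2.0,
    "building_level": 3.0,
    "teacher_specific": 1.0,
    "elective": 2.0,
}
example_composite = composite_score(example_scores)
```

Because observation carries half the weight, a one-point change in the observation score moves this composite more than three times as much as a one-point change in either test-based component.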
The state law does not specifically name student surveys, but Pittsburgh plans to seek permission to use such surveys and to allot different weights to the measures than the law states.
Pittsburgh, which has a $40 million grant from Gates for the district's Empowering Effective Teachers program, also has a much smaller grant for its participation in the MET study, which largely included the student surveys.
Sam Franklin, executive director of the district's office of teacher effectiveness, said, "We are one of the few districts in the country that actually has each of the lenses on effective teaching that are supported by the MET research.
"Our teachers have access to information that can really change how they teach and help them get better results for students."
Statewide, based on the results of the new state measures, Pennsylvania's teachers will be divided -- by cut scores yet to be determined -- into "distinguished," "proficient," "needs improvement" and "failing."
The MET project cautions against dividing teachers into four equal-sized groups. On classroom observations, for example, it said that 50 percent of teachers scored within 0.4 points of each other on a 4-point scale, with only 7.5 percent scoring below a 2 and 4.2 percent above a 3.
"This would suggest a large middle category of effectiveness with two smaller ones at each end," the feedback report stated.
"Rather than trying to make fine distinctions among teachers in this vast middle, efforts would be better spent working to improve their practice."
Nor does the study say the teacher data are only for high-stakes decisions such as evaluation.
"Multiple measures provides rich information to help teachers improve their practice," the feedback report said.
Often, the multiple measures are combined into a single index, with each one given a different weight.
The report says that allocations of 33 percent to 50 percent of the weight on state test results "are sufficient to indicate meaningful differences among teachers."
For all three measures, it recommends a balanced approach, such as 33 percent for each of the three measures or 50 percent for student achievement and 25 percent each for the other two.
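To see how the choice between those two balanced weightings can shift a result, the Python sketch below applies both to the same set of scores. The measure names and the scores are hypothetical, and the example assumes each measure has already been put on a common standardized scale, which is an assumption of this sketch rather than something the report specifies.

```python
# The two balanced weightings mentioned above.
EQUAL_WEIGHTS = {
    "state_tests": 1 / 3,
    "observation": 1 / 3,
    "student_survey": 1 / 3,
}
ACHIEVEMENT_HEAVY = {
    "state_tests": 0.50,
    "observation": 0.25,
    "student_survey": 0.25,
}


def weighted_index(scores, weights):
    """Combine standardized scores into a single index using the given weights."""
    return sum(weights[measure] * scores[measure] for measure in weights)


# Hypothetical teacher: strong test results, weaker observation rating.
hypothetical = {"state_tests": 0.8, "observation": -0.4, "student_survey": 0.2}

equal_index = weighted_index(hypothetical, EQUAL_WEIGHTS)
heavy_index = weighted_index(hypothetical, ACHIEVEMENT_HEAVY)
```

For this invented teacher the index is about 0.20 with equal weights and about 0.35 when student achievement counts for half, so the same classroom can look noticeably different depending on the weighting chosen.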
Steve Cantrell, chief research officer of education at College Ready, for the foundation, said the best combination would depend on the goal, such as whether the result is intended to predict state test scores.
"It's going to be a local decision. What we found within our data -- given the trade-offs, balance seems best," he said.
Over the course of the MET project, some other measures were looked at as well.
Teacher pay in Pennsylvania typically is based on years of experience and amount of education.
But Mr. Cantrell said, "A master's degree and experience predicted about a third as well as our composite measures. They fared fairly poorly in comparison."
The reports can be found at metproject.org.
Education writer Eleanor Chute: firstname.lastname@example.org or 412-263-1955.