Measuring Teacher Effectiveness: Are We Creating an Education Nightmare? -- THE Journal

06/08/11

We seem to be setting ourselves up for disaster in education. Efforts are underway not only to adopt value-added models to rate the effectiveness of individual teachers, but to use these models to identify those at the very bottom who might later lose their positions and those at the very top who might then be eligible for merit pay. Yet in all the policy discussions and public commentary, there's been little focus on learners and on how, precisely, we define the qualities of a good teacher.

The movement to revise methods for teacher evaluation to include such models came about in an effort to undermine current evaluation systems that tend to rate most teachers as satisfactory (Hull, 2011).

Educators are concerned because their evaluations will be tied to results of their students’ standardized testing, which are used in value-added calculations, while other factors, such as experience and training, are diminished. There's concern that the increase in testing that will be required to use those models to rate all teachers might come at the expense of learners, taking the joy out of learning and making it boring, as President Obama pointed out ("Remarks," 2011). And there's concern about our lack of agreement on what it means to be an effective educator.

The need for highly effective teachers is a given. But when, as part of the discussion, I heard policy makers and business leaders proclaim that experience and advanced degrees do not necessarily matter in teaching (Strauss, 2010), I took a look at my own career--which began about four decades ago--and concluded that I do not agree. Questions came to mind regarding the nature of teaching and whether value-added models could really help school districts identify teacher effectiveness well enough to justify changing existing compensation systems that have traditionally been based on experience and degrees.

My goal is to shed light on several complexities that might not be reflected in test score data, to better appreciate the difficulty of revising a teacher evaluation system and then linking results to merit compensation. Let's begin this two-part series with a quick look at value-added models, why it's important to agree on the nature of teaching, and why experience and degrees are, in fact, relevant.

A Quick View of the Value-Added Nightmare
The debate across the nation on teaching effectiveness and its connection to students' standardized testing results has been fueled by the United States Department of Education's Race to the Top program, which has urged states and districts to use teacher performance to inform personnel decisions (U.S. Department of Education, 2009). It has resulted in several states passing laws making student achievement a significant factor in teacher evaluations--at least 50 percent in some states, according to the Institute for a Competitive Workforce (2011). Examples include Florida's teacher merit pay bill (Postal, 2011) and Ohio's Senate Bill 5 (Fields, 2011), both of which were signed by their respective state governors in March 2011. Implementation plans will also require additional tests to be created for subjects not covered by existing state standardized tests. Both states plan to phase out the merit of experience and advanced degrees in their revised compensation plans.

Value-added models require annual testing of students, and if high stakes for teachers are attached to results, all teachers will need to be included, with high-stakes testing in art, music, physical education, electives, and so on. This is likely to lead to huge funding issues to maintain all systems over time.

Consider the need for a common curriculum and then new tests each year that are highly correlated with the curriculum. Tests will need to be piloted to ensure that they are valid and reliable before implementing them on a large scale. Teachers will need ongoing professional development, particularly in schools with high teacher turnover, not just to interpret results, but to learn how to use data to improve instruction. If used for teacher evaluation, results will need to consider the amount of time learners were assigned to a teacher, particularly when students might have entered after the start of a school year.

Add to the complexity the difficulty of measuring the annual growth of each learner, as there should be multiple indicators of achievement. ASCD (2009) pointed out, "Effective and accurate growth models can include a combination of state assessments, teacher-developed assessments, portfolios, grade point averages, and performance assessments such as essays and projects" (p. 4). Obviously we're losing sight of that in limiting growth to what is measured on standardized tests.

Value-added models need each learner's standardized test scores from the previous year to make a calculation--and that's nearly impossible to ensure for many reasons. Chief among those are that students don't necessarily study the same subjects from year to year, particularly at the high school level; prior test scores would not be available for all grades (e.g., kindergarten, and perhaps grades 1 through 3) and for courses that students take only one time or for first courses taken in a new content area. Data might also be missing for courses learners did take the prior year (owing, for example, to mobility). As value-added models differ, differences in outcomes might appear if two different models were to be applied to the same data. Analyses depend on which factors are considered for use in the statistical correlations (Goldhaber, 2010; Hull, 2011). I also have to wonder if results of student performance task assessments will fit into teacher evaluations. They are part of the Common Core State Standards initiative for math and English language arts. Will those indicators of growth be ignored in teacher evaluations, or will that type of testing also need to be expanded for other subjects?
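To make two of these points concrete, here is a minimal sketch, not any state's or vendor's actual model. The student records, the teacher labels "A" and "B", and both toy formulas (a simple gain-score average and a least-squares residual average) are illustrative assumptions; real value-added models are far more elaborate. The sketch shows only that a student without a prior-year score drops out of the calculation, and that two reasonable-looking models can produce different estimates from the same data.

```python
from statistics import mean

# Hypothetical records: (teacher, prior_year_score or None, current_year_score)
records = [
    ("A", 60, 70),
    ("A", 80, 85),
    ("A", None, 75),  # no prior score, e.g., the student moved into the district
    ("B", 50, 58),
    ("B", 90, 92),
    ("B", 40, 52),
]

def usable(rows):
    """Keep only students who have a prior-year score."""
    return [r for r in rows if r[1] is not None]

def by_teacher(rows, values):
    """Average one value per student within each teacher."""
    teachers = sorted({t for t, _, _ in rows})
    return {
        t: mean(v for (tt, _, _), v in zip(rows, values) if tt == t)
        for t in teachers
    }

def gain_score_model(rows):
    """Toy model 1: mean student gain, relative to the overall mean gain."""
    rows = usable(rows)
    gains = [cur - prior for _, prior, cur in rows]
    overall = mean(gains)
    return by_teacher(rows, [g - overall for g in gains])

def residual_model(rows):
    """Toy model 2: mean residual from a least-squares line of current on prior."""
    rows = usable(rows)
    xs = [prior for _, prior, _ in rows]
    ys = [cur for _, _, cur in rows]
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    intercept = my - slope * mx
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    return by_teacher(rows, residuals)

if __name__ == "__main__":
    print("students used:", len(usable(records)), "of", len(records))
    print("gain-score estimates:", gain_score_model(records))
    print("residual estimates:  ", residual_model(records))
```

With this made-up data, one of teacher A's three students is excluded from both calculations, and the two toy models assign teacher A noticeably different estimates even though nothing about the classroom has changed.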

A primary nightmare for educators is that value-added assessment does not have a research base sufficient "to support the use of [value-added models] for high-stakes decisions" (McCaffrey, Lockwood, Koretz, & Hamilton, 2003, p. 3). According to Douglas Harris (2008), "Unfortunately, we know very little about the potential of value-added accountability to improve student achievement" (p. 36). What we do know is that when data are sufficiently detailed, they can be used for diagnostic purposes, suggesting where improvements might be made. What those not familiar with the nature of correlations might not understand is that such assessments by themselves "cannot identify the cause of poor student achievement" (Evergreen Freedom Foundation, 2011, p. 3), which can lead them to draw incorrect conclusions about those scores and teacher effectiveness.

The potential nightmare continues with the logistical and technical concerns that crop up once states and districts have revised their models for teacher evaluation to include value-added measures. To implement those, they will need to create new integrated systems for managing teacher performance. Without technical in-house expertise, they will need to rely on outside contractors whose expertise is also evolving in this area (Sawchuk, 2011). Districts will also need to maintain huge databases of student data and test results gathered from several years, some of which might not yet exist for factors of interest in using value-added models.

And finally, whether or not value-added is included in teacher evaluations, any ties to merit pay will also create nightmares. Merit pay systems for educators do not work for a number of reasons, as Al Ramirez (2011) indicated. Good teaching is not about money, but money is a prime issue against merit pay in education. Unlike business and industry that operate on profit and loss, school districts have constrained budgets. The "search for the right combination of behaviors and outcomes [to reward] is a slippery slope that inevitably leads to a complex and unwieldy measurement system that distracts both teachers and principals from their important work" (Administrative Problems section). If raising scores on bubble tests or other forms of standardized tests is ultimately the primary factor in determining merit pay, then that's what will get done, at the expense of goals we have for our learners. There's more, as you read on.

See What Selected States Are Doing

Colorado Department of Education Educator Effectiveness

Delaware Performance Appraisal System

Florida District Performance Appraisal Systems

New Mexico Guidelines for Performance Evaluation

Ohio Department of Education Value-Added Reports

Pennsylvania Value-Added Assessment System

South Carolina ADEPT system for assisting, developing, and evaluating professional teaching

Tennessee Framework for Evaluation and Professional Growth

What's the Real Definition of 'Teacher Effectiveness?'
Why ask this question?

We know "effectiveness" is associated with producing a desired result, but without agreement on a definition of teaching and the results we desire, how can we measure "teacher effectiveness"?

For example, those who have not been educators might believe, based on their own school experiences and a quick read of W. James Popham's (2009) summary of teaching in the 21st century, that teaching is solely an individual endeavor. He stated that "once we strip away its external complexities, teaching boils down to teachers' deciding what they want their students to learn, planning how to promote that learning, implementing those plans, and then determining if the plans worked" (Preface section).

However, it's not so simple.

All the external complexities of teaching cannot be stripped away, and we can't afford to discount them in determining teacher effectiveness. Non-educators who think that increasing standardized test scores is the true indicator might not be aware of most of them. Effective teachers know and communicate subject matter and design curriculum, instruction, and multiple assessments. They know about diverse student populations, use data and technology effectively for all, communicate effectively with parents and other staff, conduct action research to improve their practice, and implement existing research containing significant findings. They are ethical and learner-centered in their approach, setting high expectations and contributing to the academic, social, and emotional growth of their learners. Plus, they continually grow in the profession, maintain sanity, minimize stress, learn from mistakes, and--let us not forget--prepare students for standardized testing.

Comprehensive definitions of effective teaching--such as the one stated within Colorado's State Council for Educator Effectiveness Report and Recommendations to the State Board (State Council, 2011)--reflect many of these complexities. If there is any doubt about the collaborative nature of teaching, consider this from that report: "Because effective teachers understand that the work of ensuring meaningful learning opportunities for all students cannot happen in isolation, they engage in collaboration, continuous reflection, on-going learning and leadership within the profession" (p. 11). Thus, if teaching is collaborative, value-added scores will have serious limitations in measuring individual teacher effectiveness. If ongoing learning and leadership are important, then experience and advanced degrees matter.

Sadly, the only definition of teacher effectiveness that seems to matter in the discussion is not comprehensive, as "increasingly, policy conversations frame teacher effectiveness as a teacher's ability to produce higher than expected gains in students' standardized test scores" (Goe, Bell, & Little, 2008, p. 5). Indeed, you will find this notion within the Race to the Top program (U.S. Department of Education, 2009). It has led teaching to become "in many ways, a test-governed game" that teachers need to learn to play so that students come out the winners (Popham, 2009, Preface section). It encourages teaching to the test--an educator's nightmare, as educators know the goals of education are not limited to those measured on standardized tests. But when outcomes potentially have high stakes for their evaluations and might impact their compensation, can you really blame teachers for this?

Do We Agree That Teaching Is Both Science and Art?
Measuring teaching effectiveness is complicated by the nature of teaching as both science and art, as illustrated by fundamental requirements for proficient teaching from the National Board for Professional Teaching Standards (NBPTS, 2002). On the science side, we can quantify teachers' "knowledge of the subjects to be taught, of the skills to be developed, and of the curricular arrangements and materials that organize and embody that content." We can quantify their "knowledge of general and subject-specific methods for teaching and for evaluating student learning" and "knowledge of students and human development" (p. 2). However, their "skills in effectively teaching students from racially, ethnically, and socioeconomically diverse backgrounds; and the skills, capacities and dispositions to employ such knowledge wisely in the interest of students" (p. 2) define an artistic side that is qualitative for evaluation purposes.

We can test content and pedagogical knowledge as part of licensure. We can quantify teaching using a set of descriptive components, such as found in checklists that administrators have traditionally used along with direct observations for rating teachers as excellent, satisfactory, or unsatisfactory. For example, the criteria included in the teacher evaluation form for the Grundy County Special Education Cooperative in Iowa address planning and preparation, the classroom environment, instruction, and professional responsibilities. However, as art, not everyone can perceive or appreciate all the nuances that an individual teacher might add to the "painting," particularly for the social and emotional growth of learners. Of course, they might get a limited glimpse, as video of classroom instruction to be used in teacher evaluation has also been addressed by policy makers--a true potential nightmare for ethical and legal reasons (e.g., see Orr, 2011).

Other evaluation tools might include self-assessments, teacher portfolios, and student work samples; some value might be derived from survey results from students and parents and possibly peer commentary. As each tool has its own strengths and weaknesses (Goe, Bell, & Little, 2008; Hull, 2011), it's apparent that developing a standardized evaluation model for a group of teachers, which would take into consideration teaching as both a science and art, would be a challenge and then a nightmare to administer.

What's the best combination?

No evaluation system is perfect. Value-added models might add a layer of objectivity to measuring teaching effectiveness, but if we agree that teaching is both a science and an art, at best they can be only one indicator. Yet the scores derived from those models will be what the public and press focus on, another potential nightmare for educators.

How Do We Consider the Role of Experience and Degrees?
Dan Goldhaber (2010), director of the Center for Education Data & Research, stated that while "research shows teacher effectiveness to be a highly variable commodity, it also shows that it is not well explained by factors such as experience, degrees, and credentials that are typically used to determine teacher employment eligibility and compensation" (p. 1). The phrase "not well explained" does not mean we should minimize or negate their value, as doing so is contrary to what we profess for the very learners we teach--namely, lifelong learning.

Research does show that experience matters, but we are losing many of our veteran teachers (Carroll & Foster, 2010), which in itself should be a concern for school districts--compensation and tenure issues aside. We want teachers with staying power in the profession, not just a two-year commitment, and those who also demonstrate their own growth intellectually and professionally as role models. I agree that the nature of an advanced degree is important in compensation considerations, as it should be education related so as to increase a teacher's content or pedagogical knowledge, or knowledge and skills related to the profession as a whole. While I appreciate arguments for and against advanced degrees and their consideration in compensation ("Do Teachers Need," 2009), both experience and appropriate degrees also have the potential to contribute to enhancing the quality of the educational system as a whole and its ability to develop and implement new programs and practices for the benefit of our learners.

Research clearly shows "teachers improve their proficiency and effectiveness during the first seven years" (Carroll & Foster, 2010, p. 12). It also has shown that when teaching a particular grade level, as in a 2009 study analyzing teacher qualifications and academic achievement in low performing schools, "additional years of teaching experience at the same grade level add to direct positive impact on student achievement, peaking at about 20 years" of experience. Even "a teacher at 30 years at the same grade level is still performing at a level of effectiveness that is higher than the performance of teachers during their first ten years" (Huang, 2009, cited in Carroll & Foster, 2010, p. 12). This has implications for school districts for improving teacher effectiveness by adding greater stability to what teachers teach and where. I am confident in saying that the more I have taught a particular subject, the more I've been able to focus on the learners and less on content itself in instruction.

As Carole Steele (2009) reminded us, no teacher excels at every aspect of teaching. Beginners and experts differ in what they attend to or ignore. It's the experts who "have become skilled at synthesis and evaluation in regard to their thinking about teaching and learning" (Introduction section). In general terms, new teachers benefit from the wisdom of their more experienced colleagues. More experienced teachers are better able to integrate and draw connections between current, past, and future learning and relate their content to other curricular areas. They tend to be able to better use such classroom management skills as voice, gestures, reading student facial expressions and body language, and proximity. They can see the big picture--in planning they can anticipate problems and a need for alternative plans and adjust their practice accordingly. They also know their students' needs and evaluate their lessons according to students' learning growth--that is, they measure effectiveness of a lesson beyond meeting the broad objective of the day. Plus, they are knowledgeable about school and community resources that can benefit students. They understand the culture of the school and have amassed strategies to effectively engage parents in collaborative activities. They understand how to motivate students and maintain their interest even in the face of temporary failure (NBPTS, 2002). These are important considerations, not to be ignored in evaluating teaching effectiveness.

Bottom Line
I suspect in reality only a very small percentage of teachers with experience and advanced degrees would be labeled as truly ineffective based on scores from value-added models. My nightmares surround potential misuses of data, and the fact that we do not have the research base to know for certain whether value-added models will truly be better than existing systems for measuring teaching effectiveness for high-stakes decisions. It will take a great deal of funding to find out. I would hope that value-added scores might be used for lower-stakes decisions to contribute to understanding how to improve instruction, and that we can experiment with models that really control for factors of interest. We need educator buy-in to the validity and reliability of results. In the meantime, we need to look more closely at what will go on in schools as a consequence of implementing merit pay for educators before we can warrant phasing experience and advanced degrees out of compensation plans.

In part 2 of this two-part series, I'll delve more into finer details of my concerns about value-added models and what they can't tell you about the human side of teaching and learning, professional practice, and leadership and remark on what is ultimately meaningful in measuring teacher effectiveness.

References

ASCD Educator Advocates. (2009). 2009 ASCD legislative agenda.

Carroll, T. G., & Foster, E. (2010, January). Who will teach? Experience matters. Washington, DC: National Commission on Teaching and America's Future.

Do teachers need education degrees? (2009, August 16). New York Times, Room for Debate Blog.

Evergreen Freedom Foundation. (2011). School director's handbook: Value-added assessment.

Fields, R. (2011, April 23). Teacher merit pay system in Ohio's new collective bargaining law could be first of its kind in the country. The Cleveland Plain Dealer.

Goe, L., Bell, C., & Little, O. (2008). Approaches to evaluating teacher effectiveness: A research synthesis. Washington, DC: National Comprehensive Center for Teacher Quality.

Goldhaber, D. (2010, December). When stakes are high, can we rely on value added? Exploring the use of value-added models to inform teacher workforce decisions.

Harris, D. (2008). Would accountability based on teacher value-added be smart policy? An examination of the statistical properties and policy alternatives. Madison, WI: University of Wisconsin at Madison.

Hull, J. (2011). Building a better evaluation system: At a glance. Alexandria, VA: Center for Public Education.

Institute for a Competitive Workforce. (2011, January). In focus: A look into teacher effectiveness.

McCaffrey, D., Lockwood, J. R., Koretz, D. M., & Hamilton, L. S. (2003). Evaluating value-added models for teacher accountability. Santa Monica, CA: Rand Corporation.

National Board for Professional Teaching Standards. (2002). What teachers should know and be able to do: The five core propositions of the national board.

Orr, B. (2011, February 3). Bill to install classroom cameras fails in the senate. Wyoming Tribune Eagle.

Popham, W. J. (2009). Instruction that measures up: Successful teaching in the age of accountability. Alexandria, VA: ASCD.

Postal, L. (2011, March 24). Gov. Scott signs teacher merit-pay bill. Orlando Sentinel.

Remarks by President Obama at Univision Town Hall. (2011, March 28). Washington, DC Press Release.

Sawchuk, S. (2011, April 27). Teacher-evaluation logistics challenge states. Education Week.

State Council for Educator Effectiveness. (2011, April 13). Final Report and Recommendations to the Colorado State Board of Education.

Steele, C. F. (2009). Inspired teacher: How to know one, grow one, or be one. Alexandria, VA: ASCD.

Strauss, V. (2010, November 20). Why teaching experience really matters. The Washington Post.

United States Department of Education. (2009). Race to the top program: Executive summary. Washington, DC.
