Pay for performance can improve student learning without negative impacts on the type of teacher that gets recruited or retained
Student learning in primary school lags enrolment gains across many low-income countries (Angrist et al. 2021). Teachers are key to student attainment (Chetty, Friedman and Rockoff 2014a,b, Bau and Das 2020, Buhl-Wiggers et al. 2020) and central to addressing this learning crisis. Challenges include high teacher turnover rates and unfilled vacancies (Zeitlin 2021), as well as limited teaching time and poor curricular knowledge among those in post (Bold et al. 2017). Since low teacher salaries and the disconnect between career outcomes and performance on the job may be partly responsible for these outcomes (Crawfurd and Pugatch 2020), a possible policy response is to link teacher pay to performance.
Pay-for-performance (P4P) has seen limited implementation at scale in low-income countries; opinion is divided on whether it would improve teacher achievement (Breeding, Beteille, and Evans 2019). Critics argue that P4P could dampen worker effort (Benabou and Tirole 2003, Deci and Ryan 1985, Kreps 1997). The concern is that such schemes recruit individuals who are purely ‘in it for the money’, lower effort by eroding intrinsic motivation, and drive good teachers demotivated by narrow evaluation criteria to quit. Proponents, on the other hand, point to classic contract theory (Lazear 2003, Rothstein 2015) and evidence from private-sector jobs with readily measurable output (Lazear 2000) to argue that P4P will have positive effects. Under this view, such schemes recruit individuals who anticipate performing well in the classroom, raise effort by strengthening extrinsic motivation, and retain effective teachers, who feel rewarded and stay put. A growing literature shows the potential for performance contracts to improve learning outcomes along the effort margin,1 but relatively few studies speak to the consequences of P4P for the composition of teachers flowing into and out of the profession.2
Testing a pay-for-performance contract using a two-tiered RCT
In our study (Leaver, Ozier, Serneels, and Zeitlin 2021), we set up a two-tiered experiment to answer several questions about P4P contracts: What kinds of teachers are attracted by them? Do teachers deliver more student learning if paid this way? Do such contracts affect what kinds of teachers stay in their jobs?
Together with the Rwanda Education Board, we designed the STARS ("Supporting Teacher Achievement in Rwandan Schools") P4P contract, which rewards the top 20% of teachers in each district with extra pay. The teacher performance metric equally weights outcomes – student learning gains relative to others with similar baseline levels of achievement3 – alongside teacher inputs – a composite of teacher presence, lesson planning, and pedagogy. The tournament nature of this contract allows us to compare it to a fixed-wage (hereafter FW) contract that is equal in expected payout. These contracts were offered for two years.
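The structure of such a tournament contract can be sketched in a few lines of Python. This is a minimal illustration, not the STARS implementation: the function names, the dictionary keys, the assumption that both components are already scaled to the unit interval, the rounding-down rule for the number of winners, and the bonus amount are all hypothetical. Only the equal weighting, the top-20%-per-district rule, and the equal-expected-payout fixed wage come from the description above.

```python
from collections import defaultdict

TOP_PERCENT = 20  # share of teachers in each district who win the bonus (from the text)


def composite_score(outcome_score, input_score):
    """Equally weighted metric; both components assumed pre-scaled to [0, 1]."""
    return 0.5 * outcome_score + 0.5 * input_score


def p4p_payouts(teachers, bonus):
    """Return {teacher_id: payout}, paying `bonus` to the top 20% in each district.

    `teachers` is a list of dicts with hypothetical keys:
    id, district, outcome_score, input_score.
    """
    by_district = defaultdict(list)
    for t in teachers:
        score = composite_score(t["outcome_score"], t["input_score"])
        by_district[t["district"]].append((score, t["id"]))

    payouts = {}
    for ranked in by_district.values():
        ranked.sort(reverse=True)  # best composite score first; ties fall back on id
        n_winners = len(ranked) * TOP_PERCENT // 100  # assumption: round down
        for position, (_, teacher_id) in enumerate(ranked):
            payouts[teacher_id] = bonus if position < n_winners else 0.0
    return payouts


def fw_payment(bonus):
    """Fixed-wage payment with the same expected payout as the tournament."""
    return bonus * TOP_PERCENT / 100
```

Because exactly one fifth of teachers in a district win, a teacher with no prior information about their rank expects one fifth of the bonus, which is what `fw_payment` pays with certainty.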
Our two-tiered experiment first randomly assigns labour markets to either P4P or FW advertisements – to reveal the impacts of these contract types on the applicant pool – and then, once teachers have been placed, uses a surprise re-randomisation of experienced contracts at the school level to distinguish pure compositional effects from effort-margin impacts on learning. The first stage was undertaken during recruitment for teacher placements for the 2016 school year. Teacher labour markets are defined at the district by subject-family level. We randomised the offer of two-year P4P or FW contracts to positions in six districts, comprising more than half the new upper-primary teacher hiring lines for that year. Following teacher placement, we enrolled all schools that received a teacher to fill an upper-primary position, for a total of 164 study schools.4 In the second stage, we randomly re-assigned these study schools to either P4P or FW contracts.5
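The two tiers described above can be sketched as two independent random assignments at different levels. The Python below is a simplified illustration under stated assumptions: it uses simple balanced randomisation with no stratification, and the unit labels and function names are hypothetical. It is not the study's actual randomisation code.

```python
import random

CONTRACTS = ("P4P", "FW")


def randomise(units, rng):
    """Shuffle the units and alternate contracts, giving balanced arms."""
    shuffled = list(units)
    rng.shuffle(shuffled)
    return {unit: CONTRACTS[i % 2] for i, unit in enumerate(shuffled)}


def two_tier_design(labour_markets, schools, seed=0):
    """Return (advertised, experienced) contract assignments."""
    rng = random.Random(seed)
    # Tier 1: contract advertised in each district-by-subject labour market.
    advertised = randomise(labour_markets, rng)
    # Tier 2: surprise re-randomisation of the contract experienced in each school.
    experienced = randomise(schools, rng)
    return advertised, experienced


def recruit_cell(market, school, advertised, experienced):
    """A recruit's cell in the resulting 2x2 design."""
    return advertised[market], experienced[school]
```

Crossing the two assignments places each recruit in one of four cells. Comparing across advertised contracts within an experienced contract isolates composition, and comparing across experienced contracts within an advertised contract isolates effort.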
Performance contracts increase effort without crowding out teacher quality
As Figure 1 shows, advertised P4P contracts did not change the distribution of measured teacher skill either among applicants in general or among new hires in particular. This is estimated sufficiently precisely to rule out even small negative effects of P4P on measured skills. By contrast, advertised P4P contracts did select teachers who appear to be less ‘prosocial’, contributing less in a framed Dictator Game played at baseline to measure intrinsic motivation. But teachers recruited under P4P were at least as effective in promoting learning as were those recruited under FW (holding experienced contracts constant). Our point estimates are consistent with a modest positive compositional effect on teachers’ value added under P4P contracts, as shown in Figure 2.
Recruited teachers working under P4P contracts elicited better performance from their students than those working under FW contracts, holding advertised contracts constant. Averaging over the two years of the study, the within-year effort effect of P4P was 0.11 standard deviations of pupil learning, and in the second year, the within-year effort effect of P4P was 0.16 standard deviations.6
Figure 1 Distribution of placed recruit attributes on arrival, by advertised treatment arm
(a) Grading task score
(b) Dictator Game contribution
Notes: Figures show cumulative distributions by advertised treatment arm of outcomes among placed teachers in a grading-task exercise, which measured curricular knowledge, and Dictator Game contributions, which measured prosociality. In Figure 1a, the randomisation inference p-value for a test of equality in Grading Task scores across the P4P and FW treatments is 0.367: there is no evidence of different levels of curricular knowledge when comparing the teachers who applied to the different kinds of contracts. In Figure 1b, the p-value for a test of equality in mean DG share sent between the P4P and FW treatments is 0.029.
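For readers unfamiliar with randomisation inference, the idea behind p-values of this kind can be sketched as follows. This is a generic illustration rather than the study's procedure: it assumes a difference-in-means test statistic, a two-sided test, and re-drawing of treatment at the cluster level with arm sizes held fixed. All function and variable names are hypothetical.

```python
import random
from statistics import mean


def diff_in_means(outcomes, assignment):
    """Mean outcome under P4P minus mean outcome under FW.

    outcomes: list of (cluster, value) pairs, at least one per cluster.
    assignment: {cluster: "P4P" or "FW"}.
    """
    p4p = [y for c, y in outcomes if assignment[c] == "P4P"]
    fw = [y for c, y in outcomes if assignment[c] == "FW"]
    return mean(p4p) - mean(fw)


def ri_p_value(outcomes, assignment, draws=2000, seed=0):
    """Share of placebo assignments with a statistic at least as extreme as observed."""
    rng = random.Random(seed)
    observed = abs(diff_in_means(outcomes, assignment))
    clusters = list(assignment)
    labels = [assignment[c] for c in clusters]
    extreme = 0
    for _ in range(draws):
        rng.shuffle(labels)  # re-draw treatment across clusters, keeping arm sizes fixed
        placebo = dict(zip(clusters, labels))
        if abs(diff_in_means(outcomes, placebo)) >= observed:
            extreme += 1
    return extreme / draws
```

The p-value is the fraction of hypothetical re-assignments that would have produced a gap at least as large as the one observed if the contract had no effect on anyone.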
We observe a range of teacher behaviours. These corroborate our first finding: P4P recruits performed no worse than the FW recruits in terms of their presence, preparation, and observed pedagogy. They further indicate that the learning gains brought about by those experiencing P4P contracts may have been driven, at least in part, by improved teacher presence and pedagogy.
Teacher presence was eight percentage points higher among recruits who experienced the P4P contract compared to recruits who experienced the FW contract, in a context where teacher presence is already high compared to other, including neighbouring, low-income countries. Teachers who experienced P4P were also scored as more effective in their classroom practices than teachers who experienced FW.
Figure 2 Teacher value added among recruits, by advertised treatment and year
(a) Year 1
(b) Year 2
Notes: The figures plot the distribution of teacher value added under advertised P4P and advertised FW in Years 1 and 2. Value-added models are estimated with school fixed effects. The randomisation inference p-value for equality in distributions between P4P and FW applications, based on a one-sided KS test, is 0.796 using Year 1 data, 0.123 using Year 2 data, and 0.097 using pooled estimates of teacher value added (not pre-specified).
We find little evidence to support claims of either proponents or opponents of P4P regarding retention. Teachers with P4P contracts were no more likely to quit during the two years of the study than teachers with FW contracts. There was also no evidence of differential selection-out on baseline teacher characteristics by experienced contract, either in terms of skills or measured motivation.
In summary, by the second year of the study, we estimate the within-year effort effect of P4P to be 0.16 standard deviations of pupil learning, with the total within-year effect rising to 0.20 standard deviations after allowing for the changed composition of teachers under P4P. Despite evidence of lower intrinsic motivation among P4P recruits, these teachers were at least as effective in promoting learning as were those recruited under FW. These results support the view that pay-for-performance can improve effort, while allaying fears of harmful selection effects.
From pilot to policy
Teacher accountability and incentive reforms have been described as “promising but low-evidence” by The Global Education Evidence Advisory Panel (GEEAP) in its recent “Smart Buys” report (GEEAP 2020). The GEEAP rightly emphasises the importance of context in shaping the extent to which such incentives can be useful and politically feasible, and calls for more evidence.
To speak to the potential impacts of P4P operated through government systems, our study took several steps to evaluate a contract that was contextually suitable. Implementation worked with District Education Office staff. And, at an average of 3% of salary, bonuses were modest and comparable in magnitude to existing variable-pay stakes in the imihigo system of subjective performance evaluations used elsewhere in Rwanda’s civil service.
Context also determines the investments in measurement needed to scale P4P schemes. Where annual learning assessments do not exist or are limited in their coverage, the incremental costs of introducing teacher incentives, or the design compromises needed to implement these with existing systems, may be prohibitive.
In some contexts, policymakers may choose to implement performance pay in only the most troubled schools. Recent evidence from reforms in the US (Biasi 2021) and experimental evidence from a network of private schools in Pakistan (Brown and Andrabi 2021) suggest that such incentives may be used to attract high-ability teachers to the under-performing schools that need them most.
The gains from system-wide implementation of P4P can be large. The combined effort- and selection-margin effects achieved by the second year of our study equate to approximately a full additional year of status-quo learning for each year of exposure to a STARS teacher.7 Even in the short run, our findings are consistent with the possibility of small gains in value-added of teachers selected into the profession, and P4P does not appear to attract worse teachers in these terms. Where the necessary infrastructure for such a policy is in place – because, as in Rwanda, data already exist that could be used for teacher evaluations8 – the potential benefits of P4P could far exceed its costs.
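The "full additional year" comparison is simple arithmetic, sketched here in Python. The inputs are the 0.20 standard deviation total effect from the summary above and the per-year learning rates for Rwanda attributed to Crawfurd (2021) in the endnotes. The function name is hypothetical, and treating the conversion as a plain ratio is an assumption of this sketch.

```python
# Learning gained in one status-quo school year in Rwanda, in standard deviations
# (Crawfurd 2021, as cited in the endnotes).
SD_PER_YEAR = {"reading": 0.16, "mathematics": 0.25}

# Combined effort- and selection-margin effect in the second year of the study.
TOTAL_EFFECT_SD = 0.20


def equivalent_years(effect_sd, sd_per_year):
    """Convert an effect in standard deviations into years of status-quo learning."""
    return effect_sd / sd_per_year


for subject, sd in SD_PER_YEAR.items():
    print(f"{subject}: {equivalent_years(TOTAL_EFFECT_SD, sd):.2f} years")
```

On these inputs the ratio is about 1.25 years for reading and 0.80 years for mathematics, which brackets the "approximately a full additional year" stated above.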
Authors' note: This study was funded by the UK Department for International Development via the International Growth Centre and the Economic Development and Institutions Programme, by Oxford University’s John Fell Fund, and by the World Bank’s SIEF and REACH trust funds.
References
Angrist, N, S Djankov, P K Goldberg and H A Patrinos (2021), "Measuring human capital using global learning data", Nature, 592: 403–408.
Ashraf, N, O Bandiera, E Davenport and S Lee (2020), "Losing prosociality in the quest for talent? Sorting, selection, and productivity in the delivery of public services." American Economic Review, 110(5): 1355–1394.
Ashraf, N, J Berry and J M Shapiro (2010), "Can higher prices stimulate product use? Evidence from a field experiment in Zambia". American Economic Review, 100(5): 2382–2413.
Barlevy, G and D Neal (2012), "Pay for percentile", American Economic Review, 102(5): 1805–1831.
Bau, N and J Das (2020), "Teacher Value Added in a Low-Income Country." American Economic Journal: Economic Policy, 12 (1): 62-96.
Benabou, R and J Tirole (2003), "Intrinsic and Extrinsic Motivation." Review of Economic Studies, 70: 489–520.
Biasi, B (2021), "The labor market for teachers under different pay schemes." American Economic Journal: Economic Policy.
Bobba, M, T Ederer, G Leon-Ciliotta, and M Nieddu (2021), "Teacher compensation and structural inequality: Evidence from centralized teacher choice in Peru", Princeton University, Industrial Relations Section Working Paper no. 648.
Bold, T, D Filmer, G Martin, E Molina, B Stacy, C Rockmore, J Svensson and W Wane (2017), "Enrollment without learning: Teacher effort, knowledge, and skill in primary schools in Africa." Journal of Economic Perspectives, 31(4): 185–204.
Breeding, M, T Beteille and D K Evans (2019), "Teacher Pay-for-Performance: What works? Where? And how?" World Bank, Teachers Thematic Group Policy Brief.
Brown, C and T Andrabi (2021), "Inducing positive sorting through performance pay: Experimental evidence from Pakistani schools". Working Paper, University of Chicago.
Buhl-Wiggers, J, J Kerwin, J Smith and R Thornton (2018), "Teacher effectiveness in Africa: Longitudinal and causal estimates", IGC Working Paper No. S-90238-UGA-1.
Chang, F, H Wang, Y Qu, Q Zheng, P Loyalka, S Sylvia, Y Shi, S Dill and S Rozelle (2020), "The impact of pay-for-percentile incentive on low-achieving students in rural China." Economics of Education Review, 75, 101954.
Chetty, R, J N Friedman, and J E Rockoff (2014a), "Measuring the impacts of teachers I: Evaluating bias in teacher value-added estimates." American Economic Review, 104(9): 2593–2632.
Chetty, R, J N Friedman, and J E Rockoff (2014b), "Measuring the impacts of teachers II: Teacher value-added and student outcomes in adulthood." American Economic Review, 104(9): 2633–2679.
Cohen, J and P Dupas (2010), "Free distribution or cost-sharing? Evidence from a randomized malaria prevention experiment." Quarterly Journal of Economics, 125(1): 1–45.
Crawfurd, L (2021), “Accounting for repetition and dropout in contemporaneous cross-section learning profiles: Evidence from Rwanda”, International Journal of Educational Development, 85: 102443.
Crawfurd, L and T Pugatch (2020), "Teacher labor markets in developing countries", IZA Discussion Paper No. 12985.
Dal Bó, E, F Finan and M Rossi (2013), "Strengthening state capabilities: The role of financial incentives in the call to public service." Quarterly Journal of Economics, 128(3): 1169–1218.
Deci, E L and R M Ryan (1985), Intrinsic motivation and self-determination in human behavior. New York: Plenum.
Deserranno, E (2019), "Financial incentives as signals: Experimental evidence from the recruitment of village promoters in Uganda", American Economic Journal: Applied Economics, 11(1): 277–317.
Filmer, D, J Habyarimana, and S Sabarwal (2020), "Teacher performance-based incentives and learning inequality." World Bank Policy Research Working Paper 9382.
GEEAP (2020), "Cost-effective approaches to improve global learning", Recommendations of the Global Education Evidence Advisory Panel, World Bank.
Gilligan, D O, N Karachiwalla, I Kasirye, A M Lucas, and D Neal (forthcoming), "Educator incentives and educational triage in rural primary schools", Journal of Human Resources.
Karlan, D and J Zinman (2009), "Observing unobservables: Identifying information asymmetries with a consumer credit field experiment", Econometrica, 77(6): 1993–2008.
Kreps, D (1997), "Intrinsic motivation and extrinsic incentives." American Economic Review, 87(2): 359–364.
Lazear, E P (2000), "Performance pay and productivity." American Economic Review, 90(5): 1346–1361.
Lazear, E P (2003), "Teacher incentives", Swedish Economic Policy Review, 10(3): 179–214.
Leaver, C, O Ozier, P Serneels, and A Zeitlin (2021), "Recruitment, effort, and retention effects of performance contracts for civil servants: Experimental evidence from Rwandan primary schools." American Economic Review, 111(7).
Mbiti, I, K Muralidharan, M Romero, Y Schipper, C Manda, and R Rajani (2019), "Inputs, incentives, and complementarities in education: Experimental evidence from Tanzania". The Quarterly Journal of Economics, 134(3): 1627–1673.
NCES (2020), "Digest of Education Statistics", National Center for Education Statistics, United States Department of Education.
Rothstein, J (2015), "Teacher quality policy when supply matters", American Economic Review, 105(1): 100–130.
Zeitlin, A (2021), "Teacher turnover in Rwanda", Journal of African Economies, 30(1): 81–102.
Endnotes
1 Extending the varied findings of early papers (Glewwe et al. 2010, Muralidharan and Sundararaman 2011), recent studies in China, Uganda, and Tanzania have shown the potential for performance pay to deliver equitable learning gains when appropriately designed and accompanied by adequate instructional resources (Chang et al. 2020, Gilligan et al. forthcoming, Mbiti et al. 2019).
2 Other studies have shown that recruitment is sensitive to unconditional salaries and career-track motivations. Dal Bó et al. (2013) and Bobba et al. (2021) show positive compositional effects of unconditional pay raises in the civil service in Mexico and Peru, the latter among teachers. And in the health sector, Ashraf, Bandiera, Davenport, and Lee (2020) find that career framing in job advertisements results in the hiring of more talented staff, with resulting health improvements, while Deserranno (2019) finds that increased earnings expectations discourage pro-social applicants and result in lower effort and retention.
3 This is based on Barlevy and Neal’s (2012) “pay for percentile” design, which ensures that student performance across the achievement spectrum is relevant to teacher rewards. This design does not require cardinal consistency in achievement scales across assessments, which removes the need to maintain consistent items, reducing scope for teaching to the test.
4 Our study focuses on teachers in upper primary, which refers to grades 4, 5, and 6, and uses English as its language of instruction. All teachers who taught core-curricular classes to upper-primary students, including both newly placed recruits and incumbents, were eligible for either P4P or FW contracts.
5 A signing bonus ensured that all recruits, regardless of their belief about the probability of winning, were made weakly better off by the re-randomisation. Consistent with this, no one turned down their (re-)randomised contract.
6 Another recent study of performance incentives for teachers in Tanzania also found stronger evidence of learning gains in the second year of implementation relative to the first (Filmer, Habyarimana, and Sabarwal 2020).
7 Equivalent-years-of-schooling calculations are based on Crawfurd (2021), who estimates that each year in school in Rwanda is equivalent to 0.16 standard deviations in reading and 0.25 standard deviations in mathematics; we compare these with the 0.20 standard deviation effect of STARS on learning in year 2.
8 In 2019, the Government of Rwanda introduced a system of Comprehensive Assessments, covering all students and subjects in Basic Education. It has also been investing in the Sector Education Inspectorate and in an electronic Teacher Management Information System.