What Changed? Exploring Teacher Incentives, Social Accountability, and Student Outcomes in Indonesia

Indonesian schoolchildren in a classroom
©Lina Rozana

What causes gains in student learning? When a policy change or intervention yields better student learning outcomes, this is cause for celebration. But if we want those learning gains to become a reality for other children in other contexts, we need to go beyond celebration toward analysing what exactly the intervention achieved—such as whether the learning gains were accompanied by undesirable side effects—and how exactly the intervention achieved it.

A new working paper from the RISE Indonesia Country Research Team offers celebration-worthy results alongside fascinating questions about how those commendable results came about. This study by Arya Gaduh, Menno Pradhan, Jan Priebe, and Dewi Susanti evaluated the impact of KIAT Guru, a policy programme involving a social accountability mechanism (SAM) in approximately 200 primary schools in remote areas of East Nusa Tenggara and West Kalimantan in Indonesia. All three KIAT Guru treatment arms saw statistically significant improvements in student test scores relative to a control group, as summarised in the table below:

| Treatment | Details of the treatment | Student test scores [2] | Teacher attendance rate [3] |
| --- | --- | --- | --- |
| Control | | Mean = 0.00 | Mean = 82% |
| SAM | Social accountability mechanism in which local community members evaluate individual teachers according to a pre-agreed scorecard every month | +0.09 SD in language; +0.07 SD in maths | +2 percentage points |
| SAM+Cam | Social accountability mechanism, plus making the supplementary allowance [1] conditional on teacher attendance as recorded on a camera | +0.17 SD in language; +0.20 SD in maths | +4 percentage points |
| SAM+Scorecard | Social accountability mechanism, plus making the supplementary allowance [1] conditional on teacher scorecard evaluations | +0.11 SD in language; +0.09 SD in maths | -3 percentage points |

Notes: SD = standard deviation. [1] The performance-based pay mechanism was a deduction from a supplementary allowance allocated to teachers working in remote areas who were registered with the central ministry. [2] All gains in student learning were significant relative to the control group. Tests of equality indicate that (a) gains under SAM+Cam were significantly different from gains under SAM and SAM+Scorecard (p-values: 0.003–0.070); and (b) gains under SAM+Scorecard were not significantly different from gains under SAM. [3] All changes in the teacher attendance rate (the likelihood that teachers would be present in school) were insignificant relative to the control group. The parameter estimates remained insignificant with the addition of interaction terms for each treatment arm × teacher eligibility for the supplementary allowance. Source: Adapted from Gaduh et al. (2020), Table 4, p. 29, and Table 7, p. 32.

These results are encouraging—so much so that the Indonesian government is currently expanding the most effective treatment arm, SAM+Cam, to more schools. Yet there are some intriguing questions about what exactly led to these improvements. For example:

  1. How can we explain the fact that the SAM-only treatment arm, even without monetary incentives, led to significant improvements in test scores?
  2. In their archetypal study of cameras, financial incentives, and teacher attendance in India, Duflo, Hanna, and Ryan found that linking teacher compensation to teacher attendance rates, as recorded by tamper-proof cameras, led to a 21 percentage point decrease in teacher absenteeism and a 0.17 SD gain in test scores. In contrast, the KIAT Guru SAM+Cam treatment arm had a similar impact on test scores (0.17–0.20 SD), but no significant impact on teacher attendance. If better teacher attendance did not cause the learning gains, then what did?
  3. Why were test score gains 1.5 to 2 times larger for SAM+Cam, where the performance pay mechanism was based on a thin indicator (i.e., showing up at school), as compared to SAM+Scorecard, where the incentive relied on a thicker evaluation of teacher practice?
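The "1.5 to 2 times" comparison in question 3 can be checked against the point estimates reported in the table. Here is a minimal sketch in Python; the `effects` dictionary and the `effect_ratio` helper are my own illustrative names, not anything from the KIAT Guru paper. The sketch compares point estimates only and ignores their standard errors, so the ratios should be read as rough magnitudes.

```python
# Point estimates (in standard deviations) copied from the results table.
effects = {
    "SAM": {"language": 0.09, "maths": 0.07},
    "SAM+Cam": {"language": 0.17, "maths": 0.20},
    "SAM+Scorecard": {"language": 0.11, "maths": 0.09},
}


def effect_ratio(arm, baseline, subject):
    """Ratio of two arms' point estimates for one subject (hypothetical helper)."""
    return effects[arm][subject] / effects[baseline][subject]


for subject in ("language", "maths"):
    ratio = effect_ratio("SAM+Cam", "SAM+Scorecard", subject)
    print(f"{subject}: SAM+Cam is {ratio:.2f}x SAM+Scorecard")
```

On these figures, the ratio is about 1.5 for language and a little over 2 for maths.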

While I don’t have definitive answers, here are some speculations about each question, drawing on findings in the KIAT Guru paper. (A forthcoming paper analysing detailed qualitative observations from nine of the treatment schools will shed further light in the coming months.)

1. Clear delegation matters: why SAM led to improvements in teacher practice and test scores

At RISE, we think of education systems as a set of dynamic, interconnected accountability relationships. Each relationship is made up of principals, who entrust something of value (e.g., their children’s time, some line items in a budget, their votes in an election) to agents, to whom the principals delegate certain expectations to be fulfilled. When the agents are teachers, it’s sometimes assumed that this delegation is obvious: teachers are expected to teach students what they should know. But what if the curriculum is wildly overambitious given students’ current skill levels; or if curricular expectations don’t match the expectations implicit in exams; or if teachers get conflicting expectations from the central ministry, district office, and local community? When delegation is unclear or unrealistic, pitfalls abound.

The social accountability mechanism (SAM) studied in this paper offered a channel for clarifying the expectations that local stakeholders delegate to teachers. In each school, this mechanism involved a series of facilitator-led meetings to elicit perspectives from students, alumni, parents, community members, and teachers about how different stakeholders could improve school and home learning environments. These perspectives were then formalised into a multi-stakeholder service agreement. The service agreements, in turn, formed the basis for the scorecards used to evaluate teachers each month.

In contrast to generic government documents that enumerate teachers’ official duties in less-than-inspiring terms, the community-level SAM process of articulating and agreeing on the priorities that are delegated to teachers (and reminding teachers of these priorities at monthly meetings) may have helped to orient teachers’ efforts toward these valued goals, which in turn led to student learning gains—even in the SAM-only treatment arm that did not deploy performance-based pay.

2. In accountability relationships, motivation is complex: why SAM+Cam improved test scores without reducing teacher absenteeism

Besides delegation, other elements of accountability relationships in an education system include information that the principal uses to gauge the agent’s performance, as well as motivation, or how the agent’s well-being is affected by their performance of the delegated tasks.

If all of these elements worked in a straightforward, mechanistic way, we might expect the following theory of change for the SAM+Cam treatment arm—which was the pathway reflected in the Duflo, Hanna, and Ryan study:

Introduce a financial incentive based on teachers’ presence in school >> Teacher absenteeism decreases >> Student outcomes improve

Yet, as noted above, the KIAT Guru study found no evidence of the middle step in this pathway: SAM+Cam had no significant impact on the likelihood that teachers would be present in school. But it did have some positive impact on the likelihood that teachers would be working during the time they spent in school, and on the amount of time that teachers allocated to curricular teaching, even though neither of these was explicitly incentivised under this intervention.

This suggests that (a) teachers (like other humans) don’t always respond to financial incentives in straightforward ways, and (b) teachers also respond to non-financial carrots and sticks. In this case, it seems likely that the SAM delegation process mobilised local stakeholders’ attention to teaching and learning more generally. Compared to the control group, parents across all treatment arms reported more frequent meetings with teachers to discuss learning-related issues, and school principals conducted more lesson observations and performance evaluations. The desire to measure up to stakeholder expectations—whether for a reputational glow or to avoid complaints and social sanctions—can be a powerful source of motivation.

(It is also important to note that households in the treatment arms not only engaged with teachers more frequently, but also spent more time supporting children with their homework and, in the SAM+Cam arm, invested significantly more money in paid tutoring. Thus, test scores increased not only due to teacher effort, but also due to household contributions.)

3. Information is complex, too—especially in combination with motivation and delegation: why SAM+Cam had a larger effect than SAM alone or SAM+Scorecard

But if the social scrutiny from SAM was responsible for motivating teachers to change their practice, why did SAM+Cam have larger effects on student learning than the SAM-only treatment arm? And why did it also have larger effects than the SAM+Scorecard treatment arm, which gave punitive weight to the social accountability scorecards?

To address the second question first, the monthly scorecard evaluations included a broad range of indicators, some of which had room for subjectivity, unlike the camera-validated attendance records. Perhaps because of this subjectivity, local committees in the SAM+Scorecard treatment arm reported significantly more pressure to award high scores to teachers, and a significantly greater likelihood of receiving threats from a teacher or principal against awarding low scores, compared to either the SAM-only or SAM+Cam treatment arms. The local committees, in turn, often found it difficult to resist this pressure, because the average committee member was less educated than the teachers they were evaluating and, hence, lower in the social hierarchy. (Similar issues about social status in teacher evaluations have been observed in a separate study in Indonesia and a study in India.) Hence, when motivation in the form of a financial incentive was combined with information that teachers regarded as debatable, it isn’t surprising that the +Scorecard element didn’t have much effect on student learning above the effects in the SAM-only treatment arm.

To return to the earlier question, if SAM on its own improved teacher practice and test scores, and if SAM+Cam had no impact on the incentivised teacher presence indicator, why did SAM+Cam have 1.5 to 2 times the effect of the other two treatment arms? One possibility is that teachers considered the user committees legitimate enough to validate their attendance in school using a time-stamped, tamper-proof smartphone camera, which had far less room for user error and subjectivity than the scorecards.

Diagram of the accountability triangle.
Source: Adapted from WDR (2004) and Pritchett (2015)

Another possibility—and here I move beyond empirical foundations to conceptual speculation—is that SAM+Cam not only clarified the goals that were delegated to teachers (as discussed above in #1), but also improved the coherence between two accountability relationships that directly involve teachers: the citizen power relationship from the local community to schools and teachers, and the management relationship from the education ministry to schools and teachers.

SAM+Cam explicitly harmonised a priority that was delegated to teachers (being present at school) in the citizen power relationship, with an aspect of motivation (how teachers were paid) in the management relationship. Furthermore, information on teacher presence was recorded on camera by school staff, who are part of the management relationship, and subsequently verified by user committees, who are part of the citizen power relationship. The alignment between these two relationships may have boosted shared efforts to improve the learning environment. (Although SAM+Scorecard also involved both the citizen power and management relationships, teachers’ scepticism about the scorecard evaluations may have hampered any comparably constructive alignment.) Speculation notwithstanding, learning gains from improved stakeholder alignment are not without precedent. In a prior study in Indonesia, Pradhan and collaborators found learning gains from an intervention that facilitated joint planning meetings between school committees and village councils.

In short, teacher accountability instruments are complicated, and they may improve teacher practice, but not necessarily in the ways you expect. Here’s hoping that the ongoing expansion of KIAT Guru leads to more gains in student learning—as well as gains in our understanding of how to meaningfully improve accountability in education systems.

Yue-Yi Hwa is a Research Fellow for the RISE Programme at the Blavatnik School of Government, focusing on teachers and management. She recently submitted her PhD thesis in education at the University of Cambridge. Her thesis looks at the relationship between teacher accountability policy and sociocultural context across countries, using secondary survey data on education and culture alongside interviews with teachers in Finland and Singapore. Previously, Yue-Yi taught secondary school English for two years through Teach For Malaysia, and was a research fellow for the Penang Institute in Kuala Lumpur. She has also conducted research for the World Bank’s MENA education team. She holds a master’s degree in comparative government from the University of Oxford.

RISE blog posts reflect the views of the authors and do not necessarily represent the views of the organisation or our funders.