Systematic Reviews: Can They Tell Us What Works in Education?

31 October 2016

Following the launch of the largest systematic review on education, are we any closer to finding what works in education?

How strong is the evidence base for what works in education in developing countries? At best mixed, and certainly lacking in ‘magic bullets’, was the verdict at the launch of the largest-ever systematic review (SR) on education at the What Works Global Summit.

3ie’s report documents and synthesises evidence from 238 studies in 52 countries, finding that few educational interventions have large and consistent effects, particularly where learning outcomes (as opposed to participation) are concerned.  Among the most promising are merit-based scholarships, school feeding, structured pedagogy and remedial classes.     

How should policy-makers interpret these findings?  While the report provides an invaluable guide to the state of the evidence from robust causal studies, does it offer guidance on what works which can be readily implemented?  Can the findings be employed to address the learning crisis? 

Clearly the policy-maker needs a lot more: the report acknowledges, for example, the shortage of the cost data required for serious comparisons between alternative interventions. But there must also be a thorough understanding of political economy, education system and context if barren empiricism is to be avoided. Can a particular intervention actually be implemented? And if so, would it work in a given system, given its existing features and dynamics?

The external validity of SR findings depends on a well-developed understanding of the mechanisms behind an intervention’s efficacy (and the problems it solves), as part of a broader theory of change. Remedial classes provide an example. If remedial classes in India are effective largely because the curriculum is over-ambitious and mainstream classes are appropriate for only the most able pupils, then remedial classes are a solution to “the wrong problem”. Teaching at the right level in all classes would offer much greater potential to improve outcomes.

External validity concerns aside, transparency of method, along with objectivity and simplicity of findings, are indeed strengths of the SR approach. And no other method can compare and condense the findings of a large body of studies as readily as meta-analysis (the statistical combining of results across studies). But comparisons rely on assumptions. Suitably comparable pairs of intervention and outcome must be identified in adequate numbers. Outcomes must be measured in ways that can be rendered directly comparable, and they must be measured in appropriate samples and populations to permit valid comparisons (and especially aggregation or “pooling”). In education, more so than in certain areas of medicine, this is easier said than done.
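To make “pooling” concrete, here is a minimal sketch of one common approach, fixed-effect inverse-variance weighting, in which each study's effect is weighted by the inverse of its squared standard error. This is an illustration only: the function name and the two study results are hypothetical, and it is not necessarily the model used in the 3ie review (random-effects models are also widely used).

```python
import math

def pool_fixed_effect(effects, std_errors):
    """Inverse-variance (fixed-effect) pooled estimate and its standard error."""
    # More precise studies (smaller standard error) receive larger weights.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se

# Hypothetical effect sizes and standard errors from two studies.
pooled, pooled_se = pool_fixed_effect([0.10, 0.30], [0.10, 0.20])
print(round(pooled, 3), round(pooled_se, 3))
```

The arithmetic only makes sense if the two effects measure the same thing on a comparable scale, which is exactly the assumption in question.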

A particular issue for SRs in education is the comparison of effect-sizes based on test scores. It is common in meta-analysis to compare effect-sizes based on tests from different grades, with different curricular content, at varying levels of difficulty, and reported on different scales. The usual approach is to standardise results by reporting effects as standardised mean differences (SMDs) between treatment and control groups. Such a transformation works well for interval-scale measures, as often used in medicine. But since test-score scales are usually entirely dependent on the items included in a particular test and on the sample to which the test is administered, there is no underlying scale to which individual tests may be anchored. Tests designed specifically for comparison, such as PISA, are an obvious exception, but are very rarely used in research which makes its way into SRs.
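The dependence of an SMD on the sample can be seen in a small sketch. It computes one standard form of the SMD (Cohen's d, the difference in means divided by the pooled standard deviation) for two sets of invented scores. The data and function name are hypothetical; the point is only that the same raw gain of five points yields a different SMD when the scores in the sample are more spread out.

```python
import math

def standardised_mean_difference(treat, control):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    n_t, n_c = len(treat), len(control)
    mean_t = sum(treat) / n_t
    mean_c = sum(control) / n_c
    # Sample variances (n - 1 in the denominator).
    var_t = sum((x - mean_t) ** 2 for x in treat) / (n_t - 1)
    var_c = sum((x - mean_c) ** 2 for x in control) / (n_c - 1)
    pooled_sd = math.sqrt(
        ((n_t - 1) * var_t + (n_c - 1) * var_c) / (n_t + n_c - 2)
    )
    return (mean_t - mean_c) / pooled_sd

# Hypothetical scores: identical 5-point raw gain, different spread.
narrow_control = [40, 45, 50, 55, 60]
narrow_treat = [45, 50, 55, 60, 65]
wide_control = [30, 40, 50, 60, 70]
wide_treat = [35, 45, 55, 65, 75]

d_narrow = standardised_mean_difference(narrow_treat, narrow_control)
d_wide = standardised_mean_difference(wide_treat, wide_control)
print(round(d_narrow, 3), round(d_wide, 3))
```

Here the more heterogeneous sample produces an SMD half the size of the other, although the raw gain is identical in both.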

Even when outcome measures are directly comparable, interventions frequently are not.  In the case of school feeding, for example, the intervention might be considered sufficiently similar across contexts to allow comparison and synthesis of effects in studies with comparable outcomes, but often interventions are more complex and systemic. Reforms such as decentralisation, for example, are inextricably linked with the systems to which they belong; “the same intervention” has only a very broad interpretation, arguably too broad to warrant pooling of studies.

Given the relatively small effect-sizes reported for even the most successful interventions included in education SRs, the prospects for combining such interventions to provide solutions to the learning crisis or to under-performing education systems are slim. This evidence is nonetheless invaluable as part of the toolkit of the judicious policy-maker: one who is attuned to the need to interpret comparisons from education SRs as indicative only, and who possesses a well-developed contextual understanding of the relevant systemic theory of change.


Caine Rolleston is a key research associate on the Young Lives project, an ambitious research project that is tracking two cohorts of children in four locations: Peru, Vietnam, Ethiopia and Andhra Pradesh in India. It is one of the few multi-country longitudinal studies that follow children from early ages into their teen years and include learning outcome measurements.

