• As we look deeper into our evaluation methods, our Strategy and Impact Director, Andrew Berwick, explores the different approaches we take to evaluating our programme
  • He notes that experimental methodologies have been successful in medical testing, but that the complexity of social policy interventions poses barriers to this form of evaluation
  • We summarise how we attempt to answer the ‘big questions’ about impact through a range of methodologies

This is the first in a series of posts on evaluation and monitoring at West London Zone. We will be exploring the different approaches we take to evaluation, why we do it, and some of the challenges of evaluating our service effectively. Through this series we will dive into a range of topics including our ongoing external evaluation, how we try to understand the preventative ‘value’ of our work, and how we try to understand what does and doesn’t ‘work’ within our programme.

Experimental evaluation methods in public policy

This post highlights some of the ‘big questions’ for WLZ around evaluation. I started thinking about this after reading a great article written in the Guardian a few weeks ago by Stian Westlake, CEO of the Royal Statistical Society, in which he calls for the use of more experimental evaluation methods in public policy.

Instinctively I align with Stian’s view. I remember being really excited sitting in the first-ever ResearchEd conference 10 years ago as Professor Ben Goldacre painted a picture of how evaluation should work within schools. He argued for several innovations in educational evaluation, including far more widespread use of randomised trials – experimental evaluations where participants are randomly divided into a ‘treatment’ group and a ‘control’ group, enabling a comparison of the performance of the two groups.
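To make that mechanic concrete, here is a minimal sketch in Python of the logic a randomised trial relies on: assign participants to a ‘treatment’ or ‘control’ group at random, then compare the average outcomes of the two groups. The participant numbers, outcome scores and assumed effect size are entirely invented for illustration – this is not WLZ code or data.

```python
# Illustrative sketch only - invented data, not WLZ's analysis.
# The core mechanic of a randomised trial: assign participants to a
# 'treatment' or 'control' group at random, then compare average outcomes.
import random

random.seed(1)

participants = list(range(200))       # 200 hypothetical participants
random.shuffle(participants)          # random assignment...
treatment = set(participants[:100])   # ...half go to the treatment group

def simulated_outcome(person_id):
    """Stand-in outcome score; a real trial would measure this directly."""
    score = random.gauss(50, 10)                          # baseline variation
    return score + (5 if person_id in treatment else 0)   # assumed +5 effect

scores = {p: simulated_outcome(p) for p in participants}
control = [p for p in participants if p not in treatment]

treatment_mean = sum(scores[p] for p in treatment) / len(treatment)
control_mean = sum(scores[p] for p in control) / len(control)

# With random assignment, the difference in means estimates the effect.
print(f"Estimated effect: {treatment_mean - control_mean:.1f} points")
```

In medicine this logic is relatively straightforward because the ‘treatment’ itself is well defined; the rest of this post is about why that is harder to assume for a programme like ours.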


My feeling at the time – and since – is that more and better evidence on ‘what works’ is a good thing. So when I joined West London Zone 18 months ago, one of the big attractions for me was that two of our values are being evidence-led, and holding ourselves accountable to children, families and commissioners. As an organisation, we strive to answer the question: what change are we driving? To answer it, we have commissioned a 4-year external evaluation with University College London* – and we’ll talk about that evaluation later in this series of posts.

However, I came to Stian Westlake’s piece with more wariness than I did when listening to Ben Goldacre make a similar case a decade ago. There are two challenges in taking a purely experimental approach to our work: complexity and scale. Ultimately, this has driven our decision to use a range of approaches to understand our impact – each imperfect but, we believe, contributing to our evidence base over time. The case for randomised trials in medicine is very clear, and Stian Westlake makes it to great effect: he cites a series of randomised trials run during the COVID-19 pandemic to understand which treatments were effective against the virus – as he says, an excellent example of the use of randomised trials. But I think he omits a major difference between medical interventions and those in the social policy space: the complexity of the intervention itself.

"Ultimately, this has driven our decision to use a range of approaches to understand our impact – each imperfect but, we believe, contributing to our evidence base over time."  Andrew Berwick, Strategy & Impact Director 

No narrow cure 

However, I believe it’s harder to evaluate our work at WLZ than to understand the impact of a course of medication. We are not delivering a narrow ‘cure’ for a specific ‘condition’ affecting young people on the programme: we are attempting something far broader. We believe the young people we work with have great potential and unique strengths, and face specific and often complex factors preventing them from fulfilling that potential. These could include a special educational need; challenges outside of school that affect their mental health; or a struggle to build relationships with their peers – and often they face more than one of these challenges.

We meet each child where they are – understanding their needs and co-designing a 2-year programme of support. This is not the same as prescribing a pill, and we know that change will take more than a few weeks to happen. This creates challenges for applying experimental methods: how do we identify an effective ‘control group’ for each child, given their unique circumstances? What is the right period of time over which to observe change? And is it even possible to gather reliable data on the emotional and social wellbeing that we believe is so important to understand and measure?

We meet each child where they are – understanding their needs and co-designing a 2-year programme of support. This is not the same as prescribing a pill

An issue of scale

Beyond the issue of complexity is an issue of scale. We work with approximately 1,700 children across our Zone in west London. This is relatively large for a third-sector organisation – but tiny in the context of large-scale randomised trials. Yet we feel very keenly the need to hold ourselves accountable for the work we do, and to understand ‘what works’ as we grow within London and consider expansion outside of the Zone.

Our approach to monitoring and evaluation

Given these challenges, we need to deploy a range of methods to understand our impact. We know that each of these methods is imperfect in isolation – but we believe that, over time, the evidence they provide will help us to understand what is and isn’t working. They include internal monitoring of ‘pre’ and ‘post’ measurements of academic success, emotional wellbeing and peer relationships, and quasi-experimental methods that create a ‘comparison group’ of children whose outcomes we can compare with those of young people on our programme. We also gather a range of formal and informal qualitative data to understand not simply ‘what works’ but also ‘what matters’ for children, families and practitioners.
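As a rough illustration of the quasi-experimental idea described above, the sketch below compares the average ‘pre’ to ‘post’ change for children on a programme with the change for a comparison group. The scores and group sizes are invented for the example, and this is not WLZ’s actual analysis or data.

```python
# Simplified sketch with invented (pre, post) wellbeing scores -
# not WLZ data or WLZ's analysis code.
# The idea: children may improve anyway over time, so we compare the
# pre-to-post change on the programme with the change in a 'comparison group'.

programme_group = [(42, 55), (38, 49), (51, 60), (45, 53)]   # (pre, post)
comparison_group = [(44, 47), (40, 42), (50, 53), (46, 49)]  # (pre, post)

def average_change(pairs):
    """Average improvement from 'pre' to 'post' across a group."""
    return sum(post - pre for pre, post in pairs) / len(pairs)

programme_change = average_change(programme_group)
comparison_change = average_change(comparison_group)

# The gap between the two changes is one (imperfect) estimate of impact.
print(f"Programme change:  {programme_change:.1f}")
print(f"Comparison change: {comparison_change:.1f}")
print(f"Estimated impact:  {programme_change - comparison_change:.1f}")
```

In practice, constructing a credible comparison group is the hard part – which is why we treat this as one imperfect method among several rather than a definitive answer.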

Over the next few months, we’ll speak about the range of methods we’re using, what we think each one can tell us, and some of the limitations of these approaches. We don’t believe there is a simple answer to ‘how to evaluate’, but we hope that these posts will help others to learn about some of the choices we have made – and to challenge and offer better solutions.

*The evaluation is being conducted by the Centre for Education Policy and Equalising Opportunities and the Helen Hamlyn Centre for Pedagogy. We’ll be posting more detail on this in the summer.