The battle against educational inequality

In the Netherlands, the COVID crisis seems to have exacerbated disparities in the quality of education: the already wide gap in educational outcomes between students from disadvantaged socio-economic groups and those from the middle and upper strata of our society is widening. UvA sociologist Dr. Bo Paulle appeared in the NPO documentary series ‘Klassen’ on this subject, warning against the Americanisation of our school system. We spoke to him about current research on new educational programmes, being implemented in the USA and in the Netherlands, that seek to mend this problem. He is a proponent of using rigorous randomised controlled trials (RCTs) to evaluate these programmes.

1. Recently there has been some talk about new educational programmes and their need to be “evidence-based”. Why does this seem to be such a new trend in education policy: why was it apparently not evidence-based before?

It’s tough to pinpoint the root cause, but it might have to do with the fact that historically there have been ethical qualms about assigning educational programmes to schools at random to research their efficacy (which is one of the conditions of a randomised controlled trial). But I would turn that on its head: a lot of the well-intentioned interventions that have been rolled out so far actually turn out to have negative treatment effects, and a huge percentage of them have very weak or no treatment effects. So the real ethical issue is actually: “Are we going to keep throwing money away on ineffective programmes, or are we going to take this seriously and methodically evaluate these programmes through proper randomised controlled trials (RCTs)?”

2. Can you explain why RCTs are important in research about educational interventions?

Well, the crux is: we do not want our education system to become like that of the United States, so we need programmes that actually work. How do we know they work? Well, there’s this thing called a randomized controlled trial, for which there is a giant body of statistical literature.
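[Editor's sketch: the core logic of an RCT can be illustrated in a few lines of Python. This is a minimal illustration and not the analysis used in any study mentioned here. The function names, the school labels and the scores are all hypothetical, and a real evaluation would add significance tests and handle clustering, attrition and much more.]

```python
import random
from statistics import mean


def randomly_assign(units, seed=0):
    """Shuffle the units and split them into a treatment and a control half."""
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    shuffled = list(units)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def estimated_effect(treatment_scores, control_scores):
    """Difference in mean outcome between the treatment and control groups."""
    return mean(treatment_scores) - mean(control_scores)


# Hypothetical school labels, for illustration only.
schools = [f"school_{i}" for i in range(8)]
treatment, control = randomly_assign(schools, seed=42)

# Made-up test scores, not real data.
effect = estimated_effect([6.5, 7.0, 7.5], [6.0, 6.5, 7.0])
```

[Because chance alone decides who gets the programme, the two groups are comparable on average, so a difference in mean outcomes can be attributed to the programme rather than to pre-existing differences between schools.]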

Let me emphasize that we need to be careful and specific with the question “which programmes work?”, because educational programmes can be implemented by different organizations, with different teams, in different locations. So the question can’t just be “does programme X produce good results?”; the question must be “does programme X, implemented by organization Y, in context (i.e. city) Z, produce significant positive results? And in comparison to what?”. We need to forget about big general terms like “tutoring” and instead break it down to the nitty-gritty: what the specific method of tutoring is, which organization implements it, and what the short-term and long-term effects on the students are. Politicians, however, are often seduced into following soundbites. I don’t really blame them, though. They are likely very time-constrained.

I like to think of it in comparison to cancer research: in the 1960s, when they started doing research on cancer, the scientific community thought in terms of “lung cancer” and “breast cancer”: big catch-all terms. Nowadays, they’re getting really specific and they know that there are a hundred different kinds of lung cancer, for example. And that’s what we need to do with these educational reforms and programmes: we have got to get really specific about evaluating proven effective interventions operated by specific organizations.

3. Are the educational interventions we’re talking about mainly concerning the most disadvantaged students, or about education in general?

Personally, I am mostly focused on the most at-risk groups, and I think that the long-term cost/benefit profile is most attractive in these cases, because this is where most societal costs are being incurred: if these kids get side-tracked you’re talking about jails, welfare, severe human suffering, their kids suffering, obesity, etc. For these groups, all of these tie into failure in education.

But I would say, in general, ongoing RCTs are the best method to evaluate any educational programme, for any demographic. Second best are probably cohort-analyses.

4. What are some of the important issues when evaluating an education programme?

One of the more important issues is that of scalability. This is where some promising programmes fail: they work on a small scale when implemented by a ‘dream-team’ of interventionists / educators, but when you have average people trying to implement them, they suck. With the best of intentions, people who set up small-scale experiments often choose the most awesome people to run their little pilot with. They’re not trying to game or cheat the system; they’re just very motivated, they believe in their programme and they want to do it right. But then you have a dream-team in phase 1 of the RCT, and in phase 2 or 3, when you scale up, you’ve got ‘Johnny Average’ trying to implement the programme and guess what: it might suck.

Anyone in charge of education policy should embrace these kinds of complexities to be able to evaluate these programmes properly.

5. Could you tell us something about current promising research?

I am really interested in small-scale RCTs, but even more so in medium- to large-scale RCTs, because that’s where the scalability is tested. So for example Saga’s [Saga is an organization implementing educational programmes in the USA] High Dosage Tutoring programme started in America on a very small scale, with tremendous results, and then they replicated those results twice, with (I think) populations of 1200 treatment and 200 control. Well, if you’re tutoring 1200 kids, you cannot have a very select dream-team, and despite that the programme still has really, really strong results. A recent paper verified this again.

The High Dosage Tutoring programme that’s being researched here in the Netherlands is Bridge HDT, and it’s a 2 (students) on 1 (tutor) programme. I’m excited about the forthcoming results of a new model which is part tutoring via tablet / pad and part live tutoring. So instead of one on two, it’s one on four: on Monday, two kids get live instruction and two kids are on the pad (and get a brief, five-minute instruction), and you switch them the next day. Tablets offer monitoring, so we can see what the students struggle with, and maybe this leads to more effective live days and fully live programmes.
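[Editor's sketch: the alternating one-on-four rotation described above could be written down as follows. This is only an illustration under our own assumptions; the function name, the student labels and the fixed pairing of students are hypothetical and not taken from the programme itself.]

```python
def rotation_schedule(students, days):
    """Alternate two pairs of students between live tutoring and tablet work."""
    if len(students) != 4:
        raise ValueError("this sketch assumes exactly four students per tutor")
    pair_a, pair_b = students[:2], students[2:]
    schedule = {}
    for index, day in enumerate(days):
        # Even-numbered days: pair A is live; odd-numbered days: the pairs swap.
        live, tablet = (pair_a, pair_b) if index % 2 == 0 else (pair_b, pair_a)
        schedule[day] = {"live": live, "tablet": tablet}
    return schedule


# Hypothetical student labels, for illustration only.
week = rotation_schedule(["A", "B", "C", "D"], ["Mon", "Tue", "Wed", "Thu"])
```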

Again, maybe we’re wrong: maybe it will not work. RCTs allow us to test this.

The complexity and scalability of these studies are quite manageable, certainly for people capable of studying econometrics. And it’s really exciting that people are finally getting the importance of rigorous evaluation of education. Another example:

I heard about this study, I believe it was in France, and the assumption was that “if we give a bunch of disadvantaged kids cheap Chromebooks they’ll be able to learn more effectively remotely during Covid”. It was tested with an RCT and it turned out the effects were negative: the kids just started gaming more and doing less work. Who would’ve thought [sarcasm]... So, with the best of intentions, it had a negative impact on these kids’ lives.

6. Okay, so policy-making seems to slowly shift towards evidence-based evaluation being important, but on some Dutch political parties’ websites, there’s still advocacy for ‘summer schools’ and ‘smaller classes’ and they’re being hailed as part of the solution to current education-deficits, despite research indicating that summer schools are largely ineffective and sometimes affect outcomes negatively. Why do you think this “let’s do what we think might work” is still a thing when it comes to policy making?

Part of it is just that politics trumps everything else, and saying “kleinere klassen” (smaller classes) is very easy and believable. It fits on a bumper sticker, whereas if you’re gonna say “we should have organization X teach phonics [a class of language-education methods] to 6-year-olds, only 3 days a week, without volunteer tutors, and the tutors have to be college educated”, that doesn’t fit on a bumper sticker, but it is the more scientific approach. Policymakers and politicians, however, might just not have time for this complexity, and just go for the bumper-sticker catchphrase.

But indeed, the evidence for smaller classes is very choppy. There are some great indications, for example a study by Oosterbeek, who used Swedish data from 2014 and looked at long-term effects on earnings, and there’s also some indication that smaller classes work in the short term.

You’re also looking at broad policies (‘smaller classes’) versus specific programmes like Bridge High Dosage Tutoring (HDT), and that is where I come in, because very often some (not all) of these specific programmes are very, very effective when implemented by specific organizations. And when I say “very effective” I also mean cost-effective: if smaller classes cost $3500 per kid per year, it’s only natural to consider the alternatives for the same cost per kid per year, and if you look at the HDT programmes in the Netherlands, I don’t think there’s much of a debate about which option is more cost-effective (it’s HDT).

7. Are programmes like that prone to getting diluted, i.e. changing over time and becoming ineffective? Either in the USA or NL?

That’s the exciting thing about Saga HDT being operated on a very large scale in an extremely tough environment: Chicago.

It’s like they say: “If you can make it in New York, you can make it anywhere.” Believe me, in terms of educational reforms: if you can make it on the South Side and the West Side of Chicago, you can make it anywhere. There’s nowhere in Europe where it is as challenging to implement such programmes as in these places in Chicago. And they’ve done it at scale, so until further notice I’d say this is really interesting and promising. So if the conditions can be set up such that The Bridge and other organizations can bring their own evolving versions of Saga HDT to the Netherlands and other countries as well, then that would offer great opportunities for education and for ongoing research.

8. What do you think of the frequent comparisons of our educational system and those of Scandinavian countries, for example?

Compared to other countries, I definitely think that the Netherlands tracks too early, and there is fairly compelling evidence that suggests tracking later would be a net benefit to Dutch society. Sure, there will be some children who may pay a penalty because they don’t go off to their gymnasium at age 12, but from what I have read I assume that it would be more than offset by the disadvantaged children who profit from being around a wider mix of other kids, and perhaps from not internalizing a negative self-categorisation at age 10 or 11, for example. This whole self-fulfilling prophecy phenomenon of “oh you’re treating me like I’m stupid? I’ll act stupid” can be extremely damaging to children at the bottom of the educational hierarchy.

However, things like early selection are deeply rooted in the Dutch educational system and are slow and hard to change.

9. What do you think of the recent discussion in the Netherlands about private tutoring, sometimes called ‘schaduwonderwijs’ (shadow education)?

It’s a no-brainer that an increasing need for private tutoring is going to contribute to growing inequality, so there is a real political and societal issue here about privatisation of schools and informal learning opportunities. Listen, ‘laissez-faire’ might sound great on paper, but do you want to drift towards American-style inequality with everything that goes along with it? As a budding quantitative researcher, those are the kinds of questions I would ask myself. From my perspective of having grown up in the USA and seeing ghettos from very close by for years and years, I would say “laat het niet zo ver komen” (don’t let it come to that). Be an activist, but be a scientifically rigorous and independent one, and orient your life around the kind of social scientific research that you think lines up with your values. The coming 50 years will bring about a tsunami of forces that can plausibly contribute towards growing inequality, and I urge you to think about how to get involved in rigorously interrogating attempts to scale up measures to counteract that.

Further reading:

(1) Interesting blog by the late prominent psychologist and professor Robert Slavin about education and research on education.

(2) de Ree, J., Maggioni, M. A., Paulle, B., Rossignoli, D., & Walentek, D. (2021). High dosage tutoring in pre-vocational secondary education: Experimental evidence from Amsterdam.

(3) An organisation which implements HDT in the Netherlands

(5) The UvA Scalable Education Programs Partnership seeks to join academic researchers' expertise with that of teachers and policy-makers

