Helping vulnerable populations through adaptive field experiments

An adaptive field experiment aiming to integrate Syrian refugees into urban labour markets in Jordan sheds light on the effectiveness of different policies, while targeting the welfare of experimental participants.

Randomised Controlled Trials (‘RCTs’) are a popular method to learn about the effectiveness of policy interventions and inform policy decisions. Usually, this happens in two very distinct phases, and these are often separated by several years at least: first, implement the experiment and learn about the effectiveness of the different interventions; second, provide this information to policymakers so that they can incorporate such lessons into their work. Crucially, the first stage involves randomly allocating individuals to one of several intervention groups. In the typical trial, individuals are allocated to these groups in equal proportion, and irrespective of whether prior information suggests that they would benefit more from one intervention as opposed to another.

This approach allows researchers to learn as much as possible about the effectiveness of each intervention. However, it is not designed to be maximally beneficial for the individuals who take part in the trial. This feature is far from ideal – indeed, there are many settings where study participants are vulnerable and in need of urgent support, and where ignoring their welfare completely is ethically untenable. One such setting involves policy towards refugees. The treatment of refugees in host countries is a critical policy challenge around the world (Loiacano and Silva Vargas 2023). One key question – on which there is relatively little previous research – concerns the challenge of integrating refugees into host labour markets.

Policy options to help Syrian refugees in Jordan

This was the challenge that confronted us in 2018 – when we started working with the International Rescue Committee (“IRC”, a large humanitarian NGO) to think about policy options to help Syrian refugees in Jordan, and to help local Jordanian job-seekers. Specifically, we wanted to learn about which policy is most effective at helping our respondents to find jobs – and for our results to feed back to help improve the design of the experiment and increase the welfare of study participants in real time.

To do this, we drew upon two key features of our experimental setting: (i) job-seekers reached out to IRC on a rolling basis, and (ii) we wanted to target short-term employment outcomes (specifically, employment six weeks after treatment). Together, these features meant that we could build an algorithm to learn from employment outcomes as the experiment was ongoing – and use that algorithm to update the probability of different treatment assignments for respondents yet to enter the programme (find a technical description of our algorithm at the bottom of this article). This enabled us, over time, to target the interventions to those who benefitted most.

Results: Labour market opportunities in urban Jordan

We tested three separate treatments – inspired by recent empirical research in low-income urban labour markets, and by discussions from qualitative research on refugees and the challenges they face. First, a cash treatment: this involved a one-off unconditional cash transfer, worth about a month of average expenditure. Second, an information treatment, in which job-seekers were helped to signal their skills to prospective employers. Third, a nudge treatment, designed to strengthen job-seekers’ motivation for job search. To these three treatments, we added a control group (where the probability of assignment to control was allowed to vary over time, in the same way that the treatment probabilities did).

Our experiment generated four key findings about these treatments:

Six weeks after being offered treatment, none of the interventions had significant or meaningful impacts on the probability that refugees are in wage employment.
The cash intervention had large and significant impacts on refugee employment and earnings, two and four months after treatment. For example, four months after treatment, the grant boosted employment by 3.8 percentage points (a 70% gain over the control mean) and earnings by 65%. Consistent with the existence of binding liquidity constraints, we find that these impacts are driven by individuals with below-median expenditure at baseline, and that baseline expenditure is significantly associated with job search intensity in the control group. (On the contrast with the six-week results: our six-week surveys were conducted by different enumerators – not known to the respondents – and we speculate that respondents may have been unwilling to discuss informal employment with these enumerators.)
The information and nudge interventions also boosted job search among refugees, and had significant impacts on employment and earnings after two months. However, these impacts are smaller than those of the cash grant and ultimately were short lived: four months after treatment, we found weaker and insignificant impacts of these interventions.
We found essentially no effects of our treatments on the Jordanian sample. This highlights a key policy challenge of active labour market policies: the effectiveness of different policies is likely to depend heavily upon the characteristics of the targeted groups (and, in particular, the Jordanians in our sample had larger baseline expenditure, searched at higher intensity and found jobs faster than the Syrian refugees – suggesting that the Jordanians faced quite different search frictions than the Syrians).
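As a back-of-envelope check on the second finding, the two reported figures (a 3.8 percentage point effect that is a 70% gain over the control mean) together imply a control-group employment rate of roughly 5.4% four months after treatment. The short calculation below makes that step explicit. The variable names are ours, and because the 70% figure is presumably rounded, the implied levels are approximate rather than reported results.

```python
# Figures quoted in the text for the cash grant, four months after treatment.
effect_pp = 3.8        # employment effect, in percentage points
relative_gain = 0.70   # the same effect as a share of the control mean

# Implied control-group employment rate (approximate: inputs are rounded).
control_mean_pp = effect_pp / relative_gain

# Implied employment rate among those offered the cash grant.
treated_mean_pp = control_mean_pp + effect_pp

print(f"Implied control mean: {control_mean_pp:.1f}%")
print(f"Implied cash-group mean: {treated_mean_pp:.1f}%")
```

Seen this way, the effect is large in relative terms because wage employment in the control group is low to begin with.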

Did the adaptive algorithm help?

For the first eight weeks of our experiment, the probability of assignment was 25% for all treatments. From week nine, the algorithm started learning, and moved the probabilities of assignment accordingly. However, the average proportions of individuals assigned to the various treatments did not depart very much from 25% for any treatment – so the use of an adaptive algorithm did not generate large gains in average employment in our experiment. This was driven by the fact that our experiment targeted employment at the six-week point – where we did not find large treatment effects.

However, six-week employment did respond to treatment more strongly for some specific subgroups – and these subgroups therefore did see some gains from adaptivity. For example, by the end of the trial, we assigned 60% of Syrian women without tertiary education or work experience to the cash condition. Consistent with this, we find that the optimal targeted policy has a treatment effect on six-week employment that is one percentage point larger than the optimal non-targeted policy (though the confidence sets overlap). This provides suggestive evidence on the benefits of targeting.

Finally, as a counterfactual exercise, we simulated what the performance of the Tempered Thompson Algorithm would have been had we targeted two-month employment. The simulations indicate that the algorithm would quickly have directed participants towards the optimal interventions – and potentially could have doubled the employment gains of a standard RCT. This highlights the critical importance of effective choice of the targeted outcome.
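To give a flavour of what such a counterfactual comparison involves, here is a minimal simulation sketch. It is not the simulation from our paper. It assumes hypothetical true employment rates for the four arms, treats each wave as one week, observes outcomes immediately rather than with a delay, uses independent Beta(1, 1) priors with no strata, and tempers the Thompson probabilities by mixing them with uniform weights (the mixing weight `gamma = 0.2` is an illustrative value). The first eight waves use equal 25% assignment, mirroring the start of the experiment described above.

```python
import random


def prob_optimal(successes, failures, rng, draws=500):
    """Monte Carlo estimate of P(arm has the highest success rate),
    under independent Beta(1, 1) priors (a simplifying assumption)."""
    wins = [0] * len(successes)
    for _ in range(draws):
        theta = [rng.betavariate(1 + s, 1 + f)
                 for s, f in zip(successes, failures)]
        wins[theta.index(max(theta))] += 1
    return [w / draws for w in wins]


def run_trial(true_rates, adaptive, waves=40, wave_size=50,
              burn_in=8, gamma=0.2, seed=1):
    """Simulate one trial; return (overall success rate, assignment shares)."""
    rng = random.Random(seed)
    k = len(true_rates)
    successes, failures, counts = [0] * k, [0] * k, [0] * k
    for wave in range(waves):
        if adaptive and wave >= burn_in:
            # Tempering (assumed form): mix Thompson weights with uniform.
            p_opt = prob_optimal(successes, failures, rng)
            probs = [(1 - gamma) * p + gamma / k for p in p_opt]
        else:
            probs = [1 / k] * k  # fixed, equal proportions
        for arm in rng.choices(range(k), weights=probs, k=wave_size):
            counts[arm] += 1
            if rng.random() < true_rates[arm]:
                successes[arm] += 1
            else:
                failures[arm] += 1
    n = waves * wave_size
    return sum(successes) / n, [c / n for c in counts]


# Hypothetical true rates: control, cash, information, nudge.
TRUE_RATES = [0.05, 0.15, 0.07, 0.07]

fixed_rate, fixed_shares = run_trial(TRUE_RATES, adaptive=False)
adaptive_rate, adaptive_shares = run_trial(TRUE_RATES, adaptive=True)

print("Fixed design:    rate", round(fixed_rate, 3), "shares", fixed_shares)
print("Adaptive design: rate", round(adaptive_rate, 3), "shares", adaptive_shares)
```

When one arm is clearly better on the targeted outcome, as in these made-up rates, the adaptive design shifts most participants towards it soon after the equal-proportion phase ends. When the arms are nearly indistinguishable on the targeted outcome, the shares stay close to 25%, which is the pattern we saw at six weeks.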

Adaptive experiments as the way forward for field experiments in economics?

Adaptive algorithms present exciting possibilities for field experiments in economics (see, for example, the VoxDev article by Kasy and Sautmann (2021)). However – like any empirical method – adaptive algorithms have their strengths and their weaknesses.

So – when should such algorithms be used? Practically – as our experiment illustrates – adaptive experimentation allows researchers to generate a feedback loop from experimental results to future experimental participants and to improve the outcomes of the respondent population. This requires researchers to collect relevant outcomes on a reasonably short time-frame (a few weeks or months, for example – rather than years), and to have a suitable data pipeline such that those outcomes can be uploaded in good time to update the algorithm. Adaptive algorithms could be used across a wide variety of social programmes – including, for example, in education, in social welfare programmes, in health interventions, and so on. These are all likely to be amenable to adaptive experimentation as each has clear and measurable policy goals (learning outcomes, benefit take-up, health outcomes) and a rolling intake of recipients whose outcomes might be very important to policymakers (students, claimants, patients).

Conceptually, researchers need to be able to describe a clear policy objective, and to link their measured outcomes directly to that objective. In our case, this was straightforward: working with IRC, we wanted to maximise wage employment at the six-week point, and we were able to measure this directly. One can imagine settings in which the policy objective is described by some combination of measured variables – possibly using short-term measures as ‘surrogate outcomes’ for longer-term policy objectives.

We are excited by the possibilities for adaptive experiments in development economics – and, in the appendix to our paper, we provide a detailed discussion of several lessons that have emerged from our study, and some general guidance on extensions to the basic algorithm. For instance, we discuss the use of surrogate outcomes, continuous (rather than binary) outcomes, choice of sample, strata and wave size, choice of prior, budget constraints, inference, non-stationarity, and alternative adaptive assignment algorithms.

In sum – adaptive experiments are not going to replace traditional fixed-proportion field experiments, and nor should they. However, this kind of method clearly has much to offer in many important contexts: particularly in settings where researchers have a clear policy objective, where respondent populations are vulnerable, and where field teams are able to feed back data reasonably quickly. In particular, adaptive algorithms can help to bridge the ‘research-policy gap’ – by updating policy decisions directly and quickly. Humanitarian settings – such as that in Jordan – are one case where this seems particularly useful.

Technical description of the adaptive algorithm

Our algorithm involved three components. First, we used a Bayesian Hierarchical Model to estimate the probability, for each of our treatments, that a job-seeker would successfully find a job within six weeks. (The ‘hierarchical’ feature of this model allowed us to estimate this probability separately across sixteen strata, generated by the interactions of four dummy variables: (i) a dummy for being a Jordanian rather than a Syrian national, (ii) a dummy for identifying as female, (iii) a dummy for having completed high school, and (iv) a dummy for having experience in wage employment.) Second, this model allowed us to estimate – for any job-seeker entering the programme at any point in time – the probability that each one of our treatments was optimal for that individual. Finally, we used that estimated probability to assign individuals to different treatments (specifically, we did this using what we term a ‘Tempered Thompson’ approach; this is a simple modification of the famous Thompson (1933) algorithm, currently used in many online advertising and recommendation settings). Interested readers can find more detail about adaptive experiments – and example code – at https://maxkasy.github.io/home/Adaptive_Abidjan_2024/.
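The three components can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not our production code. It replaces the hierarchical model with independent Beta(1, 1) priors within a single stratum (so there is no pooling across strata), estimates the probability that each treatment is optimal by Monte Carlo, and tempers those probabilities by mixing them with uniform weights. The mixing form, the weight `gamma = 0.2`, the outcome counts, and all names in the code are illustrative assumptions.

```python
import itertools
import random

TREATMENTS = ["control", "cash", "information", "nudge"]
DUMMIES = ["jordanian", "female", "high_school", "wage_experience"]

# Sixteen strata: every combination of the four 0/1 dummies.
STRATA = list(itertools.product([0, 1], repeat=len(DUMMIES)))


def prob_optimal(successes, failures, rng, draws=2000):
    """Steps 1 and 2 (simplified): draw employment rates from each arm's
    Beta posterior and count how often each arm comes out on top."""
    wins = [0] * len(successes)
    for _ in range(draws):
        theta = [rng.betavariate(1 + s, 1 + f)
                 for s, f in zip(successes, failures)]
        wins[theta.index(max(theta))] += 1
    return [w / draws for w in wins]


def tempered_thompson(successes, failures, rng, gamma=0.2):
    """Step 3 (assumed form of tempering): shrink the Thompson
    probabilities towards equal assignment, so that no arm's
    assignment probability falls below gamma / k."""
    k = len(successes)
    p_opt = prob_optimal(successes, failures, rng)
    return [(1 - gamma) * p + gamma / k for p in p_opt]


rng = random.Random(0)

# Hypothetical six-week outcomes so far in one stratum
# (Syrian, female, no high school, no wage experience).
stratum = (0, 1, 0, 0)
successes = [3, 9, 4, 4]      # found a job, by treatment arm
failures = [47, 41, 46, 46]   # did not find a job, by treatment arm

probs = tempered_thompson(successes, failures, rng)
assigned = rng.choices(TREATMENTS, weights=probs, k=1)[0]

print(dict(zip(TREATMENTS, [round(p, 3) for p in probs])))
print("Next job-seeker in stratum", stratum, "is assigned to:", assigned)
```

Tempering keeps every arm, including control, in play throughout the trial, which preserves the ability to estimate treatment effects while still tilting assignment towards the arms that appear to work best for each stratum.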


Stefano Caria

Maximilian Kasy

Simon Quinn

Grant Gordon

Soha Osman

Alex Teytelboym
