With Esther Duflo, Abhijit Banerjee, and Michael Kremer winning the 2019 Nobel Memorial Prize in Economic Sciences, there is renewed interest and discourse around randomised controlled trials (RCTs).
But RCTs are complex, and there seem to be lingering questions around the subject. From conceptual queries to the ethics of RCTs, here are some of those questions, answered:
An RCT is an evaluation technique that can be used to measure whether a particular programme is working: whether it has any impact, and how large that impact is. Essentially, it is an experiment designed to establish a cause-effect relationship, and isolate the influence that a particular intervention has on a certain outcome.
Participants in an RCT are randomly assigned to different groups — control groups and treatment groups. The concept of a control group and treatment group has roots in clinical trials, and the method of random assignment to these groups was developed through agricultural experiments in the 1920s.
The treatment group receives the programme or intervention being evaluated, while the control group does not. Statistically, both the control and treatment group are assumed not only to be representative of the larger group from which they are culled (and so what is discovered about them is arguably true about the larger group as well), but also equivalent to each other.
Before the programme or intervention is introduced, the two groups are thought to be the same. Proponents of RCTs believe that any difference that subsequently arises between them can then be attributed to the programme or intervention.
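In its simplest form, random assignment is just a matter of shuffling a list of participants and splitting it in two. A minimal sketch in Python (the household names and group sizes are hypothetical):

```python
import random

def randomise(participants, seed=42):
    """Shuffle participants and split them into treatment and control.

    A fixed seed makes the assignment reproducible and auditable."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

households = [f"household_{i}" for i in range(100)]
treatment, control = randomise(households)
print(len(treatment), len(control))  # 50 50
```

Because assignment depends only on chance, any pre-existing differences between households should, on average, be balanced across the two groups.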
Control and treatment groups can be segregated at various levels: at the individual level, or at cluster levels — households, schools, villages, blocks, and so on — depending on feasibility and ethics, which are discussed later.
However, it is important to note that RCTs do not always require a ‘no treatment’ control group. Randomisation can just as easily be used to compare different versions of the same programme, different interventions within a programme, or when resources are scarce — it can simply be a method of selecting who receives access to a particular intervention in a seemingly unbiased way. There are multiple ways of designing randomisation: lottery design, phase-in design, rotation design, encouragement design, and so on.
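To illustrate one of these, a phase-in design randomises the order in which units receive the programme, so that every unit is eventually treated and later phases serve as controls for earlier ones. A sketch, with illustrative village names and an arbitrary number of phases:

```python
import random

def phase_in(units, n_phases, seed=7):
    """Randomly order units, then deal them into phases round-robin.

    Units in later phases act as controls for earlier phases until
    the programme reaches them, so no one is permanently excluded."""
    rng = random.Random(seed)
    order = list(units)
    rng.shuffle(order)
    return [order[i::n_phases] for i in range(n_phases)]

villages = [f"village_{i}" for i in range(12)]
phases = phase_in(villages, n_phases=3)
for i, phase in enumerate(phases, start=1):
    print(f"Phase {i}: {len(phase)} villages")
```

The round-robin split ensures every village lands in exactly one phase, even when the total is not evenly divisible.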
The RCT approach can be used across sectors and adapted to a number of different circumstances. In India, the first RCT was carried out with Seva Mandir in 1996, and today, RCTs are used in multiple sectors, including education, health, and agriculture.
There are often various factors at play within development programmes, and identifying which variables most significantly affect outcomes can be challenging. RCTs are therefore used to zero in on which aspects of the programme are effecting change and creating impact.
Measuring impact often involves comparisons: to what extent has the programme affected a group or community compared to if it had never been implemented in the first place? Because this is a question that is difficult to measure directly, the control group serves as an indicator of what the absence of the programme would reflect (referred to as the ‘counterfactual’).
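In practice, the simplest estimate of impact is the difference between the two groups' average outcomes, with the control mean standing in for the counterfactual. A toy sketch (the test scores below are invented for illustration):

```python
def mean(values):
    return sum(values) / len(values)

def estimated_impact(treatment_outcomes, control_outcomes):
    """Difference in mean outcomes between treatment and control.

    Under random assignment, the control mean approximates the
    counterfactual: what the treated group would have looked like
    without the programme."""
    return mean(treatment_outcomes) - mean(control_outcomes)

# Hypothetical post-programme test scores
treated = [62, 71, 68, 75, 66]
untreated = [58, 64, 60, 69, 59]
print(round(estimated_impact(treated, untreated), 1))  # 6.4
```

Real evaluations would add standard errors and significance tests around this difference, but the core comparison is this simple.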
While there are other methods of examining this, RCTs are generally considered to be more rigorous and unbiased, and are less dependent on the assumptions that other evaluation techniques sometimes need to make.
What also distinguishes RCTs from other impact evaluations is that participation is determined randomly, before programme implementation begins.
The key players in an RCT include programme implementers (such as governments or nonprofits), donors who fund programmes or evaluations, research centres, researchers, and the communities participating in the trial.
There are a number of research centres that conduct RCTs, including the Abdul Latif Jameel Poverty Action Lab (J-PAL), the International Initiative for Impact Evaluation (3ie), Innovations for Poverty Action (IPA), and the What Works Network, among others.
J-PAL has 982 ongoing and completed RCTs, and 3ie estimates that their impact evaluation repository—a database of development programme evaluations in low- and middle-income countries—has 2,645 recorded RCTs.
Multilateral organisations such as the World Bank, the Asian Development Bank, and UNICEF also use RCTs. At the government level, donor agencies such as USAID and DFID, as well as national and state bodies such as the Government of Andhra Pradesh, Gujarat’s Pollution Control Board, and the Rajasthan police, have partnered with research organisations to conduct RCTs.
Private companies such as Mathematica Policy Research and Abt Associates, as well as nonprofits also routinely undertake RCTs to evaluate programmes and conduct policy analyses.
RCTs have been criticised on the grounds that ‘randomistas’ (as RCT researchers are often called) are willing to sacrifice the well-being of participants in order to ‘learn’. Who participates in an RCT is also an ethical question that researchers must consider. It is often pointed out that due to randomisation, people who need a certain treatment do not receive it, while others receive a treatment they do not need.
Randomisation could also lead to potential conflict. If households within a particular village, for example, are randomly selected to receive a particular intervention while others remain in the control group, it could lead to disruption within the community. The lack of attention to the question of human agency is another limitation of RCTs.
Having Institutional Review Boards (IRBs) in place has become the norm for these kinds of studies, in order to protect the rights and welfare of participants. But these bodies are largely self-regulating, and beyond anecdotal evidence, it isn’t clear how well they have worked for development RCTs.
The debate around ethics aside, there are also certain design-based challenges that RCTs face. Here are some considerations to keep in mind:
1. What level to randomise at
The nature of the programme or intervention usually guides the researcher in deciding what level to randomise at. For example, if chlorine pills to treat contaminated water are being distributed, random assignment to control and treatment groups at a household level might not be viable.
Apart from ethics (giving one family pills to treat their water while denying the same to their neighbour), feasibility also needs to be considered. If the community drinks from a common tank of water, treating this tank would automatically make randomisation at a household or individual level unfeasible.
Even if households had individual sources of water, logistically, screening out control group households while distributing pills could be inconvenient, and ensuring that treatment group participants don’t share their pills with control group neighbours is difficult.
It may also not be politically feasible to randomise at a household level. Political leaders may demand that all members of their community receive assistance, and this demand could come from the community itself as well.
Proponents of RCTs like Duflo and Kremer say that, “all too often development policy is based on fads, and randomised evaluations could allow it to be based on evidence”. But not everyone is convinced that randomisation is infallible. According to economist Pranab Bardhan, “it is very hard to ensure true randomness in setting up treatment and control groups. So even within the domain of an RCT, impurities emanate from design, participation, and implementation problems.”
2. Threats to data collection
Statistically, the larger the sample size, the better it represents the population. But even when the sample size is large enough, if respondents drop out during the data collection phase, the results are susceptible to attrition bias. Attrition and failures by evaluators to collect data shrink the sample, reducing the ‘generalisability’ of the study. And if attrition is skewed towards either the treatment or the control group, rather than occurring at a roughly equal pace in both, the validity of the findings is compromised.
Spillovers and crossovers also affect data collection. Spillovers occur when individuals in the control group are indirectly affected by the treatment. For example, if the intervention involves vaccinations, once a significant proportion of the population is vaccinated and becomes immune to a disease, ‘herd immunity’ could end up protecting control group individuals who never received vaccinations. Crossovers, on the other hand, are individuals who find themselves directly affected by the treatment: for example, when a parent transfers their child from a control group school to a treatment group school.
Partial compliance refers to instances where individuals within the treatment group choose not to participate. Statistical techniques can be used to produce valid results despite this, but they come with certain assumptions, many of which randomisation aims to avoid in the first place.
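One standard statistical response to partial compliance, intention-to-treat analysis, compares groups by their original random assignment regardless of actual take-up. A minimal sketch with invented outcome data:

```python
def mean(values):
    return sum(values) / len(values)

# Outcomes grouped by original random assignment, not actual take-up.
# The treatment column deliberately includes non-compliers who were
# assigned the programme but never participated.
assigned_treatment = [70, 55, 68, 52, 66]
assigned_control = [58, 56, 60, 54, 57]

# Intention-to-treat effect: preserves the balance created by
# randomisation, at the cost of diluting the measured effect size.
itt = mean(assigned_treatment) - mean(assigned_control)
print(round(itt, 1))  # 5.2
```

Because non-compliers stay in their assigned group, the estimate answers a policy-relevant question: what is the effect of offering the programme, rather than of receiving it.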
3. Uncertain internal and external validity
Internal validity refers to the extent to which a study establishes a relationship between a treatment and an outcome. Randomisation can help establish a cause-effect relationship, but internal validity depends largely on the procedures of a study and how meticulously it is performed.
External validity, or generalisability, is more difficult to obtain. This refers to whether the same programme would have the same impact if replicated with a different target population, or if scaled up.
While RCTs are widely recognised for their internal validity, Nancy Cartwright and Angus Deaton question their external validity. They call this the ‘transportation’ problem: “demonstrating that a treatment works in one situation is exceedingly weak evidence that it will work in the same way elsewhere.” Cartwright also points out that the rigour demanded to achieve internal validity is hardly ever found in establishing external validity.
RCTs are known to be prohibitively expensive, but since they have come to be synonymous with ‘hard evidence’, numerous governments and nonprofits have invested in them, and donors have been willing to fund them.
While it is difficult to estimate exact figures, there are certain ‘line item’ categories that contribute to an RCT’s cost structure:
Staff costs: Principal investigators, professors, field research associates, and a chain of other people all participate in pre-study evaluations, implementation, and post-study evaluation. Their fees, salaries, travel costs, and accommodation costs must be taken into consideration.
Data collection: Data collection typically happens in multiple rounds—baseline, midline, and endline studies. Depending on the kind of programme being evaluated, there may be multiple midlines and endlines over a particular period of time. These studies also have several sub-costs: staffing, training, technology, and incentives for survey participants, amongst others.
Intervention costs: These vary with the programme being evaluated. For example, an intervention that requires inputs from the implementing organisation, such as medical treatments, would cost more than an RCT that studies existing state-run programmes like subsidies or direct benefit transfers (DBTs).
Overheads and utilities: Office space, utilities, laptops, survey printouts, and other miscellaneous costs also significantly contribute to an RCT’s cost structure.
The complexity and scale of the RCT, along with factors such as sample size and the design and duration of the study, determine how large each of these cost buckets ends up being.
Insights in this explainer have been sourced from J-PAL’s Introduction to Evaluations, in consultation with other sources.
Ayesha Marfatia contributed to this article.