Business as usual: The NGO Eduasusual mobilizes funding for keeping girls in school. They put together a package of interventions (school feeding, awareness raising, teacher training, etc.) and start implementing it. The package is the same for each of the 150 schools, and roughly based on a “Theory of Change” designed by an external consultant who flew in and drafted the project document. At the end of the intervention, the organization commissions an external evaluation to understand if the approach has worked or not. So far, so boring.
And that is where real-time outcome monitoring comes in: Instead of documenting school attendance in a paper-based registry, target schools use a simple SMS-based system (like UNICEF’s EduTrac). School attendance is tracked on a daily basis and analysed on an online platform.
This allows the NGO to quickly try out what works – and what does not – on the outcome level (!). Together with a group of pupils, headmasters and teachers, they put together three different packages of interventions. Each package is implemented in 50 schools. Over the next three months, they closely follow what happens. After a while, they identify the set of interventions that was able to raise school attendance. The analysis is based on the attendance data collected as well as an internal review with the students, teachers and headmasters involved from the beginning. With this “proof of concept”, they mobilize more resources and scale up the successful intervention to all 150 schools. With additional funding, they again put together two variations of the successful model and try them out in 100 additional schools.
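To make the logic concrete, here is a minimal sketch of the kind of comparison the NGO would run on its attendance data. All numbers, package names and the baseline are invented for illustration; a real EduTrac export and analysis would look different.

```python
# Hypothetical sketch: comparing attendance across three intervention
# packages. All figures are toy numbers, not real data.
from statistics import mean

# Average attendance rate per school (fraction of enrolled girls present)
# over the monitoring period, grouped by intervention package.
reports = {
    "package_A": [0.71, 0.68, 0.74, 0.70],
    "package_B": [0.82, 0.85, 0.79, 0.83],
    "package_C": [0.69, 0.72, 0.70, 0.71],
}
baseline = 0.70  # assumed average attendance before the trial

def uplift(rates, baseline):
    """Average change in attendance relative to the baseline."""
    return mean(rates) - baseline

results = {pkg: uplift(rates, baseline) for pkg, rates in reports.items()}
best = max(results, key=results.get)  # package to scale up
print(results)
print(best)
```

The point is not the arithmetic but the feedback loop: because attendance arrives daily via SMS, this comparison can be re-run continuously instead of waiting for an end-of-project evaluation.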
Is there a problem with ethics? No, not really. The NGO does not withhold any benefits from some schools – they simply do not know yet what works and what does not. Covering all schools in the country from the beginning is not possible anyway with limited funds.
Donors could use a similar approach: They fund five NGOs with the simple goal of keeping more girls from skipping school. How they do that is completely up to them: no logframes, no detailed planning of activities, no detailed reporting. But: Donors continuously track how well the NGOs perform in keeping girls in school. After 6 months, they review the data and scale up what worked.
Development aid is changing rapidly. Development evaluation needs to change with it. Why? And how?
PLANNING IN A COMPLEX, DYNAMIC ENVIRONMENT REQUIRES MORE AND DIFFERENT EVALUATIONS
Linear, mechanistic planning for development is increasingly seen as problematic. Traditional feedback loops that diligently check if an intervention is ‘on-track’ in achieving a pre-defined milestone do not work with flexible planning. In their typical form (with quarterly and annual monitoring, mid-term reviews, final evaluations, annual reporting, etc.), they are also too slow to influence decision-making in time.
A new generation of evaluations is needed – one which better reflects the unpredictability and complexity of interactions typically found in systems, one which gives a renewed emphasis to innovation, with prototypes and pilots that can be scaled up, and one which can cope with a highly dynamic environment for development interventions.
Indeed, this is an exciting opportunity for monitoring and evaluation to re-invent itself: With linear, rigid planning increasingly being replaced by more flexible planning approaches that can address complex systems, we now need more responsive, more frequent, and ‘lighter’ evaluations that can capture and adapt to rapidly and continuously changing circumstances and cultural dynamics.
We need two things. Firstly, we need up-to-the-minute ‘real-time’ or continuous updates at the outcome level; this can be achieved by using, for example, mobile data collection, intelligent infrastructure, or participatory statistics that can ‘fill in’ the time gaps between official statistical data collections. Secondly, we need to use broader methods that can record results outside a rigid logical framework; one way to do this is through retrospective ‘outcome harvesting’, an approach that collects evidence of what has been achieved and works backward to determine whether and how the intervention contributed to the change.
MULTI-LEVEL MIXED METHODS BECOME THE NORM
Although quantitative and qualitative methods are still regarded by some as two competing and incompatible options (like two-year-olds not yet able to play together, as Michael Quinn Patton put it in his recent blog), there is a rapidly emerging consensus that an evaluation based on a single method is simply not good enough. For most development interventions, no single method can adequately describe and analyze the interactions found in complex systems.
Mixed methods allow for triangulation – or comparative analysis – which enables us to capture and cross-check complex realities and can provide us with a full understanding, from a range of perspectives, of the success (or lack of it) of policies, services or programmes.
It is likely that mixed methods will soon become the standard for most evaluations. But the use of mixed methods alone is not enough; they should be applied on multiple levels. What we need is for multi-level mixed methods to become the default approach of evaluation, and for the qualitative-quantitative debate to be declared over.
Whatever one might think about the merits or fallacies of results-based management, development evaluations have to deal with one consequence: A broad agreement that what ultimately counts – and should therefore be closely monitored and evaluated – are outcomes and impact. That is to say, what matters is not so much how something is done (outputs, activities and inputs), but what happens as a result. And since impact is hard to assess if we have little knowledge of outcome results, monitoring and evaluating outcomes becomes key.
There is one problem, however: By their nature, outcomes can be difficult to monitor and evaluate. Typically, data on behaviour or performance change is not readily available. This means that we have to collect primary data.
The task of collecting more and better outcome level primary data requires us to be more creative, or even to modify and expand our set of data collection tools.
Indeed, significant primary data collection will often become an integral part of evaluation. It will no longer be sufficient to rely on the staples of minimalistic mainstream evaluations: the non-random ‘semi-structured interviews with key stakeholders’, the unspecified ‘focus groups’, and so on. Major primary data collection will need to be carried out prior to, or as part of, an evaluation process. This will also require more credible and more outcome-focused monitoring systems.
Thankfully, there are many tools becoming available to us as technology develops and becomes more widespread: Already, small, nimble random sample surveys such as LQAS are in more frequent use. Crowdsourced information gathering or the use of micro-narratives can enable us to collect data that might otherwise be unobtainable through a conventional evaluation or monitoring activity. Another option is the use of ‘data exhaust’ – passively collected data from people’s use of digital services like mobile phones and web content such as news media and social media interactions.
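To give a flavour of why LQAS surveys can be so small, here is a rough sketch of the underlying decision logic: sample n units from a ‘lot’ (say, one district) and classify it as reaching a coverage target if at least d sampled units show the outcome, with d chosen so that both misclassification risks stay below a tolerance. This is a textbook-style illustration, not taken from any official LQAS manual; the thresholds and risk levels are assumptions for the example.

```python
# Illustrative LQAS-style decision rule based on binomial tail
# probabilities. Thresholds (80% vs. 50% coverage) and 10% risk
# levels are example assumptions, not recommended values.
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def decision_rule(n, p_upper, p_lower, alpha=0.10, beta=0.10):
    """Smallest d such that a lot truly at p_upper is rejected with
    probability <= alpha, and a lot at p_lower is accepted with
    probability <= beta. Returns None if no d satisfies both."""
    for d in range(n + 1):
        reject_good = binom_cdf(d - 1, n, p_upper)      # P(X < d | p_upper)
        accept_bad = 1 - binom_cdf(d - 1, n, p_lower)   # P(X >= d | p_lower)
        if reject_good <= alpha and accept_bad <= beta:
            return d
    return None

# Example: sample 19 units, target 80% coverage vs. a 50% lower threshold.
d = decision_rule(19, 0.8, 0.5)
print(d)
```

With these example thresholds, a sample of just 19 respondents per lot is enough to classify coverage with both error risks held under 10%, which is exactly what makes LQAS attractive for routine monitoring.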
So there is good reason for optimism. The day will soon come when it is standard practice for all evaluations to be carried out by mixed methods at multiple levels, with improved primary data collection enabling us to evaluate what really counts in our interventions: the outcomes and the impact.
Not sure what you think about the new UNDAF guidelines, but here are some of my thoughts after an initial review:
Despite the positive change in how the guidelines, annexes and companion guidance are presented, this appears more of an evolutionary update rather than an ‘SDG revolution’. True, the guidelines include a clearer focus on the SDGs (good!), but I do not really see any radical departure from how the UN has done business in the past. Links to the SDGs in the results framework remain tentative at best. The new UNDAF guidelines have clearly taken on some of the features of One UN, but most of the prescriptive, practical guidance remains similar to the 2014 UNDAF updates.
Having said that, there are some really positive changes in the new guidelines, in my view. For example:
The central role of Leave No One Behind for everything the UN does, which – at last – again provides an overall vision for the entire UN. If taken seriously, this could become a very powerful mobilizer, internally as well as externally.
The inclusion of a strong theory of change: although a buzzword and very much en vogue, it can indeed help – if properly done – to focus the UN’s work and provide a common framework for working together.
It is a good thing that the Common Country Analysis is now a minimum requirement: This is important since solid theories of change (and the subsequent results chains) require a detailed, logical problem analysis as a first step. Otherwise, they tend to fail.
It is positive that the UNDG RBM handbook is included as a pillar for the UN’s work. Although not perfect, the RBM handbook provides a common, reasonably clear set of definitions, frameworks and tools to all UN agencies.
Another step in the right direction is that UNDAF data should, where possible, be publicly accessible (in line with IATI).
However, I feel that the updated UNDAF guidance overall falls short on a number of issues which would have shown that business as usual is not an option. I was particularly surprised by the lack of attention to innovation. Although there is some reference to real-time data collection and vague, general support of innovation, this is certainly not the big push to ingrain innovation into the DNA of the UN and to catch up with some of the more advanced techniques for innovation (positive deviance, using big data, testing, Minimum Viable Products, etc.). I would especially have expected a stronger push for adaptive programming, which appears to be quickly becoming mainstream in development organizations.
So overall, my personal feeling – after a first read – is that the new guidelines are not as different from the 2014 version as I would have expected and as the nice, modern layout initially suggests: The guidelines pick up some of the One UN elements (which is good!), provide a bit more clarity and simplify a few things. But overall, my impression is that the resulting new UNDAFs will not look radically different from what we know today – and that the way the UN will cooperate/coordinate at the country office level may not dramatically change.
Has anyone tried to apply the new guidance to an UNDAF formulation? Would love to hear your experiences.