Propositional Evaluation and Julian King’s e-bike utopia
Updated: Mar 2, 2023
Demonstrating the value of Propositional Evaluation (PE) requires examples that show how fun, creative, and useful this approach can be for co-creating propositions, programs, or interventions that work! This example is based on a hypothetical e-bike subsidy described by Julian King. Thanks to Julian for some comments on a draft; of course, I may still not have represented his views accurately.
Now, I don’t know much about e-bikes or interventions designed to increase cycling. So, the things I consider below should be taken as illustrative of the thinking involved in PE and not at all a claim to know how to run an e-bike program. I am going to take the position of a public sector team responsible for implementing Julian’s vision. I will italicise the key sections of the Program Design Logic (PDL) diagram being discussed as we go.
In a nutshell, PE is about testing the logic of a policy or program at any point in its lifecycle. The evaluand is a proposition: it has premises and a conclusion. The proposition is made up of the outputs of our actions, assumptions, and a direct intended outcome. Each of these is written in the form of a subject and a predicate, ‘who/what is in what condition’. The general formula is ‘outputs + assumptions = direct outcomes.’ Evaluation, or testing the logic, is then about whether the proposition is valid (the argument makes sense on paper) and well-grounded (the extent to which each premise is actually manifested in reality). Towards the end of this example is a little explainer about valid arguments.
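To make this structure concrete, here is a minimal sketch in Python of a proposition as data. This is my own illustration, not part of Julian’s example; the condition statements are paraphrased from the discussion that follows:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Condition:
    """A premise or conclusion written as 'who/what is in what condition'."""
    subject: str
    predicate: str
    grounded: Optional[bool] = None  # None = not yet tested empirically

    def statement(self) -> str:
        return f"{self.subject} {self.predicate}"

@dataclass
class Proposition:
    """The general formula: outputs + assumptions = direct outcome."""
    outputs: List[Condition]
    assumptions: List[Condition]
    direct_outcome: Condition

    def premises(self) -> List[Condition]:
        return self.outputs + self.assumptions

    def well_grounded(self) -> bool:
        # Well-grounded: every premise actually manifests in reality.
        return all(c.grounded for c in self.premises())

# Hypothetical premises, paraphrased from the e-bike discussion below.
ebike = Proposition(
    outputs=[Condition("People who won't buy an e-bike due to cost",
                       "know about the subsidy")],
    assumptions=[Condition("E-bike retailers",
                           "have sufficient supply to meet new demand")],
    direct_outcome=Condition(
        "Bike riders",
        "access the subsidy and purchase an e-bike they otherwise would not"),
)
print(ebike.well_grounded())  # False until premises are tested and confirmed
```

Validity is judged on the structure (do these premises, if true, add up to the outcome?), while groundedness is an empirical question about each premise; the sketch keeps the two separate.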
Let’s start by thinking about what Julian’s e-bike subsidy program may be sufficient for achieving. If it’s done well, this program will result in more people buying and using e-bikes. This would be the direct outcome of the subsidy and would be a success for us. There is a question as to whether the subsidy will be sufficient only for additional e-bike purchases. How would we ensure they don’t gather dust in someone’s garage? Perhaps we can’t, so we content ourselves with a program that results in people buying more e-bikes. But I don’t think this would be enough to justify the program.
The intended direct outcome of Julian’s e-bike program is written as ‘Bike riders access the subsidy and purchase an e-bike that they otherwise would not purchase (deadweight losses are avoided).’ This is what success looks like to my team. Now we need to understand what is necessary for us to achieve for this to happen. But before we do, let’s consider the place of indirect outcomes and external factors in PE. Julian may like e-bikes, but his real reason for the program is the indirect outcomes: less traffic, fewer emissions, healthier people, and so on. PE takes a radical approach here. We don’t overly concern ourselves with measuring outcomes for which the program could not be sufficient. We instead focus on being realistic about what our subsidy could actually achieve. Why? Because then we can focus on what we are really trying to do and do that well. In this case, my team is going to focus on e-bike sales because, in all honesty, that’s where our mandate and resources put us and what we are being asked to achieve. We might note in our proposition the intended indirect outcomes and the external factors that also affect e-bike use, such as bike lanes, the attitudes of motorists to bikes, or even inclement weather. But we are not addressing these, and so it is not logical to judge our success based on factors outside our control. If we should be achieving more, take that up with those developing the strategy and allocating resources in the department.
In PE we are not focusing on what we can’t control or testing a hypothesis. We are building our program on past evidence or common sense – we don’t need to reinvent the wheel (pun intended!) by focusing our energy on testing whether e-bike usage leads to fewer emissions. There is an exception to this rule of focusing on what the proposition can be sufficient for, rather than measuring the contribution to its indirect outcomes: when we want to compare one intervention to another with the same intended indirect outcomes. We address that role for evaluation later in this article.
For the same reasons we need to be realistic about our outcomes, we need to be disciplined in thinking about what we can actually do. We can’t do absolutely everything we would like; we just don’t have the mandate or the resources. We may or may not be able to set different subsidy amounts for different groups or exclude those that would have bought the bike anyway. But what is necessary or, to use a lean start-up term, what would we need for our minimum viable product (MVP)? Achieving this MVP and our direct outcomes will be the product of two things. The first is the conditions that result from our actions. The second is the conditions, in the form of assumptions, that we make about the context in which our actions are implemented. Our general formula is that ‘outputs + assumptions = direct outcomes.’ You can read more about replacing arrows in program logic with addition and equals signs using PDL in this blog.
The art of making assumptions is that we must name those that are really important and about which we are not 100% sure. There are simply too many assumptions about everything we do in life to list them all. For example, we probably wouldn’t list as an assumption that terrorists don’t target e-bike riders (although Julian might consider that some motorists do in fact act in this way). We clearly are assuming this, but it doesn’t appear sufficiently likely to put it up in lights and spend our limited evaluation resources looking into it. We may, however, assume that ‘E-bike retailers and manufacturers or importers have a sufficient supply of e-bikes to meet the newly created demand’. Is this a reasonable assumption? I don’t know. Subject matter experts may be able to tell us straight away. Or they may say, ‘yes, that is an important assumption that is necessary for this subsidy to generate our intended outcomes; let’s do some market research first just to make sure’. So, you can see, we are already thinking evaluatively in the design phase. One last quasi-assumption is worth mentioning. We noted in our diagram that it is perhaps an assumption that a subsidy is more efficient than a rebate. This is not technically an assumption within the proposition because it is not one of its premises. We wouldn’t be considering it as part of the equation ‘assumptions + outputs = direct outcomes.’ However, it raises the question of whether we are on the right track, and we might want to consider the logic of our intervention and whether there is a better alternative.
So, to this point, we have discussed the intended direct outcome and one of our key assumptions. We have set aside the indirect outcomes and external factors as important to consider but – contra the popular opinion that treats programs as hypotheses or promises – not the main game when we treat programs as propositions. So, let’s continue further into the heart of the matter, or main game: what are we actually planning to do with this subsidy program?! We focus here on our outputs – that is, the conditions that our inputs and actions need to be sufficient to bring about. As noted in the diagram, a lot more attention needs to go into articulating these inputs and actions, and into ensuring we warrant or justify that these will deliver our outputs. While the means are important, they are only valuable if they create the outputs, and so we work backward and focus first on intended outputs.
In the PDL diagram above, we have listed four outputs, and I’ve put a fifth in dashed text because at this stage I am uncertain whether we can offer different levels of subsidy for different groups. If we did, then we would need to test the logic of the different amounts. Two of these output condition statements are about riders, one is about retailers, one is about my staff, and one is about the subsidy itself. Note that all assumptions, outputs, and outcomes are written in the form of a subject and a predicate, ‘who/what is in what condition’. The first of these outputs is ‘People who won’t buy an e-bike due to cost know about the subsidy (target riders)’. There is a lot going on here, but this seems to state the core first step toward our direct outcome. It is up to the team to work out what will ensure the target audience knows about the subsidy. In practice, we know different people will respond to different things, so achieving this condition will be no easy feat. We will need to learn or leverage information about how to find our target riders most effectively.

The point is that we have been clear about what we need to achieve by setting out a subject and a predicate, ‘who/what is in what condition’. This is not ‘more people are aware’, because what is ‘more’ – is one enough? No, it’s a condition we need to achieve. In practice, we will collect data and evaluate how many were made aware, why, and how, and perhaps we can refine our program or our understanding of the condition that we really need to bring about. There is scope for a lot of useful evaluative activity to understand how this condition may be brought about ‘for whom under what circumstances, to what extent and how’, as the realists have long understood. We can refine and make the condition statement more precise as we learn more; maybe we find out that we need to ensure 85% are aware to generate sufficient purchase activity. If so, we now have an even more precise PDL. Suffice it to say that we need to spend considerable time in the design phase working out the specific conditions we need to bring about, so that in the delivery phase we have specific conditions to monitor and evaluate.
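As an illustration of how such a refined condition could be monitored, here is a small sketch. The 85% threshold is the hypothetical refinement from above, and the survey figures are invented:

```python
# Hypothetical refinement of the awareness output: suppose we learn that
# 85% awareness among target riders is needed for sufficient purchase activity.
AWARENESS_THRESHOLD = 0.85

def awareness_condition_met(aware: int, surveyed: int) -> bool:
    """Check whether the refined condition manifests in monitoring data."""
    if surveyed == 0:
        return False  # no evidence yet; the premise is not grounded
    return aware / surveyed >= AWARENESS_THRESHOLD

# e.g. 430 of 500 surveyed target riders know about the subsidy (86%).
print(awareness_condition_met(aware=430, surveyed=500))  # True
```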
We have not spent a lot of time talking about all the inputs and actions that will be necessary to achieve these outputs – but we really should be putting these on the diagram so we can evaluate them. You may or may not find the outputs in the diagram persuasive and compelling, or sufficient for generating our direct outcome. If not, my team would like to discuss this with you, with intended beneficiaries, and with others to work out what we really must achieve on the way to generating our direct intended outcome. In the design phase, we want to know whether our inputs, actions, and outputs appear necessary, and in the delivery phase whether they were in fact achieved and to what extent. We must also be alert to the risk that we are over-engineering our program. How many conditions are actually necessary for our direct intended outcome? Maybe some of these outputs are not necessary and we can cease those activities. For example, maybe it turns out that tailoring subsidy amounts to different groups is not in fact necessary. Numerous evaluation methods, including but not exclusively counterfactuals, can be useful for identifying premises in our proposition that are not in fact necessary. We might learn this by talking to people, as well as through an experiment where we implemented this part of the program at some times and places but not others. In any event, we only want the necessary conditions in our proposition; otherwise, it is inefficient.
Reasons. So far, we have focused on the conditions or premises in a proposition that should be necessary and sufficient for an outcome. At this point, let’s return to the concept of reasons to accept the proposition as valid. A proposition will generally make sense, and is likely to be valid, if subject matter experts honestly agree that ‘if we do x we will achieve y, and that will be sufficient for generating z’. PE requires people to discuss their reasons for accepting such a claim. In argumentation theory, this is about having warrants that allow us to draw inferences. In practice, it is very hard to be definitive about reasons. It is remarkable how individual humans fall prey to biases and fallacies, but as a group – as long as people are free to say what they really think and we avoid power dynamics that lead to groupthink – we are quite good at coming to reasonable conclusions. This kind of discussion is called dialectic. This is not the place to talk about formal and informal logic, induction, and deduction – but in many cases, if SMEs discuss a draft PDL and agree the proposition is warranted (i.e., outputs + assumptions = outcomes), we are likely in a good place. If they can’t agree – or the program legitimately doesn’t make sense – you can always ‘park’ the discussion by adding an assumption, even if it seems ‘heroic’ (sometimes we re-label this as a ‘constraint’ so it is less offensive). Now you have a proposition that is valid (even though you are pretty sure this assumption is not well-grounded). Assumptions really are your get-out-of-jail-free card! And by naming them, you have a chance of ensuring they are considered in any empirical evaluation or other evidence-gathering exercise. It can only strengthen the program to at least be explicit about those assumptions which present substantial risks to it, as sketched below.
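Continuing the earlier Python sketch (again, my illustration; the wording of the ‘heroic’ assumption is invented), parking a disagreement as a named assumption looks like this:

```python
# If SMEs can't agree that the outputs alone are sufficient, park the
# disagreement as a named (possibly heroic) assumption. On paper the
# proposition is now valid, and the assumption is flagged for testing.
heroic = Condition("Subsidised riders",
                   "ride the e-bike instead of driving for most short trips")
ebike.assumptions.append(heroic)

# Valid on paper, but well_grounded() stays False until each premise
# (including this heroic one) is shown to manifest in reality.
print(ebike.well_grounded())  # False
```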
When we feel we have a valid proposition – or as valid as it can be before we are told it must go live – it’s time for delivery and empirical data collection. What type of evaluation is best at this stage? I will avoid the trap of arguing for one particular approach or method over another; evaluators know it depends. As a default, you would want to collect some data about the extent to which each condition in your PDL is manifesting, as well as for whom and under what circumstances. But let us consider just one possible and likely type of evaluation that we might be asked to conduct: an outcome or summative evaluation. Given what we have said about external factors, we might not need an evaluation that measures changes in emissions or any other indirect effect of more e-bike usage, because our intervention could not be sufficient for these on its own. But let’s suppose Julian’s neighbour thinks it would be much more cost-effective to reduce emissions with a program of free maintenance for older vehicles. This might require a Cost-Benefit Analysis (CBA) or, if the two programs share an outcome metric, we could calculate an Incremental Cost-Effectiveness Ratio (ICER). If the shared outcome metric needs to be dollars, then we might ask which of these propositions has the greater Net Present Value (NPV) or Benefit-Cost Ratio (BCR). The car maintenance program may be able to measure a reduction in emissions much more easily than Julian’s e-bike program – but that doesn’t mean it is more valuable, just that it is more amenable to a CBA and more likely to have a quantifiable direct economic impact. The danger is that this kind of analysis leads to a misleading oversimplification of which program is better. But I know Julian is enlightened and has developed an approach to Value for Money (VFM) that may be less precise in some of its measures, but is probably far more accurate than many CBAs. But let’s return to an evaluation of the proposition on its own merits and what we can usefully know about it – the focus of propositional evaluation.
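For readers unfamiliar with these metrics, here is a small sketch of the arithmetic. Every figure is invented for illustration and implies nothing about actual e-bike or vehicle maintenance programs:

```python
from typing import List

def icer(cost_a: float, cost_b: float, effect_a: float, effect_b: float) -> float:
    """Incremental Cost-Effectiveness Ratio: extra cost per extra unit of effect."""
    return (cost_a - cost_b) / (effect_a - effect_b)

def npv(cash_flows: List[float], rate: float) -> float:
    """Net Present Value of annual cash flows (year 0 first) at a discount rate."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))

def bcr(benefits: List[float], costs: List[float], rate: float) -> float:
    """Benefit-Cost Ratio: discounted benefits over discounted costs."""
    return npv(benefits, rate) / npv(costs, rate)

# Hypothetical: e-bike subsidy vs free vehicle maintenance, sharing one
# outcome metric (tonnes of CO2 avoided).
print(icer(cost_a=2_000_000, cost_b=1_500_000, effect_a=900, effect_b=600))
# -> ~1667 dollars per additional tonne of CO2 avoided

# Hypothetical NPV and BCR for the subsidy at a 5% discount rate.
print(npv([-2_000_000, 600_000, 600_000, 600_000, 600_000], rate=0.05))
print(bcr(benefits=[0, 700_000, 700_000, 700_000],
          costs=[2_000_000, 0, 0, 0], rate=0.05))
```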
Finally, I will address a criticism sometimes levelled at PE: that the outcomes of our interventions, or of anything we do in the world, are often not what we intended, and we can’t expect to think through all the outcomes that will result from our actions at the outset. Yes, it is highly likely there will be unintended outcomes, or we might even generate our intended outcomes but not for the reasons we thought. There are always unintended consequences, both positive and negative, that we cannot foresee. Unintended positive outcomes would be our good luck, but luck is not a reasonable basis for evidence-based policy. We need sound propositions. If we can’t first check that our intervention is a good idea on its own merits, what hope do we have for a more ambitious evaluation? This makes PE a natural accompaniment to ex-ante CBA in the deliberation of new policy proposals.
In conclusion, the point of PE is to treat the evaluand as a proposition. It does not treat the evaluand as a promise to deliver an outcome that will be the result of many things outside our control. It is not a hypothesis to be tested. It is not a claim to be the best or only way to do something. It is a proposition, or a plan for a specific course of action, that may or may not be a good idea. We can evaluate its logic in the design phase as well as the delivery phase. We need to know if the program is sound – that is, if it could logically deliver the intended direct outcomes. Other forms of evaluation may aim at replicable knowledge. PE has more humble concerns and is dedicated to understanding how a program could achieve, is achieving, or is not achieving what it reasonably sets out to achieve.
PE is only part of the approach to evaluating a value proposition. VFM and systems evaluation might look more broadly at the merits of a proposition in its wider context. Realist Evaluation might help with the warrants or reasons to accept that program activities will be sufficient for outputs and outcomes. Randomised Controlled Trials (RCTs) might be useful when applied to a part of an intervention, or a very mature intervention, or when we really must measure outcomes for a CBA. The interface between PE and other forms of evaluation, especially as it relates to value propositions, is an area for further exploration that I will continue to work on with interested persons.