“Anyone who tries to tell us they know the future is simply trying to own it.” — Margaret Heffernan in her introduction to Uncharted
Take a deep breath. Nobody knows what is going to happen with AI. You aren’t missing out, and lots is yet to be worked out, if it ever will be. Now is a time for learning and experiments.
It is absolutely incredible what AI models can achieve, and their capability has progressed significantly in the last few years. So I’m not suggesting they should be dismissed. Not at all. But that’s not to say I’m convinced AI will fundamentally change the structure of all public services and private enterprise. Here are my thoughts on why I think we need to be a bit more circumspect.
AI for coding doesn’t translate to all contexts
Much of the hype is being driven by the tech sphere getting excited by how helpful AI is proving in producing code. Techies are going to overvalue that capability, but being handy with coding doesn’t translate to being good at legal judgements or policy trade-offs.
There are a number of reasons why AI seems good at coding:
- Code is fairly predictable;
- There is lots of programming content online to train the models on (the Internet is full of coders!);
- Compilers (the tools that translate code into a runnable form) produce helpful error messages, which the models can use to adjust and correct their output, creating a positive feedback loop.
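The feedback loop in that last bullet can be sketched in a few lines of Python. This is a toy illustration, not a real system: `toy_fix` stands in for the call back to a model, and here it simply patches the one known mistake so the loop can be seen converging.

```python
def check(source: str) -> str:
    """Compile the candidate code and return the error message, or "" on success."""
    try:
        compile(source, "<candidate>", "exec")
        return ""
    except SyntaxError as e:
        return f"line {e.lineno}: {e.msg}"

def toy_fix(source: str, error_msg: str) -> str:
    # Stand-in for a model call: a real system would send the source plus the
    # compiler's error back to the model and ask for a revised attempt.
    return source.replace("def add(a, b)\n", "def add(a, b):\n")

# A candidate with a missing colon, the kind of slip a model might make.
candidate = "def add(a, b)\n    return a + b\n"
attempts = 0
while (msg := check(candidate)) and attempts < 3:
    candidate = toy_fix(candidate, msg)
    attempts += 1

print(check(candidate))  # empty string: the loop converged on valid code
```

The point is the shape of the loop, not the fix itself: the compiler supplies an unambiguous, machine-readable training signal after every attempt, which very few other professions can offer.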
There are not many (any?) other fields of work where all of these apply. Some thought medicine would offer ‘low-hanging fruit’, but despite radiology having the highest number of approved AI applications of any medical specialty, the number of radiology roles continues to grow.
There are also questions on how sustainable AI coding is for major projects. For hobby software, quick little scripts and prototyping, AI coding has fundamentally lowered barriers and empowered people, no doubt. But maintaining, testing, and iterating complex enterprise software is a different ball game where AI’s limited context window and unreliability become more challenging. If nobody actually wrote the code, who understands what it is doing? Cognitive debt is the new name for this challenge.
Without good data, AI can’t help much
In my work, we spend lots of time thinking about how to achieve various policy outcomes in support of businesses and the economy. Many of our biggest blockers are around data quality and definitions. How do we identify specific types of businesses to do things with or for them? How could we define an economic sector or area of activity? Invariably, we don’t have data which directly tells us, so how can we infer what we need and what happens when it goes wrong? How does a company get included/excluded from the policy scope without creating a mass of bureaucracy for them and us?
AI and machine learning can help us around the edges of some of these challenges, but ultimately, we need high-quality, structured data first before we can apply any techniques or technologies. Getting there can require changing laws, service redesign, and data infrastructure. These aren’t really technology problems first and foremost.
How much does this cost compared to the alternatives?
While AI is currently becoming much cheaper in terms of cost per use, that’s only part of the picture. It is still expensive compared to many deterministic options such as automation platforms, which many will already have access to through their existing licensing. We also need skills that are difficult to recruit in order to properly manage and deploy AI. If we compare it to the move to the cloud, I don’t need engineers looking after racks of servers in the basement anymore. However, I need a team of more expensive cloud engineers to manage the environment we depend on. So while we can do things not previously easily possible, we are probably seeing a greater total spend.
I keep coming back to the ‘missing’ productivity gains all our previous investments in technology are supposed to have given us. I don’t think the billions we spend on public sector technology are wasted, but they’re not a direct driver of cashable savings or productivity in most cases. However, we can do things we previously couldn’t have done: you can renew your passport with a selfie, identify high-growth overseas markets with a few clicks, and undertake mass-scale studies on anonymised NHS data. Amazing, but we still need to keep everything else going on top of the new stuff. AI is more stuff to buy, build, and run. It lets us do things not before possible, which is also amazing, but someone has to pay for it.
How reliable is good enough?
People make mistakes all the time, so we shouldn’t be holding a technology up to an unrealistic standard. But we’ve spent hundreds of years learning how to manage human error in our organisations and ways of working. AI makes mistakes in different ways, ones we are not really used to spotting and accounting for.
This academic post on reliability, using insights from the nuclear and airline industries, is incredibly helpful in explaining that reliability demands more than accuracy; it has four dimensions:
- Getting things right consistently (Consistency)
- Not falling apart when conditions aren’t perfect (Robustness)
- Being open about lack of confidence rather than confidently guessing (Calibration)
- Making mistakes that are fixable rather than catastrophic when they do happen (Safety)
What the academics argue is that while AI capability (accuracy) has been improving fast, the improvement in the conditions for reliability has been modest. And the worst issue is AI models aren’t good at knowing when they’re wrong. An AI agent’s degree of confidence in an answer does not correlate with its true accuracy. These are pretty fundamental issues that will need much more work.
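Calibration, in particular, has a simple operational test: when a model says it is 90% confident, it should be right about 90% of the time. A minimal sketch of measuring that gap, using made-up numbers rather than real evaluation data:

```python
# Each pair is (stated confidence, whether the answer was actually correct).
# These figures are illustrative only, not from any real evaluation.
answers = [
    (0.9, True), (0.9, True), (0.9, False), (0.9, False), (0.9, False),
    (0.6, True), (0.6, True), (0.6, False),
]

def calibration_gap(items, confidence):
    """How far stated confidence sits from empirical accuracy at that level."""
    hits = [correct for conf, correct in items if conf == confidence]
    accuracy = sum(hits) / len(hits)
    return abs(confidence - accuracy)

# A well-calibrated model saying "90% sure" should be right ~90% of the time;
# in this toy data it is right only 40% of the time, a gap of 0.5.
print(round(calibration_gap(answers, 0.9), 2))  # 0.5
```

A gap like this is exactly the failure mode described above: the confidence score reads as information, but it tells you almost nothing about whether the answer is right.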
AI spam vs AI filters
In the meantime, I worry about race conditions: as people use AI to spam applications for government services, contact centres and so on, we’re going to put barriers up to getting through to the real decision-makers and doers. Who loses out in that scenario? I fear the genuine service user without access to the latest tools. Also, was the friction that AI now overcomes actually part of the service design?
Looking internally within organisations: If a supplier’s agentic flow is going to seek payments on every contract variation it spots in email chains, our agentic flow will appeal each one and also try to catch the supplier on every breach of their service level agreements with us. We’ll then put agents on the escalations and before you know it, we’ll have an arms race.
We’re seeing this in recruitment where so many AI bots are causing havoc in the world of applying, reviewing, and interviewing. So far, the best countermeasure has been to insist on in-person interviews to prevent AI-assisted cheating on technical tests.
Where does this take us? Well, if I were selling AI usage, a race condition of AI spam vs AI filters sells more usage than ever. For the rest of us, we must think really carefully about what we need to own, control, and understand in this emerging world. I can’t see the value in all of us regurgitating AI-created text, which is AI-summarised and then AI-replied to. Writing is thinking; it’s not simply communications.
So where next?
I worry we’re just too easily impressed by the apparent fluency with which LLMs are conversing with us. We, certainly in the UK, tend to have strong biases towards good talkers. It’s a remarkable achievement to have machines able to do so, and on pretty much any topic you throw at them. Now is the time to keep experimenting and learning, as there is so much we still need to think about and understand. Questions I’m keen to explore centre on reliability, trust, accountability, and deeply considering what it is we expect from people in their work. We are living in interesting times, and we must remain open and curious, certain in the knowledge that the future hasn’t been determined; it is wide open for us to discover.