Categories
technology

Some thoughts on AI

“Anyone who tries to tell us they know the future is simply trying to own it.” — Margaret Heffernan in her introduction to Uncharted

Take a deep breath. Nobody knows what is going to happen with AI. You aren’t missing out, and lots is yet to be worked out, if it ever will be. Now is a time for learning and experiments.

It is absolutely incredible what AI models can achieve, and their capability has progressed significantly in the last few years. So I’m not suggesting they should be dismissed. Not at all. But that’s not to say I’m convinced AI will fundamentally change the structure of all public services and private enterprise. Here are my thoughts on why I think we need to be a bit more circumspect.

AI for coding doesn’t translate to all contexts

Much of the hype is being driven by the tech sphere getting excited by how helpful AI is proving in producing code. Techies are going to overvalue that capability, but being handy with coding doesn’t translate to being good at legal judgements or policy trade-offs.

There are a number of reasons why AI seems good at coding:

  • Code is fairly predictable;
  • There is lots of programming content online to train the models on (the Internet is full of coders!);
  • Compilers and interpreters (the tools that check and run code) provide helpful error messages which the models can use to correct their output, creating a positive feedback loop.
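That last point, the feedback loop, can be made concrete. Here is a minimal, illustrative sketch in Python: the `suggest_fix` function is a stand-in for a call to a real model (here it just applies a hard-coded correction so the loop is runnable), but the shape of the loop is the point.

```python
# Illustrative sketch of the compile-and-fix feedback loop that makes AI
# coding assistants effective. `suggest_fix` is a stand-in for a real model
# call; it simply appends a closing bracket so the example runs end to end.

def suggest_fix(code: str, error: str) -> str:
    """Stand-in for an LLM call: return 'fixed' code given the error message."""
    # A real system would send `code` and `error` to a model and get new code back.
    return code + ")"

def compile_with_feedback(code: str, max_attempts: int = 3) -> str:
    """Repeatedly try to compile; on failure, feed the error back for a fix."""
    for _ in range(max_attempts):
        try:
            compile(code, "<snippet>", "exec")  # syntax check only, no execution
            return code  # success: the error messages guided us here
        except SyntaxError as err:
            code = suggest_fix(code, str(err))
    raise RuntimeError("could not repair the snippet")

fixed = compile_with_feedback("print('hello'")  # missing closing bracket
print(fixed)
```

Few other fields offer an automatic, machine-readable signal of "you got that wrong, and here's roughly why" that the model can iterate against.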

There are not many (any?) other fields of work where these all apply. Some thought medicine would offer ‘low-hanging fruit’, but despite radiology having the highest number of approved AI applications, radiology roles continue to increase in number.

There are also questions about how sustainable AI coding is for major projects. For hobby software, quick scripts, and prototyping, AI coding has undoubtedly lowered barriers and empowered people. But maintaining, testing, and iterating complex enterprise software is a different ball game, one where AI’s limited context window and unreliability become more challenging. If nobody actually wrote the code, who understands what it is doing? ‘Cognitive debt’ is the new name for this challenge.

Without good data, AI can’t help much

In my work, we spend lots of time thinking about how to achieve various policy outcomes in support of businesses and the economy. Many of our biggest blockers are around data quality and definitions. How do we identify specific types of businesses to do things with or for them? How could we define an economic sector or area of activity? Invariably, we don’t have data which directly tells us, so how can we infer what we need and what happens when it goes wrong? How does a company get included/excluded from the policy scope without creating a mass of bureaucracy for them and us?

AI and machine learning can help us around the edges of some of these challenges, but ultimately, we need high-quality, structured data first before we can apply any techniques or technologies. Getting there can require changing laws, service redesign, and data infrastructure. These aren’t really technology problems first and foremost.

How much does this cost compared to the alternatives?

While AI is becoming much cheaper in terms of cost per use, that’s only part of the picture. It is still expensive compared to many deterministic options, such as automation platforms, which many organisations already have access to through their existing licensing. We also need scarcer, harder-to-recruit skills to deploy and manage AI properly. Compare it to the move to the cloud: I no longer need engineers looking after racks of servers in the basement, but I do need a team of more expensive cloud engineers to manage the environment we depend on. So while we can do things not previously easily possible, we are probably seeing a greater total spend.

I keep coming back to the ‘missing’ productivity gains all our previous investments in technology are supposed to have given us. I don’t think the billions we spend on public sector technology are wasted, but they’re not a direct driver of cashable savings or productivity in most cases. However, we can do things we previously couldn’t have done: you can renew your passport with a selfie, identify high-growth overseas markets with a few clicks, and undertake mass-scale studies on anonymised NHS data. Amazing, but we still need to keep everything else going on top of the new stuff. AI is more stuff to buy, build, and run. It lets us do new things, which is also amazing, but someone has to pay for it.

How reliable is good enough?

People make mistakes all the time, so we shouldn’t be holding a technology up to an unrealistic standard. But we’ve spent hundreds of years learning how to manage human error in our organisations and ways of working. AI makes mistakes in different ways, ones we are not really used to spotting and accounting for.

This academic post on reliability, drawing on insights from the nuclear and airline industries, is incredibly helpful. It explains that reliability requires more than accuracy; it has four dimensions:

  • Getting things right consistently (Consistency)
  • Not falling apart when conditions aren’t perfect (Robustness)
  • Being open about a lack of confidence rather than confidently guessing (Calibration)
  • Making mistakes that are fixable rather than catastrophic (Safety)

What the academics argue is that while AI capability (accuracy) has been improving fast, improvement across the other dimensions of reliability has been modest. The worst issue is that AI models aren’t good at knowing when they’re wrong: a model’s stated confidence in an answer does not correlate with its actual accuracy. These are pretty fundamental issues that will need much more work.
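The calibration problem can be made concrete: compare the confidence a model reports against how often it is actually right. A minimal sketch with invented numbers (no real model behind them), just to show the kind of gap the researchers are pointing at:

```python
# Illustrative calibration check using made-up figures. A well-calibrated
# model that says "90% confident" should be right about 90% of the time.

# (confidence the model reported, whether the answer was actually correct)
answers = [
    (0.95, True), (0.90, True), (0.92, False), (0.88, False),
    (0.91, True), (0.94, False), (0.89, True), (0.93, False),
]

avg_confidence = sum(c for c, _ in answers) / len(answers)
accuracy = sum(ok for _, ok in answers) / len(answers)

print(f"average stated confidence: {avg_confidence:.0%}")
print(f"actual accuracy:           {accuracy:.0%}")
print(f"overconfidence gap:        {avg_confidence - accuracy:.0%}")
```

A model like this one, confidently claiming ~92% while managing 50%, is exactly the failure mode that makes its outputs hard to supervise.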

AI spam vs AI filters

In the meantime, I worry about arms races: as people use AI to spam applications for government services, contact centres and so on, we’re going to put barriers up to getting through to the real decision-makers and doers. Who loses out in that scenario? I fear the genuine service user without access to the latest tools. And was the friction AI now overcomes actually part of the service design?

Looking internally within organisations: If a supplier’s agentic flow is going to seek payments on every contract variation it spots in email chains, our agentic flow will appeal each one and also try to catch the supplier on every breach of their service level agreements with us. We’ll then put agents on the escalations and before you know it, we’ll have an arms race.

We’re seeing this in recruitment where so many AI bots are causing havoc in the world of applying, reviewing, and interviewing. So far, the best countermeasure has been to insist on in-person interviews to prevent AI-assisted cheating on technical tests.

Where does this take us? Well, if I were selling AI usage, an arms race of AI spam vs AI filters sells more usage than ever. For the rest of us, we must think really carefully about what we need to own, control, and understand in this emerging world. I can’t see the value in all of us regurgitating AI-created text, which is AI-summarised and then AI-replied to. Writing is thinking; it’s not simply communications.

So where next?

I worry we’re just too easily impressed by the apparent fluency with which LLMs converse with us. We, certainly in the UK, tend to have strong biases towards good talkers. It’s a remarkable achievement to have machines able to converse on pretty much any topic you throw at them. Now is the time to keep experimenting and learning, as there is so much we still need to think about and understand. Questions I’m keen to explore centre on reliability, trust, accountability, and deeply considering what it is we expect from people in their work. We are living in interesting times, and we must remain open and curious, certain in the knowledge that the future hasn’t been determined; it is wide open for us to discover.


AI + LLM reading list for public servants

It’s clear to me we are in the midst of an AI hype cycle, and I’m sceptical of claims companies make that directly serve their valuations. Throwing tech at our problems will not solve them.

But this isn’t to say there isn’t something interesting going on; there is. Machine learning, data science techniques, large language models, and all the other stuff being labelled ‘AI’ are things public servants need to keep a watching brief on and carefully experiment with. To that end, I’ve been sharing some reading I’ve found helpful with colleagues, and I’ve brought it all together here.

Benedict Evans is, to my mind, one of the best commentators and analysts out there and his latest essay is extremely helpful for thinking things through: AI and the automation of work.

I really enjoyed a session on Large Language Models (LLMs) at our away day. So many great discussions delving into the philosophy of knowledge. Two recent articles fed into my thinking for the event. Firstly, this one in MIT Technology Review explores why asking an LLM to complete a test like the legal bar exam is a flawed approach to understanding LLMs’ capabilities. Secondly, Simon Willison posted the transcript of his recent ‘year in LLMs’ talk, which really helps to focus on the strengths and weaknesses of the current state of the art.

Innovation professor Ethan Mollick writes of his concerns about what the “write this for me” button powered by LLMs means for productivity and meaning. I don’t know if I agree, but it’s thought-provoking. I have heard that HMRC is already receiving LLM-created letters seeking to reduce people’s taxes based on false understandings of tax law; they don’t succeed, but they still add to the workload.

Here’s a report of a prototype powered by GPT-4 that lets you draw software that then gets coded for you. What the quality of that code is, I don’t know. But an interesting possible future for our work?

I’m sceptical of the productivity claims being made for Large Language Models (LLMs), but constantly searching for new analysis and insight into this field. Manchester University’s Professor Richard Jones has written a fascinating blog post on this topic, featuring a really interesting example on the use of AI in protein folding for pharmaceuticals. Definitely worth a read.

Large Language Models like GPT are all the rage at the moment, so if you want to really understand how they work, Stephen Wolfram has written an epic explanation. Or there’s the simpler version I’ve seen online: “it’s just spicy autocorrect.” Whatever you think of the hype, here’s NCSC’s guidance on their use in government.

On how the natural language interface of LLMs makes securing them so hard: enter the world of ‘prompt engineering’.

Max Roser is one of the lead members of Our World in Data, a wonderful online data resource. I recently read his article from December 2020, Artificial intelligence is transforming our world — it is on all of us to make sure that it goes well, and it’s as timely as ever.

Milton Mueller, a professor at the Georgia Institute of Technology, has written The Basic Fallacy Underlying the AI Panic, a punchy argument against fears that “AI will take over”.