Dr Chelsi Slotten - The AI Devil is in the Details

Image showing a polished interface with a backend full of errors including missing data, edge cases, unexpected behavior, broken filters, incorrect outputs, and accessibility issues.

At this point, most of us have seen something built with AI that looks impressive.

You can take an idea and turn it into a prototype faster than ever before. Interfaces appear in minutes. Things that used to require careful wireframing in Figma or Adobe XD can now be “vibe coded” into existence with relatively little effort.

And to be clear, that part works.

AI is genuinely good at helping you move from idea to prototype quickly. It gives people something tangible to react to. It accelerates early-stage thinking and lowers the barrier to getting something in front of users.

But that’s also where the illusion begins.

Because the moment you move beyond that initial prototype, you run straight into the reality that the devil, as it always has been, is in the details.

The Prototype is the Easy Part

A generated interface might look good. It might even mostly work. But it’s still just a prototype.

The real work begins when you try to turn that prototype into something that functions reliably as a product. And that’s where things get messy:

Integrations need to work properly
Data needs to be pulled through correctly
Business rules need to be enforced
Edge cases need to be identified and handled
Costs need to be controlled at scale
Teams need to understand what they’re actually shipping

This is the layer of work that determines whether something is usable, not just plausible. It is also where AI starts to struggle, not because it’s “bad”, but because it lacks the precision, consistency, and deep system understanding required to make those systems hold together.

When “Looks Right” is not the Same as “Works Right”

I saw this play out recently on a project building a search interface.

The initial version looked good. It appeared functional. But once we interrogated it more closely, issues emerged. Data that existed in the database was not being surfaced correctly. Some fields were connected; others were not. Search results were incomplete or inconsistent.

None of this was immediately obvious from the interface itself. It looked right.

Then came iteration, the point where you’d normally expect a system to improve. Feedback was fed into the AI, and a second version was produced. Some issues had been addressed, but new ones had appeared.

Even small, seemingly safe changes had unexpected side effects.

Updating brand colours, for example, caused the layout to shift. Light and dark themes behaved inconsistently. Accessibility requirements, such as colour contrast, were no longer reliably met. These are not edge cases; they are foundational aspects of a usable interface.
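Colour contrast is one of the few details here that can be checked mechanically after every colour change. The sketch below uses the WCAG 2.x relative luminance and contrast ratio formulas, with the AA thresholds of 4.5:1 for normal text and 3:1 for large text. The function names and the palette values are illustrative assumptions, not taken from the project described.

```python
def _linear(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light, per the WCAG 2.x definition."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_colour: str) -> float:
    """Relative luminance of a colour given as '#RRGGBB'."""
    h = hex_colour.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """WCAG contrast ratio, from 1.0 (identical) to 21.0 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


def meets_aa(foreground: str, background: str, large_text: bool = False) -> bool:
    """True if the pair meets the WCAG AA minimum contrast for text."""
    return contrast_ratio(foreground, background) >= (3.0 if large_text else 4.5)


# Hypothetical theme palettes: (text colour, background colour).
THEMES = {
    "light": ("#767676", "#FFFFFF"),
    "dark": ("#E0E0E0", "#121212"),
}

if __name__ == "__main__":
    for name, (text, background) in THEMES.items():
        ratio = contrast_ratio(text, background)
        status = "pass" if meets_aa(text, background) else "FAIL"
        print(f"{name}: {ratio:.2f}:1 {status}")
```

Running a check like this against both themes whenever brand colours change would at least catch a contrast regression before anyone has to spot it by eye.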

What was notable was not that bugs existed (all code has bugs, it’s normal) but that relatively small and contained requests triggered broader, unrelated changes. The system did not localise changes in the way a human developer typically would.

The Compounding Cost of “Small” Changes

The more we iterated, the more instability we saw. Functionality that had worked in the first version stopped working in the second. Filters behaved unpredictably. Data outputs changed without a clear reason.

In one case, instead of returning a known email address, the system started generating plausible-looking URLs for personal websites. The database we were connecting to does not include website details, and none of these URLs resolved because they weren’t real websites.
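One way to catch this class of problem is to compare what the interface returns against the source record before it reaches a user. The following is a minimal sketch, assuming both the database row and the generated output can be represented as plain dictionaries. The field names, sample values, and function names are hypothetical.

```python
def find_ungrounded_fields(source_record: dict, output: dict) -> dict:
    """Return output fields that cannot be traced back to the source record."""
    problems = {}
    for field, value in output.items():
        if field not in source_record:
            problems[field] = "field does not exist in the source record"
        elif source_record[field] != value:
            problems[field] = "value differs from the source record"
    return problems


def find_missing_fields(source_record: dict, output: dict, required: list) -> list:
    """Return required fields that have a value in the source but are absent from the output."""
    return sorted(
        field
        for field in required
        if source_record.get(field) is not None and field not in output
    )


# Hypothetical example mirroring the case above: the email is dropped
# and a website that the database never held appears in its place.
source = {"name": "A. Example", "email": "a.example@example.org"}
generated = {"name": "A. Example", "website": "https://a-example.example"}

if __name__ == "__main__":
    print(find_ungrounded_fields(source, generated))
    print(find_missing_fields(source, generated, required=["name", "email"]))
```

A check like this does not explain why the output drifted, but it turns a silent fabrication into a visible failure.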

From a product perspective, that’s a serious issue. Not just because it’s wrong, but because it introduces unpredictable behaviour that wasn’t requested. That unpredictability forces a shift in how you work.

You can no longer assume that a small change is contained. Every change introduced the possibility of wider, unintended effects, which meant retesting everything every time a change was requested.
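Part of that retesting burden can be mechanised by recording the results of a fixed set of queries before a change and comparing them afterwards. The sketch below is an assumption about how such a check might look, not the process used on this project. The two search functions and the sample records are hypothetical stand-ins for a version before and after a regenerated change.

```python
from __future__ import annotations

from typing import Callable

SearchFn = Callable[[str], list]


def capture_baseline(search: SearchFn, queries: list) -> dict:
    """Record the results of each query from a version known to behave correctly."""
    return {query: search(query) for query in queries}


def diff_against_baseline(search: SearchFn, baseline: dict) -> dict:
    """Return each query whose results changed, as (expected, actual) pairs."""
    changed = {}
    for query, expected in baseline.items():
        actual = search(query)
        if actual != expected:
            changed[query] = (expected, actual)
    return changed


# Hypothetical data and two versions of a search function.
RECORDS = ["alice smith", "bob jones", "alice jones"]


def search_v1(term: str) -> list:
    """First version: matches the term anywhere in the record."""
    return [record for record in RECORDS if term in record]


def search_v2(term: str) -> list:
    """Second version: an unrequested change means only prefixes match."""
    return [record for record in RECORDS if record.startswith(term)]


if __name__ == "__main__":
    baseline = capture_baseline(search_v1, ["alice", "jones"])
    for query, (expected, actual) in diff_against_baseline(search_v2, baseline).items():
        print(f"{query!r}: expected {expected}, got {actual}")
```

The baseline only protects behaviour that someone thought to record, so it reduces the cost of retesting rather than removing the need for it.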

The 80% Problem

AI is very good at getting you to ~80%, a respectable starting point. It is much less effective at completing the work to a production-ready standard.

The final 20%, the part that defines whether something is usable, reliable, and trustworthy, requires:

Understanding what the system is actually doing
Diagnosing why something is not behaving as expected
Making targeted, reliable changes without causing regressions

When AI has generated a lot of that code, that work becomes harder, not easier.

Developers still need to debug, interpret, and fix what’s happening, but now they’re doing it with code they didn’t author, created by a system that doesn’t fully understand the context of the work it is undertaking. There is a reason only 44% of AI-generated code survives into user commits (SWE-chat: Coding Agent Interactions From Real Users in the Wild).

So, while the start of the process is faster, the end often isn’t.

Are We Actually Saving Time?

AI makes the early stages of development, aka prototyping, visibly faster, but a lot of the invisible work increases:

More QA cycles
More regression testing
More effort to understand what changed and why
More time spent validating outputs

In practice, that often means we’re not actually reducing time to value; we’re just redistributing it. The effort shifts from structured development into investigation, debugging, and validation.

Producing something quickly is not the same as delivering something reliable. If additional cycles are needed to stabilise the system, the overall timeline may not improve and may, in some cases, extend.

The Risk of Misplaced Trust

There is a broader behavioural shift underpinning this.

Most people are used to digital systems being deterministic. If you input the same data into a calculator, you expect the same output every time. That expectation of reliability carries over, often unintentionally, into how people engage with AI systems.

But AI systems do not function in the same way. They produce outputs that are plausible, fluent, and often convincing, but not necessarily correct.

Because those outputs look credible, they are easy to trust, which is exactly where the risk lies.

What “Details” Actually Means

When we talk about details in AI systems, we are not referring to minor refinements. We are referring to the conditions that make a system usable and trustworthy:

Data accuracy and completeness
Integration integrity
Business logic enforcement
Accessibility and compliance
Auditability and traceability
Clear ownership and accountability

These are the things that turn something from a demo into a product. They are also the areas where probabilistic systems are least reliable without close human involvement.

Taking the Details Seriously

Taking the details seriously does not require a fundamentally new discipline. It requires applying existing product and engineering rigour more consistently. This looks like:

Explicitly identifying risks and failure modes early in the process
Designing for validation and oversight
Building in human review where it matters
Setting realistic expectations with stakeholders
Maintaining strong product and engineering discipline

In other words: doing the work we’ve always needed to do, but more rigorously and earlier in the process.

Final Thought

AI makes it easier than ever to create something that looks finished.

But products aren’t judged on how they look. They’re judged on whether they work reliably, predictably, in line with user needs, and at scale.

That has not changed.

And that’s still defined by the details.