
LLMs Writing Code

Exploring the capabilities and challenges of large language models in software development.

LLMs Are Writing Code All Wrong

LLMs are now writing code, and they’re doing it shockingly well. From simple scripts to full-blown applications, large language models have become remarkably good at generating working software, and the pace keeps accelerating. But they’re getting one core thing totally wrong.

The Code Is Amazing

There’s something magical about watching an AI generate entire functions, optimize queries, or even translate between programming languages in seconds. Need a recursive algorithm? A caching strategy? A multi-file project with a complex API? LLMs can take a prompt and deliver often elegant code without breaking a sweat.

But Sometimes the Code Is Wrong

When LLMs get it right, they look like geniuses. When they get it wrong, they look… well, confident.

And sometimes, the mistakes are subtle. A missing edge case. An off-by-one error. A function that almost works but breaks under a specific condition, or references a library that doesn’t actually exist. They refactor and leave stray variables and dead functions behind. If you don’t catch these mistakes early, they propagate. Worse, the model builds on top of a complex architecture that you only prompted for and never really understood. Suddenly, you’ve got a ton of code… built on a flawed foundation.
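To make that concrete, here’s the kind of “almost works” bug I mean. It’s a hypothetical example, not output from any particular model: a pagination helper that runs, looks reasonable, and silently drops the final partial page.

```python
# Hypothetical example of a subtle, "almost works" bug: the code runs fine,
# but integer division silently drops the final partial page.
def paginate(items, page_size):
    pages = []
    for i in range(len(items) // page_size):  # off-by-one in the page count
        pages.append(items[i * page_size:(i + 1) * page_size])
    return pages

# paginate(list(range(10)), 3) -> [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
# Item 9 is gone, and nothing warns you.

# The fix: step through the list directly so the remainder survives.
def paginate_fixed(items, page_size):
    return [items[i:i + page_size] for i in range(0, len(items), page_size)]
```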

The Real Danger: Generating Lots of Wrong Code

A human developer writes a bit of code, compiles it, tests it, and iterates. You may code for an hour and produce 10 lines, not 1,000. Coding is incremental; it’s small steps that add up.

An LLM, on the other hand, might generate an entire system in one go. Multiple files, new classes, deep inheritance trees, even assembly-level optimizations. One-shot, and it compiles on the first try. It’s jaw-dropping, but I think it’s the wrong approach.

When coding in Cursor or similar tools, I find I’m discarding a significant portion of the code that’s produced. Often the LLM will go down a rabbit hole, overthinking a problem and optimizing the wrong path, creating a mess in its wake.

The problem isn’t just that the code has bugs. It’s that the model has confidently produced a lot of code. More code means more surface area for errors, and more errors mean more time spent debugging AI-generated spaghetti.

What’s Next? Smarter LLM Development Tools

Right now, LLMs lack a crucial skill that human developers rely on: thinking time.

The next generation of AI coding tools will focus on:

  • Chain of thought reasoning: Instead of instantly outputting massive code blocks, AI will take intermediate steps—explaining its logic, breaking down problems, and validating assumptions.
  • Incremental compilation: AI will check that code compiles and that libraries load before continuing, instead of dumping an entire project at once.
  • Smarter debugging: AI won’t just generate code; it’ll run it, test it, review the output, and refine it before handing it over; see the sketch below the list.
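Here’s a minimal sketch of the shape of that loop in Python. `ask_model` and `apply_patch` are hypothetical placeholders, not any real tool’s API; the point is that each round asks for a small change, and the real compiler/test output drives the next prompt.

```python
# A minimal sketch of the generate -> compile/test -> refine loop described
# above. ask_model and apply_patch are hypothetical placeholders, not a real
# tool's API; the shape of the loop is what matters, not the specifics.
import subprocess

def ask_model(prompt: str) -> str:
    """Placeholder for an LLM call that returns a small patch, not a whole system."""
    raise NotImplementedError

def apply_patch(patch: str) -> None:
    """Placeholder that writes the model's proposed change to disk."""
    raise NotImplementedError

def iterate(task: str, max_rounds: int = 5) -> bool:
    feedback = ""
    for _ in range(max_rounds):
        # 1. Ask for a small, incremental change.
        patch = ask_model(f"Task: {task}\nPrevious test output:\n{feedback}")
        apply_patch(patch)

        # 2. Compile/test before generating anything else.
        result = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
        if result.returncode == 0:
            return True  # tests pass; stop here

        # 3. Feed the real failure output back instead of letting the model guess.
        feedback = result.stdout + result.stderr
    return False
```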

This is already starting to happen. MCP, the Model Context Protocol, is a new protocol that lets AI tools and models talk to one another, creating more of an “agentic” pattern. Some of these tools are beginning to pass information back and forth to the coding models: console output and browser logs are starting to flow, but we’re still very early.
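As a rough illustration, here’s what exposing that kind of feedback over MCP might look like, based on my understanding of the official Python SDK’s FastMCP helper. Treat the exact import path and decorator as assumptions, and `run_tests` is just a tool name I made up.

```python
# A rough sketch of an MCP server that exposes console output to a coding
# model. Based on my understanding of the Python SDK's FastMCP helper;
# treat the exact import path and decorator as assumptions.
import subprocess
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("build-feedback")

@mcp.tool()
def run_tests() -> str:
    """Run the project's test suite and return the console output."""
    result = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
    return result.stdout + result.stderr

if __name__ == "__main__":
    mcp.run()  # an MCP-aware editor or agent can now call run_tests()
```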

This shift will make LLM-generated code feel less like a black box and more like a collaborative developer that writes, tests, and iterates—just like a human.

We’re not far from an AI-powered future where coding isn’t just faster, but also more reliable. The real challenge isn’t getting LLMs to write code. It’s teaching them to slow down and think about it.
