Count the things that can go wrong between picking up a ticket and merging a pull request (PR).

You misread the requirements. You build a helper that already exists three directories over. The branch name is wrong. The commit message says "wip" and stays that way. The PR template has six sections. You fill in two. A reviewer leaves four comments. You resolve one without changing the code because it's Friday, and the diff is 600 lines, and you're done.

Developers know this list. They live it. Not because they're careless, but because there are fifteen things to remember, and the code was the hard part. Everything around it feels like paperwork, and paperwork gets skipped.

Every skipped step is a place where a standard exists, but nobody has enforced it.

The wrong question

Most teams evaluate LLMs task by task. Is it good at writing tests? Is it good at reviewing code? Each task gets a scorecard, and you adopt the ones that pass.

But that treats the LLM as something you opt into. You work without one by default and justify each use. That framing keeps the paperwork problem intact. You've added a tool. You haven't changed the workflow.

Flip it. Start from the assumption that every action passes through the LLM. Now ask: what's still ungated? That question completely changes the adoption strategy. You stop evaluating capabilities and start looking for gaps.

What changes

When the LLM sits between you and every action, the checklist disappears into the action itself. You say "commit this", and the LLM reads the conventions, runs the tests, writes the message. You say "open a PR", and the description fills itself from the diff and the ticket; reviewers get assigned based on who last touched this code, and every template section gets filled in because the LLM doesn't get tired of templates.
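The "who last touched this code" part is an ordinary heuristic, not magic. A toy sketch of one version of it, assuming you've already collected per-file authorship (say, from git history); the function name and data shapes are illustrative:

```python
from collections import Counter

def suggest_reviewers(changed_files, file_authors, limit=2):
    """Rank reviewer candidates by how many of the changed files
    they most recently touched.

    changed_files: paths that appear in the diff
    file_authors:  {path: [authors, most recent first]}
    """
    votes = Counter()
    for path in changed_files:
        authors = file_authors.get(path, [])
        if authors:
            votes[authors[0]] += 1  # the most recent author gets the vote
    return [name for name, _ in votes.most_common(limit)]
```

In practice you'd populate the authorship map from something like `git log --format='%an' -- <path>`; the point is only that the assignment rule is simple enough to write down, which is exactly what makes it enforceable.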

None of this is clever. It's consistent. The LLM reads the instructions every time. It doesn't skip the linter. It doesn't bypass the hook with --no-verify at ten to six on a Friday.

A junior working through the LLM produces commits, PRs, and reviews with the same quality baseline as a senior. The floor rises. Not because the junior got better. Because quality stopped depending on individual discipline and became a property of the workflow itself.

The hard part moves

Traditional quality enforcement has one failure mode: people. Checklists rely on someone following them. Linters rely on someone configuring them. Code review relies on reviewers catching everything in an 800-line diff. Every mechanism breaks the same way. Someone skips it, ignores it, or doesn't notice.

When the LLM enforces the rules, that problem vanishes. A different one takes its place. The hard part is no longer "how do we get people to follow the rules?" It's "what are the rules?"

"Write good commit messages" is useless to an LLM. "Commit messages explain why, not what. One line under 72 characters. Reference the ticket number." That's enforceable. Once the standard is written clearly enough for a machine to follow, it gets followed. Every time.
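The gap between the two standards can be made concrete. A minimal sketch of a checker for the mechanical half of the rule (the ticket pattern, ABC-123 style, and the function name are assumptions, not from any particular tracker); note that "why, not what" is the one clause a regex can't verify, which is precisely the part left to the LLM's judgment:

```python
import re

def check_commit_message(message, max_subject_len=72):
    """Return a list of problems with a commit message; an empty
    list means it passes the mechanical checks. "Explains why,
    not what" is deliberately absent -- it isn't regex-checkable.
    """
    problems = []
    subject = message.splitlines()[0] if message else ""
    if not subject:
        problems.append("empty subject line")
    if len(subject) > max_subject_len:
        problems.append(f"subject is {len(subject)} chars (max {max_subject_len})")
    if not re.search(r"\b[A-Z]+-\d+\b", message):
        problems.append("no ticket reference (e.g. PROJ-42)")
    return problems
```

Dropped into a commit-msg hook, a check like this runs on every commit. Whether anyone bypasses it is the part the workflow has to own.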

You solve the definition problem once. Enforcement becomes free.

What still breaks

LLMs don't eliminate failure. They shift it. A human reviewer misses a file due to fatigue. An LLM reviewer confidently approves a test that asserts the wrong thing. Humans fail through inattention. LLMs fail through misplaced confidence. Different failure modes, not fewer.

And if quality lives entirely in the workflow, it disappears when the workflow does. The safety net: the LLM works in the open. You see every command it runs, every check it performs. The understanding stays with you even when the execution doesn't.

Still, that's worth naming honestly. The question was never "when should you use an LLM?" It was always "what standards do you want enforced?" And the follow-up nobody wants to hear: are you willing to write them down?