Safe Refactoring with AI
Why everyone is afraid to touch legacy code
You know the feeling: you fix something in one place and it breaks in three others. Legacy code is brittle because it has no tests, no clear interfaces, and often contains hidden dependencies that only manifest in production. So nobody wants to change it, technical debt grows, and the problem gets worse every month.
AI changes this dynamic fundamentally. Not because it does the refactoring for you (that is a dangerous illusion), but because it helps you build a safety net that lets you change code with confidence that broken things will surface before they reach production.
Rule number one: never refactor without tests. And if tests do not exist, your first step is to have AI generate them — not to refactor.
Characterization tests: capture current behavior
Characterization tests (sometimes called 'golden master' tests) do not test whether code does what it should. They test whether code does what it does now. This is a crucial difference. With legacy code, you often do not know what 'correct' behavior is. But you know what the code currently does — and that is what you want to capture before any change.
Prompt AI: 'Generate characterization tests for this function. For each test use realistic inputs and capture current outputs as expected values. Include edge cases you can infer from the code.' AI reads the code, identifies branches and boundary conditions, and generates tests that capture current behavior.
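A minimal sketch of what such generated tests can look like, using a hypothetical `legacy_discount` function (both the function and the test values are illustrative, not from a real codebase). Note that the tests record the code's quirks instead of fixing them:

```python
# A hypothetical legacy pricing function. Its quirks (silently swallowing
# negative quantities, the bulk-discount branch) are captured as-is.
def legacy_discount(price, quantity):
    if quantity < 0:
        return 0  # quirk: invalid input returns 0 instead of raising
    total = price * quantity
    if total > 100:
        total = total * 0.9  # 10% bulk discount above 100
    return round(total, 2)

# Characterization tests: the expected values are whatever the code
# returns TODAY, recorded by running it once before any refactoring.
def test_characterize_legacy_discount():
    assert legacy_discount(10, 5) == 50       # plain path, no discount
    assert legacy_discount(10, 11) == 99.0    # discount branch triggered
    assert legacy_discount(10, -1) == 0       # quirk preserved, no error
    assert legacy_discount(0, 100) == 0       # boundary: free items
```

If the negative-quantity behavior later turns out to be a bug, you change the test deliberately, with a commit message explaining why, rather than discovering the change by accident in production.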
Refactoring strategy: small steps, frequent checks
Never refactor large chunks of code at once. Break refactoring into small steps where each step changes one thing and tests pass after it. AI helps you design this sequence: 'I need to refactor the OrderProcessor class. Suggest a sequence of small, safe steps where each step is independently testable and reversible.'
A typical sequence looks like this: first rename variables and methods to understandable names. Then extract repeating blocks into their own methods. Then separate side effects from pure logic. Commit each step separately. If something breaks, you revert exactly one commit, not an entire day of work.
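To illustrate the "separate side effects from pure logic" step, here is a hedged sketch of what one such step might produce. The names (`compute_total`, `process_order`) are hypothetical; the point is that after the step, the business rule is testable without any I/O:

```python
# Hypothetical state AFTER one small refactoring step: the pure pricing
# rule has been extracted from a function that also logged to stdout.
def compute_total(price, quantity):
    """Pure logic: no printing, no database, trivially testable."""
    total = price * quantity
    return total * 0.9 if total > 100 else total

def process_order(price, quantity, log=print):
    """Side effects stay at the edge; `log` is injectable in tests."""
    total = compute_total(price, quantity)
    log(f"order total: {total}")
    return total
```

This is exactly one commit's worth of change: the characterization tests still pass against `process_order`, and `compute_total` can now get its own, faster unit tests.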
Have AI generate each change as a diff. You can review exactly what changes before applying. Never blindly accept large AI-generated changes.
Seam points: where to insert test boundaries
Michael Feathers in 'Working Effectively with Legacy Code' uses the term 'seams' — places in code where you can change behavior without changing the code itself. For example, interfaces you can replace with mock implementations. AI can identify these seam points: 'Find places in this code where I can insert an abstraction or interface so I can test part of the logic in isolation.'
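A sketch of what inserting one such seam can look like in Python, assuming a hypothetical order-finalization function that previously called the database directly. The repository interface is the seam: production code passes a real implementation, tests pass an in-memory one:

```python
from typing import Protocol

class OrderRepository(Protocol):
    """The seam: any object with this shape can stand in for the database."""
    def save(self, order_id: str, total: float) -> None: ...

class InMemoryRepository:
    """Test double used at the seam; records saves instead of persisting."""
    def __init__(self):
        self.saved = {}

    def save(self, order_id, total):
        self.saved[order_id] = total

def finalize_order(order_id, total, repo: OrderRepository):
    # The business rule is now testable in isolation from persistence.
    if total <= 0:
        raise ValueError("total must be positive")
    repo.save(order_id, total)

repo = InMemoryRepository()
finalize_order("A-1", 42.0, repo)
```

The production implementation of `OrderRepository` would wrap the existing database calls unchanged, so introducing the seam does not alter behavior.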
Importantly, creating seams is itself refactoring — which is why you need characterization tests BEFORE this step. First tests that capture behavior. Then seams that enable better testing. Then more tests at a lower level. And only then the real refactoring.
When AI refactoring fails
AI is poor at refactoring that requires deep domain understanding. If business logic requires certain operations to happen in a precise order for regulatory reasons, AI cannot know that from the code. Similarly for performance-critical sections where specific optimizations exist for good reason, even if they look like bad code. Always ask: 'Is there a reason this looks the way it does?'
Refactoring commits should be as small as possible. An ideal commit changes one thing: a rename, a method extraction, a file move. Small commits keep the revert path trivial when a step goes wrong.
Pick one complex function from your legacy project. 1) Have AI generate characterization tests (at least 5). 2) Run the tests and verify they all pass. 3) Have AI suggest a sequence of 3-5 refactoring steps. 4) Execute the first step and verify tests still pass. 5) Commit and continue to the next step.
Hint
If tests fail after refactoring, revert the change and ask AI why behavior changed. You will often uncover a hidden side effect.
Pick one function in legacy code that needs refactoring. Process: 1) Have AI write tests for current behavior, 2) Run tests — all must pass, 3) Have AI suggest refactoring (with explanation why), 4) Implement the refactoring, 5) Run tests again — they must still pass. Tests serve as a safety net — if they fail, the refactoring changed behavior.
Hint
Document your process and results — they'll serve as reference for similar future tasks.
Pick a complex class from legacy code. Prompt AI: 'Find seam points in this class — places where I can insert an interface or abstraction for isolated testing. For each seam point state: 1) Where it is, 2) What interface you propose, 3) How it enables better testing, 4) What are the risks of this change.' Implement one seam point and write a test that uses it.
Hint
Typical seam points: database calls (replace with repository interface), HTTP clients (replace with abstraction), system time (replace with clock interface).
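The system-time seam from the hint can be sketched like this (a minimal illustration; the class and function names are hypothetical):

```python
from datetime import datetime, timezone

class SystemClock:
    """Production implementation: real current time."""
    def now(self):
        return datetime.now(timezone.utc)

class FixedClock:
    """Test double for the clock seam: always returns the same instant."""
    def __init__(self, now):
        self._now = now

    def now(self):
        return self._now

def is_weekend_order(clock):
    # Time-dependent logic reaches the clock only through the seam,
    # so tests can pin it to any date they like.
    return clock.now().weekday() >= 5

saturday = FixedClock(datetime(2024, 6, 1, tzinfo=timezone.utc))
assert is_weekend_order(saturday)  # 2024-06-01 was a Saturday
```

The same pattern applies to the other seams in the hint: wrap the dependency behind a small interface, inject it, and substitute a recording or fixed double in tests.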
- Never refactor without tests — have AI generate characterization tests as the first step
- Break refactoring into small steps, each independently testable and reversible
- AI identifies seam points where you can insert test boundaries
- AI does not understand domain context — always ask whether code looks odd for a good reason
In the next lesson, we dive into AI-Generated Tests for Untested Code — a technique that gives you a clear edge.