Understanding Legacy Code with AI
Code nobody wrote (but everyone depends on)
Every company has that one repository. Maybe it is an internal tool written in PHP 5, running on a server under someone's desk. Maybe it is a Java monolith from 2012 with three hundred classes and zero tests. Maybe it is a Python script that runs nightly data imports and nobody knows exactly what it does. Everyone depends on it. Nobody understands it. And nobody wants to look at it.
Traditionally you have two options: spend weeks reading every line, or write a new system from scratch (and discover the old one did things you did not know about). AI opens a third path: systematic understanding of existing code in a fraction of the time.
Legacy code is not bad code. It is code that outlived its documentation. That means it did something right — your job is to figure out what.
Architecture extraction: from chaos to diagram
Start by feeding AI the project's directory structure and key files. Do not ask for detailed analysis of every function right away — start from the top. 'What layers does this project have? Where is the entry point? What external services does it call?' AI can infer architecture from directory structure, imports, and config files — architecture you would spend all day assembling by hand.
Practical approach: use tree or ls -R to get the directory structure. Add manifest files like package.json, pom.xml, or requirements.txt. Have AI generate a Mermaid diagram of the main components and their relationships. Within minutes you have a first-draft map that would otherwise take half a day of reverse engineering.
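The collection step above can be sketched in Python for cases where tree is not installed. This is a minimal sketch, not a definitive tool: the function names, the skip list, and the depth limit are assumptions made for illustration, and the closing questions are the ones suggested earlier in this section.

```python
from pathlib import Path

# Manifest names come from the text above; the skip list is an assumption.
MANIFESTS = {"package.json", "pom.xml", "requirements.txt"}
SKIP_DIRS = {".git", "node_modules", "__pycache__", ".venv"}

QUESTIONS = (
    "What layers does this project have? Where is the entry point? "
    "What external services does it call? Answer with a Mermaid diagram "
    "of the main components and their relationships."
)


def directory_tree(root: Path, max_depth: int = 3) -> list[str]:
    """Return an indented listing of root, limited to max_depth levels."""
    lines: list[str] = []

    def walk(folder: Path, depth: int) -> None:
        if depth >= max_depth:
            return
        # Directories first, then files, each group sorted by name.
        for entry in sorted(folder.iterdir(), key=lambda p: (p.is_file(), p.name)):
            if entry.is_dir() and entry.name in SKIP_DIRS:
                continue
            lines.append("  " * depth + entry.name + ("/" if entry.is_dir() else ""))
            if entry.is_dir():
                walk(entry, depth + 1)

    walk(root, 0)
    return lines


def build_architecture_prompt(root: Path, max_manifest_chars: int = 4000) -> str:
    """Combine the directory listing and manifest files into one prompt."""
    parts = ["Directory structure:", *directory_tree(root), ""]
    for path in sorted(root.rglob("*")):
        relative = path.relative_to(root)
        if (path.name in MANIFESTS and path.is_file()
                and not SKIP_DIRS.intersection(relative.parts)):
            text = path.read_text(encoding="utf-8", errors="replace")
            # Truncate very large manifests so the prompt stays manageable.
            parts += [f"--- {relative} ---", text[:max_manifest_chars], ""]
    parts.append(QUESTIONS)
    return "\n".join(parts)
```

Calling build_architecture_prompt(Path("path/to/repo")) returns a single string you can paste into the AI tool of your choice. Keeping the depth limit low mirrors the advice to start from the top rather than from individual functions.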
Use Claude or Cursor with full-repo context. Modern AI tools can analyze hundreds of files at once, so take advantage of that. AI-focused editors like Cursor or Windsurf can index the entire project.
Dependency mapping: what depends on what
Legacy code often has hidden dependencies you will not find in any config file. Hardcoded URLs, implicit environment expectations, time zones set deep in the code. AI can scan the code and identify these hidden bindings. Prompt: 'Find all external dependencies — API calls, database connections, file operations, environment variables.'
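A cheap pattern scan can complement the prompt above by giving you a list to check the AI's answer against. The sketch below is only a starting point: the regular expressions, the category names, and the file suffixes are assumptions chosen for illustration, and they will miss dependencies that are built dynamically.

```python
import re
from pathlib import Path

# Illustrative patterns only; tune them to the languages and libraries in your codebase.
PATTERNS = {
    "hardcoded URL": re.compile(r"https?://[^\s'\"<>)]+"),
    "environment variable": re.compile(r"os\.environ|getenv\(|\$_ENV"),
    "time zone": re.compile(r"date_default_timezone_set|TimeZone\.setDefault|ZoneInfo\("),
    "file operation": re.compile(r"\bopen\(|\bfopen\(|file_get_contents\("),
}
SOURCE_SUFFIXES = {".py", ".php", ".java"}


def scan_text(text: str, label: str = "<string>") -> list[tuple[str, int, str, str]]:
    """Return (label, line number, kind, line) for every pattern hit in text."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for kind, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((label, lineno, kind, line.strip()))
    return findings


def scan_repo(root: Path) -> list[tuple[str, int, str, str]]:
    """Run scan_text over every source file under root."""
    findings = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in SOURCE_SUFFIXES:
            text = path.read_text(encoding="utf-8", errors="replace")
            findings += scan_text(text, str(path.relative_to(root)))
    return findings
```

If the AI reports an external dependency that the scan did not find, or the other way around, that difference is a good place to start reading the code yourself.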
Understanding data flows is especially important. Where does data come from? How does it transform? Where does it go? Have AI create a data flow diagram — it often uncovers things that would take days of manual tracing. For example, a hidden side effect where a price calculation function also writes to a logging table.
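To make the side-effect example concrete, here is a small sketch using an in-memory SQLite database. The function calculate_price and both table names are hypothetical stand-ins for real legacy code. The helper compares row counts before and after a call, which is one simple way to confirm a write that the AI's data flow diagram claims exists.

```python
import sqlite3


def calculate_price(conn, unit_price_cents, quantity):
    """Hypothetical legacy function: returns a total, but also writes a log row."""
    total = unit_price_cents * quantity
    conn.execute("INSERT INTO price_log (total) VALUES (?)", (total,))  # hidden side effect
    return total


def table_row_counts(conn):
    """Map every table name in the database to its current row count."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    return {t: conn.execute(f'SELECT COUNT(*) FROM "{t}"').fetchone()[0] for t in tables}


def tables_changed_by(conn, func, *args):
    """Call func and report which tables gained or lost rows during the call."""
    before = table_row_counts(conn)
    result = func(conn, *args)
    after = table_row_counts(conn)
    changed = {t: after[t] - before.get(t, 0)
               for t in after if after[t] != before.get(t, 0)}
    return result, changed


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total INTEGER)")
conn.execute("CREATE TABLE price_log (id INTEGER PRIMARY KEY, total INTEGER)")

result, changed = tables_changed_by(conn, calculate_price, 250, 4)
print(result, changed)  # 1000 {'price_log': 1}
```

Row counts only reveal inserts and deletes. Detecting updates to existing rows would need a different check, such as comparing table contents.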
Business logic: understanding rules hidden in code
The most valuable part of legacy code is not the technical details — it is the business logic. Rules that accumulated over years as if-else branches, special cases, and hotfixes. AI can extract these rules and reformulate them into readable form. Prompt: 'Extract all business rules from this class. For each rule state: what it checks, what the outcome is, and what exceptions exist.'
Watch out for hallucinations. AI can misinterpret business logic, especially implicit rules. Always verify that extracted rules match actual behavior — for example by running existing code with test data. AI helps you understand what the code probably does. Confirming it is your job.
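One way to do that verification is to write the extracted rule as a small function and compare it with the real code across a grid of test inputs. In the sketch below, legacy_discount is a hypothetical stand-in for your legacy function, and extracted_rule is a deliberately incomplete summary of the kind an AI might produce. Amounts are integer cents to avoid rounding noise.

```python
def legacy_discount(order_total_cents: int, is_vip: bool) -> int:
    """Hypothetical legacy function standing in for the real code."""
    if order_total_cents >= 10000:
        if is_vip:
            return order_total_cents * 15 // 100
        return order_total_cents * 10 // 100
    if is_vip and order_total_cents > 5000:  # the kind of special case that is easy to miss
        return order_total_cents * 5 // 100
    return 0


def extracted_rule(order_total_cents: int, is_vip: bool) -> int:
    """The rule as summarized from the code: 10% from 100.00, 15% for VIPs."""
    if order_total_cents >= 10000:
        return order_total_cents * (15 if is_vip else 10) // 100
    return 0


def find_mismatches(legacy, claimed, cases):
    """Return (inputs, actual, claimed) for every case where the two disagree."""
    return [(args, legacy(*args), claimed(*args))
            for args in cases if legacy(*args) != claimed(*args)]


# Include values just below, at, and just above every threshold you can spot.
cases = [(total, vip)
         for total in (0, 5000, 5001, 9999, 10000, 20000)
         for vip in (False, True)]

mismatches = find_mismatches(legacy_discount, extracted_rule, cases)
for args, actual, claimed in mismatches:
    print(f"inputs={args} actual={actual} claimed={claimed}")
# The two VIP cases between 5001 and 9999 cents disagree,
# which exposes the rule missing from the summary.
```

An empty mismatch list does not prove the summary is complete. It only shows agreement on the inputs you tried, so choose boundary values with care.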
Documentation output: foundation for next steps
The output of this phase should be a document covering four areas: architectural overview (layers, components, communication), dependency map (external services, databases, APIs), list of business rules (extracted from code), and risk inventory (what is brittle, what lacks tests, what is hardcoded). This document is your foundation for everything that follows — refactoring, testing, and modernization.
When analyzing legacy code, start from entry points (main, HTTP handlers, cron jobs) and work inward. 'What entry points does this project have? What happens when an HTTP request hits /api/orders?' is better than 'explain all the code'.
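For a Python codebase, you can list candidate entry points yourself before asking those questions. The sketch below uses the standard ast module and covers only two cases: the script main guard and functions with route-style decorators. The decorator names in HANDLER_DECORATORS are an assumption based on common Flask and FastAPI usage, and cron jobs defined outside the code (for example in a crontab) will not show up.

```python
import ast

# Assumed decorator names, as commonly used by Flask and FastAPI style frameworks.
HANDLER_DECORATORS = {"route", "get", "post", "put", "delete"}


def find_entry_points(source: str) -> list[tuple[int, str]]:
    """Return sorted (line, description) pairs for likely entry points."""
    found = []
    for node in ast.walk(ast.parse(source)):
        # Script entry point: if __name__ == "__main__":
        if isinstance(node, ast.If) and isinstance(node.test, ast.Compare):
            left = node.test.left
            if isinstance(left, ast.Name) and left.id == "__name__":
                found.append((node.lineno, "script main guard"))
        # HTTP handlers: functions decorated like @app.route("/path")
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            for dec in node.decorator_list:
                if (isinstance(dec, ast.Call)
                        and isinstance(dec.func, ast.Attribute)
                        and dec.func.attr in HANDLER_DECORATORS):
                    first = dec.args[0] if dec.args else None
                    route = first.value if isinstance(first, ast.Constant) else "?"
                    found.append((node.lineno, f"HTTP handler {node.name} for {route}"))
    return sorted(found)
```

Each result gives you a concrete starting question, such as what happens when a request hits the route that was found.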
Pick one legacy project you have access to. Using AI: 1) Generate an architectural diagram (Mermaid). 2) Identify all external dependencies. 3) Extract at least 3 business rules from the most complex class. 4) Create a list of 'hidden surprises' — things you did not expect. Compare AI output with your own understanding. Where did AI surprise you?
Hint
Start with a smaller module, not the entire monolith. AI can handle large projects, but you get better results when you work in pieces.
Pick the most complex or least documented module in your legacy codebase. Paste it into an AI assistant (Claude's context window of roughly 200K tokens can hold large files) with the prompt: 'Explain this code. 1) What it does (high-level purpose), 2) Key functions and their roles, 3) Data flow: what goes in, what comes out, 4) Potential issues or code smells, 5) Dependencies on other modules.' Compare AI's explanation with your own understanding. Is anything surprising?
Hint
Document your process and results — they'll serve as reference for similar future tasks.
Pick one module from a legacy project. Prompt AI: 'Analyze this code and create a data flow diagram in Mermaid format. Include: where data comes from (API, DB, files), how it transforms, where it goes (DB, API, logs), and what side effects exist (cache writes, notifications, audit log).' Compare the diagram with your mental model. Where did AI surprise you?
Hint
Side effects are the most common surprise. A 'calculateTotal' function might also write to an audit log or send a notification — AI uncovers these hidden bindings.
- AI can infer in minutes an architecture that would take a full day to map by hand
- Dependency mapping uncovers hidden bindings not found in any documentation
- Business logic hidden in code is the most valuable part of a legacy system
- Always verify AI outputs — hallucinations with legacy code are common