Generating Tests with AI
Anatomy of a good AI-generated test
AI can generate tests quickly, but not all AI-generated tests are good. A good test has a clear purpose, tests one thing, has a readable name, and is maintainable. In this lesson, you will learn prompt patterns that lead to high-quality tests.
The most common mistake in AI test generation is letting the AI generate tests without context. The AI then writes tests coupled to the implementation (mocking everything) instead of tests that verify behavior. Always provide context: what the function is for, not how it is implemented.
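As a hypothetical illustration (the function, names, and values below are invented, not from the lesson), compare a test that merely re-runs the implementation with one that pins down behavior:

```python
# Hypothetical example for illustration only.

def apply_discount(total, rate):
    """Return total reduced by rate (e.g. 0.10 for 10%), rounded to cents."""
    return round(total * (1 - rate), 2)

# Implementation-coupled (tautological): the expected value is computed
# by the same formula, so this passes even if the formula is wrong.
def test_discount_tautological():
    assert apply_discount(200.0, 0.10) == round(200.0 * (1 - 0.10), 2)

# Behavior-driven: the expected value is stated independently,
# the way a spec (or a good prompt) would state it.
def test_discount_behavior():
    assert apply_discount(200.0, 0.10) == 180.0
```

The second test can actually fail when the implementation drifts; the first cannot.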
Prompt patterns for unit tests
Pattern 1: Behavior-driven prompt
Instead of 'write tests for this function,' describe the behavior you want to test:
# Bad prompt
"Write tests for the calculate_discount() function"
# Good prompt
"Write tests for calculate_discount(). The function:
- Returns 0% discount for orders under 100 EUR
- Returns 5% discount for orders 100-499 EUR
- Returns 10% discount for orders 500+ EUR
- Returns 15% discount for VIP customers regardless of amount
- Raises ValueError for negative amounts
Test each boundary case and VIP + amount combinations.
Use pytest, no mocks."
Pattern 2: Test-first prompt
Have AI write tests BEFORE it writes the implementation. This is TDD with AI:
# Prompt: "Write pytest tests for a parse_csv_row() function
# that takes a CSV row as a string and returns a dict.
# Expected behavior:
# - Standard row: 'John,Smith,30' -> {'first': 'John', 'last': 'Smith', 'age': 30}
# - Empty field: 'John,,30' -> {'first': 'John', 'last': '', 'age': 30}
# - Quotes: '"John, Jr.",Smith,30' -> {'first': 'John, Jr.', 'last': 'Smith', 'age': 30}
# - Empty string: raises ValueError
# - Wrong field count: raises ValueError"
# AI generates:
import pytest
from myapp.parsers import parse_csv_row


def test_standard_row():
    result = parse_csv_row("John,Smith,30")
    assert result == {"first": "John", "last": "Smith", "age": 30}


def test_empty_field():
    result = parse_csv_row("John,,30")
    assert result == {"first": "John", "last": "", "age": 30}


def test_quoted_field_with_comma():
    result = parse_csv_row('"John, Jr.",Smith,30')
    assert result == {"first": "John, Jr.", "last": "Smith", "age": 30}


def test_empty_string_raises():
    with pytest.raises(ValueError):
        parse_csv_row("")


def test_wrong_field_count_raises():
    with pytest.raises(ValueError):
        parse_csv_row("John,Smith")

Pattern 3: Existing code + coverage gap
Give AI existing code AND existing tests and ask it to fill coverage gaps:
"Here is the implementation [paste code] and here are the
existing tests [paste tests]. The coverage report shows
that lines 45-67 are not covered. Write tests to cover
those lines. Focus on error handling and edge cases."
Generating integration tests
Integration tests are more complex than unit tests — they test interaction between components. For AI, the key is to provide context about all participating components:
# Prompt for integration test:
# "Write an integration test for POST /api/payments/create/.
# The endpoint expects JWT authentication, a payload with course_slug,
# creates a Payment in the DB, and returns a redirect URL.
# Use pytest + Django test client. Real DB, no mocks
# except for the external Comgate API (use respx)."
# AI generates:
import pytest
import respx
from httpx import Response

from myapp.models import Payment  # import added so the final assert runs (path assumed)


@pytest.mark.django_db
def test_create_payment_success(authenticated_client, course):
    with respx.mock:
        respx.post("https://payments.comgate.cz/v1.0/create").mock(
            return_value=Response(200, json={
                "code": 0,
                "transId": "ABC-123",
                "redirect": "https://payments.comgate.cz/pay/ABC-123",
            })
        )
        response = authenticated_client.post(
            "/api/payments/create/",
            data={"course_slug": course.slug},
            content_type="application/json",
        )
    assert response.status_code == 200
    data = response.json()
    assert "redirect_url" in data
    assert Payment.objects.filter(user=authenticated_client.user).exists()

Coverage analysis with AI
AI can help not only write tests but also analyze coverage and suggest where to add tests:
# 1. Generate coverage report
pytest --cov=myapp --cov-report=json
# 2. Let AI analyze coverage gaps
claude "Analyze the coverage report in coverage.json.
Identify the 5 most important uncovered areas
and write tests for them. Prioritize by:
1. Business-critical functions
2. Error handling paths
3. Edge cases in data validation"
Reviewing AI-generated tests
You must review every AI-generated test. Look for these problems:
- Tautological tests — a test that tests whether the implementation does what it does (assert result == function(input))
- Excessive mocking — a test that mocks everything and tests nothing real
- Missing assertions — a test that always passes because it has no assert in the right place
- Hard-coded values — a test that only passes for specific data and fails on changes
- Missing cleanup — a test that leaves state behind (files, DB records) and affects other tests
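To make one checklist item concrete, here is a hypothetical sketch (the function and names are made up) of a missing-assertion test next to its fixed version:

```python
# Hypothetical example for illustration only.
import pytest

def parse_age(value):
    """Toy function under test: parse a non-negative integer age."""
    age = int(value)
    if age < 0:
        raise ValueError("age must be non-negative")
    return age

# Missing assertion: this passes as long as no exception is raised,
# but it never checks the parsed result.
def test_parse_age_no_assert():
    parse_age("30")

# Fixed: the expected outcome is asserted explicitly.
def test_parse_age_returns_int():
    assert parse_age("30") == 30

def test_parse_age_rejects_negative():
    with pytest.raises(ValueError):
        parse_age("-1")
```

A coverage tool counts all three as covering the same lines, which is exactly why coverage alone cannot catch the first one.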
Rule of thumb: if you cannot understand what a test is testing within 5 seconds of reading it, the test is bad. AI often writes tests with unclear names. Rename the test to describe expected behavior, not implementation details.
Pick one module in your project that lacks sufficient tests:
1. Run a coverage report and identify uncovered lines
2. Use AI (Claude Code, Copilot, or Cursor) to generate tests
3. Use all three prompt patterns: behavior-driven, test-first, coverage gap
4. Review each generated test against the checklist above
5. Run the tests and verify they pass and increase coverage
Compare test quality from different prompt patterns. Which pattern produced the best tests?
Hint
Start with a simple utility function — clean input, clean output, no side effects. That is where AI excels. Then try a more complex function with database or API calls. You will see where AI starts needing more context.
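A "clean input, clean output" utility of the kind the hint recommends might look like this (hypothetical example, not from the lesson):

```python
# Hypothetical pure utility: no I/O, no side effects, trivially testable.

def slugify(title):
    """Lowercase a title, trim whitespace, and join words with hyphens."""
    return "-".join(title.strip().lower().split())

def test_slugify_basic():
    assert slugify("  Generating Tests with AI ") == "generating-tests-with-ai"
```

Functions like this need no fixtures, mocks, or cleanup, so AI-generated tests for them are easy to review and trust.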
- Behavior-driven prompts produce better tests than generic 'write tests for X'
- Provide AI with context about expected behavior, not just implementation
- Always review AI-generated tests — look for tautologies, excessive mocks, and missing asserts
- TDD with AI: have AI write tests before the implementation
- Coverage gap analysis: give AI the coverage report and let it fill missing tests
In the next lesson, we dive into Visual Regression Testing.