Critical Evaluation of AI Output
Why critical evaluation is essential
AI produces convincing text. Always. Even when it is wrong, it delivers the answer with confidence and elegance. This is the biggest risk: not that AI is bad, but that it is convincingly bad. The ability to recognize when AI is wrong is probably the most important AI skill in 2026.
What hallucinations are and why they happen
A hallucination is AI-generated information that looks factual but is not true. AI does not make things up deliberately; it predicts text that 'looks right' based on its training data. When it does not have the correct answer, it generates the most probable-sounding one, which can be completely wrong.
- Fabricated citations: AI references a book or article that does not exist
- False statistics: convincing numbers with no basis in reality
- Non-existent people: AI creates a biography for a fictional person
- Wrong connections: real facts combined incorrectly (attributing to person X something that person Y actually did)
- Outdated information: correct at training time but no longer valid
How to detect hallucinations
Red flags
- Overly specific details: exact numbers, dates, and percentages without source context
- Perfect narrative: if it sounds 'too good to be true,' it probably is not true
- Unusual claims: information you have never encountered before
- Consistent confidence: AI almost never says 'I don't know' on its own
- Missing sources: claims like 'studies show' without a specific study
Verification techniques
Three levels of verification:
- Quick check: copy key claims into a search engine — do independent confirmations exist?
- Medium check: ask AI 'How confident are you in this claim? What are alternative viewpoints?'
- Deep check: find the primary source (study, report, database) and verify directly
Golden rule: the more an AI claim influences your decision-making, the more thoroughly you should verify it. Inspiration for brainstorming? A quick check is fine. Data for a board meeting? Deep-check every number.
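To make the triage concrete, here is a minimal sketch of the golden rule as a lookup from decision impact to verification depth. The impact labels and the mapping are illustrative assumptions, not part of the lesson.

```python
# Minimal sketch: map decision impact to a verification level.
# The impact labels and this mapping are illustrative assumptions.

VERIFICATION_LEVELS = {
    "low": "quick check: search engine, look for independent confirmation",
    "medium": "medium check: ask the AI about its confidence and alternatives",
    "high": "deep check: locate and read the primary source",
}

def verification_level(impact: str) -> str:
    """Return the suggested verification depth for a decision impact."""
    if impact not in VERIFICATION_LEVELS:
        raise ValueError(f"unknown impact level: {impact!r}")
    return VERIFICATION_LEVELS[impact]

print(verification_level("low"))   # brainstorming inspiration
print(verification_level("high"))  # numbers for a board meeting
```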
Assessing AI output quality
Not every flawed AI output is a hallucination; it may simply be low quality. How do you evaluate it systematically? Check these six dimensions (a small scoring sketch follows the list):
- Relevance: does AI answer your question, or something else?
- Completeness: does it cover all aspects, or skip some?
- Accuracy: are specific facts correct?
- Recency: is the information current, or outdated?
- Balance: does it show multiple perspectives, or just one?
- Practicality: can the output be used in practice, or is it only theoretical?
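The six dimensions translate naturally into a simple scorecard. Below is a minimal Python sketch; the class name and the 1-5 scale are assumptions borrowed from the scorecard exercise later in this section.

```python
from dataclasses import dataclass, fields

# Sketch of a six-dimension quality scorecard (1-5 per dimension).
# The class and field names are assumptions mirroring the list above.

@dataclass
class QualityScorecard:
    relevance: int      # answers your question, not something else
    completeness: int   # covers all aspects
    accuracy: int       # specific facts are correct
    recency: int        # information is current, not outdated
    balance: int        # shows multiple perspectives
    practicality: int   # usable in practice

    def total(self) -> int:
        """Sum of all six dimension scores (max 30)."""
        return sum(getattr(self, f.name) for f in fields(self))

    def weakest(self) -> str:
        """Name of the lowest-scoring dimension, for targeted iteration."""
        return min(fields(self), key=lambda f: getattr(self, f.name)).name

report = QualityScorecard(relevance=5, completeness=3, accuracy=4,
                          recency=2, balance=3, practicality=4)
print(report.total(), report.weakest())  # 21 recency
```

The weakest() helper points at the dimension to iterate on, which is exactly what the scorecard exercise later in this section asks you to do.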
Strategies for different use cases
Brainstorming and creative work
Low verification need. Hallucinations can actually be useful here — unexpected new connections can inspire.
Business decisions
Medium to high verification need. Verify every number and claim. Use AI for framework and structure, not for data.
Legal and health information
Maximum verification need. You can use AI only as a starting point — verify everything with an expert. Relying on AI in these areas can be dangerous.
Teach AI to say: 'I am 90% confident in this claim. Here are the points where I am less certain: ...' The model will not be perfect at self-assessment, but it often identifies areas where it is weakest.
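One way to operationalize this is a reusable prompt suffix. The wording below is an assumption; adapt it to your model and domain.

```python
# Sketch: append a self-assessment instruction to any question.
# The exact prompt wording is an assumption, not a fixed recipe.

CONFIDENCE_SUFFIX = (
    "\n\nAfter answering, state: 'I am N% confident in this claim. "
    "Here are the points where I am less certain: ...' and list each "
    "uncertain point explicitly."
)

def with_confidence(question: str) -> str:
    """Wrap a question so the model reports its own confidence."""
    return question + CONFIDENCE_SUFFIX

print(with_confidence("Summarize the history of electric vehicles in Europe."))
```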
When AI gives you a list of facts, pick the most specific one (a date, a number, a name) and verify it first. If that single fact is wrong, treat the entire output with much higher skepticism — errors tend to cluster together.
Ask AI 5 questions from different areas (history, science, current events, your expertise, fiction) and for each response:
1. Rate how convincing the answer sounds (1-10)
2. Identify specific claims you should verify
3. Verify them: how many claims are correct, how many wrong?
4. Record where you detected a hallucination and how
Pay special attention to answers about your area of expertise: that is where you can best tell when AI is bluffing.
Hint
Questions about your expertise are the most valuable test because you have the knowledge to evaluate correctness. In areas where you are not an expert, it is harder to spot subtle errors — and that is exactly the problem most people face.
Ask AI to write a 300-word article about a topic that requires specific facts (e.g., 'The history of electric vehicles in Europe' or 'Key milestones in Czech cybersecurity law').
1. Highlight every specific claim: dates, names, statistics, laws, events
2. For each claim, categorize: 'I can verify this' vs. 'I cannot easily verify this'
3. Verify the ones you can; track correct vs. incorrect vs. partially correct
4. For claims you cannot verify, ask AI: 'What is your source for this specific claim?'
5. Rate the overall reliability of the article on a 1-10 scale
Record your false positive rate (claims you initially accepted as true but that turned out to be wrong).
Hint
Most people are surprised by their false positive rate — we tend to accept claims that match our existing beliefs without verification. This exercise trains you to question even plausible-sounding statements.
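If you log each claim while doing the exercise above, the false positive rate is one line of arithmetic. A minimal sketch, assuming a (claim, accepted_at_first_glance, actually_true) record format; the example claims are placeholders.

```python
# Sketch: false positive rate = claims you accepted that turned out wrong,
# divided by all claims you accepted. Records below are placeholders.

records = [
    ("Claim A", True, True),
    ("Claim B", True, False),   # accepted but wrong: a false positive
    ("Claim C", False, False),  # rejected, and indeed untrue
]

accepted = [r for r in records if r[1]]
false_positives = [r for r in accepted if not r[2]]

rate = len(false_positives) / len(accepted) if accepted else 0.0
print(f"False positive rate: {rate:.0%}")  # 50%
```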
Create a personal quality scorecard for evaluating AI output. Use it to evaluate 3 different AI-generated texts:
1. Define your criteria: relevance (1-5), accuracy (1-5), completeness (1-5), recency (1-5), balance (1-5), practicality (1-5)
2. Ask AI to generate 3 texts: a market analysis, a how-to guide, and an opinion piece
3. Score each text using your scorecard
4. For the lowest-scoring text, iterate with AI to improve the weakest dimension
5. Re-score after iteration: how much did the score improve?
Keep your scorecard and use it regularly. Over time, you will develop an intuitive sense for AI output quality.
Hint
The scorecard is not about perfection — it is about building a systematic habit. Even a simple 'good/okay/bad' rating on 3 dimensions is better than no evaluation at all.
- Hallucination = AI generates convincing but untrue information — the main risk
- Red flags: overly specific details, perfect narrative, missing sources
- Three verification levels: quick (search engine), medium (ask AI), deep (primary source)
- The more a claim influences your decisions, the more thoroughly you should verify it
- Teach AI to assess its own confidence — not perfect, but it helps