Data Classification: What Can Go Into AI and What Cannot
Why data classification is the foundation of everything
Most AI incidents share a common denominator: someone uploaded data that should never have gone into an AI tool. Source code, client contracts, employee personal data, financial reports. Not out of malice, but simply because nobody clearly stated what is allowed and what is not. Data classification is your first and most important guardrail. Without it, all other rules are useless.
The good news: you do not need a complex system. You need a simple matrix that anyone can understand in two minutes and apply without thinking. Four levels, clear rules, concrete examples.
Four levels of data sensitivity
Level 1: Public data. Information that is publicly available or intended for publication. Blog posts, marketing materials, public documentation. This data can go into any AI tool without restrictions.
Level 2: Internal data. Information that is not secret but is not meant for public consumption. Internal processes, general project notes, non-identifiable metrics. This data can go into approved AI tools with enterprise licenses that guarantee data will not be used for training.
Level 3: Confidential data. Information with business value or legal protection. Client contracts, source code, financial results before publication, business strategy. This data can only go into on-premise or self-hosted AI solutions where you have full control.
Level 4: Strictly protected data. Personal data (GDPR), health records, access credentials, cryptographic keys. This data must not go into any AI tool. No exceptions. If you need AI for working with this data, you must anonymize it first.
Rule of thumb: If you would be uncomfortable seeing the data in a newspaper, it belongs in Level 3 at minimum. If it would have legal consequences, it belongs in Level 4.
Prompt template for drafting a first classification with an AI assistant:
You are a data privacy specialist helping me classify data for AI usage.
I will give you a list of data types my team works with.
For each, classify into one of four levels:
Level 1 (Public): Can go into any AI tool
Level 2 (Internal): Enterprise AI tools only (no training on our data)
Level 3 (Confidential): Self-hosted AI only, or anonymize first
Level 4 (Strictly Protected): Never into AI, anonymize before any processing
Also provide:
- Justification for the classification
- Anonymization method if Level 3-4
- Example of what the anonymized version looks like
Data types:
- [paste your list here]
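The handling rules for the four levels can also be written down as a small lookup, which is handy if you later want to enforce them in a script or an internal gateway. This is a minimal sketch, not a definitive implementation: the names `Level`, `ALLOWED_TOOLS`, and `may_send`, and the three tool categories, are hypothetical. It also assumes that a self-hosted tool counts as approved for Level 1 and Level 2 data, which the lesson does not state explicitly.

```python
from enum import IntEnum


class Level(IntEnum):
    """The four sensitivity levels described in the lesson."""
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3
    STRICTLY_PROTECTED = 4


# Hypothetical tool categories (illustrative names, not real products).
# Assumption: self-hosted tools are also acceptable for Level 1-2 data.
ALLOWED_TOOLS = {
    Level.PUBLIC: {"public", "enterprise", "self_hosted"},
    Level.INTERNAL: {"enterprise", "self_hosted"},
    Level.CONFIDENTIAL: {"self_hosted"},
    # Level 4 never goes into an AI tool as-is; it must be anonymized first.
    Level.STRICTLY_PROTECTED: set(),
}


def may_send(level: Level, tool_kind: str) -> bool:
    """Return True if data at this level may go into this kind of tool."""
    return tool_kind in ALLOWED_TOOLS[level]
```

For example, `may_send(Level.INTERNAL, "public")` is `False`, while `may_send(Level.CONFIDENTIAL, "self_hosted")` is `True`.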
How to build a matrix for your company
Start with a list of data types your team works with. Walk through a typical workday: what documents, systems, and information do you interact with? Assign a level to each type. Specific examples are key — 'client data' is too vague. 'Client name and email from CRM' is clear. 'Anonymized client count per segment' is something different.
When in doubt about a data level, always classify one level higher. Treating internal data as confidential costs you little: you use the self-hosted tool or anonymize first. Treating confidential data as internal can cost you a client or a lawsuit.
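A matrix with concrete data types, plus the "one level higher when in doubt" rule, could be sketched as follows. The entries are illustrative, and `MATRIX`, `MAX_LEVEL`, and `classify` are hypothetical names. The level for "Anonymized client count per segment" is an assumption, since the lesson only says it differs from raw client data.

```python
# Illustrative matrix: data type -> level (1 = public ... 4 = strictly protected).
MATRIX = {
    "Published blog post": 1,
    "Internal process description": 2,
    "Anonymized client count per segment": 2,  # assumption, not stated in the text
    "Client contract": 3,
    "Client name and email from CRM": 4,  # personal data
}

MAX_LEVEL = 4


def classify(data_type: str, in_doubt: bool = False) -> int:
    """Return the level for a listed data type, one level higher when in doubt."""
    # A KeyError here means the type is missing and should be added to the matrix.
    level = MATRIX[data_type]
    if in_doubt:
        level = min(level + 1, MAX_LEVEL)
    return level
```

Raising an error for unlisted types, rather than guessing a default, is a design choice in this sketch: it forces each new data type to be added to the matrix deliberately.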
Involve a lawyer and security specialist if you have them. But do not turn it into a month-long project. A basic matrix can be built in an afternoon. It will have gaps. That is fine — you will fix them as you encounter edge cases. An imperfect matrix today is better than a perfect one in six months.
Common classification mistakes
Mistake one: classifying too granularly. With fifty categories, nobody uses the matrix. Stick to four levels.
Mistake two: forgetting context. An employee name by itself is Level 2. An employee name plus their salary is Level 4. Data combinations change the level.
Mistake three: not revising. Classifications go stale as new projects, new clients, and new regulations arrive. Review quarterly.
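The point about data combinations can be made concrete with a small sketch: each field has a base level, and certain combinations raise the level of the whole record. The name-plus-salary rule comes from the example above. The names `BASE_LEVELS`, `COMBINATION_RULES`, and `level_for` are hypothetical, and the base level for a salary figure on its own is an assumption.

```python
from typing import Iterable

# Base level of each field on its own.
BASE_LEVELS = {
    "employee_name": 2,
    "salary": 2,  # assumption: a salary figure with no identity attached
}

# Combinations that are more sensitive than any of their parts.
COMBINATION_RULES = {
    frozenset({"employee_name", "salary"}): 4,
}


def level_for(fields: Iterable[str]) -> int:
    """Return the level of a record containing the given fields."""
    present = set(fields)
    # Start from the most sensitive single field (raises ValueError if empty).
    level = max(BASE_LEVELS[field] for field in present)
    # Raise the level if a sensitive combination is fully present.
    for combination, combined_level in COMBINATION_RULES.items():
        if combination <= present:
            level = max(level, combined_level)
    return level
```

Here `level_for(["employee_name"])` gives 2, while `level_for(["employee_name", "salary"])` gives 4.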
Create a quick reference card — one page with a table: data type → level → what you can do. Put it on your intranet, Slack, on the wall. The more visible, the more effective.
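If the matrix already lives in a script or spreadsheet export, the reference card can be generated from it instead of maintained by hand. The following is a rough sketch that prints a plain-text table. `ACTIONS`, `ROWS`, and `render_card` are hypothetical names, and the rows are illustrative examples only.

```python
# What each level allows, summarized from the four level descriptions.
ACTIONS = {
    1: "Any AI tool",
    2: "Approved enterprise AI tools only",
    3: "Self-hosted or on-premise AI only",
    4: "Never into AI; anonymize first",
}

# Illustrative rows: (data type, level).
ROWS = [
    ("Blog post", 1),
    ("General project notes", 2),
    ("Client contract", 3),
    ("Health records", 4),
]


def render_card(rows):
    """Render rows as an aligned table: data type, level, what you can do."""
    header = ("Data type", "Level", "What you can do")
    table = [header] + [(name, str(level), ACTIONS[level]) for name, level in rows]
    # Column width = longest cell in that column.
    widths = [max(len(row[i]) for row in table) for i in range(3)]
    lines = [
        "  ".join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()
        for row in table
    ]
    # Separator line between the header and the data rows.
    lines.insert(1, "  ".join("-" * width for width in widths))
    return "\n".join(lines)


print(render_card(ROWS))
```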
List 10-15 data types your team works with daily. Assign each a level from 1-4. For Levels 3 and 4, write down what alternative exists (anonymization, aggregation, self-hosted tool). The result will be the foundation of your company matrix.
Hint
Start with what your team actually does, not what they should do. Open your team chat history — what data flies around there?
Go through your inbox, shared drives, and project tools. Find 10 different document types your team works with daily. For each: 1) Assign a sensitivity category, 2) Decide if it can go into public AI, 3) If not, describe how you'd anonymize the data, 4) Identify who in the company should approve this decision.
Hint
Most documents fall into a 'gray zone'. That's normal — that's exactly why you need clear rules with examples. Document your decisions as precedents for future cases.
For Level 3-4 data that your team needs to process with AI, create an anonymization playbook: 1) List 5 data types you commonly work with that are Level 3-4. 2) For each, describe how to anonymize it before AI processing (replace names with 'Client A', remove dates, aggregate numbers). 3) Create a before/after example for each type. 4) Define who is responsible for anonymization in your workflow. Test the playbook — can someone follow it without additional help?
Hint
The most common anonymization failure: forgetting metadata. A document with all names removed but 'Created by: John Smith, Acme Corp' in the properties is not anonymous. Check file properties, email headers, and embedded comments.
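One playbook step, replacing names with neutral labels and then checking metadata for leftovers, might look like the sketch below. It only handles names you list explicitly and does not detect names on its own, so treat it as a starting point under that assumption. The functions `replace_names` and `leaking_fields` are hypothetical, and the metadata is modelled as a plain dictionary rather than read from a real file.

```python
import re
from string import ascii_uppercase
from typing import Dict, List


def replace_names(text: str, names: List[str]) -> str:
    """Replace each listed name with 'Client A', 'Client B', ... in order."""
    if len(names) > len(ascii_uppercase):
        raise ValueError("this sketch supports at most 26 names")
    labels = {name: f"Client {letter}" for name, letter in zip(names, ascii_uppercase)}
    # Replace longer names first so a short name cannot split a longer one.
    for name in sorted(labels, key=len, reverse=True):
        pattern = rf"(?<!\w){re.escape(name)}(?!\w)"
        text = re.sub(pattern, labels[name], text)
    return text


def leaking_fields(metadata: Dict[str, str], names: List[str]) -> List[str]:
    """Return the metadata fields that still contain any of the listed names."""
    return [
        field
        for field, value in metadata.items()
        if any(name.lower() in value.lower() for name in names)
    ]


names = ["John Smith", "Acme Corp"]
body = replace_names("John Smith of Acme Corp signed the contract.", names)
metadata = {"Created by": "John Smith, Acme Corp", "Title": "Contract draft"}
print(body)                             # Client A of Client B signed the contract.
print(leaking_fields(metadata, names))  # ['Created by']
```

The second check is the one that catches the failure described in the hint: the body is clean, but the "Created by" property still identifies the author and company.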
- Four levels: public, internal, confidential, strictly protected
- Data combinations change sensitivity levels — name + salary is a different category than name alone
- An imperfect matrix today beats a perfect one in six months
- Quick reference card visible to the whole team — the simpler, the more effective
- When in doubt, classify one level higher — treating internal data as confidential costs nothing, but the reverse can cost a client
The next lesson covers Tool Approval: How to Select and Authorize AI Solutions.