Data Classification Capabilities

Data Classification Capabilities in Microsoft Compliance Solutions

Understanding Data Classification Capabilities

Data classification is a foundational concept in Microsoft's compliance ecosystem. It enables organizations to identify, categorize, and protect sensitive information across their digital estate. For the SC-900 exam, understanding data classification capabilities is essential because they form the backbone of information protection and governance strategies.

Why Data Classification Is Important

Organizations handle massive volumes of data daily — emails, documents, spreadsheets, presentations, and more. Without proper classification, sensitive information such as financial records, personal data, health information, and intellectual property can be inadvertently exposed, shared, or mishandled. Data classification is important because it:

• Reduces risk of data breaches by identifying sensitive data before it leaves the organization
• Supports regulatory compliance with standards like GDPR, HIPAA, PCI DSS, and others
• Enables informed decision-making about how data should be stored, shared, and retained
• Automates protection by applying labels and policies based on the sensitivity of content
• Provides visibility into the types of sensitive information that exist across the organization

What Is Data Classification?

Data classification in Microsoft Purview (formerly Microsoft 365 Compliance) is the process of identifying and labeling data based on its content and context. Microsoft provides built-in tools and capabilities that allow organizations to discover, classify, and protect information at scale.

The key components of data classification in Microsoft Purview include:

1. Sensitive Information Types (SITs)
These are pattern-based classifiers that detect sensitive content such as credit card numbers, Social Security numbers, passport numbers, and more. Microsoft provides over 300 built-in sensitive information types. Organizations can also create custom sensitive information types to match their unique data patterns. SITs use a combination of:
• Regular expressions (regex) — Pattern matching
• Keywords — Specific words or phrases associated with the data
• Checksums — Mathematical validation (e.g., Luhn algorithm for credit card numbers)
• Proximity — How close supporting evidence is to the primary pattern
• Confidence levels — High, medium, or low confidence based on the evidence found
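
The elements above can be sketched in a few lines of Python. This is only an illustration of how a pattern, a checksum, keyword proximity, and a confidence level might combine; the pattern, the keyword list, the 300-character window, and the function names are all assumptions made for this example and are not Microsoft's built-in definitions.

```python
import re

# Hypothetical pattern and keyword list for illustration only; these are
# not Microsoft's built-in sensitive information type definitions.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){15}\d\b")
KEYWORDS = ("credit card", "card number", "visa", "mastercard")
PROXIMITY = 300  # characters searched on each side of a match (assumed value)


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:  # double every second digit from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def detect_card_numbers(text: str) -> list[tuple[str, str]]:
    """Return (digits, confidence) pairs for checksum-valid candidates."""
    findings = []
    lowered = text.lower()
    for match in CARD_PATTERN.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if not luhn_valid(digits):
            continue  # the pattern matched, but the checksum rejects it
        start = max(0, match.start() - PROXIMITY)
        window = lowered[start:match.end() + PROXIMITY]
        has_keyword = any(keyword in window for keyword in KEYWORDS)
        # Supporting evidence near the match raises the confidence level.
        findings.append((digits, "high" if has_keyword else "low"))
    return findings
```

In this sketch, a checksum-valid number next to the phrase "credit card" is reported with high confidence, the same number with no nearby keyword is reported with low confidence, and a 16-digit string that fails the Luhn check is not reported at all.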

2. Trainable Classifiers
Trainable classifiers use machine learning to classify content that cannot easily be identified by simple pattern matching. There are two types:
• Pre-trained classifiers: Built-in classifiers provided by Microsoft that are ready to use (e.g., resumes, source code, harassment, threat, profanity, financial statements)
• Custom trainable classifiers: Created by organizations by providing sample content (seed content) that represents the category. The classifier then learns to identify similar content. The process involves creating the classifier, providing positive samples, testing it, and then publishing it.
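
The create, seed, test, and publish sequence for a custom classifier can be illustrated with a toy example. The class below is a deliberately simple vocabulary-overlap model, not the machine learning that Microsoft Purview uses; the class name, the thresholds, and the sample text are all hypothetical.

```python
import re


def tokenize(text: str) -> list[str]:
    """Split text into lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class SeedClassifier:
    """Toy stand-in for a custom classifier (hypothetical, for illustration)."""

    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold  # assumed cut-off chosen for the demo
        self.vocabulary = set()
        self.published = False

    def train(self, seed_samples: list[str]) -> None:
        """Learn a vocabulary from positive seed content."""
        for sample in seed_samples:
            self.vocabulary.update(tokenize(sample))

    def score(self, text: str) -> float:
        """Fraction of tokens in the text that also appear in the seed content."""
        tokens = tokenize(text)
        if not tokens:
            return 0.0
        return sum(token in self.vocabulary for token in tokens) / len(tokens)

    def matches(self, text: str) -> bool:
        return self.score(text) >= self.threshold

    def publish_if_accurate(self, test_items, min_accuracy: float = 0.8) -> bool:
        """Publish only if predictions on labeled test items are accurate enough."""
        correct = sum(self.matches(text) == expected for text, expected in test_items)
        self.published = correct / len(test_items) >= min_accuracy
        return self.published


classifier = SeedClassifier()
classifier.train([
    "Work experience: software engineer. Education: BSc. Skills: Python.",
    "Education and skills summary. Work experience in sales.",
])
test_items = [
    ("Skills and education: work experience in engineering", True),
    ("Quarterly revenue grew nine percent", False),
]
classifier.publish_if_accurate(test_items)
```

The point of the sketch is the workflow rather than the model: the classifier is trained on seed content, checked against items whose category is already known, and marked as published only after the test passes.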

3. Exact Data Match (EDM) Classification
EDM-based classification allows organizations to create custom sensitive information types based on exact values in a database. For example, rather than detecting any pattern that looks like a Social Security number, EDM can match against the exact Social Security numbers in your employee database. This significantly reduces false positives and provides highly precise classification.
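
The difference between pattern matching and exact matching can be shown with a short sketch. It assumes a hypothetical employee table containing one Social Security number. The function names and the choice to store SHA-256 hashes instead of raw values are assumptions of this example, not a description of how Microsoft Purview implements EDM.

```python
import hashlib
import re

# Generic pattern that matches anything shaped like a Social Security number.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def fingerprint(value: str) -> str:
    """Hash a value so the index never holds the raw sensitive data."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def build_index(known_values: list[str]) -> set[str]:
    """Build a lookup set from the exact values in a (hypothetical) database."""
    return {fingerprint(value) for value in known_values}


def exact_matches(text: str, index: set[str]) -> list[str]:
    """Keep only the pattern candidates whose exact value is in the index."""
    candidates = SSN_PATTERN.findall(text)
    return [value for value in candidates if fingerprint(value) in index]


index = build_index(["123-45-6789"])  # made-up employee record
text = "Employee 123-45-6789 and sample 987-65-4321"
pattern_hits = SSN_PATTERN.findall(text)
exact_hits = exact_matches(text, index)
```

Here the pattern alone flags both numbers in the text, while the exact match keeps only the one that appears in the employee table. That is the false-positive reduction described above.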

4. Content Explorer
Content Explorer is a tool within the Microsoft Purview compliance portal that provides a snapshot view of items that have a sensitivity label, have a retention label, or have been identified as containing a sensitive information type. It allows administrators to drill down into specific locations and view the actual content of classified items. Access to Content Explorer requires specific roles:
• Content Explorer List Viewer — Can see the list of items and their locations but cannot view the actual content
• Content Explorer Content Viewer — Can view the actual content of each item in the list
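
The split between the two roles can be modeled as a simple access check. The sketch below only mirrors the distinction described above; the data class, the function, and the sample item are hypothetical and do not represent a Microsoft API.

```python
from dataclasses import dataclass

LIST_VIEWER = "Content Explorer List Viewer"
CONTENT_VIEWER = "Content Explorer Content Viewer"


@dataclass
class ClassifiedItem:
    name: str
    location: str
    content: str


def view_item(item: ClassifiedItem, roles: set[str]) -> dict[str, str]:
    """Return only the fields that the caller's roles allow (illustrative model)."""
    if CONTENT_VIEWER in roles:
        # Content Viewer: the item's actual content is visible.
        return {"name": item.name, "location": item.location, "content": item.content}
    if LIST_VIEWER in roles:
        # List Viewer: the item and its location are visible, not the content.
        return {"name": item.name, "location": item.location}
    raise PermissionError("A Content Explorer role is required")


item = ClassifiedItem("payroll.xlsx", "OneDrive for Business", "SSN 123-45-6789")
list_view = view_item(item, {LIST_VIEWER})
full_view = view_item(item, {CONTENT_VIEWER})
```

In this model a List Viewer gets the item name and location only, a Content Viewer also gets the content, and a caller with neither role is refused.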

5. Activity Explorer
Activity Explorer provides visibility into what activities are being performed on classified content. It tracks actions such as:
• Files being labeled or relabeled
• Labels being downgraded
• Files being copied to removable media
• Files being shared externally
• Files being printed
• DLP policy matches

Activity Explorer helps organizations understand how sensitive data is being used and whether policies are working effectively.
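
As a rough illustration of the questions Activity Explorer helps answer, the sketch below summarizes a small list of activity records and picks out label downgrades. The record layout, the activity names, and the label ordering are invented for this example and are not the actual Activity Explorer schema.

```python
from collections import Counter

# Hypothetical label ordering, from least to most sensitive.
LABEL_PRIORITY = {"Public": 0, "General": 1, "Confidential": 2, "Highly Confidential": 3}

# Made-up activity records for illustration.
events = [
    {"activity": "LabelChanged", "user": "alex", "file": "plan.docx",
     "old_label": "Confidential", "new_label": "General"},
    {"activity": "LabelChanged", "user": "sam", "file": "notes.docx",
     "old_label": "General", "new_label": "Confidential"},
    {"activity": "FilePrinted", "user": "alex", "file": "plan.docx"},
    {"activity": "FileCopiedToRemovableMedia", "user": "kim", "file": "budget.xlsx"},
]


def summarize(events: list[dict]) -> Counter:
    """Count how often each activity type occurs."""
    return Counter(event["activity"] for event in events)


def label_downgrades(events: list[dict]) -> list[dict]:
    """Return label changes that moved an item to a less sensitive label."""
    return [
        event for event in events
        if event["activity"] == "LabelChanged"
        and LABEL_PRIORITY[event["new_label"]] < LABEL_PRIORITY[event["old_label"]]
    ]


summary = summarize(events)
downgrades = label_downgrades(events)
```

With these sample records, the summary counts two label changes, one print, and one copy to removable media, and the downgrade filter returns only the change from Confidential to General.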

6. Data Classification Dashboard
The data classification dashboard in Microsoft Purview provides an overview of how content is being classified across the organization. It shows:
• Top sensitive information types
• Top sensitivity labels applied
• Top retention labels applied
• Locations where classified content is stored
• A summary of activities on labeled content

How Data Classification Works

The data classification process in Microsoft Purview works through a combination of automated and manual methods:

Step 1: Discovery and Scanning
Microsoft Purview scans content across Microsoft 365 workloads (Exchange Online, SharePoint Online, OneDrive for Business, and Microsoft Teams) to identify data that matches sensitive information types, trainable classifiers, or exact data match definitions.

Step 2: Classification
Once content is scanned, it is classified based on the patterns, keywords, and machine learning models that match. Classification can happen automatically through auto-labeling policies or manually when users apply sensitivity labels themselves.

Step 3: Labeling
After classification, sensitivity labels or retention labels can be applied to the content. Sensitivity labels can trigger protective actions such as encryption, access restrictions, and content marking (watermarks, headers, and footers).

Step 4: Monitoring and Reporting
Using Content Explorer, Activity Explorer, and the data classification dashboard, administrators monitor classified content, track activities, and ensure compliance policies are being followed.
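
The four steps can be strung together in a compact sketch. Everything in it is a simplification made for illustration: the single regex stands in for the full set of classifiers, and the label names, the action lists, and the sample items are hypothetical.

```python
import re
from collections import Counter

# Stand-in classifier: one pattern shaped like a Social Security number.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Hypothetical mapping from label to the protective actions it triggers.
LABEL_ACTIONS = {
    "Confidential": ["encrypt", "add footer"],
    "General": [],
}


def classify(content: str) -> str:
    """Step 2: decide a label from what the content matches."""
    return "Confidential" if SSN_PATTERN.search(content) else "General"


def run_pipeline(items: list[tuple[str, str]]):
    """Run steps 2 to 4 over (location, content) pairs found by a scan (step 1)."""
    results = []
    for location, content in items:
        label = classify(content)
        # Step 3: the label determines which protective actions apply.
        results.append({"location": location, "label": label,
                        "actions": LABEL_ACTIONS[label]})
    # Step 4: a simple report of how many items carry each label.
    report = Counter(result["label"] for result in results)
    return results, report


items = [
    ("SharePoint Online", "Employee SSN 123-45-6789"),
    ("Exchange Online", "Lunch at noon?"),
]
results, report = run_pipeline(items)
```

In this run the SharePoint item is labeled Confidential and picks up the encryption and footer actions, the email is labeled General with no actions, and the report counts one item per label.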

Key Concepts to Remember for the SC-900 Exam

• Sensitive information types are pattern-based and use regex, keywords, checksums, proximity, and confidence levels
• Trainable classifiers use machine learning and come in pre-trained and custom varieties
• Exact Data Match provides the highest precision by matching against exact values from a database
• Content Explorer shows what classified data exists and where it is located
• Activity Explorer shows what actions are being taken on classified content
• Data classification is a prerequisite for effective data loss prevention (DLP), information protection, and data governance
• No data leaves the Microsoft 365 compliance boundary during classification — all processing happens within the tenant
• Classification supports content in Exchange Online, SharePoint Online, OneDrive for Business, and Microsoft Teams

Exam Tips: Answering Questions on Data Classification Capabilities

1. Know the difference between SITs, trainable classifiers, and EDM: Exam questions often test whether you understand which classification method to use in a given scenario. Use SITs for well-defined patterns (credit cards, SSNs), trainable classifiers for unstructured content that requires ML (resumes, contracts), and EDM for exact value matching with minimal false positives.

2. Understand Content Explorer vs. Activity Explorer: This is a frequently tested distinction. Content Explorer = what data exists and where. Activity Explorer = what actions are happening on that data. If a question asks about monitoring user activities on labeled content, the answer is Activity Explorer. If it asks about discovering where sensitive content resides, the answer is Content Explorer.

3. Remember the role-based access for Content Explorer: Content Explorer List Viewer can see file names and locations but NOT the content. Content Explorer Content Viewer can see the actual content. This distinction is commonly tested.

4. Pre-trained vs. custom trainable classifiers: Know that pre-trained classifiers are ready to use out of the box, while custom classifiers require you to provide seed content and go through a training and testing phase before publishing.

5. Focus on the purpose, not deep technical configuration: The SC-900 is a fundamentals exam. You are not expected to know how to configure classification policies step by step. Instead, focus on what each capability does, when to use it, and why it matters.

6. Look for keywords in questions: Words like 'pattern-based,' 'regex,' or 'checksum' point to sensitive information types. Words like 'machine learning,' 'training,' or 'seed content' point to trainable classifiers. Words like 'exact values,' 'database,' or 'reduce false positives' point to EDM.

7. Data classification enables other solutions: Remember that classification is the foundation for DLP policies, sensitivity labels, retention policies, and insider risk management. Questions may test your understanding of how these solutions depend on classification.

8. Eliminate obviously wrong answers: On the SC-900, some answer choices may reference capabilities from Azure or other unrelated services. Stay focused on Microsoft Purview data classification capabilities and eliminate answers that describe network security or identity management features.

9. Watch for questions about the data classification dashboard: The dashboard provides an at-a-glance overview of top sensitive info types, top labels, and classified content locations. It is the starting point for understanding the organization's data landscape.

10. Remember that classification is non-destructive: Data classification identifies and labels content — it does not delete, move, or alter the data itself. Protective actions are applied through policies and labels, not through the classification process alone.
