
Your Sensitive Data Has Left the Chat: LLMs as Sensitive Data Detectors

Published: 12 May, 2026
Created by:
Colten Davis

Data is the new frontier thanks to AI. It has become the most important asset of an organization’s infrastructure. As organizations onboard AI chatbots, users can upload corporate documents and data at will.

How is that data being checked for sensitive content? The AI interface is the last chance a company has to ensure that sensitive and confidential data stays within its own environment. Enterprise data loss prevention (DLP) offerings are not keeping up. Although they use machine learning in their underlying technology, most detection still relies on regular expressions or keywords. True sensitive data detection relies on context, not just patterns. It is time to investigate whether language models can help identify sensitive data in content uploaded to these AI systems.

This paper seeks to evaluate the hypothesis that language models, large and small, can perform well at sensitive data classification and to offer a solution for companies trying to detect contextually sensitive data in their AI workflows.