SEC595: Applied Data Science and AI/Machine Learning for Cybersecurity Professionals


Data is the new frontier thanks to AI. It has become the most important asset of an organization’s infrastructure. As organizations onboard AI chatbots, users can upload corporate documents and data at will.
How is that data being checked for sensitive content? The AI interface is the last chance a company has to ensure that sensitive and confidential data stays within its own environment. Enterprise DLP offerings are not keeping up. Although they use machine learning as an underlying technology, most of their detection still relies on regular expressions or keywords. True sensitive data detection relies on context, not just patterns. It is time to investigate whether language models can help identify sensitive data in content uploaded to these AI systems.
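To make the gap between pattern matching and context concrete, the following is a minimal Python sketch of a regex-and-keyword check. The pattern set, the `pattern_hits` helper, and the sample sentences are all hypothetical and chosen only for illustration; they are not taken from any DLP product or from this paper's experiments.

```python
import re

# Hypothetical, deliberately tiny rule set (an assumption for illustration);
# production DLP rule sets are far larger.
PATTERNS = {
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "keyword": re.compile(r"\b(confidential|secret)\b", re.IGNORECASE),
}


def pattern_hits(text):
    """Return the names of the patterns that match the text."""
    return [name for name, rx in PATTERNS.items() if rx.search(text)]


# Hypothetical sample inputs.
samples = [
    "Employee SSN: 123-45-6789",
    "We plan to acquire our largest competitor next quarter; do not share.",
    "The secret to good coffee is fresh beans.",
]

for s in samples:
    print(pattern_hits(s), "<-", s)
```

In this sketch the first sample is flagged by the SSN pattern, the second (arguably the most sensitive) matches nothing because it contains no listed pattern or keyword, and the third is flagged by the keyword rule even though it is harmless. That is the kind of miss and false alarm that a context-aware classifier would aim to avoid.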
This paper seeks to evaluate the hypothesis that language models, large and small, can perform well at sensitive data classification and to offer a solution for companies trying to detect contextually sensitive data in their AI workflows.







