How Many LLMs Does it Take to Classify a Suspicious Email?

Published: 12 Mar, 2026
Created by: Bridget Bartell

Large language models (LLMs) such as ChatGPT, Microsoft CoPilot, and Google Gemini are becoming increasingly accessible to end users and may offer a novel avenue for evaluating suspicious email messages as they are encountered. However, little is known about how these publicly available models perform when classifying phishing versus legitimate content without additional tuning.

This study examines the accuracy, reliability, and operational behavior of three widely available LLMs using a dataset of 2000 human-written emails containing both legitimate and suspicious messages. Each model was provided with identical inputs and prompts across six runs to assess variability in output quality, classification consistency, and suspiciousness scoring.
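The run-to-run comparison described above can be illustrated with a small sketch. The metrics and data below are illustrative assumptions, not the study's actual scoring scheme: for one email, six hypothetical model runs each return a verdict and a suspiciousness score, and we measure agreement with the majority verdict and the spread of the scores.

```python
from statistics import mean, pstdev

# Hypothetical outputs from six runs of one model on a single email:
# (verdict, suspiciousness score). Values are illustrative only.
runs = [
    ("phishing", 8), ("phishing", 6), ("legitimate", 4),
    ("phishing", 9), ("phishing", 7), ("phishing", 8),
]

verdicts = [v for v, _ in runs]
scores = [s for _, s in runs]

# Classification consistency: fraction of runs agreeing with the majority verdict.
majority = max(set(verdicts), key=verdicts.count)
consistency = verdicts.count(majority) / len(verdicts)

# Score variability: population standard deviation of the suspiciousness scores.
score_mean = mean(scores)
score_spread = pstdev(scores)

print(majority, round(consistency, 2), score_mean, round(score_spread, 2))
# → phishing 0.83 7 1.63
```

Aggregating these two numbers over every email in the dataset gives a simple picture of how stable a model's classifications and scores are across repeated runs.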

The results show stark differences in performance: ChatGPT accepted and processed the full dataset but exhibited highly inconsistent scoring and categorization; CoPilot processed fewer messages but showed strong reliability and accuracy for those it evaluated; and Gemini displayed significant operational instability, returning inconsistent, partial, or malformed outputs. These findings indicate that publicly available LLMs vary widely in their dependability for phishing detection tasks, highlighting critical limitations for real-world adoption, informing recommendations for organizational use, and pointing to opportunities for future study.