Early in my career, I watched a fellow SOC analyst spend hours trying to find file hashes in a web proxy's logs. But file hashes don't appear in web proxy logs. That’s not a supported parameter. The analyst wasn't incompetent — nobody had ever mapped what that log source actually contained against what the detection use case required. The data was flowing, the tool was running, and the search was completely futile.
That gap between having a log source and understanding what it actually contains is where most detection programs quietly fall apart.
I've been doing this for almost 20 years: SOC work, incident response, and building detection programs from scratch for organizations across industries and continents. The pattern I see most consistently is teams operating with blind spots they don't know they have: writing rules against log sources that lack the right fields, mapping coverage against MITRE ATT&CK without a systematic way to measure what "covered" actually means, or maintaining rules they wrote years ago and never revisited.
At SANS Secure Your Fortress 2026, I walked through a practical framework and a set of tools that can bring real rigor to detection engineering. The kind that lets you actually answer the question: Where are my gaps, and what do I do about them?
I'll borrow a line I first heard from Keith McCammon, the CISO at Red Canary, because it grounds everything that follows: the goal is to detect most threats most of the time. Not all threats, and not every second of every day, but most threats, most of the time. That sounds like settling, but it's just being intellectually honest. The moment someone asks me, "Are we secure?" I get uncomfortable, not because I don't know enough, but because nobody ever sees the full picture, and the right goal isn't perfection. The goal must be continuous improvement against a clearly understood gap, and to close gaps, you first have to see them.
Before I get into the tools, I want to name the core challenges I see consistently. They're not exotic; they're the same four things threatening the success of detection programs at organizations of all sizes.
Inadequate log source quality. Having a log source isn't enough. The question is whether it gives you the fields you actually need to build the detections you want. If your use case is incident response, do your current log sources contain the event IDs you need to detect attacker behavior? For most organizations, the honest answer is not entirely.
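A minimal sketch of that field check, assuming you keep a simple inventory of the fields each log source emits. The source names and field names below are hypothetical placeholders, not a real vendor schema.

```python
# Hypothetical field inventories; replace with what your sources really emit.
LOG_SOURCE_FIELDS = {
    "web_proxy": {"timestamp", "src_ip", "url", "http_method", "user_agent"},
    "edr_process": {"timestamp", "hostname", "process_name", "command_line", "file_hash"},
}


def missing_fields(source: str, required: set[str]) -> set[str]:
    """Return the fields a detection needs that the log source does not provide."""
    return required - LOG_SOURCE_FIELDS[source]


# A hash-based hunt needs a file hash, which the proxy inventory above lacks.
hash_hunt_requirements = {"timestamp", "file_hash"}
print(sorted(missing_fields("web_proxy", hash_hunt_requirements)))    # ['file_hash']
print(sorted(missing_fields("edr_process", hash_hunt_requirements)))  # []
```

Running a check like this before writing a rule would have told that analyst in minutes that the proxy logs could never answer the question.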
Visibility gaps. Even if your log sources have the right fields, coverage can be incomplete. I had a client in Germany where my team was catching grief for failing to detect certain activity. It turned out the client had never actually forwarded over the logs from the affected systems. You can have perfect detection logic and zero detections if the logs aren't flowing. Visibility means knowing you're actually receiving telemetry from everything that matters, not just assuming you are.
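One way to test that assumption is to compare your asset inventory with the hosts that actually appear in your logs. The sketch below assumes you can export both as sets of host names; the names used are invented for illustration.

```python
def visibility_gap(inventory: set[str], reporting: set[str]) -> tuple[float, list[str]]:
    """Return (fraction of inventoried hosts sending logs, sorted silent hosts)."""
    if not inventory:
        raise ValueError("inventory is empty")
    silent = sorted(inventory - reporting)
    coverage = len(inventory & reporting) / len(inventory)
    return coverage, silent


# Hypothetical host names for illustration only.
inventory = {"dc01", "web01", "web02", "db01"}
reporting = {"dc01", "web01", "laptop-77"}  # laptop-77 is not in the inventory

coverage, silent = visibility_gap(inventory, reporting)
print(f"{coverage:.0%} of inventoried hosts are sending logs; silent: {silent}")
```

Hosts that report but are missing from the inventory (like `laptop-77` here) are worth a second look too, since they point to an inventory problem rather than a logging problem.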
Incomplete ATT&CK mapping. MITRE ATT&CK is a living framework. The enterprise version has 15 tactics, over 200 techniques, and several hundred sub-techniques, and those numbers change regularly. I extracted the persistence tactic during my talk and found 22 techniques listed. A screenshot from the week before showed 23. The point isn't to memorize all of them. The point is to remember that your coverage map is a moving target, and if you're not systematically tracking it, you're not maintaining it.
Rules that nobody loves. Detection rules aren't write-once objects. They need maintenance as attackers obfuscate, environments change, and new methods emerge. A rule that was solid three years ago may miss today's variant entirely. The discipline of revisiting, testing, and updating detection logic is just as important as writing it in the first place, and it's the part that most teams deprioritize.
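A simple way to keep rule maintenance from slipping is to record a review date for each rule and flag the ones that have aged out. This is a sketch under the assumption that you track a `last_reviewed` date per rule; the rule names and the 365-day default are illustrative, not a recommendation.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Rule:
    name: str
    last_reviewed: date


def stale_rules(rules: list[Rule], today: date, max_age_days: int = 365) -> list[str]:
    """Return names of rules whose last review is older than max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    return [r.name for r in rules if r.last_reviewed < cutoff]


# Hypothetical rules and dates.
rules = [
    Rule("run_key_persistence", date(2022, 3, 1)),
    Rule("suspicious_powershell", date(2025, 11, 15)),
]
print(stale_rules(rules, today=date(2026, 1, 10)))  # ['run_key_persistence']
```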
If you're going to build systematic detection coverage, you need to understand the ATT&CK framework at more than a surface level.
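The framework describes attacker behavior in three layers: tactics (the adversary's goal), techniques and sub-techniques (how that goal is achieved), and procedures (the specific implementations observed in real intrusions).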
The framework documents specific procedures from tracked threat actors, such as which malware families have been observed using which sub-techniques, and in some cases, which specific artifacts they leave behind. If I'm building a detection for boot or logon autostart execution via registry run keys (T1547.001), I can look at the procedures tab and see that Andromeda establishes persistence by dropping a specific file. That's a great detection lead. Most teams never get that specific.
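To make the T1547.001 example concrete, here is a rough sketch of matching registry writes to Run and RunOnce keys. It assumes events have already been normalized into dictionaries with `event_type` and `target_object` keys, which are hypothetical names (Sysmon, for instance, reports registry value sets as Event ID 13 with a TargetObject field). The pattern covers only the common Run/RunOnce locations and is not an exhaustive list of autostart keys.

```python
import re

# Common Run/RunOnce locations for T1547.001; not an exhaustive list.
RUN_KEY_PATTERN = re.compile(
    r"\\Software\\Microsoft\\Windows\\CurrentVersion\\Run(Once)?\\[^\\]+$",
    re.IGNORECASE,
)


def is_run_key_write(event: dict) -> bool:
    """True if a registry value-set event targets a Run or RunOnce key."""
    return (
        event.get("event_type") == "registry_value_set"
        and bool(RUN_KEY_PATTERN.search(event.get("target_object", "")))
    )


# Hypothetical normalized events.
hit = {
    "event_type": "registry_value_set",
    "target_object": r"HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Run\Updater",
}
miss = {
    "event_type": "registry_value_set",
    "target_object": r"HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Explorer\Advanced\Hidden",
}
print(is_run_key_write(hit), is_run_key_write(miss))  # True False
```

A match like this is only a starting point. Legitimate software writes to these keys constantly, so procedure-level details, such as the specific file a malware family drops, are what turn it into a usable detection.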
Knowing ATT&CK is one thing. Systematically measuring your coverage against it is another, and that's where DeTT&CT comes in.
DeTT&CT is an open-source Python tool built specifically for this problem. It lets you describe your environment, your log sources and their quality, and your detections, and then visualize that coverage against the ATT&CK matrix in MITRE Navigator. What you get out the other end is a heat map that shows you, technique by technique, where you're covered, where you're partial, and where you're blind.
The tool has a few core components: a Python CLI, YAML configuration files that describe your environment, and a web interface to allow easy configuration. There's also a scoring table Excel file that walks through the specific criteria for each component. The output is a YAML file, which you can easily convert into a JSON file that you import directly into ATT&CK Navigator.
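For a sense of what ends up in Navigator, the sketch below builds a minimal layer by hand from a dictionary of technique scores. This is not DeTT&CT's own conversion code; it only illustrates the general shape of a Navigator layer file. The `build_layer` helper, the colors, and the scores are assumptions for illustration, and a real layer typically also carries a `versions` block matching your Navigator release, which is omitted here.

```python
import json


def build_layer(name: str, scores: dict[str, int], max_score: int = 5) -> dict:
    """Build a minimal ATT&CK Navigator-style layer from technique scores."""
    return {
        "name": name,
        "domain": "enterprise-attack",
        "techniques": [
            {"techniqueID": tid, "score": score}
            for tid, score in sorted(scores.items())
        ],
        # Gradient from white (no coverage) to green (best score).
        "gradient": {
            "colors": ["#ffffff", "#2e7d32"],
            "minValue": 0,
            "maxValue": max_score,
        },
    }


# Hypothetical detection scores per technique.
layer = build_layer("Detection coverage", {"T1059": 3, "T1547.001": 2})
print(json.dumps(layer, indent=2))
```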
What I want to walk through is the scoring model, because this is where the rigor comes from.
DeTT&CT uses three separate scoring dimensions. Understanding what each one measures is what makes the tool more useful than just a visualization.
Detection scores measure the quality of your actual detection logic against a given technique. The scale runs from -1 (nothing) to 5 (excellent). A score of 3 (moderate) means you have a detection that works against known patterns, but you're likely missing edge cases: obfuscated command lines, fileless variants, or behavioral outliers. Getting to a 4 or 5 means moving from signature-based matching to behavioral detection, correlation across multiple data sources, and more sophisticated logic. Most organizations, when they're honest about it, are living around 3 for their best-covered techniques, and much lower across the full matrix.
Visibility scores measure how much telemetry you're actually receiving that would support detection. The scale runs from 0 to 4. A score of 2 (limited) means you have some relevant log sources but nowhere near complete coverage. A 3 might mean you're logging process creation and authentication events but missing command-line arguments, so you can see that a process was spawned but not what it was instructed to do. If you have 2,000 hosts and you're only receiving telemetry from 50, your visibility score should reflect that.
Data source scores measure the maturity of individual log sources across five criteria: device completeness (are you collecting from all relevant endpoints?), data field completeness (does the log contain the fields you need?), timeliness (are timestamps accurate and ingestion delays acceptable?), consistency (are field names and types uniform across sources?), and retention (how long are you keeping the data?).
When you're filling out DeTT&CT for a data source like SSH logs, you might be scoring a 2 or 3: limited retention, missing enrichment context, partial field completeness. For your EDR logs, if you're running CrowdStrike or Defender with full telemetry and reasonable retention, you might be scoring a 4. That contrast is exactly what you want to surface. It tells you where your detection program is genuinely strong and where it's running on faulty assumptions.
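The contrast between two sources is easier to see when the five criteria sit side by side. The sketch below models them as a small data class. The individual scores are invented for illustration, and the `average` and `weakest` helpers are my own convenience summaries, not something DeTT&CT computes this way.

```python
from dataclasses import asdict, dataclass


@dataclass
class DataQuality:
    device_completeness: int
    data_field_completeness: int
    timeliness: int
    consistency: int
    retention: int

    def weakest(self) -> tuple[str, int]:
        """Return the lowest-scoring criterion and its score."""
        return min(asdict(self).items(), key=lambda kv: kv[1])

    def average(self) -> float:
        """Return the unweighted mean of the five criteria."""
        values = asdict(self).values()
        return sum(values) / len(values)


# Hypothetical scores for two sources.
ssh_logs = DataQuality(3, 2, 3, 3, 1)
edr_logs = DataQuality(4, 4, 4, 4, 3)

for name, dq in [("ssh", ssh_logs), ("edr", edr_logs)]:
    print(name, f"{dq.average():.1f}", dq.weakest())
```

The weakest criterion is often more actionable than the average: an SSH source dragged down by retention needs a storage decision, not a new parser.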
Here’s how the workflow actually runs.
You start by adding data sources in the DeTT&CT web interface or CLI: process creation, user authentication, network connections, whatever you're ingesting. For each one, you fill in the scoring criteria I described above. This takes real knowledge of your environment, but it doesn't have to be perfect. An honest 3 is more useful than an optimistic 5 that doesn't reflect reality.
Once your data sources are scored and saved, you export a YAML file and convert it to JSON. You import that JSON into ATT&CK Navigator. What you see immediately is a technique heatmap where coverage lights up based on your telemetry. Techniques your log sources touch show color, and techniques they don't touch remain dark. That color map is your coverage baseline.
Add a second data source — say, process creation with Sysmon — and watch the heatmap change. Dozens of additional techniques suddenly light up across the board because process creation telemetry touches so many ATT&CK techniques, and the coverage shift is immediate and dramatic. When I showed this live, starting from just user authentication logs and adding process creation, the before and after was striking. A heatmap that was mostly dark suddenly had color across execution, persistence, privilege escalation, defense evasion, and more. That moment of seeing your actual coverage rendered visually for the first time has a way of making the problem very real, very fast. It makes the connection between log sources and detection coverage concrete instead of theoretical.
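The before-and-after effect can be reproduced in a few lines with a toy mapping from data sources to technique IDs. The mapping below is a small illustrative subset that I picked by hand, not the full ATT&CK data source mapping that DeTT&CT works from.

```python
# Toy mapping from data source to ATT&CK technique IDs (illustrative subset).
SOURCE_TO_TECHNIQUES = {
    "user_authentication": {"T1078", "T1110"},
    "process_creation": {"T1059", "T1053", "T1547.001", "T1218"},
}


def covered(sources: list[str]) -> set[str]:
    """Return every technique touched by at least one of the given sources."""
    result: set[str] = set()
    for source in sources:
        result |= SOURCE_TO_TECHNIQUES[source]
    return result


before = covered(["user_authentication"])
after = covered(["user_authentication", "process_creation"])
print("newly visible:", sorted(after - before))
```

With the real mapping the difference is far larger, which is why a single well-chosen source can change the heatmap so visibly.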
I've put together a Detection Coverage Scorecard you can download and fill out against your own environment — it maps directly to these three dimensions and gives you a structured baseline to bring to your next team meeting.
One of DeTT&CT's most useful capabilities is the ability to map your coverage against specific threat actor groups tracked by MITRE.
Using the group command, you can generate a heat map that shows your current log source quality and detection coverage specifically against the techniques used by known adversary groups. If your organization operates in a sector that puts you in APT41's targeting scope, you can ask a direct question: given what we're collecting today, how much of APT41's known technique repertoire would we actually detect?
In ATT&CK Navigator, you can then layer these views — your current detection coverage on one layer, APT41's techniques on another — and then use Navigator's built-in formula tools to highlight the gaps. The techniques that are low-scoring in your coverage layer but prominent in the threat actor layer are your highest-priority work items. This gives you a structured, reproducible prioritization process.
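The same gap logic can be sketched outside Navigator. The example below assumes you have your coverage scores as a dictionary and a threat actor's techniques as a set. The technique set is a made-up placeholder, not APT41's actual repertoire, and the threshold of 2 is an arbitrary cut-off for "low coverage".

```python
def priority_gaps(
    coverage: dict[str, int],
    actor_techniques: set[str],
    threshold: int = 2,
) -> list[tuple[str, int]]:
    """Return actor techniques scored at or below threshold, weakest first."""
    gaps = [
        (tid, coverage.get(tid, 0))  # techniques with no score count as 0
        for tid in actor_techniques
        if coverage.get(tid, 0) <= threshold
    ]
    return sorted(gaps, key=lambda gap: (gap[1], gap[0]))


# Hypothetical coverage scores and a placeholder actor technique set.
coverage = {"T1059": 3, "T1078": 1, "T1547.001": 2}
actor_techniques = {"T1059", "T1078", "T1190", "T1547.001"}

print(priority_gaps(coverage, actor_techniques))
# [('T1190', 0), ('T1078', 1), ('T1547.001', 2)]
```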
DeTT&CT takes this one step further by letting you layer in your defensive controls and mitigations. If you've deployed Active Directory hardening, endpoint AV, DLP, and network traffic filtering, you can map those controls against the same threat group view. The result is a color-coded picture: green where your defenses have real coverage against that group's known techniques, and red where they don't. That red is your roadmap — it's a structured, evidence-based view of where you have actual gaps against actual adversaries. That output belongs in conversations with your SOC team, your leadership, and your roadmap planning.
Detection engineering is one of those disciplines where the work never ends, but that doesn't mean you can't make meaningful progress quickly. Here's where I'd start:
If this framework resonates, everything I covered here goes much further in SEC555: Detection Engineering and SIEM Analytics. The course covers detection content development, SIEM tuning, coverage measurement, and how to build a detection program that actually improves over time. That's exactly the problem this post is about. If you've ever felt like your team is running on the treadmill but not gaining ground, that's where to go next.
Watch the full Secure Your Fortress presentation here: https://youtu.be/el17__sCRRE


Nick Mitropoulos is a SANS Certified Instructor and author of SEC555: Detection Engineering and SIEM Analytics. As CEO of Scarlet Dragonfly and a veteran of SOC and incident response leadership, he equips students with real-world skills in detection engineering. Nick also serves on the GIAC Advisory Board, SANS CISO Network, and faculty of the SANS Technology Institute.