What You Will Learn

Harness Data Science and AI for Advanced Cybersecurity Threat Hunting Solutions

Data Science, Artificial Intelligence, and Machine Learning aren't just the current buzzwords; they are fast becoming some of the primary tools in our information security arsenal. The problem is that, unless you have a degree in mathematics or data science, you're likely at the mercy of the vendors. This course completely demystifies machine learning and data science. More than 70% of the time in class is spent solving machine learning and data science problems hands-on rather than just talking about them. You will leave the class not only understanding how these tools and techniques work, but also understanding how to think about your data and turn it into something to which you can apply machine learning and AI techniques.

Unlike other courses in this space, this course is squarely centered on solving information security problems - in other words, applied rather than theoretical. Where other courses tend to be at the extremes, teaching almost all theory or solving trivial problems that don't translate into the real world, this course strikes a balance. While this course will cover necessary mathematics, we cover only the theory and fundamentals you absolutely must know, and only so as to allow you to understand and apply the machine learning tools and techniques effectively. We show you how the math works but don't expect you to do it. The course progressively introduces and applies various statistical, probabilistic, and mathematical tools (in their applied form), so that you leave able to use those tools and to troubleshoot your results, having developed strong intuitions about the underlying mathematics. The hands-on projects covered were selected to provide you with a broad base from which to build your own machine learning solutions. If you want or need to know how AI tools like ChatGPT really work so that you can intelligently discuss their potential uses in your organization, in addition to knowing how to build effective solutions to solve real cybersecurity problems using machine learning and AI today, this is the class you need to take. Check out the extensive course description below for a detailed rundown of course content.

NOTE: All the concepts in this course are discussed using Python examples. You should have an intermediate understanding of the Python language! There is no need to be a Python expert. If you have successfully written at least a handful of Python scripts, your Python knowledge is likely sufficient. We will review key Python data structures in class in the first section of the course. If you need assistance determining if your Python knowledge is sufficient, please contact us for more information.

This course is for cybersecurity professionals who are seeking to add machine learning, data science, and artificial intelligence skills to their repertoire. This course is also very useful for individuals with a data science background who are seeking to understand how to use cybersecurity data in meaningful ways for threat hunting, anomaly detection, and monitoring. Intermediate Python fluency is important. Pre-calculus mathematics skills are helpful, but not required.

"The course content's design is superb in my opinion. It begins by covering the fundamentals of data extraction from diverse sources using Python, followed by a dive into the basics of statistics. From there, it delves into ML models and DNNs. I appreciate the thoughtfulness behind this progression." -Viswanath Chirravuri, Thales

What Is Machine Learning?

Machine Learning is a branch of artificial intelligence that enables systems to learn and improve from experience without being explicitly programmed. It involves the development of algorithms that can analyze and make predictions or decisions based on data. This technology is fundamental in creating applications that adapt and become more accurate over time, revolutionizing industries by automating complex tasks and unlocking new insights from data.

Business Takeaways

This course will help your organization:

Generate useful visualization dashboards
Solve problems with neural networks
Improve the effectiveness, efficiency, and success of cybersecurity initiatives
Build custom machine learning solutions for your organization's specific needs
This course prepares you for the GMLE certification

Skills Learned

Apply statistical models to real world problems in meaningful ways
Generate visualizations of your data
Perform mathematics-based threat hunting on your network
Convert the data you have into representations to which ML/AI techniques can be applied
Understand and apply unsupervised learning/clustering methods
Build Deep Learning Neural Networks
Build and understand Convolutional Neural Networks
Understand how to build representative synthetic data
Understand and build Genetic Search Algorithms
Understand the fundamentals of containerized deployment

Major Topics Covered Include

Data acquisition from SQL, NoSQL document stores, web scraping, and other common sources
Data exploration and visualization
Descriptive statistics
Inferential statistics and probability
Bayesian inference
Unsupervised learning and clustering
Deep learning neural networks
Autoencoders
Anomaly detection with neural networks
Loss functions
Convolutional networks
Embedding layers
Practical containerized deployment

Hands-On Machine Learning Training

The hands-on portion of SEC595 is especially suited to students with a data science background who are seeking to understand how to use cybersecurity data in meaningful ways for threat hunting, anomaly detection, and monitoring. The course includes 30 hands-on labs, and over 70% of the class is spent solving machine learning and data science problems hands-on.

Section 1: Python Refresher; Accessing, Manipulating, and Retrieving SQL Data; Accessing, Manipulating, and Retrieving NoSQL data: MongoDB; Webscraping for data acquisition
Section 2: Statistics Fundamentals: Medians and Means; Statistics Fundamentals: Variance, Deviations, and Robust Measures; Applications of Statistics to Data Identification; Probability, Bayes, and Phishing; Threat Hunting through Signals Analysis
Section 3: K-Means/KNN; Elbow Functions and PCA; DBSCAN for Clustering; Support Vector Classifiers; Support Vector Machines; Decision Trees; Random Forests
Section 4: Polyfit Regressions; Hello, World! Sentiment Analysis; Ham vs. Spam via Deep Learning; Identifying Protocols; Protocol Anomaly Detection
Section 5: Predictive Malware Identification -- Finding Zero Days; Ham vs. Spam, CNN Style; Multi-class text classifications via CNNs; Log Anomaly Detection using Autoencoders; Real-time Network Anomalies
Section 6: Solving CAPTCHAs: POC; Solving CAPTCHAs: Functional API; Solving Algorithms

"Labs and exercises have been very helpful, going over them a second time is helping to reinforce what I've learned this week, and to put it all in better context." - Blake Hickson

"The labs gave me the opportunity to use theory that we were taught during the training and gain some hands on experience." - Vasiliki Politopoulou

"SANS SEC595 emphasizes practical, hands-on learning that allows participants to work with Python scripts and tools to automate various aspects of information security. This approach ensures that students can apply what they learn immediately in their work." - Louis Valencia, US Government

Syllabus Summary

Section 1: Data Acquisition, Cleaning, and Manipulation
Section 2: Data Exploration and Statistics
Section 3: Essentials of Machine Learning: Trees, Forests, & K-Means
Section 4: Essentials of Machine Learning: Deep Learning
Section 5: Essentials of Machine Learning: Autoencoders
Section 6: Essentials of Machine Learning: Functional Models and Deployment

Additional Free Resources

Anaconda
TensorFlow (and supporting libraries)
Matplotlib
VMware Workstation/Player/Fusion

What You Will Receive

Jupyter notebooks of all labs and complete solutions
Sample data for real-world cybersecurity problems

Syllabus (36 CPEs)

Data Acquisition, Cleaning, and Manipulation
Overview
This section introduces some of the terminology in the data science and machine learning fields, in addition to introducing a number of the technologies that are used as data sources. Since the first step in any data science or machine learning project is to acquire data, the balance of the day is focused on hands-on exercises to prepare the student for these tasks.
The first necessary skill is the use of Python, our chosen language for this course. The only course prerequisite is a fundamental understanding of Python. If you've written even one line of Python, you are probably knowledgeable enough to get started! We will cover lists, arrays, tuples, dictionaries, comprehensions and then begin introducing the numpy variants.
Following the Python refresher, the course provides some theory, followed immediately by hands-on exercises, to give you just enough knowledge of SQL, MongoDB, and webscraping to get real work done.
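As a rough illustration of the kind of SQL data acquisition described above, the sketch below uses Python's built-in sqlite3 module. The table name, columns, and events are hypothetical and are not taken from the course labs.

```python
import sqlite3

# Hypothetical authentication events: (source_ip, username, success flag).
events = [
    ("10.0.0.5", "alice", 1),
    ("10.0.0.9", "root", 0),
    ("10.0.0.9", "admin", 0),
    ("10.0.0.9", "root", 0),
    ("10.0.0.7", "bob", 0),
]

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE auth_log (source_ip TEXT, username TEXT, success INTEGER)")
conn.executemany("INSERT INTO auth_log VALUES (?, ?, ?)", events)

# Count failed logins per source address, busiest first.
query = """
    SELECT source_ip, COUNT(*) AS failures
    FROM auth_log
    WHERE success = 0
    GROUP BY source_ip
    ORDER BY failures DESC, source_ip ASC
"""
failed_by_ip = conn.execute(query).fetchall()
conn.close()

print(failed_by_ip)
```

Here failed_by_ip is a list of (source_ip, failures) tuples that can be handed to later analysis steps.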
Exercises
- Python Refresher
- Accessing, Manipulating, and Retrieving SQL data
- Accessing, Manipulating, and Retrieving NoSQL data: MongoDB
- Webscraping for data acquisition
Topics
- Data Science
- Python
- SQL
- NoSQL
- Webscraping
Data Exploration and Statistics
Overview
This section begins with the fundamentals of statistics that matter for data science and machine learning. Following this introduction and hands-on exercises that provide practical uses for these techniques against real-world data, the course transitions to probability theory.
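To make the difference between ordinary and robust measures concrete, here is a minimal sketch using Python's statistics module. The session sizes and the cutoff of 5 are invented for illustration, and the median absolute deviation (MAD) is only one of several possible robust measures.

```python
import statistics

# Hypothetical bytes-per-session values; the last one is an extreme outlier.
sessions = [510, 495, 502, 488, 515, 499, 50_000]

mean = statistics.mean(sessions)      # dragged far upward by the outlier
median = statistics.median(sessions)  # barely affected by the outlier

# Median absolute deviation (MAD): a robust measure of spread.
mad = statistics.median([abs(x - median) for x in sessions])

# Flag values that sit many MADs away from the median (cutoff is an assumption).
cutoff = 5
outliers = [x for x in sessions if abs(x - median) / mad > cutoff]

print(mean, median, mad, outliers)
```

A single extreme value pulls the mean into the thousands while the median stays at 502, which is why robust measures are useful for spotting unusual data.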
Probability theory is an extensive field of its own. Following the introduction of some fundamentals, the course works directly toward deriving Bayes' theorem. Building on this introduction, students then engage in a hands-on lab that builds a useful Bayesian analysis tool, upon which students will improve later in the course.
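Bayes' theorem can be sketched in a few lines of Python. The probabilities below are made up for illustration, and the function is not the analysis tool built in the lab; it only shows the arithmetic for a single piece of evidence, such as one word appearing in a message.

```python
def posterior(prior, p_evidence_given_h, p_evidence_given_not_h):
    """P(H | evidence) via Bayes' theorem for a yes/no hypothesis H."""
    evidence = p_evidence_given_h * prior + p_evidence_given_not_h * (1 - prior)
    return p_evidence_given_h * prior / evidence

# Hypothetical figures: 20% of mail is phishing; a given word appears in
# 60% of phishing messages and in 5% of legitimate ones.
p_phish = 0.2
p_word_given_phish = 0.6
p_word_given_ham = 0.05

p_phish_given_word = posterior(p_phish, p_word_given_phish, p_word_given_ham)
print(round(p_phish_given_word, 4))
```

With these assumed numbers the posterior works out to 0.75: seeing the word raises the estimated chance of phishing from 20% to 75%.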
The remainder of this section translates the statistical knowledge gained into the field of signals analysis. After a discussion concerning the derivation and applications of the Fourier series, the Discrete Fourier Transform, and the Fast Fourier Transform, students use these tools in a real-world threat hunting activity.
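One way signals analysis can apply to threat hunting is finding periodic "beaconing" in traffic counts. The sketch below assumes NumPy is installed and uses synthetic per-second connection counts; the 16-second period and 4-second burst length are invented, and this is not the course's lab data or method.

```python
import numpy as np

# Hypothetical per-second connection counts over 256 seconds: a constant
# background plus a 4-second burst that repeats every 16 seconds.
n_seconds = 256
beacon_period = 16
counts = np.full(n_seconds, 2.0)
for start in range(0, n_seconds, beacon_period):
    counts[start:start + 4] += 5.0

# Subtract the mean so the zero-frequency term does not dominate the spectrum.
spectrum = np.abs(np.fft.rfft(counts - counts.mean()))
freqs = np.fft.rfftfreq(n_seconds, d=1.0)  # cycles per second

# The strongest frequency bin reveals the repetition interval.
peak_bin = int(np.argmax(spectrum))
estimated_period = 1.0 / freqs[peak_bin]

print(peak_bin, estimated_period)
```

The strongest spectral peak corresponds to the 16-second repetition, even though no single burst looks unusual on its own.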
Exercises
- Statistics Fundamentals: Medians and Means
- Statistics Fundamentals: Variance, Deviations, and Robust Measures
- Applications of Statistics to Data Identification
- Probability, Bayes, and Phishing
- Threat Hunting through Signals Analysis
Topics
- Statistics
- Robust Measures
- Probability
- Bayes' Theorem and Inference
- Fourier Series and Related Derivations
Essentials of Machine Learning: Trees, Forests, & K-Means
Overview
The remaining 18+ contact hours of this course are spent learning about and immediately applying various machine learning models. After each topic is introduced and discussed, students engage in lengthy hands-on labs to develop an intuitive understanding and apply the technique to real problems.
The section begins with various clustering approaches and unsupervised machine learning. The exploration begins with Support Vector Classifiers, kernel functions, and Support Vector Machines. Following this discussion and exercises, we continue the clustering theme by considering the K-Means and KNN approaches. After working through examples in just two or three dimensions, we turn our attention to methods for determining the ideal number of clusters. With this done, we finally explore high-dimensional applications and dimensionality reduction through Principal Component Analysis. The DBSCAN algorithm is covered in some depth, with application made to threat hunting and efficient SOC analysis of large-scale data.
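For readers who want a feel for K-Means before class, the following is a bare-bones sketch of Lloyd's algorithm in pure Python. The points and starting centroids are hypothetical, and real work would normally use a library implementation rather than this hand-rolled version.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: assign points to the nearest centroid, then re-average."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(point, centroids[i]))
            clusters[nearest].append(point)
        # Move each centroid to the mean of its members (unchanged if empty).
        centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else old
            for members, old in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical 2-D observations forming two obvious groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, centroids=[(0, 0), (10, 10)])
print(centroids)
print(clusters)
```

The number of clusters is fixed here by the two starting centroids; choosing that number well is the job of the elbow-style methods mentioned above.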
The balance of this section is spent discussing Decision Trees. After a hands-on activity and discussion of the limitations of Decision Trees, we expand into Random Forests and explore hands-on how these provide better inferences in most cases. The section wraps up with a cluster-based approach to finding anomalies in user activity on a network.
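A decision tree is grown by repeatedly choosing the split that best separates the classes. The sketch below scores candidate thresholds with Gini impurity, which is one common criterion and not necessarily the one used in the course; the feature values and labels are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0.0 for a pure group, higher for mixed groups."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())

def split_impurity(rows, threshold):
    """Weighted Gini impurity after splitting (value, label) rows at a threshold."""
    left = [label for value, label in rows if value <= threshold]
    right = [label for value, label in rows if value > threshold]
    n = len(rows)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# Hypothetical (failed_logins, label) observations.
rows = [(0, "benign"), (1, "benign"), (2, "benign"),
        (7, "malicious"), (9, "malicious"), (3, "benign")]

# Try each observed value as a threshold and keep the purest split.
candidates = sorted({value for value, _ in rows})
best_threshold = min(candidates, key=lambda t: split_impurity(rows, t))
print(best_threshold, split_impurity(rows, best_threshold))
```

A Random Forest repeats this kind of split search across many trees, each built from a random sample of the rows and features, and combines their votes.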
Exercises
- K-Means / KNN
- Elbow Functions and PCA
- DBSCAN for Clustering
- Support Vector Classifiers
- Support Vector Machines
- Decision Trees
- Random Forests
Topics
- Support Vector Classifiers
- Support Vector Machines
- Kernel Functions
- Principal Component Analysis
- DBSCAN
- K-Means
- KNN
- Elbow Functions
- Decision Trees
- Random Forests
- Anomaly Detection
Essentials of Machine Learning: Deep Learning
Overview
The entire focus of this section is on the theory, development, and use of supervised learning approaches in the field of information security. Building on the mathematics and statistics covered in section 2, this section begins with linear regressions and ends with the application of deep learning neural networks to multi-class classification problems involving real-time network data.
The material is focused on using supervised machine learning and mathematics to create predictive models. The initial discussion and exercises center around forecasting and trends analysis for anomaly detection. Following this, the majority of the material focuses on classification problems.
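The idea of forecasting a trend and flagging departures from it can be sketched with NumPy's polyfit. The traffic figures and the 10 GB tolerance below are invented for illustration, and NumPy is assumed to be installed.

```python
import numpy as np

# Hypothetical daily outbound traffic (GB): an upward trend with a small wobble.
days = np.arange(10)
wobble = np.array([0.5, -0.5] * 5)
traffic_gb = 2.0 * days + 5.0 + wobble

# Fit a first-degree polynomial (a straight line) to the history.
slope, intercept = np.polyfit(days, traffic_gb, deg=1)

def forecast(day):
    """Predicted traffic for a given day according to the fitted line."""
    return slope * day + intercept

# Compare a new observation against the forecast for that day.
observed_day, observed_gb = 10, 60.0
residual = observed_gb - forecast(observed_day)
is_anomalous = abs(residual) > 10.0  # tolerance in GB is an assumption

print(round(float(slope), 3), round(float(residual), 1), bool(is_anomalous))
```

Higher-degree fits follow the same pattern by raising deg, at the cost of a greater risk of fitting noise rather than the trend.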
Building on the Bayes approach used in section 2, this section introduces deep learning neural networks and fully connected dense networks through the development of a far more accurate phishing detection network. Following this, the course explores visualization and measurement of neural network training performance, in addition to discussing overfitting, overtraining, and how to identify (and avoid!) them.
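Dense networks are built from many simple units, and the perceptron is the simplest of them. The sketch below trains a single perceptron in pure Python on a tiny, hypothetical two-feature dataset; it is far simpler than the phishing network described above and is meant only to show the update rule.

```python
def predict(weights, bias, x):
    """Step activation: output 1 when the weighted sum is positive."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0

def train_perceptron(samples, labels, epochs=20, learning_rate=1.0):
    """Classic perceptron rule: nudge the weights by the error on each sample."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in zip(samples, labels):
            error = target - predict(weights, bias, x)
            weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
            bias += learning_rate * error
    return weights, bias

# Hypothetical binary features: (contains_link, urgent_language).
# In this toy set a message is labeled phishing only when both are present.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]

weights, bias = train_perceptron(samples, labels)
predictions = [predict(weights, bias, x) for x in samples]
print(weights, bias, predictions)
```

A single perceptron can only separate classes with a straight line, which is one motivation for stacking units into deeper networks.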
The next portion of this section turns to categorical problems, during which students will build a real-time network protocol classification system. More importantly, students will implement anomaly detection in this classification system, a task typically reserved for unsupervised approaches.
Exercises
- Polyfit Regressions
- Hello, World! Sentiment Analysis
- Ham vs. Spam via Deep Learning
- Identifying Protocols
- Protocol Anomaly Detection
Topics
- Regression and fitting
- Loss and Error functions
- Vectors, Matrices, and Tensors
- Fundamentals of the Perceptron
- Dense Networks
Essentials of Machine Learning: Autoencoders
Overview
This section of the course is dedicated to expanding students' knowledge of deep learning solutions. The first half of the section is focused entirely on convolutional networks (CNNs). The class explores the application of CNNs to text classification problems, but also to predictive identification of zero-day malware.
The second half of this section of the course focuses on autoencoders. The class examines what autoencoders do, why they work, how to select a latent representation, and how reconstruction loss functions work. This knowledge is then applied to creating an automatic log anomaly detection solution that does not use any signatures or human intervention to identify anomalies. Building on this, students work on the building blocks for a large-scale ensemble autoencoder for detecting network threats.
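The reconstruction-loss idea can be illustrated without a neural network. The sketch below swaps in a linear projection (principal component analysis via NumPy's SVD) as a stand-in for the encoder and decoder; it is not an autoencoder and not the course's solution, and the data and threshold rule are hypothetical.

```python
import numpy as np

# Hypothetical "normal" training vectors that lie close to a straight line.
train = np.array([[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.0], [5.0, 10.1]])
mean = train.mean(axis=0)

# Linear stand-in for encoder/decoder: keep only the first principal direction.
_, _, vt = np.linalg.svd(train - mean, full_matrices=False)
component = vt[:1]  # shape (1, 2): a one-dimensional latent space

def reconstruction_error(x):
    """Mean squared error between x and its compress-then-rebuild version."""
    centered = np.asarray(x, dtype=float) - mean
    latent = centered @ component.T  # "encode" into the latent space
    rebuilt = latent @ component     # "decode" back to the input space
    return float(np.mean((centered - rebuilt) ** 2))

# Hypothetical rule: three times the worst error seen on normal data.
threshold = 3 * max(reconstruction_error(row) for row in train)

normal_like = reconstruction_error([2.5, 5.0])
unusual = reconstruction_error([2.5, 9.0])
print(normal_like < threshold, unusual > threshold)
```

Inputs that resemble the training data rebuild almost perfectly, while inputs that break the learned pattern produce a large error; no signature is involved.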
Exercises
- Predictive Malware Identification - Finding Zero Days
- Ham vs. Spam, CNN Style
- Multi-class text classification via CNNs
- Log Anomaly Detection using Autoencoders
- Real-time Network Anomalies
Topics
- Convolutional Neural Networks
- Embedding Layers
- Applying CNNs to text problems
- Autoencoders
- Reconstruction loss measurements
- Creating ensemble autoencoders
Essentials of Machine Learning: Functional Models and Deployment
Overview
The final section of this course continues discussing Convolutional Neural Networks and the application of CNNs and fully connected networks for solving regression problems. The major focus of this section is on the creation of a deep neural network using TensorFlow's functional pattern, allowing you to build networks with complex structures, multiple inputs, and multiple outputs. The main task used to learn about these techniques will be using neural networks for both testing the quality of and solving CAPTCHAs. Whether you are on a red, blue, or purple team, you will learn how to think through and use machine learning to solve what amounts to a computer vision problem and to solve it at greater than 95% accuracy! Along the way you will also learn the key concepts behind the creation of representative synthetic data, how to build synthetic data with generators, and how things can go wrong. You will also learn how to make use of data augmentation layers.
Following this project, the class covers the use of genetic techniques for hyperparameter optimization. Students are provided with a starting point for genetic optimization for use on their own after class.
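A genetic search can be sketched in pure Python. Everything below is hypothetical: the fitness function is a made-up stand-in for a real validation score, and the two hyperparameters (learning rate and layer width) and their ranges are chosen only for illustration. It is not the starting point provided in class.

```python
import math
import random

def fitness(params):
    """Made-up score that peaks at learning_rate=0.01 and units=64."""
    learning_rate, units = params
    return -(math.log10(learning_rate) + 2) ** 2 - ((units - 64) / 64) ** 2

def mutate(params, rng):
    """Randomly perturb both hyperparameters, clamped to their allowed ranges."""
    learning_rate, units = params
    learning_rate = min(max(learning_rate * rng.uniform(0.5, 1.5), 1e-4), 1.0)
    units = min(max(units + rng.randint(-8, 8), 1), 256)
    return (learning_rate, units)

def crossover(a, b, rng):
    """Child takes each hyperparameter from one parent at random."""
    return (rng.choice([a[0], b[0]]), rng.choice([a[1], b[1]]))

def evolve(generations=30, population_size=20, seed=0):
    rng = random.Random(seed)
    population = [(rng.uniform(1e-4, 1.0), rng.randint(1, 256))
                  for _ in range(population_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[: population_size // 4]  # the best quarter survives
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(population_size - len(parents))
        ]
        population = parents + children
    return max(population, key=fitness), history

best, history = evolve()
print(best, fitness(best))
```

Because the best quarter of each generation is carried over unchanged, the best score never gets worse from one generation to the next. In practice the fitness function would train and evaluate a model, which makes each evaluation expensive.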
The final discussion and demonstration in the course cover practical deployment approaches, including stand-alone deployments for real-time critical applications and, for less time-critical applications, the more common containerized approaches that can be used with Docker, Rancher, or Kubernetes.
Exercises
- Solving CAPTCHAs: POC
- Solving CAPTCHAs: Functional API
- Solving CAPTCHAs: Split model
Topics
- Convolutional Neural Networks and Regressions
- Functional definition of Neural Networks
- Deep Learning Networks with Multiple Outputs
- Thinking about Machine Learning Problems
- Genetic Algorithms
- Deployment using Containers

GIAC Machine Learning Engineer

The GIAC Machine Learning Engineer (GMLE) certification validates a practitioner’s knowledge of practical data science, statistics, probability, and machine learning. GMLE certification holders have demonstrated that they are qualified to solve real-world cyber security problems using Machine Learning.

Anomaly detection and optimization
Convolutional neural networks
Data acquisition
Data exploration and visualization
Data manipulation and analysis
Deep learning neural networks
Inferential statistics and probability
Loss functions
Probability and inference
Python scripting
Supervised and unsupervised learning

Prerequisites

All the concepts in this course are discussed using Python examples. You should have an intermediate understanding of the Python language! There is no need to be a Python expert. If you have successfully written at least a handful of Python scripts, your Python knowledge is likely sufficient. We will review key Python data structures in class in the first section of the course. If you are up for the challenge, this course is for you!

Laptop Requirements

Important! Bring your own system configured according to these instructions!

A properly configured system is required to fully participate in this course. If you do not carefully read and follow these instructions, you will likely leave the class unsatisfied because you will not be able to participate in hands-on exercises that are essential to this course. Therefore, we strongly urge you to arrive with a system meeting all the requirements specified for the course.

It is critical that you back up your system before class. It is also strongly advised that you do not bring a system storing any sensitive data. Your system should meet these requirements:

Modern 64-bit processor (ARM/AMD/Intel) running Linux (Ubuntu or similar recommended, Linux kernel version 6 or higher), Windows 10 or later, or macOS 11.x or later
A minimum of 16 GB RAM
80 GB Free Hard Drive Space
Your account must have the necessary rights to install Anaconda or Anaconda must be preinstalled.

Your course media will be delivered via download. The media file for class is large, more than 50GB. You need to allow plenty of time for the download to complete. Internet connections and speed vary greatly and are dependent on many different factors. Therefore, it is not possible to give an estimate of the length of time it will take to download your materials. Please start your course media downloads as soon as you get the link. You will need your course media immediately on the first day of class. Waiting until the night before the class starts to begin your download has a high probability of failure.

SANS has begun providing printed materials in PDF form. Additionally, certain classes are using an electronic workbook in addition to the PDFs. The number of classes using eWorkbooks will grow quickly. In this new environment, we have found that a second monitor and/or a tablet device can be useful by keeping the class materials visible while the instructor is presenting or while you are working on lab exercises.

If you have additional questions about the laptop specifications, please contact support.

Author Statement

"AI and Machine Learning are everywhere. How do the vendor solutions work? Is this really black magic? I wrote this course to fill an enormous knowledge gap in our field. I believe that if you are going to use a tool, you should understand how that tool works. If you don't, you don't really know what the results mean or why you are getting them. This course provides you a crash-course in statistics, mathematics, Python, and machine learning, taking you from zero to...I'm reluctant to promise 'Hero...' Let's say competent-person-who-can-solve-real-problems-today!"

- David Hoelzer

"I can think of no one else who could explain the material better. His deep understanding of the technology and his ability to present it in such a way that allowed those not as proficient to understand it was great." - Thomas L, US Military

Ways to Learn

OnDemand

Cybersecurity learning – at YOUR pace! OnDemand provides unlimited access to your training wherever, whenever. All labs, exercises, and live support from SANS subject matter experts included.

Live Online

The full SANS experience live at home! Get the ultimate in virtual, interactive SANS courses with leading SANS instructors via live stream. Following class, plan to kick back and enjoy a keynote from the couch.

In Person (6 days)

Did someone say ALL-ACCESS? On-site immersion via in-classroom course sessions led by world-class SANS instructors fill your day, while bonus receptions and workshops fill your evenings.

Who Should Attend SEC595?

Infosec professionals who want to understand machine learning
Professionals desiring to apply data science principles to real-world problems
Anyone who has tried to learn the basics but can't figure out how to translate your problem into something that can be solved with machine learning
Blue team and SOC members looking to identify anomalies and perform custom threat hunting

NICE Framework Work Roles:

Data Analyst (OPM 422)

"AI/ML for cybersecurity is poorly understood and misrepresented too often. This course provides that balance between what management needs to know in order to grow understanding of the technologies and hands-on experience." - Thomas L, US Military

"Automation is a critical skill in the field of cybersecurity. SANS SEC595 addresses this need by focusing on using Python to automate security tasks, making it highly relevant to the industry's demands." - Louis Valencia, US Government

Reviews

I really like that this is pulling from experience rather than a textbook. The added anecdotes about the history behind various topics really helped pull it together for me.

Brian Morris

City of Austin

This course covers a wide breadth with great depth. I am excited to apply everything after the course.

Denise Berger

MITRE

SEC595: Applied Data Science and AI/Machine Learning for Cybersecurity Professionals™

GIAC Machine Learning Engineer (GMLE)

Course Authors:

David Hoelzer
