Reading Room

Artificial Intelligence

Featuring 2 Papers as of March 10, 2021

  • Malware Detection in Encrypted TLS Traffic Through Machine Learning (Graduate Student Research)
    by Bryan Scarbrough - March 10, 2021 

    The proliferation of TLS across the Internet makes the environment safer for end users but more opaque for network defenders. This research demonstrates what can be learned by applying machine learning to TLS traffic without decrypting it. It takes a novel approach to TLS analysis, combining data available in the unencrypted portion of the handshake with open-source intelligence (OSINT) data about Internet Protocol (IP) addresses and domain names. The resulting metadata is then analyzed with three machine learning algorithms: a Support Vector Machine (SVM), a One-Class SVM (OC-SVM), and an autoencoder neural network. This research also addresses the imbalanced distribution of malicious and benign traffic by using the OC-SVM and the autoencoder. Finally, this research demonstrates that, when the correct header data are used, the SVM and OC-SVM classify malware with an F2 score above 99%, and the autoencoder with an F2 score of approximately 95%.

  • Times Change and Your Training Data Should Too: The Effect of Training Data Recency on Twitter Classifiers (Graduate Student Research)
    by Ryan O'Grady - July 11, 2018 

    Sophisticated adversaries are moving their botnet command-and-control infrastructure to social media microblogging sites such as Twitter. As security practitioners work to identify new methods for detecting and disrupting such botnets, including machine-learning approaches, we must better understand how training data recency affects classifier performance. This research investigates how well several binary classifiers distinguish between non-verified and verified tweets as the offset between the age of the training data and that of the test data changes. Classifiers were trained on three feature sets: tweet-only features, user-only features, and all features. Key findings show that classifiers perform best at +0 offset, that feature importance changes over time, and that more features are not necessarily better. Classifiers using user-only features performed best, with a mean Matthews correlation coefficient (MCC) of 0.95 ± 0.04 at +0 offset, 0.58 ± 0.43 at −8 offset, and 0.51 ± 0.21 at +8 offset. The corresponding R² values are 0.90, 0.34, and 0.26. Thus, the classifiers tested at +0 offset accounted for 56 to 64 percentage points more of the variance than those tested at −8 and +8 offset. These results suggest that classifier performance is sensitive to the recency of the training data relative to the test data. Further research is needed to replicate this experiment with botnet vs. non-botnet tweets to determine whether similar classifier performance is possible and how sensitive that performance is to training data recency.
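The TLS malware paper reports its results as F2 scores. F2 is the F-beta score with beta = 2, which weights recall more heavily than precision, so a missed malicious flow (false negative) lowers the score more than a false alarm (false positive). The sketch below uses the standard F-beta formula; the function names and the confusion counts in the usage lines are hypothetical and are not taken from the paper.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """F-beta score from confusion-matrix counts (beta=2 gives the F2 score).

    Equivalent form: (1 + b^2) * TP / ((1 + b^2) * TP + b^2 * FN + FP).
    """
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    if denom == 0:
        return 0.0  # nothing predicted and nothing to find; treat score as 0
    return (1 + b2) * tp / denom


def f_beta_from_pr(precision: float, recall: float, beta: float = 2.0) -> float:
    """Same score computed from precision and recall instead of raw counts."""
    b2 = beta * beta
    if b2 * precision + recall == 0:
        return 0.0
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Hypothetical counts: swapping FP and FN shows that F2 penalizes misses more.
    print(round(f_beta(tp=95, fp=10, fn=5), 4))  # 0.9406
    print(round(f_beta(tp=95, fp=5, fn=10), 4))  # 0.9135
```

Both runs have the same number of errors, but the second, with more false negatives, scores lower, which is the reason an F2 metric suits a detector where missing malware is costlier than raising an extra alert.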
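The Twitter classifier paper reports performance as the Matthews correlation coefficient (MCC) and compares offsets by R². The sketch below computes MCC with the standard formula from binary confusion-matrix counts and reproduces the 56- and 64-point differences from the R² values given in the abstract. The function name and the confusion counts in the usage line are hypothetical; only the three R² values come from the paper.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Ranges from -1 (total disagreement) through 0 (chance level) to +1 (perfect).
    """
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if den == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return num / den


# R² values for the user-only classifiers, as reported in the abstract.
r2 = {"+0": 0.90, "-8": 0.34, "+8": 0.26}

# Extra variance explained at +0 offset, in percentage points.
gaps = {k: round((r2["+0"] - v) * 100) for k, v in r2.items() if k != "+0"}

if __name__ == "__main__":
    print(round(mcc(tp=90, tn=80, fp=20, fn=10), 3))  # 0.704 (hypothetical counts)
    print(gaps)  # {'-8': 56, '+8': 64}
```

Because R² is itself a fraction of variance, the gap between 0.90 and 0.34 is 56 percentage points of variance, not a 56% relative increase.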

Most of the computer security white papers in the Reading Room have been written by students seeking GIAC certification to fulfill part of their certification requirements and are provided by SANS as a resource to benefit the security community at large. SANS attempts to ensure the accuracy of information, but papers are published "as is". Errors or inconsistencies may exist or may be introduced over time as material becomes dated. If you suspect a serious error, please contact

All papers are copyrighted. No re-posting or distribution of papers is permitted.

Graduate Student Research: papers with this label were created by SANS Technology Institute students as part of the graduate program curriculum.