A Call to Arms for Intrusion Detection Software Innovation
For more than a generation, Moore's Law has guided strategic planning in computer hardware and software development, and the security industry is no exception. However, a cataclysmic shift in how that law manifests is looming, one that requires the focus and attention of our vendors, lest our network analysis be left in the digital dust.
Network analysis is hard. Be it the real-time analysis expected of IPS devices, or the cached analysis which is badly needed but never provided by our vendors, our ability to detect hostility is constrained by four fundamental factors: what we look for, how we look for it, the amount of data we must sift through to find it, and the computational power available to execute those detections. It is the interdependence of the last two factors that stands to most immediately and severely impact our ability to analyze network traffic.
I'll start with the first two factors. There is a necessary trade-off between what we look for and how we look for it: the more expressive our definitions of hostility become, the more difficult optimization becomes. Take, for example, the Snort ruleset. The Snort engine can be optimized around the legacy rule language because its set of options is limited, but shared object rules open the full capabilities of C, which complicates optimization. There is likely still room to optimize how the engine executes what the user asks it to look for, but such gains will most likely be marginal.
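To make the trade-off concrete, here is a minimal Python sketch of the idea (this is my illustration, not Snort's actual implementation; the rule names and patterns are hypothetical). Because the constrained rules are purely declarative, the engine can fold all of their content patterns into a single prefilter pass; the arbitrary-code rules are opaque, so each one must simply be called.

```python
import re

# Hypothetical "legacy" rules: each is just a content pattern plus an ID.
# A constrained language lets the engine combine every pattern into one
# prefilter (a combined regex here; a real engine might use Aho-Corasick).
LEGACY_RULES = {
    "sid:1001": b"cmd.exe",
    "sid:1002": b"/etc/passwd",
    "sid:1003": b"\x90\x90\x90\x90",  # NOP-sled fragment
}
PREFILTER = re.compile(b"|".join(re.escape(p) for p in LEGACY_RULES.values()))

# Hypothetical "shared object" style rule: arbitrary code. The engine
# cannot see inside it, so there is no shortcut; it must run as-is.
def so_rule_2001(payload):
    # arbitrary logic: flag payloads that are mostly high-bit bytes
    return len(payload) > 64 and sum(b > 0x7f for b in payload) > len(payload) // 2

SO_RULES = [so_rule_2001]

def inspect(payload):
    alerts = []
    if PREFILTER.search(payload):            # one optimized pass over the payload
        for sid, pattern in LEGACY_RULES.items():
            if pattern in payload:
                alerts.append(sid)
    for rule in SO_RULES:                    # opaque code: invoked one by one
        if rule(payload):
            alerts.append(rule.__name__)
    return alerts

print(inspect(b"GET /scripts/cmd.exe HTTP/1.0"))
```

The engine can reason about, reorder, and merge the declarative rules; it can do none of that for the compiled ones, which is exactly why expressiveness and optimization pull in opposite directions.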
We already make painful sacrifices in what we look for and how we look for it. We don't decode MIME attachments in email. We don't inflate gzip-compressed HTTP responses. We don't analyze entire data streams. Adversaries have exploited these shortcuts, as I have seen first-hand: email with a trojanized attachment is the primary delivery mechanism of sophisticated adversaries, and ostensibly benign web pages carry malicious JavaScript or command-and-control instructions placed at the end of the HTML, all of which evade detection because of these analytical compromises. Moving from a wire-speed to a cached/delayed analysis paradigm may buy us enough time to win back some of these cut corners, but we'd still be hanging on by our fingernails. The bottom line is that we've already sacrificed more than we can afford in lost analysis.
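For a sense of what those cut corners look like in practice, here is a minimal Python sketch, using only the standard library, of the normalization that full analysis would require: decoding MIME transfer encodings before scanning attachments, and undoing gzip/deflate Content-Encoding before scanning HTTP bodies. The signature list is a hypothetical placeholder, not a real detection set.

```python
import email
import gzip
import zlib

SIGNATURES = [b"MZ", b"%PDF-1"]  # hypothetical markers to look for post-decode

def scan_email(raw_message):
    """Walk MIME parts, undo base64/quoted-printable transfer encodings,
    and scan the decoded attachment bytes rather than the wire bytes."""
    hits = []
    msg = email.message_from_bytes(raw_message)
    for part in msg.walk():
        if part.is_multipart():
            continue
        payload = part.get_payload(decode=True) or b""   # decoded attachment
        for sig in SIGNATURES:
            if sig in payload:
                hits.append(f"{part.get_filename() or 'inline'}: {sig!r}")
    return hits

def inflate_http_body(body, content_encoding):
    """Undo gzip/deflate Content-Encoding so signatures match real content."""
    if content_encoding == "gzip":
        return gzip.decompress(body)
    if content_encoding == "deflate":
        return zlib.decompress(body)
    return body
```

None of this is exotic; the cost is simply that every decode step multiplies the cycles spent per byte of traffic, which is precisely the budget we no longer have at wire speed.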
I think I've made my point that we have pushed our existing methods to their limits, and any perturbation in this delicately balanced equation could spell disaster for us. This brings us to computational power and data volume, which translates in this case to network bandwidth. What I intend to show here is that we are on the cusp of an unprecedented divergence in the interplay of these parameters, one that can only be solved by a fundamental redesign of the code that underpins our tools.
I was unable to find an authoritative or rigorously peer-reviewed source to cite for either maximum CPU speed or aggregate internet bandwidth over time. What I've compiled are ballpark numbers derived from multiple sources that seemed to be in approximate agreement. These are provided in Figure 1.
Figure 1: CPU clock speed and aggregate internet bandwidth over time
As you can see, the maximum available CPU clock speed in MHz has leveled off, while aggregate internet bandwidth usage is growing geometrically and will likely continue to do so (derived from [1][2][3][4]). The CPU clock speed numbers are roughly reinforced by an article in a recent issue of Communications of the ACM [5], and the resulting need for parallel programming has been echoed by a number of academics and practitioners in computer engineering and science [5][6][7], some of them quite directly and poignantly [8]. On the question of clock speeds specifically, one author states flat out that "microprocessor clock speeds are no longer increasing with each new generation of chip fabrication" [9, p. 26].
My conclusion rests on an underlying assumption that most organizations that use IDS are experiencing growth similar to that of the internet as a whole. While this may not be universally true, the ubiquity of the factors driving this bandwidth consumption (rich content, Web 2.0 applications, streaming media, etc.) supports the assumption. I acknowledge that this is not a rigorous scientific study, but I feel it convincingly makes my point: the delicate balance of processing cycles available to analyze network activity has been tipped, and parallelism is the only long-term way to push the equation back into balance. Parallelism is by no means the only pressing need for IDS innovation, but it is certainly the one that impacts the broadest segment of the IDS customer space.
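To put rough numbers on that imbalance, here is a back-of-the-envelope sketch in Python. The growth rate, starting load, and per-core throughput below are illustrative assumptions of mine, not figures taken from the cited sources; the point is only that compounding traffic growth against flat per-core performance forces the core count upward year after year.

```python
# Back-of-the-envelope sketch of the divergence argued above.
# All three constants are assumptions for illustration only.
ANNUAL_TRAFFIC_GROWTH = 0.40   # assumed ~40% compound annual growth
PER_CORE_GBPS = 1.0            # assumed (flat) single-core inspection rate
START_GBPS = 1.0               # assumed monitored link load today

for year in range(0, 9, 2):
    load = START_GBPS * (1 + ANNUAL_TRAFFIC_GROWTH) ** year
    cores = max(1, round(load / PER_CORE_GBPS))
    print(f"year {year}: ~{load:.1f} Gb/s to inspect -> ~{cores} core(s) "
          "if per-core throughput stays flat")
```

Under these assumptions the required core count climbs from one to roughly fifteen within eight years; whatever the real numbers are for a given network, the shape of the curve is the problem.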
I was pleased to read that a new IDS, Suricata, has recently been released. According to its authors, Suricata is a multithreaded application. I have not delved into the details and cannot comment on its parallelism, but GPU support as a future feature is promising. If this is a sincere development intention, it appears the authors well understand the point of this post and the corresponding state of the art in solving parallel computing problems. GPUs are emerging as a good commodity solution for parallel processing. This is covered in depth by a number of recent publications on parallelism, and I am by no means an expert in this field, so I will simply leave follow-up on this point as an exercise for the reader.
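For readers who want a feel for what parallel stream analysis involves, here is a minimal Python sketch of one common approach (my illustration only, not a description of Suricata's architecture): hash each packet's 5-tuple so that both directions of a flow land on the same worker, keeping per-flow reassembly state local to that worker and avoiding cross-thread locking.

```python
import hashlib
from multiprocessing import Process, Queue

NUM_WORKERS = 4  # assumed; in practice sized to the available cores

def flow_key(src_ip, src_port, dst_ip, dst_port, proto):
    """Order-independent 5-tuple so both directions of a flow hash alike."""
    a, b = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    return f"{proto}:{a[0]}:{a[1]}:{b[0]}:{b[1]}"

def pick_worker(key):
    return hashlib.sha1(key.encode()).digest()[0] % NUM_WORKERS

def worker(q):
    flow_state = {}                      # per-flow state stays local to this worker
    while True:
        item = q.get()
        if item is None:                 # sentinel: shut down
            break
        key, payload = item
        flow_state.setdefault(key, bytearray()).extend(payload)
        # ... run detection logic over the reassembled flow here ...

if __name__ == "__main__":
    queues = [Queue() for _ in range(NUM_WORKERS)]
    procs = [Process(target=worker, args=(q,)) for q in queues]
    for p in procs:
        p.start()

    # Hypothetical capture loop: dispatch each packet by its flow hash.
    packets = [("10.0.0.1", 12345, "10.0.0.2", 80, "tcp", b"GET / HTTP/1.1\r\n")]
    for src, sport, dst, dport, proto, payload in packets:
        key = flow_key(src, sport, dst, dport, proto)
        queues[pick_worker(key)].put((key, payload))

    for q in queues:
        q.put(None)
    for p in procs:
        p.join()
```

The key design choice is that flow state never crosses workers, so adding cores scales inspection roughly linearly, at least until a single heavy flow dominates the load.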
Update:
One key fact supporting my argument that I failed to mention is that Moore's Law is now being realized as an increase in cores per die rather than in clock speed. The engineering innovations that permitted regular clock speed increases for so long have reached a local, possibly global, maximum in efficacy. Where we previously saw default increases in our applications' performance with each new generation of chip, we now have to work for such improvements by designing our code to be truly parallel, so that it continues to exploit this new manifestation of Moore's Law.
This is true for many application domains, and as I illustrate above, it is an immediate need in ours if we are to keep up with ever-present bandwidth growth.
Sources:
- [1] Odlyzko, Andrew. Internet Growth: Myth and Reality, Use and Abuse. http://tlsoft.net/vault/wireless/internet.growth.myth.html, 2000.
- [2] TeleGeography Research. Global Internet Geography Executive Summary. http://som1.csudh.edu/fac/lpress/471/hout/telegeographygig_execsumm.pdf, 2008.
- [3] Cisco, Inc. Cisco Visual Networking Index: Forecast and Methodology, 2008-2013. http://www.cisco.com/en/US/solutions/collateral/ns341/ns525/ns537/ns705/ns827/white_paper_c11-481360.pdf, 2008.
- [4] Intel Corp. Various documents and press releases. http://www.intel.com/
- [5] Asanovic, Krste, et al. A View of the Parallel Computing Landscape. Communications of the ACM, vol. 52, no. 10, pp. 55-67, 10/2009.
- [6] Creeger, Mache. CTO Roundtable: Cloud Computing. Communications of the ACM, vol. 52, no. 8, pp. 50-56, 8/2009.
- [7] Goth, Gregory. Entering a Parallel Universe. Communications of the ACM, vol. 52, no. 9, 9/2009.
- [8] Ghuloum, Anwar. Face the Inevitable, Embrace Parallelism. Communications of the ACM, vol. 52, no. 9, pp. 36-38, 9/2009.
- [9] Wehner, Michael, Oliker, Lenny, and Shalf, John. Low-Power Supercomputers. IEEE Spectrum, vol. 46, no. 10, 10/2009.
Michael is a senior member of Lockheed Martin's Computer Incident Response Team. He has lectured for various audiences including SANS, IEEE, and the annual DC3 CyberCrime Convention, and teaches an introductory class on cryptography. His current work consists of security intelligence analysis and development of new tools and techniques for incident response. Michael holds a BS in computer engineering, an MS in computer science, has earned GCIA (#592) and GCFA (#711) gold certifications alongside various others, and is a professional member of ACM and IEEE.