October 24, 2000
In today’s world, where nearly every company depends on the Internet to survive, it is not surprising that the role of network intrusion detection has grown so rapidly. While there may still be some argument as to the best way to protect a company’s networks (firewalls, patches, intrusion detection, training, …), it is likely that the intrusion detection system (IDS) will maintain an important role in providing a secure network architecture.
That being said, what does current intrusion detection technology actually provide? For the analyst who sits down in front of an IDS, the ideal system would identify all intrusions (or attempted intrusions) and take or recommend the actions necessary to stop an attack.
Unfortunately, the marketplace for IDS is still quite young, and a "silver bullet" solution that detects all attacks does not appear to be on the horizon, nor is one necessarily even plausible. So what is the "next step", or rather the "next phase", for intrusion detection? A strong case could be made for using data mining techniques to improve the current state of intrusion detection.
In "Data Mining: Challenges and Opportunities for Data Mining During the Next Decade", R.L. Grossman defines data mining as being "concerned with uncovering patterns, associations, changes, anomalies, and statistically significant structures and events in data." Simply put, it is the ability to take data and pull from it patterns or deviations that may not be easily seen with the naked eye. Another term sometimes used for it is knowledge discovery.
While they will not be discussed in detail in this report, there are many different types of data mining algorithms, including link analysis, clustering, association, rule abduction, deviation analysis, and sequence analysis.
In order to determine how data mining can help advance intrusion detection, it is important to understand how current IDS identify an intrusion. There are two different approaches to intrusion detection: misuse detection and anomaly detection. Misuse detection identifies intrusions based on known patterns of malicious activity. These known patterns are referred to as signatures. The second approach, anomaly detection, attempts to identify malicious traffic based on deviations from established patterns of normal network traffic. Most, if not all, IDS that can be purchased today are based on misuse detection. Current IDS products come with a large set of signatures, each identified as unique to a particular vulnerability or exploit. Most IDS vendors also provide regular signature updates in an attempt to keep pace with the rapid appearance of new vulnerabilities and exploits.
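To make the misuse-detection idea concrete, here is a minimal Python sketch; it is not how any particular IDS product works. The signature names and regular expressions are hypothetical examples, and real IDS signature languages also match on protocol fields, ports, and packet offsets rather than on a single string.

```python
import re
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Signature:
    """A known pattern assumed to be unique to one exploit (hypothetical examples below)."""
    name: str
    pattern: re.Pattern


# Hypothetical signature set; a vendor would ship and regularly update a much larger one.
SIGNATURES = (
    Signature("directory-traversal", re.compile(r"\.\./\.\./")),
    Signature("cmd-exe-probe", re.compile(r"cmd\.exe", re.IGNORECASE)),
)


def match_signatures(payload: str, signatures: Sequence[Signature] = SIGNATURES) -> List[str]:
    """Misuse detection: return the names of every known signature found in the payload."""
    return [sig.name for sig in signatures if sig.pattern.search(payload)]
```

Calling `match_signatures("GET /scripts/../../winnt/system32/cmd.exe?/c+dir")` returns `['directory-traversal', 'cmd-exe-probe']`. The same traversal written with URL-encoded backslashes (`..%5c..%5c`) matches nothing, which illustrates the weakness of relying on signatures alone: anything not already described by a signature passes unnoticed.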
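While developing and using signatures to detect attacks is a useful and viable approach, relying on it alone has shortfalls that should be addressed. The most fundamental is that a signature can only detect an attack that is already known: a new exploit, or a variant of an old one, can go unnoticed until a signature for it has been written and distributed.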
Data mining can help improve intrusion detection by giving anomaly detection a firmer basis. By identifying the bounds of valid network activity, data mining can help an analyst distinguish attack activity from common, everyday traffic on the network.
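One simple way to picture "bounds for valid network activity" is a statistical profile learned from historical traffic. The sketch below assumes a single numeric feature (connections per minute is used as a hypothetical example) and treats anything more than k standard deviations from the baseline mean as anomalous. This mean-plus-or-minus-k-sigma rule is a deliberately simple stand-in; the data mining techniques mentioned earlier, such as clustering or association, would learn richer profiles over many features.

```python
import statistics
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Profile:
    """Bounds of normal activity learned from historical (baseline) observations."""
    lower: float
    upper: float


def learn_profile(baseline: Sequence[float], k: float = 3.0) -> Profile:
    """Set the bounds at mean +/- k population standard deviations of the baseline."""
    mean = statistics.fmean(baseline)
    stdev = statistics.pstdev(baseline)
    return Profile(lower=mean - k * stdev, upper=mean + k * stdev)


def is_anomalous(value: float, profile: Profile) -> bool:
    """Flag any observation that falls outside the learned bounds."""
    return not (profile.lower <= value <= profile.upper)
```

With a baseline of `[40, 42, 38, 41, 39, 40, 43, 37]` connections per minute, the mean is 40 and the population standard deviation is about 1.87, so with k = 3 the learned bounds are roughly 34.4 to 45.6. A reading of 41 falls inside them, while a burst of 250 is flagged. The choice of k trades missed attacks (k too large) against false alarms on normal fluctuations (k too small).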
Although the concept of data mining has been around for years, its application to intrusion detection is relatively new, so there will likely be obstacles to developing an effective solution. One is that the amount and complexity of the data to be analyzed are increasing dramatically: a company can collect millions of records per day that need to be analyzed for malicious activity. With this much data, data mining can become quite computationally expensive, and for some organizations processing power and memory are not always cheap or available. One could argue that only a sample of the data is needed to generate profiles, but one could equally argue that analyzing anything, especially network traffic, without all of the data could lead to false conclusions. Another obstacle will be tailoring data mining algorithms and processes to fit intrusion detection. Working out how the data needs to be examined in order to give a clearer picture of network activity is integral to producing accurate and effective results.
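The sampling argument can be made concrete with reservoir sampling (Algorithm R), a standard technique that the original text does not name: it keeps a uniform random sample of k records from a stream of unknown length in a single pass and with memory proportional to k. The function below is a minimal sketch under that assumption; the record type is left generic.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(records: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Return a uniform random sample of up to k records in one pass (Algorithm R)."""
    rng = random.Random(seed)
    sample: List[T] = []
    for i, record in enumerate(records):
        if i < k:
            # Fill the reservoir with the first k records.
            sample.append(record)
        else:
            # Keep record i with probability k / (i + 1) by choosing a slot in [0, i].
            j = rng.randint(0, i)
            if j < k:
                sample[j] = record
    return sample
```

Sampling 1,000 of 1,000,000 daily records keeps memory fixed, but any individual record has only a 1,000 / 1,000,000 = 0.1% chance of being retained. A single malicious connection is therefore very likely to be absent from the sample, which is exactly the risk of drawing false conclusions from partial data.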
Obviously, data mining and anomaly detection are not a silver bullet for intrusion detection, nor should they replace misuse detection. The goal should be to integrate anomaly detection and misuse detection effectively, creating an IDS that allows an analyst to identify an attack or intrusion on their network more accurately and quickly.
Bass, Tim. "IDS Data Mining." 4 Mar 1999. URL: http://www.silkroad.com/papers/html/ids/node4.html (10 Oct 00).
Gordeev, Mikhail. "Intrusion Detection: Techniques and Approaches." URL: http://www.infosys.tuwien.ac.at/Teaching/Courses/AK2/vor99/t13 (10 Oct 00).
Grossman, R.L. "Data Mining: Challenges and Opportunities for Data Mining During the Next Decade." May 1997. URL: http://www.lac.uic.edu/grossman-v3.htm (10 Oct 00).
Lee, Wenke and Stolfo, Salvatore. "Data Mining Approaches for Intrusion Detection." URL: http://www.cs.columbia.edu/~wenke/papers/usenix/usenix.html (12 Oct 00).
Rothleder, Neal. "Data Mining for Intrusion Detection." The Edge Newsletter. Aug 2000. URL: http://www.mitre.org/pubs/edge/august_00/rothleder.htm (9 Oct 00).