As computer systems and the Internet have grown in size, complexity and usage, the demands placed upon those responsible for ensuring the continued operation and security of these systems have also grown. This has led to a demand for automated systems for detecting malicious activity on both individual hosts and networks. In line with our capitalistic society, where a demand exists suppliers will seek to meet that demand. This has led to the development of a range of Intrusion Detection Systems (IDSs). Some of these systems are available as free open source applications, while others are offered as commercial products. As a result, any organisation considering implementing an IDS has a range of options available. The aim of this document is to outline various criteria that can be used to evaluate Network Intrusion Detection Systems.
It should be noted that any organisation seeking to implement an IDS is likely to have its own needs and requirements of the system. The environment in which the IDS will be implemented is likely to vary from one situation to another, as is the availability of staff, funding and other essential resources. The aim of this document is therefore to provide a review of various methods that can be adapted to meet the needs of any organisation.
Definition of an Intrusion Detection System
An Intrusion Detection System is generally considered to be any system designed to detect attempts to compromise the integrity, confidentiality or availability of the protected network and associated computer systems. A Network Intrusion Detection System (NIDS) aims to detect attempted compromises by monitoring network traffic for indications that an attempted compromise is in progress, or that an internal system is behaving in a manner which indicates it may already be compromised. A host-based IDS (HIDS) monitors a single system for signs of compromise.
An important point raised by Ranum (1998) is that an Intrusion Detection System should only report intrusions that either will successfully compromise the target system, or have not been seen before. Thus an attempt to exploit a known Windows 2000 vulnerability on a Solaris system should not cause an IDS to generate an alert (although the event should still be logged). Currently most available IDSs do not provide this level of functionality. This is one example of a measure that should be used for the evaluation of IDSs.
Published IDS evaluations
At this point it is worthwhile to outline some of the evaluations that have been published, and to discuss the difference between an evaluation and a comparison. For the purposes of this paper an evaluation is considered to be a determination of the level to which a particular IDS meets specified performance targets. A comparison is considered to be a process of 'comparing' two or more systems in order to differentiate between them. It is proposed that an organisation intending to implement an IDS will increase the likelihood of a successful implementation by establishing its own requirements of an IDS and then evaluating the available options to determine the level to which these requirements are met. The alternative is to conduct a comparison of the available systems and then select the one which appears to be the 'best'.
The majority of published documents claiming to evaluate IDSs are conducted as comparisons, rather than evaluations. These documents serve as a useful starting point for any organisation considering the implementation of an IDS; however, they may not prove sufficient, or as applicable to a particular situation as is desirable. The following list of articles is presented in reverse chronological order of publication, with the most recent first. A brief summary of each article is provided, with some discussion of the evaluation techniques used.
Intrusion Detection System Comparisons
The NSS Group. (2001). Intrusion Detection Systems Group Test (edition 2). [online] http://www.nss.co.uk
This is a comprehensive report on 15 commercial and open source Intrusion Detection Systems. The first edition of the report was published in 2000, using slightly different performance tests and evaluating a range of systems that were available at the time (some of these systems were also included in the 2001 report). The NSS Group intend to continue to produce this report on an annual basis.
The evaluation of each IDS consists of two components. The first component is a qualitative analysis of the various features and functions of each product. This analysis is performed by IDS specialists who have a range of experience in the field. The comments and analysis of the various features are well considered and unbiased.
The quantitative component consisted of four tests of the NIDSs on a controlled laboratory network. These tests focused upon specific performance indicators: attack recognition, performance under load, ability to detect evasion techniques and stateful operation. The weakness of these tests is that the background traffic was generated using an Adtech AX/4000 broadband test system and a Smartbits SMB6000. Both of these traffic generators are designed to test network equipment, not Intrusion Detection Systems. Although the traffic generated consisted of valid IP packets, the traffic flow itself would be inconsistent with real life traffic. There are two problems with this technique: firstly, the likelihood of false positives being generated is reduced, if not eliminated; secondly, the actual attacks would differ significantly from the background traffic and so be easier to distinguish.
The advantage of these traffic generators is that they are capable of generating sufficient traffic to saturate the network. However, the relevance of this type of test to an IDS is debatable. In a production environment it is unlikely that a network would be operating close to saturation for any length of time. If this were the case the network would be redesigned or upgraded. For an in-depth discussion of this topic see Ranum (2001).
The greatest criticism of this testing process is the lack of testing for false positive alerts. However, this report is the most comprehensive (in terms of products tested) and scientifically rigorous evaluation of Intrusion Detection Systems of which the author is aware. Any organisation contemplating implementing an IDS should read this report.
Allen, J. Christie, A. Fithen, W. McHugh, J. Pickel, J. Stoner, E. (2000) State of the Practice of Intrusion Detection Technologies. Carnegie Mellon Software Engineering Institute.
This publication covers a wide range of issues facing Intrusion Detection Systems in terms of functionality, performance and implementation. Section 3 of this paper discusses the performance of a number of IDSs available at the time. Although the authors did perform tests on various systems, the testing methods and results are not directly reported. However, this publication does discuss a wide range of issues relating to intrusion detection, and is highly critical of most of the systems tested at the time of publication. This document provides useful insights into important weaknesses of IDSs and a plethora of links to further information.
This publication also includes a list of recommended IDS selection criteria as an appendix. This list was originally published by Edward Amoroso and Richard Kwapniewski. The author was unable to find a copy of the original document.
This list provides seven headings of topics of importance for IDSs. These are divided into two groupings, detection capabilities and operational capabilities.
Richard P. Lippmann, Robert K. Cunningham, David J. Fried, Isaac Graf, Kris R. Kendall, Seth E. Webster, Marc A. Zissman (1999). Results of the DARPA 1998 Offline Intrusion Detection Evaluation, slides presented at RAID 1999 Conference, September 7-9, 1999, West Lafayette, Indiana.
Haines, J, W. Lippmann, R, P. Fried, R, P. Korba, J. & Das, K. (1999) The 1999 DARPA Off-Line Intrusion Detection Evaluation.
Haines, J, W. Lippmann, R, P. Fried, R, P. Zissman, M, A. Tran, E. & Bosswell, S, B. (1999) DARPA Intrusion Detection Evaluation: Design and Procedures. Lincoln Laboratory, Massachusetts Institute of Technology.
This series of publications is a combined research effort from Lincoln Laboratory, DARPA and the US Air Force. These publications refer to two comprehensive evaluations of IDSs and IDS technologies carried out on behalf of, and with the assistance of, DARPA. The aim of these evaluations was to assess the current state of IDSs within US defence and government organisations. These evaluations attempted to quantify specific performance measures of IDSs and test these against a background of realistic network traffic.
The performance measures used by these evaluations were: the ratio of attack detections to false positives; the ability to detect new and stealthy attacks; a comparison of the ability of host-based and network-based systems to detect different types of attack; the ability of anomaly detection techniques to detect new attacks; improvements between 1998 and 1999; and the ability of systems to accurately identify attacks. The research also attempted to establish the reason each IDS failed to detect an attack, or generated a false positive.
Both the 1998 and 1999 evaluations identified a number of weaknesses in existing IDSs. A number of these issues have since been resolved, while others are still valid. The testing process used samples of generated network traffic, audit logs, system logs and file system information. This information was then distributed to various evaluators, who would provide the appropriate data to the Intrusion Detection Systems. This ensured each system was provided with identical data, whilst allowing proper configuration of each system.
A number of mass media publications, both online and printed, have published comparisons of Intrusion Detection Systems. However, the articles reviewed by the author were lacking in scientific rigour and tended to depend upon qualitative evaluations based solely upon the impressions of the journalist. The majority of these articles were extremely superficial in nature, and in some cases displayed a lack of understanding of IDS concepts on the part of the journalist. For this reason these articles have not been included.
IDS Evaluation Methodologies
Ranum, M, J. (2001). Experiences Benchmarking Intrusion Detection Systems. NFR Security. http://www.nfr.com
This article discusses a number of issues relating to the techniques used to benchmark (i.e. compare) IDSs. It offers the interesting perspective of someone who is both an expert in the field and a vendor of a commercial IDS. The article is highly critical of many published IDS comparisons for their lack of understanding of IDS techniques, and thus their inability to design appropriate testing methodologies.
In particular Ranum discusses the various measures that can be, and have been, used to measure the performance of IDSs. Recommended measures include the ratio of false positives to attacks and the ratio of positives to attacks. The point is also made that it is important to use real life traffic and attacks in the evaluation process, rather than simulated traffic and attacks.
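The two ratios mentioned above can be illustrated with a short calculation. The following is a minimal sketch only: the function name and the sample counts are assumptions made for this example and are not taken from Ranum's article, and "positives to attacks" is interpreted here as correctly detected attacks divided by attacks launched.

```python
def detection_ratios(attacks_launched, attacks_detected, false_positives):
    """Return (detected / launched, false positives / launched) for one test run.

    All three arguments are plain counts collected during a test run
    (hypothetical inputs for this sketch).
    """
    if attacks_launched <= 0:
        raise ValueError("attacks_launched must be positive")
    if not 0 <= attacks_detected <= attacks_launched:
        raise ValueError("attacks_detected must lie between 0 and attacks_launched")
    if false_positives < 0:
        raise ValueError("false_positives must not be negative")
    positive_ratio = attacks_detected / attacks_launched
    false_positive_ratio = false_positives / attacks_launched
    return positive_ratio, false_positive_ratio


# Example: 40 attacks launched, 30 detected, 10 alerts raised on benign traffic.
positives, false_alarms = detection_ratios(40, 30, 10)
```

Reporting the two figures together matters because a system can trivially raise its detection ratio by alerting on everything, which the false positive ratio would then expose.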
Alessandri, D. (2001). Using Rule-Based Activity Descriptions to Evaluate Intrusion Detection Systems. RAID 2001. http://www.raid-symposium.org/raid2001/program.html
Alessandri proposes the use of a systematic description scheme for regulating the descriptions used to describe IDS functions. This approach should allow for an evaluation of IDSs based upon their descriptions, without necessitating experimentation. The disadvantage of this approach is the requirement for accurate descriptions. Currently such descriptions do not exist, so the approach cannot yet be applied. It does, however, hold a certain promise for the future.
Puketza, N. Chung, M. Olsson, R, A. & Mukherjee, B. (1996). Simulating Concurrent Intrusions for Testing Intrusion Detection Systems: Parallelizing Intrusions. University of California, Davis.
Due to the age of these documents the tests recommended are now quite dated. However, the testing methodology used is still relevant. Puketza et al. have developed an application to simulate specific attacks against a target system. These attacks can be scripted to run concurrently or in a specific sequence. The advantage of this methodology is that each test can easily be repeated for each device under test. One disadvantage of this application is that it targets older vulnerabilities in UNIX systems, which should not apply to a current operating system. However, it can easily be updated to include more contemporary attacks.
Puketza, N. Zhang, K. Chung, M. Olsson, R, A. & Mukherjee, B. (1996). A Methodology for Testing Intrusion Detection Systems. University of California, Davis.
Puketza, N. Chung, M. Olsson, R, A. & Mukherjee, B. (1997). A Software Platform for testing Intrusion Detection Systems. University of California, Davis.
Criteria for Evaluating Network Intrusion Detection Systems
The aim of this section is not to suggest a method of benchmarking NIDSs. Benchmarking as a method of evaluation is only valid in situations where the controlled environment closely resembles the real life environment. As the performance of any NIDS is highly dependent upon its individual configuration, the network it is monitoring and its position in that network, benchmarking does not provide a definitive method of assessing a NIDS in a given situation. For further discussion of this topic see Ranum (2001). Rather, this section aims to present a number of criteria that can be used to determine the suitability of a given NIDS for a particular situation or environment.
The first step in the evaluation process should be to identify the importance of each of the topics listed in the following sections. The importance of individual criteria is likely to change from organisation to organisation. In many cases a topic will also require the identification of features specific to the network and systems to be monitored.
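One possible way of recording the importance an organisation assigns to each criterion, and of summarising how well a candidate NIDS meets those requirements, is a simple weighted score. This is a sketch under stated assumptions rather than a method proposed in the literature reviewed above: the function name, the criterion names, the weights and the 0.0 to 1.0 rating scale are all hypothetical and chosen only for illustration.

```python
def requirement_score(weights, ratings):
    """Return the weighted fraction of requirements met, between 0.0 and 1.0.

    weights: criterion name -> importance to this organisation (0 = irrelevant)
    ratings: criterion name -> how well the candidate meets it (0.0 to 1.0)
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("at least one criterion must have a positive weight")
    missing = set(weights) - set(ratings)
    if missing:
        raise ValueError("no rating supplied for: " + ", ".join(sorted(missing)))
    weighted_sum = sum(weights[name] * ratings[name] for name in weights)
    return weighted_sum / total_weight


# Hypothetical organisation that values attack identification most highly.
weights = {"attack identification": 2, "stability": 1, "ease of configuration": 1}
ratings = {"attack identification": 1.0, "stability": 0.5, "ease of configuration": 0.5}
score = requirement_score(weights, ratings)
```

The score measures a single candidate against the organisation's own requirements, which is the sense of evaluation used in this paper; it is not intended as a league table of products.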
Ability to identify attacks
The main performance requirement of a NIDS is to detect intrusions. However, the definition of an intrusion is currently unclear. In particular, many vendors and researchers appear to consider any attempt to place malicious traffic on the network to be an intrusion.
In reality a more useful system will log malicious traffic and only inform the operator if the traffic poses a serious threat to the security of the target host. Snort is moving in this direction with the use of an alert classification ranging from 1 to 10, with 1 representing a point of interest only and 10 representing a major threat to security.
Known vulnerabilities and attacks
All NIDSs should be capable of detecting attacks against known vulnerabilities. However, research (Allen et al. 2000; NSS 2001) indicates that many commercial IDSs fail to detect recently discovered attacks. On the other hand, if a vulnerability or attack is known, all systems should be patched or have workarounds applied, which would remove the need for a NIDS to detect these events. Unfortunately the reality is that many systems are not patched or upgraded as vulnerabilities are discovered. This is clearly indicated by the number of system compromises that occur every day, and by the fact that most of the problems on the SANS top twenty list are old, well-known problems with fixes available.
The ability to detect attacks that are not yet known must be the most important feature of any IDS. It is the IDS that can detect such attacks which will justify its implementation. New vulnerabilities are discovered every day. By their very nature these are also the most difficult attacks to detect.
Relevance of attacks
This refers to the ability of the NIDS to identify the relative importance of any attack. To return to the example already given, the use of a Windows exploit against a UNIX system is not of high importance. However, if an alert is raised for every event, and the analyst must investigate every alert, a mechanism should be available to distinguish the relevance of different alerts.
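A relevance check of this kind could be sketched as follows. This is a minimal illustration, not a feature of any particular product: the function name, the alert fields and the idea of a host inventory mapping addresses to platforms are assumptions introduced for the example.

```python
def classify_alert(alert, inventory):
    """Return "alert" if the attack could affect the target, otherwise "log".

    alert: dict with a "target" address and the set of "affected_platforms"
    inventory: target address -> platform running on that host
    A host missing from the inventory is treated as relevant, on the
    assumption that an unknown host should not be silently ignored.
    """
    platform = inventory.get(alert["target"])
    if platform is None or platform in alert["affected_platforms"]:
        return "alert"
    return "log"


# The example from the text: a Windows 2000 exploit aimed at a Solaris host.
inventory = {"10.0.0.5": "solaris"}
event = {"target": "10.0.0.5", "affected_platforms": {"windows2000"}}
decision = classify_alert(event, inventory)
```

The event is still recorded in every case; only the decision to interrupt the analyst changes.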
Stability, Reliability and Security
Any IDS should be able to continue to operate consistently in all circumstances. The application and operating system should be capable of running for years without segmentation faults or memory leaks.
An important function of a NIDS is to consistently report identical events in the same manner. One disadvantage of a product using signature recognition is that different users can configure alerts to provide different messages. Thus traffic on one network may trigger a different alert from the same traffic on another system of the same type. A number of efforts are currently underway to solve this problem. Both SecurityFocus and CVE provide databases of known vulnerabilities, and exploits targeting them.
The system should also be able to withstand attempts to compromise it. If an attacker can identify a NIDS on a network, it could prove to be a valuable asset to the attacker. It is also possible the attacker will attempt to disable the system using DoS or DDoS techniques. The system should be able to withstand all of these types of attack.
Information provided to analyst
The information provided to the analyst when an alert is raised should be enough to clearly identify the event that caused the alert to be raised, and the reason this event is of interest. It should also provide links to vulnerability databases, such as Bugtraq or CVE, to assist the analyst in determining the relevance of, and appropriate reaction to, a particular alert.
Identify target and source
The alert should also identify the source of the attack and the target system. Further information, such as a whois or DNS lookup on an IP address, would also be beneficial.
Severity, potential damage
The NIDS should identify the potential severity of an attack. Some alerts are triggered by events related to information gathering, such as port scanning. Although this information may be relevant if a more serious attack is launched, the volume of scanning that occurs on the Internet makes it impractical to investigate every time a network is scanned. On the other hand, an indication that a local host has been compromised by a Trojan should be given higher priority.
Outcome of attack (Success or failure)
Another useful (although currently non-existent) feature of a NIDS would be to indicate the outcome of an attack. In most cases an alert simply indicates that an attempt has been made. It is then the responsibility of the analyst to search for correlating activity to indicate the outcome of the attack. If a NIDS were to present the analyst with a list of other alerts generated by the target host, and a summary of other (non-alert) traffic, the evaluation of the outcome could be greatly accelerated.
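The correlation step described above could be approximated by collecting every alert that involves the target host, as either source or destination, and presenting them in time order. The sketch below is illustrative only; the function name, the alert record layout and the sample addresses are assumptions made for this example and do not describe any existing NIDS.

```python
def related_alerts(alerts, host):
    """Return all alerts in which `host` is the source or destination, oldest first.

    alerts: iterable of dicts with "time", "src", "dst" and "msg" keys
    """
    involved = [a for a in alerts if host in (a["src"], a["dst"])]
    return sorted(involved, key=lambda a: a["time"])


# An exploit attempt against 10.0.0.5 followed by unexpected outbound
# traffic from the same host may suggest the attempt succeeded.
alerts = [
    {"time": 3, "src": "10.0.0.5", "dst": "192.0.2.9", "msg": "outbound IRC"},
    {"time": 1, "src": "198.51.100.7", "dst": "10.0.0.5", "msg": "exploit attempt"},
    {"time": 2, "src": "198.51.100.7", "dst": "10.0.0.8", "msg": "port scan"},
]
history = related_alerts(alerts, "10.0.0.5")
```

A list of this kind does not decide the outcome of the attack; it only places the relevant evidence in front of the analyst in one step.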
Legal validity of data collected
The legal validity of the data collected by any IDS is of extreme importance if any legal action is to be taken against the attacker. A disturbingly large number of systems do not collect the actual network packets; instead they simply record their own interpretation of events. A more robust system will also capture and store the network traffic, as well as raising the alert.
One of the greatest risks of an IDS is that once the system is implemented it will not be utilised to its full capabilities. Often this is due to the complexity of configuring and maintaining the system. It is also important that an IDS can be optimised for a particular network. There is no point in monitoring for web server exploits if there is no web server on the network.
Ease or complexity of configuration
Unfortunately the usability of a system is usually inversely proportional to the flexibility and customisability of that system. The desired flexibility and configurability of the system will be determined by the users of the system, the network in which it will be operating and the level of functionality required from the system.
If the system is to be maintained by a network administrator who is also responsible for standard network management, he or she is unlikely to have the time available to optimise and configure the system, so usability will be a primary consideration. On the other hand, if an intrusion analyst is employed specifically to manage intrusion detection, a more complex system with greater functionality may be desired.
Possible configuration options
The NIDS should be capable of being optimised for the systems on the network. As mentioned earlier, there is no point in performing HTTP analysis if a web server is not operating on the network under inspection. The level of traffic on the network will also determine the intensity of analysis that can be performed. A simple system suitable for a single network segment with low traffic will be able to combine the sensor and analysis functions within a single unit. A network with high levels of traffic may need to separate the sensor and analysis functions across different hosts.
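Optimising a NIDS for the services actually present can be pictured as filtering the available rule groups against a list of services running on the network. The following is a minimal sketch; the function name, the rule group names and the service labels are hypothetical and do not correspond to the configuration format of any specific product.

```python
def select_rule_groups(rule_groups, services_present):
    """Return the names of rule groups worth enabling, in alphabetical order.

    rule_groups: rule group name -> the service that group protects
    services_present: set of services actually running on the network
    """
    return sorted(
        name for name, service in rule_groups.items()
        if service in services_present
    )


# A network with mail and FTP servers but no web server.
rule_groups = {"web-cgi": "http", "web-iis": "http", "ftp": "ftp", "smtp": "smtp"}
enabled = select_rule_groups(rule_groups, {"ftp", "smtp"})
```

Disabling the web rule groups in this case reduces both the analysis load on the sensor and the number of irrelevant alerts presented to the analyst.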
There are also a number of other configuration options that may apply to particular situations. For example, in some situations the NIDS (i.e. the analyst) may not be allowed to view the contents of packets on the network. In this case it should be possible to configure the NIDS to examine (and store) only the header information from the packets.
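Header-only storage can be illustrated by truncating each packet after its protocol headers. The sketch below is an assumption-laden example rather than a description of how any NIDS implements this option: the function name is hypothetical, the input is assumed to be a raw IPv4 packet beginning at the IP header (no link-layer framing), and only TCP and UDP transport headers are retained.

```python
def strip_payload(packet: bytes) -> bytes:
    """Return the IPv4 header plus any TCP or UDP header, discarding the payload."""
    if len(packet) < 20 or packet[0] >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    ip_header_len = (packet[0] & 0x0F) * 4   # IHL field, in 32-bit words
    if ip_header_len < 20 or len(packet) < ip_header_len:
        raise ValueError("invalid IPv4 header length")
    protocol = packet[9]                      # IPv4 protocol field
    if protocol == 6 and len(packet) >= ip_header_len + 13:
        # TCP: data offset is the upper four bits of header byte 12
        transport_len = (packet[ip_header_len + 12] >> 4) * 4
    elif protocol == 17:
        transport_len = 8                     # UDP header is always 8 bytes
    else:
        transport_len = 0                     # keep only the IP header
    return packet[:ip_header_len + transport_len]


# Build a minimal TCP packet: 20-byte IP header, 20-byte TCP header, payload.
ip_header = bytes([0x45]) + bytes(8) + bytes([6]) + bytes(10)
tcp_header = bytes(12) + bytes([0x50]) + bytes(7)
stored = strip_payload(ip_header + tcp_header + b"confidential data")
```

Storing only `stored` preserves the addressing and port information needed for analysis while ensuring the packet contents are never written to disk.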