Intrusion Detection FAQ: Can you explain traffic analysis and anomaly detection?

Kevin Liston

Introduction
The ideal Network Intrusion Detection System will efficiently and effectively classify network traffic between benign and belligerent. A great deal of research and work in Network Intrusion Detection involves the development of attack signatures. Protected network traffic is fed into a matching-engine and alerts are generated when the network traffic matches a given attack signature. An intrusion analyst uses these alerts to determine how to allocate analysis resources. “Traffic analysis is the branch of signal intelligence analysis that deals with the study of external characteristics of signal communications. The information was used: (1) to effect interception, (2) to aid cryptanalysis, (3) to rate the level and value of intelligence in the absence of the specific message contents, and (4) to improve the security in the communication nets.” (Nichols, p71.) The performance of a Network Intrusion Detection System can be more effective if it includes not only signature matching but also traffic analysis. By using traffic analysis, anomalous traffic is identified as a potential intrusion, no signatures are involved in the process, so it more likely to detect new attacks for which signatures have yet to be developed. Traffic analysis deals not with the payload of a message, but its other characteristics such as source, destination, routing, length of the message, time it was sent, the frequency of the communication, etc. Traffic payload is not always available for analysis, the traffic may be encrypted, VPN links may be flowing through your monitoring area, or it may simply be against policy to analyze packet payload. For the purposes of Network Intrusion Detection we gather these characteristics either from the actual network traffic itself (via a method such as tcpdump,) or by log-files from network sensors such as firewalls or routers (Lee and Stolfo.) These data are processed for visualization or by data mining techniques to generate alerts and provide useful information to the analyst to aid them in the decision on how to allocate analysis resources.

The Aim of Network Intrusion Detection
The intent of a Network Intrusion Detection system is to guide the analyst towards network events that are malicious. The two major approaches are misuse detection and anomaly detection. Pattern-matching solutions primarily use misuse detection: they employ a library of signatures of misuse, which are matched against network traffic. The weaknesses of these systems are variants, false positives, false negatives, and data overload. Since they rely on signatures, a new variant of an attack can be created to evade detection. Additionally, the signatures themselves can create false positives if they are not written correctly, or if the nature of the attack is difficult to isolate from normal traffic characteristics. A signature-based system cannot detect attacks for which it has no signature; it does not react well to the unknown. Data overload can occur when a sensor, or an analyst, is presented with too much information to analyze effectively (Phung.)

Traffic analysis performed on network traffic can mitigate the limitations of signature-based systems, because it is a form of anomaly detection. It is important to note that it cannot replace signature-based systems; ideally an analyst would have both tools at his disposal. A system based on traffic analysis can detect attack variants because it is not looking for the pattern of the attack, but triggering on the anomalous nature of the connection (whether from a strange IP, to a strange port, or with an odd packet length or flag setting.) False positives are also a weakness of anomaly detection, but if the alerts from both methods are correlated first, the relevance of the alerts will improve. The strength of anomaly detection is its low rate of false negatives. New attacks, for which no signatures yet exist for a signature-based system to trigger on, will be anomalous by nature. An anomaly-based detection system might not catch the latest IIS UNICODE exploit, but the behavioral change of the compromised system will get its attention. Reducing data overload is accomplished by data mining and visualization techniques: abstracting the data and presenting it visually to the analyst can reveal anomalies and patterns that the heuristics of the traffic analysis system could not detect.

A Multi-dimensional Model
A given packet can be broken down into a number of fields such as protocol, source IP, destination IP, ports used and flag settings (in the case of TCP or UDP) or message type (in the case of ICMP), and length. A field is either numerical or can be converted to a numerical range via a mapping function (e.g., mapping IP numbers to integer ID numbers). If the information captured from a packet has n fields, then the packet can be expressed as a vector of n elements (i.e., an n-tuple) in an n-dimensional vector space. This gives a unique spatial representation of each event and creates a context for the network activity (Girardin, p. 4.) Within this n-dimensional vector space, events can be correlated, compared, and visualized. In order to communicate clearly to the analyst, one can employ visualization and data mining to produce useful graphs and alerts.
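To make this concrete, here is a minimal Python sketch of such a mapping. The particular field set and the IP-to-integer encoding are illustrative choices made for this example, not a prescription from any of the cited work.

# Sketch: map a packet's fields to a numeric vector (an n-tuple).
# The field names and the IP-to-integer mapping are illustrative only.
import ipaddress

def packet_to_vector(pkt):
    """pkt is a dict such as:
    {"proto": "TCP", "src": "10.0.0.5", "dst": "192.168.1.7",
     "sport": 31337, "dport": 80, "flags": 0x02, "length": 60}"""
    proto_map = {"ICMP": 1, "TCP": 6, "UDP": 17}
    return (
        proto_map.get(pkt["proto"], 0),
        int(ipaddress.ip_address(pkt["src"])),   # map IP address to an integer
        int(ipaddress.ip_address(pkt["dst"])),
        pkt.get("sport", 0),
        pkt.get("dport", 0),
        pkt.get("flags", 0),                     # TCP flag bits, or ICMP type
        pkt["length"],
    )

example = {"proto": "TCP", "src": "10.0.0.5", "dst": "192.168.1.7",
           "sport": 31337, "dport": 80, "flags": 0x02, "length": 60}
print(packet_to_vector(example))   # a single point in a 7-dimensional space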

The Quarry
A Network Intrusion Detection System provides the analyst with not only information about what has happened, but potentially what is about to happen. If an analyst is able to detect an attacker’s reconnaissance of the protected network, and the alert arrives in time, an attack can potentially be thwarted. An attacker will probe your defenses, and this will leave traces that can be detected. At this point the analyst is much like a tracker, attempting to infer the intent behind the signs left (Carss, P 22.) Scans have footprints, which can be defined as the set of port and IP combinations scan is targeting (Stainford, Hoagland, and McAlerney pgs 2-4.) They characterize a horizontal scan as one that searches a group of IP numbers for a single port, and a vertical scan as a single IP being scanned for multiple ports. Other footprint geometries can be described, such as box scanning—a combination of vertical and horizontal. When plotting destination port versus destination IP numbers these patterns become prominent. The size of the footprint can be calculated from the sum of the IP/port combinations used in the scan. This size can serve as a metric on how difficult it will be to detect a scan. Clearly an NMAP scan on a server will be easier to detect than a scan for port 53 on 4 machines in the network.
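A small Python sketch of this footprint idea follows. The classification rule is a deliberate simplification of the horizontal/vertical distinction described above, not code from Staniford, Hoagland, and McAlerney.

# Sketch: characterize a scan footprint as the set of (dst IP, dst port)
# pairs it touches, and classify its rough shape.
def footprint(events):
    """events: iterable of (dst_ip, dst_port) tuples seen from one source."""
    pairs = set(events)
    ips = {ip for ip, _ in pairs}
    ports = {port for _, port in pairs}
    size = len(pairs)                       # footprint size
    if len(ips) > 1 and len(ports) == 1:
        shape = "horizontal"                # many hosts, one port
    elif len(ips) == 1 and len(ports) > 1:
        shape = "vertical"                  # one host, many ports
    else:
        shape = "box or other"
    return size, shape

probes = [("192.168.1.%d" % i, 53) for i in range(1, 5)]
print(footprint(probes))                    # (4, 'horizontal')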

The response to a scan's stimuli also forms a detectable footprint. SYN-ACK responses from open TCP ports, ICMP port-unreachable messages from UDP inverse scans, and drops or rejects logged by a firewall can all be used to detect a scan and to evaluate how much information the attacker gained from it. These are all responses to the scan, much like disturbed pebbles or broken foliage tell of the passing of your quarry.

The attacker will often employ some sort of deception to disguise their reconnaissance scan, and some methods of deception can successfully elude simple port-scan detectors. An attacker can alter the scanning software to randomize the order of hosts scanned, slow the scan down to cover a larger time window, or randomize the period between probes. They can blur the signature of a single scan packet by randomizing non-essential fields such as the source port, the sequence or ACK number, or the IP ID. These techniques can evade simple port-scan detection software based on signatures, sequential-scan detection, or an x-events-over-y-seconds threshold. Furthermore, the true source of the scan can be disguised by hiding within forged scans or by employing distributed scanning. A scan hiding within a smoke screen of other scans does little to disguise the fact that scanning is going on (in fact, it draws a lot of attention to the scanning event), but it can protect the identity of the source. Distributed scanning, on the other hand, can be difficult to detect if a system is simply correlating events by source IP, destination IP, and destination port (Staniford, Hoagland, and McAlerney, pp. 4-5.)
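For illustration, here is a minimal Python sketch of the kind of x-events-in-y-seconds detector that these evasion techniques defeat. The window size and probe threshold are arbitrary illustrative values.

# Sketch: a naive "x probes in y seconds per source" port-scan detector.
# Slowing the scan below this rate, or distributing it across many sources,
# slips under the threshold, which is exactly the weakness described above.
from collections import defaultdict, deque

WINDOW_SECONDS = 60      # y (illustrative)
MAX_PROBES = 10          # x (illustrative)

recent = defaultdict(deque)   # source IP -> timestamps of recent probes

def probe_seen(src_ip, timestamp):
    q = recent[src_ip]
    q.append(timestamp)
    while q and timestamp - q[0] > WINDOW_SECONDS:
        q.popleft()               # discard probes outside the window
    if len(q) > MAX_PROBES:
        print("ALERT: possible port scan from", src_ip)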

One special method employed by some scanners to avoid detection is the stealth scan. This is a bit of a misnomer, since from the point of view of most Intrusion Detection systems these scans, in the words of SNORT's author Marty Roesch, "are more like sore-thumb scans." These scans operate by using illegal flag settings that can evade some simple packet filters. Perhaps penetration scan is a better label for the technique.
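The sketch below shows how such illegal flag settings stand out. The constants are the standard TCP flag bits; the rule set (NULL, SYN+FIN, XMAS) is an illustrative selection, not any particular product's detection logic.

# Sketch: flag TCP packets whose flag combinations should never occur in
# legitimate traffic.
FIN, SYN, RST, PSH, ACK, URG = 0x01, 0x02, 0x04, 0x08, 0x10, 0x20

def suspicious_flags(flags):
    if flags == 0:
        return "NULL scan"                 # no flags set at all
    if flags & SYN and flags & FIN:
        return "SYN+FIN scan"              # mutually exclusive in normal use
    if flags & FIN and flags & URG and flags & PSH and not flags & ACK:
        return "XMAS scan"
    return None

print(suspicious_flags(0))                 # NULL scan
print(suspicious_flags(FIN | URG | PSH))   # XMAS scan
print(suspicious_flags(SYN))               # None: ordinary connection attempt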

Anomaly Detection Tools
Traffic analysis is performed through visualization and data mining techniques. Recall that network events can be represented as vectors in an n-dimensional vector space. The dimension of this vector space can be reduced in intelligent ways in an attempt to highlight detectable patterns. A simple method of dimension reduction would reduce the vector space down to destination IP and destination port, and this reduced space could be visualized as a scatter plot. From this plot an analyst can visually detect horizontal and vertical scan footprints. Another simple reduction would project the space down to the source and destination IP of the events; this plot would illustrate which machines are communicating with each other. As the data space is reduced, information is lost, so it is important that a number of reduction and visualization methods be employed in order to give the analyst a more complete picture. An analyst could use more sophisticated mapping techniques, such as neural-network-computed self-organizing maps (Girardin) or spicules (Vert, Frincke, and McConnell), in an attempt to gather a higher-resolution picture of the status of the network. These visualization techniques can work from captured network traffic or from network equipment logs.

In addition to visualization, the results of data mining can be compared to heuristics to detect patterns or anomalies. Mark Prager describes a technique of reducing firewall logs to detect scans and DoS attacks (Prager.) Additional tools for generating alerts from log files are available from http://www.spitzner.net/.
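Returning to the simplest reduction mentioned above, the Python sketch below projects events down to (destination IP, destination port) and scatter-plots them; horizontal scans appear as horizontal rows of points and vertical scans as vertical columns. The plotting details are illustrative only.

# Sketch: reduce events to (dst IP, dst port) and scatter-plot them.
import ipaddress
import matplotlib.pyplot as plt

def plot_footprints(events):
    """events: iterable of (dst_ip_string, dst_port) pairs."""
    xs = [int(ipaddress.ip_address(ip)) for ip, _ in events]
    ys = [port for _, port in events]
    plt.scatter(xs, ys, s=4)
    plt.xlabel("destination IP (as integer)")
    plt.ylabel("destination port")
    plt.title("Destination port vs. destination IP")
    plt.show()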

SPICE/SPADE
Silicon Defense's SPICE (Stealthy Portscan and Intrusion Correlation Engine) project is a DARPA-sponsored development effort whose aim is to build a better mousetrap: one capable of detecting stealthy port scans. SPICE consists of two components, an anomaly sensor and a correlation engine. SPADE is the anomaly sensor, which acts as a plug-in preprocessor to SNORT. The correlation engine is still under development.
Each packet coming into the anomaly detector is assigned an anomaly score, A(x). This score is calculated from the negative log of the probability of the event, P(x); i.e., A(x) = -log(P(x)) (Staniford, Hoagland, and McAlerney, p. 4.) A sharp eye will note that this anomaly score is close to, if not equivalent to, Shannon's measure of the information content of a signal (Shannon, p. 13.)
The calculation of P(x) is based on observed network traffic, since network traffic, as a signal, is non-ergodic and thus not subject to a universal computation of probability distributions for all possible signals (Pierce, pp. 57-59.) SPADE uses four methods of calculating P(x): P(destination IP, destination port); P(source IP, destination IP, destination port); P(source IP, source port, destination IP, destination port); and a Bayes network approximation of P(source IP, source port, destination IP, destination port). From observation, a packet to port 80 on a web server will be more probable than, say, one to port 37337 on the same server. The higher P(x) is, the lower A(x) will be. If A(x) exceeds the provided threshold, SPADE will generate an alert.
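The Python sketch below illustrates the idea behind the simplest of these methods: estimate P(destination IP, destination port) from a count table of observed traffic and score each packet with A(x) = -log2(P(x)). SPADE's real probability tables age their counts over time and support the other joint models listed above; the smoothing used here for unseen pairs is an assumption made purely for illustration.

# Sketch: a count-based estimate of P(dst IP, dst port) and the anomaly score
# A(x) = -log2(P(x)). Not SPADE's actual implementation.
import math
from collections import Counter

counts = Counter()
total = 0

def observe(dst_ip, dst_port):
    global total
    counts[(dst_ip, dst_port)] += 1
    total += 1

def anomaly_score(dst_ip, dst_port):
    # crude smoothing so unseen pairs get a small, nonzero probability
    p = counts.get((dst_ip, dst_port), 0.5) / max(total, 1)
    return -math.log2(p)

for _ in range(1000):
    observe("192.168.1.10", 80)                   # ordinary web traffic
observe("192.168.1.10", 37337)                    # one odd probe

print(anomaly_score("192.168.1.10", 80))          # low score: very probable
print(anomaly_score("192.168.1.10", 37337))       # high score: rare pair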

In their published paper on SPADE, the authors measure the performance of the system using efficiency (the ratio of true positives to all alerts generated) and effectiveness (the ratio of true positives to all actual attack events). From their results, the most effective and efficient method appeared to be P(destination IP, destination port), the joint-2 method, which is now the default probability setting for SPADE. It was also found that filtering the calculation to include only the protected network further improved efficiency and effectiveness (Staniford, Hoagland, and McAlerney, p. 13.) The anomaly threshold could be lowered when filtering was used, since the probabilities were matched against local IP/port pairs, a much smaller space to consider than the rest of the Internet.
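Stated as code, the two metrics are simply ratios; the counts below are made-up numbers for illustration only.

# Sketch of the two performance metrics described above.
def efficiency(true_positives, all_alerts):
    return true_positives / all_alerts          # how much analyst time is well spent

def effectiveness(true_positives, all_real_attacks):
    return true_positives / all_real_attacks    # how little gets missed

print(efficiency(45, 60))        # 0.75
print(effectiveness(45, 50))     # 0.9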

The settings for SPADE need tuning to match the protected network. One tunable factor is the alert threshold. This can be set manually and tuned by the analyst, or SPADE can be configured to learn its own threshold level. In the default learning mode, SPADE will monitor network traffic for 24 hours and then calculate the threshold level required to produce 200 alerts in that monitoring period. Both the length of the monitoring period and the number of alerts to generate are selectable.
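The idea behind this learning mode can be sketched in a few lines of Python: collect anomaly scores over the observation period, then pick the threshold that would have yielded the target number of alerts. This is an illustration of the concept, not SPADE's actual implementation.

# Sketch: choose a threshold that would have produced `target_alerts` alerts
# over the learning period.
def learn_threshold(scores, target_alerts=200):
    ranked = sorted(scores, reverse=True)
    if len(ranked) <= target_alerts:
        return min(ranked)              # too little traffic: alert on everything seen
    return ranked[target_alerts - 1]    # the Nth-highest score becomes the threshold

observed = [0.5, 7.2, 3.1, 9.8, 1.0, 6.6]          # scores from the learning period
print(learn_threshold(observed, target_alerts=2))  # 7.2: only the top two would alert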

As it runs, SPADE builds a probability table of observed network traffic. This table contains critical information and should be protected and backed up. If the file is lost, SPADE will need to be retrained, leaving the network exposed while the history is rebuilt and no alerts are generated. When operating in survey mode, SPADE will generate reports on the observed probability distributions of the network traffic. SPADE can also be placed into statistical mode if one wishes to view the probability tables on a regular basis.

Currently, SPADE simply generates alerts on packets whose anomaly score (as calculated by SPADE) exceeds the anomaly threshold. These alerts are logged along with the other SNORT alerts. It is the correlation engine, still under development, that promises to detect the stealthiest of port scans.

The correlation engine is fed alerts from the anomaly detector. Each alert contains the event and its anomaly score. The correlation engine keeps an event in memory based on its anomaly score: the higher the score, the more anomalous the event, and the longer its state is kept. The engine then attempts to link events into groups, in order to tie rare events to a single cause. Links between a given pair of events are calculated by a series of heuristic functions. There are four basic heuristics, and the connection between two events is scored as a combination of their results. One heuristic fires if the source IP, the destination port, or the destination network is the same in the two events. A second looks for events close to each other in time, or in the n-dimensional space (a Euclidean distance). A third links two events that are off by one in IP number, source port, or destination IP. Another detects covariance relations (such as simultaneous increases of one in destination IP and destination port). The connection function is a combination of these heuristic outputs. The correlation engine builds a graph linking related events and alerts the analyst to these correlations (Staniford, Hoagland, and McAlerney, p. 8.)
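A rough Python sketch of such a connection score between two anomalous events follows. The weights, feature choices, and time window are purely illustrative assumptions; the actual correlation engine had not been released at the time of writing.

# Sketch: score the "connection strength" between two anomalous events as a
# weighted combination of simple heuristics (equality, proximity in time,
# off-by-one field values).
def connection_score(a, b):
    """a, b: dicts with src_ip, dst_ip, dst_port, time (IPs as integers)."""
    score = 0.0
    if a["src_ip"] == b["src_ip"]:
        score += 1.0                                   # equality heuristic
    if a["dst_port"] == b["dst_port"]:
        score += 0.5
    if abs(a["time"] - b["time"]) < 60:
        score += 0.5                                   # proximity-in-time heuristic
    if abs(a["dst_ip"] - b["dst_ip"]) == 1:
        score += 0.75                                  # off-by-one heuristic
    return score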

Future Steps
Once SPICE itself is released, Silicon Defense intends to apply the tool to detecting and tracking worms and DDoS attacks, in addition to stealthy port scans (Staniford, Hoagland, and McAlerney, p. 15.) In the field of traffic analysis and data mining there is plenty of room for work in Intrusion Detection Fusion, where the logs and alerts from different NIDS systems, firewalls, and routers are synthesized into one data space for analysis (Girardin, p. 12.)

Conclusion
There is still much work to be done in the field of anomaly-based detection. Misuse detection based on signature matching has limitations; these limitations can be mitigated through the use of anomaly-based detection. The fusion of both misuse-detection and anomaly-detection techniques will result in a more effective and efficient Network Intrusion Detection System.

References
Carss, Bob. The SAS Guide to Tracking. New York: Lyons Press, 2000.

Girardin, Luc. “An eye on network intruder-administrator shootouts.” URL: http://www.usenix.org/event/detection99/full_papers/girardin/girardin_html/index.html (June 6, 2001)

Lee, Wenke and Stolfo, Salvatore J. “Data Mining Approaches for Intrusion Detection.” URL: http://www.cs.columbia.edu/~wenke/papers/usenix/usenix.html (July 9, 2001)

Nichols, Randall K. ICSA Guide to Cryptography. New York: McGraw-Hill, 1999. 71-75.

Phung, Manh. “Data Mining in Intrusion Detection.” Intrusion Detection FAQ. October 24, 2000. URL: http://www.sans.org/resources/idfaq/data_mining.php (July 9, 2001)

Pierce, John R. An Introduction to Information Theory: Symbols, Signals and Noise, Second Revised Edition. New York: Dover Publications, Inc., 1980.

Prager, Mark. “Firewall Log-Checking Techniques.” Sys Admin. August 2001: 33-37.

Shannon, Claude E. "A Mathematical Theory of Communication." October 1948. URL: http://cm.bell-labs.com/cm/ms/what/shannonday/shannon1948.pdf (July 19, 2001)

Staniford, Stuart, Hoagland, James A., and McAlerney, Joseph M. “Practical Automated Detection of Stealthy Portscans.” URL: http://www.silicondefense.com/pptntext/spice-ccs2000.pdf (July 19, 2001)

Vert, Greg, Frincke, Deborah A., and McConnell, Jesse C. "A Visual Mathematical Model for Intrusion Detection." URL: http://citeseer.nj.nec.com/vert98visual.html (July 19, 2001)