Judy Novak, Vern Stark, David Heinbuch
Johns Hopkins University Applied Physics Laboratory
This paper analyzes recent traffic received on a sensor residing outside our site’s perimeter firewall. The sensor runs the network intrusion detection software Shadow. The activity drew attention because of its volume and because it was unlike any activity previously witnessed. On initial cursory examination, it was not obvious whether the activity was a flood intended to cause a denial of service, a scan, or something else. The methodical analysis presented here demonstrates how and why the incident was determined to be a concurrent scan by several hundred suspected zombie hosts.
Examination of an hour’s traffic on June 29, 2001 at 12:00, captured by a Shadow sensor positioned outside our site’s perimeter firewall, revealed a large number of source hosts scanning what appeared to be our entire Class B address space for destination port 27374. Shadow provides both a packet-capture mechanism and a simple network intrusion detection system. It is built on the packet-capture tool tcpdump, which it uses to retrospectively analyze each hour’s traffic for anomalies. Anomalies or events of interest are culled by running the previous hour’s collected tcpdump traffic through a series of tcpdump filters. One of these filters looks for TCP activity and selects any records in which a source IP outside the site’s address space attempts to initiate a connection to a destination host inside it. Specifically, this filter looks for the SYN flag set in the TCP flag byte, which signals an attempted connection initiation.
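The selection logic of that filter can be sketched in Python. This is a minimal illustration, not Shadow’s actual filter: the site network 172.16.0.0/16 is a hypothetical stand-in for our Class B, and `is_inbound_syn` and `BPF_FILTER` are illustrative names. Requiring SYN set with ACK clear (flag-byte mask 0x12) is our own refinement, so that SYN/ACK replies to outbound connections are not counted as connection attempts.

```python
import ipaddress

# Hypothetical site network standing in for the real Class B address space.
SITE_NET = ipaddress.ip_network("172.16.0.0/16")

TCP_SYN = 0x02  # SYN bit in the TCP flag byte (byte 13 of the TCP header)
TCP_ACK = 0x10  # ACK bit: set on SYN/ACK replies, clear on an initial SYN


def is_inbound_syn(src: str, dst: str, tcp_flags: int) -> bool:
    """True if an outside host is trying to open a connection to an inside host."""
    src_outside = ipaddress.ip_address(src) not in SITE_NET
    dst_inside = ipaddress.ip_address(dst) in SITE_NET
    syn_only = (tcp_flags & (TCP_SYN | TCP_ACK)) == TCP_SYN
    return src_outside and dst_inside and syn_only


# A tcpdump (BPF) filter expression expressing the same test.
BPF_FILTER = (
    f"tcp[13] & 0x12 == 0x02 and not src net {SITE_NET} and dst net {SITE_NET}"
)

if __name__ == "__main__":
    print(is_inbound_syn("192.0.2.10", "172.16.5.20", 0x02))  # True
    print(BPF_FILTER)
```

The printed expression could be applied to a saved hourly capture with `tcpdump -n -r` followed by the capture file name and the quoted expression.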
Destination port 27374 is associated with a Trojan known as SubSeven. SubSeven is a Windows-based Trojan that can allow full access to the victim’s machine, and it can notify the "owning" hacker via ICQ, IRC, or e-mail when the host is online. Scans of port 27374 have been commonplace recently; the SANS Internet Storm Center reports it as one of the most actively scanned ports, as seen in Figure 1. We have witnessed a large number of port 27374 scans recently; however, we had never seen one that generated such a large volume of traffic, nor one that came from multiple concurrent sources.
Figure 1. Internet Storm Center July 24, 2001 Most Commonly Scanned Ports
The Internet Storm Center released a report on June 26, 2001 about a Microsoft Windows worm named W32.leave.worm. The speculation is that hosts infected by this worm may be used as zombies in Distributed Denial of Service (DDoS) attacks. According to the report, the worm spreads via connections to hosts listening on port 27374, and it scans predetermined network blocks associated with @Home and Earthlink for that port. However, the report makes no mention of synchronized scanning, nor of scanning networks other than those two. Although the described worm activity appears to be different from the activity witnessed at our site, it is possible that the worm has mutated since the initial report.
Sample tcpdump Records
Table 1. Sample of tcpdump Records Associated with Activity
Table 1 shows a handful of tcpdump records to convey the general "flavor" of the activity. The source and destination hosts are underlined. These are the first ten records associated with the activity on June 29th; four different source hosts are involved in scanning ten different destination hosts.
The timestamps associated with the records should be regarded with caution. The sensor that captured these records runs Red Hat Linux 7.1 with a packet-capturing mechanism known as turbopacket compiled into the kernel. Turbopacket is intended to provide more efficient buffering, but it also appears to lose timestamp precision: timestamps should have microsecond fidelity, yet these appear to have only 10-ms resolution.
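One way to check a capture’s apparent timestamp resolution is to find the largest step that divides every microsecond field. The sketch below assumes tcpdump’s default `HH:MM:SS.ffffff` timestamp format with six fractional digits; the function name and the sample timestamps are hypothetical.

```python
import math
from functools import reduce


def apparent_resolution_us(timestamps):
    """Largest microsecond step that divides every timestamp's fractional part.

    Assumes tcpdump-style strings with six fractional digits, e.g.
    "12:00:01.340000". True microsecond timestamps usually give 1; a result
    of 10000 (or a multiple of it) is consistent with 10-ms clock ticks.
    """
    micros = [int(ts.rsplit(".", 1)[1]) for ts in timestamps]
    step = reduce(math.gcd, micros, 0)
    return step if step else 1_000_000  # every fraction was zero: whole seconds


samples = ["12:00:01.340000", "12:00:01.350000", "12:00:02.010000"]
print(apparent_resolution_us(samples))  # 10000
```

A handful of records can share a larger step by coincidence (for example, all multiples of 20,000), so the check is more meaningful over many records.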
DDoS or Scan?
At first, it was not apparent whether this was an attempted DDoS or a coordinated scan of some sort. During the examination of the activity, we were fortunate (from the analysis perspective) to receive additional, remarkably similar activity on July 2, 2001 at 16:00. Individual fields in the received packets of both sets of activity were methodically analyzed to interpret the nature and intent of the activity.
In the first scan, 132,706 total packets were sent by 314 unique source hosts. Of those hosts, only 17 (~5.4%) did not have DNS-registered hostnames. In the second scan, 157,842 total packets were sent by 295 unique source hosts, only 24 (~8.1%) of which had unresolved hostnames. This alone is quite telling. A source IP number either does or does not reflect the genuine host that sent the traffic. If it reflects the actual sender, no subterfuge is involved; if it does not, a spoofed source IP number has been placed in the packet.
Typically, when source IP numbers are spoofed in a flood, they are randomly generated. Other attacks may use a selection of one or more source IP numbers as decoys or as an eventual target of some kind. When the source host reflects the true sender, the sender most likely wants to receive a response to the traffic.
It therefore appears that the activity seen here used genuine source IP numbers. If this were a flood with source IPs spoofed from randomly generated numbers, it is statistically unlikely that these IP numbers would resolve to hostnames 91.9% to 94.6% of the time. It would also be unusual to spoof from a predetermined set of IP numbers chosen because they resolve to hostnames, since this takes a lot of effort for little or no gain.
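The resolution rate can be measured with one reverse DNS lookup per source IP number. This is a minimal sketch: `resolved_fraction` is a hypothetical helper built on Python’s `socket.gethostbyaddr`, which raises `socket.herror` when no hostname is registered. The resolver is a parameter so that a stub can be substituted offline.

```python
import socket


def resolved_fraction(ips, resolver=socket.gethostbyaddr):
    """Fraction of IP numbers for which a reverse-DNS hostname is found."""
    if not ips:
        return 0.0
    resolved = 0
    for ip in ips:
        try:
            resolver(ip)
            resolved += 1
        except (socket.herror, socket.gaierror):
            pass  # no registered hostname (or the lookup failed)
    return resolved / len(ips)


# Resolution rates implied by the unresolved counts reported for each scan.
print(f"{(314 - 17) / 314:.1%}")  # June 29: 94.6%
print(f"{(295 - 24) / 295:.1%}")  # July 2: 91.9%
```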
Because of the sheer number of source hosts involved, we speculate that they are most likely zombie hosts that have somehow been exploited and "owned."
The number of packets sent by each source host ranged from 1 to 1531, with an average of 422.63 packets, in the first scan, and from 1 to 1484, with an average of 535.07 packets, in the second scan. These figures include retries of the same source-to-destination connection when no response was received. The source hosts were located all over the world. Figure 2 shows the top five scanning source networks, many of which are cable modem or DSL providers. This supports the zombie-host hypothesis, since home users are more likely than most commercial or larger networks to be unaware of security threats and to lack perimeter protection.
Figure 2. Top Five Scanning Networks
The analysis then moved to the destination hosts to look for further evidence of a scan. The scanned network is a Class B with 65,536 possible IP numbers. The first set of activity scanned 32,367 unique destination hosts and the second set scanned 36,638. A cursory check for missing Class C destination subnets revealed that 21 subnets were not targeted in the first scan and 17 were missing from the second, and the missing subnets differed between the two scans. This is important because our initial reaction to the missed subnets was that some prior reconnaissance had been performed to target active Class C subnets directly. This does not appear to be true, since some of the missing subnets are live, populated subnets on our network. Additionally, had there been successful prior mapping, there should have been no scanning of IP numbers that are not live, yet many IP numbers with no associated live host were scanned.
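Finding the untouched Class C subnets amounts to comparing the set of /24 networks seen as destinations against all 256 /24s in the /16. A minimal sketch using Python’s `ipaddress` module; the function name is hypothetical and the 172.16.0.0/16 default is a placeholder for the scanned network.

```python
import ipaddress


def missing_class_c(dest_ips, class_b="172.16.0.0/16"):
    """List the /24 (Class C-sized) subnets of a /16 that received no probes."""
    network = ipaddress.ip_network(class_b)
    # Collapse every destination IP to the /24 that contains it.
    seen = {ipaddress.ip_network(f"{ip}/24", strict=False) for ip in dest_ips}
    return [subnet for subnet in network.subnets(new_prefix=24) if subnet not in seen]


# Example: one probe per /24 except 172.16.7.0/24.
dests = [f"172.16.{i}.1" for i in range(256) if i != 7]
print(missing_class_c(dests))  # [IPv4Network('172.16.7.0/24')]
```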
A more plausible explanation for the missing destination subnets and hosts is that the zombie or zombies assigned to scan those subnets were somehow inactive or unresponsive during the scan and did not participate. A single missing destination host in an otherwise scanned subnet may be attributed to a dropped initial packet rather than an omitted destination IP number.
Had this been a flood, the attacker would likely have randomized the destination IPs so that they would be unpredictable and hence less easily blocked by the defending site. Figure 3 reveals that while most destination hosts were scanned by a single unique source host, some were scanned by multiple source hosts. The scan appears to build in some redundancy to improve the chance of eliciting a response.
Figure 3. Number of Unique Scanning Source Hosts per Destination Host
In both scans, each source host had a range of destination hosts that it was responsible for scanning. In the first scan, each source host was assigned between 2 and 450 destination hosts, with an average of 134. In the second scan, the range was 1 to 569 destination hosts, with an average of 163.
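The per-destination source counts plotted in Figure 3 and the per-source assignment ranges can both be computed from (source, destination) pairs. The sketch below is our own illustration with a hypothetical function name; it counts unique hosts, so retries do not inflate the totals.

```python
from collections import defaultdict
from statistics import mean


def fan_in_fan_out(pairs):
    """Summarize a non-empty iterable of (source, destination) probe pairs.

    Returns (histogram, (min, max, mean)) where histogram maps "number of
    unique sources that probed a destination" -> "number of destinations",
    and the tuple describes unique destinations per source.
    """
    sources_per_dst = defaultdict(set)
    dsts_per_src = defaultdict(set)
    for src, dst in pairs:
        sources_per_dst[dst].add(src)
        dsts_per_src[src].add(dst)
    histogram = defaultdict(int)
    for srcs in sources_per_dst.values():
        histogram[len(srcs)] += 1
    counts = [len(d) for d in dsts_per_src.values()]
    return dict(histogram), (min(counts), max(counts), mean(counts))


pairs = [("a", "x"), ("a", "y"), ("b", "y"), ("b", "z"), ("a", "x")]
print(fan_in_fan_out(pairs))  # ({1: 2, 2: 1}, (2, 2, 2))
```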
Another indication of a scan rather than a flood was the scanning rate of the source hosts. Both scans sustained some activity for five or six minutes; however, the ramp-up was fast, with a burst of activity during the first two minutes. Figures 4.a and 4.b illustrate the packets per minute.
Bandwidth consumption was computed as follows. Each packet was a SYN packet with TCP options and no data or payload. Most packets had a length of 48 bytes; a few were longer, and a few were 4 bytes shorter, depending on the number and types of TCP options used. Packets had a standard 20-byte IP header with no IP options. Because the majority of packets were 48 bytes long, that length was used for the bandwidth computation. Since throughput is measured in bits per second, each packet was counted as 384 (48 * 8) bits.
The first scan reached a peak rate of 1.7 Mbps and the second a peak rate of 2.4 Mbps. This did not adversely affect us, but a site with a smaller ingress pipe, such as a T-1 with 1.544 Mbps capacity, may have suffered a temporary denial of service as a side effect of the scan. Figures 5.a and 5.b show the bits per second during the peak scan minutes.
Figure 5.a. Bits per Second June 29
Figure 5.b. Bits per Second July 2
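The bits-per-second computation can be sketched as follows, assuming every packet is the 48-byte (384-bit) SYN described above and that timestamps are given as seconds. The function names are hypothetical, and the sketch ignores link-layer framing overhead, so it slightly understates actual line usage.

```python
from collections import Counter

BITS_PER_PACKET = 48 * 8  # 48-byte SYN (20 IP + 20 TCP + 8 option bytes) = 384 bits
T1_BPS = 1_544_000        # nominal T-1 line rate


def bits_per_second(timestamps):
    """Bucket packet timestamps (float seconds) into whole seconds -> bits/s."""
    per_second = Counter(int(ts) for ts in timestamps)
    return {sec: n * BITS_PER_PACKET for sec, n in sorted(per_second.items())}


def seconds_over_t1(timestamps):
    """Seconds in which the scan traffic alone would exceed a T-1's capacity."""
    return [sec for sec, bps in bits_per_second(timestamps).items() if bps > T1_BPS]


# 4021 packets in second 0 exceed a T-1; 4020 packets in second 1 do not.
stamps = [0.5] * 4021 + [1.5] * 4020
print(seconds_over_t1(stamps))  # [0]
```

At 384 bits per packet, the 1.7 Mbps and 2.4 Mbps peaks correspond to roughly 4,430 and 6,250 packets per second, while a T-1 is saturated above about 4,020 packets per second.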
Looking at Figures 5.a and 5.b together, it is apparent from the general contours that the scanning rates for both scans were very similar. In fact, both scans reached their peak rates exactly 21 seconds after the scan began. This strongly suggests coordination by a "commander" who allocated scanning assignments and rates to the zombies.
Peak rates could have occurred either because more hosts were scanning during that second or because individual hosts sent more packets. Further scrutiny of the data revealed that the peaks and valleys correlated with the number of scanning hosts. A per-second breakdown of the June 29th scan showed a minimum of 12 and a maximum of 222 active source hosts in any second. The same breakdown for July 2nd showed a minimum of 2 and a maximum of 249.
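The per-second breakdown and its relationship to the packet rate can be sketched as below. The text does not say which correlation measure was used; Pearson’s coefficient here is our own choice, and both function names are hypothetical. Records are assumed to be (timestamp in seconds, source IP) pairs.

```python
import math
from collections import defaultdict


def per_second_activity(records):
    """Map each whole second to (packet_count, distinct_source_count)."""
    packets = defaultdict(int)
    sources = defaultdict(set)
    for ts, src in records:
        sec = int(ts)
        packets[sec] += 1
        sources[sec].add(src)
    return {sec: (packets[sec], len(sources[sec])) for sec in sorted(packets)}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length, non-constant sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


records = [(0.1, "a"), (0.2, "b"), (1.1, "a"), (2.0, "a"), (2.5, "b"), (2.7, "c")]
activity = per_second_activity(records)
pkts = [p for p, _ in activity.values()]
hosts = [h for _, h in activity.values()]
print(activity, pearson(pkts, hosts))
```

A coefficient near 1 would indicate, as observed, that the peaks track the number of active scanning hosts rather than a change in per-host sending rate.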
In addition, examining how many seconds each individual scanning host was active, and over what duration, exposed that scanning hosts did not necessarily scan during consecutive seconds, nor did they all scan for the same number of seconds. In the June 29th scan, any given source host scanned for anywhere from 1 to 80 seconds; in the July 2nd scan, from 3 to 88 seconds.
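Per-host activity of this kind can be summarized with a short sketch. It is our own illustration (the function name is hypothetical) and assumes (timestamp in seconds, source IP) records.

```python
from collections import defaultdict


def host_activity(records):
    """For each source, return (seconds_active, span_seconds, contiguous).

    seconds_active counts distinct whole seconds with at least one packet;
    span_seconds runs from the first to the last active second inclusive;
    contiguous is True only if no idle seconds fall inside that span.
    """
    seconds = defaultdict(set)
    for ts, src in records:
        seconds[src].add(int(ts))
    report = {}
    for src, secs in seconds.items():
        span = max(secs) - min(secs) + 1
        report[src] = (len(secs), span, len(secs) == span)
    return report


records = [(0.1, "a"), (1.2, "a"), (2.3, "a"), (0.5, "b"), (3.5, "b")]
print(host_activity(records))  # {'a': (3, 3, True), 'b': (2, 4, False)}
```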
What does this all mean? It means that a great deal of thought was applied to the scanning algorithm. The scan was not a haphazard affair of scanning hosts unpredictably examining destination hosts. There appears to be a formula for the number of destination hosts assigned to each scanning host, as well as an assigned time frame and frequency for each source host to scan. While it is easy to explain that source hosts are assigned ranges of destination hosts for redundancy and efficiency, the reason for the timing of the scans is not so easy to explain.
Network latency cannot explain the time gaps in scanning by source hosts; if it did, the peak scan rates of the two scans would not have occurred at exactly the same offset. The timing parameters look deliberate rather than happenstance. Yet the benefit of such prearranged timing is not obvious.
What Kind of Hosts Were Involved?
The working assumption now is that the zombie hosts have been "infected" with some malware that generates the scanning activity. The question then becomes whether a specific operating system has been exploited and transformed into a zombie for this scan. An examination of "passive fingerprints" can help identify the zombies’ operating systems.
Passive fingerprinting categorizes operating systems by looking at distinctive field values in the packets they send. The notion of operating system fingerprinting has been explored and used most successfully by Fyodor, the author of the scanning tool nmap. nmap can identify a remote host’s operating system, and often its version, with a very high degree of accuracy. Unlike passive fingerprinting, it works actively: nmap uses nine tests that send a series of normal and mutant traffic to the destination host, and it analyzes all responses for fingerprints that distinguish different operating systems.
Different operating systems choose distinctive default values for certain fields such as Time to Live (TTL), TCP window size, and TCP options. Other fields, such as the Type of Service (TOS) byte and the Don’t Fragment (DF) flag, can also be examined. But because most operating systems use a default TOS value of 0 and set the DF flag, these fields can single out only the small percentage of anomalous packets sent by more esoteric operating systems, and they are best examined in conjunction with other fields rather than alone. In the first scan, 97.4% of packets had a TOS value of 0 and 96.0% had the DF flag set. In the second scan, 97.5% of packets had a TOS value of 0 and 94.9% had the DF flag set.
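The TOS, DF, and TTL values used for passive fingerprinting all sit at fixed offsets in the IPv4 header. A minimal sketch with Python’s `struct` module; the function name is hypothetical and the sample header bytes are made up for illustration.

```python
import struct


def ip_fingerprint_fields(ip_header: bytes):
    """Extract TOS, the DF flag and TTL from a raw IPv4 header (>= 20 bytes)."""
    if len(ip_header) < 20:
        raise ValueError("IPv4 header is at least 20 bytes")
    # version/IHL, TOS, total length, identification, flags/fragment offset, TTL
    version_ihl, tos, _total_len, _ident, flags_frag, ttl = struct.unpack(
        "!BBHHHB", ip_header[:9]
    )
    if version_ihl >> 4 != 4:
        raise ValueError("not an IPv4 header")
    df = bool(flags_frag & 0x4000)  # Don't Fragment is bit 14 of the flags/offset word
    return {"tos": tos, "df": df, "ttl": ttl}


# Made-up header: TOS 0, total length 48, DF set, TTL 117, protocol 6 (TCP).
hdr = (bytes([0x45, 0x00]) + struct.pack("!HHH", 48, 0x1234, 0x4000)
       + bytes([117, 6, 0, 0, 192, 0, 2, 1, 172, 16, 0, 1]))
print(ip_fingerprint_fields(hdr))  # {'tos': 0, 'df': True, 'ttl': 117}
```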
The chart in Table 2, provided by the Honeynet Project, was used to determine some of the scanning hosts’ operating systems. The highlighted lines represent the operating systems and associated fingerprints of the majority of the scanning hosts observed in this activity.
Table 2. Fingerprinting Values by Operating System
Arriving TTL Values
The arriving TTL values can help identify the scanning host’s operating system. Different operating systems use different initial TTL values when sending a packet. Each router through which the packet travels on its journey from source to destination decrements the TTL value by 1, so the difference between the initial and arriving values indicates the number of "hops" the packet has traveled. If a router decrements the TTL to 0, it discards the packet and sends an ICMP "time exceeded in-transit" error message back to the sending host, informing it that the packet has exceeded its welcome on the Internet. This mechanism discards lost packets, such as ones caught in a routing loop.
Initial TTLs of many operating systems have typical values of 32, 64, 128, and 255. These may differ per protocol (TCP, UDP, ICMP). For instance, Windows NT 4.0 Service Pack 6 uses an initial TTL of 128 for TCP and 32 for ICMP. Fortunately, this examination is limited to TCP, so there is no need to account for protocol differences. The arriving TTL values are examined and the initial TTL values are "guessed." The caveat is that while most operating systems are configured with the default initial TTL values, these can be changed. All that can be determined with absolute certainty from an arriving TTL is that it is no greater than the initial TTL. The average hop count of a packet traveling on the Internet is 16.
Examination of Figure 6.a for June 29, 2001 shows three clusters of arriving TTL values. More specifically, the closest scanning host appears to be eight hops away and the most distant 25 hops away from the capturing sensor interface. The assumption is that the scanning hosts had initial TTL values of 128, 64, and 32, and that each arriving TTL value is associated with the initial TTL value that exceeds it by the least amount. For instance, an arriving TTL of 50 is assumed to have an initial TTL of 64 rather than 128, although either initial value would be valid.
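The "least amount" rule can be written as a small lookup. This sketch is our own (the function name is hypothetical); it includes 255 among the candidate initial values even though no such cluster appeared here, and like the rule itself it is only a heuristic, since initial TTLs can be reconfigured.

```python
COMMON_INITIAL_TTLS = (32, 64, 128, 255)


def guess_initial_ttl(arriving_ttl: int):
    """Return (assumed initial TTL, implied hop count) for an arriving TTL.

    Chooses the smallest common initial TTL that is >= the arriving TTL,
    i.e. the initial value that exceeds the arriving value by the least amount.
    """
    for initial in COMMON_INITIAL_TTLS:
        if arriving_ttl <= initial:
            return initial, initial - arriving_ttl
    raise ValueError("TTL out of range")


print(guess_initial_ttl(50))   # (64, 14)
print(guess_initial_ttl(117))  # (128, 11)
```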
In the June 29th scan, the largest percentage of scanning hosts, 92.13%, had an initial TTL of 128. Over 37% of these hosts were approximately 11 to 13 hops away from the sensor. According to Table 2, an initial TTL value of 128 is indicative of Windows 9x/NT/2000. An initial TTL value of 32 indicates Windows 9x/NT; these hosts comprised 2.66% of the scanning hosts. An initial TTL value of 64 is associated with many Unix platforms, including the Linux 2.2.x kernel; 5.2% of the hosts had this initial TTL.
Examination of Figure 6.b for July 2, 2001 shows the same clustering. Here the closest scanning host appeared to be 8 hops away and the most distant 27 hops away from the capturing sensor interface.
In the July 2nd scan, the largest percentage of scanning hosts, 92.29%, had an initial TTL of 128, and over 37% of these were approximately 11 to 13 hops away from the sensor. Another 2.36% of the scanning hosts had an initial TTL of 32, and 5.35% had an initial TTL of 64.
The conclusion is that the scanning hosts are not exclusively Windows hosts, although Windows hosts appear to make up the majority. This suggests that whatever malware is exploiting the scanning hosts is not exclusive to Windows.
Figure 6.a. June 29, 2001 Arriving TTL Values
Figure 6.b. July 02, 2001 Arriving TTL Values
TCP Window Size
A host advertises its TCP window size when it attempts to make an initial connection. The window size is a dynamic value that changes as data is exchanged between hosts and represents the receive buffer space currently available for incoming data. The window allows the sender to transmit multiple segments without waiting for an acknowledgment of each, with the receiver queuing them before passing the data to the application. The initial window size can be used to fingerprint the operating system: a user or administrator can customize it, but the default is commonly used.
The bulk of the connections had an initial window size of 8192, which Table 2 associates with Windows 9x/NT. While Table 2 has no entry for a window size of 16384, research found that it is associated with Windows 2000. Table 2 suggests that a window size of 65535 is associated with Cisco; however, the high percentage of scanning hosts with this window size suggests that it covers other operating systems as well.
Internet searches failed to find any operating system associated with a window size of 65535. We then examined a week’s collection of tcpdump data for our site to find hosts with a window size of 65535. Only a dozen of approximately 5,500 hosts were found, and an nmap scan could not determine their operating systems. Some of these hosts had ports such as 135 and 139 open, which would suggest Windows versions prior to Windows 2000. Others had port 445 listening; this port was introduced in Windows 2000 to support Server Message Block (SMB) running directly over TCP/IP without the intermediate layer of NetBIOS over TCP/IP (NBT). Still other hosts with a window size of 65535 listened on ports 111 (portmapper), 515 (line printer daemon), and 6000 (X11), which are all associated with Unix hosts. Based on these findings, no conclusion could be reached about the operating system associated with a window size of 65535.
Other distinctive window sizes included 32120, associated with Linux, which was found only in the June 29th scan and accounted for 0.19% of the scanning hosts. A window size of 8760, which reflects a Solaris host, was seen in both scans: 5.21% of hosts in the first scan and 6.60% in the second had this window size. Yet according to Table 2, Solaris hosts have an initial TTL of 255, and no packets were seen with arriving TTL values over 128. This means that either these Solaris hosts had altered initial TTLs, or their packets traveled at least 127 hops (highly unlikely), or the window size of 8760 reflects operating systems other than Solaris.
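Cross-checking a window-size hint against the guessed initial TTL is what exposes the 8760 inconsistency. The sketch below encodes only the associations discussed in the text (65535 deliberately left unattributed); the table names and the function are hypothetical, and the initial TTL is assumed to have been guessed already from the arriving TTL.

```python
# Window-size associations as discussed in the text; 65535 is left unattributed.
WINDOW_HINTS = {
    8192: "Windows 9x/NT",
    16384: "Windows 2000",
    32120: "Linux",
    8760: "Solaris",
    65535: "undetermined",
}

# Initial TTLs the text associates with each family.
EXPECTED_TTLS = {
    "Windows 9x/NT": (128, 32),
    "Windows 2000": (128,),
    "Linux": (64,),
    "Solaris": (255,),
}


def window_ttl_check(window: int, initial_ttl: int):
    """Return (OS hint, consistent) where consistent is None if unknowable."""
    hint = WINDOW_HINTS.get(window, "unknown")
    expected = EXPECTED_TTLS.get(hint)
    consistent = None if expected is None else initial_ttl in expected
    return hint, consistent


print(window_ttl_check(8760, 128))  # ('Solaris', False)
```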
Examining the TCP window size leads to the same conclusion as examining the arriving TTL values. As Figure 7 shows, most of the scanning hosts have a window size associated with Windows, yet operating systems other than Windows also appear to be involved in the scanning.
Figure 7. TCP Window Sizes
Another interesting field for examination is the Maximum Segment Size (MSS), found in the TCP options. The MSS is the maximum amount of payload a TCP segment can carry, excluding the TCP and IP headers. Generally speaking, the MSS is 40 bytes less than the Maximum Transmission Unit (MTU), assuming a 20-byte IP header with no IP options and a 20-byte TCP header with no TCP options. The MTU can then be used to infer the media to which the sending host is connected.
The MTU, and hence the MSS, may reflect the path MTU. The sender may send a "discovery" packet with the DF flag set to find the smallest MTU between source and destination. If no ICMP error messages are returned, it is assumed that packaging packets at the size of the local MTU will not cause fragmentation. If an ICMP "unreachable – need to frag (mtu ##)" error message is returned, it contains the MTU size (##) of the link that is smaller than the local MTU, and the sender can decrease the size of its packets to avoid fragmentation. The point is that the MSS may not reflect the local MTU. However, since there is no indication of discovery packets or of path MTU discovery being used, the assumption is that the MSS does reflect the local MTU.
Figure 8 reveals that the greatest percentage of scanning hosts resided on a link with an MTU of 1500. This is indicative of Ethernet found in LAN connections or DSL. The MTU of 576 is associated with PPP or ISDN. Finally, the MTU of 1454 is associated with PPP over Ethernet that is also found on DSL connections.
Figure 8. Percentage of Scanners With MSS/MTU Values
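Extracting the MSS means walking the TCP options list, then adding 40 bytes to recover the MTU. A minimal sketch; the function names are hypothetical and the MTU-to-link table holds only the associations given in the text, not an exhaustive list.

```python
def mss_from_options(options: bytes):
    """Walk TCP options and return the MSS value, or None if absent.

    Kind 0 = end of option list, kind 1 = NOP (one byte), kind 2 = MSS (length 4).
    """
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == 0:
            break
        if kind == 1:
            i += 1
            continue
        if i + 1 >= len(options):
            break  # truncated option
        length = options[i + 1]
        if length < 2:
            break  # malformed length; stop rather than loop forever
        if kind == 2 and length == 4 and i + 4 <= len(options):
            return int.from_bytes(options[i + 2:i + 4], "big")
        i += length
    return None


# MTU-to-link associations from the text (MTU = MSS + 40).
LINK_BY_MTU = {1500: "Ethernet (LAN) or DSL", 1454: "PPP over Ethernet (DSL)", 576: "PPP or ISDN"}


def link_type(mss: int):
    mtu = mss + 40  # 20-byte IP header + 20-byte TCP header, no options
    return mtu, LINK_BY_MTU.get(mtu, "unknown")


# MSS 1460, NOP, NOP, SACK-permitted: an 8-byte option block.
opts = bytes([2, 4, 0x05, 0xB4, 1, 1, 4, 2])
print(link_type(mss_from_options(opts)))  # (1500, 'Ethernet (LAN) or DSL')
```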
Although an MSS of 536 is associated with PPP and dial-up modems, we suspect that most of these hosts reside on ISDN, which uses the same MSS. The scenario is that these are all zombie hosts directed to perform some activity at a given time. Either they respond to a catalyst, or they all have some kind of time synchronization and are directed to act at a given time.
The idea of participants on dial-up modems is worth some reflection. First, a zombie on a dial-up connection may not have a sustained connection unless it has some kind of dedicated phone line. Additionally, many dial-up connections are assigned IP numbers dynamically, leased for a limited period of time. How would the "commander" direct a zombie with a changing IP number to launch the activity? One guess is that the zombies periodically report home to the "commander," so that only those that are active and online just before the attack are directed to participate.
Another question arises from this discussion. It has already been determined that zombies were assigned mostly unique address ranges to scan. Is some kind of formula used to assign these ranges so that the maximum number of hosts gets scanned?
Our suspicion is that most of the participating zombies had sustained, dedicated Internet connections. That, however, does not adequately explain the missing destination hosts and subnets.
In TCP, when a source host attempts to connect to a destination host unsuccessfully and gets no indication of the failure, it will make one or more retries. A source host will not be notified of a failure if the connection packet never reaches the destination or if the destination host’s response never gets back to the source. In the case of our scanned network, most of the activity to port 27374 was blocked, and the firewall that blocks it "silently" drops the packet without returning an ICMP error message to the original source host. The purpose of the silent drop is to avoid disseminating additional reconnaissance about our network perimeter and defenses.
If a packet-filtering device that blocks traffic has not been configured to suppress ICMP error messages, it will deliver an ICMP "unreachable - admin prohibited" message back to the sending host. This ICMP error message contains enough information for a savvy scanner to determine exactly what traffic the packet-filtering device is blocking.
For the purposes of this investigation, a TCP retry is defined as a packet with the same source and destination hosts, ports, and TCP sequence number as the initial attempt. The number of successive retries and the back-off time between them depend on the TCP/IP stack.
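Under that definition, retries can be counted by grouping SYNs on the five identifying fields. A minimal sketch with a hypothetical function name, assuming each SYN has already been reduced to a (source, source port, destination, destination port, sequence number) tuple.

```python
from collections import Counter


def count_retries(syns):
    """Map each connection key to its number of retries (attempts minus one).

    syns: iterable of (src, sport, dst, dport, seq) tuples, one per SYN seen.
    Keys seen only once (no retry) are omitted.
    """
    attempts = Counter(syns)
    return {key: n - 1 for key, n in attempts.items() if n > 1}


syns = [("192.0.2.1", 1025, "172.16.0.9", 27374, 100)] * 3 + [
    ("192.0.2.2", 1026, "172.16.0.10", 27374, 200)
]
print(count_retries(syns))  # {('192.0.2.1', 1025, '172.16.0.9', 27374, 100): 2}
```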
Retries indicate source code that uses ordinary socket connections. In other words, the code is written so that its socket calls pass through the proper layers of the TCP/IP stack; here, the TCP and IP layers form the appropriate headers and fill in their values.
The alternative is a raw socket, which does not use the TCP/IP stack to form the packet. Instead, the programmer is responsible for supplying the appropriate headers and data, and the packet is written directly to the network interface. Many scanners, such as nmap and hping2, use raw sockets.
This scan manifested multiple retries when the destination host was unresponsive. What does the use of regular rather than raw sockets signify? First, the scanning host really wanted to maximize the opportunity to elicit a response from the destination host. Second, raw sockets add a level of complication, since they require installing a packet-capture application programming interface on the scanning host: winpcap for Windows or libpcap for Unix. Using standard sockets simplifies the set-up required to scan.
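For contrast with raw-socket scanners, a connection attempt through the ordinary stack needs nothing beyond the standard socket API. The sketch below is a generic illustration, not the zombies’ code; the `probe` name and its timeout are hypothetical. The kernel builds the SYN and, if no reply arrives, retransmits it on its own OS-dependent schedule until the timeout expires.

```python
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> bool:
    """Attempt a full TCP connect using an ordinary stream socket.

    No raw-socket privileges or packet-capture library are required; SYN
    construction and retransmission are handled entirely by the OS stack.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        try:
            s.connect((host, port))
            return True
        except OSError:  # refused, timed out, unreachable, ...
            return False
```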
The determination is that this was a very efficient scan looking for hosts listening on TCP port 27374. The scan was conducted by zombie hosts, mostly Windows hosts. Hosts with other operating systems appear to have been involved as well, although they made up only a small percentage of the scanning hosts. The significance is that the means of infecting the zombie hosts does not appear to be Windows-specific. It is unknown whether the proportion of Windows to non-Windows scanning hosts mirrors the proportion of Windows to all other operating systems found on the Internet. If it does, the operating systems of the zombie hosts may simply reflect the general distribution of operating systems on the Internet.
Is the sole purpose of this scan to efficiently identify hosts listening on port 27374? It can be surmised that not all of the zombie hosts were exploited by the SubSeven Trojan, since SubSeven is a Windows-based Trojan and not all the zombie hosts appeared to be Windows hosts. Perhaps SubSeven variants have been developed for other operating systems as well. Whatever exploit was used to "own" the zombies, the "commander" already knew about the owned zombie hosts and had no need to scan for them. Is it possible that the purpose of this scan was to find other candidate zombies "owned" by another "commander"? This assumes that these new zombie hosts would be Windows-based, since they would be listening on the SubSeven port. The new zombies may be used for activity other than the scanning witnessed at our site.
Whatever the purpose of this scan, it appears to be a sophisticated way to maximize scan coverage: in a couple of minutes, over 30,000 destination hosts were scanned. This activity demonstrates the growing sophistication of zombie activity and the burgeoning number of exploited hosts that can be marshaled into active duty.
Appendix A. Summary of Activity by Date
Table 4. Summary of Activity
Intrusion Detection FAQ: SubSeven Trojan v1.1. http://www.sans.org/resources/idfaq/subseven.php
w32.LeaveWorm. http://www.incidents.org/react/w32leaveworm.php
W. Richard Stevens. TCP/IP Illustrated, Volume 1. Addison-Wesley, 1994.
"Assessing average hop count of a wide area Internet packet." http://www.nlanr.net/NA/Learn/wingspan.html
Description of Windows 2000 TCP. http://support.microsoft.com/support/kb/articles/Q224/8/29.ASP