Thought Leaders

Andrew Hay, Q1 Labs

Stephen Northcutt - May 13th, 2008

Andrew Hay, one of the authors of the popular OSSEC Host-Based Intrusion Detection Guide and the upcoming Nagios 3 Enterprise Network Monitoring book, has agreed to be interviewed for the SANS Security Thought Leader series, and we certainly thank him for his time.

Andrew, how did you first get into security, what was the driver?

I was one of the many people laid off from Nortel Networks in Ottawa, Ontario, Canada. I must have applied to at least 500 different jobs in various locations throughout Canada, the United States, Europe, and Australia. No one wanted me. The problem with being laid off by Nortel is that, typically, you’re not the only person. In fact, I was one of a few thousand people laid off, all looking for the same (any) job.

I received a call from a company that had been contracted by Nokia to find some people to work front-line firewall and network support. I jumped at the opportunity and within a week I was working as a contractor at Nokia. Since I had little security experience there was a steep learning curve, but Nokia provided exceptional training for Nokia IPSO (the routing platform), Nokia IP Series appliances (their hardware), and Check Point VPN-1/Firewall-1 (the bundled firewall package).

While working at Nokia, I made a point of learning everything I could about the products I supported. I also ensured that I obtained the certifications for the training I received in order to make myself stand out from the rest of my coworkers. Within 8 months, a record at the time I might add, I was hired by Nokia to work full-time. Even though I was hired into the job I made sure not to stop learning. I felt my routing and switching knowledge was weak, so I paid -- out of pocket -- for a CCNA prep-course, and subsequent exam. Customers were calling in having problems with their Cisco to Check Point VPNs, so I bought books on Cisco PIX and Cisco VPN Concentrators and learned how to troubleshoot VPN-related issues.

By this time, I was hooked on security. At first, I tried to read as much as I could on security topics to make me better at my job. The more I read the more I realized that I was genuinely interested in all facets of security, even those that didn’t relate directly to my current role. I started teaching a CompTIA Security+ prep-course, based on my own course content, through a local business to give back to the community. The funny thing was that most of my students were in the same boat that I was in before getting hired at Nokia.

I also started doing some consulting on the side for Cisco and Check Point issues. This helped me learn quite a bit about working with government organizations and subcontracting through larger, more established consultancy firms. In 2004, after speaking with two friends at Nokia, we decided to form a business to help add credibility to our consulting engagements and help limit the taxes that could be taken from us. This is how Koteas Corporation was formed. Even though we didn’t perform a large volume of work due to our full-time jobs, our customers consistently returned to us when they needed help or advice.

Awesome, and what is your primary focus today?

In February 2006, I accepted a third-level support role at Q1 Labs Inc. in Fredericton, New Brunswick, Canada. I came in as the "security guy" and was again faced with another steep learning curve. This time, however, I had to learn about intrusion detection and network attacks as they related to QRadar, our Network Security Management (NSM) solution. In the first year, I went home with a head full of new and exciting security information every day. So much information, in fact, that often I would lie awake at night trying to finish processing everything I had learned.

In my second year I was offered the opportunity to lead the team of software engineers who were responsible for integrating event and vulnerability information into QRadar. I jumped at the chance. In three months, I learned more about applications and platforms than I ever knew before.

Currently, I am in my third year at Q1 Labs and am the Integration Services Program Manager. My day-to-day duties involve researching new vulnerability scanner, patch management, and application/platform event-emitting technologies. I can honestly say that I love getting up in the morning and arriving at work. I bet not everyone can say that about his or her job.

I think this is a wonderful time to be interested in intrusion detection; a lot of folks have turned a blind eye, even though as Dr. Eric Cole puts it, effective configuration is ideal, but detection is a must. I guess the biggest focus today is on botnets, can you share a bit about the techniques one uses to detect botnets?

When asked by Warren Lee for my take on the December 2007 "Bot Roast" arrests by the FBI [1] I told him what I told everyone: botnets can only be effectively detected by using advanced flow and log correlation. A lot of people might disagree with me on that statement but I stand behind it.

OK, what is the first reason you feel botnets can only be detected using flow and log correlation?

If you rely purely on flow technology you will have to deploy your flow collectors at every network intersection so that every flow can be recorded and subsequently inspected. I'm not saying that this is a bad thing to do...far from it. In fact I think the world would be a better place if everyone could monitor every flow from every part of the network. The reality, however, is that deploying flow collectors to every corner of your network might not be financially, or architecturally, feasible. Do your switches support SPAN capabilities so that you can passively monitor traffic on a specific port or VLAN? Do you have room in the rack for another system? Do you have budget to deploy flow collectors to all of your office locations or just in the main office?

Sure thing, actually this is the intrusion detection problem as well, the most valuable real estate in a data center seems to be monitoring locations, so by flow, just to be sure, you are talking about NetFlow, yes?

Well, NetFlow is one well-known "flow" technology, but when I refer to a "flow" I’m generically talking about a full communication session between two specific IPs on your network. A full, bi-directional flow will show you exactly what is happening between the two systems. How cool is it that you’re able to read the email of the employee sending your confidential information to a competitor? That’s a rhetorical question. It’s really cool!

However, if deploying multiple flow collectors isn’t an option, you could use a technology like NetFlow, sFlow, jFlow, or cFlow to report flows seen by your routers and switches. Unfortunately, using these flow technologies alone, which only give you the source IP, destination IP, source port, destination port, and protocol information, just won't cut it. To really see what is going on you need to be able to inspect the fully recombined session, including the application layer information, to accurately see what is happening between the hosts involved.
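As a rough illustration of what five-tuple flow data can and cannot tell you, the Python sketch below pairs unidirectional flow records into bidirectional sessions. The record shape (`FlowRecord`) and the helper names are assumptions made for this example, not the export format of NetFlow or any other product. Note that nothing in the result says what was actually transferred, only who talked to whom and how much.

```python
from collections import defaultdict
from typing import NamedTuple


class FlowRecord(NamedTuple):
    """Hypothetical unidirectional flow record: five-tuple plus a byte count."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str
    byte_count: int


def session_key(rec):
    """Return a direction-independent key so both halves of a conversation match."""
    endpoint_a = (rec.src_ip, rec.src_port)
    endpoint_b = (rec.dst_ip, rec.dst_port)
    return (rec.protocol,) + tuple(sorted([endpoint_a, endpoint_b]))


def pair_flows(records):
    """Group unidirectional records into bidirectional sessions with totals."""
    sessions = defaultdict(lambda: {"records": 0, "bytes": 0})
    for rec in records:
        entry = sessions[session_key(rec)]
        entry["records"] += 1
        entry["bytes"] += rec.byte_count
    return dict(sessions)
```

Calling `pair_flows` on a request record and its matching reply record yields a single session entry whose byte total covers both directions.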

OK, in the past few years a number of vendors have added flow analysis to their products, or so they say. Can you give us a war story, an example of an attack, or some botnet activity that can be detected by flow analysis?

One of the best examples of where flow analysis shines is network-based information theft. I recall one instance where a State Government Agency, using network flows, was able to detect the download of sensitive utility information to a location in Syria. Incorrect firewall policy implementation allowed the breach to occur but, due to their ability to analyze the network flows, they were able to see exactly what was being accessed, and subsequently stolen.

Another example would be the detection of policy violations, in the form of the transfer of inappropriate materials, across the company network. In this instance, a member of the IT staff was violating corporate policy by streaming inappropriate material from the server room. The security staff was able to reconstruct the inappropriate stream from the collected flow information (much to their dismay) and use it as evidence related to the policy violation.

Thanks Andrew, I am sure that is compelling enough for us to join the camp of believers in flow analysis. So you say flow analysis and logs, can you bring logs into the discussion?

Logs are great. I love logs almost as much as Dr. Anton Chuvakin (well maybe not that much). In fact we talk about logs, logging, and log analysis on a daily basis, blog about logging on our respective blogs, and even share a blog on the topic with some other industry giants who feel the same way.

Oh my! Forgive an off-topic moment please. As you know, Anton works for LogLogic and they had a promo piece that was a bumper sticker that says "I love logs" where love is the heart symbol. Somehow, I ended up with one of them. I live on Kauai, which is very nice, but the ocean is very dangerous here; we had three drownings just this week. There is one pretty safe place to swim called Lydgate Park. They have this big ring of rocks to create a safe pool. Tourists and locals can still swim in the ocean, but with no sharks, currents or other serious issues, and there is even a lifeguard tower. Only problem: it is near the mouth of one of the biggest river valleys, and when it rains hard, logs flow down the river out to the ocean and the waves push them over the rock wall into the pond. Lydgate literally fills up. Then volunteers get together and drag them out so we can swim again. One of the primary volunteers is a guy named Robin. The last time we did a big haul-a-thon, I gave him the bumper sticker. I can't say he was totally pleased with the gag gift. Anyway, anytime you say I love logs, you can now imagine what I am thinking . . . wait till you drag a few wet sandy ones out of the pond, then we can talk about love. But enough of that, back to detection. What do you do with the logs?

If you take in logs from your applications and hosts you will be able to see how the worm or bot is interacting with the infected or compromised system. Is the malware trying to enumerate remote Windows shares? Has the malware installed or started a backdoor application to allow remote access? Has the malware been designed to attack the core network infrastructure (e.g. routers and switches) once it has found a host to stage from?

The problem, however, is that relying only on logs from your applications, hosts, and devices might produce an unmanageable amount of logs. Not only will the number of logs be hard to manage by a human analyst but the variations in the generated logs are almost enough to drive you crazy!

A firewall accept log from a Cisco PIX firewall via UDP syslog protocol won't look like a firewall accept log from a Check Point Firewall-1 firewall via the OPSEC LEA protocol. The logs generated from a successful authentication to a Windows XP Professional workstation won't look anything like the logs generated from a successful authentication to a Red Hat Enterprise Linux 4 server.

By the time you figure out what is happening, based on the logs received, the worm or bot could have spread exponentially throughout your network.
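To illustrate why differing log formats slow an analyst down, and what normalization buys you, here is a minimal Python sketch that maps two firewall "accept" lines onto one common event structure. Both line formats (`vendor_a`, `vendor_b`) are invented for this example; they are not the actual Cisco PIX or Check Point formats, which would each need their own parser.

```python
import re

# Both line formats below are invented for illustration; real firewall logs
# differ by vendor and version and would need their own patterns.
PATTERNS = {
    "vendor_a": re.compile(
        r"ACCEPT (?P<proto>\w+) (?P<src>[\d.]+):(?P<sport>\d+) -> "
        r"(?P<dst>[\d.]+):(?P<dport>\d+)"
    ),
    "vendor_b": re.compile(
        r"action=accept;proto=(?P<proto>\w+);src=(?P<src>[\d.]+);"
        r"s_port=(?P<sport>\d+);dst=(?P<dst>[\d.]+);service=(?P<dport>\d+)"
    ),
}


def normalize(line):
    """Map a raw log line onto one common event dictionary, or None if unknown."""
    for source, pattern in PATTERNS.items():
        match = pattern.search(line)
        if match:
            return {
                "source_format": source,
                "action": "accept",
                "protocol": match["proto"].lower(),
                "src_ip": match["src"],
                "src_port": int(match["sport"]),
                "dst_ip": match["dst"],
                "dst_port": int(match["dport"]),
            }
    return None
```

Once both devices report through `normalize`, a single rule can match on `dst_port` or `src_ip` regardless of which device produced the line.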

Well now that is cheery Andrew! I am spending a lot of time talking with security thought leaders in the SIEM space. They are all convinced they are king of the road, the center of the IT shop in a year or two. They tell me how their devices will aid detection. What is your real world take on this with the current state of SIEM?

I have this conversation with someone at least once a week and I’ll tell you what I tell them: John Heywood said it best, "Many hands make light work."

Think about it, the more "people" you have witnessing a crime the better chance of catching the criminal in the act. The "people" in this case are your intrusion analysts, your security operations staff, and your SIEM solution. We live in a 24/7/365 world and humans are prone to error, need sleep, need to eat, and cannot be tied to a screen for extended periods of time and maintain a current state of alertness. SIEM solutions are a "helping-hand" for security operations staff. False positives can be reduced, terabytes of data can be combed through automatically, and all security alerting can be centralized into one location. SIEM solutions also provide a unique view of the network and of the security devices that are in place to protect your network.

The typical follow-up question I get is "Are SIEM solutions perfect and do they make errors if simply plugged into a network without being tuned?" Of course they’re not perfect but then again, is your firewall perfect? How about your NIDS? If they were perfect then everyone would have one already and the technology would have been perfected 10 years ago. As for the "do they make errors" part of the question, I think you’ll agree Stephen, if you put any security solution on your network and say "<POOF> protect me," you’re asking for all kinds of trouble.

So you are pushing for flow analysis AND log analysis right, give us some words about that?

If you can't afford, or if it is architecturally impossible, to deploy flow collectors at every network intersection, then why not plan on deploying your collectors at the major network connection points (external network, DMZ, internal network, sensitive server network, etc.)? If your network devices are capable of sending flow information, enable it so that some flow tracking can be performed on remote or non-critical network devices. Log everywhere and log often. Enable logging on all of your hosts and devices so that you can see exactly how malware, or regular users for that matter, are interacting with the resources.

Taking flows and logs into a central location and generating alerts from the combined data really helps with the hair-pulling frustration of juggling multiple information repositories (I for one cannot afford to lose any more hair than I've already lost). The two sources give each other context. Your logs from your firewall may indicate a certain type of security threat against a network asset while your flow data can validate the existence of the asset, whether it’s vulnerable, and how valuable that asset is to you.
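The idea of one source giving the other context can be sketched in a few lines of Python. This is only a simplified illustration under stated assumptions: the flow and alert dictionaries, the `asset_value` scores, and the threshold of 5 are all hypothetical, and this is not how QRadar or any other product implements correlation.

```python
def build_asset_profile(flows):
    """Derive which hosts exist and which ports they answer on from flows.

    Each flow is a dict with 'dst_ip', 'dst_port' and 'responded' (True when
    the destination sent traffic back); an assumed, simplified record shape.
    """
    profile = {}
    for flow in flows:
        if flow["responded"]:
            profile.setdefault(flow["dst_ip"], set()).add(flow["dst_port"])
    return profile


def prioritize(alert, profile, asset_value):
    """Rank a log-derived alert using flow-derived context.

    'informational' when no flow ever showed the target answering, 'low' when
    the host exists but not on the targeted port, otherwise 'high' or 'medium'
    depending on the (assumed) business value score of the asset.
    """
    ports = profile.get(alert["dst_ip"])
    if ports is None:
        return "informational"
    if alert["dst_port"] not in ports:
        return "low"
    return "high" if asset_value.get(alert["dst_ip"], 0) >= 5 else "medium"
```

A firewall alert aimed at a host that flows have never shown responding drops to the bottom of the queue, while the same alert against a valuable server that answers on the targeted port rises to the top.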

OK, can you manage an example; can you demonstrate this with a flow and a log example?

The best example is the compromise of a host, or hosts, for the purpose of creating another bot. In Figure 1.2 you see a fairly typical network configuration. Our attacker (1), who we’ll call Number One, attempts to compromise a specific Host (4) on the ACME network using a zero-day web exploit. His purpose is to compromise a new system to include in his botnet (5).

For Number One to reach his target he must cross over the Internet, past the ACME core router (2), through the ACME Firewall (3), and into the DMZ where the DMZ Host (4) resides.

[Figure 1.2 - The Field of Battle]

Let’s say that Number One was able to successfully compromise the DMZ Host (4) and install his malicious payload. Immediately after installation, the infected host communicates with its "master" system and joins Number One’s botnet (5).

Now, what would the various points have seen during this transaction?

ACME Router (2) - Would have seen the traffic from Number One (1) destined for the DMZ Host (4) and any responses from the DMZ Host (4) or ACME Firewall (3). If we were exporting flow information from this router (e.g. NetFlow) we would be able to reconstruct the session (who talked to whom, and when) and compare it to the logs received from the other points. Note - we would not see any payload in the flow records because NetFlow exports only summary information from the layer 3 and layer 4 headers, not the packet contents.

ACME Firewall (3) - Would have logged the conversation between Number One (1) and the DMZ Host (4). Any accepts, denies, blocks would have been logged.

DMZ Host (4) - Several things...
  • Host Firewall (if present) - would have logged the conversation between itself and Number One (1) including any accepts, denies, or blocks.
  • Host IDS (if present) - may have logged the installation of new or replacement of existing files. May also have logged the start of a new service or open port.
  • Web Application Logs - would have logged Number One’s (1) interaction with the webserver on the DMZ Host (4). May have logged the malicious interaction that resulted in the compromise of the DMZ Host (4).
  • Host Logs - may have logged the start of a new service, the installation of a new application, etc.
  • Anti-Malware/Virus - may have detected/stopped the installation of the malicious application.

Based on the above example, the only point that could tell us exactly what attack was being attempted would be the end host (which obviously isn’t ideal). Since the flow information from the ACME Router (2) doesn’t show us the payload of the communications between Number One (1) and the DMZ Host (4) all we can confirm is that a communication took place. This isn’t really a bad thing because at least we can confirm the source/destination of the attack and the source/destination of the bot/botnet.

However, if we were able to place an additional Flow Collector (6) that was able to look at layers 1-7, see Figure 1.3, we could actually reconstruct the attack before it hits the DMZ Host (4) and perhaps instruct the ACME Firewall (3) or ACME Router (2) to block the communications from Number One (1).

[Figure 1.3 - Calling in Reinforcements]
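The walkthrough above amounts to lining up what each observation point recorded. A minimal Python sketch of that step follows; the event shape (`time`, `observer`, `hosts`, `detail`) and the function name are assumptions for illustration only. Network devices report both ends of a conversation, while host-local logs name only the host itself.

```python
from datetime import datetime


def build_timeline(events, attacker_ip, target_ip):
    """Return the events relevant to one attacker/target pair, oldest first.

    Each event is a dict with 'time' (ISO 8601 string), 'observer', 'hosts'
    (list of IPs involved) and 'detail'; this shape is assumed for the sketch.
    """
    def relevant(event):
        hosts = set(event["hosts"])
        # Network observations name both ends; host-local logs name only the target.
        return hosts == {attacker_ip, target_ip} or hosts == {target_ip}

    return sorted(
        (event for event in events if relevant(event)),
        key=lambda event: datetime.fromisoformat(event["time"]),
    )
```

Reading the resulting list top to bottom gives the router's flow record, the firewall's accept, and the host's own evidence in the order they occurred, which is the combined view that no single point provides.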

So, assuming you are a true believer in flow and log analysis, I know you work for a security company, what do they do? Do they implement your passion in their product?

At Q1 Labs we have developed a network security management (NSM) solution called QRadar that provides a fantastic set of network security management services including: log management, threat management, and compliance management. We follow a strategic principle of "Complete Network and Security Knowledge Delivered Simply for Any Customer."

"QRadar provides numerous advantages over other security management solutions because it includes an intuitive centralized command and control console, network, security, application, & identity awareness, advanced threat and security incident detection, and scalable distributed log collection and archive capabilities."

A large part of my job is to investigate how to incorporate the information collected by other products into QRadar for correlation with flows and external log sources. This information, be it patch information from a patch management solution, discovered vulnerabilities from a vulnerability scanner, or authentication logs from a critical database server, can be used to generate meaningful alerts for incident handlers to investigate. The developers take this information and build the integration points between the products so it feels like a little part of me is in each integration mechanism we create.

Nice, tell us a bit about the OSSEC book project. Whatever convinced you to write a book? That is a lot of hard work. Oh yeah, and where is my signed copy?

It’s in the mail, I swear! Seriously though, I wanted to take a minute to thank you for agreeing to write the foreword for our little book.

The OSSEC book came to be due to a serious lack of documentation on how to install, configure, and operate the OSSEC HIDS. The creator and primary developer, Daniel Cid, also works at Q1 Labs. As with many developers, documentation is typically an afterthought (or non-thought in some cases). I liked the product but wanted to know more. Unfortunately most of the advanced information was locked away in Daniel’s head. A light went on in my head and I thought, "Hey, we should write a book."

My buddy Harlan Carvey, author of Windows Forensic Analysis, introduced me to his publisher at Elsevier and I pitched the idea. They loved the idea and we - me, Daniel Cid, and Rory Bray (another coworker) - started writing the book.

I can honestly say it was an interesting experience but also a personally rewarding one. I liked it so much that I offered to contribute to the Nagios 3 Enterprise Network Monitoring book and am in talks for a couple of other future titles.

Where do the like-minded believers in logs get together? Do you have a conference, a yearly meeting, a blog? Please let me know, I would like to invite my friend Robin! *grin*

Honestly, there aren’t a lot of places. The Log Analysis mailing list [2] has traditionally been the place to discuss log-related information. There is also the SANS WhatWorks in Log Management Summit that I have heard quite a few good things about. *wink, wink*

Recently, the MIS Training Institute’s Log Management Summit was co-located with InfoSec World 2008 and, as Anton told me while attending, everyone was there but me...I don’t know why I talk to that guy. *smile*

So what are you doing with SANS, are you doing any teaching and writing, I do not recall seeing your name fly by?

Actually, I’ve been piloting the SANS Security Essentials Review course (SEC401R) for those who need some additional review prior to sitting for the SEC401 exam. I’ve run two sessions so far and they’ve been well received.

Throughout 2008, I’ll be teaching the SANS Training for the CompTIA Security+ Certification course in various locations.

I’d like to be more involved in the courseware development at SANS and have tossed some ideas around about creating a couple of new SANS courses but those are still a "work in progress." I’d also like to present at some of the bigger SANS conferences and am constantly working to prepare myself for the day that I’m "called up to the Majors."

Great, you clearly are a guy with a lot to share, what outlets are there to learn more about what Andrew Hay thinks? Do you have a blog? What is your major focus, surely not just repeating that you love logs, Anton has that covered I think :)

Actually, I do have a blog. I tend to post my broad security-related thoughts at my personal blog and even produce a Suggested Blog Reading post where I mention interesting posts or news that I found from around the blogosphere on a weekly basis. However, for more log-centric posts, I tend to post them over at the Log Analysis Professionals blog.[3]

I’m also a regular contributor to the Security Catalyst Community[4] where people get together to talk security and help people with security related questions. I encourage everyone to check it out.

I recently jumped on the Twitter bandwagon as well. If you want to follow my tweets you can subscribe to @andrewsmhay with your favorite Twitter client.

Any other book projects? Upcoming movies? Broadway appearances?

Well, I don’t think I’ve got the moves or voice for Broadway but I do have a few projects that I’m working on.

As Peter Giannoulis mentioned in his recent interview, I’m involved in The Academy with Peter, Adam Winnington, and Jason Ingram. Our goal is to create a new way for everyone to learn about security products and tools because, let’s face it, some people learn better visually.

I’ve signed on to write the Nokia Firewall, VPN, and IPSO Configuration Guide (ISBN 9781597492867) with Peter Giannoulis and my lovely wife Keli Hay. I’ve also been tossing around the idea of writing a fictional novel about an elite group of hackers who work for the government to electronically soften their targets prior to further efforts. Think Tom Clancy’s "Net Force" meets James Bond after being trained by the guys on the TV show "The Unit".

I’m also working on some more log-oriented training materials that focus on log analysis and log management. Anton and I have also come up with a plan to present some fun/crazy/interesting log-related presentations for conferences. I can’t let the cat out of the bag just yet, but the first one is about making logs and logging "sexy" again.

There are always more books to be written on various subjects so I’ll probably sit down to brainstorm some ideas for books that just aren’t out there yet. I’m open to ideas by the way.