Thought Leaders

An Interview with David Hoelzer, author of DAD, a log aggregator

Stephen Northcutt - May 1st, 2007


The advantages of log management extend well beyond security to system health monitoring, forensics, regulatory compliance and marketing. Log monitoring that detects system problems early can affect the bottom line by minimizing overtime and outages due to system failures. Legal fees can be reduced by having solid forensic evidence to support business decisions. Marketing groups can gain insight into what products people are interested in on a web site. Regulatory compliance is a relatively new reason for log management that has become a necessity with SOX, HIPAA and other regulations. These trends have helped spawn a $380 million log management market, $346 million of it from the Global 2000. The Global 2000 figure is based on an average of $173,000 spent annually by the companies that were surveyed.[1]

However, the trick is finding the right tools for your environment. Many of the early adopters have taken out their original solution and are in their second or third implementations. One company, unhappy with its commercial solutions, commissioned a homegrown Windows log aggregation and analysis tool. These folks are more serious about log analysis than most organizations. They were using NetIQ and Microsoft Operations Manager (MOM), and simply needed additional performance and features. The answer to their problem was DAD. The system is currently aggregating logs from some 50+ servers, to the tune of more than 20 million events over the past week or so, and it's still pretty responsive. DAD is a Windows event log and syslog management tool that allows you to aggregate logs from hundreds to thousands of systems in real time. DAD requires no agents on the servers or workstations. Correlation and analysis are driven through a web front end.[2]

We were fortunate enough to conduct an email interview with the author of DAD, David Hoelzer, who is also a SANS Instructor[3] and lead author for Audit 507, Auditing Networks, Perimeters and Systems,[4] which can lead to the GIAC GSNA[5] certification.

So David, what is DAD?
DAD is a more or less feature complete alpha of an enterprise class log aggregation and reporting tool. In the next few weeks I would like to see more people take it out for a spin so that we can iron out any big wrinkles before we lock down the current version as a stable release and turn our attention to documentation and the like. One of the most attractive features of DAD is that it is completely agentless.

What drove you to create the tool? That is a lot of work, isn't it?
Aside from the original customer who had a need for higher performance, as you mentioned in your opening, Stephen, students who have sat through a class where I talked about Windows log management know that I have said for years, "You could just..." Some time ago one of my customers said, "Let's see you just..." and this product was born. It is released under GPLv2[6] as a FOSS (Free Open Source Software)[7] project. DAD allows you to aggregate logs from all of your major systems, most notably Windows systems, which are notoriously difficult when it comes to effective log aggregation, into a central repository.

Will there also be a better commercial version?
We are committed to DAD remaining available under GPLv2, or a similar OSI approved license. We are willing to state publicly that this will never be changed to a commercial product. We will also state that this will not spawn a "professional" version that is both commercial and better, faster or stronger. All portions of the main DAD tree will always remain free open source software.

Where can I get code to build DAD?
If you are looking for a solution to do agentless Windows, syslog and any arbitrary text-based log aggregation from your enterprise, please stop by the SourceForge page (http://www.sourceforge.com/projects/lassie) and grab a copy. The installation documentation for the product can be found in the "DAD\Source" directory. We very strongly recommend that you actually read the directions in order to have a relatively pain-free installation process.

What would you say is the most difficult part of using DAD?
In some ways, the installation of DAD is the most difficult part of the entire process. We have provided a very detailed installation document. Typically, if the directions are followed step by step, the installation can be pretty painless. Most of the time, people run into trouble when they try to install DAD without looking at the directions.

So, what you are saying, David, is that I would need to read the directions?
Yup, I am saying that you would need to read the directions!

What is it exactly that makes DAD useful? Don't we already have log aggregation tools?
That's a good question. It is true that there are a large number of log aggregation tools available today. One of the things that truly distinguishes DAD, especially in a Windows context, is that it is agentless. Many administrators are gun-shy of agents, probably for good reason. We are using the native Windows interface to extract the logs remotely in near real time. In addition to aggregating your logs, DAD provides convenient search facilities for the data collected. Using DAD, you can also specify correlation alerts and simple alerts that take virtually any action you would like, whether to respond to the event or to inform administrators about it.

So DAD is really a Windows log management tool then?
DAD includes a syslog engine, allowing the collection from any syslog source. Also included with DAD is a regular expression based log carving interface that makes it possible to carve up syslog (or any other arbitrary log format) into appropriate pieces to store into the database. This means that with some effort it is actually possible to correlate and alert on the occurrence of a Windows event when it is seen in conjunction with a certain IDS alert or a series of router log events, etc.
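Editor's note: To make the idea of cross-source correlation concrete, here is a minimal sketch in Python. It is not DAD's code or schema. The Event fields, the correlate function and the sample values are all hypothetical, and the rule shown (two events for the same host within a time window) is just one simple way such a correlation could be expressed.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    """One normalized log record (hypothetical fields, not DAD's schema)."""
    timestamp: datetime
    source: str    # e.g. "windows" or "ids"
    host: str
    event_id: str


def correlate(events, first, second, window):
    """Pair every event matching `first` with every event matching `second`
    that concerns the same host and lies within `window` of it (either order).

    `first` and `second` are (source, event_id) tuples.
    """
    firsts = [e for e in events if (e.source, e.event_id) == first]
    seconds = [e for e in events if (e.source, e.event_id) == second]
    return [
        (a, b)
        for a in firsts
        for b in seconds
        if a.host == b.host and abs(b.timestamp - a.timestamp) <= window
    ]


# Illustrative data only: an IDS alert followed by Windows events.
t0 = datetime(2007, 5, 1, 8, 0, 0)
sample = [
    Event(t0, "ids", "dc01", "portscan"),
    Event(t0 + timedelta(seconds=90), "windows", "dc01", "529"),
    Event(t0 + timedelta(hours=2), "windows", "dc01", "529"),    # too late
    Event(t0 + timedelta(seconds=30), "windows", "web02", "529"),  # other host
]
pairs = correlate(sample, ("ids", "portscan"), ("windows", "529"),
                  timedelta(minutes=5))
for ids_alert, win_event in pairs:
    print(ids_alert.host, ids_alert.timestamp, win_event.timestamp)
```

Comparing every candidate pair keeps the sketch short. A real system working over millions of events would need indexed or windowed lookups instead.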

Regular expressions could scare some people away. How hard is it to use this log carver?
First, let's be clear that you don't need to know any regular expressions to handle Windows logs. With that said, it is true that the log carving interface does require some knowledge of regular expressions to use effectively. The interface allows you to specify as many selection rules and carving rules as are necessary to manage your logs. To ease creation of these types of rules, the interface allows you to paste in sections of logs and test out your expressions. This will show you exactly how your current rules will cut the data up into database fields. It also provides a detailed explanation of exactly what your regular expression means, which is extremely valuable when trying to troubleshoot a regular expression.[8]
Editor's note: SANS offers a class in regular expressions, Introduction to Using Regular Expressions, and, once you have learned them, they are incredibly useful to find and distill information from any text source, not just logs.[9]
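Editor's note: For readers who have not carved a log with a regular expression before, the following Python sketch shows the general technique of cutting a line into named fields and trying the rule against pasted sample lines. It is an illustration only, not DAD's carving engine. The pattern, the field names and the sample lines are assumptions chosen for a classic BSD-style syslog line.

```python
import re

# Hypothetical carving rule. re.VERBOSE lets each piece carry its own
# explanation, which helps when troubleshooting the expression.
SYSLOG_RULE = re.compile(
    r"""
    ^(?P<timestamp>[A-Z][a-z]{2}\s+\d{1,2}\s\d{2}:\d{2}:\d{2})  # month, day, time
    \s(?P<host>\S+)                                              # sending host
    \s(?P<program>[^\[:\s]+)                                     # program name
    (?:\[(?P<pid>\d+)\])?                                        # optional [pid]
    :\s(?P<message>.*)$                                          # free-text message
    """,
    re.VERBOSE,
)


def carve(line, rule=SYSLOG_RULE):
    """Return a dict mapping field name to text, or None if the rule
    does not match the line at all."""
    match = rule.match(line)
    return match.groupdict() if match else None


# Paste in a few sample lines and see how the rule cuts them up.
samples = [
    "May  1 08:15:02 gw01 sshd[2112]: Failed password for root from 10.0.0.5",
    "May  1 08:15:09 gw01 kernel: link up",
    "this line does not look like syslog",
]
for line in samples:
    print(carve(line))
```

Each named group corresponds to one field that could be stored in a database column. A line that does not match returns None, so it is easy to spot samples the rule fails to cover.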

With DAD gathering so much information from your environment, it seems like there is potential for the information to be misused. What sort of security controls are built into DAD to protect this data?
DAD was designed with security in mind. For instance, Role Based Access Control (RBAC)[10] is applied throughout the interface and access to all functions is completely mediated based on the roles. This includes the ability to apply roles to the queries that can be run against the database as well. Another security feature is that DAD does not require domain administrator rights in your Windows domain. The only right required is "Manage and audit security logs" which can be configured using group policy. Even so, we want to be very clear that DAD is currently released as alpha software, so we're not guaranteeing that we haven't missed anything at this point. Also, there are some features in DAD that are inherently unsafe. For instance, one of the interfaces actually permits you to run raw SQL queries against the database unchecked. For this reason, it is important to consider who would be given which roles in your environment. Clearly, the ability to create new queries, design log carving rules, create users, etc. would be limited to a few trusted individuals.
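Editor's note: The role-based mediation David describes can be pictured with a small Python sketch. The role names, permission names and functions below are hypothetical and are not taken from DAD. The point is only that every sensitive function checks the caller's roles first, and that a dangerous capability such as raw SQL is granted to as few roles as possible.

```python
# Hypothetical roles and permissions, for illustration only.
ROLE_PERMISSIONS = {
    "analyst": {"run_saved_query"},
    "rule_author": {"run_saved_query", "edit_carving_rules"},
    "admin": {"run_saved_query", "edit_carving_rules",
              "create_user", "run_raw_sql"},
}


def is_allowed(user_roles, permission):
    """True if any of the user's roles grants the permission.
    Unknown roles grant nothing."""
    return any(
        permission in ROLE_PERMISSIONS.get(role, set())
        for role in user_roles
    )


def require(user_roles, permission):
    """Raise PermissionError unless the permission is granted."""
    if not is_allowed(user_roles, permission):
        raise PermissionError(f"missing permission: {permission}")


def run_raw_sql(user_roles, sql):
    """Stand-in for an unsafe function; the check comes before any work."""
    require(user_roles, "run_raw_sql")
    return f"would execute: {sql}"


print(is_allowed(["analyst"], "run_saved_query"))
print(is_allowed(["analyst"], "run_raw_sql"))
```

Keeping the mapping in one table makes it easy to review who can reach the unsafe functions.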

How well does DAD scale?
We are happy to report that one of our beta test sites has recently passed the 1.2 billion event mark. These folks log just about everything that happens in their domain except process tracking. To be truthful, they actually do monitor process tracking as well, but we filter out that information on its way into DAD. Their DAD installation is currently monitoring something on the order of 250 servers and a significant number of workstations. The hardware driving their DAD platform is currently a dual Xeon (single cores) at 2 GHz with 4 gigabytes of RAM. Under that configuration, the only time that DAD falls behind, by about 15 minutes, is during the first half hour of the day when everyone in their domain logs in. Even then, with more auditing enabled than you will find in most environments, only the aggregation from the global catalog servers falls behind. To give an idea of how many events we're talking about, their 32 megabyte security log on the global catalog servers actually rolls over something like four times between 8am and 8:30am every day. This is a very long answer to say that, essentially, DAD scales very well. It has a modular design that allows the pieces to be broken apart for very large scale installations, but even on a single box it can handle an enterprise environment.
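Editor's note: The rollover figures give a rough sense of the peak load. The short Python calculation below uses only the numbers quoted in the answer (a 32 megabyte log filling about four times in 30 minutes) and assumes 1 MB = 1024 KB. The function name is ours, and because "something like four times" is approximate, so is the result.

```python
# Figures quoted in the interview; the rollover count is approximate.
LOG_SIZE_MB = 32          # security log size on a global catalog server
ROLLOVERS = 4             # times the log fills between 8:00 and 8:30
WINDOW_SECONDS = 30 * 60  # the half-hour login rush


def peak_rate_kb_per_s(log_size_mb, rollovers, window_seconds):
    """Average data rate in KB/s needed to drain the log as fast as it
    fills, assuming 1 MB = 1024 KB."""
    return log_size_mb * 1024 * rollovers / window_seconds


total_mb = LOG_SIZE_MB * ROLLOVERS
rate = peak_rate_kb_per_s(LOG_SIZE_MB, ROLLOVERS, WINDOW_SECONDS)
print(f"{total_mb} MB in 30 minutes, about {rate:.1f} KB/s per server")
# prints: 128 MB in 30 minutes, about 72.8 KB/s per server
```

That is the average over the half hour for a single global catalog server. The interview does not give an average event size, so this does not translate directly into events per second.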


1. The Log Management Industry, Northcutt, Ong, Shackleford and Shenk based on 2005 data
2. http://sourceforge.net/projects/lassie
3. http://www.sans.org/training/instructors.php#Hoelzer
4. http://www.sans.org/training/description.php?tid=418
5. http://www.giac.org/certifications/security/gsna.php
6. http://www.gnu.org/copyleft/gpl.html
7. http://en.wikipedia.org/wiki/FOSS
8. Email David to Stephen April 30 supplying definition of a log carver
9. http://www.sans.org/training/description.php?tid=552
10. http://www.sans.edu/resources/securitylab/311.php

Appendix: Definition of a Log Carver
The log carver is a regular expression based engine for chopping any arbitrary log format that you might have in your enterprise into something that DAD can digest. Its primary use in the tool as shipped is for syslog messages, but we chose to leave the interface open so that you could point it at absolutely any kind of log that you might have.