IT security administrators are always being pulled in two directions. On one hand, there is the desire to implement bullet-proof system security. On the other, system administrators have kitted out a computer to do a particular job, and anything that doesn't contribute directly to the 'core mission' of the system needs to be as resource-efficient as possible, or you'll soon have the admin team knocking at your door for taking up too much of their bandwidth, CPU or disk space. IT security specialists always have to compromise, balancing the need for security against the need for the organization to get its work done; the word 'compromise' carries needlessly negative connotations, though, so let's use an acceptable industry phrase instead: "a risk-based approach".
As log volumes grew, the database storage paradigm was just not working well enough for us. CPU speeds were increasing significantly faster than storage speeds, and it made a lot of sense for us to use that latent power to make queries faster. We've moved our Snare Servers to a system where we compress log data - sometimes at huge compression ratios. This shifts some of the burden of accessing log data away from our (relatively) slow disks and up to our (faster and faster) CPUs.
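To make that trade-off concrete, here is a minimal Python sketch - my own illustration, not the Snare Server implementation - that searches gzip-compressed daily log files. The hypothetical search_compressed_logs() decompresses in memory, so the slow disk only has to deliver the compressed bytes while the CPU does the extra work.

import gzip
from pathlib import Path

def search_compressed_logs(log_dir, needle):
    """Yield (file name, matching line) pairs from gzip-compressed logs.

    Decompression happens in memory on the CPU, so the (slow) disk only
    has to deliver roughly the compressed size of the log data rather
    than the raw volume - the trade-off described above.
    """
    for log_file in sorted(Path(log_dir).glob("*.log.gz")):
        with gzip.open(log_file, "rt", errors="replace") as fh:
            for line in fh:
                if needle in line:
                    yield log_file.name, line.rstrip()

# Example usage: find login events across a compressed archive.
# for name, line in search_compressed_logs("/var/log/archive", "LOGIN"):
#     print(name, line)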
Small tweaks can also go a long way towards improving the speed at which the end user perceives a query to be running. Sitting down with people who use log data day-in/day-out can sometimes produce 'light-bulb' moments that make a big difference to how usable your query tool is; little things like query caching, for example, can go a long way. Quite often, queries such as "tell me who has logged in over the course of the last 7 days" will be run fairly regularly - sometimes on a daily basis. This means that on the second day the query is run, six days' worth of valid log data has already been processed; if we had saved off the results of the previous query somehow, then we only have to do one seventh of the hard work. Similarly, when our expert users are doing forensics work, they quite often 'gradually narrow' their query to reduce false positives, based on the results they're seeing. If they have to wait an hour for the complete objective to be processed before they see the first results, that's a lot of wasted time if they then just have to tweak the query a little more. If we can return at least SOME results to the user very quickly (even if they're unsorted), the user can stop the query, go back to the objective, narrow it down, and re-run it. Although the total time it would take to run the complete query might not have changed at all, the speed at which the user is able to work increases significantly.
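As a concrete illustration of the caching idea - again my own sketch, not Snare's code - the following Python assumes per-day results for a recurring "who logged in over the last 7 days" query can be saved to disk and reused; run_day_query() and the cache layout are invented for the example.

import json
from datetime import date, timedelta
from pathlib import Path

CACHE_DIR = Path("query_cache")   # hypothetical location for saved per-day results

def logins_last_7_days(run_day_query, today=None):
    """Answer 'who logged in over the last 7 days?' reusing per-day results.

    run_day_query(day) is assumed to do the expensive work of extracting
    login events for a single day; completed days are cached on disk, so a
    daily re-run only has to process the newest day from scratch.
    """
    today = today or date.today()
    CACHE_DIR.mkdir(exist_ok=True)
    results = {}
    for offset in range(7):
        day = today - timedelta(days=offset)
        cache_file = CACHE_DIR / f"logins-{day.isoformat()}.json"
        if day != today and cache_file.exists():        # don't trust a partial 'today'
            results[day.isoformat()] = json.loads(cache_file.read_text())
        else:
            day_result = run_day_query(day)              # the expensive part
            cache_file.write_text(json.dumps(day_result))
            results[day.isoformat()] = day_result
    return results

The same shape helps with the 'gradual narrowing' case: by handing back each day's (or each chunk's) results as soon as they are available, the analyst can abort and refine the query without waiting for the full run.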
So, I guess, the first precondition to significantly improving query performance these days is the ability to recognize opportunities presented by our ever-changing computing ecosystem, rather than seeing the changes as a series of hurdles that require clawing ever-decreasing performance boosts out of an existing, rigid code structure. Who knows where we'll find our next speed jump - maybe utilizing the graphics processing unit for some limited speed-critical tasks? Perhaps by using a neural network at the front end to cull obviously irrelevant events? The second is listening to, and understanding, how people use the system: follow your users' work-flow, rather than forcing them to follow a non-optimal one.
On the topic of visualization, I previously highlighted that visualization depends a great deal on your target audience. In the past, IT security has been a very centralized, controlled function, usually staffed with some very technically proficient people; you could get away with down-and-dirty representations of logging information, with a reasonable level of confidence that the security administrators would be able to skim the data for useful highlights. Data owners were represented by proxy only - they would rarely see the output of security tools, and were almost never given access to the application that did most of the analysis. That strategy is a bit inefficient, though: it fails to take advantage of the people who have the most to lose from a potential breach, and who know best how access to the information being protected should be limited. I'm not really talking about operating-system-level logging here; that probably needs to remain in the 'specialist' category in most circumstances. But for the security of a data store, or the membership of groups that have access to sensitive information, data owners are a spectacularly good source of corporate knowledge. Current regulatory requirements such as