About a year ago, I needed to add an Apache log to a supertimeline I was working on. I wrote a bash script to do this, as I was not familiar with Perl at the time. I later went back, learned some Perl basics, and converted the script into my first log2timeline plugin. I'm now wrapping up my third plugin.
Before you begin writing your plugin, in addition to this post, it's best to read through the Gold paper Kristinn Gudjonsson wrote. It will give you a good understanding of how the tool works and should answer many of your questions about the architecture. In this post, I cover how to create an OS X plist plugin for the tool, but the technique is the same for most files you'll want to parse.
Getting Started
When writing the plugin, it is important to understand the file you are parsing, including all the different conditions that may generate different results in the file. A few ways to find this out are: review the source code of the program, look for open source tools that already parse the file, and generate your own output from the program while trying to replicate every option that generates a log entry.
In Downloads.plist, I've found four different conditions: a normally completed download, a canceled download, a file that was downloaded and then deleted, and a file downloaded outside the user directory. If the file you are parsing exists on multiple platforms, you'll need to check the format for each OS.
When I start working on a new plugin, I create at least two scripts. One I call the master script; it is in the proper log2timeline format. The other is a scratch script for all my testing. I find it much easier to troubleshoot basic Perl problems with the scratch script than through log2timeline. I test the code logic in the scratch file, then move it to the final script. Below is the initial scratch script I start with for each section of code I'm testing.
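#!/usr/bin/perl
use strict;
use warnings;

my $file = $ARGV[0];
open(INPUT_FILE, '<', $file)
    or die "Could not open file $file: $!";
while (my $line = <INPUT_FILE>) {
    # stuff: replace with the code you want to test on each $line
}
close(INPUT_FILE);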
The script above takes a filename as an argument, opens the file, runs the code in the while loop once for each line, and then closes the file. Replace the "stuff" comment with the code you want to test against the file contents.
Step 1 Copy a template
The default install paths for log2timeline are:
- OS X: /opt/local/lib/perl5/site_perl/5.12.3/Log2t/
- Ubuntu: /usr/share/perl5/Log2t/
Copy a template to base your plugin on:
- The log2timeline author has created a template file, located in the source directory at dev/template_for_input_logfile.pm.
- If there is already a parser for a similar file type, I would start with that one and modify it as needed.
Since I'm creating a plist plugin, I'm going to base mine on the /opt/local/lib/perl5/site_perl/5.12.3/Log2t/input/safari.pm plugin that Hal Pomeranz created.
Step 2 Edit basic information
At the top of the file, fill in your information. Include the version of the plugin code and explain what the plugin is going to do.
Step 3 Name your plugin
Right after the comments you should fill in the name of your plugin. Mine is safari_download. This should also match the file name of the plugin which is safari_download.pm.
package Log2t::input::safari_download;
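Perl finds a module by turning its package name into a file path, which is why the package line and the file name have to agree. The sketch below mimics that mapping. package_to_path is a hypothetical helper written for illustration, and the sketch assumes log2timeline loads input modules by package name in the usual Perl way.

```perl
use strict;
use warnings;

# Hypothetical helper: mimic how Perl's require() turns a package name
# into a relative file path ('::' becomes '/', then '.pm' is appended).
sub package_to_path {
    my ($package) = @_;
    (my $path = $package) =~ s{::}{/}g;
    return "$path.pm";
}

# Prints Log2t/input/safari_download.pm
print package_to_path('Log2t::input::safari_download'), "\n";
```

For this plugin, that means the file has to sit in the input directory under exactly the name safari_download.pm.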
Step 4 Determine the initial packages you will need
The great thing about having this program written in Perl is the large number of libraries already available to make life easier. Log2timeline also provides its own libraries:
Library | How to include | Purpose |
Common | use Log2t::Common; | Mostly used by the main tool itself. Provides information about where to find library files, the version number, etc. Some input modules load this library to use the "get_username_from_path" subroutine, which tries to extract a username from the path of the file (as the name indicates). |
Time | use Log2t::Time; | Used by most if not all input modules. This module provides multiple subroutines that take a date or a timestamp in various formats as input and return the timestamp in Epoch format. It also has subroutines to convert Epoch time to text. |
BinRead | use Log2t::BinRead; | Used by most input modules that deal with binary files. This library is created to make it easier to read data from binary files. |
Network | use Log2t::Network; | Very simple library; currently its only subroutine is get_icmp_text, which takes an ICMP type and code as input and returns a text value. |
Numbers | use Log2t::Numbers; | Simple library that contains two subroutines, one to join together two numbers and another one to round up a number. |
Win | use Log2t::Win; | Library that can be used by input modules that parse Windows artifacts that might contain GUIDs. It contains a short list of GUIDs that can be translated into the default values of the software they belong to. |
WinReg | use Log2t::WinReg; | A library that registry modules use to extract deleted registry entries from a hive file. |
In my plugin, I'm using the following.
use strict;
use Log2t::Common;
use Log2t::Time;
use Mac::PropertyList;
Step 5 Determine how to process the file
The first subroutine you need to modify is new(). It is the default constructor for the module, and it starts by running the parent class's constructor (input.pm).
The parent class defines a few variables that can be changed in new() if needed:
$self->{'multi_line'} = 1;
$self->{'type'} = 'file';
$self->{'file_access'} = 0;
The above values are the default ones and need not be defined unless you want to change them. To explain each variable, it is very important to understand how the main engine in log2timeline calls the input module.
The main engine starts by initializing the module and uses the values of these variables to adjust how it calls the module when parsing files. There are basically two methods of retrieving timestamps. Either the engine only asks once for a timestamp and the module is supposed to return a hash value that contains timestamp objects (explained later) or the engine calls the input module once for each timestamp there is.
The variable that defines this behavior is 'multi_line'. If it is set to 1, the engine treats the file as an ASCII file, or a file that contains one timestamp per line. It calls the input module once for each line that contains a timestamp, until there are no more. If 'multi_line' is set to 0, only one call is made to retrieve timestamps, and the module should return a reference to a hash that contains timestamp objects.
In this plist parser, I set it to 0, because all the XML elements need to be parsed at one time.
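The constructor pattern described above can be sketched without log2timeline installed. Both classes are hypothetical: StubInput stands in for input.pm, and its defaults are copied from the values listed above rather than from the real module; SafariDownloadSketch plays the role of the plugin.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the parent class (input.pm in log2timeline).
# Its constructor sets the three defaults described above.
package StubInput;

sub new {
    my $class = shift;
    my $self = {
        'multi_line'  => 1,       # engine calls the module once per timestamp
        'type'        => 'file',
        'file_access' => 0,
    };
    return bless($self, $class);
}

# Hypothetical plugin class that inherits from the stub.
package SafariDownloadSketch;
use parent -norequire, 'StubInput';

sub new {
    my $class = shift;
    # Run the parent's constructor first, then override only what differs.
    my $self = $class->SUPER::new();
    # Parse the whole plist in a single call.
    $self->{'multi_line'} = 0;
    return $self;
}

package main;

my $plugin = SafariDownloadSketch->new();
print "multi_line = $plugin->{'multi_line'}\n";
```

Because SUPER::new() receives the subclass name, the parent blesses the object straight into SafariDownloadSketch, so the plugin only has to change the values it cares about.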
Step 6 Plugin Descriptions
In sub get_description, enter a short description of what the plugin does. This will be displayed when you run ( log2timeline -f list ).
sub get_description
{
    return "Parse the contents of a Safari Downloads.plist file";
}
In get_help, enter a longer description of how the module works and what it does. This is displayed when you run (log2timeline -f safari_download -h).
sub get_help
{
return "Usage: $0 -f safari_download ... -- [-u username] [-h hostname]
This plugin parses the content of Downloads.plist, a property
list file containing Safari download history. On Mac OS X systems,
this file is typically in /Users/<username>/Library/Safari";
}
Step 7 Determine the format of the file
This is where you actually start doing some work. verify is the subroutine that checks whether the file is in the correct format to parse. This check needs to be very specific: if other files also meet the same criteria, they will be parsed incorrectly. For a normal log file, you will need to set up a regex that matches the line format of the file and then check that the data is valid.
Binary Plist Files
Plist files are either binary or XML, so you'll need to do something a little different than for a line-based log. To see what Hal did for the Safari plugin, we need to look at the format of the History.plist file.
Look at the first line of the file
#cat /Users/twebb/Library/Safari/History.plist |head -n1
bplist00_WebHistoryFileVersion_WebHistoryDatesPUtitle_lastVisitedDateZvisitCountQDQW_http://www.apple.com/startpage/]Apple - Start[322428070.5
The output above shows how the file starts. This header may not be unique to the file, so you'll need to do some testing on similar files, but in this case it is unique. Now let's look at Hal's code.
read($self->{'file'}, $buf, 32);
Log2timeline hands the open file to his plugin, and he reads the first 32 bytes into a variable called $buf.
unless ($buf =~ /^bplist00.*WebHistoryFile/) {
$return{'msg'} = 'Does not appear to be a History.plist file';
return \%return;
}
Now that we have data in the variable $buf, we need to check whether the file matches what we expect. The code tests $buf against the regex /^bplist00.*WebHistoryFile/, which matches the beginning of the file shown above. If it does not match, the subroutine returns the error message.
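The check can be tried outside log2timeline with a small sketch. looks_like_history_plist is a hypothetical name and the sample bytes are made up; only the 32-byte read and the regex come from Hal's plugin. An in-memory filehandle stands in for the real History.plist.

```perl
use strict;
use warnings;

# Hypothetical standalone version of the History.plist check: read the first
# 32 bytes from an open filehandle and apply the same regex as the plugin.
# Note: without the /s modifier, '.' will not match a newline byte (0x0A).
sub looks_like_history_plist {
    my ($fh) = @_;
    my $buf = '';
    seek($fh, 0, 0);          # start at the beginning of the file
    read($fh, $buf, 32);      # read up to 32 bytes into $buf
    return ($buf =~ /^bplist00.*WebHistoryFile/) ? 1 : 0;
}

# Made-up bytes shaped like a binary plist header, read through an
# in-memory filehandle so no real History.plist is needed.
my $sample = "bplist00\xD3\x01\x02\x03_\x10\x15WebHistoryFileVersion";
open(my $fh, '<', \$sample) or die "Cannot open in-memory file: $!";
print looks_like_history_plist($fh) ? "match\n" : "no match\n";
close($fh);
```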
XML Plist Files
The web history plist is a binary plist, whereas the download history plist is a standard XML plist, but the same Perl library parses both types in the same way.
#cat /Users/twebb/Library/Safari/Downloads.plist |head -n5
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
	<key>DownloadHistory</key>
What makes this file unique is the DownloadHistory key on the 5th line of the file. We will need to read the start of the file and check to see if DownloadHistory exists. Let's start writing the subroutine verify.
sub verify
{
my $self = shift;
my $buf = undef;
my %return = ('success' => 0, 'msg' => 'No such file or directory');
return \%return unless (-f ${$self->{'name'}});
The first portion above just sets up default variables. $buf is the name of the variable we will read data into for analysis.
my $line = '';   # Accumulates the bytes we read
my $temp;        # Holds one byte at a time
for (my $i = 175; $i < 200; $i++) {     # Counter i walks byte offsets 175-199
    seek($self->{'file'}, $i, 0);       # Go to byte i
    read($self->{'file'}, $temp, 1);    # Read one byte and store it in $temp
    $line .= $temp;                     # Append it to $line
}
unless ($line =~ /DownloadHistory/) {   # Search the bytes for DownloadHistory
    $return{'msg'} = 'Does not appear to be a Downloads.plist file';
    return \%return;
}
The code above is the main portion where the match happens. It reads the file starting at byte 175 and stops before byte 200. If those bytes contain DownloadHistory, verify goes on to report success and the file is parsed. You want this portion of the script to be as fast and exact as possible: when timescanner runs with this plugin, every file it scans goes through this part of the code, so the faster, the better.
$return{'success'} = 1;
$return{'msg'} = 'Success';
return \%return;
}
The $return{'success'} is the value to tell the main log2timeline program that this is a valid file to parse. If you return the value 0, it will not parse the file.
The $return{'msg'} value is displayed to the user. For error messages, the more descriptive, the better.
If the match works, verify returns 1 to the main program along with the message 'Success'.
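To see the fixed-offset check work end to end, here is a standalone sketch. verify_downloads_plist is a hypothetical name; it replaces the byte-by-byte loop with a single seek and read over the same 175 to 199 range and returns a hash reference shaped like verify's %return. The input is the standard Apple XML plist header with a tab-indented first key, fed through an in-memory filehandle instead of a real file.

```perl
use strict;
use warnings;

# Hypothetical standalone version of the Downloads.plist check: seek to
# byte 175, read the 25 bytes before byte 200, and look for DownloadHistory.
sub verify_downloads_plist {
    my ($fh) = @_;
    my %return = ('success' => 0,
                  'msg'     => 'Does not appear to be a Downloads.plist file');
    my $line = '';
    seek($fh, 175, 0) or return \%return;   # Go to byte 175
    read($fh, $line, 25);                   # One read instead of 25 one-byte reads
    return \%return unless $line =~ /DownloadHistory/;
    $return{'success'} = 1;
    $return{'msg'}     = 'Success';
    return \%return;
}

# Standard XML plist header; with a tab before <key>, the word
# DownloadHistory occupies bytes 177 through 191.
my $xml = qq{<?xml version="1.0" encoding="UTF-8"?>\n}
        . qq{<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">\n}
        . qq{<plist version="1.0">\n<dict>\n\t<key>DownloadHistory</key>\n};
open(my $fh, '<', \$xml) or die "Cannot open in-memory file: $!";
my $result = verify_downloads_plist($fh);
print "$result->{'msg'}\n";
close($fh);
```

Because the check depends on exact byte offsets, a file written with different indentation or a different header would fail it, which is another reason to test against Downloads.plist files from several systems.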
Step 7.1 BinRead Library
The BinRead library is designed to make reading files easier. It supports both ASCII and binary files. Below are details on some of its functions.
$line = Log2t::BinRead::read_ascii_until( $self->{'file'}, \$ofs, "\n", 100 );
read_ascii ( \*FH, \$ofs, $length )
This function returns an ASCII string of length $length read from the binary file FH (accepts FH as a reference to a typeglob of the filehandle).
The variable offset dictates where in the binary file we find the start of the string. The offset variable is a reference, since it is increased as each character is read (so the offset variable will be $ofs+$length at the end of the function).
read_ascii_end ( \*FH, \$ofs, $max )
This function returns an ASCII string of maximum length $max from the binary file FH (accepts FH as a reference to a typeglob of the filehandle), stopping earlier if it reaches the end of a string or a null character. The variable offset dictates where in the binary file we find the start of the string. The offset variable is a reference, since it is increased as each character is read (the offset variable will be set to the end of the string).
read_8 ( \*FH, \$ofs )
This function reads 8 bits (one byte) from the file FH (accepts FH as a reference to a typeglob of the filehandle) and returns it according to the endianness set for the file (default is little endian). The offset is then increased by one.
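The offset passed by reference is the part that most often trips people up, so here is a small sketch of that behavior. sketch_read_ascii is a hypothetical function written for illustration, not part of Log2t::BinRead; it only mirrors the documented read_ascii behavior of advancing the caller's offset by $length.

```perl
use strict;
use warnings;

# Hypothetical pure-Perl stand-in, NOT the BinRead implementation: read
# $length bytes starting at $$ofs_ref and advance the caller's offset.
sub sketch_read_ascii {
    my ($fh, $ofs_ref, $length) = @_;
    my $buf = '';
    seek($fh, $$ofs_ref, 0);
    read($fh, $buf, $length);
    $$ofs_ref += $length;     # offset ends at $ofs + $length, as documented
    return $buf;
}

my $data = "0123456789ABCDEF";
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";
my $ofs = 4;
my $first  = sketch_read_ascii($fh, \$ofs, 3);   # reads "456", $ofs becomes 7
my $second = sketch_read_ascii($fh, \$ofs, 3);   # reads "789", $ofs becomes 10
print "$first $second $ofs\n";                   # prints "456 789 10"
close($fh);
```

Successive calls therefore walk through the file without the caller doing any offset arithmetic.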
Program using the BinRead library.
You can replace the earlier for loop (for( my $i=175 ; $i < 200; $i++ )) with a call to the BinRead library for cleaner code:
my $ofs = 175;   # Start reading at byte 175
# Read 25 bytes (offsets 175 through 199), the same range as the loop
my $line = Log2t::BinRead::read_ascii( $self->{'file'}, \$ofs, 25 );
unless ($line =~ /DownloadHistory/) {   # Match the bytes against DownloadHistory
    $return{'msg'} = 'Does not appear to be a Downloads.plist file';
    return \%return;
}
Step 8 Init and File Location
sub init
{
my $self = shift;
# Try really hard to get a user name
unless (defined($self->{'username'})) {
$self->{'username'} = Log2t::Common::get_username_from_path(${$self->{'name'}});
}
return 1;
}
The engine calls the init subroutine after the file has been verified and before it is parsed. For many files, this subroutine is not needed and can be removed from the script. If the file you are planning to parse is under a user's home directory, you may want to include the code above; log2timeline will then try to extract the username from the file path.
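The real get_username_from_path may use different rules; the sketch below only illustrates the idea of deriving a username from where the file sits. sketch_username_from_path is a hypothetical name, and the /Users and /home patterns are assumptions about typical OS X and Linux layouts.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT Log2t::Common::get_username_from_path: take the
# path component right after /Users/ (OS X) or /home/ (Linux).
sub sketch_username_from_path {
    my ($path) = @_;
    return $1 if $path =~ m{/(?:Users|home)/([^/]+)/};
    return;   # no recognizable home directory in the path
}

# Prints twebb
print sketch_username_from_path('/Users/twebb/Library/Safari/Downloads.plist'), "\n";
```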
The init subroutine can also be used to set up other items. In my generic_linux plugin, I use it to get the last modified date of the syslog file, because syslog records only the month, day, and time of each message, not the year.
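Here is a sketch of that year-borrowing idea using only core Perl. It is not the generic_linux code: year_from_mtime, syslog_to_epoch, and %MONTH_INDEX are hypothetical names, and timestamps are interpreted in the local time zone of the machine running the script.

```perl
use strict;
use warnings;
use Time::Local qw(timelocal);

my %MONTH_INDEX = (
    Jan => 0, Feb => 1, Mar => 2, Apr => 3, May =>  4, Jun =>  5,
    Jul => 6, Aug => 7, Sep => 8, Oct => 9, Nov => 10, Dec => 11,
);

# Year of the file's last modification time (stat field 9 is mtime).
sub year_from_mtime {
    my ($path) = @_;
    my $mtime = (stat($path))[9];
    die "Cannot stat $path: $!" unless defined $mtime;
    return (localtime($mtime))[5] + 1900;
}

# Turn a syslog stamp such as "Mar  3 10:15:01" into epoch seconds,
# borrowing the year from the caller (for example, from year_from_mtime).
sub syslog_to_epoch {
    my ($stamp, $year) = @_;
    my ($mon, $day, $h, $m, $s) =
        $stamp =~ /^(\w{3})\s+(\d{1,2})\s+(\d{2}):(\d{2}):(\d{2})/
        or return;
    return unless exists $MONTH_INDEX{$mon};
    return timelocal($s, $m, $h, $day, $MONTH_INDEX{$mon}, $year);
}

print scalar(localtime(syslog_to_epoch('Mar  3 10:15:01', 2011))), "\n";
```

One caveat with this approach: a log that spans New Year will have December entries in a file last modified in January, so a real parser would need to step the year back for those.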
Step 9 get_time
Now that we know the file is valid, we need to actually parse it. The get_time subroutine is where the magic happens.
my $self = shift;
my $Data = undef; # Perl data structure produced from plist file
my %container;             # the container that stores all the timestamp data
my $cont_index = 0; # index into the container
my $objects;
eval { $objects = Mac::PropertyList::parse_plist_file($self->{'file'}); };
First, set up the variables for the parsing. Then parse the file referenced by $self->{'file'}, the open file that the main log2timeline program hands to your plugin. In the example above, it is passed to the Mac::PropertyList library for parsing.
eval { $Data = $objects->as_perl; };
foreach my $ref (@{$Data->{'WebHistoryDates'}}) {
    # New t_line structure. Most of the basic information is fixed.
    $container{$cont_index} = { 'source' => 'WEBHIST',
        'sourcetype' => 'Safari history', 'version' => 2,
        'extra' => { 'user' => $self->{'username'}, },
    };
The plist library returns the parsed structure, which is stored in $Data. WebHistoryDates is the main element that everything else in the file branches off of, so the loop processes each entry ($ref) in the WebHistoryDates array.
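A self-contained sketch of this loop follows, with hash references used for $Data and for each container entry. It does not call Mac::PropertyList: the hand-written $sample stands in for what as_perl might return, build_container is a hypothetical name, and both the empty-string URL key and the legacy value are assumptions for illustration. The one firm fact it relies on is that Mac absolute time counts seconds from 2001-01-01 00:00:00 UTC, which is 978307200 seconds after the Unix epoch.

```perl
use strict;
use warnings;

# Seconds between the Unix epoch (1970-01-01) and the Mac absolute time
# reference date (2001-01-01 00:00:00 UTC).
use constant MAC_EPOCH_OFFSET => 978307200;

# Sketch: walk the WebHistoryDates array of an as_perl-style structure and
# build one t_line-style hash ref per entry, keyed by a running index.
sub build_container {
    my ($data, $username) = @_;
    my %container;
    my $cont_index = 0;
    foreach my $ref (@{ $data->{'WebHistoryDates'} || [] }) {
        my $url  = $ref->{''} // '';          # assumption: URL under the empty key
        my $date = $ref->{'lastVisitedDate'};
        next unless defined $date;            # skip entries without a timestamp
        $container{$cont_index++} = {
            'time'       => { 0 => { 'value'  => int($date) + MAC_EPOCH_OFFSET,
                                     'type'   => 'Last Visited',
                                     'legacy' => 15 } },   # illustrative value
            'desc'       => "Visited $url (" . ($ref->{'title'} // '') . ')',
            'short'      => "URL: $url",
            'source'     => 'WEBHIST',
            'sourcetype' => 'Safari history',
            'version'    => 2,
            'extra'      => { 'user' => $username },
        };
    }
    return \%container;
}

# Hand-written stand-in for the parsed History.plist (values from the sample above).
my $sample = {
    'WebHistoryDates' => [
        { '' => 'http://www.apple.com/startpage/', 'title' => 'Apple - Start',
          'lastVisitedDate' => '322428070.5', 'visitCount' => 1 },
    ],
};
my $container = build_container($sample, 'twebb');
print scalar(gmtime($container->{0}{'time'}{0}{'value'})), " UTC\n";
```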
TLINE STRUCTURE
This is what gets sent back to the main log2timeline program and generates the output that we all know and love.
# create the t_line variable
%t_line = (
'time' => { 0 => { 'value' => $date, 'type' => 'Time Written', 'legacy' => 15 } },
'desc' => $text,
'short' => $text,
'source' => 'PLIST',
'sourcetype' => 'LOG',
'version' => 2,
'extra' => { 'user' => 'username extracted from line' }
);
Time:
- Value: Needs to be converted to Epoch time.
- Type: What the time means, such as Last Visited or Time Written.
- Legacy: MACB notation (8, 4, 2, 1). This is a 4-bit value.
- 1 = Modify Time
- 2 = Access Time
- 4 = Change Time (metadata change)
- 8 = Birth Time
- To mark all four, add them up: 1 + 2 + 4 + 8 = 15.
Description: What the entry in the file tells us: what was accessed, downloaded, viewed, or created.
Short: A shorter description.
Source: A short name for where the data came from.
Sourcetype: A longer description of where the data came from.
- AV => antivirus logs
- EVT => Event Log
- EVTX => Event Log (newer format)
- EXIF => metadata
- FILE => filesystem timestamp
- LOG => log file
version: Version of the t_line format. Currently 2.
extra: Anything additional available from parsed data.
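The legacy value in the Time field is a bit mask, so a short sketch shows how the numbers map to MACB letters. legacy_to_macb and macb_to_legacy are hypothetical helpers for illustration, not Log2t functions; they use only the bit assignments listed above.

```perl
use strict;
use warnings;

# Bit values from the Time field's legacy notation: 1 = M, 2 = A, 4 = C, 8 = B.
my @MACB_BITS = ([1, 'M'], [2, 'A'], [4, 'C'], [8, 'B']);

# Render a legacy value as a four-character MACB string, '.' for unset bits.
sub legacy_to_macb {
    my ($legacy) = @_;
    return join '', map { ($legacy & $_->[0]) ? $_->[1] : '.' } @MACB_BITS;
}

# Build a legacy value from the letters you want set, e.g. 'MB' gives 9.
sub macb_to_legacy {
    my ($letters) = @_;
    my $legacy = 0;
    foreach my $bit (@MACB_BITS) {
        $legacy |= $bit->[0] if index($letters, $bit->[1]) >= 0;
    }
    return $legacy;
}

# Prints " 1 => M...", " 4 => ..C." and "15 => MACB", one per line.
printf "%2d => %s\n", $_, legacy_to_macb($_) foreach (1, 4, 15);
```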
Testing
When you think you are ready to test, copy your plugin into the input directory of your log2timeline install (Log2t/input/) and give it a try.
#log2timeline -f (plugin) file
In my case I use:
#log2timeline -f safari_download /Users/webb/Library/Safari/Downloads.plist
Make sure that your test file includes all the known values for the file. This will ensure that you are parsing all the data correctly.
Wrapping It Up
If you made it this far, hopefully you will decide to create a plugin for this awesome tool. It's a great way to give back to the community and support open source. Special thanks to Kristinn Gudjonsson for help with clarifications in this post. There is a Google group for log2timeline developers; if you're interested in working on plugins, please feel free to join.