Dave Hull

Least frequently occurring strings?

April 23, 2011

My phone rang. It was a small business owner looking for some help. He had a system he wanted me to take a look at, but was light on specifics. I asked to speak to his IT person. He laughed and said he was the IT person and that he knew next to nothing about computers. An hour later I was sitting in his office filling out a chain of custody form and trying to get more information out of him.

"I can't really tell you much about it or what to look for, the system was just acting strange," he said.

"How so?" I asked.

"It seems slower than normal and I notice the lights on the back are blinking more than usual," he offered.

"When did this start?" I asked.

"Hard to say for sure, last week maybe," he said.

The conversation wasn't progressing as I'd hoped. Nevertheless, I told him I'd take a look and see what I could find out. I figured I'd start with a time line, paying careful attention to file system activity from the last few weeks and go from there.

Nothing in the time line stood out and I had no keywords or phrases of interest, no indicators of compromise to search for. I scanned an image of the drive with a couple different anti-virus tools and found nothing. Maybe there wasn't anything to find. Maybe I needed a new approach. I could build a Live View image of the system, boot it up and monitor the network traffic for anything noteworthy, a method I'd used before when in similar situations.

I collected the strings from the image using the old standby:

strings -a -t d sda1.dd > sda1.dd.asc

This collected ASCII strings and their byte offsets in the disk image.

Then I ran: 

strings -a -t d -e l sda1.dd > sda1.dd.uni

to gather 16-bit little-endian Unicode strings and their byte offsets.

I quickly took a look at each file with "less":  

9000 /dev/null
9096 1j :1j :1j :1j :
9224 !j :1j :
9235 81j :
9352 !j :1j :
9363 81j :
264224 lost+found
264244 boot
264256 homeA/
264292 proc
264388 root
264400 sbinR
264412 floppy
264428 .bash_history

(Note: ACTUAL FILE CONTENTS HAVE BEEN CHANGED TO PROTECT THE INNOCENT)

I knew paging through looking for evil would be inefficient. Then it occurred to me that I could apply the Least Frequency of Occurrence principle that Peter Silberman spoke about a few years ago at the SANS Forensics Summit. It would at least reduce the size of the data I was looking at, and data reduction is a good strategy for digital forensics practitioners.

So I ran the following commands:

cat sda1.dd.asc | awk '{$1=""; print}' | sort | uniq -c | sort -gr > sda1.dd.asc.lfo
cat sda1.dd.uni | awk '{$1=""; print}' | sort | uniq -c | sort -gr > sda1.dd.uni.lfo

Let me explain the purpose of these compound commands. The cat command dumps the contents of a file to standard output (usually your screen). The pipe (|) that follows passes that output to the awk command. AWK is a powerful utility for processing text, and while I'm far from an expert in awk, I know enough to get some useful things done. In this case, awk removes the first field in the .asc and .uni files. The first field is the byte offset where the subsequent string occurs in the original image file. Awk numbers the fields on each line, and here the byte offset is $1, so setting $1 to "" effectively removes that value. The print command sends the line of text to standard output, where it is piped to sort. sort does what you expect, and it matters here because uniq only collapses adjacent duplicate lines. Next, the uniq -c command removes duplicate lines and counts the occurrences of each. This data is then piped to sort again, and this time we tell sort to do a numeric sort (-g) and to reverse the output (-r). The results are redirected to new files called "sda1.dd.asc.lfo" and "sda1.dd.uni.lfo", respectively.
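The pipeline described above can be tried on a small scale before running it against a full strings file. What follows is a minimal sketch, not the exact commands from the case: the helper name lfo_counts and the sample data are made up for illustration.

```shell
#!/bin/sh
# Sketch: count how often each string occurs, ignoring the leading byte offset.
# lfo_counts is a hypothetical helper name used only for this illustration.
lfo_counts() {
    # $1 = file in "offset string" form, as produced by "strings -a -t d"
    # Blank the offset field, group identical lines, count each group,
    # then order the counts from most to least frequent.
    awk '{ $1 = ""; print }' "$1" | sort | uniq -c | sort -gr
}

# Tiny made-up sample standing in for a real strings file.
sample=$(mktemp)
printf '%s\n' \
    '100 return;' \
    '220 else' \
    '340 return;' \
    '460 USER_BOTMAST' \
    '580 return;' \
    '700 else' > "$sample"

lfo_counts "$sample"                # most frequent strings first
lfo_counts "$sample" | tail -n 1    # the single least frequent string
rm -f "$sample"
```

One side effect worth knowing: assigning to $1 makes awk rebuild the line with single spaces between fields, so runs of whitespace inside a string are collapsed before counting. For this kind of triage that is probably acceptable, but the counted text may not match the original bytes exactly.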

Now when I look at "sda1.dd.asc.lfo", I see something like this:

3703 GCC: (GNU) egcs-2.91.66 19990314/Linux (egcs-1.1.2 release)
1268 return;
1116 else
757 done.
755 Disabling CPUID Serial number...
734 .text

Field one in this file is the number of times the string after it occurs in the disk image. So far so good. If an attacker has placed code on this system, I would expect it to be one of the least frequently occurring items on the system. Let's jump to the bottom of the file and look at the least frequently occurring strings:

1 `|~\
1 `<%
1 `~+<

Hm, well this is certainly ugly and not very useful. I needed to further reduce the data set. I sent out a call for a new command like "strings" called "English" that would be smart enough to discern English text from garbage.

Within minutes people were replying that I could use grep and a dictionary file. On my Ubuntu box there are several dictionary files including one in "/usr/share/dict" called "american_english" that contained 98K+ words. I decided to clean it up a little by running "strings" against it to remove short words that may yield false positives and I removed all words containing apostrophes just because. The result was a file containing just under 73K words.
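A dictionary cleanup along these lines can be sketched as follows. Note that this uses awk for the length filter rather than strings, and the helper name clean_wordlist and the four-character minimum are assumptions made for the example.

```shell
#!/bin/sh
# Sketch: trim a word list to reduce false positives.
# clean_wordlist is a hypothetical helper; it swaps in awk for the
# strings(1) step described in the post.
clean_wordlist() {
    # $1 = input word list, one word per line
    # Keep words of at least 4 characters that contain no apostrophe.
    # "\047" is the octal escape for an apostrophe inside an awk string.
    awk 'length($0) >= 4 && index($0, "\047") == 0' "$1"
}

# Made-up sample standing in for /usr/share/dict/american_english.
words=$(mktemp)
printf '%s\n' '' 'a' "can't" 'cat' 'else' "else's" 'return' > "$words"
clean_wordlist "$words"     # prints "else" and "return", one per line
rm -f "$words"
```

Because a blank line has length zero, this filter also drops empty lines from the word list.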

I was afraid that grep would take forever to search for matches in my ".asc" and ".uni" files, but figured I'd try it and leave it running overnight. To my surprise, it finished quickly, but didn't appear to eliminate the garbage at the end of the file. I thought grep might be silently failing due to the size of the dictionary file, but after a little trial and error, I discovered the problem was that the first line of the dictionary file was blank. An empty pattern matches every line, so blank lines in your indicators-of-compromise or keyword file are a known problem. Get rid of them.
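The blank-line problem is easy to reproduce on a toy example. This is a small sketch with made-up data; strip_blank_patterns is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: show why a blank line in a pattern file defeats grep filtering.
# strip_blank_patterns is a hypothetical helper name for this illustration.
strip_blank_patterns() {
    # $1 = pattern file; print it without empty lines
    grep -v '^$' "$1"
}

data=$(mktemp); pats=$(mktemp); clean=$(mktemp)
printf '%s\n' '1 ~+<' '1 else if' > "$data"   # one garbage line, one useful line
printf '%s\n' '' 'else' > "$pats"             # first pattern line is blank

# The empty pattern matches every line, so nothing is filtered out.
grep -c -iFf "$pats" "$data"     # prints 2

# With the blank line removed, only the line containing "else" matches.
strip_blank_patterns "$pats" > "$clean"
grep -c -iFf "$clean" "$data"    # prints 1

rm -f "$data" "$pats" "$clean"
```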

Because the dictionary file is a list of "fixed strings" and not regular expressions, I used the "-F" flag (nod to ubahmapk for that) to tell grep to interpret the strings as such. To say this dramatically improves performance is a huge understatement. The commands I used were:

grep -iFf american_english_short sda1.dd.asc.lfo > sda1.dd.asc.lfo.words
grep -iFf american_english_short sda1.dd.uni.lfo > sda1.dd.uni.lfo.words

The "-i" tells grep to ignore case when matching and the "-f" tells grep that the "patterns" are to be read from a file.

Now the least frequently occurring lines in "sda1.dd.asc.lfo.words" looked like this:

1 %02(hour{date}):%02(min{date}) \
1 02 4 * * * root run-parts /etc/cron.daily
1 01 * * * * root run-parts /etc/cron.hourly
1 # 0 1 0 1 1200 baud

Gone was the garbage. Granted, there was still plenty of useless info to wade through, but at least now there was less of it. And within minutes of careful review of the least frequently occurring text, I noted the following:

1 else if $HISTFILE has a value, use that, else use ~/.bash_history.
1 } else if (!(get_attr_handle(dcc[idx].nick) & (USER_BOTMAST | USER_MASTER))) {
1 } else if (get_assoc(par) != atoi(s)) {

That middle line looked awfully suspicious. I went back to my original strings file and grepped for "USER_BOTMAST":

grep USER_BOTMAST sda1.dd.asc
...
8819144 if ((atr & USER_BOTMAST) && (!(atr & (USER_MASTER | USER_OWNER)))
8820316 if ((get_attr_handle(dcc[idx].nick) & USER_BOTMAST) &&
8823918 if ((get_attr_handle(dcc[idx].nick) & USER_BOTMAST) &&
...

Now I had the byte offsets where the "USER_BOTMAST" string occurred in the disk image. I recovered the file using the techniques we teach in SANS Forensics 508 and saw the following:

/*
This file is part of the eggdrop source code
copyright (c) 1997 Robey Pointer
and is distributed according to the GNU general public license.
For full details, read the top of 'main.c' or the file called
COPYING that was distributed with this code.
*/
#if HAVE_CONFIG_H
#include
#endif
#include
#include
#include
#include
#include
#include "eggdrop.h"
#include "users.h"
#include "chan.h"
#include "tclegg.h"
...
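The byte offsets in the grep output above are what tie a suspicious string back to a location in the image. As a rough sketch only, and not the recovery technique taught in the course, the following shows one way to peek at the raw bytes around an offset and to work out which file system block contains it. The helper name peek_at_offset, the sample data, and the 4096-byte block size are all assumptions.

```shell
#!/bin/sh
# Sketch: print a window of bytes starting at an offset in an image file.
# peek_at_offset is a hypothetical helper, not the file recovery method
# referred to in the post.
peek_at_offset() {
    # $1 = image file
    # $2 = byte offset (0-based, as printed by "strings -t d")
    # $3 = number of bytes to show
    # tail -c +N is 1-based, hence the +1.
    tail -c "+$(( $2 + 1 ))" "$1" | head -c "$3"
}

# Made-up stand-in for a disk image: 10 filler bytes, then a string.
img=$(mktemp)
printf 'AAAAAAAAAAif (atr & USER_BOTMAST)BBBB' > "$img"
peek_at_offset "$img" 10 23     # prints: if (atr & USER_BOTMAST)
echo

# Which block holds offset 8820316, ASSUMING 4096-byte file system blocks?
echo $(( 8820316 / 4096 ))      # prints: 2153
rm -f "$img"
```

The real block size would need to be read from the file system itself before a block number like this is used with any recovery tool.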

Of course this approach will not be appropriate in many cases. Thankfully, we almost always have more useful information to go on, but it's fun to explore new techniques and think about new ways of tackling cases and who knows, you may be faced with a situation where looking for least frequently occurring artifacts will yield useful information. The other point of this post is, ahem, pedagogical. That is to say, the information presented here is not meant to be applied exactly as it has been in this post, it is meant to expose less experienced Linux users to some powerful command line tools and to spur thought and conversation about unorthodox approaches to investigations.

I'll follow this post in a few days with another that is orthogonal. It will be less about forensics specifically, but for those who use the Linux command line for forensics, it may prove useful.

Dave Hull is an incident responder and forensics practitioner for Trusted Signal. 

Tags:
  • Digital Forensics and Incident Response
