SANS Digital Forensics and Incident Response Blog | Law Is Not A Science: Admissibility of Computer Evidence and MD5 Hashes

Another day... another hashing discussion:

On the SANS GIAC Alumni list the other day, the question popped up from one of the individuals on the list:

"I'm assuming that this group has had the pleasure to consume the latest research focused on MD5 hash collisions. Discussions about hash collisions seems to carry the same energy as religion and politics. My question is regarding digital evidence and the use of MD5 hashes to establish digital evidence integrity. The use of hashes to ensure digital evidence integrity has legal precedence. However, as more research companies introduce concerns related to MD5 hashes, the courts will at some point, no longer consider this as a valid technology to ensure integrity.

Has anyone heard of a successful attempt to dismiss evidence due to concerns that MD5 is no longer considered tamper proof?"

This topic pops up from time to time in our Computer Forensics classes at SANS (er... pretty much every time...)

The answer:

First off, as of today, using the MD5 algorithm as a form of hashing for digital forensic work is completely acceptable.

You can use additional means of hashing, but honestly, choose whichever algorithm you feel is best. As long as you are hashing your evidence, you are fine and your evidence will usually see its day in court.

Why?

First off, admissibility guidelines do not differentiate between physical and electronic evidence. The Federal Rules of Evidence (FRE Rules 901 and 902) guide authentication of evidence for admissibility (http://federalevidence.com/advisory-committee-notes). Nowhere do they state that electronic evidence will be treated differently than physical evidence for authentication purposes.

Could you get electronic evidence admitted without hashing? Yep.
Will hashing help admissibility of my evidence? Certainly, but it is not legally required.
What if someone brings up collisions in court? Again, usually an attempt to confuse the jury. But you can turn this on them by stating that it is more likely that before showing up for jury duty, all the jurors randomly put the same 7 numbers into the Powerball Lottery and won. That has a much greater chance of happening than a naturally occurring collision. (Thanks to Scott Moulton for that great analogy). With folks being prosecuted on partial fingerprint matches or eyewitness testimony from a guy driving by in a car at 30 MPH, do we really think this is a showstopper for courts?
Interesting, Rob, but is there anyone with some legal credentials to back up what you are telling us? Yes, our very own Richard Salgado, author and senior instructor for Computer Forensics at SANS, wrote a wonderful paper on the topic several years ago for the Harvard Law Review (http://www.harvardlawreview.org/forum/issues/119/dec05/salgado.pdf) that states "...there is more than reasonable assurance that two different inputs will not have the same hash value." (see footnotes 7 & 8)
If hashing is not legally required to prove authenticity, why do we use hashing, chain of custody, and proper storage of evidence in case of pending litigation? Two point five reasons:

1. Expert Witness:

Best practices are tested if you are deposed as an expert. Hashing (any form) is considered a best practice for digital forensic practitioners. If you take yourself seriously in this line of work and you do not perform any type of hashing, then you open yourself up to a cross-examination as an expert that would not be fun to sit through. "The court is called upon to reject testimony that is based upon premises lacking any significant support and acceptance within the scientific community," (http://federalevidence.com/advisory-committee-notes#Rule702). If you would like your testimony to hold greater weight, HASH. 'nuff said.

2. Tampering.

Tampering can only be brought up if the opposing counsel has a strong argument that the evidence has been deliberately modified. Tampering cannot be brought up just because it is digital evidence and easily modified... the opposing side has to prove it happened. The burden is on the side claiming that tampering happened, not the side entering the evidence (see http://www.usdoj.gov/criminal/cybercrime/s&smanual2002.htm and do a search for "Authenticity and the Alteration of Computer Records"). With hashing (even using an algorithm such as MD5), you can reduce the threat that someone will claim the evidence has been tampered with if you can prove that it has not changed over time. In this case, collisions are really not a big deal at all, as long as you get the same hash every time you calculate it against the evidence.
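For the technically inclined, "get the same hash every time" can be sketched in a few lines of Python. This is only an illustration, not a forensic tool: the function names file_digests and unchanged_since_acquisition are made up for this example, and computing SHA-256 alongside MD5 is simply one way of "using additional means of hashing". It relies on Python's standard hashlib module.

```python
import hashlib


def file_digests(path, algorithms=("md5", "sha256"), chunk_size=1024 * 1024):
    """Return {algorithm: hex digest} for the file at path.

    The file is read in chunks so large evidence images do not
    have to fit in memory.
    """
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


def unchanged_since_acquisition(path, recorded):
    """True if every digest recorded at acquisition still matches the file."""
    current = file_digests(path, algorithms=tuple(recorded))
    return all(current[name] == value.lower() for name, value in recorded.items())
```

In practice you would record the output of file_digests in your notes at acquisition time, then call unchanged_since_acquisition with those recorded values each time the evidence is handled. If any recorded digest no longer matches, the function returns False.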

Why is MD5 still ok? From the cited website: "The existence of an air-tight security system [to prevent tampering] is not, however, a prerequisite to the admissibility of computer printouts. If such a prerequisite did exist, it would become virtually impossible to admit computer-generated records; the party opposing admission would have to show only that a better security system was feasible."

One last thought from Eoghan Casey on this topic: "On May 24, 2006, the DFRWS posted a challenge asking for anyone to produce actual files (or evidence) that have produced a collision and nobody has succeeded yet!"
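To put a rough number on a naturally occurring collision, here is a back-of-the-envelope sketch using the standard birthday-bound approximation. It assumes MD5 behaves like a uniformly random 128-bit function, so it only speaks to accidental collisions between unrelated files, not to files deliberately crafted to collide. The function name accidental_collision_probability is made up for this example.

```python
import math


def accidental_collision_probability(num_files, hash_bits=128):
    """Approximate chance that any two of num_files share a hash value.

    Birthday-bound approximation: P ~= 1 - exp(-pairs / 2**hash_bits),
    where pairs = n * (n - 1) / 2. Assumes hash outputs are uniformly
    random, which is an idealization.
    """
    pairs = num_files * (num_files - 1) / 2
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    return -math.expm1(-pairs / 2 ** hash_bits)


# One billion distinct files hashed with a 128-bit algorithm.
print(accidental_collision_probability(10 ** 9))
```

Under that assumption, even a billion files gives a probability on the order of 10^-21 that any two of them collide by accident.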

2.5. Law Is Not A Science:

I tell students this regularly... We (you and I) are technical. We grew up loving math. We feel that if we add 1+1 we will always get 2. This is why it is a science. 1+1=2 Repeatable. 1+1=2 Satisfying. Feels good doesn't it? 1+1=2

Well, let's take that same formula from our nice scientific world and put it in the legal world.

Court 1: 1+1=2

Court 2: 1+1=2

Court 3: 1+1=3

See what happened there? We ended up with some bizarre result. This drives us crazy. Well, in reality, this is not exactly what happens. What does happen? If you take the SAME evidence, the SAME analysis, the SAME conclusions... and you drop that into TEN separate courts, you will probably end up with the same verdict 9 times out of 10.

HOWEVER, (comma, space, pause for additional dramatic effect) there is always at least one jury/judge that will think differently and rule the other way given the SAME evidence, arguments, and testimony. We need to realize that we cannot force our mindset onto a system that is not a science, but rather, is an art. As a result, like the core question asks about MD5 hashing, we think we need to "fix" the courts or come up with a system that is FAIL proof.

In the instances where we might find that MD5 is attacked in court and subsequently not used for authentication in a courtroom, we can point to a variety of reasons. In the several cases my peers and I have reviewed, it appeared that the prosecution failed to produce an expert to discuss hashing. Generally all the expert would need to accomplish is to discuss the true likelihood of a collision... which is far less likely than even a collision with DNA evidence. It isn't whether the hashing standard has a fault, but whether it is GOOD enough... 1+1=3. DNA analysis, fingerprinting, and eyewitness testimony all have their faults... but are they good enough to convict? YEP. Have criminals been let off due to the fact that the prosecution could not produce a DNA expert to discuss the likelihood of a false positive? Even worse, sometimes the judge/jury listens to the explanation and still rejects it. You don't have to dig far to find cases where individuals are not convicted despite compelling scientific evidence pointing to their guilt. 1+1=3

And here is the kicker... even though one or two courts rule against scientific facts such as DNA evidence (or countless others), it does not set precedent and invalidate DNA evidence from here to the end of time.

So... what do the lawyers think?

The best way to see why law and science do not mix well is to view it from a lawyer's perspective. This is an excerpt from one of my favorite legal blogs on the subject, written by Ralph Losey, who has a wonderful book called e-Discovery Current Trends and Cases (worth a read if you deal with litigation and you work in IT). It is a rather long blog entry, but read it if you have the time. It doesn't directly discuss MD5 hashing, but you will see why a discussion about MD5 hashing being admissible or not due to collisions probably drives the lawyers crazy... just like it drives us crazy when we end up with 1+1=3 in their world.

From the blog: (http://ralphlosey.wordpress.com/2008/08/24/tech-v-law-a-plea-for-mutual-respect/)

...the practice of law is an art, not a science, and the human element can never be replaced by technology.

Unlike computer code, the rules of law are malleable and there are always exceptions. This in turn is one of the key reasons the two cultures of Law and IT have such a hard time understanding one another. It is also the reason a few inexperienced engineer types are delusionary and arrogant enough to think that e-discovery can be "fixed" with the right software algorithms. It cannot because law is not a science, it is far too complex and chaotic for that. Or if it is a science, it is more like Quantum Physics, where electrons are unpredictable and can be in two places at once, not the orderly world of Newtonian Science that most engineers live in.

Yes, there are many computer programs that can be used as effective tools in the pursuit of justice. We lawyers need to wake up to that fact. But so too do the technologists who think the right software alone will fix everything. The human element is key in Law which is one reason that training is so important.

Rob has over 15 years of experience in computer forensics, vulnerability discovery, intrusion detection and incident response. Rob is the lead course author and faculty fellow for the computer forensic courses at the SANS Institute.