Google Web Site Malware Statistics
Google began publishing some interesting statistics recently - showing the number of sites it detects each week that are either compromised and attempting to deliver malware, or are being used in phishing attacks. Those numbers include both bad guy sites and compromised good guy sites that the bad guys use to launch attacks - the vast majority are compromised legitimate sites.
A few observations on the data:
- Netcraft says it sees about 672 million active websites out there, so if I assume Google indexes that same number, then Google is saying that somewhere around 0.007% of web sites are actively malicious, a pretty small percentage. This means 99.993% of sites are not actively being used maliciously.
- Google's data shows a lot of seasonality, and many spikes that are 50-75% higher than the running average. This is because there are a lot of web sites with easily exploited vulnerabilities that can be used by attackers as needed. Both WhiteHat Security and Veracode have shown that more than 80% of websites online today have exposed vulnerabilities. The fact that at any given time only a small percentage of sites are actively malicious really just indicates that the attackers are using smart "just in time" attack provisioning to reduce the chances of being detected.
- Since the attackers actively steer their targets towards their attack sites (through spear phishing and the like), or compromise the sites their targets already go to (watering hole attacks), the malicious potential of a small number of attack sites gets highly amplified.
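The back-of-the-envelope arithmetic in the first observation is easy to check. A minimal sketch (the 672 million figure is Netcraft's estimate from above; the 0.007% figure is the fraction of sites described as actively malicious, so the implied absolute count here is my own derivation, not a number Google published):

```python
# Working the percentages from the first observation both ways.
active_sites = 672_000_000            # Netcraft's estimate of active websites
malicious_fraction = 0.007 / 100      # 0.007% expressed as a fraction

malicious_sites = active_sites * malicious_fraction
clean_percentage = (1 - malicious_fraction) * 100

print(f"Implied actively malicious sites: {malicious_sites:,.0f}")  # ~47,000
print(f"Sites not actively malicious: {clean_percentage:.3f}%")     # 99.993%
```

So even that "pretty small percentage" still works out to tens of thousands of live attack sites at any given moment, which is worth keeping in mind for the bottom line below.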
Bottom line: Ben Franklin and Shakespeare are credited with variations of "One bad apple spoils the whole bunch," which is sort of the problem here. The percentages may seem low in the abstract, but in practice this level is much too high.
Now, thousands of elementary school science fair projects have proven that Franklin and Shakespeare were right - ripe fruit gives off a substance called ethylene that causes nearby pieces of fruit to accelerate their own ripening and potentially go bad. The analog in the web security problem is reducing the number of bad web sites releasing "cyber ethylene" into the Internet.
Now, a recent incident in China provides a more apt analogy I think: "a few thousand dead pigs (and ducks) dumped in the river can spoil the water for everyone downstream." The upstream pig dumpers don't feel the pain and have little incentive to fix the problem, just as the person who keeps the good apple and throws their rotten apple back into the apple barrel doesn't feel the pain.
So, how do we change this? Well, we have to more directly connect the pain to the problem. In the world of rivers and water pollution, there are actually laws around the responsibilities and liabilities of upstream polluters. But in security we've talked for years about "upstream liability" and "attractive nuisance" and all kinds of legal theories that don't really seem to stick when applied to technology.
Over the years, bigger leaps in progress seem to have been made against polluters by the public backlash after their polluting has been exposed - think the BP oil spill disaster or the reaction to periodic scares about mad cow disease in the food supply.
I'd like to find a way to have a similar effect on compromised and long-vulnerable websites. The Google data is a start, but it is essentially anonymized. It really is time to start naming names before the cyber-ethylene spreads.