John Pescatore - SANS Director of Emerging Security Trends
Show How Security Improvements Reduce Business Outages
This week's Drilldown will focus on an item (included below) from NewsBites Issue 16 reporting that the U.S. Federal Reserve System had an outage that was not a cybersecurity incident. Rather, the outage was caused by "human error."
Over the years I've collected data from a lot of studies of the causes and cost of downtime. While the cost of downtime data has been all over the place, the causes have been pretty consistent. Human/admin operational error is the most common cause, generally cited between 60 and 70% of the time. Cybersecurity causes are usually the second- or third-highest factor in recent years, usually between 25 and 30%.
But it is important to note that a large percentage of cybersecurity incidents are also enabled by "human error." Sensitive data breaches occur because someone mistakenly emailed a file containing customer information, because a laptop was lost, or because someone reused a document and didn't realize what was in it. I could go further and call most software vulnerabilities and overprivileged accounts human error: developers and IT admins making mistakes that they had been warned about.
The point is that a lot of money is invested in technology in the name of security and "resiliency," while small investments in reducing "human error" can have a huge impact on reducing outages overall and on reducing the percentage of outages that are security-related.
Does your organization track downtime? If so, do you know what percentage is determined to be human error and/or security related? If not, find out. Take advantage of the data to convince management of training areas or process changes that will reduce security incidents while being funded out of the IT or operations budgets chasing "resiliency."
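If your organization does keep a downtime log, the breakdown asked about above can be tallied in a few lines. The following is only a minimal sketch: the record fields (`minutes`, `cause`, `human_enabled`), the cause labels, and the sample numbers are all hypothetical and are not drawn from any real study or incident data.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Outage:
    minutes: int                 # duration of the outage
    cause: str                   # hypothetical label, e.g. "human_error", "security"
    human_enabled: bool = False  # a human mistake enabled the incident


def cause_shares(outages):
    """Return {cause: percent of total downtime minutes}."""
    total = sum(o.minutes for o in outages)
    if total == 0:
        return {}
    minutes = Counter()
    for o in outages:
        minutes[o.cause] += o.minutes
    return {cause: round(100 * m / total, 1) for cause, m in minutes.items()}


def human_related_share(outages):
    """Percent of downtime caused directly or enabled by human error."""
    total = sum(o.minutes for o in outages)
    if total == 0:
        return 0.0
    related = sum(
        o.minutes for o in outages
        if o.cause == "human_error" or o.human_enabled
    )
    return round(100 * related / total, 1)


# Made-up sample log, for illustration only.
log = [
    Outage(120, "human_error"),
    Outage(60, "security", human_enabled=True),
    Outage(20, "hardware"),
]

print(cause_shares(log))
print(human_related_share(log))
```

Tracking the `human_enabled` flag separately from the primary cause is one way to show management how much security-related downtime would also shrink if training or process changes reduced human error.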
US Federal Reserve Outage
The U.S. Federal Reserve System experienced an outage on Wednesday, February 24. This outage affected multiple services, including the Federal Reserve's Accounting Services, Central Bank, Check 21, Check Adjustments, FedACH, FedCash, FedLine Advantage, FedLine Command, FedLine Direct, FedLine Web, Fedwire Funds, Fedwire Securities, and National Settlement Services. The issue was determined to be an operational error and was largely resolved on Wednesday afternoon.
Typically, close to 60% of system downtime is caused by human error (less than 30% is caused by security-related events). Good to remind management of this. Most of the investment for Continuity of Operations, Business Continuity/Disaster Recovery, and the newest flavor of the month--"Resiliency"--should be routinely funded out of the IT budget. It will also provide benefit when ransomware or DDoS-type incidents occur.
This story and the TD Bank story remind us that service interruptions also have non-cybersecurity causes, and that even with regression testing, a change may still cause problems when deployed to production. Make sure your rollback capabilities still fit within your maximum tolerable downtime (MTD). While customer notification is important, having a customer-reachable status dashboard like the Federal Reserve's allows responders and other staff to remain focused on recovery. If you're a financial institution (FI) using these services, you should have verified that all transactions completed as expected.
Read more in
Bleeping Computer: Federal Reserve nationwide outage impacts US banking system
GovInfosecurity: Federal Reserve's Money Transfer Services Suffer Outage
FRB Services: Service Status