This is Part 1 of a 4-part series.
Rekt Casino is kind of a wreck right now and is scrambling to respond to the successful ransomware attack they suffered late last year. In reviewing the report from the outside incident response team and responding to the flood of auditors and regulators that are now keenly interested in what Rekt Casino is doing to secure their organization and protect customer data and payment systems, it has become clear that they are woefully understaffed and lack some of the most basic security controls.
This post will focus on one of the core building blocks of any security program, one that can help prevent or reduce the impact of breaches. Managing our vulnerabilities is an essential security function. It is number 3 in the Center for Internet Security’s Critical Security Controls, outranked only by hardware and software asset inventory, which are also crucial to the success of any vulnerability management program (or really any other security program, for that matter).
Vulnerability management is one of the oldest security functions for most organizations, yet we continue to struggle to keep up with the demands as our technology continues to evolve and expand and as we get stuck supporting and maintaining legacy components. There is no silver bullet for vulnerability management, but there are certain actions that we can all take to improve and through consistent, incremental improvement we can eventually get out of this mess we have made for ourselves.
In order for Rekt Casino or any other organization to build or improve their vulnerability management program, we have a few suggestions:
- Take time to understand
- Craft the story
- Commit to be measured
- Don’t wait for the perfect solution
- Be agile
- Strategically simplify
Take Time to Understand
In order to build a successful vulnerability management program, it is best to start by opening the lines of communication with the technology teams and business stakeholders that will be active participants in the program. We need to understand what is driving their decisions on a daily basis and where their priorities are now and in the future. Chances are, for Rekt Casino, that our priorities are now very much aligned. However, that wasn’t the case a few months ago and most likely won’t be the case months or years from now when the breach is a distant memory.
We will need a lot of support from these technologists and stakeholders, and if we don’t take the time to understand their current processes, technological capabilities, and business drivers, our solution to the problem may not be in alignment, which inevitably leads to pushback, lack of engagement, and rework.
Craft the Story
In the case of Rekt Casino, it may seem like the story or business case writes itself. While in many respects this is true, we need to be careful not to rely too heavily on these events to push forward our agendas, or we may lose support over time (or even sooner, if something else happens that shifts the organization’s focus from the breach to something of equal or greater importance or cost). I am not saying we shouldn’t leverage the breach to influence change, but we need to build a story that survives and maintains validity even if the breach is quickly forgotten or proves less costly than originally estimated.
Commit to be Measured
Because of the urgency surrounding the breach, once we build our case and get approval, we may feel like we need to dive right in and start making changes. While it is important to move quickly, something many organizations fail to consider is how success will be measured. If we don’t think about this early, it may be difficult or impossible to gather the information we need later. Determining metrics up front allows us to collect baselines for comparisons and trends, and it will heavily influence our process and technology decisions. We should not only think about what metrics we will track within our own processes and technology, but also about outside metrics and data points that may improve with the increased focus and effort on vulnerability management. For example, if over time more focused effort in vulnerability management leads to faster turn times on fixes, decreased failure rates, increased automation, or even decreased costs due to standardization and consolidation, all of these can be used to continue to build and support our story.
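As a toy illustration of baselining such a metric, here is a minimal sketch that computes mean time to remediate from detection and fix dates. The record format and the data are made up for illustration; real numbers would come from your scanner or ticketing system.

```python
from datetime import date

# Hypothetical remediation records: (vulnerability id, detected, fixed).
# These values are illustrative, not from any real scanner export.
records = [
    ("CVE-2020-0001", date(2020, 1, 5), date(2020, 1, 20)),
    ("CVE-2020-0002", date(2020, 1, 10), date(2020, 2, 9)),
    ("CVE-2020-0003", date(2020, 2, 1), date(2020, 2, 8)),
]

def mean_time_to_remediate(records):
    """Average days between detection and fix -- a simple baseline metric."""
    days = [(fixed - detected).days for _, detected, fixed in records]
    return sum(days) / len(days)

print(mean_time_to_remediate(records))  # (15 + 30 + 7) / 3 days
```

Collecting even a crude number like this before any process changes gives us the "before" picture that makes later improvements visible.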
Don’t Wait for the Perfect Solution
First of all, the perfect solution is a fantasy. There are no perfect solutions in vulnerability management. Even if there were, the perfect solution for one organization would almost never be the perfect solution for any other organization. Just like people, all organizations are unique and have different cultures, needs, and constraints. Part of taking time to understand is determining what capabilities already exist that we can leverage to get started. Rekt doesn’t have much in place right now from a security perspective. It will take time to procure, install, and adequately configure the technologies that many of us would expect to exist within a robust vulnerability management program. Rekt doesn’t have a good understanding of all of the systems and software that exist within the organization, let alone technology to automatically identify all of the vulnerabilities on their servers and workstations.
We don’t need to wait for these technology acquisitions to get started. Rekt, like most businesses, is using technology to patch and configure their systems. We don’t need to wait for a vulnerability scan to tell us there are vulnerabilities when our patch and configuration management tools already highlight many of these issues. Eventually, we will need robust identification tools to find anything we missed and provide validation of our efforts, but we can make significant progress by mining the data sources that already exist within our organization for information.
We can cobble together some hardware and software inventories by leveraging our virtualization and cloud APIs and both validating and supplementing that information with data from the many other technology management and monitoring capabilities we have deployed in the environment. It won’t be perfect, but it’s a start.
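As a rough sketch of that cobbling-together, here is a minimal example that unions two inventory exports and flags hosts seen by only one source. The hostnames, fields, and source names are all hypothetical; in practice the dictionaries would be populated from your virtualization, cloud, or agent APIs.

```python
# Hypothetical exports from two existing data sources. All names and
# fields are made up for illustration.
hypervisor_export = {
    "web-01": {"os": "Ubuntu 18.04", "source": "vcenter"},
    "db-01": {"os": "RHEL 7", "source": "vcenter"},
}
agent_export = {
    "web-01": {"os": "Ubuntu 18.04", "source": "agent"},
    "laptop-07": {"os": "Windows 10", "source": "agent"},
}

def merge_inventories(*sources):
    """Union the assets and track which sources saw each host."""
    merged = {}
    for src in sources:
        for host, attrs in src.items():
            entry = merged.setdefault(host, {"os": attrs["os"], "seen_by": []})
            entry["seen_by"].append(attrs["source"])
    return merged

inventory = merge_inventories(hypervisor_export, agent_export)

# Hosts seen by only one source deserve a closer look -- either a
# coverage gap in one tool or a system that shouldn't exist.
gaps = [host for host, entry in inventory.items() if len(entry["seen_by"]) == 1]
print(sorted(gaps))  # ['db-01', 'laptop-07']
```

The disagreements between sources are often as valuable as the merged list itself: they show us exactly where our visibility is weakest.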
We can also start to identify the operating system and software patches and configuration changes that seem to be the most problematic in our environment and dig into the why, or the root cause, for these issues. Chances are, installing a vulnerability identification tool is not going to remove these roadblocks. Digging in now allows us to jumpstart the conversations and efforts to resolve these larger issues, which tends to result in the greatest reductions in vulnerabilities.
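Finding the most problematic patches can start as simply as counting failures in the reporting data our patch management tool already produces. A minimal sketch, with made-up patch IDs and statuses:

```python
from collections import Counter

# Hypothetical deployment results pulled from a patch management tool's
# reporting export: (patch id, status). All values are illustrative.
deployments = [
    ("KB4551762", "failed"), ("KB4551762", "failed"), ("KB4551762", "success"),
    ("KB4549951", "success"), ("KB4549951", "failed"),
    ("KB4556799", "success"),
]

# Count failures per patch; the patches that fail most often point at
# the root causes worth digging into first.
failures = Counter(patch for patch, status in deployments if status == "failed")
for patch, count in failures.most_common():
    print(patch, count)
```

A ranked failure list like this turns a vague "patching is painful" complaint into a short, concrete list of conversations to have.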
Be Agile
While we need to have a long-term strategy for improving vulnerability management, we need to take an agile approach to achieving these strategic objectives. Too many times, I have seen organizations spend months or years planning and executing large projects only to experience major changes mid-project that completely derail their efforts. Consider a huge project to procure and implement vulnerability scanning, patch, and configuration management within our traditional data centers, only to find out halfway through that we are moving everything into the cloud or moving from physical or virtual servers to containers. Either of these changes will require different identification and remediation practices and different technologies to support those practices.
Being agile is not just about iterating quickly, but also about opening lines of communication. Both of these agile concepts will help us have more immediate impact, but also receive more consistent and timely feedback. Chances are as we are implementing some of the more traditional processes and technologies, if we are consistently communicating and demoing these new processes and technologies with our stakeholders, they may ask how this will work in the cloud or what we might need to do differently for containers if those changes are on the horizon. They will most likely know about these coming changes well before they are communicated to the broader organization.
Strategically Simplify
Vulnerability management has simultaneously never been harder and never been easier. We have so much more to assess and manage these days that it is overwhelming, but we also have more technologies and services than ever before to help us succeed. While the cloud can cause the number of systems we have to manage to balloon quickly, it also offers platform, function, and software as a service options that allow us to leverage the shared responsibility model and be less responsible for managing the vulnerabilities on the hosts that support these services. Maybe we are not comfortable giving up this responsibility for certain workloads, but there must be certain asset classifications where the benefit outweighs the risk.
Both the cloud and containers encourage us to follow a more immutable approach to our assets by forcing us to use images, which can lead to greater standardization and fewer vulnerabilities if these images are managed and vetted properly and if teams are required to upgrade to new images within a certain timeframe.
On the custom application side, we are adding more and more apps and features, but developers are writing less and less custom code and leveraging more and more third-party and open-source libraries. Many of these libraries have built-in security capabilities and features that help reduce common vulnerabilities in our code. We need to train our developers on how to use these libraries correctly and make sure we are vetting them and monitoring them for vulnerabilities, but this tends to take less time and effort than auditing and assessing code written by our developers.
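Monitoring those third-party libraries can begin as a simple comparison of pinned dependency versions against known-vulnerable releases. In the sketch below, all package names and versions are fictional, and the vulnerable set is hard-coded; in practice that data would come from an advisory feed or a software composition analysis tool.

```python
# Hypothetical pinned dependencies for one of our applications.
pinned = {"acme-utils": "1.0.3", "widget-core": "2.4.1"}

# Hypothetical known-vulnerable (package, version) pairs; in reality this
# would be refreshed from a vulnerability advisory feed.
known_vulnerable = {("acme-utils", "1.0.3"), ("legacy-parser", "0.9.0")}

# Flag any dependency pinned to a known-vulnerable release.
flagged = [name for name, version in pinned.items()
           if (name, version) in known_vulnerable]
print(flagged)  # ['acme-utils']
```

Even this naive exact-match check catches the worst case, a library pinned to a release with a published vulnerability, and it is far cheaper than auditing equivalent functionality written in-house.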
By following these principles as we develop or improve our vulnerability management programs, we can have continuous, incremental success, which will lead to big gains over time. It won’t happen overnight and some of our efforts will fail, but with a small, incremental approach, failure is much easier to tolerate and recovery is possible, especially if all of our stakeholders have been involved, have bought into our story, and we have been able to consistently demonstrate and highlight the improvements we have achieved through relevant and accurate metrics.
During the webcast last week, we received a lot of great questions and were not able to adequately respond to all of them, so we wanted to take the time to answer them in this blog post. Here are the questions and responses:
- How do we address assets in cloud environments that are set up based on autoscaling rules (servers that spin up and shut down based on autoscaling rules)?
- Autoscaling makes traditional vulnerability identification and reporting processes much more difficult. Autoscaling is one way of following immutable design principles; another would be containers. While identification and reporting become more difficult, remediation can be easier if teams are required to use an approved image and update their launch configurations or templates on a regular basis. My clients that are successfully managing vulnerabilities in this fashion are creating, vetting, and approving new images every 1 or 2 weeks and then requiring teams to update every couple of weeks to every couple of months. Instead of (or in addition to) traditional vulnerability scanning, they are scanning the images, or an instance created from the image, to identify vulnerabilities and then associating those vulnerabilities with the image. They are then using automation or tooling to determine which cloud instances are still running vulnerable or unapproved images and notifying users or shutting those cloud instances down.
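The last step described above can be sketched roughly as follows. The image IDs and instance records are made up for illustration; in a real environment the instance list would come from your cloud provider's API and the approved set from your image pipeline.

```python
# Hypothetical approved image set maintained by the image pipeline.
approved_images = {"ami-approved-2020-06", "ami-approved-2020-05"}

# Hypothetical running instances as returned by a cloud API query.
running_instances = [
    {"id": "i-001", "image": "ami-approved-2020-06"},
    {"id": "i-002", "image": "ami-approved-2020-01"},  # stale, no longer approved
    {"id": "i-003", "image": "ami-unvetted-custom"},   # never vetted
]

def noncompliant(instances, approved):
    """Instances to notify on (or shut down) for running unapproved images."""
    return [inst["id"] for inst in instances if inst["image"] not in approved]

print(noncompliant(running_instances, approved_images))  # ['i-002', 'i-003']
```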
- Can you apply immutable infrastructure to traditional environments, such as those using Active Directory?
- Immutable design principles are more typically employed when the application functionality is de-coupled from the state information and data being leveraged by the application. Since certain technologies like corporate directories, email platforms, and databases typically store the state information and data, immutable design is more difficult to achieve. In these situations, I find that many of my clients are leveraging platform or software as a service options for these technologies unless they are able to set up and manage their own highly-available clusters that allow them to update in a timely manner.
- What do you recommend for asset discovery of unauthorized/unofficial IPv6 devices? The address space is too large to do a ping sweep.
- For discovery of IPv6, I find that passive discovery is most common due to the issue that you have highlighted. While there are some built-in protocol options for discovery, like neighbor discovery, unauthorized or unofficial devices may not respond, and discovery would not be comprehensive. These devices can be identified through passive network monitoring of network taps and through the logging mechanisms of our network and perimeter devices like firewalls, routers, proxies, etc.
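A crude form of that passive approach is just harvesting IPv6 addresses out of the logs we already collect. The sketch below uses Python's standard `ipaddress` module to validate candidate tokens; the log lines and field names are invented for illustration, and a real parser would match your devices' actual log format.

```python
import ipaddress

# Hypothetical firewall log lines; the format is made up for illustration.
log_lines = [
    "ACCEPT src=2001:db8::1 dst=2001:db8::50 dport=443",
    "DENY src=fe80::1c2a:ffff:fe00:1 dst=ff02::1",
    "ACCEPT src=192.0.2.10 dst=192.0.2.99 dport=80",
]

def ipv6_seen(lines):
    """Collect every valid IPv6 address observed in the given log lines."""
    seen = set()
    for line in lines:
        # Strip the key= prefixes so addresses stand alone as tokens.
        for token in line.replace("src=", " ").replace("dst=", " ").split():
            try:
                addr = ipaddress.ip_address(token)
            except ValueError:
                continue  # not an IP address at all
            if addr.version == 6:
                seen.add(str(addr))
    return seen

print(sorted(ipv6_seen(log_lines)))
```

Any address that turns up here but not in our official inventory is a candidate unauthorized device, without ever sweeping the address space.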
- Does the industry have a timeframe for critical application security bugs?
- It is a lot more difficult to come up with a standard measure for how quickly application security bugs should be remediated, for a few reasons. First, some bugs take longer to fix than others. While certain bugs require minor changes like encoding or whitelisting values, others require more extensive changes to the architecture and design. While this is almost certainly true of network and host vulnerabilities as well, the time it takes to create the patch is not typically what we are measuring for our remediation timeframes; we are typically only looking at the time it takes to apply a patch or configuration change. With application vulnerabilities, we have to first create the patch and then apply it. The other issue that makes it difficult to come up with a standard timeframe for application vulnerabilities is the frequency with which changes are released for the affected applications. Remediation timelines must be achievable, and so the remediation timeframe should not be shorter than the release timeframe. This means that development teams releasing multiple times a day may be able to support remediating critical vulnerabilities in a few days, but those releasing once a quarter will need more time. One way to set this up in a more flexible manner is to set the remediation timeframe as a number of release cycles instead of days. So, for critical vulnerabilities, you may only get 1 release cycle, but for high you may get 2. If more agile teams complain that they are being punished for moving faster, then you could give something like 7 days or 1 release cycle (whichever is longer). You could also have different timeframes for easy-to-resolve vs. harder-to-resolve vulnerabilities.
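The cycle-based timeframe with a floor can be sketched in a few lines. The severity tiers, cycle counts, and 7-day floor below are the illustrative values from the answer above, not an industry standard:

```python
# Release cycles allowed per severity -- illustrative policy values.
CYCLES_ALLOWED = {"critical": 1, "high": 2}

def remediation_days(severity, release_cycle_days, floor_days=7):
    """Deadline in days: N release cycles or the floor, whichever is longer."""
    cycles = CYCLES_ALLOWED[severity]
    return max(floor_days, cycles * release_cycle_days)

# A team releasing daily gets the 7-day floor for a critical finding;
# a quarterly-release team gets a full 90-day cycle.
print(remediation_days("critical", 1))   # 7
print(remediation_days("critical", 90))  # 90
print(remediation_days("high", 30))      # 60
```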
About the Author
David is a security consultant based in Salt Lake City, Utah focused on vulnerability management, application security, cloud security, and DevOps. David has 20+ years of broad, deep technical experience gained from a wide variety of IT functions held throughout his career, including: Developer, Server Admin, Network Admin, Domain Admin, Telephony Admin, Database Admin/Developer, Security Engineer, Risk Manager, and AppSec Engineer. David is a co-author and instructor for MGT516: Managing Security Vulnerabilities: Enterprise and Cloud, an instructor for and contributor to SEC540: Cloud Security and DevOps Automation, and has also developed and led technical security training initiatives at many of the companies for which he has worked. Read David's full profile here.