In ICS environments, detection is getting faster. Recovery, however, is not.
According to the recent SANS State of ICS/OT Security 2025 Survey, nearly 50% of ICS incidents are now detected within 24 hours, yet almost 20% still take more than a month to remediate.

From a detection standpoint, the industry is making significant progress, shrinking the time it takes to realize that adversary activity is in the environment from months to days to hours. But look deeper, beyond detection, and a more sobering reality emerges: almost one in five incidents still takes more than a month to remediate, and some stretch on for many months or longer. In other words, we are getting better at raising an incident, but we remain far less effective at safely and confidently restoring operations once the incident response plan is set in motion. In industrial environments, resilience is not defined by how quickly you detect an intrusion. It is determined by how well you can reestablish the plant's mission under various impact scenarios within the optimal recovery time, which is the intersection between the cost of recovery and the cost of disruption, as shown in Figure 2.
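The recovery time at which the cost of recovery and the cost of disruption intersect can be located numerically once both curves are estimated. The sketch below is illustrative only: the two cost functions, their dollar figures, and the helper name `find_crossover` are assumptions made up for the example, not values from the survey or from Figure 2. It assumes disruption cost rises with downtime while recovery cost falls as more time is allowed.

```python
def find_crossover(cost_of_disruption, cost_of_recovery, lo, hi, tol=1e-6):
    """Bisection search for the time in [lo, hi] where the two cost curves cross.

    Assumes disruption cost is non-decreasing and recovery cost is
    non-increasing over the interval, so they cross at most once.
    """
    def gap(t):
        return cost_of_disruption(t) - cost_of_recovery(t)

    # A crossing exists only if recovery costs more at `lo` and less at `hi`.
    if gap(lo) > 0 or gap(hi) < 0:
        raise ValueError("cost curves do not cross inside the interval")

    while hi - lo > tol:
        mid = (lo + hi) / 2
        if gap(mid) < 0:
            lo = mid   # recovery is still the more expensive curve: move right
        else:
            hi = mid   # disruption has overtaken recovery: move left
    return (lo + hi) / 2


# Hypothetical curves (assumed figures, in dollars, with time in hours).
def disruption_cost(hours):
    return 5_000 * hours        # lost production accumulates with downtime


def recovery_cost(hours):
    return 200_000 / hours      # faster recovery targets cost more to achieve


optimal_hours = find_crossover(disruption_cost, recovery_cost, lo=1, hi=48)
print(f"Crossover at roughly {optimal_hours:.1f} hours")
```

With these assumed curves the crossover falls at about 6.3 hours. A real analysis would substitute site-specific estimates for both functions.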

The industry has spent the last decade investing in visibility, monitoring, and detection. Asset inventory, network behavior monitoring, and SOC integration now dominate security roadmaps, and for good reason. You cannot respond to what you cannot see. As a leader, you must be able to answer the question, “Are we compromised?” My manager posed this very question to me while I was working as an OT cybersecurity lead at a refinery, and I could not answer it at the time. The haunting reality of not knowing whether someone was in my OT network fundamentally changed my career path, and I eventually ended up working for Dragos, a Network Security Monitoring company, so that I could help asset owners and operators answer that question. Detection closes the first gap in an incident timeline and supports containment and remediation by showing when threat behaviors subside. However, more work is needed in the remediation area.
As illustrated in the SANS ICS Report, incidents unfold across three phases:
Most organizations have made real gains in the first two phases. The third, containment to remediation, is where recovery stalls. Why? Because remediation in ICS is not just a technical “restoration” activity. It is a cyber-physical problem that extends from the process automation system into the physical domain of the process installation. Restoring a workstation is not the same as restoring a PLC or SIS. Reloading a server image is not the same as validating control logic, alarm limits, interlocks, or safety dependencies. Recovery actions that are routine in IT often fall short in OT and can even introduce process risk if performed incorrectly or out of sequence. Industrial control systems are really “systems of systems” with many internal dependencies, and those dependencies, mapped through dependency analysis as shown in Figure 3, must be understood before restoration activities are carried out. Shorter time-to-containment alone is not enough; organizations must invest in planned, documented, and rehearsed recovery processes to reduce real operational risk and indirect business loss.
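One way to make restoration dependencies explicit is to record them as a graph and derive a restoration order from it. The following is a minimal sketch, not a method taken from the SANS report: the asset names and the dependency map are hypothetical, and a real plant would populate them from its own dependency analysis. It uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def restoration_order(depends_on):
    """Return assets ordered so that each one follows everything it depends on.

    `depends_on` maps an asset to the set of assets that must be restored first.
    A circular dependency raises graphlib.CycleError (a ValueError subclass),
    which signals that the dependency map itself needs review.
    """
    return list(TopologicalSorter(depends_on).static_order())


# Hypothetical dependency map for illustration only.
DEPENDS_ON = {
    "network_switches": set(),
    "domain_controller": {"network_switches"},
    "engineering_workstation": {"domain_controller"},
    "sis_controller": {"network_switches"},
    "plc": {"engineering_workstation", "sis_controller"},
    "hmi": {"plc"},
    "historian": {"plc"},
}

for step, asset in enumerate(restoration_order(DEPENDS_ON), start=1):
    print(f"{step}. restore and validate {asset}")
```

The ordering among assets with no mutual dependency is not significant. What matters is that no asset is restored before the systems it relies on, and that the map is reviewed by engineering and operations rather than treated as authoritative on its own.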

Without understanding the nuances of restoration, teams hesitate even as the pressure to restore mounts. Systems remain isolated longer than necessary. Recovery decisions get escalated repeatedly. Engineers wait for certainty that never fully arrives. Meanwhile, production losses accumulate and the impact on the bottom line grows. Detection may be fast, but restoration remains slow, cautious, and often improvised.
Across industries, several patterns keep recurring:
If you want to improve recovery, measure it. Most organizations track detection and containment metrics, but far fewer track restoration performance. Meaningful recovery metrics are those that demonstrate recovery activities consistently meet defined recovery time objective (RTO), recovery point objective (RPO), and maximum tolerable downtime (MTD) thresholds.
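As a rough illustration of tracking restoration performance, the sketch below checks recovery records from incidents or exercises against RTO, RPO, and MTD thresholds. The record fields, class names, and simplified definitions are assumptions for this example: recovery time is measured from outage start to systems restored, the data loss window from the last good backup to outage start, and total downtime from outage start to operations resumed. Organizations may define these intervals differently.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Objectives:
    rto: timedelta  # recovery time objective
    rpo: timedelta  # recovery point objective
    mtd: timedelta  # maximum tolerable downtime


@dataclass(frozen=True)
class RecoveryRecord:
    outage_start: datetime
    last_good_backup: datetime
    systems_restored: datetime
    operations_resumed: datetime


def evaluate(record, objectives):
    """Compare one recovery (real or rehearsed) against the three thresholds."""
    recovery_time = record.systems_restored - record.outage_start
    data_loss_window = record.outage_start - record.last_good_backup
    total_downtime = record.operations_resumed - record.outage_start
    return {
        "rto_met": recovery_time <= objectives.rto,
        "rpo_met": data_loss_window <= objectives.rpo,
        "mtd_met": total_downtime <= objectives.mtd,
    }


def pass_rate(records, objectives):
    """Fraction of recoveries that met all three thresholds (0.0 if no records)."""
    if not records:
        return 0.0
    passed = sum(all(evaluate(r, objectives).values()) for r in records)
    return passed / len(records)


# Hypothetical thresholds and one hypothetical exercise result.
targets = Objectives(rto=timedelta(hours=24), rpo=timedelta(hours=12), mtd=timedelta(hours=72))
start = datetime(2025, 1, 1, 8, 0)
exercise = RecoveryRecord(
    outage_start=start,
    last_good_backup=start - timedelta(hours=6),
    systems_restored=start + timedelta(hours=20),
    operations_resumed=start + timedelta(hours=30),
)
print(evaluate(exercise, targets))
```

Tracking the pass rate across repeated exercises gives a trend line for consistency, which is the property the thresholds are meant to demonstrate.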
Closing the remediation gap requires treating cyber recovery as an operational discipline, not a security afterthought. That shift starts by embedding ICS cyber response directly into existing business continuity and disaster recovery (BC/DR) programs, rather than running cyber recovery in parallel or as a standalone effort. Resources such as the SANS OT Disaster Recovery Quick Start Guide can help organizations establish this foundation, but the real challenge lies in execution.
Effective ICS recovery programs explicitly define how cyber incidents intersect with established BC/DR structures, operating philosophies, and process safety constraints. Cyber recovery actions must align with the plant's emergency shutdown states, startup procedures, and existing safety boundaries. Yet, as shown in Figure 4, the SANS ICS Survey results highlight that cybersecurity is still poorly integrated into BC/DR planning across much of the industry, leaving recovery decisions improvised, escalated, or delayed when incidents occur.

Key elements of mature programs include clearly defined cyber recovery playbooks, agreed-upon safe states prior to restoration, and explicit authority for restoration decisions. These playbooks must be tightly aligned with process safety artifacts such as operating philosophies, alarm management strategies, and startup and shutdown procedures. Cyber recovery actions that bypass or contradict these controls introduce unacceptable risk, even if systems are technically restored. Treating recovery as a safety-critical activity ensures that restored systems re-enter service in a known, validated state that protects people, assets, and the environment. When cyber recovery is embedded into BC/DR, teams spend less time debating whether it is safe to restore and more time executing validated procedures.
Organizations that recover fastest are not those with the most comprehensive documentation, but those that have already rebuilt their systems under controlled conditions. In ICS environments, effective rehearsal goes far beyond tabletop discussions or theoretical walkthroughs. It requires restoring real PLC logic, HMI projects, historian data, firmware, and network configurations, all in the correct sequence.
Hands-on recovery exercises expose the realities that rarely appear in plans: hidden dependencies, version mismatches, incomplete backups, and validation gaps between cyber restoration and process readiness. Just as importantly, they force teams to make recovery decisions collaboratively across engineering, operations, and security, building confidence that only comes from having done the work before. That confidence becomes decisive when time pressure, safety concerns, and business impact collide during a real incident.

Rehearsing the full restoration sequence in this way transforms recovery from an improvised, high-risk activity into a repeatable operational capability. This is where courses like ICS612 fit. Participants are placed in realistic ICS environments and pushed beyond containment, working through dependency sequencing, firmware compatibility, configuration validation, and safe restart decisions under pressure, the same friction points that routinely derail recovery efforts in the field.
Detection speed will continue to improve as more organizations operationalize NSM detection products, and visibility, analytics, and threat intelligence are advancing rapidly. But resilience is not defined by detection alone. True ICS resilience is earned in the recovery phase, when teams must rebuild complex cyber-physical systems under pressure without compromising safety or reliability. That capability does not emerge during an incident. It is developed through planning, rehearsal, and hard-earned operational discipline. If your incident response program stops at containment, you are only halfway there.
The fastest path to closing the remediation gap is practice, and this is where SANS ICS can support your critical infrastructure operational needs.
ICS612: ICS Cybersecurity In-Depth allows teams to rehearse full restoration in a realistic ICS environment, restoring systems, validating functionality, and understanding dependencies under pressure.
ICS418: ICS Security Essentials for Leaders helps leaders build site-specific recovery playbooks that align cyber response with operations, safety, and business priorities.
Detection gets you much-needed situational awareness. Restoration gets you resilience. As my wife, Saltanat Mashirova, likes to say in DR discussions, quoting Vince Lombardi, “it’s not whether you get knocked down, it’s whether you get up.”


Michael Hoffman teaches ICS410 and ICS612 with a plant floor mindset, turning complex ICS/OT concepts into clear, repeatable practices. Students leave with practical skills that enable them to protect essential services without compromising safety or uptime.