With the evolution of cloud-based protections and secure application development frameworks, fewer organizations are susceptible to having their databases dumped with server-side exploits. Faced with this, data thieves are getting more sophisticated with their techniques. One novel approach is abusing the Domain Name System (DNS) protocol to quietly exfiltrate data. Although DNS traffic is often overlooked, the major cloud providers have made it easier than ever to examine it, detect data loss, and lock down the network to prevent similar attacks in the future. This post will illustrate this process using a Node.js application deployed to Microsoft Azure.
DATA LOSS DETECTION
Our first step is to identify anomalies in the network traffic within our Azure Virtual Network (VNet). Prior to doing so, we must enable flow logging within our Network Security Group (NSG). We will store the flow logs in an Azure Storage account. However, to simplify the process of querying these logs, we will want to aggregate them to an Azure Log Analytics Workspace. All of this can be accomplished using the following snippet of Terraform code:
Enables flow logging for the specified NSG, stores it in a storage account, aggregates the logs to a Log Analytics workspace on a 10-minute interval, and retains the logs for 365 days.
Azure Log Analytics uses the Kusto Query Language (KQL), which is also used by a variety of other services within Azure. The following query can be used to find outbound traffic sent from an Azure VM with an IP of 10.0.2.4 over a non-system port (ports 1024-65535):
We ran this query in the Azure Portal and received the following results:
Tip: Although it might initially appear that these queries can only be performed in the Azure Portal, there is a little-known extension that you can use to run these queries using the Azure Command-Line Interface (CLI). This can enable your organization to run these queries automatically and generate reports on an interval. For more security tips and tricks with the cloud CLIs, check out the new SANS Cloud Security Multicloud Command-Line Interface Cheat Sheet.
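For automated reporting, the same filter can also be applied to raw flow records outside of Log Analytics. The following is a minimal Node.js sketch, not the query used above. It assumes version 2 NSG flow tuples with the comma-separated layout noted in the comment; the function names and the sample tuples (which use documentation-reserved IP addresses) are made up for illustration.

```javascript
// Assumed tuple layout (NSG flow log version 2):
// time,srcIp,dstIp,srcPort,dstPort,protocol(T/U),direction(I/O),
// decision(A/D),state,pktsSrcToDst,bytesSrcToDst,pktsDstToSrc,bytesDstToSrc
function parseFlowTuple(tuple) {
  const f = tuple.split(",");
  return {
    time: Number(f[0]),
    srcIp: f[1],
    dstIp: f[2],
    srcPort: Number(f[3]),
    dstPort: Number(f[4]),
    protocol: f[5] === "T" ? "TCP" : "UDP",
    direction: f[6] === "O" ? "Outbound" : "Inbound",
    allowed: f[7] === "A",
    bytesSent: Number(f[10] || 0), // empty on some records, so default to 0
  };
}

// Sum the bytes sent per destination for outbound flows from `vmIp`
// to non-system ports (1024-65535), largest totals first.
function summarizeSuspiciousOutbound(tuples, vmIp) {
  const totals = new Map();
  for (const flow of tuples.map(parseFlowTuple)) {
    if (flow.srcIp !== vmIp || flow.direction !== "Outbound") continue;
    if (flow.dstPort < 1024) continue;
    const key = `${flow.dstIp}:${flow.dstPort}/${flow.protocol}`;
    totals.set(key, (totals.get(key) || 0) + flow.bytesSent);
  }
  return [...totals.entries()].sort((a, b) => b[1] - a[1]);
}

// Hypothetical sample tuples, for illustration only.
const sampleTuples = [
  "1594000000,10.0.2.4,203.0.113.10,50001,9999,U,O,A,E,12,900,0,0",
  "1594000060,10.0.2.4,203.0.113.10,50002,9999,U,O,A,E,20,1500,0,0",
  "1594000120,10.0.2.4,198.51.100.7,50003,443,T,O,A,E,5,400,5,4000",
  "1594000180,10.0.2.5,203.0.113.10,50004,9999,U,O,A,E,1,75,0,0",
];

console.log(summarizeSuspiciousOutbound(sampleTuples, "10.0.2.4"));
```

Grouping by destination, port, and protocol makes a single noisy endpoint stand out, which is the pattern to look for in the portal results.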
These results should raise several red flags. A tremendous amount of traffic is being sent to a specific host over an atypical port (UDP 9999). Yet, this alone is not conclusive. We must now go beyond the traffic metadata and analyze the data itself. Flow logs do not capture this data, as storing all traffic within an organization’s VNet would be a liability. Ideally, we would instead use something like AWS’s Traffic Mirroring feature. This would copy a subset of the data, deemed suspicious by a set of filter rules, to a separate VM running network analysis and monitoring software such as Zeek or Suricata. Unfortunately, Azure’s analogous feature, the Virtual Network Terminal Access Point (TAP), is currently on hold for all regions.
A less elegant, but still functional solution is to perform traffic analysis on the affected VM itself. For simplicity, we are going to do this by connecting to the VM via SSH and running tcpdump, a popular CLI-based tool for traffic sniffing. While sniffing for all outbound traffic to port 9999, we observed the following bizarre-looking message:
What could this mean?
To the untrained eye, this might seem unintelligible. However, the equals sign at the end of the text might tip off some encoding experts. After some research, you might determine that the data sent in these DNS queries can be combined, capitalized, and Base32 decoded. Here is the CyberChef recipe that will decode the combined text:
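The same three steps (combine, capitalize, Base32 decode) can be reproduced outside of CyberChef. The following is a minimal Node.js sketch, assuming the standard RFC 4648 Base32 alphabet. The function names and sample labels are hypothetical and are not the data captured above.

```javascript
const BASE32_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"; // RFC 4648

// Decode a Base32 string (case-insensitive, trailing "=" padding optional).
function base32Decode(input) {
  const clean = input.toUpperCase().replace(/=+$/, "");
  const bytes = [];
  let value = 0; // bit accumulator
  let bits = 0;  // number of unread bits held in `value`
  for (const char of clean) {
    const index = BASE32_ALPHABET.indexOf(char);
    if (index === -1) throw new Error(`Invalid Base32 character: ${char}`);
    value = (value << 5) | index;
    bits += 5;
    if (bits >= 8) {
      bits -= 8;
      bytes.push((value >>> bits) & 0xff);
      value &= (1 << bits) - 1; // keep only the bits not yet emitted
    }
  }
  return Buffer.from(bytes);
}

// Combine the captured query labels in order, then decode them as one message.
function decodeDnsLabels(labels) {
  return base32Decode(labels.join("")).toString("utf8");
}

// Hypothetical labels that together encode the word "hello".
console.log(decodeDnsLabels(["nbsw", "y3dp"]));
```

Decoding the labels individually would generally fail, because a chunk boundary rarely falls on a whole byte. The labels have to be joined first.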
This proves that something is sending a secret message to an external server containing the server’s MAC addresses (Nimbus Inmutable is the name of our sample application). What is causing this?
It turns out that one of the packages used by the application was shipped with a trojan horse. This malware, which was obfuscated with this utility, is the culprit:
“I wouldn't worry about it, though. It's innocuous. Says so right in the name.” – SEC510 Student
This is custom malware created for SANS to demonstrate the dangers of trusting third-party packages. Ironically, it helped illustrate how many supply chains are taking measures to ensure that malware is not distributed through their services. We originally published the malicious package (check-readstream) to npmjs.com in late May of 2020. On July 20th, 2020, the npm security team contacted us to inform us that this package had been reported. Despite it being for educational purposes, it broke their Terms of Service, and they believed that some users might install it by accident, so they removed it. Fortunately, they allowed us to republish the package under a longer name that no reasonable developer would trust (check-readstream-really-awesome-totally-secure-i-think).
An example of how the supply chains are fighting back.
Despite npm’s commendable actions, the malware was available in their registry for roughly two months. It is impossible for them to catch all malware in real-time. We cannot rely on their preventative solutions alone.
Fortunately, Azure provides many guardrails that would have thwarted this attack. For starters, there is no reason why we need to allow outbound traffic to be sent over any possible port. Our application only needs to communicate with HTTP (port 80), HTTPS (port 443), and MySQL (port 3306) servers. We can lock down all other ports by adding the following outbound port rules to our NSG:
Rules are evaluated in order of priority number, and the matching rule with the lowest number determines how the traffic will be handled. The rules with priorities 65000-65500 are provided by Azure and cannot be deleted. These rules effectively block all outbound traffic to the internet unless it is going over TCP port 80, 443, or 3306.
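To make the evaluation order concrete, here is a minimal Node.js sketch of first-match-by-priority evaluation. It is a simplification that treats every flow as internet-bound and ignores source and destination prefixes. The custom rule names and their priorities (100 and 200) are hypothetical, while the last two entries mirror Azure's default outbound rules.

```javascript
// Hypothetical custom rules (priorities 100 and 200) plus two Azure defaults.
const outboundRules = [
  { name: "AllowWebAndMySqlOutbound", priority: 100, protocol: "TCP", ports: [80, 443, 3306], access: "Allow" },
  { name: "DenyOtherInternetOutbound", priority: 200, protocol: "*", ports: "*", access: "Deny" },
  { name: "AllowInternetOutBound", priority: 65001, protocol: "*", ports: "*", access: "Allow" },
  { name: "DenyAllOutBound", priority: 65500, protocol: "*", ports: "*", access: "Deny" },
];

// A rule matches when both its protocol and its port list cover the flow.
function matches(rule, flow) {
  const protocolOk = rule.protocol === "*" || rule.protocol === flow.protocol;
  const portOk = rule.ports === "*" || rule.ports.includes(flow.port);
  return protocolOk && portOk;
}

// Walk the rules from the lowest priority number upward; the first match wins.
function evaluateOutbound(rules, flow) {
  const ordered = [...rules].sort((a, b) => a.priority - b.priority);
  const hit = ordered.find((rule) => matches(rule, flow));
  return hit ? { access: hit.access, rule: hit.name } : { access: "Deny", rule: null };
}

console.log(evaluateOutbound(outboundRules, { protocol: "UDP", port: 9999 }));
console.log(evaluateOutbound(outboundRules, { protocol: "TCP", port: 443 }));
```

Without the two custom rules, the UDP 9999 flow would fall through to the default AllowInternetOutBound rule, which is why the exfiltration traffic was permitted in the first place.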
This only restricts our network search space. Smarter malware would look for a communication channel that the application is using and transfer data in a similar fashion. We cannot block TCP ports 80, 443, or 3306 while keeping our application functional. So, we will need to heavily monitor these ports and block everything else.
ENHANCED PROTECTION WITH PRIVATE ENDPOINTS
Though the transfer of MAC addresses illustrates that the malware was able to transfer data that is private to the Azure VM, this data is not particularly sensitive. An attacker would derive little value from obtaining them. A smarter attack would generate and transfer credentials for the VM’s managed identity by querying the Instance Metadata Service (IMDS). The following Node.js code will send and receive credentials for the Azure Key Vault in a similar fashion to the malware we previously explored:
The client on the left-hand side calls the IMDS over a link-local address to generate credentials for the Azure Key Vault. Due to limits with DNS queries, the Base32-encoded access token is split into chunks of no more than 63 characters. Each chunk is sent as a DNS query. The server will compile all of the queries until it receives one that is not 63 characters long, indicating the end of the message. It is then decoded and logged.
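For analysts reassembling captured queries, the framing described above can be sketched in a few lines of Node.js. This covers only the chunking and reassembly logic, with no network code. The names chunkIntoLabels and MessageAssembler are hypothetical, and the input is a placeholder string rather than a real token.

```javascript
const MAX_LABEL_LENGTH = 63; // a single DNS label is limited to 63 octets

// Split an already-encoded string into labels of at most `size` characters.
function chunkIntoLabels(encoded, size = MAX_LABEL_LENGTH) {
  const labels = [];
  for (let i = 0; i < encoded.length; i += size) {
    labels.push(encoded.slice(i, i + size));
  }
  return labels;
}

// Collects labels until one shorter than `size` arrives, which marks the end.
// Caveat: a message whose length is an exact multiple of `size` never produces
// a short label; how the original code handles that case is not stated.
class MessageAssembler {
  constructor(size = MAX_LABEL_LENGTH) {
    this.size = size;
    this.parts = [];
  }

  // Returns the complete message once it ends, otherwise null.
  push(label) {
    this.parts.push(label);
    if (label.length === this.size) return null;
    const message = this.parts.join("");
    this.parts = [];
    return message;
  }
}

// Placeholder payload: 150 characters splits into labels of 63, 63, and 24.
const encoded = "A".repeat(150);
const labels = chunkIntoLabels(encoded);
const assembler = new MessageAssembler();
const results = labels.map((label) => assembler.push(label));

console.log(labels.map((label) => label.length));
console.log(results[results.length - 1] === encoded);
```

The length-based terminator also suggests a detection heuristic: a run of queries whose leading labels are exactly 63 characters long is unusual for ordinary DNS traffic.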
In addition to monitoring our remaining open ports, we can limit the usefulness of these exfiltrated credentials by using an Azure Private Link private endpoint. This allows an Azure Platform as a Service (PaaS) offering, like the Key Vault, to be accessed directly from the VNet instead of via the internet. We can then set up a Network Access Control List (ACL) for the Key Vault to only accept requests from this private endpoint. This will render Key Vault requests with valid credentials useless unless they originated from within the VNet. All of this can be accomplished with the following Terraform code:
Create a private endpoint for the Vault and prevent network access from the public endpoint. Traffic is unconditionally accepted from the private endpoint, so we do not need to add any explicit exceptions to our default-deny rule.
Applying this change alone will prevent our application from accessing the Key Vault as well. It calls vault.azure.net, which currently resolves to the public endpoint’s IP address. One solution is to make a code change to use the private endpoint. Alternatively, we can create a private DNS Zone to override the DNS record of vault.azure.net to resolve to our private endpoint when queries are made within the VNet:
With this change, all DNS queries made within the VNet for vault.azure.net will resolve to the Key Vault's private endpoint.
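One way to confirm the override from a VM inside the VNet is to resolve the vault's hostname and check that the answer falls in a private (RFC 1918) range. The following is a minimal Node.js sketch. The function names are hypothetical, and it assumes the private endpoint was assigned an IPv4 address from the VNet's private address space.

```javascript
const dns = require("node:dns");

// True when `address` is a dotted-quad IPv4 address in an RFC 1918 range
// (10.0.0.0/8, 172.16.0.0/12, or 192.168.0.0/16).
function isPrivateIPv4(address) {
  if (!/^\d{1,3}(\.\d{1,3}){3}$/.test(address)) return false;
  const octets = address.split(".").map(Number);
  if (octets.some((octet) => octet > 255)) return false;
  const [a, b] = octets;
  return a === 10 || (a === 172 && b >= 16 && b <= 31) || (a === 192 && b === 168);
}

// Resolve `hostname` with the system resolver and report whether the
// first IPv4 answer is a private address.
async function resolvesToPrivateAddress(hostname) {
  const { address } = await dns.promises.lookup(hostname, { family: 4 });
  return { address, private: isPrivateIPv4(address) };
}

// Example usage with a hypothetical vault name, run from inside the VNet:
// resolvesToPrivateAddress("my-vault.vault.azure.net").then(console.log);
```

Running the same check from outside the VNet should still return a public address, since the private DNS zone only applies to queries made within the VNet.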
When we can block attacks, we should. When we cannot, our next best recourse is to identify them and limit the damage they produce. Each cloud provider has powerful tools to collect and visualize potential indicators of compromise. However, these are useless if we do not use them. With the right tools, training, and personnel, security engineering and operations can thrive in the cloud.
This writeup is based on the labs for SEC510: Multicloud Security Assessment and Defense. The class covers how to detect and prevent this attack in Azure, Amazon Web Services (AWS), and the Google Cloud Platform (GCP), as well as a myriad of other topics in each cloud. It also comes with Terraform code that you can pull into your organization to automatically apply the hardening steps mentioned above and more.
The massive web of a dependency graph for the SEC510 lab environment generated using Blast Radius.
About the Author: Brandon Evans is the lead author of SEC510: Multicloud Security Assessment and Defense and an instructor for SEC540: Cloud Security and DevOps Automation. His full-time role is as a Senior Application Security Engineer at Asurion, where he provides security services for thousands of his coworkers in product development across several global sites responsible for hundreds of web applications. This includes performing secure code reviews, conducting penetration tests, developing secure coding patterns, and evangelizing the importance of creating secure products.