Building a cloud security roadmap: Tools by layer and when you need them (pt.1)
Understanding the key security tools at each layer of the cloud and what matters most for your organization’s maturity
TL;DR
This post covers:
Quick rundown of each cloud layer, from control plane to application
What kinds of security tools fit where, when you’ll need them, and vendor examples
A practical roadmap to help you prioritize tools as your org matures (pt. 2)
Cloud environments present adversaries with vast attack surfaces through their dynamic, multi-layered architectures. Security teams face continuous defensive challenges as threats target every level—from management interfaces and container orchestration to runtime workloads and hosted applications. Attack vectors span numerous categories: exposed APIs, configuration errors, malicious code deployments, and compromised privileged accounts. The resulting breach indicators prove equally diverse and difficult to detect. Securing cloud assets demands comprehensive incident detection and response capabilities addressing this entire threat landscape.
This blog series will explore cloud security in depth. We’ll discuss the layers to protect and define the tools available for each. In pt. 2, we’ll offer a roadmap to reduce cloud security risks.
Layers of cloud infrastructure
Cloud environments encompass an ever-changing mix of resources extending all the way from the provider’s data center to your own end-user devices. That complexity makes the cloud especially hard to secure.
Effective protection requires comprehensive knowledge of each infrastructure tier's operations and their associated threat vectors.
Think of your cloud infrastructure as a restaurant kitchen. Each layer plays a different role in preparing the meals—or business applications—you provide to customers, your end users.
Cloud infrastructure layer (aka the control plane)
This is where services like AWS, Azure, Google Cloud, and Oracle Cloud Infrastructure (OCI) handle tasks such as resource provisioning, configuration management, and permissions setting for users.
You can think of the control plane as the executive chef who oversees the entire kitchen and its operations. When planning for each week’s business, the control plane ensures the kitchen has all the needed cooking tools, ingredients, and staff (or compute, storage, and networking). With the kitchen layout set (or, the environment configured) for optimal quality, efficiency, and safety, the control plane oversees resource allocation to keep things running efficiently. In high-demand periods—like the dinner rush (or, a spike in end-user activity)—the control plane makes high-level decisions about what to prioritize and how to adjust.
Common threats at this level include:
Compromised API keys: Attackers use compromised API keys to manipulate cloud resources, for example by creating or deleting virtual machines (VMs).
Misconfigured access controls: Overly permissive or poorly configured identity and access management (IAM) policies give attackers access to critical resources.
Unauthorized configuration changes: Adversaries modify control plane settings, such as permissions or network configurations, to enable lateral movement or persistent access.
To stop these threats, organizations monitor logs, traffic, and configuration changes for unusual activity. Fortunately, many of these threats can trigger real-time alerts. You can learn more about specific types of cloud alerts in a previous blog post.
Orchestration layer
The orchestration layer is where DevOps teams manage their containerized workloads. (Note that these workloads actually run in the platform layer, which we’ll discuss next.) Orchestration platforms like Kubernetes, and related managed services such as Amazon Elastic Kubernetes Service (EKS), Azure Kubernetes Service (AKS), and Google Kubernetes Engine (GKE), operate at this layer to deploy, scale, and move cloud-native applications.
The orchestration layer is like a sous chef who coordinates food prep for each meal service. It assigns tasks to chefs, makes sure ingredients are prepped at the right time, and manages the order flow from the dining room to the kitchen. If there’s a dinner special, the sous can reassign line cook tasks—or scale and move cloud-native applications across the environment to support changing requirements.
The role of Kubernetes in orchestrating communications and moving resources across the network makes it an attractive target for cybercriminals as part of a broader attack. Potential security threats include:
Container misconfigurations: Attackers can take advantage of weak Kubernetes configurations—such as overly permissive network policies—to gain unauthorized access or take over the entire cluster.
Vulnerable container images: The deployment of container images with unpatched vulnerabilities allows attackers to bypass security controls, gain elevated permissions, or access the host system.
Exposed secrets: Poorly managed security for Kubernetes secrets allows attackers to access sensitive data like passwords, tokens, API keys, and certificates.
As with the cloud infrastructure layer, DevOps teams must carefully analyze the logs for these platforms, monitor their traffic, and track configuration changes related to containers and VMs. Learn more about the nuances and challenges of Kubernetes security in a previous blog post.
Platform layer
The platform layer is where cloud workloads actually run, and it sits between the cloud infrastructure layer and the application layer. The platform layer hosts the virtual machines, containers, and serverless functions that execute the business logic powering end-user services. The platform layer is often provided as a platform-as-a-service (PaaS), helping developers focus on building applications without worrying about the underlying resources or setup.
In our kitchen analogy, the platform layer corresponds to the line cooks preparing dishes. Each cook has all the equipment they need to perform assigned tasks, such as knives, cookware, and appliances (or, frameworks and runtime environments). In the simplest terms, the platform layer gets you from a cold kitchen full of raw ingredients to a piping-hot meal ready to eat.
For attackers, the platform layer can provide access to both compute resources and adjacent workloads. This offers opportunities for a broad range of attack types, including:
Malware installation: Threat actors install malware on compute instances to establish command-and-control (C2) communication, run additional malicious payloads, or exfiltrate data.
VM escape attacks: Attackers exploit hypervisor vulnerabilities to break out of a VM and access other tenants’ workloads.
Credential theft: Stolen secure shell (SSH) keys or passwords allow unauthorized access to compute resources.
The cloud alerts that help teams monitor their control plane for signs of threats can serve a similar role at the platform layer. Again, you can refer to our previous blog series on cloud alerts to learn more.
Application layer
The cloud-based software that end users rely on, such as SaaS apps and custom web apps accessed through web browsers, resides at the application layer.
In a restaurant, these applications are the meals that emerge from the kitchen and land on patrons’ tables. These customers don’t need to know what happens in the kitchen—how its operations are managed, where the raw ingredients are stored, or what the line cooks do. They just consume the final product.
The application layer can be as useful for bad actors as it is for legitimate users. Misconfigured access controls pose as much danger here as in the control plane, and other threats include:
Application-layer distributed denial-of-service (DDoS) attacks: Botnets launch a flood of bogus requests to overwhelm application resources and render services unavailable.
Other application-layer attacks: Tactics like malformed packets, buffer overflows, and unauthorized function calls can all allow attackers to manipulate or damage applications.
API exploits: Insecure APIs allow attackers to gain unauthorized access to application data or services.
SQL injection: Attackers inject malicious SQL queries into applications to retrieve or manipulate sensitive data.
Cross-site scripting (XSS): Malicious scripts injected into web pages execute in the browsers of users who view them.
Credential stuffing: Attackers use automated tools like botnets to try stolen user names and passwords across large numbers of websites and services.
Detecting and preventing threats like these requires a full arsenal of tools—from firewalls and web application firewalls (WAFs) to runtime application self-protection (RASP) and application detection and response (ADR).
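To make the SQL injection entry above concrete, here’s a minimal sketch using Python’s built-in sqlite3 module and a hypothetical users table. The first function builds its query by concatenating user input, so a crafted value rewrites the query’s logic; the second uses a parameterized query, which binds the input strictly as data. It illustrates the attack and one standard coding defense, not how any particular WAF or RASP product works.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical users table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "alice@example.com"), ("bob", "bob@example.com")],
    )
    return conn


def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # VULNERABLE: the input is pasted into the SQL text, so quotes in `name`
    # can change the structure of the query.
    query = "SELECT email FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str) -> list:
    # Parameterized query: sqlite3 binds `name` as a value, never as SQL.
    return conn.execute("SELECT email FROM users WHERE name = ?", (name,)).fetchall()


conn = make_db()
payload = "x' OR '1'='1"
print(find_user_unsafe(conn, payload))  # both rows leak
print(find_user_safe(conn, payload))    # [] (no user is literally named that)
```

With the payload, the unsafe query becomes `... WHERE name = 'x' OR '1'='1'`, which is true for every row.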
From CNAPP to KSPM: demystifying acronyms at every layer
A whole alphabet soup of cloud security tools is on the market, from CNAPP to KSPM. The best way to understand them is to group them by the cloud infrastructure layer they help secure.
Infrastructure layer/control plane
The definitive security solution at this layer is an unwieldy construct called the cloud-native application protection platform (CNAPP). An “all-in-one solution,” the CNAPP incorporates an extensive roster of security tools to protect cloud-native applications throughout the software development lifecycle (SDLC), from development to production. We’ll explore each of these components in detail, but at a high level, a CNAPP:
Provides continuous security monitoring, alerting, and governance across clouds.
Gives SecOps teams deep visibility into cloud environments, including user behaviors, data flows, and resource interactions.
Streamlines security operations by automating tasks like vulnerability scanning and compliance checks.
Enforces consistent security policies across multi-cloud environments.
Accelerates incident response and threat mitigation.
CNAPP incorporates the following infrastructure-layer tools (CDR, CSPM, CIEM, DSPM), as well as platform-layer tools (CWPP), and potentially orchestration-layer tools (KSPM, KDR):
Cloud detection and response (CDR)
A CDR tool continuously monitors and analyzes cloud resources and activities to help teams detect and respond effectively to potential threats. Integrations with myriad tools enable the CDR to gather logs and real-time data on network traffic and user activity across the cloud control plane and workloads. The tool can then share this data with the organization’s security information and event management (SIEM) system. By correlating this data with threat intelligence on current indicators of compromise (IOCs) and indicators of attack (IOAs), the CDR can quickly identify threats, take automated actions to limit their impact, and help teams rapidly resolve the incident. A CDR’s forensic capabilities can facilitate incident investigations and reporting for compliance audits and risk assessments.
Example use case:
A retailer keeps sensitive customer data in an AWS storage bucket.
An attacker gains access to the environment using compromised credentials.
They invoke an S3 API operation, such as `PutObjectAcl`, from a known malicious IP address. This call changes an object’s access control list (ACL), and is commonly associated with tampering tactics aimed at exposing, modifying, or destroying data within the AWS environment.
Monitoring AWS CloudTrail logs, the CDR detects the API call.
Because the API call deviates from the organization’s baseline behavior and is associated with known malicious activity, it triggers a high-severity alert.
The security team uses the CDR’s investigation workbench to trace the activity and learn that the account the attacker used had been compromised in a phishing attack.
Meanwhile, the CDR has automatically reverted the change to restore the original permissions, and disabled the compromised user account.
Post-incident, some CDR tools will generate a full report and timeline on the event with recommended actions, such as implementing multifactor authentication (MFA) to prevent a recurrence.
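The detection step in this scenario can be approximated with a simple rule over CloudTrail records. The sketch below is a minimal illustration, not how any CDR product works: it assumes log files shaped like CloudTrail’s JSON (a `Records` list with `eventSource`, `eventName`, `sourceIPAddress`, and `userIdentity` fields), and the threat-intel IP set, the list of permission-changing events, and the sample account are hypothetical. Baseline modeling and automated remediation are out of scope.

```python
import json

# Hypothetical threat-intel feed of known-malicious IPs (documentation ranges).
MALICIOUS_IPS = {"203.0.113.45", "198.51.100.7"}

# S3 calls that change access permissions (illustrative, not exhaustive).
PERMISSION_CHANGE_EVENTS = {"PutObjectAcl", "PutBucketAcl", "PutBucketPolicy"}


def detect_tampering(cloudtrail_json: str) -> list[dict]:
    """Return high-severity alerts for permission changes from known-bad IPs."""
    alerts = []
    for record in json.loads(cloudtrail_json).get("Records", []):
        if (
            record.get("eventSource") == "s3.amazonaws.com"
            and record.get("eventName") in PERMISSION_CHANGE_EVENTS
            and record.get("sourceIPAddress") in MALICIOUS_IPS
        ):
            alerts.append({
                "severity": "high",
                "event": record["eventName"],
                "ip": record["sourceIPAddress"],
                "principal": record.get("userIdentity", {}).get("arn", "unknown"),
            })
    return alerts


sample_log = json.dumps({"Records": [
    {"eventSource": "s3.amazonaws.com", "eventName": "PutObjectAcl",
     "sourceIPAddress": "203.0.113.45",
     "userIdentity": {"arn": "arn:aws:iam::123456789012:user/jdoe"}},
    {"eventSource": "s3.amazonaws.com", "eventName": "GetObject",
     "sourceIPAddress": "192.0.2.10",
     "userIdentity": {"arn": "arn:aws:iam::123456789012:user/app"}},
]})

for alert in detect_tampering(sample_log):
    print(alert)
```

Only the first record raises an alert; the routine `GetObject` call from an unlisted IP passes.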
Examples of CDR tools include Wiz Defend, Orca Security, Amazon GuardDuty, Lacework, Microsoft Defender, Palo Alto Networks Cortex Cloud, and Google Security Command Center.
Cloud security posture management (CSPM)
CSPM focuses on proactive threat prevention. By monitoring, detecting, and remediating cloud security risks and compliance violations such as misconfigurations, CSPM helps organizations address these issues before they can be exploited.
Example use case:
In the course of a routine deployment, a manufacturer’s DevOps team accidentally configures a storage bucket to allow public access.
This exposes its intellectual property to unauthorized users.
The company’s CSPM automatically detects this misconfiguration, generates a detailed alert, and, either automatically or with approval, updates the bucket’s access policy to restrict public access.
A log of the incident helps the organization demonstrate compliance during future audits.
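A CSPM check for this kind of misconfiguration boils down to evaluating a bucket’s configuration against a rule. The sketch below assumes a simplified bucket description loosely modeled on S3’s ACL grants, bucket policy, and Block Public Access settings; it does not evaluate policy `Condition` blocks, and the bucket itself is hypothetical. The `remediate` helper mirrors the “fix with approval” step by turning on all four Block Public Access flags.

```python
import copy

ALL_USERS_URI = "http://acs.amazonaws.com/groups/global/AllUsers"
BPA_FLAGS = ("BlockPublicAcls", "IgnorePublicAcls", "BlockPublicPolicy", "RestrictPublicBuckets")


def _grants_everyone(principal) -> bool:
    # A policy principal of "*" or {"AWS": "*"} means anyone.
    if principal == "*":
        return True
    if isinstance(principal, dict):
        aws = principal.get("AWS")
        return aws == "*" or (isinstance(aws, list) and "*" in aws)
    return False


def public_access_findings(bucket: dict) -> list[str]:
    """Return reasons a simplified bucket config allows public access."""
    findings = []
    bpa = bucket.get("PublicAccessBlock", {})
    if not bpa.get("IgnorePublicAcls", False):
        for grant in bucket.get("Grants", []):
            if grant.get("Grantee", {}).get("URI") == ALL_USERS_URI:
                findings.append(f"ACL grants {grant.get('Permission')} to AllUsers")
    if not bpa.get("RestrictPublicBuckets", False):
        statements = bucket.get("Policy", {}).get("Statement", [])
        if isinstance(statements, dict):  # a single statement may not be in a list
            statements = [statements]
        for stmt in statements:
            # Simplification: Condition blocks are not evaluated.
            if stmt.get("Effect") == "Allow" and _grants_everyone(stmt.get("Principal")):
                findings.append(f"Policy allows {stmt.get('Action')} to everyone")
    return findings


def remediate(bucket: dict) -> dict:
    """Return a copy of the config with all Block Public Access flags enabled."""
    fixed = copy.deepcopy(bucket)
    fixed["PublicAccessBlock"] = {flag: True for flag in BPA_FLAGS}
    return fixed


bucket = {
    "Name": "cad-designs",  # hypothetical bucket
    "Policy": {"Statement": [{
        "Effect": "Allow", "Principal": "*", "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::cad-designs/*",
    }]},
}
print(public_access_findings(bucket))             # one finding
print(public_access_findings(remediate(bucket)))  # []
```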
Examples of CSPMs include Wiz, Prisma Cloud, Aqua Security, AWS Config, and CrowdStrike Falcon Cloud Security.
Cloud infrastructure entitlement management (CIEM)
CIEM solutions manage and control access rights, permissions, and privileges across cloud environments. By automating the management of these entitlements, a CIEM helps the organization adjust permissions as cloud environments evolve. Continuous monitoring detects excessive permissions or misconfigurations in real time so teams can correct them to maintain least privilege.
Example use case:
A financial services organization’s multi-cloud environment has grown to encompass a large number of human and nonhuman identities.
Admins are generally prompt in granting the permissions these identities require to perform their functions. But they’re not always as fast to revoke permissions that are no longer needed.
As a result, unnecessary permissions have accumulated as roles changed, projects ended, and employees left the organization. This situation is known as privilege creep.
A scan by the CIEM identifies these overly permissive roles and unused or dormant entitlements, and automatically trims their privileges accordingly.
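At its core, detecting privilege creep means comparing what each identity is granted with what it has actually used. This minimal sketch assumes you already have both as inputs (a CIEM would pull them from the cloud provider’s policy and activity data); the identities, permissions, and 90-day lookback window are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def find_unused(granted: dict, last_used: dict, now: datetime, lookback_days: int = 90) -> dict:
    """Map each identity to permissions not used within the lookback window."""
    cutoff = now - timedelta(days=lookback_days)
    report = {}
    for identity, perms in granted.items():
        usage = last_used.get(identity, {})
        # A permission is stale if it was never used or last used before the cutoff.
        stale = sorted(p for p in perms if usage.get(p) is None or usage[p] < cutoff)
        if stale:
            report[identity] = stale
    return report


def trim(granted: dict, report: dict) -> dict:
    """Return least-privilege entitlements after removing stale permissions."""
    return {i: set(perms) - set(report.get(i, [])) for i, perms in granted.items()}


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
granted = {
    "svc-reporting": {"s3:GetObject", "s3:DeleteObject"},
    "jsmith": {"ec2:StartInstances", "iam:PassRole"},
}
last_used = {
    "svc-reporting": {"s3:GetObject": now - timedelta(days=1)},
    "jsmith": {"ec2:StartInstances": now - timedelta(days=200)},  # project ended
}
report = find_unused(granted, last_used, now)
print(report)
print(trim(granted, report))
```

In practice, a team would usually review the report (or apply it to a subset of low-risk identities) before trimming automatically.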
Examples of CIEMs include Microsoft Entra Permissions Management, Wiz CIEM, and Orca CIEM.
Data security posture management (DSPM)
DSPM plays a similar and complementary role to CSPM. While CSPM focuses on securing cloud infrastructure, DSPM focuses on securing sensitive data within the cloud. Like CSPM, DSPM does this by continually monitoring, detecting, and remediating data security risks across the environment.
In particular, DSPM:
Monitors and evaluates the sensitivity of data, its level of exposure, and how it flows through the environment, to identify and proactively mitigate data breach risks.
Automates data discovery and classification to provide visibility into where sensitive data resides.
Generates insights into data access patterns, usage, and potential policy violations to help teams refine and strengthen their data protection tactics.
Example use case:
A healthcare insurer working with regulated data—including protected health information (PHI) and personally identifiable information (PII)—needs clear visibility into where it’s stored, how it’s accessed, and how it’s secured. This is a real challenge because its multi-cloud environment encompasses databases, data lakes, and SaaS applications.
The company uses a DSPM tool to:
Scan these resources to discover all stored data.
Classify the data based on its regulatory requirements (e.g., whether it falls under HIPAA or GDPR).
Assess the security posture of each data store.
Provide recommendations to correct potential compliance violations, or perform these actions automatically.
Generate detailed compliance reports to guide remediation and facilitate audits.
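Discovery and classification can be pictured as pattern matching over stored records, combined with each store’s exposure to prioritize findings. This is a deliberately small sketch: the stores, the rows, and the `MRN-` medical record number format are hypothetical, and the regular expressions are illustrative detectors rather than the classifiers a DSPM tool would actually use. Mapping a category such as PHI to a specific regulation is left to policy.

```python
import re

# Illustrative detectors: (pattern, data category). Real DSPM classifiers are richer.
DETECTORS = {
    "email": (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "PII"),
    "us_ssn": (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "PII"),
    "mrn": (re.compile(r"\bMRN-\d{6}\b"), "PHI"),  # hypothetical record-number format
}


def classify(rows: list[dict]) -> set[str]:
    """Return the data categories detected in a store's rows."""
    found = set()
    for row in rows:
        for value in row.values():
            for pattern, category in DETECTORS.values():
                if pattern.search(str(value)):
                    found.add(category)
    return found


def assess(stores: dict) -> list[tuple[str, list[str], str]]:
    """Combine sensitivity with exposure: sensitive data in a public store is high risk."""
    findings = []
    for name, store in stores.items():
        categories = sorted(classify(store["rows"]))
        if categories:
            risk = "high" if store["public"] else "medium"
            findings.append((name, categories, risk))
    return findings


stores = {  # hypothetical data stores
    "claims-db": {"public": False,
                  "rows": [{"member": "MRN-004512", "contact": "pat@example.com"}]},
    "analytics-lake": {"public": True,
                       "rows": [{"note": "SSN 123-45-6789 pasted into a comment"}]},
    "web-assets": {"public": True, "rows": [{"file": "logo.png"}]},
}
for finding in assess(stores):
    print(finding)
```

The public analytics lake ranks above the private claims database even though the latter holds more categories of sensitive data, which is the kind of exposure-aware prioritization described above.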
Examples of DSPM tools include Varonis, BigID, and Sentra.
Privileged access management (PAM)
Privileged access management (PAM) focuses on securing, controlling, and monitoring access to accounts with elevated permissions. These are the accounts that administrators, developers, vendors, and automated processes use to perform sensitive tasks like modifying critical systems, accessing sensitive data, or managing other user accounts. And they can be misused by bad actors for the same purposes.
PAM helps organizations enforce least privilege by maintaining strict policies around who can access these accounts, and under what conditions. Real-time monitoring can detect suspicious behavior during privileged sessions and capture audit trails for compliance.
Beyond its role in the control plane, PAM can also support use cases in other layers of cloud infrastructure, such as:
Orchestration layer: Securing access to tools and APIs that automate cloud workload deployment and scaling.
Platform layer: Protecting system-level access to VMs or containers running workloads.
Application layer: Managing access to applications with elevated privileges, such as those working with sensitive data or involved in critical operations.
Example use case:
In the past, a media company’s DevOps team embedded API keys and database credentials directly in source code or configuration files.
This is a highly risky practice—attackers have many ways to extract hard-coded credentials, and can then use them to gain legitimate access to the corresponding system.
Seeking to strengthen its security practices, the team now uses a PAM tool to store these credentials in a centralized, encrypted vault instead.
The credentials are retrieved only at runtime, and are rotated regularly to further reduce risk.
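Before moving credentials into a vault, the team first has to find them. The sketch below is a minimal, regex-based scan for hard-coded secrets; the two rules are illustrative assumptions (AWS access key IDs commonly begin with `AKIA`, and the password rule is a simple heuristic), and the sample source uses AWS’s published example key ID rather than a real one. Dedicated secret scanners use far larger rule sets.

```python
import re

# Illustrative rules. AWS access key IDs commonly start with "AKIA" plus 16
# uppercase letters or digits; the password rule is a simple heuristic.
RULES = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "hardcoded_password": re.compile(r"""(?i)(password|passwd|pwd)\s*=\s*["'][^"']+["']"""),
}


def find_hardcoded_secrets(source: str) -> list[tuple[int, str]]:
    """Return (line number, rule name) for each suspected hard-coded secret."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for rule, pattern in RULES.items():
            if pattern.search(line):
                hits.append((lineno, rule))
    return hits


sample_source = '''import os
AWS_KEY = "AKIAIOSFODNN7EXAMPLE"
db_password = "hunter2"
api_token = os.environ["API_TOKEN"]  # retrieved at runtime instead
'''
for lineno, rule in find_hardcoded_secrets(sample_source):
    print(f"line {lineno}: {rule}")
```

The last line of the sample shows the target pattern: the token is read from the environment at runtime (populated, for example, by the PAM tool) instead of living in the code, so the scanner has nothing to flag there.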
Examples of PAM tools include CyberArk, BeyondTrust, and Delinea.
Intrusion detection systems (IDS) and intrusion prevention systems (IPS)
An IDS monitors and analyzes network traffic for suspicious activities and potential security threats, and alerts security teams when they’re detected. IDS tools combine signature-based, anomaly-based, and behavioral analysis techniques to detect both known and unknown attacks.
IPS provides the same baseline functionality as IDS, plus it can neutralize detected threats by blocking malicious traffic and terminating dangerous connections.
Example use case:
In a brute-force login attack, a threat actor fires a rapid stream of username and password guesses at a bank’s website.
Beyond putting customer accounts at risk, this also consumes server resources, degrading service for other users.
The bank’s IPS quickly recognizes this behavior based on signals such as multiple failed login attempts from the same IP address within a short time frame.
The tool blocks the IP address, alerts the security team, and updates its threat database to help it detect similar attacks in the future.
Given the capabilities of cloud-native firewalls, a standalone IDS/IPS may be less relevant in the cloud.
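The detection logic in this brute-force example can be sketched as a sliding-window counter of failed logins per source IP. The thresholds (five failures within 60 seconds) are hypothetical, and this sketch omits what a real IPS adds on top, such as alerting, threat-database updates, and handling attacks spread across many IPs.

```python
from collections import defaultdict, deque


class BruteForceDetector:
    """Block an IP after max_failures failed logins within window_seconds."""

    def __init__(self, max_failures: int = 5, window_seconds: float = 60):
        self.max_failures = max_failures
        self.window = window_seconds
        self.failures = defaultdict(deque)  # ip -> timestamps of recent failures
        self.blocked = set()

    def record(self, ip: str, timestamp: float, success: bool) -> bool:
        """Process one login attempt; return True if the IP is blocked."""
        if ip in self.blocked:
            return True
        if success:
            return False
        recent = self.failures[ip]
        recent.append(timestamp)
        # Slide the window: forget failures older than window_seconds.
        while recent and timestamp - recent[0] > self.window:
            recent.popleft()
        if len(recent) >= self.max_failures:
            self.blocked.add(ip)  # a real IPS would also alert and log here
            return True
        return False


detector = BruteForceDetector()
# Six rapid failures from one IP: blocked on the fifth.
burst = [detector.record("203.0.113.9", t, success=False) for t in range(6)]
# Occasional failures from another IP stay under the threshold.
slow = [detector.record("192.0.2.20", t, success=False) for t in (0, 100, 200, 300)]
print(burst, slow)
```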
Examples of IDS/IPS tools include Check Point, Darktrace, and ExtraHop.
Audit logs
Although separate from the CNAPP toolset, audit logs are part of the raw data that a CNAPP’s detection rules evaluate to generate alerts. AWS CloudTrail logs, Azure Monitor activity logs, Google Cloud audit logs, and Oracle Cloud Infrastructure (OCI) audit logs play a central role in infrastructure-level cloud security. They record activity within their respective cloud environments, including:
User and system actions.
Authentication and login attempts.
Data access and modification.
System configuration changes.
This data helps both SecOps teams and automated security tools detect security threats, unauthorized access attempts, and suspicious behavior patterns. Detailed timelines of events before, during, and after security incidents support rapid response and post-incident forensic investigations. During normal operations, audit logs aid accountability by tracking user actions and administrative changes so that these activities can be attributed to specific individuals or processes.
Example use case:
Audit logs can provide visibility into many MITRE ATT&CK tactics, such as lateral movement.
In one scenario, an attacker has gained initial access to a hospitality company’s reservation system using compromised credentials.
They’re now attempting to exploit the corresponding permissions to access additional resources and escalate privileges across the cloud environment.
Audit logs capture this behavior, recording unusual patterns such as:
Authenticated users accessing resources they don’t typically use.
Multiple failed access attempts to sensitive resources.
Attempted privilege escalation by an unauthorized user.
API calls to list out permissions or resources available to the account.
When this data flows into the company’s integrated security monitoring tool, the tool recognizes that these actions align with MITRE ATT&CK techniques for lateral movement.
It triggers an alert, initiates an automated response, and provides further guidance to the security team.
Following the incident, the audit logs help the company’s analysts understand exactly what went wrong and how to prevent similar incidents in the future.
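The patterns listed in this scenario can be turned into a simple correlation over audit-log records. This sketch assumes CloudTrail-style records with `eventName`, `errorCode`, and `userIdentity.arn` fields; the event lists are illustrative, not exhaustive, and the rule that a principal must show at least two distinct signals before alerting is a hypothetical tuning choice. Denied calls are matched on an `AccessDenied` error code, which is an assumption since error codes vary by service.

```python
from collections import Counter, defaultdict

# Illustrative (not exhaustive) IAM API calls tied to the patterns above.
ENUMERATION_CALLS = {"ListRoles", "ListUsers", "ListAttachedUserPolicies",
                     "GetAccountAuthorizationDetails"}
ESCALATION_CALLS = {"AttachUserPolicy", "PutUserPolicy", "CreateAccessKey"}


def lateral_movement_signals(records: list[dict], denied_threshold: int = 3) -> dict:
    """Return {principal ARN: signals} for principals showing 2+ kinds of signal."""
    signals = defaultdict(set)
    denied = Counter()
    for r in records:
        who = r.get("userIdentity", {}).get("arn", "unknown")
        name = r.get("eventName")
        if name in ENUMERATION_CALLS:
            signals[who].add("permission/resource enumeration")
        if name in ESCALATION_CALLS:
            signals[who].add("privilege escalation attempt")
        if r.get("errorCode") == "AccessDenied":  # assumed error code
            denied[who] += 1
    for who, count in denied.items():
        if count >= denied_threshold:
            signals[who].add(f"{count} access-denied errors")
    # Correlate: a single signal alone is common in normal admin work.
    return {who: sorted(s) for who, s in signals.items() if len(s) >= 2}


ATTACKER = "arn:aws:iam::123456789012:user/frontdesk-app"  # hypothetical
ADMIN = "arn:aws:iam::123456789012:user/cloud-admin"       # hypothetical
records = [
    {"eventName": "ListRoles", "userIdentity": {"arn": ATTACKER}},
    {"eventName": "GetAccountAuthorizationDetails", "userIdentity": {"arn": ATTACKER},
     "errorCode": "AccessDenied"},
    {"eventName": "AttachUserPolicy", "userIdentity": {"arn": ATTACKER},
     "errorCode": "AccessDenied"},
    {"eventName": "GetObject", "userIdentity": {"arn": ATTACKER},
     "errorCode": "AccessDenied"},
    {"eventName": "ListUsers", "userIdentity": {"arn": ADMIN}},  # routine admin work
]
print(lateral_movement_signals(records))
```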
Orchestration layer
The orchestration layer gains partial protection from tools focusing primarily on the control plane. For example, a CSPM can identify control plane misconfigurations that expose the orchestration layer to threats, such as open Kubernetes API endpoints or overly permissive IAM roles. Nearly all CNAPPs offer runtime protection, workload scanning, and identity management for containerized environments, but some may not cover the orchestration layer itself. In that case, you might need tools designed specifically for the security needs of the orchestration layer.
Kubernetes detection and response (KDR)
KDR plays a similar role to CDR, but for Kubernetes environments. This includes:
Continuous monitoring and threat detection across Kubernetes clusters, nodes, pods, and containers.
Visibility into Kubernetes services, APIs, and workloads.
Behavioral analysis to detect anomalous activities and potential security threats.
Automated actions to contain and mitigate threats, such as isolating compromised pods.
Actionable alerts and integration with other security tools and processes to speed incident response.
Example use case:
An attacker gains access to an automaker’s Kubernetes cluster by exploiting a misconfigured API server.
They attempt to create unauthorized pods as part of a cryptomining scheme.
The company’s KDR detects the creation of the pod in a namespace that doesn’t usually host that kind of activity.
It also determines that the request came from an unusual IP address or user account.
Based on these red flags, the KDR automatically terminates the malicious pod, revokes the permissions of the compromised account, and isolates the affected node.
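The checks in this scenario map onto fields in Kubernetes audit events (`verb`, `objectRef`, `user`, and `sourceIPs`). The sketch below is minimal and assumes a hypothetical baseline of which accounts normally create pods in which namespaces and which source IPs are expected; the `respond` function only returns the planned actions as strings, where a KDR would carry them out through the Kubernetes API.

```python
# Hypothetical baseline learned from past activity.
BASELINE = {
    # namespace -> accounts that normally create pods there
    "creators": {"telematics": {"system:serviceaccount:telematics:deployer"}},
    "known_ips": {"10.0.0.12"},
}


def assess_pod_create(event: dict, baseline: dict = BASELINE) -> list[str]:
    """Return anomaly reasons for a pod-creation audit event (empty if normal)."""
    ref = event.get("objectRef", {})
    if event.get("verb") != "create" or ref.get("resource") != "pods":
        return []
    reasons = []
    user = event.get("user", {}).get("username", "")
    ns = ref.get("namespace", "")
    usual = baseline["creators"].get(ns)
    if usual is None:
        reasons.append(f"pod created in unusual namespace {ns!r}")
    elif user not in usual:
        reasons.append(f"unusual creator {user!r} in namespace {ns!r}")
    unknown = [ip for ip in event.get("sourceIPs", []) if ip not in baseline["known_ips"]]
    if unknown:
        reasons.append(f"request from unknown IP(s) {unknown}")
    return reasons


def respond(reasons: list[str]) -> list[str]:
    """Plan containment when multiple red flags coincide (actions are not executed)."""
    if len(reasons) < 2:
        return []
    return ["delete pod", "revoke account permissions", "cordon and isolate node"]


suspicious = {
    "verb": "create",
    "objectRef": {"resource": "pods", "namespace": "kube-public"},
    "user": {"username": "system:anonymous"},
    "sourceIPs": ["198.51.100.23"],
}
reasons = assess_pod_create(suspicious)
print(reasons)
print(respond(reasons))
```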
Expel provides KDR for Amazon Elastic Kubernetes Service (EKS), Azure Kubernetes Service (AKS), and Google Kubernetes Engine (GKE).
Kubernetes security posture management (KSPM)
Again, KSPM is similar to CSPM, but for Kubernetes environments, including:
Continuous monitoring and assessment of security configurations across Kubernetes.
Real-time visibility into Kubernetes clusters to identify misconfigurations, vulnerabilities, and policy violations.
Remediation guidance and automated fixes for identified security issues.
Example use case:
An energy company’s developer has accidentally assigned an overly privileged role to a service account, giving it admin-level permissions across a Kubernetes cluster.
If an attacker accesses the account, they could use it to manipulate the entire cluster and its resources.
The company’s KSPM quickly detects this misconfiguration, recognizing that a service account with admin-level permissions is highly risky and unusual.
The tool verifies that the access is unnecessary for the account’s intended purpose.
It then generates an alert with full details on the affected namespace, associated risks, and suggested steps for remediation.
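A KSPM rule for this case can be expressed as a check over RBAC bindings. The sketch below assumes binding objects shaped like the Kubernetes RBAC API (`roleRef` and `subjects`), for example the `items` list from `kubectl get clusterrolebindings -o json`; the bindings themselves are hypothetical. It treats only the built-in `cluster-admin` role as admin-level, so custom roles with equivalent rights would need to be added.

```python
ADMIN_ROLES = {"cluster-admin"}  # add custom roles with equivalent rights as needed


def admin_service_accounts(bindings: list[dict]) -> list[dict]:
    """Flag ServiceAccounts bound to an admin-level ClusterRole."""
    findings = []
    for b in bindings:
        role = b.get("roleRef", {})
        if role.get("kind") != "ClusterRole" or role.get("name") not in ADMIN_ROLES:
            continue
        meta = b.get("metadata", {})
        # A ClusterRoleBinding grants the role in every namespace; a RoleBinding
        # that references it grants it only within the binding's namespace.
        if b.get("kind") == "ClusterRoleBinding":
            scope = "cluster-wide"
        else:
            scope = f"namespace {meta.get('namespace')}"
        for subject in b.get("subjects") or []:
            if subject.get("kind") == "ServiceAccount":
                findings.append({
                    "binding": meta.get("name"),
                    "service_account": f"{subject.get('namespace')}/{subject.get('name')}",
                    "role": role["name"],
                    "scope": scope,
                    "suggestion": "bind a Role limited to the permissions this account uses",
                })
    return findings


bindings = [  # hypothetical objects, shaped like the RBAC API
    {"kind": "ClusterRoleBinding", "metadata": {"name": "meter-sync-admin"},
     "roleRef": {"apiGroup": "rbac.authorization.k8s.io", "kind": "ClusterRole",
                 "name": "cluster-admin"},
     "subjects": [{"kind": "ServiceAccount", "name": "meter-sync", "namespace": "grid"}]},
    {"kind": "RoleBinding", "metadata": {"name": "viewers", "namespace": "grid"},
     "roleRef": {"apiGroup": "rbac.authorization.k8s.io", "kind": "ClusterRole",
                 "name": "view"},
     "subjects": [{"kind": "ServiceAccount", "name": "dashboard", "namespace": "grid"}]},
]
for finding in admin_service_accounts(bindings):
    print(finding)
```

Verifying that the access is actually unnecessary, as the KSPM does in this scenario, would additionally require usage data; this check only surfaces the risky binding.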
KSPM may be offered as part of a CNAPP, for example by Aqua Security, AccuKnox, or Armo, or as a stand-alone offering, for example from Tigera.
Platform layer
Attackers target the platform layer to access both the workloads it runs and the orchestration-layer services it connects with. Security teams need tools that can help them close vulnerabilities before they can be exploited—and detect any threats that do slip through.
Read pt. 2 for tools, vendor examples, and a practical roadmap.
Special thanks to
for reviewing and providing comments!