Christina Harker, PhD
Migrating on-premise infrastructure to the cloud is a top priority for many organizations today. The cloud offers a number of benefits, including scalability, flexibility and cost savings. However, many organizations are unprepared for the security challenges that come with cloud adoption. Operating in unfamiliar architectural territory means it's hard to "know what isn't known".
In this blog post, we will discuss some of the cloud security challenges that organizations need to overcome, as well as the various implementation considerations they must make when deploying their own software and system infrastructure.
One of the first cloud security challenges organizations must confront is network topology, which differs sharply from what infrastructure teams are used to in on-premise architecture. Strictly defined network borders are not the norm, and incorrect configuration can leave critical resources and data exposed to compromise.
In a traditional, on-premise topology, engineers can reasonably expect the network to look something like the following:
Network segments are clearly defined. Most new resources are provisioned within the local area network (LAN), which has all ingress and egress traffic strictly controlled and monitored via router and firewall. By default, any new resource is automatically subject to this level of traffic control; there is no expectation that a server provisioned within the LAN will suddenly have all of its ports open and reachable via the public internet. If engineers need to provision resources in a less restrictive boundary zone, they can use the DMZ network segment without putting LAN resources at risk.
Now, consider the network topology of a typical cloud platform, like Amazon Web Services. A simple implementation might look like this:
If the default VPC is being used (more common than it might seem), then the default subnets are reachable from the public internet immediately and by default. Security groups, which act as basic firewall rules, must be configured per instance. Deploying a new resource requires defining which subnet and region it will be placed in, and which security group rules, if any, will be applied. Organizations with mature cloud deployments and a strong operational culture will typically handle this via infrastructure-as-code and automated configuration policy, but organizations making their first foray into the cloud usually lack the organizational knowledge, staffing, and technical infrastructure to properly support cloud deployments at any meaningful scale.
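As a sketch of the kind of guardrail such organizations often lack, the following checks a list of security-group-style ingress rules for sensitive ports exposed to the entire internet. The rule shape loosely mirrors what AWS's describe-security-groups API returns, but the field names and port list here are illustrative assumptions, not a complete schema:

```python
# Illustrative audit: flag ingress rules that open sensitive ports to the
# whole internet. Rule dicts loosely mirror the AWS security-group shape.

SENSITIVE_PORTS = {22, 3389, 3306, 5432}  # SSH, RDP, MySQL, PostgreSQL

def open_to_world(rule):
    """True if the rule allows ingress from any IPv4 address."""
    return any(r.get("CidrIp") == "0.0.0.0/0" for r in rule.get("IpRanges", []))

def risky_rules(rules):
    """Return rules that expose a sensitive port to the public internet."""
    flagged = []
    for rule in rules:
        ports = range(rule.get("FromPort", 0), rule.get("ToPort", 0) + 1)
        if open_to_world(rule) and SENSITIVE_PORTS.intersection(ports):
            flagged.append(rule)
    return flagged

rules = [
    {"FromPort": 22, "ToPort": 22, "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
    {"FromPort": 443, "ToPort": 443, "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
    {"FromPort": 5432, "ToPort": 5432, "IpRanges": [{"CidrIp": "10.0.0.0/8"}]},
]
print(risky_rules(rules))  # only the world-open SSH rule is flagged
```

In a mature setup, a check like this would run automatically against infrastructure-as-code definitions before anything is deployed, rather than against live resources after the fact.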
In addition, the public-facing nature of most cloud deployments means denial-of-service (DoS) attacks are a much bigger threat. A significant attack can leave a web application unusable and unreachable by regular users, potentially costing the business significant revenue. Teams are forced to manually integrate some form of protection or firewall, or, ideally, use a managed offering from the hosting platform.
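Before a managed offering is in place, a crude first line of defense is per-client rate limiting. The token-bucket sketch below is illustrative only; real DoS protection belongs at the platform or CDN layer, well in front of the application:

```python
import time

class TokenBucket:
    """Minimal token-bucket limiter: `rate` tokens/sec, burst of `capacity`."""

    def __init__(self, rate, capacity, now=None):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic() if now is None else now

    def allow(self, now=None):
        """Refill based on elapsed time, then spend one token if available."""
        now = time.monotonic() if now is None else now
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# One request per second with a burst of two; extra requests are rejected.
bucket = TokenBucket(rate=1, capacity=2, now=0.0)
print(bucket.allow(now=0.0), bucket.allow(now=0.0), bucket.allow(now=0.0))
```

A real deployment would keep one bucket per client IP or API key, typically in a shared store so all application instances see the same counters.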
Even in engineering teams with ample cloud experience, there is a constant balancing act: addressing the additional network complexity securely, with configuration guardrails, while still providing the connectivity and usability that keep developers and administrators productive.
Supply-chain security has seen renewed interest in software development in light of recent high-profile attacks. External software dependencies or operating system packages may contain security vulnerabilities that originated outside an organization but still leave it exposed. In the ever-growing landscape of cloud usage, however, supply-chain risk now extends beyond software to entire third-party platforms.
As organizations migrate to the cloud, or grow their existing cloud footprint, they come to rely more and more on disparate Software-as-a-Service (SaaS) and Platform-as-a-Service (PaaS) tools and services. A typical end-to-end application environment will use different vendors for cloud computing, event management, monitoring, version control, data storage and analytics, and ticketing. Some of these services may be used for critical data storage or processing. Each of these services offers a variety of integrations with the others, often depending on secret tokens or sensitive credentials to enable access to the other APIs.
In the above hypothetical environment, what happens when one of these services is itself the victim of an attack or compromise? It may have employed ineffective security controls in its own environment, or fallen victim to a supply-chain attack through one of its own application dependencies. Unfortunately, the blast radius of such an attack multiplies quickly in a cloud environment. Imagine a data analytics platform is compromised. These tools typically require access to sensitive data stores to provide useful functionality; now every customer of that platform who supplied access credentials could have their own data at risk. Organizations are forced to spend long cycles negotiating NDAs and reviewing security audit data for every vendor they adopt, a process that is burdensome for small teams and a productivity drag as it scales. Some teams may find an all-in-one PaaS attractive, since it can host the entire application stack on one platform.
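One way to reason about this blast radius is to model vendor integrations as a directed graph, where an edge means "holds credentials for," and compute everything reachable from a compromised vendor. The service names below are hypothetical:

```python
from collections import deque

# Edge A -> B means "service A holds credentials for service B".
# These integrations are a made-up example, not a real environment.
integrations = {
    "analytics": ["warehouse"],
    "warehouse": ["object-storage"],
    "ci": ["version-control", "object-storage"],
    "monitoring": [],
}

def blast_radius(compromised, graph):
    """Services transitively reachable from the compromised vendor (BFS)."""
    seen, queue = set(), deque([compromised])
    while queue:
        node = queue.popleft()
        for target in graph.get(node, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

print(sorted(blast_radius("analytics", integrations)))
# → ['object-storage', 'warehouse']
```

Even this toy model makes the point: compromising the analytics vendor exposes not just its own data but everything its credentials can reach downstream.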
Another cloud security challenge that can quickly explode with complexity: data management. Data access controls and encryption are already non-trivial to address with critical data, and cloud environments now add the additional dimension of considering network/transit access as well.
In any particular cloud environment, engineering teams will need to consider the security of multiple elements:
Access and authorization to encryption keys.
Access and authorization to administration and configuration of the parent platform.
Access and authorization to the data itself.
Security of ingress/egress network traffic.
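These layers can be thought of as a chain of checks that must all pass before access is granted. The sketch below is a toy model of that idea; the identity names, actions, and policy shapes are illustrative, not any real provider's API:

```python
# Toy model: a data-access decision spans several layers, and a single
# misconfigured layer exposes the data. All names here are illustrative.

def authorize(identity, request, policies):
    """Permit only if every layer's check passes; report failed layers."""
    checks = [
        ("keys", identity in policies["key_users"]),
        ("platform", request["action"] != "configure"
                     or identity in policies["platform_admins"]),
        ("data", request["dataset"] in policies["grants"].get(identity, ())),
        ("network", request["source_cidr"] in policies["allowed_cidrs"]),
    ]
    failed = [layer for layer, ok in checks if not ok]
    return not failed, failed

policies = {
    "key_users": {"svc-etl"},
    "platform_admins": {"admin"},
    "grants": {"svc-etl": ("orders",)},
    "allowed_cidrs": {"10.0.1.0/24"},
}
request = {"action": "read", "dataset": "orders", "source_cidr": "10.0.1.0/24"}
print(authorize("svc-etl", request, policies))  # permitted: every layer passes
```

The failure report matters as much as the decision: when access is denied, operators need to know which layer blocked it, or they will be tempted to loosen all of them at once.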
While these may initially read as simple problems to solve, in a cloud environment they can be anything but. Consider the earlier on-premise architecture diagram and a data store such as a PostgreSQL server: administrators know that a database provisioned within the boundaries of the LAN already sits behind configuration that limits who can potentially reach it, so even long-lived credentials present a comparatively modest risk. In contrast, a typical cloud setup might look like:
Addressing access and management securely and correctly now requires accounting for public network transit, as well as a zero-trust operating environment. Long-lived credentials now represent a serious target for attackers, requiring automation to either rotate them, or handle ephemeral session-based authentication. Granular policies must be maintained for each identity: administrator, user, and service account. Additionally, this diagram doesn't even represent management of encryption keys!
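The ephemeral-credential pattern can be illustrated in a few lines. A real deployment would use the platform's token service (an STS equivalent); this sketch, with hypothetical names, only shows the expiry-and-reissue mechanics:

```python
import secrets

SESSION_TTL = 900  # seconds; 15-minute sessions (an illustrative choice)

def issue(identity, now):
    """Mint a short-lived credential for `identity` at time `now`."""
    return {"identity": identity,
            "token": secrets.token_hex(16),
            "expires": now + SESSION_TTL}

def valid(cred, now):
    """A credential is honored only inside its session window."""
    return now < cred["expires"]

cred = issue("svc-report", now=0)
print(valid(cred, now=600))   # inside the window
print(valid(cred, now=1200))  # expired; the caller must re-issue
```

The point of the pattern is that a stolen token is only useful for minutes, not months, which sharply limits the window an attacker has to exploit it.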
Granular cloud security policy is not an easy concept to implement; it often requires longer cycles of trial-and-error to find the best compromise between security and usability. It's easy to imagine an organization migrating to the cloud, and with the pressure of stakeholder deadlines looming, engineers taking dangerous shortcuts to meet them. It's much less complex, for instance, to just issue one set of administrator credentials and deploy them everywhere. In doing so, the security posture of the entire system is compromised.
Organizations migrating from on-premise infrastructure will need to consider how to bridge access and authorization mechanisms from their internal identity management systems. One of the major cloud security challenges is effectively and securely translating access and authorization from on-premise systems into a cloud architecture.
Managing access and authorization can be time-consuming and error-prone with just a few identities to manage. Users and service accounts should be issued permissions based on the principle of least privilege, which often requires creating and maintaining granular policy definitions not only for each identity, but also for the systems and APIs they need to access. Managing this across thousands of users and potentially hundreds of unique systems can be a difficult ask for administrators used to on-premise architecture.
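One practical way to approach least-privilege review is set arithmetic: compare what an identity's policy grants against what it has actually been observed using. The identity and action names below are hypothetical:

```python
# Illustrative least-privilege review: permissions granted but never used
# are candidates for removal. Identity and action names are made up.

granted = {"reporting-svc": {"db:read", "db:write", "storage:read", "kms:decrypt"}}
observed = {"reporting-svc": {"db:read", "storage:read"}}

def excess_permissions(identity):
    """Permissions granted but never observed in use."""
    return granted[identity] - observed.get(identity, set())

print(sorted(excess_permissions("reporting-svc")))
# → ['db:write', 'kms:decrypt']
```

Several cloud providers offer managed versions of this analysis driven by real access logs; the value is the same either way: policies shrink toward what identities actually do.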
Most organizations, and especially larger enterprises, typically depend on some kind of directory infrastructure to manage identities. Microsoft Active Directory is common in environments with Windows OS deployments. When migrating to the cloud, these enterprises will want to avoid cumbersome, manual management of identities. Management at scale will require bridging between the on-premise directory system and the cloud identity and access management service. Correctly implementing this requires careful coordination between infrastructure ops and corporate IT teams. Some organizations may never have required these teams to collaborate before, and either or both of them probably lack significant cloud implementation knowledge. Deadlines are a fertile breeding ground for tech debt and security shortcuts, and IAM systems can quickly spiral out of control.
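A common shape for that bridge is a declarative mapping from directory groups to cloud roles, so that membership changes in the directory propagate to cloud access automatically. The group and role names below are hypothetical:

```python
# Illustrative bridge: derive cloud roles from directory group membership.
# Group and role names are made up; a real system would federate via
# SAML/OIDC rather than a hand-maintained dict.

GROUP_TO_ROLE = {
    "eng-platform": "cloud:PlatformOperator",
    "eng-data": "cloud:DataReader",
    "it-helpdesk": "cloud:ReadOnly",
}

def roles_for(user_groups):
    """Cloud roles a user is entitled to, given their directory groups."""
    return sorted({GROUP_TO_ROLE[g] for g in user_groups if g in GROUP_TO_ROLE})

print(roles_for(["eng-data", "it-helpdesk", "finance"]))
# → ['cloud:DataReader', 'cloud:ReadOnly']
```

The key property is that access is granted to groups, not individuals: offboarding a user from the directory revokes their cloud access in one step, with no per-account cleanup in the cloud IAM console.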
One cloud security challenge gives rise to another: access sprawl inevitably leads to infrastructure sprawl. Infrastructure usage that is not monitored or controlled will continue to grow the attack surface. Keeping track of which cloud resources are in use, who has access to them, and for what purpose can be a challenge for organizations transitioning from on-premise.
Tools such as Infrastructure-as-Code (IaC) are essential for any organization that wishes to effectively manage modern cloud infrastructure at scale. IaC, if implemented correctly, can eliminate manual configuration errors and significantly reduce cloud sprawl when used in combination with automated configuration management and some form of policy engine. Having infrastructure defined in code and deployed automatically also drastically reduces the time required for cloud security audits and assessments.
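In the spirit of policy engines such as OPA and Conftest, a policy gate over declarative resource definitions can be sketched in a few lines. The resource shapes and rules below are illustrative assumptions, not a real provider schema:

```python
# Toy policy gate: run every rule against every declared resource before
# deployment. Resource fields and rule set are illustrative.

RULES = [
    ("public ingress forbidden",
     lambda r: r.get("ingress_cidr") != "0.0.0.0/0"),
    ("encryption at rest required",
     lambda r: r.get("encrypted", False)),
    ("owner tag required",
     lambda r: "owner" in r.get("tags", {})),
]

def violations(resources):
    """Return (resource id, rule name) pairs for every failed check."""
    found = []
    for res in resources:
        for name, check in RULES:
            if not check(res):
                found.append((res["id"], name))
    return found

resources = [
    {"id": "db-main", "encrypted": True,
     "tags": {"owner": "data-team"}, "ingress_cidr": "10.0.0.0/8"},
    {"id": "bucket-logs", "encrypted": False, "tags": {}},
]
print(violations(resources))  # bucket-logs fails two checks
```

Wired into a CI pipeline, a gate like this rejects non-compliant definitions before they ever become running infrastructure, which is far cheaper than auditing live resources afterwards.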
Implementing IaC without a team that is well-versed in DevOps philosophies can be a challenging endeavor. Not only does it require an understanding of cloud security best practices, cloud architecture, and cloud infrastructure components, but also the ability to write code to define the cloud resources and automate their deployment. Enterprises migrating from on-premise may lack such staff, and hiring and scaling up these teams could potentially add months to the migration plan.
Cloud adoption brings a number of security challenges that must be addressed to ensure the safety and integrity of critical data. Organizations often aren't prepared, lacking the institutional knowledge and culture to properly address the increased complexity of an unfamiliar network topology. Technical challenges abound: everything is an API, and existing staff may not have the training to properly build and secure the environment.