Netflix's Identity First - AWS Cloud Security Evolution!

How Netflix scaled AWS security by breaking with traditional account boundaries and starting from identity instead of infrastructure. Netflix's bold rethink of AWS multi-account security goes beyond least privilege: identity isolation let the team stay developer friendly at Netflix's scale.

Hello from the Cloud-verse!

This week’s Cloud Security Newsletter topic is Netflix's Identity First Architecture Evolution!

Netflix's Identity First Architecture Evolution (Image Credit - Dalle)

In case this is your first Cloud Security Newsletter, you are in good company!
You are reading this issue along with friends and colleagues from companies like Netflix, Citi, JP Morgan, LinkedIn, Reddit, GitHub, GitLab, Capital One, Robinhood, HSBC, British Airways, Airbnb, Block, Booking Inc & more. Like you, they subscribe to learn what’s new in cloud security each week from their industry peers, and many also listen to Cloud Security Podcast & AI CyberSecurity Podcast every week.

Welcome to this week's edition of the Cloud Security Newsletter!

In this week's newsletter, we dive deep into an innovative approach to AWS multi-account security architecture while being developer friendly, featuring insights from Netflix's Cloud Security Engineering team. Their multi-year journey from a monolithic AWS account structure to a sophisticated identity-first architecture offers valuable lessons for organizations of all sizes struggling with AWS security at scale.

  • Patrick Sanders - Cloud Security Engineer at Netflix

  • Joseph Kjar - Cloud Security Engineer at Netflix

Definitions and Core Concepts 📚

Before we dive into the insights, let's clarify some key terms that will be referenced throughout:

  • IMDS (Instance Metadata Service) - A service in AWS that applications running on EC2 instances use to securely access instance metadata and credentials.

  • Right-sized Access - An alternative approach to “least privilege” that focuses on isolation boundaries rather than granular permissions, providing broader access within well-defined security boundaries.

  • AWS RAM (Resource Access Manager) - A service that lets you share AWS resources across AWS accounts, making it easier to centrally manage resources like VPC subnets.

  • Service Control Policies (SCPs) - Organization-level policies in AWS Organizations that cap the maximum permissions available to the accounts they are attached to, helping you enforce your security guidelines.
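
To make the SCP definition concrete, here is a minimal sketch of what such a policy document looks like, built as a Python dictionary. The statement shown (denying `organizations:LeaveOrganization`) is a commonly used guardrail chosen for illustration only; it is an assumption, not something the Netflix team described. The function name and `Sid` are hypothetical.

```python
import json


def build_deny_leave_org_scp() -> dict:
    """Return an illustrative SCP that stops member accounts leaving the organization."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyLeaveOrganization",  # hypothetical statement ID
                "Effect": "Deny",
                "Action": "organizations:LeaveOrganization",
                "Resource": "*",
            }
        ],
    }


if __name__ == "__main__":
    # Print the JSON you would attach to an OU or account in AWS Organizations.
    print(json.dumps(build_deny_leave_org_scp(), indent=2))
```

An SCP only sets the ceiling; the identities in the account still need IAM policies that grant what they use.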

This week's Issue is sponsored by Vanta

Live event: AI & Security Maturity with John Hammond & Vanta

Join John Hammond—cybersecurity researcher, practitioner, and content creator with nearly two million YouTube subscribers—and Matt Cooper, Vanta’s Director of GRC, for a fireside chat on AI, security maturity, and the top security risks in 2025.

They’ll explore the evolving landscape of cyber risks and share insights drawn from their work with organizations at every stage of security maturity.

Tune in on Feb 18th at 12pm PT to get:

  • A deep dive into 2025’s top cyber risks, including the impact of AI

  • Actionable insights to refine your security priorities

  • Strategies tailored to your organization’s security maturity level

  • A live Q&A at the end

Don’t miss this chance to future-proof your approach to cybersecurity with advice from two leading voices in the industry.

💡Our Insights from these Practitioners 🔍

1. The Case Against Single-Account Architecture

Netflix's experience highlights why starting with multiple AWS accounts from day one is crucial. As Patrick Sanders emphasizes:

"there's no reason an organization should start with a single AWS account unless that single account is your organization root and you immediately start making more accounts to actually do things in."

Key reasons include:

  • Impossible or difficult resource migration (like S3 buckets) 

  • Service Control Policies do not apply to the organization's management (root) account, so workloads left there cannot be governed by them

  • Resource contention and noisy neighbor issues

  • Rate limiting and service quota challenges

2. Identity-First Migration Strategy

Instead of attempting full workload migrations, Netflix developed an innovative approach focusing on identity separation. Joseph Kjar explains their thinking:

"The IAM pieces are decentralized... each application, when you spin it up, we would give it its own AWS account one per environment test and prod and when your app starts the proxy will use that OIDC workflow to fetch a role from that remote account and serve the credentials to your application."

Their approach offers several benefits:

  • Zero application code changes required by the developers

  • Reduced blast radius for compromised credentials, since each credential is valid only within its own AWS account

  • Improved developer experience

  • Simplified access management
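
The "one account per application per environment" idea from the quote above can be sketched as a simple lookup. This is only an illustration under stated assumptions: the registry contents, the account IDs, the `billing-api` application, and the convention of naming the role after the application are all hypothetical, and Netflix's real mapping is not described in the episode.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AppIdentity:
    app_name: str
    environment: str  # "test" or "prod", as in the quote


# Hypothetical registry: one dedicated AWS account per (application, environment).
ACCOUNT_REGISTRY = {
    AppIdentity("billing-api", "test"): "111111111111",
    AppIdentity("billing-api", "prod"): "222222222222",
}


def role_arn_for(identity: AppIdentity, registry=ACCOUNT_REGISTRY) -> str:
    """Resolve the IAM role a proxy would request for this application."""
    account_id = registry[identity]  # raises KeyError if the app is not onboarded
    # Assumption: the role in the dedicated account is named after the application.
    return f"arn:aws:iam::{account_id}:role/{identity.app_name}"
```

Because test and prod resolve to different accounts, a credential leaked from one environment has no standing in the other.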

3. Netflix's Approach - An IMDS Proxy-Based Solution

The Netflix team shared several practical insights for implementing this approach:

IMDS Proxy Solution: Patrick Sanders describes their solution:

"We decided to develop a new IMDS proxy that will support our container platform and EC2. And it uses OIDC with IAM so it does STS assume role with web identity to get credentials and deliver them to the application."

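The exchange Patrick describes can be sketched in a few lines. This is a minimal sketch, not Netflix's proxy: it assumes `sts_client` is something like `boto3.client("sts")`, that the OIDC token has already been obtained from the platform, and that the proxy answers in the same JSON shape the EC2 instance metadata service uses for role credentials. The function name is hypothetical.

```python
from datetime import datetime, timezone

ISO_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def fetch_imds_style_credentials(sts_client, role_arn, oidc_token, session_name):
    """Trade an OIDC token for role credentials and shape them like an IMDS response."""
    # AssumeRoleWithWebIdentity is a real STS operation; sts_client is assumed
    # to expose it the way a boto3 STS client does.
    response = sts_client.assume_role_with_web_identity(
        RoleArn=role_arn,
        RoleSessionName=session_name,
        WebIdentityToken=oidc_token,
    )
    creds = response["Credentials"]
    # The SDK returns Expiration as a timezone-aware datetime; normalize to UTC.
    expiration = creds["Expiration"].astimezone(timezone.utc)
    return {
        "Code": "Success",
        "LastUpdated": datetime.now(timezone.utc).strftime(ISO_FORMAT),
        "Type": "AWS-HMAC",
        "AccessKeyId": creds["AccessKeyId"],
        "SecretAccessKey": creds["SecretAccessKey"],
        "Token": creds["SessionToken"],
        "Expiration": expiration.strftime(ISO_FORMAT),
    }
```

Since AWS SDKs already know how to read this document from the metadata endpoint, the application needs no code changes, which matches the benefit listed earlier.
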
Migration Prioritization with this approach: Joseph Kjar provides a framework for prioritizing migrations:

"It's helpful to estimate a couple of things, right? Migration complexity, the security risk of the application in question and the operational risk of the application. So if you have an application that's low complexity, but it's high security risk, and it's a simple migration you found a golden target."

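Joseph's framework can be turned into a rough ranking. The sketch below assumes each factor is scored from 1 to 5 and that the three factors are weighted equally; both choices, along with the class and function names, are illustrative assumptions rather than anything the Netflix team specified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MigrationCandidate:
    name: str
    complexity: int        # 1 (simple) to 5 (hard to migrate)
    security_risk: int     # 1 (low) to 5 (high)
    operational_risk: int  # 1 (low) to 5 (high)


def priority_score(candidate: MigrationCandidate) -> int:
    """Higher is better: high security risk, low complexity, low operational risk."""
    # Assumption: equal weights. Tune these to your own risk appetite.
    return candidate.security_risk - candidate.complexity - candidate.operational_risk


def rank_candidates(candidates):
    """Return candidates ordered from best migration target to worst."""
    return sorted(candidates, key=priority_score, reverse=True)
```

A "golden target" in Joseph's sense (high security risk, simple migration) lands at the top of this list, while risky and complex workloads wait until the tooling has matured.
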
4. Developer Experience & Security Balance

A key insight from Netflix's approach is how they've managed to improve security while enhancing developer experience. As Patrick Sanders notes:

"Our philosophy of security at Netflix is guardrails over gates. We don't want to be the team blocking somebody from getting their work done, because that makes a lot of work for us."

5. Evolution of Their Approach from 2023 to Today (2025)

In the two years since first introducing their identity-first architecture, the Netflix team has made significant progress and learned valuable lessons. Joseph Kjar reflects on their journey:

"The main thing that's different now is that we've gone through and we've built all of that migration tooling and we've really ironed out all of the core technical foundations that we need to move real life workloads."

Key learnings include:

Resource Discovery Challenges: The team found that identifying which AWS services an application uses was straightforward with the IAM last accessed information APIs, but discovering the specific resources it depends on was more complex. As Joseph explains:

"It's easy to discover which AWS services an application uses, it's not so easy to discover all of the resources that it uses... you can see from the last accessed info that, oh, this application is using S3, but which bucket?"

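For the "easy" half of that discovery, here is a hedged sketch of querying IAM last accessed information. It assumes `iam_client` behaves like `boto3.client("iam")`; the function name, polling interval, and retry limit are hypothetical choices.

```python
import time


def services_used_by(iam_client, principal_arn, poll_seconds=1.0, max_polls=30):
    """Return the service namespaces a role or user has authenticated to."""
    # The report is generated asynchronously, so start a job and poll for it.
    job_id = iam_client.generate_service_last_accessed_details(Arn=principal_arn)["JobId"]
    for _ in range(max_polls):
        page = iam_client.get_service_last_accessed_details(JobId=job_id)
        if page["JobStatus"] == "COMPLETED":
            break
        if page["JobStatus"] == "FAILED":
            raise RuntimeError(f"last accessed job {job_id} failed")
        time.sleep(poll_seconds)
    else:
        raise TimeoutError(f"last accessed job {job_id} did not complete in time")

    used = []
    while True:
        for service in page["ServicesLastAccessed"]:
            # LastAuthenticated is only present when the service was actually used.
            if "LastAuthenticated" in service:
                used.append(service["ServiceNamespace"])
        if not page.get("IsTruncated"):
            break
        page = iam_client.get_service_last_accessed_details(
            JobId=job_id, Marker=page["Marker"]
        )
    return sorted(used)
```

The result is a list like `["s3", "sqs"]`, which is exactly the limitation in the quote: it names the service, not the bucket or queue.
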
They solved this through:

  • Custom AWS SDK instrumentation for tracking specific resource access

  • Integration with existing Netflix telemetry systems

  • Automated resource mapping tools
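
The episode does not describe how Netflix's SDK instrumentation works, so the following is only one possible shape for the idea: a thin wrapper that records which concrete resources each call names before passing the call through. The class name, the list of parameter names, and the in-memory `seen` store are all assumptions; a real setup would more likely hook into the SDK itself and ship the data to a telemetry pipeline.

```python
from collections import defaultdict

# Assumption: request parameters that typically identify a concrete resource.
RESOURCE_PARAMS = ("Bucket", "TableName", "QueueUrl", "TopicArn", "FunctionName")


class ResourceRecordingClient:
    """Wrap an SDK client and remember which resources its calls reference."""

    def __init__(self, client, service_name, sink=None):
        self._client = client
        self._service = service_name
        # Maps service name -> set of resource identifiers seen so far.
        self.seen = sink if sink is not None else defaultdict(set)

    def __getattr__(self, operation):
        # Only reached for attributes not set in __init__, i.e. client operations.
        target = getattr(self._client, operation)
        if not callable(target):
            return target

        def wrapper(*args, **kwargs):
            for param in RESOURCE_PARAMS:
                if param in kwargs:
                    self.seen[self._service].add(str(kwargs[param]))
            return target(*args, **kwargs)

        return wrapper
```

Wrapping an S3 client and calling `get_object(Bucket="logs", Key="a")` would leave `{"logs"}` under `seen["s3"]`, answering the "which bucket?" question that last accessed data cannot.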

6. Scaling Migrations Successfully

The team has evolved from manual migrations to a highly automated process. Joseph shares their current capabilities:

"We can curate a batch and say, we want to migrate this set of applications. Hit the button. And we go get a drink and a hundred plus real life migrations for workloads happen self sufficiently."

Key recommendations for others:

  • Start with the migration feedback loop earlier

  • Don't aim for perfection in initial tooling

  • Focus on one thing at a time rather than splitting attention

  • Classify applications based on migration complexity and security risk

7. Considerations for Modern Architectures

The team also provided insights for organizations starting fresh or using modern architectures:

For Container-Native Organizations: Patrick notes:

"A lot of companies are starting out in a container world and they don't have this baggage of EC2 that we have. And, if you're using EKS, this becomes a lot easier because you can use IAM roles for service accounts."

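For readers on EKS, the heart of IAM roles for service accounts is the role's trust policy, which ties the role to one Kubernetes service account through the cluster's OIDC provider. The sketch below builds that policy as a Python dictionary; the function name and the example values in the usage note are placeholders, and the policy follows the general shape AWS documents rather than anything specific to Netflix.

```python
def irsa_trust_policy(account_id, oidc_provider, namespace, service_account):
    """Build a trust policy letting one Kubernetes service account assume a role.

    oidc_provider is the cluster's OIDC issuer without the https:// prefix.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{oidc_provider}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {
                        # Restrict the role to exactly one namespace and service account.
                        f"{oidc_provider}:sub": f"system:serviceaccount:{namespace}:{service_account}",
                        f"{oidc_provider}:aud": "sts.amazonaws.com",
                    }
                },
            }
        ],
    }
```

Calling it with a placeholder issuer such as `oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE`, namespace `payments`, and service account `billing-api` yields a policy that only pods running as that service account can satisfy. It uses the same `AssumeRoleWithWebIdentity` flow as the IMDS proxy, without the proxy.
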
For Greenfield Environments: Joseph emphasizes the importance of leveraging newer AWS capabilities:

"If you're starting fresh, there are so many new tools in the AWS toolbox that we just didn't have that Netflix didn't have available over the years. And it's too hard to retrofit."

8. Real-World Impact and Limitations

The team is candid about what their approach does and doesn't solve. Patrick Sanders notes:

"This doesn't solve all the problems, right? It doesn't solve the network perimeter problems. It doesn't solve, lateral movement on the network. It doesn't solve, like multi tenancy kind of things in a container platform perspective."

However, the benefits have been significant:

  • Reduced operational overhead for security teams

  • Improved developer autonomy

  • Better isolation between workloads

  • Easier risk management through account boundaries

These insights from their multi-year journey show how to implement and scale such an architecture, and set realistic expectations about what it can achieve. Their experience traces both the evolution of their approach and how different organizations might adapt these principles to their own environments.

Question for you?

Would you use an IMDS proxy to authenticate applications in AWS, or is there a better way?

Next week, we'll explore another critical aspect of cloud security. Stay tuned!

We would love to hear from you 📢 if you have a feature or topic request, or if you would like to sponsor an edition of Cloud Security Newsletter.

Thank you for continuing to subscribe, and welcome to the new members of this newsletter community 💙

Peace!

Was this forwarded to you? You can sign up here to join our growing readership.

Want to sponsor the next newsletter edition? Let's make it happen.

Have you joined our FREE Monthly Cloud Security Bootcamp yet?

Check out our sister podcast, AI Cybersecurity Podcast.