- Cloud Security Newsletter
- Posts
- 🚨 Gemini Prompt Injection + Copilot Reprompt: Why LLMs Can’t Tell Instructions from Data
🚨 Gemini Prompt Injection + Copilot Reprompt: Why LLMs Can’t Tell Instructions from Data
This week's newsletter examines critical prompt injection vulnerabilities across Microsoft Copilot, Google Gemini, and GitHub Copilot, alongside AWS CodeBuild's supply-chain risks. Learn from Ramp's Principal Security Engineer Antoinette Stevens about building engineering-led detection programs that scale with AI while maintaining human oversight, managing false positives, and balancing build-versus-buy decisions in 2026's threat landscape.
Hello from the Cloud-verse!
This week’s Cloud Security Newsletter topic: Engineering-Led Detection Programs in the AI Era (continue reading)
Incase, this is your 1st Cloud Security Newsletter! You are in good company!
You are reading this issue along with your friends and colleagues from companies like Netflix, Citi, JP Morgan, Linkedin, Reddit, Github, Gitlab, CapitalOne, Robinhood, HSBC, British Airways, Airbnb, Block, Booking Inc & more who subscribe to this newsletter, who like you want to learn what’s new with Cloud Security each week from their industry peers like many others who listen to Cloud Security Podcast & AI Security Podcast every week.
Welcome to this week’s Cloud Security Newsletter
As enterprises scale AI across security and engineering workflows, a hard truth is emerging: LLMs fundamentally cannot distinguish instructions from data.
This week’s Cloud Security Newsletter connects the dots between prompt injection vulnerabilities across Google Gemini, Microsoft Copilot, and GitHub Copilot, CI/CD trust boundary failures in AWS CodeBuild, and what detection programs must look like in an AI-first threat landscape.
🎙️ This week’s practitioner deep dive features Antoinette Stevens, Principal Security Engineer at Ramp, sharing how engineering-led detection programs scale with AI without outsourcing judgment to models.[Listen to the episode]
đź“° TL;DR for Busy Readers
Prompt injection is now OWASP LLM01 — models cannot reliably separate instructions from data
CI/CD pipelines are auth surfaces: AWS CodeBuild regex flaw shows how supply-chain attacks scale
Semantic attacks beat string-based defenses (Gemini + Calendar invites)
Microsoft Copilot “Reprompt” enabled silent, persistent data exfiltration
Engineering-led detection treats alerts as production code, not rules
AI SOC agents help at L1 — but hallucinate without human validation
đź“° THIS WEEK'S TOP 5 SECURITY HEADLINES
Each story includes why it matters and what to do next — no vendor fluff.
1. "CodeBreach" Vulnerability in AWS CodeBuild Exposed Supply-Chain Risk
Wiz Research disclosed a critical vulnerability pattern in AWS CodeBuild's GitHub integration where improperly anchored regex checks could enable attackers to impersonate authorized maintainers and trigger privileged build workflows. The flaw, dubbed "CodeBreach," demonstrates how authorization weaknesses in CI/CD pipelines can escalate into software supply-chain compromise events. AWS remediated the issue within approximately 48 hours and confirmed no customer environments were impacted.
Why This Matters: This vulnerability exemplifies a classic CI/CD trust boundary failure where seemingly minor authorization logic can become a blast radius multiplier. For cloud security teams, every build trigger represents a production authentication surface—particularly those that can mint credentials, publish artifacts, or merge to protected repositories. As Antoinette Stevens noted in our conversation, detection engineering must extend beyond traditional security boundaries to include developer workflows and automation systems.
Recommended Actions: Audit all CodeBuild and GitHub integration patterns for regex anchoring and identity mapping issues. Implement manual approval gates for privileged PR-comment workflows. Restrict build roles to least-privilege, short-lived credentials. Deploy detections for anomalous trigger identities and unusual PR-comment patterns.
Source: IT Pro
2. Prompt Injection Officially Named Top AI Threat - OWASP LLM01 for 2026
Multiple January 2026 research publications and incidents confirmed prompt injection as the primary attack vector against AI systems, with OWASP formally designating it as LLM01 in their updated Top 10 for LLM Applications. A comprehensive academic review analyzed 45 sources documenting real-world attacks including GitHub Copilot's CVE-2025-53773 (CVSS 9.6 remote code execution), ChatGPT's Windows license key exposure, and research demonstrating that just five carefully crafted documents can manipulate AI responses 90% of the time through RAG poisoning. IEEE Security & Privacy 2026 research revealed that 8 of 17 third-party chatbot plugins fail to enforce conversation history integrity, enabling client-side message manipulation.
Why This Matters: This represents a fundamental architectural vulnerability in LLM systems with no complete technical solution models cannot reliably distinguish instructions from data. For cloud security teams deploying AI-powered tools including code assistants, chatbots, and automation agents, prompt injection creates risks across the entire attack chain: data exfiltration, credential theft, unauthorized actions, and supply-chain compromise. The OWASP update added new categories for System Prompt Leakage and Vector/Embedding Weaknesses, reflecting the maturation of attack techniques. Organizations must implement defense-in-depth strategies including input sanitization, output validation, privilege minimization for AI agents, sandboxing for tool execution, monitoring for anomalous behavior, and strict separation between trusted system prompts and untrusted user or external content. Reports from Palo Alto Network’s Unit 42 were referred for this news coverage.
Sources: MDPI Research, OWASP, Palo Alto Unit 42
3. Microsoft Copilot "Reprompt" Vulnerability Enabled Silent Data Exfiltration
Varonis Threat Labs disclosed a critical vulnerability in Microsoft Copilot Personal that enabled attackers to silently exfiltrate sensitive user data through a single malicious link click. The "Reprompt" attack exploited the 'q' URL parameter to inject hidden prompts, enabling continuous data extraction even after the Copilot session closed—all without requiring plugins, direct user interaction with Copilot, or additional authentication. Microsoft patched the flaw on January 13, 2026, following responsible disclosure in August 2025. The vulnerability bypassed built-in safety controls through parameter injection, double-request techniques, and chain-requests that maintained persistence.
Why This Matters: Reprompt demonstrates the fundamental security challenge in LLM-based assistants: the inability to reliably distinguish between trusted instructions and untrusted data. For enterprise cloud security teams, AI assistants integrated into workflows create new attack surfaces for credential theft and data exfiltration. Notably, Microsoft 365 Copilot enterprise customers were not affected, highlighting the critical importance of enterprise-grade security controls around AI deployments. Organizations deploying AI assistants must implement input sanitization, session integrity checks, and continuous monitoring—treating LLM interfaces as untrusted user input vectors equivalent to traditional web applications.
Sources: The Hacker News, Varonis, SecurityWeek
4. Google Gemini Calendar Integration Exploited for Semantic Attacks
Miggo Security reported a prompt injection and authorization bypass vulnerability where malicious payloads embedded in Google Calendar invites could be interpreted by Gemini-integrated workflows, exposing private meeting data and enabling creation of deceptive events. Google confirmed and mitigated the issue following responsible disclosure.
Why This Matters: This vulnerability represents the clearest manifestation of the cloud productivity-to-AI agent risk pattern: trusted business objects like calendar invites, documents, and tickets become instruction channels for AI systems. Traditional security controls looking for malicious strings cannot defend against semantic attacks. Defense requires tool-permission governance, data provenance tracking, and runtime policy enforcement to prevent models from exfiltrating sensitive context into attacker-visible fields.
Recommended Actions: For enterprise AI assistants and agents, restrict tool scopes particularly for write actions and cross-tenant sharing. Implement allowlists for sensitive actions including create, share, and export operations. Deploy monitoring for unusual automated edits and creates in Calendar and Drive. Treat AI integrations as new privileged applications requiring threat modeling, comprehensive logging, and incident response playbooks.
Source: Miggo Security
5. AWS Publishes Updated SOC Reports: 185 Services Now In Scope
AWS announced availability of Fall 2025 SOC 1/2/3 reports with 185 services now in scope, providing customer assurance, control mapping capabilities, and audit readiness support.
Why This Matters: SOC scope expansions reduce control gaps for regulated cloud programs and can accelerate onboarding of additional managed services—provided teams update their control mappings accordingly. This is particularly relevant for organizations standardizing evidence collection across multi-region, multi-account environments.
Recommended Actions: Refresh GRC evidence packages. Validate which newly in-scope services your organization relies on. Align internal control narratives to the updated SOC boundary. Leverage this expansion to simplify vendor questionnaires and reduce bespoke audit work.
Source: AWS Security Blog
🎯 Cloud Security Topic of the Week:
Engineering-Led Detection Programs in the AI Era
As AI systems integrate deeper into enterprise security operations, the traditional approach to detection engineering faces fundamental challenges.
This week's topic explores how engineering principles testing suites, validation frameworks, and architectural thinking—are becoming essential for building detection programs that scale effectively while maintaining accuracy and business context.
We examine the reality of AI-augmented security operations, the critical distinction between AI as a force multiplier versus a replacement for human expertise, and practical strategies for maturing detection capabilities in 2026's threat landscape.
Featured Experts This Week 🎤
Antoinette Stevens – Principal Security Engineer at Ramp
Ashish Rajan - CISO | Co-Host AI Security Podcast , Host of Cloud Security Podcast
Definitions and Core Concepts 📚
Before diving into our insights, let's clarify some key terms:
Prompt Injection: A vulnerability class where attackers manipulate AI system behavior by embedding malicious instructions within user inputs or external data sources. Unlike traditional injection attacks, prompt injection exploits the fundamental architectural limitation that LLMs cannot reliably distinguish between trusted system instructions and untrusted user data.
Detection as Code: An engineering approach to detection engineering where detection rules are managed as source-controlled code with testing suites, validation frameworks, and automated deployment pipelines. This methodology brings software engineering disciplines to security operations.
RAG Poisoning: An attack technique targeting Retrieval-Augmented Generation systems where attackers inject malicious content into knowledge bases or vector databases that AI systems query to enhance their responses. Research demonstrates that as few as five carefully crafted documents can manipulate AI responses 90% of the time.
Model Evaluation (Evals): A framework for measuring AI model accuracy and reliability by testing outputs against known-good examples or validation criteria. In security contexts, evals measure how accurately AI agents perform investigations, triage alerts, and make security decisions.
Model Memory: The capability of AI systems to retain and reference previous interactions within a session or across sessions. Modern AI assistants use conversation history to inform future responses, which is critical for maintaining investigation context but also creates new attack surfaces for memory poisoning.
CI/CD Trust Boundary: The authentication and authorization perimeter around continuous integration and deployment systems. Failures in these boundaries can enable supply-chain attacks where unauthorized actors trigger privileged automation, mint credentials, or publish malicious artifacts.
Alert Bloom: A condition where detection rules generate excessive false positives, overwhelming security teams and reducing the signal-to-noise ratio. Engineering-led detection programs treat alert bloom as a code quality issue requiring systematic refactoring.
This week's issue is sponsored by Push Security
Want to learn how to respond to modern attacks that don’t touch the endpoint?
Modern attacks have evolved — most breaches today don’t start with malware or vulnerability exploitation. Instead, attackers are targeting business applications directly over the internet.
This means that the way security teams need to detect and respond has changed too.
Register for the latest webinar from Push Security on February 11 for an interactive, “choose-your-own-adventure” experience walking through modern IR scenarios, where your inputs will determine the course of our investigations.
💡Our Insights from this Practitioner 🔍
The Engineering Foundation: Why Testing and Validation Matter More Than Ever (Full Episode here)
One of the most striking insights from Antoinette Stevens centers on what distinguishes engineering-led detection programs from traditional security operations approaches. The difference isn't merely technical—it's philosophical. Engineering-trained practitioners bring disciplines that seem obvious until you realize they're not universally practiced: testing, validation, reliability, and observability.
"I think there are certain things that you learn when you've been trained to do engineering work, especially around testing and validation and reliability and observability," Stevens explained. "Some of those things that some people might take for granted—it feels obvious until you realize you haven't been doing it."
This engineering mindset fundamentally changes how detection programs scale. Traditional approaches often treat detection rules as one-off implementations: identify a threat, write a rule, deploy it, and move on. Engineering-led programs instead view detection rules as production software requiring lifecycle management. Stevens described this evolution: "We've seen a rise in detection as code becoming more popular. That's an engineering-led approach where you are source controlling something, you might make a test suite to make sure it works. You have validations, you do various things before you move things to production."
The practical implications are substantial. At Ramp, Stevens built validation directly into the detection pipeline. Her detection engineer created GitHub-based rule validation, while their detection platform provides built-in testing frameworks where mock logs validate that alerts fire as expected. This isn't theoretical testing—it's automated validation that runs before any detection rule reaches production.
Practical Application: For security leaders building or maturing detection programs, the first step is treating detection rules as production code. This means implementing source control for all rules, creating testing frameworks that validate detection logic against both historical and synthetic data, building automated validation into your deployment pipeline, and establishing observability for detection effectiveness including false positive rates and coverage gaps.
The Build vs. Buy Calculation Changes with Engineering Capability
The traditional security vendor landscape assumes teams lack engineering capability. Stevens challenges this assumption, arguing that engineering-led teams should fundamentally reconsider their approach to tooling procurement. "If you have engineers on your team—people who can build software—your approach to buying tooling changes such that my approach is now: can I build it myself? And if so, is the cost of me building it myself more or less than the cost of buying it?"
This isn't about blanket build-versus-buy recommendations. Stevens applies a nuanced framework considering maintenance burden, support requirements, and long-term sustainability. "There are platforms that help you with log ingestion—I'm not paying for that. I could write a script and then never touch it again." But for more complex capabilities like AI agent evaluation frameworks, "I don't want to run that myself. I want to stay away from building an entire product internally that won't outlast my tenure."
The decision matrix centers on complexity and required ongoing investment. Simple, stable functionality that rarely requires updates becomes a build candidate. Complex systems requiring continuous tuning, evaluation frameworks, and vendor support become buy candidates. This is particularly relevant for AI-powered security tools where evaluation infrastructure, model rotation capabilities, and observability platforms represent significant ongoing engineering investments.
Practical Application: When evaluating security tools, assess your team's engineering capability honestly. For teams with strong engineering skills, conduct build-versus-buy analyses for each major capability area. Calculate total cost of ownership including development time, maintenance burden, and opportunity cost of not focusing on core security problems. Reserve building for stable, low-maintenance capabilities where vendor solutions add limited value. Buy complex platforms requiring continuous evaluation, model management, or specialized expertise your team lacks.
AI as Force Multiplier: The Reality Check on AI SOCs
The promise of fully autonomous AI security operations centers has captured significant vendor marketing attention. Stevens provides a grounded reality check based on actual implementation experience. At Ramp, AI handles first-level triage, conducting initial investigations before human analysts review and make final decisions. But Stevens is emphatic about the limitations: "I do not trust it to close out alerts. I have watched ChatGPT just lie to me continuously. I am definitely not fully on board with just letting it close out things."
The challenges are fundamental, not merely teething problems with immature technology. Stevens identified several critical issues. First, AI agents make logical inferences that may be incorrect: "If you're not clear with it in how an investigation should be run, it tends to try to fill in the gaps for you. It likes to make a lot of logical summarizations where it says 'and this happened because of X' and 'the result of this is that'... I don't need you to guess at why someone did something. I just need the facts of the situation."
Second, AI agents lack business context that's obvious to human analysts. Stevens cited an example where their AI flagged a subnet opened to the internet as legitimate because "this action was taken by an engineer, and so it is legitimate." The agent missed the critical point: "It doesn't matter if it's legitimate. We should always know and want to do something if a resource is open to the internet."
Third, model variability poses operational risks. "A new version of GPT could come out and be completely wrong," Stevens noted, highlighting that AI SOC implementations must account for model regression risks. This requires robust evaluation frameworks to catch degradation in investigation quality across model updates.
Despite these limitations, AI has delivered measurable value. "AI should be a force multiplier, but it should not be your brain," Stevens emphasized. For her team, AI has been "helpful with noise reduction" and "helpful with tuning." The key is appropriate human oversight: "I still have a base philosophy that if an alert is not useful, it should not fire," meaning even AI-triaged alerts require human validation to ensure quality.
Practical Application: When implementing AI for security operations, deploy it for L1 triage with mandatory human review before any automated response actions. Implement evaluation frameworks to continuously measure investigation accuracy and catch model regression. Develop clear investigation procedures that constrain AI to factual reporting rather than inferential reasoning. Maintain business context through well-defined prompts and investigation playbooks. Track false positive rates separately for AI-triaged versus human-triaged alerts to measure actual effectiveness. Plan for model variability by designing systems that can gracefully handle model updates or degradation.
The Prerequisites: Why AI Doesn't Replace Fundamentals
Perhaps Stevens' most important insight concerns who can effectively leverage AI for detection engineering. The answer challenges popular narratives about AI democratizing technical capabilities. "If you don't know how to write code, AI won't help you get anywhere faster because it's going to write slop and then things will break and you won't know how to fix it," Stevens stated bluntly.
This isn't gatekeeping—it's recognition that AI amplifies existing capabilities rather than creating them. Stevens explained: "For people who already know how to build software, I think it accelerates. But for almost anyone else, it likely slows you down." The reasoning is straightforward: without understanding code architecture, developers cannot evaluate whether AI-generated code is over-engineered, missing critical functionality, or introducing vulnerabilities.
The security implications are particularly concerning. "If you don't understand how architecture works or the basics of writing code, then if your AI generates code that is over-engineered or missing something, especially if you're prompting it and you don't know how... you end up with a product that down the line isn't sustainable. At best isn't sustainable, at worst is vulnerable to something."
This extends to understanding both security and cloud domains. Using AWS as an example, Stevens noted: "You already have AWS experience, you have generic cloud computing understanding. You could easily walk into an Azure environment and go, show me the equivalent of object storage. I'm looking for this specific type of thing. What's possible here?" But that contextual knowledge is prerequisite—without it, you cannot effectively prompt AI or evaluate its responses.
Practical Application: When building detection engineering teams, prioritize fundamental skills: coding ability, cloud architecture understanding, and security domain expertise. AI training should be additive, not foundational. For existing team members adopting AI tools, invest in prompt engineering training that emphasizes contextual expertise—teaching people to validate AI outputs rather than blindly trust them. For detection programs considering AI augmentation, ensure team members can review and modify AI-generated code, understand the architecture of systems being monitored, and possess security domain knowledge to catch hallucinations or incorrect inferences.
Multi-Agent Architectures: The Future of AI Security Operations
As detection programs mature their AI implementations, architectural sophistication increases. Stevens discussed exploring multi-agent architectures where specialized agents handle specific tasks with an orchestrator coordinating overall investigation workflow. "Individual agents do very specific things and there's an orchestrator agent pulling data from each of them," she explained.
This approach addresses a fundamental limitation of general-purpose AI agents: they perform better with narrow, well-defined responsibilities. "They do really well when they do a very specific job," Stevens noted. She referenced vendor implementations with specialized agents for legal analysis, penetration testing, and validation that collaborate to reach consensus before taking actions.
The multi-agent pattern also improves accuracy through specialization and cross-validation. Rather than a single agent making investigation decisions, specialized agents provide domain-specific analysis that an orchestrator synthesizes. This architectural approach mirrors how security operations centers organize human analysts by specialty areas, suggesting it may represent a sustainable pattern for AI-augmented security operations.
Practical Application: For organizations with mature AI implementations, consider transitioning from monolithic AI agents to multi-agent architectures. Design specialized agents for distinct investigation tasks such as log analysis, threat intelligence lookup, compliance checking, and business context validation. Implement orchestrator patterns that coordinate specialized agents and synthesize their outputs. Develop evaluation frameworks that measure both individual agent accuracy and overall investigation quality. Start with lower-risk investigation areas before expanding to critical alerts.
Career Implications: The Shift in Entry-Level Security Positions
Stevens raised a sobering reality about AI's impact on security career paths. "I am really nervous for the entry-level positions in security," she admitted. "A lot of the entry-level work that we would hire people for can be done now with AI." This isn't a distant future concern—it's happening now.
The implications extend beyond technical skills. Stevens offered crucial advice for early-career professionals: "If you're early on in your career, your job is to be a personality hire. That is your goal. If you don't do anything, you contribute almost no value at that stage. Your job is to be a good person and to learn as much as you can."
This represents a fundamental shift in what organizations value in junior security professionals. Technical tasks that previously provided entry points into security careers are increasingly automated. What remains irreplaceable are interpersonal skills, adaptability, composure under pressure, and the ability to collaborate effectively—the very skills that enable someone to be, as Stevens put it, "cool as a cucumber" during incidents when everyone else is panicking.
For those entering security now, Stevens recommended targeting larger, established enterprises rather than startups. "Finding an older, larger company is going to be your best bet," she advised, noting that these organizations are slower to adopt AI and still maintain traditional career progression paths. "The realistic possibilities of getting a job at a smaller startup without some sort of deeply technical skill to go with it is very slim."
Practical Application: For hiring managers, recalibrate entry-level requirements to emphasize interpersonal effectiveness, learning agility, and composure under pressure alongside technical fundamentals. For early-career professionals, focus on developing differentiated skills that AI cannot replicate: incident response composure, cross-functional collaboration, business context understanding, and technical communication. Target internships and entry-level positions at larger enterprises with established security programs. Build foundational technical skills—particularly coding ability—that enable effective AI utilization rather than replacement by AI.
The Emerging Threat Landscape: Back to Basics
Despite AI's prominence, Stevens emphasized that effective security in 2026 still requires fundamentals. "A lot of this is just getting the basics right," she stressed. Her specific recommendations reflect current threat patterns: "If you don't have a good endpoint detection program, you should probably consider getting one now, considering all the NPM packages that are getting compromised and the move to targeting engineering and developer machines."
Shadow IT management has become increasingly critical as threat actors exploit AI hype. "Getting a Shadow IT program going, if you don't have one, is a good move. We've seen a lot of malware be propagated through tooling claiming to be AI or like different AI software," Stevens noted.
This advice grounds AI security concerns in actionable defensive measures. While prompt injection and AI agent vulnerabilities represent real threats, they layer atop traditional attack vectors that remain highly effective. The solution isn't choosing between AI-focused or traditional defenses—it's ensuring foundational controls are robust before adding AI-specific protections.
Practical Application: Audit your endpoint detection coverage, particularly for developer and engineering workstations that are increasingly targeted through compromised development tools and packages. Implement or strengthen Shadow IT programs with specific focus on unapproved AI tools that may introduce data exfiltration or prompt injection risks. Review software supply chain security including NPM, PyPI, and other package repositories your developers rely on. Establish baseline security controls before investing heavily in AI-specific security tools. Remember that threat actors are also learning AI—they're not yet expert adversaries, providing a window to strengthen fundamentals.
OWASP Top 10 for LLM Applications 2025 — Comprehensive guide to AI security risks including prompt injection, supply-chain vulnerabilities, and system prompt leakage
Detection Engineering Maturity Matrix — Framework for assessing and improving detection program maturity with focus on engineering practices
AWS CodeBuild Security Best Practices — Official guidance on securing CI/CD pipelines including least-privilege build roles and trigger authentication
MITRE ATT&CK for ICS: AI/ML Attack Techniques — Taxonomy of adversarial techniques targeting AI systems including data poisoning and model evasion
Detection as Code: A Practical Guide — Open-source resources for implementing source-controlled, tested, and validated detection rules
Prompt Injection Defenses: A Comprehensive Guide — Simon Willison's authoritative resource on understanding and mitigating prompt injection attacks
Cloud Security Alliance: AI Security Guidance — Industry consortium guidance on securing AI deployments in cloud environments
Cloud Security Podcast
Question for you? (Reply to this email)
🤔 Is your detection program treating AI as a force multiplier or a replacement — and what’s the first capability you’d automate with human oversight still in the loop?
Next week, we'll explore another critical aspect of cloud security. Stay tuned!
📬 Want weekly expert takes on AI & Cloud Security? [Subscribe here]”
We would love to hear from you📢 for a feature or topic request or if you would like to sponsor an edition of Cloud Security Newsletter.
Thank you for continuing to subscribe and Welcome to the new members in tis newsletter communityđź’™
Peace!
Was this forwarded to you? You can Sign up here, to join our growing readership.
Want to sponsor the next newsletter edition! Lets make it happen
Have you joined our FREE Monthly Cloud Security Bootcamp yet?
checkout our sister podcast AI Security Podcast


