AI Browsers boom vs. Prompt Injection
AI browsers are the new hotness. OpenAI's ChatGPT Atlas launched yesterday (Oct 21), putting a full assistant inside the browser and kicking off a fresh round of "agentic web" excitement.
Security news landed just as fast:
-
Oct 21: Brave published research on "unseeable" prompt injections against Perplexity Comet and other AI browsers. Two attack vectors:
Screenshot injection (Comet): Attacker hides malicious text in images using near-invisible colors (light blue on yellow background). When a user screenshots the page, Comet's OCR extracts the hidden text and treats it as commands instead of untrusted content. User never sees the malicious instructions.
Navigation injection (Fellou browser): Just asking the browser to visit a URL sends that page's content to the LLM. Visible malicious text on the attacker's site gets mixed with the user's query and can override user intent. No explicit summarization needed.
Brave's diagnosis: browsers are failing to maintain boundaries between trusted user input and untrusted web content when building LLM prompts. Their recommendation until there's a categorical fix: isolate agentic browsing from regular browsing, only activate it when users explicitly invoke it.
-
Oct 22: Perplexity responded with their defense-in-depth plan for Comet. They're layering four protections:
Layer 1 – Real-time classifiers that scan content before Comet acts. Catches hidden HTML/CSS tricks (white-on-white text, zero-font-size), image-based injections invisible to humans, and goal-hijacking attempts. Runs in parallel so there's no latency hit. Models get updated continuously from their bug bounty program and red team discoveries.
Layer 2 – Structured prompts at every decision point reminding the model "this is external content, stick to what the user asked for." Each tool gets explicit guardrails and clear boundaries between user intent and untrusted data.
Layer 3 – User confirmation for anything sensitive: sending email, calendar changes, placing orders, filling in personal details. Human stays in the loop on high-impact actions.
Layer 4 – Transparent notifications when something gets blocked. Shows you what was flagged, why, and how to report false positives.
They're upfront that prompt injection remains unsolved industry-wide. Their approach: overlapping defenses so even if one layer fails, others catch it.
-
Oct 23: MongoDB Field CTO Pete Johnson posted that he found unencrypted OAuth tokens in Atlas. The tokens sit in a SQLite database with 644 permissions (readable by any process on your Mac). Took him 90 minutes to write a script that extracts the tokens and pulls profile data plus full conversation history from the OpenAI API. Not just Atlas conversations. All your ChatGPT history. Update: Another user confirmed the script works on their install too. Some users apparently get keychain encryption, others don't—no one knows why yet.
My take:
This is just the tip of the iceberg. Agentic browsing rewrites old web assumptions. Once an assistant can fill forms, read your mail, or move money, any page becomes a potential control channel. Could be an image. A PDF. Canvas element. Transcript.
This isn't a one-patch problem. It's about building durable intent isolation, scoped permissions, action gating, and auditable logs across the entire category.
What Do you think?
- Do Atlas, Comet, and peers clearly separate user intent from page content in their prompts?
- Are image/OCR pathways treated as untrusted input with default-deny behavior?
- Is there a visible "agent mode" with domain allowlists, explicit confirms, and per-task scoping?
- Are credentials and tokens encrypted at rest using platform keychains?
What would you need to trust an AI browser on a work account? Per-task domain allowlists? Human-in-the-loop for payments/email? Signed action logs? Something else?
Sign in to join the discussion.
Hey, great write-up. Could you share, in plain terms, what an untrusted page or screenshot can make the assistant do today without me explicitly clicking OK? Like can it just navigate, read DOM, auto-fill/submit forms, send an email, or kick off a purchase? and in those cases what cookies/tokens (if any) get sent along? If you’ve tested Comet and Atlas on the latest public builds, a quick note on which actions still happen implicitly vs. which now require a confirmation would be super helpful
So currently on Comet I can do read-only things (navigate, read/summarize) without a confirmation; high-impact actions (email, calendar edits, checkout, filling personal details) now prompt, and their latest update adds classifiers meant to catch hidden/vision-based injections before any action is taken. I haven’t reproduced an action firing from screenshot/OCR after the 10/22 changes, but I’m still poking at edge cases. On Atlas I’m also seeing confirmations for obvious write actions; storage/credential handling looks inconsistent across machines and I don’t yet have a confident read on what cookies ride along for cross-domain fetches. If anyone has clean repros for cookie scope or
default-denybehavior on OCR, please share.