deep-research
Verified by bytedance · Updated Apr 16, 2026
Use this skill instead of WebSearch for ANY question requiring web research. Trigger on queries like "what is X", "explain X", "compare X and Y", "research X", or before content generation tasks. Provides systematic multi-angle research methodology instead of single superficial searches. Use this proactively when the user's question needs online information.
See It In Action
I'll research all three platforms in parallel across multiple angles.
Here's the brief for your architecture review.
Browser Agent Comparison Brief — April 2026
1. Browser Use
Architecture: Open-source Python/TypeScript framework built on Playwright. Converts the DOM into a structured format for LLM consumption. Supports multiple LLMs (GPT-4o, Claude, Gemini, open-source models). Self-hosted or managed via Browser Use Cloud.
Capabilities:
- 89.1% on WebVoyager (best among the three)
- Web navigation, form filling, multi-step workflows
- Can use real browser profiles with existing logins
- Cloud version adds stealth proxies, CAPTCHA solving, persistent filesystem
Access & Pricing:
- Open-source (free, self-hosted)
- Cloud: pay-as-you-go — $0.01/task init + per-step LLM cost (~$0.05/step)
- Plans: Free, Starter ($100/mo), Business ($500/mo), Scaleup ($2,500/mo) with volume discounts
Limitations: Self-hosting requires technical skill to set up. Security is the developer's responsibility; there is no built-in safety layer. The cloud version is newer and less battle-tested than the incumbents.
Risks: No built-in guardrails for sensitive actions. Prompt injection defense depends entirely on the integrator. Running with real browser profiles exposes session cookies and credentials to the agent.
2. OpenAI Operator (now "Agent Mode" in ChatGPT)
Architecture: Powered by CUA (Computer-Using Agent) — GPT-4o with vision + reinforcement learning for GUI interaction. Screenshot-based iterative loop: capture → reason → act. Runs in a sandboxed browser instance (not your desktop).
Capabilities:
- 87% WebVoyager, 58.1% WebArena, 38.1% OSWorld
- Strong error recovery and user-friendly UX
- Multi-task via parallel conversations
- Custom instructions per site; saved prompts
Access & Pricing:
- Bundled into ChatGPT subscriptions (not standalone)
- Plus ($20/mo): 40 agent tasks/month
- Pro ($200/mo): 400 agent tasks/month
- API: CUA model via OpenAI API, token-based pricing
Limitations: Web-only (sandboxed browser, no desktop access). Task caps on subscription tiers. No model flexibility — locked to OpenAI's CUA. Limited customization for developers.
Risks: "Takeover mode" for credentials means the user enters sensitive data manually, but the agent still sees the page context. A three-layer safety system includes user approval for significant actions and adversarial defenses. It remains vulnerable to indirect prompt injection from page content.
3. Anthropic Computer Use
Architecture: Full desktop control via screenshot-analyze-act loop. Claude takes periodic screenshots, reasons multimodally, then executes mouse/keyboard actions. Works across native apps, not just browsers.
Capabilities:
- 72.5% OSWorld (best among the three — near human-level)
- Superior visual comprehension and coding/dev tasks
- Works across any desktop app (spreadsheets, terminals, browsers)
- 56% WebVoyager (weakest for pure web tasks)
Access & Pricing:
- Consumer: Claude Pro ($20/mo) or Max ($100/mo) via Claude Cowork
- API: token-based pricing (input/output per million tokens): Sonnet 4.6 at $3/$15, Opus 4.6 at $5/$25
- Currently a macOS research preview; broader availability is planned
Limitations: Research preview — macOS only for now. Struggles with scrolling, dragging, zooming. Weakest pure web-navigation performance. Requires desktop environment (not cloud-sandboxed by default).
Risks: Full desktop access = largest blast radius if compromised. Built-in safety classifiers + permission prompts before accessing new apps. Anthropic has published prompt injection defense research, but the attack surface is the entire desktop, not just a browser tab.
Head-to-Head Summary
| Dimension | Browser Use | OpenAI Operator | Anthropic Computer Use |
|---|---|---|---|
| Scope | Browser only | Sandboxed browser | Full desktop |
| WebVoyager | 89% | 87% | 56% |
| OSWorld | N/A | 38% | 72.5% |
| Model lock-in | None (multi-LLM) | GPT-4o CUA only | Claude only |
| Open source | Yes | No | No |
| Entry price | Free (self-hosted) | $20/mo (40 tasks) | $20/mo (Pro) |
| Blast radius | Browser session | Sandboxed browser | Entire desktop |
| Safety layer | DIY | 3-layer + takeover | Classifiers + permissions |
Cross-Cutting Risk: Prompt Injection
All three are vulnerable to indirect prompt injection — malicious instructions embedded in web page content, ads, or documents. Recent incidents:
- HashJack (Cato CTRL, 2026): hidden prompts in URL fragments that agentic browsers read and act on, enabling data exfiltration, credential theft, and phishing callbacks
- PleaseFix (Zenity Labs, March 2026): Perplexity's Comet browser compromised via calendar invite injection — local file access and 1Password credential theft without user interaction
- The fundamental issue: every webpage, ad, and embedded document is a potential injection vector when the agent has autonomy + data access
Recommendation for review: Evaluate (a) whether the agent needs full desktop or just browser scope, (b) acceptable blast radius if injection succeeds, (c) whether model flexibility or safety guarantees matter more for your use case.
Sources:
- Browser Use vs Computer Use vs Operator — Helicone
- Anthropic Computer Use vs OpenAI CUA — WorkOS
- Agentic Browser Landscape 2026 — No Hacks
- Top 5 Agentic Browsers: Capabilities and Security Risks — Seraphic
- Introducing Operator — OpenAI
- Computer Use Tool — Anthropic Docs
- Browser Use GitHub
- Browser Use Cloud Pricing
- ChatGPT Operator Pricing 2026 — o-mega
- Anthropic Prompt Injection Defenses
- HashJack Indirect Prompt Injection — Cato Networks
- PleaseFix / Perplexity Comet Vulnerability — Aviatrix
- AI Agents Undermine Browser Security — Dark Reading
- Building Browser Agents: Architecture, Security — arXiv
Permissions
| Scope | Description |
|---|---|
| network:outbound | Outbound network access |
SKILL.md
Deep Research Skill
Overview
This skill provides a systematic methodology for conducting thorough web research. Load this skill BEFORE starting any content generation task to ensure you gather sufficient information from multiple angles, depths, and sources.
When to Use This Skill
Always load this skill when:
Research Questions
- User asks "what is X", "explain X", "research X", "investigate X"
- User wants to understand a concept, technology, or topic in depth
- The question requires current, comprehensive information from multiple sources
- A single web search would be insufficient to answer properly
Content Generation (Pre-research)
- Creating presentations (PPT/slides)
- Creating frontend designs or UI mockups
- Writing articles, reports, or documentation
- Producing videos or multimedia content
- Any content that requires real-world information, examples, or current data
Core Principle
Never generate content based solely on general knowledge. The quality of your output directly depends on the quality and quantity of research conducted beforehand. A single search query is NEVER enough.
Research Methodology
Phase 1: Broad Exploration
Start with broad searches to understand the landscape:
- Initial Survey: Search for the main topic to understand the overall context
- Identify Dimensions: From initial results, identify key subtopics, themes, angles, or aspects that need deeper exploration
- Map the Territory: Note different perspectives, stakeholders, or viewpoints that exist
Example:
Topic: "AI in healthcare"
Initial searches:
- "AI healthcare applications 2026"
- "artificial intelligence medical diagnosis"
- "healthcare AI market trends"
Identified dimensions:
- Diagnostic AI (radiology, pathology)
- Treatment recommendation systems
- Administrative automation
- Patient monitoring
- Regulatory landscape
- Ethical considerations
Phase 2: Deep Dive
For each important dimension identified, conduct targeted research:
- Specific Queries: Search with precise keywords for each subtopic
- Multiple Phrasings: Try different keyword combinations and phrasings
- Fetch Full Content: Use `web_fetch` to read important sources in full, not just snippets
- Follow References: When sources mention other important resources, search for those too
Example:
Dimension: "Diagnostic AI in radiology"
Targeted searches:
- "AI radiology FDA approved systems"
- "chest X-ray AI detection accuracy"
- "radiology AI clinical trials results"
Then fetch and read:
- Key research papers or summaries
- Industry reports
- Real-world case studies
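The Fetch Full Content and Follow References steps amount to a bounded breadth-first read: fetch each key source in full, then the sources it cites, stopping at a depth and page budget. The Python sketch below assumes a hypothetical `fetch` callable that returns a page's text plus the URLs it references; `follow_references`, `max_depth`, and `max_pages` are illustrative names, not part of any real tool API.

```python
from collections import deque
from typing import Callable, Dict, List, Tuple

# Hypothetical fetcher: given a URL, return (full page text, URLs the page references).
Fetcher = Callable[[str], Tuple[str, List[str]]]


def follow_references(
    seeds: List[str], fetch: Fetcher, max_depth: int = 1, max_pages: int = 20
) -> Dict[str, str]:
    """Read seed sources in full, then the sources they cite, breadth-first."""
    read: Dict[str, str] = {}
    queue = deque((url, 0) for url in seeds)
    while queue and len(read) < max_pages:
        url, depth = queue.popleft()
        if url in read:
            continue  # the same reference can be queued twice; read it once
        text, refs = fetch(url)
        read[url] = text
        if depth < max_depth:
            queue.extend((ref, depth + 1) for ref in refs if ref not in read)
    return read


# Usage with a toy in-memory "web" standing in for real pages.
corpus = {
    "review": ("Survey of radiology AI", ["trial", "fda"]),
    "trial": ("Clinical trial write-up", ["appendix"]),
    "fda": ("Regulator device list", []),
    "appendix": ("Supplementary tables", []),
}
pages = follow_references(["review"], lambda url: corpus[url], max_depth=1)
print(sorted(pages))  # ['fda', 'review', 'trial']
```

Raising `max_depth` follows citations further; `max_pages` keeps the crawl from growing without bound.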
Phase 3: Diversity & Validation
Ensure comprehensive coverage by seeking diverse information types:
| Information Type | Purpose | Example Searches |
|---|---|---|
| Facts & Data | Concrete evidence | "statistics", "data", "numbers", "market size" |
| Examples & Cases | Real-world applications | "case study", "example", "implementation" |
| Expert Opinions | Authority perspectives | "expert analysis", "interview", "commentary" |
| Trends & Predictions | Future direction | "trends 2026", "forecast", "future of" |
| Comparisons | Context and alternatives | "vs", "comparison", "alternatives" |
| Challenges & Criticisms | Balanced view | "challenges", "limitations", "criticism" |
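One way to make sure every row of the table gets at least one query is to expand the topic with each information type's keywords. This is a minimal Python sketch; the keyword pairs are drawn from the Example Searches column (using "comparison" rather than "vs" so every keyword works as a suffix), and `diversify` is a hypothetical helper.

```python
from typing import Dict, List

# Keyword pairs drawn from the "Example Searches" column above.
INFO_TYPE_KEYWORDS: Dict[str, List[str]] = {
    "Facts & Data": ["statistics", "market size"],
    "Examples & Cases": ["case study", "implementation"],
    "Expert Opinions": ["expert analysis", "interview"],
    "Trends & Predictions": ["trends", "forecast"],
    "Comparisons": ["comparison", "alternatives"],
    "Challenges & Criticisms": ["challenges", "limitations"],
}


def diversify(topic: str) -> Dict[str, List[str]]:
    """Build at least one query per information type so no row is skipped."""
    return {
        info_type: [f"{topic} {keyword}" for keyword in keywords]
        for info_type, keywords in INFO_TYPE_KEYWORDS.items()
    }


queries = diversify("AI in healthcare")
print(queries["Comparisons"])  # ['AI in healthcare comparison', 'AI in healthcare alternatives']
```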
Phase 4: Synthesis Check
Before proceeding to content generation, verify:
- Have I searched from at least 3-5 different angles?
- Have I fetched and read the most important sources in full?
- Do I have concrete data, examples, and expert perspectives?
- Have I explored both positive aspects and challenges/limitations?
- Is my information current and from authoritative sources?
If any answer is NO, continue researching before generating content.
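The checklist can be treated as a gate: collect the checks that are still NO and keep researching while any remain. The Python sketch below assumes the agent keeps some hypothetical `ResearchState` bookkeeping; the field names and the `min_angles=3` default (the low end of the 3-5 range) are illustrative choices, not part of the skill.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ResearchState:
    # Hypothetical bookkeeping an agent might keep while researching.
    angles_searched: int = 0
    key_sources_fetched: bool = False
    has_data: bool = False
    has_examples: bool = False
    has_expert_views: bool = False
    covered_positives: bool = False
    covered_challenges: bool = False
    current_and_authoritative: bool = False


def synthesis_gaps(s: ResearchState, min_angles: int = 3) -> List[str]:
    """Return the Phase 4 checks that are still NO; an empty list means proceed."""
    checks = {
        f"searched from at least {min_angles} angles": s.angles_searched >= min_angles,
        "fetched the most important sources in full": s.key_sources_fetched,
        "have data, examples, and expert perspectives": (
            s.has_data and s.has_examples and s.has_expert_views
        ),
        "explored positives and challenges": s.covered_positives and s.covered_challenges,
        "information is current and authoritative": s.current_and_authoritative,
    }
    return [check for check, passed in checks.items() if not passed]


state = ResearchState(angles_searched=4, key_sources_fetched=True)
for gap in synthesis_gaps(state):
    print("keep researching:", gap)
```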
Search Strategy Tips
Effective Query Patterns
```
# Be specific with context
❌ "AI trends"
✅ "enterprise AI adoption trends 2026"
# Include authoritative source hints
"[topic] research paper"
"[topic] McKinsey report"
"[topic] industry analysis"
# Search for specific content types
"[topic] case study"
"[topic] statistics"
"[topic] expert interview"
# Use temporal qualifiers: always use the ACTUAL current year from <current_date>
"[topic] 2026" # ← replace with real current year, never hardcode a past year
"[topic] latest"
"[topic] recent developments"
```
Temporal Awareness
Always check `<current_date>` in your context before forming ANY search query.
`<current_date>` gives you the full date: year, month, day, and weekday (e.g. 2026-02-28, Saturday). Use the right level of precision depending on what the user is asking:
| User intent | Temporal precision needed | Example query |
|---|---|---|
| "today / this morning / just released" | Month + Day | "tech news February 28 2026" |
| "this week" | Week range | "technology releases week of Feb 23 2026" |
| "recently / latest / new" | Month | "AI breakthroughs February 2026" |
| "this year / trends" | Year | "software trends 2026" |
Rules:
- When the user asks about "today" or "just released", use month + day + year in your search queries to get same-day results
- Never drop to year-only when day-level precision is needed: `"tech news 2026"` will NOT surface today's news
- Try multiple phrasings across different queries: numeric form (`2026-02-28`), written form (`February 28 2026`), and relative terms (`today`, `this week`)
❌ User asks "what's new in tech today" → searching "new technology 2026" → misses today's news
✅ User asks "what's new in tech today" → searching "new technology February 28 2026" + "tech news today Feb 28" → gets today's results
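The precision table and the multiple-phrasing rule can be combined into one small query builder. This Python sketch is illustrative only: the intent labels, the Monday-start week, and the `temporal_queries` helper are assumptions, and the month names assume Python's default C locale.

```python
from datetime import date, timedelta
from typing import List


def temporal_queries(base: str, today: date, intent: str) -> List[str]:
    """Attach the date precision a user intent needs, in several phrasings.

    intent: "today", "this week", "recent", or anything else (treated as year-level).
    """
    month = today.strftime("%B")  # full month name, e.g. "February" in the C locale
    if intent == "today":
        return [
            f"{base} {today.isoformat()}",                       # numeric form
            f"{base} {month} {today.day} {today.year}",          # written form
            f"{base} today {today.strftime('%b')} {today.day}",  # relative term
        ]
    if intent == "this week":
        monday = today - timedelta(days=today.weekday())  # assume weeks start on Monday
        return [
            f"{base} week of {monday.strftime('%b')} {monday.day} {monday.year}",
            f"{base} this week",
        ]
    if intent == "recent":
        return [f"{base} {month} {today.year}", f"{base} latest"]
    return [f"{base} {today.year}"]  # "this year / trends": year-level is enough


today = date(2026, 2, 28)  # stands in for the value read from <current_date>
print(temporal_queries("tech news", today, "today"))
# ['tech news 2026-02-28', 'tech news February 28 2026', 'tech news today Feb 28']
```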
When to Use web_fetch
Use web_fetch to read full content when:
- A search result looks highly relevant and authoritative
- You need detailed information beyond the snippet
- The source contains data, case studies, or expert analysis
- You want to understand the full context of a finding
Iterative Refinement
Research is iterative. After initial searches:
- Review what you've learned
- Identify gaps in your understanding
- Formulate new, more targeted queries
- Repeat until you have comprehensive coverage
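The refinement loop can be sketched as: search, review which questions still lack findings, rephrase, and repeat until nothing is missing or a round budget runs out. In this Python sketch `search` is a hypothetical callable standing in for a web search tool, and the fixed list of rephrasing suffixes is a deliberately simple assumption; a real agent would formulate new queries from what it has already learned.

```python
from typing import Callable, Dict, List

# Assumed rephrasings tried on successive rounds; a real agent would write new queries.
REFINEMENTS = ["", " statistics", " case study", " expert analysis"]


def iterative_research(
    questions: List[str],
    search: Callable[[str], List[str]],
    max_rounds: int = len(REFINEMENTS),
) -> Dict[str, List[str]]:
    """Search, review gaps, reformulate, repeat until covered or out of rounds."""
    findings: Dict[str, List[str]] = {q: [] for q in questions}
    for round_no in range(max_rounds):
        gaps = [q for q, found in findings.items() if not found]  # review + identify gaps
        if not gaps:
            break  # every question has at least one finding
        suffix = REFINEMENTS[round_no % len(REFINEMENTS)]
        for q in gaps:
            findings[q].extend(search(q + suffix))  # retry gaps with this round's phrasing
    return findings


# Toy search index standing in for a real search tool.
index = {
    "AI radiology accuracy": ["finding A"],
    "AI radiology cost statistics": ["finding B"],
}
result = iterative_research(
    ["AI radiology accuracy", "AI radiology cost"], lambda query: index.get(query, [])
)
print(result)  # {'AI radiology accuracy': ['finding A'], 'AI radiology cost': ['finding B']}
```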
Quality Bar
Your research is sufficient when you can confidently answer:
- What are the key facts and data points?
- What are 2-3 concrete real-world examples?
- What do experts say about this topic?
- What are the current trends and future directions?
- What are the challenges or limitations?
- What makes this topic relevant or important now?
Common Mistakes to Avoid
- ❌ Stopping after 1-2 searches
- ❌ Relying on search snippets without reading full sources
- ❌ Searching only one aspect of a multi-faceted topic
- ❌ Ignoring contradicting viewpoints or challenges
- ❌ Using outdated information when current data exists
- ❌ Starting content generation before research is complete
Output
After completing research, you should have:
- A comprehensive understanding of the topic from multiple angles
- Specific facts, data points, and statistics
- Real-world examples and case studies
- Expert perspectives and authoritative sources
- Current trends and relevant context
Only then proceed to content generation, using the gathered information to create high-quality, well-informed content.
FAQ
What does deep-research do?
Use this skill instead of WebSearch for ANY question requiring web research. Trigger on queries like "what is X", "explain X", "compare X and Y", "research X", or before content generation tasks. Provides systematic multi-angle research methodology instead of single superficial searches. Use this proactively when the user's question needs online information.
When should I use deep-research?
Use it when you need a repeatable workflow that produces a text response.
What does deep-research output?
In the evaluated run it produced a text response.
How do I install or invoke deep-research?
Ask the agent to use this skill when the task matches its documented workflow.
Which agents does deep-research support?
Agent support is inferred from the source, but not explicitly declared.
What tools, channels, or permissions does deep-research need?
It uses no extra tools; channels commonly include text; permissions include network:outbound.
Is deep-research safe to install?
Static analysis marked this skill as low risk; review side effects and permissions before enabling it.
How is deep-research different from an MCP or plugin?
A skill packages instructions and workflow conventions; tools, MCP servers, and plugins are dependencies the skill may call during execution.
Does deep-research outperform not using a skill?
About deep-research
When to use deep-research
When a user asks for explanations, comparisons, or investigations that require current online information. Before drafting articles, presentations, reports, or other content that should be grounded in external sources. When a topic needs multiple perspectives, examples, data points, and validation from authoritative sources.
When deep-research is not the right choice
When the task can be completed from the existing codebase or provided files without web research. When the user wants a quick opinion or purely internal reasoning rather than externally researched information.
What it produces
Produces a text response.