deep-research

Verified

by bytedance · Updated Apr 16, 2026

opus

2.1K in · 3.0K out

$0.58 · 1m 45s

Use this skill instead of WebSearch for ANY question requiring web research. Trigger on queries like "what is X", "explain X", "compare X and Y", "research X", or before content generation tasks. Provides systematic multi-angle research methodology instead of single superficial searches. Use this proactively when the user's question needs online information.

research · web · content-generation · fact-finding · methodology

See It In Action

1 turn · claude

The project codebase is in your current working directory. Explore the files to understand the code before making changes. need a quick brief before architecture review comparing Browser Use, OpenAI Operator, and Anthropic computer-use for browser agents, with current capabilities, limits, access/pricing, risks, and sources

I'll research all three platforms in parallel across multiple angles.

Here's the brief for your architecture review.

Browser Agent Comparison Brief — April 2026

1. Browser Use

Architecture: Open-source Python/TypeScript framework built on Playwright. Converts DOM into structured format for LLM consumption. Supports multiple LLMs (GPT-4o, Claude, Gemini, open-source models). Self-hosted or managed via Browser Use Cloud.

Capabilities:

89.1% on WebVoyager (best among the three)
Web navigation, form filling, multi-step workflows
Can use real browser profiles with existing logins
Cloud version adds stealth proxies, CAPTCHA solving, persistent filesystem

Access & Pricing:

Open-source (free, self-hosted)
Cloud: pay-as-you-go — $0.01/task init + per-step LLM cost (~$0.05/step)
Plans: Free, Starter ($100/mo), Business ($500/mo), Scaleup ($2,500/mo) with volume discounts
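
As a rough worked example of the pay-as-you-go figures above, the sketch below estimates the cost of a single Cloud task. The function name and the 20-step task are illustrative assumptions; the two rates are the ones quoted in this brief, and the per-step figure is itself approximate.

```python
def estimate_task_cost(steps: int, init_fee: float = 0.01, per_step: float = 0.05) -> float:
    """Estimate the pay-as-you-go cost (USD) of one task with the given step count."""
    if steps < 0:
        raise ValueError("steps must be non-negative")
    # One-time init fee plus an approximate LLM cost for every step taken.
    return round(init_fee + steps * per_step, 2)


# A hypothetical 20-step task: 0.01 + 20 * 0.05
print(estimate_task_cost(20))
```

Actual per-step cost depends on the chosen model and page size, so treat the result as an order-of-magnitude figure.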

Limitations: Requires technical skill to set up self-hosted. Security is the developer's responsibility, with no built-in safety layer. Cloud version is newer and less battle-tested than the incumbents.

Risks: No built-in guardrails for sensitive actions. Prompt injection defense depends entirely on the integrator. Running with real browser profiles exposes session cookies and credentials to the agent.

2. OpenAI Operator (now "Agent Mode" in ChatGPT)

Architecture: Powered by CUA (Computer-Using Agent) — GPT-4o with vision + reinforcement learning for GUI interaction. Screenshot-based iterative loop: capture → reason → act. Runs in a sandboxed browser instance (not your desktop).

Capabilities:

87% WebVoyager, 58.1% WebArena, 38.1% OSWorld
Strong error recovery and user-friendly UX
Multi-task via parallel conversations
Custom instructions per site; saved prompts

Access & Pricing:

Bundled into ChatGPT subscriptions (not standalone)
Plus ($20/mo): 40 agent tasks/month
Pro ($200/mo): 400 agent tasks/month
API: CUA model via OpenAI API, token-based pricing

Limitations: Web-only (sandboxed browser, no desktop access). Task caps on subscription tiers. No model flexibility — locked to OpenAI's CUA. Limited customization for developers.

Risks: "Takeover mode" for credentials means the user must manually enter sensitive data — but the agent still sees the page context. Three-layer safety system (user approval for significant actions, adversarial defenses). Still vulnerable to indirect prompt injection from page content.

3. Anthropic Computer Use

Architecture: Full desktop control via screenshot-analyze-act loop. Claude takes periodic screenshots, reasons multimodally, then executes mouse/keyboard actions. Works across native apps, not just browsers.

Capabilities:

72.5% OSWorld (best among the three — near human-level)
Superior visual comprehension and coding/dev tasks
Works across any desktop app (spreadsheets, terminals, browsers)
56% WebVoyager (weakest for pure web tasks)

Access & Pricing:

Consumer: Claude Pro ($20/mo) or Max ($100/mo) via Claude Cowork
API: token-based — Sonnet 4.6 at $3/$15 per M tokens, Opus 4.6 at $5/$25
macOS research preview currently; broader availability coming

Limitations: Research preview — macOS only for now. Struggles with scrolling, dragging, zooming. Weakest pure web-navigation performance. Requires desktop environment (not cloud-sandboxed by default).

Risks: Full desktop access = largest blast radius if compromised. Built-in safety classifiers + permission prompts before accessing new apps. Anthropic has published prompt injection defense research, but the attack surface is the entire desktop, not just a browser tab.

Head-to-Head Summary

| Dimension | Browser Use | OpenAI Operator | Anthropic Computer Use |
| --- | --- | --- | --- |
| Scope | Browser only | Sandboxed browser | Full desktop |
| WebVoyager | 89% | 87% | 56% |
| OSWorld | N/A | 38% | 72.5% |
| Model lock-in | None (multi-LLM) | GPT-4o CUA only | Claude only |
| Open source | Yes | No | No |
| Entry price | Free (self-hosted) | $20/mo (40 tasks) | $20/mo (Pro) |
| Blast radius | Browser session | Sandboxed browser | Entire desktop |
| Safety layer | DIY | 3-layer + takeover | Classifiers + permissions |

Cross-Cutting Risk: Prompt Injection

All three are vulnerable to indirect prompt injection — malicious instructions embedded in web page content, ads, or documents. Recent incidents:

HashJack (Cato CTRL, 2026): hidden prompts in URL fragments exploited by agentic browsers for data exfiltration, credential theft, phishing callbacks
PleaseFix (Zenity Labs, March 2026): Perplexity's Comet browser compromised via calendar invite injection — local file access and 1Password credential theft without user interaction
The fundamental issue: every webpage, ad, and embedded document is a potential injection vector when the agent has autonomy + data access

Recommendation for review: Evaluate (a) whether the agent needs full desktop or just browser scope, (b) acceptable blast radius if injection succeeds, (c) whether model flexibility or safety guarantees matter more for your use case.

Sources:

Claude Sonnet 4.6

Permissions

Scope	Description
network:outbound

SKILL.md

Deep Research Skill

Overview

This skill provides a systematic methodology for conducting thorough web research. Load this skill BEFORE starting any content generation task to ensure you gather sufficient information from multiple angles, depths, and sources.

When to Use This Skill

Always load this skill when:

Research Questions

User asks "what is X", "explain X", "research X", "investigate X"
User wants to understand a concept, technology, or topic in depth
The question requires current, comprehensive information from multiple sources
A single web search would be insufficient to answer properly

Content Generation (Pre-research)

Creating presentations (PPT/slides)
Creating frontend designs or UI mockups
Writing articles, reports, or documentation
Producing videos or multimedia content
Any content that requires real-world information, examples, or current data

Core Principle

Never generate content based solely on general knowledge. The quality of your output directly depends on the quality and quantity of research conducted beforehand. A single search query is NEVER enough.

Research Methodology

Phase 1: Broad Exploration

Start with broad searches to understand the landscape:

Initial Survey: Search for the main topic to understand the overall context
Identify Dimensions: From initial results, identify key subtopics, themes, angles, or aspects that need deeper exploration
Map the Territory: Note different perspectives, stakeholders, or viewpoints that exist

Example:

Topic: "AI in healthcare"
Initial searches:
- "AI healthcare applications [current year]"
- "artificial intelligence medical diagnosis"
- "healthcare AI market trends"

Identified dimensions:
- Diagnostic AI (radiology, pathology)
- Treatment recommendation systems
- Administrative automation
- Patient monitoring
- Regulatory landscape
- Ethical considerations

Phase 2: Deep Dive

For each important dimension identified, conduct targeted research:

Specific Queries: Search with precise keywords for each subtopic
Multiple Phrasings: Try different keyword combinations and phrasings
Fetch Full Content: Use web_fetch to read important sources in full, not just snippets
Follow References: When sources mention other important resources, search for those too

Example:

Dimension: "Diagnostic AI in radiology"
Targeted searches:
- "AI radiology FDA approved systems"
- "chest X-ray AI detection accuracy"
- "radiology AI clinical trials results"

Then fetch and read:
- Key research papers or summaries
- Industry reports
- Real-world case studies
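
The Phase 2 steps can be sketched as a small query planner that attaches several phrasings to each dimension. This is a minimal sketch, not part of the skill itself: `Dimension` and `plan_queries` are hypothetical names, and the actual searching and fetching is left to whatever search and web_fetch tools the agent has.

```python
from dataclasses import dataclass, field


@dataclass
class Dimension:
    """One subtopic identified in Phase 1, with its targeted queries (hypothetical type)."""
    name: str
    queries: list[str] = field(default_factory=list)


def plan_queries(dimension: str, phrasings: list[str]) -> Dimension:
    """Attach differently phrased queries to a dimension, dropping case-insensitive duplicates."""
    seen: set[str] = set()
    unique: list[str] = []
    for phrasing in phrasings:
        cleaned = phrasing.strip()
        key = cleaned.lower()
        if key and key not in seen:
            seen.add(key)
            unique.append(cleaned)
    return Dimension(name=dimension, queries=unique)


radiology = plan_queries(
    "Diagnostic AI in radiology",
    [
        "AI radiology FDA approved systems",
        "chest X-ray AI detection accuracy",
        "radiology AI clinical trials results",
        "Chest X-ray AI detection accuracy",  # repeated phrasing, dropped
    ],
)
print(radiology.queries)
```

Keeping the queries grouped by dimension makes it easier to see later which subtopics still lack a full-content read.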

Phase 3: Diversity & Validation

Ensure comprehensive coverage by seeking diverse information types:

| Information Type | Purpose | Example Searches |
| --- | --- | --- |
| Facts & Data | Concrete evidence | "statistics", "data", "numbers", "market size" |
| Examples & Cases | Real-world applications | "case study", "example", "implementation" |
| Expert Opinions | Authority perspectives | "expert analysis", "interview", "commentary" |
| Trends & Predictions | Future direction | "trends [current year]", "forecast", "future of" |
| Comparisons | Context and alternatives | "vs", "comparison", "alternatives" |
| Challenges & Criticisms | Balanced view | "challenges", "limitations", "criticism" |
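
One way to apply the table is to expand a topic into one query set per information type. The sketch below is illustrative only: `INFO_TYPE_KEYWORDS` and `diversity_queries` are hypothetical names, and the keywords are a subset of the table's examples chosen because they read naturally as a suffix.

```python
# Suffix keywords per information type, adapted from the table above.
INFO_TYPE_KEYWORDS: dict[str, list[str]] = {
    "Facts & Data": ["statistics", "market size"],
    "Examples & Cases": ["case study", "implementation"],
    "Expert Opinions": ["expert analysis", "interview"],
    "Trends & Predictions": ["forecast", "trends"],
    "Comparisons": ["comparison", "alternatives"],
    "Challenges & Criticisms": ["limitations", "criticism"],
}


def diversity_queries(topic: str) -> dict[str, list[str]]:
    """Build one list of search queries per information type for the given topic."""
    return {
        info_type: [f"{topic} {keyword}" for keyword in keywords]
        for info_type, keywords in INFO_TYPE_KEYWORDS.items()
    }


for info_type, queries in diversity_queries("healthcare AI").items():
    print(info_type, "->", queries)
```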

Phase 4: Synthesis Check

Before proceeding to content generation, verify:

Have I searched from at least 3-5 different angles?
Have I fetched and read the most important sources in full?
Do I have concrete data, examples, and expert perspectives?
Have I explored both positive aspects and challenges/limitations?
Is my information current and from authoritative sources?

If any answer is NO, continue researching before generating content.
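
The checklist above can be expressed as a simple gate. This is a sketch under stated assumptions: `ResearchState`, its field names, and `unmet_checks` are hypothetical, and the threshold of 3 angles is the lower bound of the 3-5 range given above.

```python
from dataclasses import dataclass


@dataclass
class ResearchState:
    """Hypothetical record of research progress; field names are illustrative."""
    angles_searched: int
    key_sources_read_in_full: bool
    has_data_examples_experts: bool
    covers_benefits_and_limits: bool
    current_and_authoritative: bool


def unmet_checks(state: ResearchState, min_angles: int = 3) -> list[str]:
    """Return the checklist items that still fail; an empty list means proceed."""
    checks = {
        "search from at least 3-5 angles": state.angles_searched >= min_angles,
        "read key sources in full": state.key_sources_read_in_full,
        "gather data, examples, and expert views": state.has_data_examples_experts,
        "cover both benefits and limitations": state.covers_benefits_and_limits,
        "confirm sources are current and authoritative": state.current_and_authoritative,
    }
    return [item for item, passed in checks.items() if not passed]


state = ResearchState(2, True, True, False, True)
print(unmet_checks(state))
```

Returning the failing items, rather than a single boolean, tells the agent what to research next.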

Search Strategy Tips

Effective Query Patterns

# Be specific with context
❌ "AI trends"
✅ "enterprise AI adoption trends [current year]"

# Include authoritative source hints
"[topic] research paper"
"[topic] McKinsey report"
"[topic] industry analysis"

# Search for specific content types
"[topic] case study"
"[topic] statistics"
"[topic] expert interview"

# Use temporal qualifiers — always use the ACTUAL current year from <current_date>
"[topic] 2026"   # ← replace with real current year, never hardcode a past year
"[topic] latest"
"[topic] recent developments"

Temporal Awareness

Always check <current_date> in your context before forming ANY search query.

<current_date> gives you the full date: year, month, day, and weekday (e.g. 2026-02-28, Saturday). Use the right level of precision depending on what the user is asking:

| User intent | Temporal precision needed | Example query |
| --- | --- | --- |
| "today / this morning / just released" | Month + Day | `"tech news February 28 2026"` |
| "this week" | Week range | `"technology releases week of Feb 24 2026"` |
| "recently / latest / new" | Month | `"AI breakthroughs February 2026"` |
| "this year / trends" | Year | `"software trends 2026"` |

Rules:

When the user asks about "today" or "just released", use month + day + year in your search queries to get same-day results
Never drop to year-only when day-level precision is needed — "tech news 2026" will NOT surface today's news
Try multiple phrasings: numeric form (2026-02-28), written form (February 28 2026), and relative terms (today, this week) across different queries

❌ User asks "what's new in tech today" → searching "new technology 2026" → misses today's news
✅ User asks "what's new in tech today" → searching "new technology February 28 2026" + "tech news today Feb 28" → gets today's results
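
The temporal rules can be sketched as a helper that turns a date into queries at the required precision. This is a minimal illustration, assuming the date from <current_date> has already been parsed into a `datetime.date`; `temporal_queries` and its `precision` labels are hypothetical names, and week-range queries are left out for brevity.

```python
from datetime import date

# Hardcoded so the written form does not depend on the system locale.
MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def temporal_queries(topic: str, today: date, precision: str) -> list[str]:
    """Build search queries for a topic at 'day', 'month', or 'year' precision."""
    month = MONTHS[today.month - 1]
    if precision == "day":
        # Written form, numeric form, and a relative term, as separate queries.
        return [
            f"{topic} {month} {today.day} {today.year}",
            f"{topic} {today.isoformat()}",
            f"{topic} today",
        ]
    if precision == "month":
        return [f"{topic} {month} {today.year}"]
    if precision == "year":
        return [f"{topic} {today.year}"]
    raise ValueError(f"unknown precision: {precision!r}")


print(temporal_queries("tech news", date(2026, 2, 28), "day"))
```

Passing the date in explicitly, rather than calling `date.today()` inside the function, keeps the helper tied to the date the context provides.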

When to Use web_fetch

Use web_fetch to read full content when:

A search result looks highly relevant and authoritative
You need detailed information beyond the snippet
The source contains data, case studies, or expert analysis
You want to understand the full context of a finding

Iterative Refinement

Research is iterative. After initial searches:

Review what you've learned
Identify gaps in your understanding
Formulate new, more targeted queries
Repeat until you have comprehensive coverage
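
The loop above can be sketched as follows. The `search` and `find_gaps` callables are placeholders for the agent's real search tool and its own gap analysis, and `research_loop` and `max_rounds` are hypothetical names introduced for illustration.

```python
from typing import Callable


def research_loop(
    initial_queries: list[str],
    search: Callable[[str], list[str]],
    find_gaps: Callable[[list[str]], list[str]],
    max_rounds: int = 5,
) -> list[str]:
    """Search, review findings, derive follow-up queries, and repeat until no gaps remain."""
    findings: list[str] = []
    queries = list(initial_queries)
    for _ in range(max_rounds):
        if not queries:
            break  # no gaps left to fill
        for query in queries:
            findings.extend(search(query))
        # Review what was learned and formulate new, more targeted queries.
        queries = find_gaps(findings)
    return findings


# Stub tools standing in for a real search backend and gap analysis.
def fake_search(query: str) -> list[str]:
    return [f"note on {query}"]


def fake_gaps(findings: list[str]) -> list[str]:
    return ["AI radiology accuracy"] if len(findings) < 2 else []


result = research_loop(["AI in healthcare"], fake_search, fake_gaps)
print(result)
```

The `max_rounds` cap is a design choice to keep the loop from running indefinitely when gap analysis keeps producing new queries.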

Quality Bar

Your research is sufficient when you can confidently answer:

What are the key facts and data points?
What are 2-3 concrete real-world examples?
What do experts say about this topic?
What are the current trends and future directions?
What are the challenges or limitations?
What makes this topic relevant or important now?

Common Mistakes to Avoid

❌ Stopping after 1-2 searches
❌ Relying on search snippets without reading full sources
❌ Searching only one aspect of a multi-faceted topic
❌ Ignoring contradicting viewpoints or challenges
❌ Using outdated information when current data exists
❌ Starting content generation before research is complete

Output

After completing research, you should have:

A comprehensive understanding of the topic from multiple angles
Specific facts, data points, and statistics
Real-world examples and case studies
Expert perspectives and authoritative sources
Current trends and relevant context

Only then proceed to content generation, using the gathered information to create high-quality, well-informed content.

FAQ

What does deep-research do?

When should I use deep-research?

Use it when you need a repeatable workflow that produces a text response.

What does deep-research output?

In the evaluated run it produced a text response.

How do I install or invoke deep-research?

Ask the agent to use this skill when the task matches its documented workflow.

Which agents does deep-research support?

Agent support is inferred from the source, but not explicitly declared.

What tools, channels, or permissions does deep-research need?

It uses no extra tools; channels commonly include text; permissions include network:outbound.

Is deep-research safe to install?

Static analysis marked this skill as low risk; review side effects and permissions before enabling it.

How is deep-research different from an MCP or plugin?

A skill packages instructions and workflow conventions; tools, MCP servers, and plugins are dependencies the skill may call during execution.

Does deep-research outperform not using a skill?

About deep-research

When to use deep-research

When a user asks for explanations, comparisons, or investigations that require current online information.
Before drafting articles, presentations, reports, or other content that should be grounded in external sources.
When a topic needs multiple perspectives, examples, data points, and validation from authoritative sources.

When deep-research is not the right choice

When the task can be completed from the existing codebase or provided files without web research.
When the user wants a quick opinion or purely internal reasoning rather than externally researched information.

What it produces

Produces a text response.