gpt-researcher
Verified by assafelovic · Updated Apr 16, 2026
GPT Researcher is an autonomous deep research agent that conducts web and local research, producing detailed reports with citations. Use this skill when helping developers understand, extend, debug, or integrate with GPT Researcher - including adding features, understanding the architecture, working with the API, customizing research workflows, adding new retrievers, integrating MCP data sources, or troubleshooting research pipelines.
See It In Action
I'll explore the codebase and investigate the retriever and streaming issues in parallel.
Let me verify the exact bug locations by reading the critical files.
Now let me check the remaining files to round out the memo.
Here's the debug memo.
Debug Memo: GPT Researcher v4.2.0 Regression (2026-04-11)
Root cause: The v4.2.0 upgrade (2026-04-08) refactored two systems — retriever dispatch and streaming event types — but left gaps in both.
Bug 1: Semantic Scholar retriever unreachable (CRITICAL)
Symptom: Setting RETRIEVER=semantic_scholar silently falls back to Tavily. No error raised — just a log warning.
Why: The new SemanticScholarRetriever class was added at retrievers/semantic_scholar/semantic_scholar.py and config keys were added to config/variables/default.py:20-21, but no case "semantic_scholar": arm was added to the match/case dispatcher in actions/retriever.py. The wildcard case _: on line 50 catches it and logs a fallback.
Fix — gpt_researcher/actions/retriever.py, insert before line 50:
case "semantic_scholar":
    from gpt_researcher.retrievers.semantic_scholar.semantic_scholar import SemanticScholarRetriever
    return SemanticScholarRetriever
Bug 2: Sub-query streaming status silently dropped (HIGH)
Symptom: The frontend WebSocket shows "Planning research..." and "Complete" statuses, but no per-sub-query progress ("Searching 'X' via tavily") ever appears.
Why: v4.2.0 changed the valid event types from "streaming" to "status" / "report" / "logs" (see CHANGELOG lines 12-14). The _emit_status() method on researcher.py:91 was updated correctly, but _emit_streaming() on line 100 still emits "type": "streaming", which the frontend now silently discards.
Fix — gpt_researcher/skills/researcher.py:110, change:
"type": "streaming", # ← LEGACY, should be "status"
to:
"type": "status",
After this fix, _emit_streaming becomes functionally identical to _emit_status, so you could also just replace the call on line 76:
# Line 76: replace _emit_streaming with _emit_status
await self._emit_status("subquery", f"Searching '{sub_query}' via {retriever_name}")
...and delete _emit_streaming() entirely (lines 100-113).
Bug 3 (minor): Dead retriever imports
The match/case block references duckduckgo, bing, and searx retrievers (lines 30-40), but those module directories are empty or missing. If anyone sets RETRIEVER to one of those values, they'll get a hard ModuleNotFoundError instead of the graceful Tavily fallback. This isn't new to v4.2.0 (the old if/elif had the same references), but it's worth noting.
Summary of fixes needed
| File | Line | Change |
|---|---|---|
| actions/retriever.py | 49 (before case _:) | Add case "semantic_scholar": arm |
| skills/researcher.py | 110 | "streaming" → "status" |
| skills/researcher.py | 76 (optional cleanup) | Call _emit_status instead, delete _emit_streaming |
Both fixes are one-liners. Want me to apply them?
External Tools
| Tool | Type |
|---|---|
| python3 | binary |
| node | binary |
| npm | binary |
Permissions
| Scope | Description |
|---|---|
| filesystem:read | |
| filesystem:write | |
| network:outbound | |
| process:spawn | |
| env:read | |
SKILL.md
GPT Researcher Development Skill
GPT Researcher is an LLM-based autonomous agent using a planner-executor-publisher pattern with parallelized agent work for speed and reliability.
Quick Start
Basic Python Usage
from gpt_researcher import GPTResearcher
import asyncio

async def main():
    researcher = GPTResearcher(
        query="What are the latest AI developments?",
        report_type="research_report",  # or detailed_report, deep, outline_report
        report_source="web",  # or local, hybrid
    )
    await researcher.conduct_research()
    report = await researcher.write_report()
    print(report)

asyncio.run(main())
Run Servers
# Backend
python -m uvicorn backend.server.server:app --reload --port 8000
# Frontend
cd frontend/nextjs && npm install && npm run dev
Key File Locations
| Need | Primary File | Key Classes |
|---|---|---|
| Main orchestrator | gpt_researcher/agent.py | GPTResearcher |
| Research logic | gpt_researcher/skills/researcher.py | ResearchConductor |
| Report writing | gpt_researcher/skills/writer.py | ReportGenerator |
| All prompts | gpt_researcher/prompts.py | PromptFamily |
| Configuration | gpt_researcher/config/config.py | Config |
| Config defaults | gpt_researcher/config/variables/default.py | DEFAULT_CONFIG |
| API server | backend/server/app.py | FastAPI app |
| Search engines | gpt_researcher/retrievers/ | Various retrievers |
Architecture Overview
User Query → GPTResearcher.__init__()
│
▼
choose_agent() → (agent_type, role_prompt)
│
▼
ResearchConductor.conduct_research()
├── plan_research() → sub_queries
├── For each sub_query:
│ └── _process_sub_query() → context
└── Aggregate contexts
│
▼
[Optional] ImageGenerator.plan_and_generate_images()
│
▼
ReportGenerator.write_report() → Markdown report
For detailed architecture diagrams: See references/architecture.md
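The planner-executor flow above can be sketched in miniature. This is a simplified illustration, not the real implementation: the stubbed functions stand in for `plan_research()` and `_process_sub_query()`, which in the actual pipeline call the LLM and retrievers:

```python
import asyncio

async def process_sub_query(sub_query: str) -> str:
    # The real _process_sub_query searches a retriever and scrapes results
    return f"context for {sub_query!r}"

async def conduct_research(query: str) -> list[str]:
    # plan_research() would generate sub-queries via the LLM; stubbed here
    sub_queries = [f"{query} overview", f"{query} recent developments"]
    # Sub-queries run in parallel, then their contexts are aggregated
    return list(await asyncio.gather(*(process_sub_query(q) for q in sub_queries)))

contexts = asyncio.run(conduct_research("quantum computing"))
print(len(contexts))  # 2
```

The key design point mirrored here is the parallel fan-out over sub-queries via asyncio.gather, which is what gives the pipeline its speed.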
Core Patterns
Adding a New Feature (8-Step Pattern)
1. Config → Add to gpt_researcher/config/variables/default.py
2. Provider → Create in gpt_researcher/llm_provider/my_feature/
3. Skill → Create in gpt_researcher/skills/my_feature.py
4. Agent → Integrate in gpt_researcher/agent.py
5. Prompts → Update gpt_researcher/prompts.py
6. WebSocket → Events via stream_output()
7. Frontend → Handle events in useWebSocket.ts
8. Docs → Create docs/docs/gpt-researcher/gptr/my_feature.md
For complete feature addition guide with Image Generation case study: See references/adding-features.md
Adding a New Retriever
# 1. Create: gpt_researcher/retrievers/my_retriever/my_retriever.py
class MyRetriever:
    def __init__(self, query: str, headers: dict = None):
        self.query = query

    async def search(self, max_results: int = 10) -> list[dict]:
        # Return: [{"title": str, "href": str, "body": str}]
        pass
# 2. Register in gpt_researcher/actions/retriever.py
case "my_retriever":
    from gpt_researcher.retrievers.my_retriever import MyRetriever
    return MyRetriever
# 3. Export in gpt_researcher/retrievers/__init__.py
For complete retriever documentation: See references/retrievers.md
Configuration
Config keys are lowercased when accessed:
# In default.py: "SMART_LLM": "gpt-4o"
# Access as: self.cfg.smart_llm # lowercase!
Priority: Environment Variables → JSON Config File → Default Values
For complete configuration reference: See references/config-reference.md
Common Integration Points
WebSocket Streaming
class WebSocketHandler:
    async def send_json(self, data):
        print(f"[{data['type']}] {data.get('output', '')}")

researcher = GPTResearcher(query="...", websocket=WebSocketHandler())
MCP Data Sources
import os

researcher = GPTResearcher(
query="Open source AI projects",
mcp_configs=[{
"name": "github",
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-github"],
"env": {"GITHUB_TOKEN": os.getenv("GITHUB_TOKEN")}
}],
mcp_strategy="deep", # or "fast", "disabled"
)
For MCP integration details: See references/mcp.md
Deep Research Mode
researcher = GPTResearcher(
query="Comprehensive analysis of quantum computing",
report_type="deep", # Triggers recursive tree-like exploration
)
For deep research configuration: See references/deep-research.md
Error Handling
Always use graceful degradation in skills:
async def execute(self, ...):
    if not self.is_enabled():
        return []  # Don't crash

    try:
        result = await self.provider.execute(...)
        return result
    except Exception as e:
        await stream_output("logs", "error", f"⚠️ {e}", self.websocket)
        return []  # Graceful degradation
Critical Gotchas
| ❌ Mistake | ✅ Correct |
|---|---|
| config.MY_VAR | config.my_var (lowercased) |
| Editing pip-installed package | pip install -e . |
| Forgetting async/await | All research methods are async |
| websocket.send_json() on None | Check if websocket: first |
| Not registering retriever | Add to retriever.py match statement |
Reference Documentation
| Topic | File |
|---|---|
| System architecture & diagrams | references/architecture.md |
| Core components & signatures | references/components.md |
| Research flow & data flow | references/flows.md |
| Prompt system | references/prompts.md |
| Retriever system | references/retrievers.md |
| MCP integration | references/mcp.md |
| Deep research mode | references/deep-research.md |
| Multi-agent system | references/multi-agents.md |
| Adding features guide | references/adding-features.md |
| Advanced patterns | references/advanced-patterns.md |
| REST & WebSocket API | references/api-reference.md |
| Configuration variables | references/config-reference.md |
FAQ
What does gpt-researcher do?
GPT Researcher is an autonomous deep research agent that conducts web and local research, producing detailed reports with citations. Use this skill when helping developers understand, extend, debug, or integrate with GPT Researcher - including adding features, understanding the architecture, working with the API, customizing research workflows, adding new retrievers, integrating MCP data sources, or troubleshooting research pipelines.
When should I use gpt-researcher?
Use it when you need a repeatable workflow that produces a text response.
What does gpt-researcher output?
In the evaluated run it produced a text response.
How do I install or invoke gpt-researcher?
Ask the agent to use this skill when the task matches its documented workflow.
Which agents does gpt-researcher support?
Agent support is inferred from the source, but not explicitly declared.
What tools, channels, or permissions does gpt-researcher need?
It uses python3, node, npm; channels commonly include text; permissions include filesystem:read, filesystem:write, network:outbound, process:spawn, env:read.
Is gpt-researcher safe to install?
Static analysis marked this skill as medium risk; review side effects and permissions before enabling it.
How is gpt-researcher different from an MCP or plugin?
A skill packages instructions and workflow conventions; tools, MCP servers, and plugins are dependencies the skill may call during execution.
About gpt-researcher
When to use gpt-researcher
You need to add features or retrievers to GPT Researcher. You are debugging the backend, frontend, or async research pipeline. You want to run or integrate the GPT Researcher API and web app locally.
When gpt-researcher is not the right choice
You only want to use GPT Researcher as an end-user without modifying or running the codebase. You need a generic web research skill unrelated to the GPT Researcher project.
What it produces
Produces a text response.