read-arxiv-paper
Verified · by karpathy · Updated Apr 16, 2026
Use this skill when asked to read an arXiv paper given an arXiv URL.
See It In Action
Let me read the paper and explore the codebase in parallel.
Good, main.tex is the entrypoint. Let me read it and the sections.
Now let me read all the section files in parallel.
Now let me check existing knowledge files and read the relevant nanochat source to make concrete connections.
Now I have everything I need. Let me write the summary.
Summary written to knowledge/summary_conditional_memory_engram.md.
Here's the sprint-planning readout:
Paper: "Conditional Memory via Scalable Lookup" (DeepSeek-AI, arXiv 2601.07372)
TL;DR: Transformers waste early layers reconstructing static patterns (entities, idioms) through expensive computation. Engram adds a parallel O(1) N-gram lookup table: hash bigrams/trigrams into a large embedding table, gate the result against the current hidden state, and inject it via a residual add. When ~20-25% of the sparse parameter budget goes to Engram instead of MoE experts, you get +3-5 points across knowledge, reasoning, and code/math benchmarks for free (iso-FLOPs).
Why it matters for nanochat specifically:
- Low-hanging architectural win. nanochat is a vanilla GPT-2 decoder: no MoE, no fancy embeddings. Adding a small Engram table (100K entries x 64 dims = 6.4M parameters, ~25MB in fp32) after layer 2 would let the model offload common bigram/trigram patterns and reclaim its limited depth (4-12 layers) for harder reasoning. Implementation is straightforward: hash function + `nn.Embedding` + sigmoid gate + residual add.
- Effectively extends the context window. The paper shows Engram frees attention from local pattern matching, making multi-query Needle-in-a-Haystack (NIAH) jump from 84.2 to 97.0. For nanochat's tight 1024-token context, this could make the effective window "feel" significantly longer without changing `max_context_tokens`.
- Feeds directly into memory.py TODOs. The gating signal (`alpha_t`) identifies which tokens complete static patterns, making them natural candidates for landmark/sink tokens in KV-cache retention. This is a concrete step toward the "importance-based retention" already on the roadmap.
- Orthogonal to FlashAttention work. Both are valuable: Engram improves what the model does with its depth, while FlashAttention improves how efficiently it computes attention. They compose well.
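The "hash function + nn.Embedding + sigmoid gate + residual add" recipe from the readout can be sketched in pure Python for a single token position. This is a hypothetical illustration, not the paper's or nanochat's code: the blake2b hash, the plain list standing in for `nn.Embedding`, the scaled dot-product gate, and the names `ngram_slot`, `make_table` and `engram_step` are all assumptions chosen to keep the sketch dependency-free.

```python
import hashlib
import math
import random


def ngram_slot(ngram, table_size):
    """Map an n-gram (tuple of token ids) to a table row in O(1).

    Hypothetical hash: blake2b over the ids; the paper's hash may differ.
    """
    key = ",".join(str(t) for t in ngram).encode()
    digest = hashlib.blake2b(key, digest_size=8).digest()
    return int.from_bytes(digest, "little") % table_size


def make_table(table_size, dim, seed=0):
    """Stand-in for nn.Embedding: table_size rows of small random floats."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(table_size)]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def engram_step(hidden, context_ids, table):
    """Gate bigram/trigram memory against `hidden` and add it residually.

    `context_ids` are the token ids up to and including the current token.
    Returns (new_hidden, alpha).
    """
    dim = len(hidden)
    memory = [0.0] * dim
    for n in (2, 3):  # bigram and trigram ending at the current token
        if len(context_ids) >= n:
            row = table[ngram_slot(tuple(context_ids[-n:]), len(table))]
            memory = [m + r for m, r in zip(memory, row)]
    # Hypothetical gate: sigmoid of the scaled hidden/memory dot product.
    alpha = sigmoid(sum(h * m for h, m in zip(hidden, memory)) / math.sqrt(dim))
    # Residual add: keep the hidden state and add the gated memory on top.
    return [h + alpha * m for h, m in zip(hidden, memory)], alpha


table = make_table(table_size=1000, dim=8)
new_hidden, alpha = engram_step([0.1] * 8, [17, 42, 7], table)
```

A real nanochat version would batch this over the whole sequence with torch tensors; `alpha` here plays the role of the per-token gating signal (`alpha_t`) that the readout proposes reusing for KV-cache retention.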
What it doesn't solve: cross-session episodic memory, dynamic summarization of old turns — those remain separate problems.
Permissions
| Scope | Description |
|---|---|
| filesystem:read | |
| filesystem:write | |
| network:outbound | |
SKILL.md
You will be given a URL of an arxiv paper, for example:
https://www.arxiv.org/abs/2601.07372
Part 1: Normalize the URL
The goal is to fetch the TeX source of the paper (not the PDF!). The source URL always looks like this:
https://www.arxiv.org/src/2601.07372
Notice the /src/ in place of /abs/ in the URL. Once you have the URL, move on to the download step.
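A minimal sketch of this normalization, assuming the input is an arxiv.org /abs/, /pdf/ or /src/ URL. The function name `to_src_url` is hypothetical; it also returns the bare arXiv id, since the later steps name files after it, keeps any version suffix such as v2, and strips a trailing .pdf.

```python
from urllib.parse import urlparse


def to_src_url(url):
    """Return (arxiv_id, source_url) for an arXiv /abs/, /pdf/ or /src/ URL."""
    parsed = urlparse(url.strip())
    host = parsed.netloc.lower()
    if host != "arxiv.org" and not host.endswith(".arxiv.org"):
        raise ValueError(f"not an arXiv URL: {url}")
    kind, _, arxiv_id = parsed.path.strip("/").partition("/")
    if kind not in ("abs", "pdf", "src") or not arxiv_id:
        raise ValueError(f"unrecognized arXiv path: {parsed.path}")
    if arxiv_id.endswith(".pdf"):  # e.g. /pdf/2601.07372v2.pdf
        arxiv_id = arxiv_id[: -len(".pdf")]
    return arxiv_id, f"https://www.arxiv.org/src/{arxiv_id}"


print(to_src_url("https://www.arxiv.org/abs/2601.07372"))
# ('2601.07372', 'https://www.arxiv.org/src/2601.07372')
```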
Part 2: Download the paper source
Fetch the URL to a local .tar.gz file. A good location is ~/.cache/nanochat/knowledge/{arxiv_id}.tar.gz.
If the file already exists, there is no need to re-download it.
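The download-with-cache step might look like this using only the standard library. `download_source` and its `fetch` parameter are hypothetical: `fetch` defaults to `urllib.request.urlretrieve` and is injectable so the cache logic can be exercised without network access. Writing to a temporary .part file and renaming it afterwards is a design choice of this sketch, so that an interrupted download is not later mistaken for a cached archive.

```python
import urllib.request
from pathlib import Path

CACHE_DIR = Path("~/.cache/nanochat/knowledge").expanduser()


def download_source(arxiv_id, cache_dir=CACHE_DIR, fetch=urllib.request.urlretrieve):
    """Fetch the /src/ archive to {cache_dir}/{arxiv_id}.tar.gz unless it is cached.

    Returns (path, downloaded), where downloaded is False on a cache hit.
    """
    dest = Path(cache_dir) / f"{arxiv_id}.tar.gz"
    if dest.exists():
        return dest, False  # already downloaded: skip the network entirely
    dest.parent.mkdir(parents=True, exist_ok=True)
    tmp = dest.parent / (dest.name + ".part")
    fetch(f"https://www.arxiv.org/src/{arxiv_id}", str(tmp))
    tmp.replace(dest)  # only a completed download gets the final name
    return dest, True
```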
Part 3: Unpack the file in that folder
Unpack the contents into ~/.cache/nanochat/knowledge/{arxiv_id} directory.
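A sketch of the unpacking step with Python's `tarfile` module. It handles one edge case as an assumption: for single-file submissions arXiv can serve one gzip-compressed .tex file instead of a tarball, and this sketch writes that case out as main.tex (a hypothetical naming choice). The `filter="data"` argument, passed only when the running Python supports extraction filters, rejects archive members with absolute or escaping paths. `unpack_source` is a hypothetical helper name.

```python
import gzip
import shutil
import tarfile
from pathlib import Path


def unpack_source(archive, out_dir):
    """Unpack an arXiv source download into out_dir and return out_dir."""
    archive, out_dir = Path(archive), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    if tarfile.is_tarfile(str(archive)):
        with tarfile.open(archive, "r:*") as tar:  # auto-detects gzip compression
            if hasattr(tarfile, "data_filter"):
                tar.extractall(out_dir, filter="data")  # refuse unsafe member paths
            else:
                tar.extractall(out_dir)  # older Python without extraction filters
    else:
        # Assumed single-file submission: one gzip-compressed .tex file.
        with gzip.open(archive, "rb") as src, open(out_dir / "main.tex", "wb") as dst:
            shutil.copyfileobj(src, dst)
    return out_dir
```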
Part 4: Locate the entrypoint
A LaTeX source tree usually has one entrypoint, such as main.tex or a similarly named file: the one that contains \documentclass.
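One way to locate the entrypoint, sketched under a simple heuristic: a root file has an uncommented \documentclass line, and main.tex wins if several files qualify. `find_entrypoint` is a hypothetical helper, and the heuristic will not cover every source layout.

```python
import re
from pathlib import Path

# An uncommented \documentclass line marks a LaTeX root file.
DOCCLASS = re.compile(r"^\s*\\documentclass", re.MULTILINE)


def find_entrypoint(src_dir):
    """Return the root .tex file under src_dir, preferring main.tex."""
    src_dir = Path(src_dir)
    roots = [
        p for p in sorted(src_dir.rglob("*.tex"))
        if DOCCLASS.search(p.read_text(errors="ignore"))
    ]
    if not roots:
        raise FileNotFoundError(f"no .tex file with \\documentclass under {src_dir}")
    for p in roots:
        if p.name == "main.tex":  # hypothetical tie-breaker for multiple roots
            return p
    return roots[0]
```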
Part 5: Read the paper
Once you've found the entrypoint, read its contents and then recurse through all other relevant source files (for example, those pulled in with \input or \include) to read the whole paper.
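The recursion can be sketched by following \input{...} and \include{...} depth-first, which lists files in reading order. This is a simplified, hypothetical helper (`collect_sources`): it resolves paths against the entrypoint's directory (where LaTeX normally runs), tries name.tex before the bare name as LaTeX does, treats everything after % as a comment (so a literal \% also cuts the line), and ignores other mechanisms such as \subfile or \import.

```python
import re
from pathlib import Path

INCLUDE = re.compile(r"\\(?:input|include)\{([^}]+)\}")


def collect_sources(entry):
    """Return the entrypoint plus every file it pulls in, in reading order."""
    entry = Path(entry)
    root = entry.parent  # relative paths resolve against the compile directory
    ordered, seen = [], set()

    def visit(path):
        if path in seen or not path.is_file():
            return  # skip repeats and files missing from the source tree
        seen.add(path)
        ordered.append(path)
        for line in path.read_text(errors="ignore").splitlines():
            code = line.split("%", 1)[0]  # naive comment stripping
            for name in INCLUDE.findall(code):
                name = name.strip()
                with_ext = root / (name + ".tex")  # LaTeX tries name.tex first
                visit(with_ext if with_ext.is_file() else root / name)

    visit(entry)
    return ordered
```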
Part 6: Report
Once you've read the paper, write a summary of it to a markdown file at ./knowledge/summary_{tag}.md. Note that 1) this uses the local knowledge directory (it's easier for me to open and reference there), not ~/.cache, and 2) you should generate a reasonable tag, e.g. conditional_memory, or whatever seems appropriate given the paper. Make sure the tag doesn't already exist so you don't overwrite files.
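A small sketch of picking an output path that never overwrites an existing summary. `summary_path` is hypothetical, and appending _2, _3, ... is just one possible policy; choosing a different tag by hand satisfies the same rule.

```python
import re
from pathlib import Path


def summary_path(tag, knowledge_dir="knowledge"):
    """Return a not-yet-existing knowledge/summary_{tag}.md path."""
    slug = re.sub(r"[^a-z0-9]+", "_", tag.lower()).strip("_")  # e.g. conditional_memory
    knowledge_dir = Path(knowledge_dir)
    knowledge_dir.mkdir(parents=True, exist_ok=True)
    path = knowledge_dir / f"summary_{slug}.md"
    n = 2
    while path.exists():  # hypothetical policy: append _2, _3, ... instead of overwriting
        path = knowledge_dir / f"summary_{slug}_{n}.md"
        n += 1
    return path
```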
As for the summary itself, remember that you're processing this paper within the context of the nanochat repository, so most often we will be interested in how to apply the paper and its lessons to the nanochat project. Therefore, feel free to "remind yourself" of the related nanochat code by reading the relevant parts, and then explicitly connect the paper to nanochat: how it might apply, what it might inspire, or what we might try.
FAQ
What does read-arxiv-paper do?
Use this skill when asked to read an arXiv paper given an arXiv URL.
When should I use read-arxiv-paper?
Use it when you need a repeatable workflow that produces a text report.
What does read-arxiv-paper output?
In the evaluated run, it produced a text report.
How do I install or invoke read-arxiv-paper?
Ask the agent to use this skill when the task matches its documented workflow.
Which agents does read-arxiv-paper support?
Agent support is inferred from the source, but not explicitly declared.
What tools, channels, or permissions does read-arxiv-paper need?
It uses no extra tools; channels commonly include text; permissions include filesystem:read, filesystem:write, network:outbound.
Is read-arxiv-paper safe to install?
Static analysis marked this skill as medium risk; review side effects and permissions before enabling it.
How is read-arxiv-paper different from an MCP or plugin?
A skill packages instructions and workflow conventions; tools, MCP servers, and plugins are dependencies the skill may call during execution.
Does read-arxiv-paper outperform not using a skill?
About read-arxiv-paper
When to use read-arxiv-paper
- When you have an arXiv URL and want the agent to read the source files rather than the PDF.
- When you want a markdown summary of a paper saved into the local repository knowledge folder.
- When you want the paper's ideas connected to the current codebase or project context.
When read-arxiv-paper is not the right choice
- When the paper is not hosted on arXiv or does not have downloadable source available.
- When you only need a quick abstract-level summary and do not want local files created.
What it produces
Produces a text report (a markdown summary at ./knowledge/summary_{tag}.md).