CLAUDE.md With 25,000 Stars: Karpathy's AI Rules

How a single Markdown file built from Andrej Karpathy's observations became the most-starred repo in the Claude Code ecosystem – and what actually holds up in practice.

A GitHub repo that contains nothing but a single CLAUDE.md file currently sits at over 25,000 stars. No framework, no CLI, no Python package. Just text. The project andrej-karpathy-skills distills Andrej Karpathy's public observations on recurring LLM failure modes into four behavioral rules for Claude Code.

That a Markdown file of all things gets this level of attention is no accident. It's a signal about where the real work with AI coding assistants is happening right now: not in whether the model can write code, but in how you shape its behavior so the code is actually usable.

Karpathy's Diagnosis

The file picks up three observations Karpathy has made in his posts on LLM coding:

"The models make wrong assumptions on your behalf and just run along with them without checking. They don't manage their confusion, don't seek clarifications, don't surface inconsistencies, don't present tradeoffs, don't push back when they should."

"They really like to overcomplicate code and APIs, bloat abstractions, don't clean up dead code... implement a bloated construction over 1000 lines when 100 would do."

"They still sometimes change/remove comments and code they don't sufficiently understand as side effects, even if orthogonal to the task."

Anyone using AI assistants in production recognizes these patterns. The interesting claim the repo makes: if the mistakes are predictable, you can prevent them with the right instructions.

The Four Principles

Think Before Coding

Make assumptions explicit. When something is ambiguous, name multiple interpretations instead of silently picking one. Push back if a simpler path exists. Stop and ask when something is unclear. This is the antidote to silent misinterpretation.

Simplicity First

Only the code that solves the problem. No features that weren't asked for. No abstractions for single-use code. No "flexibility" on speculation. No error handling for impossible cases. If 200 lines could be 50, rewrite it.
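
To make the contrast concrete, here is a minimal Python sketch. It is not taken from the repo; the names `Formatter`, `GreetingFormatter`, `FormatterFactory`, and `greet` are invented for illustration. The first half shows the kind of speculative abstraction the rule warns against, the second half the version that only solves the stated problem.

```python
from abc import ABC, abstractmethod


# Over-built: an interface, a subclass, and a factory for a single use.
class Formatter(ABC):
    @abstractmethod
    def format(self, name: str) -> str:
        """Turn a name into a display string."""


class GreetingFormatter(Formatter):
    def __init__(self, prefix: str = "Hello") -> None:
        # "Flexibility" nobody asked for: the prefix is never changed.
        self.prefix = prefix

    def format(self, name: str) -> str:
        return f"{self.prefix}, {name}!"


class FormatterFactory:
    @staticmethod
    def create(kind: str = "greeting") -> Formatter:
        if kind == "greeting":
            return GreetingFormatter()
        # Error handling for a case no caller can currently trigger.
        raise ValueError(f"unknown formatter: {kind}")


# Simple: the same behavior for the one caller that exists.
def greet(name: str) -> str:
    return f"Hello, {name}!"
```

Both paths return the same string for the same input. The difference is how much code a reviewer has to read, and how much a later change has to keep consistent.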

Surgical Changes

Touch only what you must. Match the existing style even if you'd do it differently. No "cleanup" at the edges of the task. Every changed line must trace back to the user's request.
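
One way to make "every changed line must trace back to the request" checkable is to look at how many lines an edit actually touched. The sketch below uses Python's standard `difflib` module for that. The helper `changed_lines` and the three sample strings are hypothetical and only serve as an illustration; the repo itself ships no such tooling.

```python
import difflib


def changed_lines(before: str, after: str) -> list[str]:
    """Return the removed ("- ") and added ("+ ") lines between two versions."""
    diff = difflib.ndiff(before.splitlines(), after.splitlines())
    return [line for line in diff if line.startswith(("- ", "+ "))]


# Hypothetical task: "fix the typo in the comment".
BEFORE = "def area(w, h):\n    # retrun the area\n    return w*h\n"

# Surgical: only the line with the typo changes.
SURGICAL = "def area(w, h):\n    # return the area\n    return w*h\n"

# Overreach: the typo is fixed, but an unrelated line is reformatted too.
OVERREACH = "def area(w, h):\n    # return the area\n    return w * h\n"
```

For the surgical edit, `changed_lines(BEFORE, SURGICAL)` has two entries: the old comment line and its corrected replacement. For the overreaching edit it has four, and the extra two cannot be traced back to the request.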

Goal-Driven Execution

Success criteria before code. "Fix the bug" becomes "write a test that reproduces it, then make it pass". Strong criteria let the model iterate on its own. Weak criteria ("make it work") force constant back-and-forth.
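
A minimal sketch of what "write a test that reproduces it, then make it pass" can look like in Python. The function `parse_price` and the bug it had are assumptions made up for this example. The tests are written as plain pytest-style functions, so any runner that collects `test_*` functions will do.

```python
def parse_price(text: str) -> float:
    """Hypothetical function under repair: parse a price such as "1,299.50"."""
    # The fix: strip thousands separators before converting.
    # The buggy version was `return float(text)`, which raises
    # ValueError on "1,299.50".
    return float(text.replace(",", ""))


def test_thousands_separator():
    # Written first: this test fails against the buggy version,
    # which gives the agent a concrete target to iterate toward.
    assert parse_price("1,299.50") == 1299.50


def test_plain_number_still_works():
    # Guards the case that already worked before the fix.
    assert parse_price("42") == 42.0
```

The success criterion is now binary: both tests pass or they don't. That is the kind of criterion an agent can check on its own, without asking whether the result "works".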

What the File Actually Solves – and What It Doesn't

The principles address real, recurring failure modes. The third rule, Surgical Changes, alone saves significant time in practice: once you've watched an agent fix a typo and "improve" half the file's formatting in passing, you understand why surgical precision needs to be a principle.

At the same time, a CLAUDE.md is not a cure-all. The rules are generic. They know nothing about your architecture, your tests, your deployment pipeline. For more complex tasks – PDF processing, browser automation, domain-specific workflows – four abstract principles aren't enough. That's where the more structured Claude Code Skills come in, loaded on demand with concrete tools and scripts.

CLAUDE.md vs. Skills: When to Use Which

CLAUDE.md

• Always in context
• General behavior
• Project-wide
• Best for style, discipline, do's and don'ts

Skills

• Loaded on demand
• Concrete workflows
• Task-specific
• Best for tools, scripts, specific domains

The two mechanisms aren't mutually exclusive. Quite the opposite: a lean CLAUDE.md with behavioral rules plus a set of Skills for the actual tooling is the setup that has worked most reliably in my projects. The file keeps the agent from overstepping. The Skills give it the concrete capabilities it needs for each task.

Why 25,000 Stars

The success of this repo tells a story bigger than the file itself. Developers are shifting focus. "Use AI to write code" was the phase of the last two years. "Shape the AI's behavior so the code is actually good" is the phase we're entering now.

The tool for that turns out to be surprisingly plain: text. A clear instruction, a well-formulated principle, or an explicit success test often does more than the next framework. The best tools in the Claude Code ecosystem right now aren't software. They're well-crafted instructions.

If you work with AI agents in production, don't treat CLAUDE.md as optional docs. Treat it as part of the architecture. What goes into it helps decide what ends up in the repository.

Tools used:

Claude Code GitHub Repo

Working on AI-assisted development workflows and wondering how to get your agents' behavior reliably under control? Let's talk. I help build CLAUDE.md and Skill setups that actually work in real projects.
