gitzette.online — open-source digest Apr 13 – Apr 20, 2026
the dispatch
@karpathy
smear and backout lambdas: the variables you didn't know needed initializing
6 commits · 2 PRs merged · 0 releases · 6 repos
Two merged PRs, five open experiments, and the quiet hum of a codebase that mostly worked anyway.
FEATURE

weight initialization finally remembers the lambdas it forgot

Smear and backout parameters were living uninitialized lives until now.

nanochat

Somewhere between model construction and first forward pass, the smear and backout lambdas were just... vibes. Undefined. @karpathy's #686 moves their initialization into init_weights() where it belongs. The kind of fix that makes you wonder how it ever worked — and slightly nervous about what else might be floating in limbo.
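
The pattern behind the fix, as a toy sketch in plain Python: construction only declares parameters, and one init_weights() call gives every one of them a defined value. The class, the attribute names smear_lambda and backout_lambda, and the starting values are assumptions for illustration, not nanochat's actual code. NaN stands in for uninitialized memory.

```python
import math


class ToyModel:
    """Toy stand-in for a model whose parameters are allocated first, filled in later."""

    def __init__(self):
        # NaN plays the role of uninitialized memory: allocated, but meaningless.
        self.params = {
            "weight": math.nan,
            "smear_lambda": math.nan,    # hypothetical name
            "backout_lambda": math.nan,  # hypothetical name
        }

    def init_weights(self):
        # Every parameter gets a defined starting value in one place.
        self.params["weight"] = 0.02         # hypothetical value
        self.params["smear_lambda"] = 0.0    # hypothetical value
        self.params["backout_lambda"] = 0.0  # hypothetical value

    def uninitialized(self):
        # Names of parameters that still hold the NaN placeholder.
        return sorted(k for k, v in self.params.items() if math.isnan(v))


model = ToyModel()
before = model.uninitialized()  # everything is still a placeholder
model.init_weights()
after = model.uninitialized()   # nothing should be left over
```

A check like uninitialized() is cheap to run in a unit test, which is one way to catch the next parameter that gets added to the constructor but forgotten in init_weights().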

merged: #686, #706
FEATURE

CPU users can now actually run the thing

Setuptools was missing, and so was everyone without a GPU.

If you tried running nanochat on CPU, you hit an import error before you hit a single token. #706 adds setuptools as a dependency — the kind of one-liner that unblocks an entire hardware class. Now the CUDA-less among us can at least watch the model crawl.
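
A quick preflight sketch for this class of failure: ask Python whether a module can be found before anything tries to import it. The helper name missing_modules is hypothetical, and the digest does not say which import actually failed, so treat this as a generic check rather than a reproduction of the bug.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [n for n in names if importlib.util.find_spec(n) is None]


# "setuptools" comes from the digest; the second name is deliberately fake
# so the list is never empty in this demo.
missing = missing_modules(["setuptools", "surely_not_a_real_module_xyz"])
print("missing:", missing)
```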

PENDING

NCCL watchdog timeouts meet their match in a compile warmup fix

Multi-node training kept dying before it started.

Distributed training with torch.compile has a timing problem: nodes compile at different speeds, the watchdog gets impatient, NCCL throws a timeout, everyone goes home. #722 proposes a coordinated warmup phase so all nodes finish compiling before the real work begins. Still open, but if you've ever stared at a watchdog timeout at 3am, you know this matters.
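
The idea of a coordinated warmup can be simulated without a cluster. In this sketch, threads stand in for ranks, sleeps stand in for uneven compile times, and threading.Barrier stands in for whatever synchronization #722 actually uses. None of this is the PR's code; it only shows that no rank starts real work until the slowest one has finished compiling.

```python
import threading
import time

NUM_RANKS = 4
barrier = threading.Barrier(NUM_RANKS)
compile_done = [0.0] * NUM_RANKS  # when each rank finished "compiling"
step_start = [0.0] * NUM_RANKS    # when each rank began "real work"


def rank_main(rank):
    # Pretend compilation takes longer on higher ranks.
    time.sleep(0.02 * (rank + 1))
    compile_done[rank] = time.monotonic()
    # Nobody proceeds until every rank has finished compiling.
    barrier.wait()
    step_start[rank] = time.monotonic()


threads = [threading.Thread(target=rank_main, args=(r,)) for r in range(NUM_RANKS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

slowest_compile = max(compile_done)
earliest_step = min(step_start)
print("all ranks waited:", earliest_step >= slowest_compile)
```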

PENDING

Flash Attention 2 slots in between the fast and the fallback

Not everyone has FA3 silicon, not everyone wants SDPA slowness.

FA3 is fast but hardware-gated. SDPA is portable but not exactly blazing. #721 opens the door to Flash Attention 2 as a middle tier — faster than the fallback, less demanding than the bleeding edge. A pragmatic addition for the GPU middle class.
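
The three-tier fallback reads naturally as a small selection function. The function name, the boolean flags, and the string labels below are hypothetical. How #721 actually detects which backends are usable is not described in the digest.

```python
def pick_attention_backend(fa3_available, fa2_available):
    """Pick the fastest usable backend, falling back to SDPA."""
    if fa3_available:
        return "fa3"   # fastest, but hardware-gated
    if fa2_available:
        return "fa2"   # the proposed middle tier
    return "sdpa"      # portable fallback


print(pick_attention_backend(fa3_available=False, fa2_available=True))
```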

PENDING

geometric initialization wants to kill your warmup schedule

DPI claims faster convergence without the usual learning rate dance.

Warmup schedules exist because random initialization plus high learning rates equals NaN city. #707 proposes DPI — a geometric initialization scheme that allegedly lets you skip warmup entirely. Research-tagged and unmerged, but if it holds up, that's one less hyperparameter to babysit.
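
For reference, this is the kind of schedule that would become unnecessary: a plain linear warmup, sketched with hypothetical names and values. It is the common textbook form, not nanochat's scheduler, and it says nothing about how DPI itself works.

```python
def warmup_lr(step, base_lr, warmup_steps):
    """Linear warmup: ramp the learning rate up to base_lr, then hold it."""
    if warmup_steps <= 0:
        return base_lr  # no warmup: full learning rate from step 0
    return base_lr * min(1.0, (step + 1) / warmup_steps)


# Illustrative values only.
schedule = [warmup_lr(s, base_lr=1.0, warmup_steps=10) for s in range(12)]
```

Skipping warmup corresponds to warmup_steps=0, which is the setting a scheme like DPI would need to survive.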

PENDING

macOS joins the single-GPU research party

MPS and CPU paths for the CUDA-less researcher.

autoresearch

autoresearch assumed CUDA or nothing. #516 adds macOS support via MPS and CPU fallbacks. Still pending, but it means your M-series laptop might actually run automated experiments instead of just warming your desk.
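
A fallback chain like this usually comes down to a few availability checks. The sketch below uses PyTorch's torch.cuda.is_available() and torch.backends.mps.is_available(). The function name pick_device and the choice to return "cpu" when PyTorch is not installed are assumptions, not what #516 does.

```python
def pick_device():
    """Prefer CUDA, then Apple's MPS backend, then plain CPU."""
    try:
        import torch
    except ImportError:
        return "cpu"  # assumption: without PyTorch, report the CPU path
    if torch.cuda.is_available():
        return "cuda"
    # Older PyTorch builds have no torch.backends.mps, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


device = pick_device()
print("using device:", device)
```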

6 commits · 12 pull requests · 0 releases

commits by repo
nanochat: 6

github stars
autoresearch ★★★★★★★★★★ 76,079
nanoGPT ★★★★★★★★☆☆ 57,095
nanochat ★★★★★★★☆☆☆ 52,417
minGPT ★★★☆☆☆☆☆☆☆ 24,223
llama2.c ★★★☆☆☆☆☆☆☆ 19,436
karpathy.github.io ☆☆☆☆☆☆☆☆☆ 1,332
gitzette.online  ·  2026 © AISlopMedia, Inc.