§ Notes

Research notes.

No. 04·Jun 5, 2026·Evaluation

Project Pigeon: a small model that holds long context

Results from three internal checkpoints. The 865M v7 stays within a few points of the much larger v5 on most shared benchmarks, leads on Winogrande, and passes a long-context retrieval test that both larger checkpoints fail.

No. 03

Adversarial robustness in domain-specific models: red-teaming beyond the generic benchmark.

Generic robustness benchmarks miss the failure modes that matter in specialized domains. Robustness for these models should be measured in the learned representations, not in the outputs alone.

Mar 28, 2026·Evaluation

No. 02

Interlocking specialized models: routing and merging domain experts for compound AI systems.

A compound system built from a shared pretrained trunk and interchangeable expert heads. The open problems sit in the routing and merging that keep the composition stable.

Feb 27, 2026·Architecture

No. 01

Improving synthetic data generation bounds via constrained decoding.

A decode-time bound on the error of synthetically generated training data. Constraining the generator, rather than filtering its output, keeps a growing corpus on-distribution.

Feb 24, 2026·Methods