§ Notes

Research notes.

No. 04 · Evaluation

Project Pigeon: a small model that holds long context

Results from three internal checkpoints. The 865M v7 stays within a few points of the much larger v5 on most shared benchmarks, leads on Winogrande, and passes a long-context retrieval test that both larger checkpoints fail.

No. 03

Adversarial robustness in domain-specific models: red-teaming beyond the generic benchmark.

Generic robustness benchmarks miss the failure modes that matter in specialized domains. Robustness for these models should be measured in the learned representations, not the outputs alone.

·Evaluation
No. 02

Interlocking specialized models: routing and merging domain experts for compound AI systems.

A compound system built from a shared pretrained trunk and interchangeable expert heads. The open problems sit in the routing and merging that keep the composition stable.

·Architecture
No. 01

Improving synthetic data generation bounds via constrained decoding.

A decode-time bound on the error of synthetically generated training data. Constraining the generator, rather than filtering its output, keeps a growing corpus on-distribution.

·Methods