Discussion about this post

Neural Foundry

The 32B sweet spot makes a lot of sense from a practical standpoint. Most researchers don't have access to enterprise-level infrastructure, so keeping the model efficient enough to run on a single node opens up far more experimentation possibilities. The transparency around training data and checkpoints is really what differentiates these models.
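A rough, illustrative calculation (the numbers below are assumptions, not from the post) of why 32B parameters sits in "single node" territory, counting only the weights at common precisions:

```python
# Back-of-envelope weight-memory estimate for a 32B-parameter model.
# Illustrative only: ignores KV cache, activations, and framework overhead.
params = 32e9

for name, bytes_per_param in [("fp32", 4), ("bf16", 2), ("int8", 1), ("int4", 0.5)]:
    gb = params * bytes_per_param / 1e9
    print(f"{name:>5}: ~{gb:.0f} GB of weights")

# bf16 -> ~64 GB, which fits within a single multi-GPU node,
# and int4 -> ~16 GB, within reach of a single consumer GPU.
```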

