I trained a 75M parameter LLM from scratch on 18B tokens and it beats a model almost double its size

u/cakes_and_candles · 2026-05-30 17:46 UTC

I trained a small language model from scratch called KeyLM. It is a 75M-parameter, decoder-only model, and it comes as a pretrained base, an instruction-tuned version, and a GGUF. On IFEval (instruction following), the 75M instruct model scores slightly higher than the original SmolLM-135M-Instruct with about half the parameters and a fraction of the training data. (SmolLM was pretrained on 600B tokens and SmolLM2 on 2T tokens, while KeyLM was pretrained on only 18B tokens.)
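To put "about half the parameters and a fraction of the training data" in numbers, here is a quick back-of-the-envelope sketch. It uses only the figures quoted above. The 135M parameter count is read off the SmolLM-135M name, and the variable names are illustrative, not taken from any KeyLM code.

```python
# Figures quoted in the post (135M is assumed from the model name SmolLM-135M).
KEYLM_PARAMS = 75e6
SMOLLM_PARAMS = 135e6
KEYLM_TOKENS = 18e9
SMOLLM_TOKENS = 600e9
SMOLLM2_TOKENS = 2e12

# Relative model size and relative pretraining budget.
param_ratio = KEYLM_PARAMS / SMOLLM_PARAMS
token_ratio_smollm = KEYLM_TOKENS / SMOLLM_TOKENS
token_ratio_smollm2 = KEYLM_TOKENS / SMOLLM2_TOKENS

# Pretraining tokens seen per parameter.
keylm_tokens_per_param = KEYLM_TOKENS / KEYLM_PARAMS
smollm_tokens_per_param = SMOLLM_TOKENS / SMOLLM_PARAMS

print(f"KeyLM params vs SmolLM-135M: {param_ratio:.1%}")
print(f"KeyLM tokens vs SmolLM:      {token_ratio_smollm:.1%}")
print(f"KeyLM tokens vs SmolLM2:     {token_ratio_smollm2:.1%}")
print(f"Tokens per parameter, KeyLM:  {keylm_tokens_per_param:.0f}")
print(f"Tokens per parameter, SmolLM: {smollm_tokens_per_param:.0f}")
```

So KeyLM is a bit over half the size of SmolLM-135M and saw roughly 3% of SmolLM's pretraining tokens, or under 1% of SmolLM2's.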
