Supra-50M Launched: A Compact, High-Performance Causal Language Model for Local Deployment

SupraLabs has introduced Supra-50M, a 50-million-parameter causal language model. Built on a Llama-style architecture and trained on 20 billion tokens of educational web text, the model shows competitive performance on several benchmarks despite its small parameter count, which makes it a candidate for local, resource-constrained inference.

Model Overview and Architecture

Supra-50M is presented as the first model in the "SupraLabs Scaling Up Plan." It is available in two configurations, Base and Instruct, covering both fine-tuning and direct conversational use. The model is a decoder-only transformer with a Llama-style design.

Technical Specifications

Supra-50M's small size follows from its architectural dimensions. Key details include:

  • Architecture: Llama (decoder-only transformer)
  • Parameters: Approximately 50M
  • Vocab Size: 32,000
  • Hidden Size: 512
  • Attention Heads: 8
  • GQA (Key-Value Heads): 4
  • Max Position Embeddings: 1,024
  • Precision: bfloat16
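
As a rough check on how these numbers fit together, the sketch below derives a few quantities from the listed specifications alone. The layer count and feed-forward width are not stated above, so the attention and cache figures are given per layer rather than totaled. The bias-free projections are an assumption based on the usual Llama design, not something the announcement confirms.

```python
# Quantities derived only from the specifications listed above.
VOCAB_SIZE = 32_000
HIDDEN_SIZE = 512
NUM_HEADS = 8
NUM_KV_HEADS = 4
BYTES_PER_BF16 = 2

# Each attention head works on an equal slice of the hidden state.
head_dim = HIDDEN_SIZE // NUM_HEADS  # 512 / 8 = 64

# With GQA, keys and values are projected to fewer heads than queries.
kv_dim = NUM_KV_HEADS * head_dim  # 4 * 64 = 256

# Token embedding matrix: one HIDDEN_SIZE-wide row per vocabulary entry.
embedding_params = VOCAB_SIZE * HIDDEN_SIZE

# Attention weights per layer, ASSUMING bias-free Q, K, V and output projections.
attn_params_gqa = 2 * HIDDEN_SIZE * HIDDEN_SIZE + 2 * HIDDEN_SIZE * kv_dim

# Same layer if every query head had its own key-value head (no GQA).
attn_params_mha = 4 * HIDDEN_SIZE * HIDDEN_SIZE

# KV cache per token per layer: one key and one value vector in bfloat16.
kv_cache_bytes = 2 * kv_dim * BYTES_PER_BF16

print(f"head_dim: {head_dim}")
print(f"embedding parameters: {embedding_params:,}")
print(f"attention parameters per layer (GQA): {attn_params_gqa:,}")
print(f"attention parameters per layer (no GQA): {attn_params_mha:,}")
print(f"KV cache bytes per token per layer: {kv_cache_bytes}")
```

Under these assumptions the embedding matrix alone accounts for about 16.4M of the roughly 50M parameters, and using 4 key-value heads instead of 8 halves the per-layer KV cache.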

Training Methodology and Data

The training regimen emphasized high-quality educational content. The model was trained on 20 billion tokens drawn from HuggingFaceFW/fineweb-edu (specifically, the `sample-100BT` subset).

Data and Tokenization

Training used a custom Byte-Level BPE tokenizer, trained from scratch on a sample of 500,000 documents from the fineweb-edu dataset, so its vocabulary is fitted to the educational corpus the model itself was trained on.
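
To illustrate what training such a tokenizer can look like, here is a minimal sketch using the Hugging Face `tokenizers` library. It is not SupraLabs' actual training script: the helper name `train_tokenizer`, the `min_frequency` value, and the toy corpus are assumptions made for illustration. In a real run the iterator would yield the text of the 500,000 sampled documents and the vocabulary size would be 32,000.

```python
from tokenizers import ByteLevelBPETokenizer

# Special tokens as listed for Supra-50M.
SPECIAL_TOKENS = ["<s>", "<pad>", "</s>", "<unk>", "<mask>"]


def train_tokenizer(texts, vocab_size=32_000, min_frequency=2):
    """Train a byte-level BPE tokenizer on an iterable of strings.

    Hypothetical helper; min_frequency=2 is an assumed value.
    """
    tokenizer = ByteLevelBPETokenizer()
    tokenizer.train_from_iterator(
        texts,
        vocab_size=vocab_size,
        min_frequency=min_frequency,
        special_tokens=SPECIAL_TOKENS,
        show_progress=False,
    )
    return tokenizer


# Toy corpus standing in for the sampled fineweb-edu documents.
toy_corpus = [
    "Cells divide through a process called mitosis.",
    "Plants convert light into chemical energy.",
    "Cells need energy to divide and grow.",
]

# A tiny vocabulary keeps the demo fast; 256 byte symbols plus the
# special tokens already take up most of it.
demo_tokenizer = train_tokenizer(toy_corpus, vocab_size=300)
encoding = demo_tokenizer.encode("Cells divide.")
print(encoding.tokens)
print(demo_tokenizer.decode(encoding.ids))
```

Because the tokenizer operates on bytes, any input string can be encoded without falling back to an unknown token, which is one common reason to choose byte-level BPE for web text.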

Property          Value
Dataset           HuggingFaceFW/fineweb-edu (`sample-100BT`)
Total Tokens      20 billion
Sequence Length   1,024 tokens
Tokenizer Type    ByteLevelBPETokenizer
Special Tokens    <s>, <pad>, </s>, <unk>, <mask>
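
For a sense of scale, the short calculation below turns the token budget into a sequence count and a tokens-per-parameter ratio. It assumes the corpus is packed into full 1,024-token sequences with no padding and takes "approximately 50M" as exactly 50 million parameters. Neither detail is confirmed by the announcement.

```python
# Figures taken from the training data summary above.
TOTAL_TOKENS = 20_000_000_000
SEQ_LEN = 1_024

# ASSUMPTION: "approximately 50M" treated as exactly 50 million.
APPROX_PARAMS = 50_000_000

# ASSUMPTION: documents are packed into full-length sequences, no padding.
num_sequences = TOTAL_TOKENS // SEQ_LEN

# Ratio of training tokens to model parameters.
tokens_per_param = TOTAL_TOKENS / APPROX_PARAMS

print(f"{num_sequences:,} sequences of {SEQ_LEN:,} tokens")
print(f"{tokens_per_param:.0f} training tokens per parameter")
```

Under these assumptions the model sees roughly 19.5 million training sequences and about 400 tokens per parameter.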

Performance Benchmarks and Efficiency

A critical aspect of Supra-50M is its efficiency. The model demonstrates competitive or superior results on several academic and logical benchmarks when compared to significantly larger open-source models.

Benchmark Comparison

The table below compares Supra-50M against larger models such as GPT-2 (124M), SmolLM-135M, and OpenELM-270M:

Benchmark   Supra-50M (Ours)   GPT-2 (124M)   SmolLM-135M   OpenELM-270M