TileRT

Blog 2026-06-08

Two Leaps to 1000+ TPS on a 1T-Parameter Model

Two execution-paradigm leaps to 1000+ tokens/s: a persistent execution model, microsecond-scale bottleneck triage, and model–system co-design with Xiaomi MiMo.

Production 2026-05-22

TileRT in production — powering GLM-5.1 on Z.ai MaaS

GLM-5.1-highspeed is now live on Z.ai, powered by TileRT, which has moved from an experimental prototype into real production.

Blog 2026-05-21

Speed as the Next Scaling Law

Inside TileRT and production-scale GLM-5.1 inference: persistent kernels, tile pipelines, and heterogeneous workers.

Release 2026-02-14

v0.1.3 — GLM-5 available in TileRT

Added support for the full-size GLM-5-FP8 model in TileRT, reaching up to 500+ user decode TPS.

Release 2026-01-26

v0.1.2 — Multi-token prediction (MTP)

Multi-Token Prediction (MTP) enabled in TileRT, reaching up to 600+ user TPS for DeepSeek-V3.2.

Release 2025-12-23

v0.1.1 — Performance optimization

Achieved a further 1.35x speedup (3–4x over the baseline), reaching 250+ user decode TPS for DeepSeek-V3.2-Exp.

Release 2025-11-20

v0.1.0 — Initial public release

Initial public release, supporting DeepSeek-V3.2-Exp and achieving the fastest inference speed among all available baselines.

Tokens, in a blink

Millisecond Intelligence at Frontier Scale

Sustained at scale

Speed is what comes after intelligence.

Ecosystem

TileLang

TileOPs

TileScale

TileRT