1 bookmark for 2025-03-11

1947.

RWKV Language Model

www.rwkv.com

RWKV (Receptance Weighted Key Value) is an open-source language-model architecture that combines the parallelisable training of Transformers with the low-cost, constant-memory inference of RNNs. This entry surveys the current RWKV ecosystem, which is centred on version 7 (RWKV-7) of the model family.
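The "Transformer-style training, RNN-style inference" idea can be illustrated with a toy decayed linear-attention head, using the r/k/v naming from the RWKV acronym. This is a deliberate simplification under stated assumptions (a single scalar decay `w`, no normalisation, no data-dependent gates); the actual RWKV-7 update rule is considerably more elaborate and is not reproduced here. The point is only that the same outputs can be computed either position-by-position over the whole prefix (parallel, training-friendly) or through a fixed-size state updated once per token (recurrent, constant memory).

```python
# Toy decayed linear-attention head, computed two equivalent ways.
# Hypothetical simplification: one scalar decay `w`, no normalisation,
# no data-dependent gates. Not the RWKV-7 update rule.


def parallel_form(r, k, v, w):
    """Training-style view: y_t = sum_{i<=t} w**(t-i) * (r_t . k_i) * v_i.

    Each y_t depends only on the inputs, so all positions can be computed
    independently (in parallel), but the work per position grows with t.
    """
    outputs = []
    for t in range(len(r)):
        y = [0.0] * len(v[0])
        for i in range(t + 1):
            score = (w ** (t - i)) * sum(a * b for a, b in zip(r[t], k[i]))
            y = [y_j + score * v_j for y_j, v_j in zip(y, v[i])]
        outputs.append(y)
    return outputs


def recurrent_form(r, k, v, w):
    """Inference-style view: S_t = w * S_{t-1} + outer(k_t, v_t); y_t = r_t @ S_t.

    The state S is a fixed d_k x d_v matrix, so memory does not grow with
    sequence length and each new token costs the same amount of work.
    """
    d_k, d_v = len(k[0]), len(v[0])
    state = [[0.0] * d_v for _ in range(d_k)]
    outputs = []
    for r_t, k_t, v_t in zip(r, k, v):
        # Decay the previous state, then add the new key/value outer product.
        state = [[w * state[a][b] + k_t[a] * v_t[b] for b in range(d_v)]
                 for a in range(d_k)]
        # Read the state out through the receptance vector.
        outputs.append([sum(r_t[a] * state[a][b] for a in range(d_k))
                        for b in range(d_v)])
    return outputs


if __name__ == "__main__":
    r = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
    k = [[0.2, 0.1], [0.4, -0.3], [0.1, 0.9]]
    v = [[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]]
    par = parallel_form(r, k, v, w=0.9)
    rec = recurrent_form(r, k, v, w=0.9)
    for p, q in zip(par, rec):
        print(p, q)  # the two forms agree up to floating-point rounding
```

Unrolling the recurrence gives S_t = sum over i <= t of w^(t-i) outer(k_i, v_i), which is exactly what the parallel form sums, so the two functions differ only in evaluation order.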

Core artefacts include:

(1) RWKV-LM – the main training framework and ongoing research branch;

(2) RWKV App – cross-platform (Android, iOS, Windows, macOS, Linux) consumer interface for local inference;

(3) Albatross – a highly optimised inference engine that reaches an aggregate throughput of >10 000 tokens s⁻¹ on an RTX 5090 for a 7-billion-parameter fp16 model at batch size 960;

(4) RWKV-Runner – desktop GUI that exposes a REST/HTTP API;

(5) two PyPI packages: the reference implementation (slower, for compatibility) and a performance-oriented variant;

(6) RWKV-PEFT – parameter-efficient fine-tuning library that can adapt a 7-billion-parameter model on a single GPU with only 9 GB of VRAM;

(7) RWKV-server – WebGPU-based inference server supporting NVIDIA, AMD and Intel GPUs, with nf4 and int8 quantisation as well as unquantised fp16 weights.
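Two of the figures in the list can be sanity-checked with simple arithmetic. The sketch below assumes throughput is shared evenly across the batch and counts only the memory needed to hold weights; the helper names are hypothetical. Under those assumptions, >10 000 tokens s⁻¹ at batch size 960 works out to roughly 10.4 tokens s⁻¹ per individual sequence, and 7 billion parameters stored in fp16 already need about 14 GB, more than the quoted 9 GB. That suggests the frozen base weights are held in a more compact form (8-bit storage would be about 7 GB) or partly offloaded; the source does not say which, so this is an inference, not a description of RWKV-PEFT internals.

```python
# Back-of-envelope arithmetic on two figures quoted in the list above.
# Assumptions: throughput splits evenly across the batch; only weight
# storage is counted (no activations, gradients or optimiser state).

GB = 10**9  # decimal gigabytes, as VRAM sizes are usually quoted


def per_stream_tokens_per_s(aggregate_tokens_per_s, batch_size):
    """Aggregate throughput divided evenly across the sequences in a batch."""
    return aggregate_tokens_per_s / batch_size


def weight_bytes(n_params, bytes_per_param):
    """Memory to hold the weights alone."""
    return n_params * bytes_per_param


if __name__ == "__main__":
    # Albatross figure from the text: >10 000 tokens/s at batch size 960.
    print(f"~{per_stream_tokens_per_s(10_000, 960):.1f} tokens/s per sequence")
    # RWKV-PEFT figure from the text: 7B parameters within a 9 GB budget.
    print(f"fp16 weights alone: {weight_bytes(7 * 10**9, 2) / GB:.0f} GB")
    print(f"int8 weights alone: {weight_bytes(7 * 10**9, 1) / GB:.0f} GB")
```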

Model weights are distributed in three flavours: raw RWKV-7 checkpoints, GGUF files for llama.cpp-style loaders, and Ollama-ready GGUF bundles. Academic references, a community wiki (AI-generated but human-curated) chronicling the architecture's evolution from v1 to v7, and links to over 600 third-party projects complete the landscape. Together, the tooling aims to make RWKV-7 competitive with mainstream Transformer models while offering linear-time generation (constant compute and memory per generated token), modest RAM usage and easy local deployment on consumer hardware.
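The "modest RAM usage" point comes from the fixed-size recurrent state: a standard Transformer keeps keys and values for every past token, so its cache grows with context length, whereas an RNN-style model carries the same state regardless of how long the context is. The sketch below compares the two for one sequence. Every dimension is a hypothetical round number (32 layers, width 4096, head size 64), the per-head head_size x head_size state shape is an assumption modelled on recent RWKV versions, and the KV figure assumes plain multi-head attention; techniques such as grouped-query attention shrink real KV caches considerably.

```python
# Per-sequence inference memory: Transformer KV cache vs. a fixed recurrent state.
# Every dimension used here is a hypothetical round number, not a published spec.

MiB = 2**20


def kv_cache_bytes(n_layers, d_model, context_len, bytes_per_value=2):
    """Keys + values (2 tensors of width d_model) kept for every past token.

    Assumes plain multi-head attention stored in fp16.
    """
    return 2 * n_layers * d_model * context_len * bytes_per_value


def recurrent_state_bytes(n_layers, d_model, head_size, bytes_per_value=4):
    """One head_size x head_size matrix per head per layer (fp32 assumed).

    Independent of context length; small per-layer vectors are ignored.
    """
    n_heads = d_model // head_size
    return n_layers * n_heads * head_size * head_size * bytes_per_value


if __name__ == "__main__":
    layers, width, head = 32, 4096, 64  # hypothetical 7B-class shape
    state = recurrent_state_bytes(layers, width, head)
    for ctx in (1024, 8192, 32768):
        kv = kv_cache_bytes(layers, width, ctx)
        print(f"context {ctx:>6}: KV cache {kv / MiB:>7.0f} MiB, "
              f"recurrent state {state / MiB:.0f} MiB")
```

With these made-up dimensions the KV cache grows from 512 MiB at 1 024 tokens to 16 384 MiB at 32 768 tokens, while the recurrent state stays at 32 MiB.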