RWKV Language Model
www.rwkv.com
RWKV (Receptance Weighted Key Value) is an open-source language-model architecture that combines the parallel training efficiency of Transformers with the low-inference-cost, constant-memory behaviour of RNNs. This entry lists the current RWKV ecosystem, centred on version 7 of the model family.
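To make the constant-memory claim concrete, the following toy sketch contrasts a fixed-size recurrent state with a key/value cache that grows with every token. It is an illustration only: the scalar-decay update, the function names and the dimension `d` are assumptions chosen for brevity, and this is not the actual RWKV-7 update rule.

```python
import numpy as np

def recurrent_step(state, k, v, decay):
    """One step of a toy linear recurrence (NOT the real RWKV-7 rule).

    The state keeps shape (d, d) no matter how many tokens were seen.
    """
    return decay * state + np.outer(k, v)

def run(num_tokens, d=8, seed=0):
    """Process random tokens; return (floats in state, floats in KV cache)."""
    rng = np.random.default_rng(seed)
    state = np.zeros((d, d))
    kv_cache = []  # what a cache-based attention decoder would retain
    for _ in range(num_tokens):
        # Random stand-ins for the key, value and receptance vectors.
        k, v, r = rng.standard_normal((3, d))
        state = recurrent_step(state, k, v, decay=0.9)
        _out = r @ state  # per-token output read from the fixed-size state
        kv_cache.append((k, v))  # cache grows by 2*d floats per token
    return state.size, 2 * d * len(kv_cache)

if __name__ == "__main__":
    for n in (10, 1000):
        state_floats, cache_floats = run(n)
        print(n, state_floats, cache_floats)
```

The first returned number stays the same as `num_tokens` grows, while the second grows linearly; that difference is the memory behaviour the paragraph above refers to.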
Core artefacts include:
(1) RWKV-LM – the main training framework and ongoing research branch;
(2) RWKV App – cross-platform (Android, iOS, Windows, macOS, Linux) consumer interface for local inference;
(3) Albatross – a highly optimised inference engine that reaches >10 000 tokens s⁻¹ on an RTX 5090 for a 7-billion-parameter fp16 model at batch size 960;
(4) RWKV-Runner – desktop GUI that exposes a REST/HTTP API;
(5) two PyPI packages: the reference implementation (slower, for compatibility) and a performance-oriented variant;
(6) RWKV-PEFT – parameter-efficient fine-tuning library that allows a 7B-parameter model to be adapted on a single GPU with only 9 GB of VRAM;
(7) RWKV-server – WebGPU-based inference server supporting NVIDIA, AMD and Intel GPUs with quantisation formats nf4/int8/fp16.
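The figures quoted in the list can be put in perspective with some back-of-the-envelope arithmetic. The sketch below assumes that the 10 000 tokens s⁻¹ figure is an aggregate over all 960 sequences in the batch, and it counts weight storage only (no activations, optimiser state or quantisation metadata); both are assumptions made here, not statements from the projects, and the helper names are hypothetical.

```python
# Bits per parameter for the formats named in the list (storage only;
# nf4 scaling constants and other metadata are ignored in this sketch).
BITS_PER_PARAM = {"fp16": 16, "int8": 8, "nf4": 4}

def per_sequence_throughput(total_tokens_per_s, batch_size):
    """Tokens per second seen by each sequence, assuming the total is
    an aggregate shared evenly across the batch (an assumption)."""
    return total_tokens_per_s / batch_size

def weight_memory_gb(num_params, fmt):
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BITS_PER_PARAM[fmt] / 8 / 1e9

if __name__ == "__main__":
    print(f"{per_sequence_throughput(10_000, 960):.1f} tokens/s per sequence")
    for fmt in BITS_PER_PARAM:
        print(fmt, f"{weight_memory_gb(7e9, fmt):.1f} GB")
```

Under these assumptions each sequence advances at a little over ten tokens per second, and fp16 weights for a 7-billion-parameter model alone occupy about 14 GB. A 9 GB fine-tuning budget would therefore presumably rely on a lower-precision copy of the base weights, although the list does not say which format is used.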
Model weights are distributed in three flavours: raw RWKV-7 checkpoints, GGUF format for llama.cpp-style loaders, and Ollama-ready GGUF bundles. Academic references, a community wiki (AI-generated but human-curated) chronicling architectural evolution from v1 to v7, and links to over 600 third-party projects complete the landscape. The combined tooling aims to make RWKV-7 competitive with mainstream Transformer models while offering linear-time generation, modest RAM usage and frictionless local deployment on consumer hardware.