Search: #ai domain:epoch.ai

2 posts

2025-10-01

21882m Academic

What will AI look like in 2030?

epoch.ai/blog/what-will-ai-look-like-in-2030

AI as a Key Technology: By 2030, AI is predicted to be a cornerstone technology integrated into every aspect of the economy and human-computer interaction, assuming current scaling trends continue.
Accelerated Scientific R&D: AI is expected to significantly speed up scientific research by assisting with tasks like implementing complex software from natural language, formalizing mathematical proofs, and answering intricate biological questions. AI assistants are anticipated to become as common in science as coding assistants are for software engineers.
Massive Investment and Challenges: Developing the most advanced AI models around 2030 is projected to require unprecedented investments of hundreds of billions of dollars and gigawatts of electrical power. However, the article argues that the economic returns from AI-driven productivity gains will justify these costs and that bottlenecks such as data availability and scaling costs will likely be overcome.
Lagging Societal Impact: While AI capabilities will advance rapidly, their deployment and societal impact may be delayed in certain sectors. For example, the lengthy process of clinical trials and regulatory approvals in pharmaceutical R&D means that drugs approved by 2030 are unlikely to have benefited from the advanced AI of that era. In contrast, fields with shorter iteration cycles and fewer regulations, like software engineering, are expected to be dramatically transformed.

🌐 epoch.ai, ai

2025-01-30

18582m Academic

How has DeepSeek improved the Transformer architecture? | Epoch AI

epoch.ai/gradient-updates/how-has-deepseek-improved-the-transformer-architecture

DeepSeek-V3, an open-weight model that achieves state-of-the-art benchmark performance with significantly less training compute than comparable models, incorporates three major architectural improvements over the vanilla Transformer.

First, Multi-head Latent Attention (MLA) tackles the Key-Value (KV) cache, whose size becomes prohibitive in long-context inference. Instead of caching full key and value vectors for every head, MLA caches one lower-dimensional latent vector per token and reconstructs each head's keys and values from it with learned matrices. Each key and value vector is therefore the product of two matrices applied to the token's representation, which amounts to a low-rank compression of the KV cache shared across all attention heads. This drastically reduces cache size while maintaining quality, unlike less effective methods such as grouped-query attention.
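
A minimal pure-Python sketch of the latent-caching idea, assuming toy dimensions (D_MODEL, N_HEADS, HEAD_DIM, LATENT_DIM) and random weights; every name here is hypothetical, and the sketch omits parts of DeepSeek's actual MLA such as its separate rotary-position (RoPE) key component. It stores one latent vector per token, rebuilds every head's key and value from it, checks that a key equals a product of two matrices applied to the token vector, and compares per-token cache sizes with standard multi-head attention and grouped-query attention.

```python
import random

# Toy sizes chosen for illustration only; NOT DeepSeek-V3's real configuration.
D_MODEL, N_HEADS, HEAD_DIM, LATENT_DIM = 16, 4, 8, 6


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * B[k][j] for k, a in enumerate(row)) for j in range(len(B[0]))]
            for row in A]


def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
W_down = rand_matrix(LATENT_DIM, D_MODEL, rng)                             # shared down-projection
W_up_k = [rand_matrix(HEAD_DIM, LATENT_DIM, rng) for _ in range(N_HEADS)]  # per-head key up-projections
W_up_v = [rand_matrix(HEAD_DIM, LATENT_DIM, rng) for _ in range(N_HEADS)]  # per-head value up-projections


def cache_entry(h):
    """What goes into the KV cache for one token: only the latent vector."""
    return matvec(W_down, h)


def keys_and_values(latent):
    """Rebuild every head's key and value from the cached latent at attention time."""
    keys = [matvec(W_up_k[i], latent) for i in range(N_HEADS)]
    values = [matvec(W_up_v[i], latent) for i in range(N_HEADS)]
    return keys, values


def cache_elems_per_token(scheme, n_kv_groups=2):
    """Per-layer, per-token number of cached elements under each scheme."""
    if scheme == "mha":  # one key and one value per head
        return 2 * N_HEADS * HEAD_DIM
    if scheme == "gqa":  # heads in a group share one key/value pair
        return 2 * n_kv_groups * HEAD_DIM
    if scheme == "mla":  # one latent shared by all heads' keys and values
        return LATENT_DIM
    raise ValueError(f"unknown scheme: {scheme}")


h = [rng.gauss(0.0, 1.0) for _ in range(D_MODEL)]  # one token's residual-stream vector
latent = cache_entry(h)
keys, values = keys_and_values(latent)

# Head 0's key equals (W_up_k[0] @ W_down) @ h, a rank-LATENT_DIM factorization.
direct_key = matvec(matmul(W_up_k[0], W_down), h)

print(len(latent), len(keys), len(keys[0]))                      # 6 4 8
print({s: cache_elems_per_token(s) for s in ("mha", "gqa", "mla")})  # {'mha': 64, 'gqa': 32, 'mla': 6}
```

The design point is that the cache cost scales with LATENT_DIM rather than with the number of heads, while each head still gets its own key and value through its own up-projection.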

Second, DeepSeekMoE introduces several Mixture-of-Experts (MoE) innovations to mitigate "routing collapse," in which a few experts end up receiving most of the tokens. Instead of auxiliary loss terms, it uses expert-specific bias terms that are adjusted dynamically to keep the load balanced without compromising model performance. It also adds shared experts that every token is routed to, applying load balancing only to the specialized "routed experts." This lets the model store common information efficiently without forcing a uniform distribution across all experts.
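
A rough sketch of both mechanisms, loosely following the DeepSeek-V3 technical report's description; the function names, the fixed update step gamma, and the toy scalar experts are illustrative assumptions, and scores are assumed to be nonnegative affinities (for example, sigmoid outputs). In this sketch the bias terms only influence which experts are selected, while the gating weights come from the unbiased scores, so load balancing does not directly distort the layer's output.

```python
def route(scores, biases, k):
    """Pick the top-k routed experts by score + bias; gate with the unbiased scores."""
    ranked = sorted(range(len(scores)), key=lambda e: scores[e] + biases[e], reverse=True)
    chosen = ranked[:k]
    total = sum(scores[e] for e in chosen)
    return {e: scores[e] / total for e in chosen}  # normalized over selected experts


def update_biases(biases, loads, gamma):
    """After a batch, nudge overloaded experts down and underloaded experts up."""
    mean_load = sum(loads) / len(loads)
    new = []
    for b, load in zip(biases, loads):
        if load > mean_load:
            new.append(b - gamma)
        elif load < mean_load:
            new.append(b + gamma)
        else:
            new.append(b)
    return new


def moe_forward(x, shared_experts, routed_experts, scores, biases, k):
    """Shared experts always run; routed experts run only when selected."""
    out = sum(f(x) for f in shared_experts)
    for e, gate in route(scores, biases, k).items():
        out += gate * routed_experts[e](x)
    return out


scores = [0.9, 0.8, 0.1, 0.2]
print(sorted(route(scores, [0.0] * 4, k=2)))                  # [0, 1]
print(sorted(route(scores, [-1.0, 0.0, 0.0, 0.0], k=2)))      # [1, 3]
print(update_biases([0.0] * 4, loads=[3, 1, 2, 2], gamma=0.1))  # [-0.1, 0.1, 0.0, 0.0]

# Hypothetical toy experts acting on a scalar input.
shared = [lambda x: x]
routed = [lambda x, s=s: s * x for s in (10, 20, 30, 40)]
out = moe_forward(1.0, shared, routed, scores, [0.0] * 4, k=2)
```

In a training loop, update_biases would run after each batch with the observed per-expert token counts, so persistently overloaded experts gradually become less likely to be selected.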

Third, Multi-token Prediction lets the model predict both the next token and the one after it in a single forward pass: the residual stream vector behind the first prediction is fed into an additional Transformer block, which produces the second prediction. This enables a multi-token prediction objective during training, which improves performance, and it supports speculative decoding at inference time, nearly doubling inference speed.
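
A toy simulation of how a second-token prediction enables speculative decoding, assuming greedy decoding and hypothetical stand-ins: toy_main plays the main model's next-token head and the draft functions play the extra prediction block. Each pass emits the main model's next token and keeps the drafted second token only if the main model agrees with it, so the output always matches plain greedy decoding. If a draft is accepted with probability p, each pass yields roughly 1 + p tokens, which approaches a 2x speedup when p is high.

```python
def greedy_generate(main, prompt, n_new):
    """Baseline: one main-model forward pass per generated token."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(main(seq))
    return seq[len(prompt):]


def mtp_speculative_generate(main, draft, prompt, n_new):
    """Emit up to two tokens per pass: the next token plus a verified draft.

    Returns (tokens, passes). Simplification: the draft is checked here by
    calling main() directly; a real system performs that check inside the
    next batched forward pass, so it is not counted as an extra pass.
    """
    seq = list(prompt)
    passes = 0
    while len(seq) - len(prompt) < n_new:
        passes += 1
        next_tok = main(seq)   # main head: token t+1
        guess = draft(seq)     # extra prediction block (hypothetical stand-in): token t+2
        seq.append(next_tok)
        if len(seq) - len(prompt) < n_new and main(seq) == guess:
            seq.append(guess)  # draft verified: a second token from the same pass
    return seq[len(prompt):], passes


def toy_main(seq):
    """Hypothetical deterministic 'model' over tokens 0-9."""
    return (seq[-1] * 3 + 1) % 10


def perfect_draft(seq):
    """A draft that always guesses the second token correctly."""
    return toy_main(seq + [toy_main(seq)])


def useless_draft(seq):
    """A draft that never matches (-1 is not a valid token)."""
    return -1


prompt = [1]
print(greedy_generate(toy_main, prompt, 8))                           # [4, 3, 0, 1, 4, 3, 0, 1]
print(mtp_speculative_generate(toy_main, perfect_draft, prompt, 8))  # ([4, 3, 0, 1, 4, 3, 0, 1], 4)
print(mtp_speculative_generate(toy_main, useless_draft, prompt, 8))  # ([4, 3, 0, 1, 4, 3, 0, 1], 8)
```

With a perfect draft the pass count halves; with a useless one it falls back to one pass per token, and in both cases the generated text is unchanged.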

🌐 epoch.ai, ai, architecture, transformer