Tag ai

36 bookmarks have this tag.


2025-03-20

2007Δ6m Academic

Yann LeCun discusses AI’s limitations

youtu.be/qvNCVYkHKfg

This summary outlines the key insights from a conversation with Yann LeCun, Meta's Chief AI Scientist, on the current state and future direction of artificial intelligence.

The Limitations of Large Language Models (LLMs)

The discussion begins by addressing why generative AI, despite having ingested a vast corpus of human knowledge, has not produced novel scientific discoveries. LeCun draws a clear distinction between current AI systems, predominantly Large Language Models (LLMs) like those powering chatbots, and the type of AI capable of genuine innovation.

LeCun argues that LLMs are fundamentally designed for retrieval and regurgitation. They excel at producing text that conforms to the statistical patterns of their training data, making them useful for summarizing and retrieving existing information. However, they are incapable of true invention or reasoning in their current form. He likens the language-producing part of the human brain (Broca's area) to an LLM—a small component that translates abstract thought into words. True intelligence and reasoning, however, occur in a different, much larger part of the brain where we build mental models of the world. Humans think in abstract representations, not language, and this is the capability current AI lacks.

Techniques like "Chain of Thought" give LLMs the appearance of reasoning by forcing them to generate more text, thus devoting more computation to a problem. However, LeCun dismisses this as a superficial trick, not a form of genuine reasoning. True reasoning often involves a search through a space of potential solutions, a mechanism that is entirely absent in LLMs and must be crudely "bolted on."

Diminishing Returns and Investment Risks

LeCun believes that the current paradigm of scaling up LLMs is hitting a point of diminishing returns. The industry has nearly exhausted the available public text data for training, and the costs of acquiring or generating new, high-quality data are ballooning for marginal improvements. He states unequivocally that simply scaling up LLMs will not lead to human-level AI.

This creates a potential "timeline mismatch" with the massive investments pouring into the field. LeCun distinguishes between two types of investment. Investment in infrastructure for inference—the computational power needed to serve existing AI models to billions of users, as Meta plans to do—is a justifiable business decision. However, investment based on the promise that current LLM-based companies will achieve AGI within a few years is misguided and risks creating a backlash or another "AI winter" if these exaggerated expectations are not met. He draws parallels to the overhyped expert systems of the 1980s and IBM Watson, which both failed to deliver on their grand promises.

The Path Forward: World Models and a New Paradigm

To overcome these limitations, LeCun outlines a new paradigm focused on building systems that can learn "world models." This requires developing AI that possesses four key characteristics currently missing from LLMs:

  • Understanding of the physical world.

  • Persistent memory.

  • The ability to reason.

  • The ability to plan.

The key to this is for AI to learn from rich, non-textual data like video, which contains vastly more information about how the world works than text alone. A child, by the age of four, has processed more sensory data (primarily visual) than the largest LLMs have processed in text tokens. This early learning builds an intuitive understanding of physics and common sense—the foundation of true intelligence.

JEPA: A Non-Generative Architecture for Learning

LeCun’s proposed solution is a non-generative architecture called the Joint Embedding Predictive Architecture (JEPA). He explains that generative models, which try to predict every single pixel in the next frame of a video, are doomed to fail because the world is too unpredictable in its details. One cannot predict the exact path of every water droplet when a glass is spilled.

Instead of predicting pixels, JEPA learns to create an abstract representation of the world and makes predictions within that abstract space. The model is shown part of an input (like a video) and tasked with predicting the abstract representation of the missing part. By ignoring irrelevant, unpredictable details, the system can learn the underlying, predictable principles of how the world functions.

This approach, demonstrated in models like V-JEPA (Video JEPA), allows a system to learn intuitive physics from observation. When shown a physically impossible event (e.g., an object vanishing), the model's prediction error spikes, indicating it has learned a coherent model of reality. This ability to model the world and predict the outcomes of actions is the foundation for genuine planning and reasoning.
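As a rough illustration of predicting in representation space rather than pixel space, here is a minimal toy sketch. It is not V-JEPA: the hand-written `encode` function stands in for a learned encoder, the linear predictor is fitted by least squares, and the toy dynamics, dimensions and the "teleport" event are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy world: a 2-D state (position, velocity) with constant-velocity dynamics.
DYNAMICS = np.array([[1.0, 1.0],
                     [0.0, 1.0]])
N_DETAIL = 6  # unpredictable "pixel-level" detail dimensions (hypothetical)


def observe(state):
    """Observation = predictable state plus unpredictable detail noise."""
    return np.concatenate([state, rng.normal(size=N_DETAIL)])


def encode(obs):
    """Stand-in for a learned encoder: keep the predictable part, drop detail."""
    return obs[:2]


# Collect (current, next) latent pairs from physically valid transitions.
states = rng.normal(size=(500, 2))
z_now = np.array([encode(observe(s)) for s in states])
z_next = np.array([encode(observe(DYNAMICS @ s)) for s in states])

# Predictor: a linear map fitted in latent space, z_next ~= z_now @ W.
W, *_ = np.linalg.lstsq(z_now, z_next, rcond=None)


def surprise(obs_now, obs_next):
    """Prediction error measured in latent space, not in pixel space."""
    return float(np.linalg.norm(encode(obs_now) @ W - encode(obs_next)))


s = np.array([0.5, 0.2])
plausible = surprise(observe(s), observe(DYNAMICS @ s))
# Physically impossible transition: the object jumps to an unrelated state.
impossible = surprise(observe(s), observe(np.array([9.0, -4.0])))

print(f"plausible: {plausible:.3g}, impossible: {impossible:.3g}")
```

Because the noise dimensions never reach the predictor, the error stays near zero for valid transitions and jumps for the impossible one, which is the kind of signal the summary describes.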

The Crucial Role of Open Source

LeCun concludes by championing open source as the primary engine of progress in AI. He argues that no single company, no matter how large, has a monopoly on good ideas. Innovation is happening globally, as evidenced by foundational work like ResNet (from Beijing) and recent models like DeepSeek. The open-source community allows for a diversity of ideas to be shared and built upon, accelerating progress for everyone. Furthermore, for businesses deploying AI, open-source models like Llama are often cheaper, more secure, and more controllable than proprietary APIs, making them the preferred choice for production systems.

2025-03-19

34Δ6m Academic

Your brain does not process information and it is not a computer | Aeon Essays

aeon.co/essays/your-brain-does-not-process-information-and-it-is-not-a-computer

The essay “Your brain does not process information and it is not a computer” by Robert Epstein argues that the dominant information‑processing (IP) metaphor for human cognition is a misleading and ultimately futile analogy. Epstein begins by observing that, despite intensive research, scientists will never discover a literal copy of Beethoven’s Fifth Symphony, words, pictures, or any other environmental stimulus stored in the brain. He stresses that while the brain is certainly not empty, it does not contain the kinds of discrete data structures—memories, representations, algorithms, or symbolic registers—that characterize digital computers.

He contrasts the newborn’s innate capacities (reflexes, basic perceptual biases, and powerful learning mechanisms) with the absence of any pre‑installed ‘software’, ‘data’, or ‘hardware‑like’ components that would allow it to operate as an information processor. The argument proceeds to a brief tutorial on how computers truly work: information is encoded as bits, organized into bytes, stored in physical memory, retrieved, copied, and transformed according to explicit programs. Human cognition, by contrast, lacks such encoding, storage, and retrieval mechanisms. The brain does not hold symbolic representations of a dollar bill, a poem, or a melody that can be fetched from a memory register; instead, experience changes the brain’s structure in a way that enables future performance without the need for “retrieval”.

Epstein traces the historical lineage of metaphors for intelligence over the past two millennia: clay infused with spirit, hydraulic models of the humours, mechanical automata, electrical and chemical analogies, and finally the computer metaphor that emerged after the 1940s. Each metaphor reflected the most advanced technology of its era, but all were eventually superseded. He points out that the modern IP view, the idea that the brain processes symbols like a computer, originated with early cognitive scientists such as George Miller, who applied information theory to the mind, and was cemented by works like John von Neumann’s The Computer and the Brain (1958). Since then, billions of dollars and thousands of researchers have pursued a framework that assumes the brain is an information processor, producing a massive literature that seldom questions its basic premise.

To illustrate the inadequacy of the IP model, Epstein describes a classroom exercise where a student draws a dollar bill first from memory and then with the bill present. The memory‑based drawing is poor, despite the student having seen the bill countless times. This demonstrates that the brain does not store a precise visual “representation” that can be retrieved; rather, exposure to the bill altered the brain’s dynamics, making the student better able to reproduce it when the stimulus is present. He argues that memory is not a retrieval of stored data but a re‑enactment of prior experience, and that even the notion of “memory stored in individual neurons” is untenable—functional neuroimaging shows distributed, often massive, networks engaged during recall.

Epstein then outlines an alternative, “anti‑representational” or embodied cognition perspective. Experience shapes the brain in orderly ways, allowing us to perform tasks (sing a song, recite a poem, catch a baseball) without invoking internal symbolic models. The baseball example from McBeath et al. (1995) shows that a player catches a fly ball by maintaining a simple optical relationship with the ball rather than calculating trajectories via internal representations. This view aligns with scholars such as Anthony Chemero, who reject computational accounts and emphasize direct organism‑world interaction.

The essay warns that clinging to the IP metaphor not only misguides scientific research but also fuels speculative futurist claims—e.g., Ray Kurzweil, Stephen Hawking, and Randal Koene’s predictions of mind uploading and digital immortality. Since no “software” or memory banks exist in the brain, such scenarios are fundamentally impossible. Moreover, the unique, history‑dependent changes each brain undergoes mean that even identical experiences produce distinct neural configurations. This “uniqueness problem”, illustrated by Frederic Bartlett’s work on memory distortion, underscores the impossibility of a universal brain‑computer mapping.

Epstein highlights the practical consequences of the metaphor’s dominance: massive projects like the EU’s Human Brain Project, which promised a full‑brain simulation by 2023, have floundered, exposing how the IP assumption can lead to unrealistic expectations and waste of resources. He concludes by urging a shift away from the entrenched computational metaphor toward a more faithful understanding of the brain as a dynamic, embodied system that changes through interaction with its environment. The call to “hit the DELETE key” is a metaphorical plea to discard the outdated information‑processing view and to pursue neuroscience free of its intellectual baggage.

Overall, the essay challenges the foundational assumptions of contemporary cognitive neuroscience, argues for an embodied, anti‑representational framework, and cautions against the hype surrounding brain‑computer convergence.

9

The Happy Path

zelikman.me/blog/thehappypath.pdf

the happy path: on human agency and AI interfaces

eric zelikman

2025-03-11

1947Δ2m Academic

RWKV Language Model

www.rwkv.com

RWKV (Receptance Weighted Key Value) is an open-source language-model architecture that combines the parallel training efficiency of Transformers with the low-inference-cost, constant-memory behaviour of RNNs. This entry lists the current RWKV ecosystem, centred on version 7 of the model family.

Core artefacts include:

(1) RWKV-LM – the main training framework and ongoing research branch;

(2) RWKV App – cross-platform (Android, iOS, Windows, macOS, Linux) consumer interface for local inference;

(3) Albatross – a highly-optimised inference engine that reaches >10 000 tokens s⁻¹ on an RTX 5090 for a 7-billion-parameter fp16 model at batch-size 960;

(4) RWKV-Runner – desktop GUI that exposes a REST/HTTP API;

(5) two PyPI packages: the reference implementation (slower, for compatibility) and a performance-oriented variant;

(6) RWKV-PEFT – parameter-efficient fine-tuning library that allows 7 B-parameter adaptation on a single GPU with only 9 GB VRAM;

(7) RWKV-server – WebGPU-based inference server supporting NVIDIA, AMD and Intel GPUs with quantisation formats nf4/int8/fp16.

Model weights are distributed in three flavours: raw RWKV-7 checkpoints, GGUF format for llama.cpp-style loaders, and Ollama-ready GGUF bundles. Academic references, a community wiki (AI-generated but human-curated) chronicling architectural evolution from v1 to v7, and links to over 600 third-party projects complete the landscape. The combined tooling aims to make RWKV-7 competitive with mainstream Transformer models while offering linear-time generation, modest RAM usage and frictionless local deployment on consumer hardware.
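To make the "constant-memory" claim concrete, the following back-of-the-envelope sketch compares how much state each style of model carries during generation. The layer count, model width and per-layer state size are hypothetical values chosen for illustration, not published RWKV-7 figures, and the recurrent state is modelled generically rather than by the actual RWKV-7 update rule.

```python
def kv_cache_entries(seq_len: int, n_layers: int, d_model: int) -> int:
    """Floats a vanilla Transformer caches: one key and one value vector
    per token per layer, so the total grows linearly with seq_len."""
    return 2 * seq_len * n_layers * d_model


def recurrent_state_entries(n_layers: int, state_per_layer: int) -> int:
    """Floats an RNN-style model carries between tokens: a fixed-size state
    per layer, independent of how many tokens have been processed."""
    return n_layers * state_per_layer


# Hypothetical model shape, for illustration only.
N_LAYERS, D_MODEL = 32, 4096
STATE_PER_LAYER = D_MODEL * 64  # assumed fixed per-layer state size

for seq_len in (1_000, 10_000, 100_000):
    kv = kv_cache_entries(seq_len, N_LAYERS, D_MODEL)
    rnn = recurrent_state_entries(N_LAYERS, STATE_PER_LAYER)
    print(f"{seq_len:>7} tokens: KV cache {kv:,} floats, recurrent state {rnn:,} floats")
```

Under these assumptions the cache grows tenfold with each tenfold increase in context, while the recurrent state stays the same size.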

2025-01-30

18582m Academic

How has DeepSeek improved the Transformer architecture? | Epoch AI

epoch.ai/gradient-updates/how-has-deepseek-improved-the-transformer-architecture

DeepSeek v3, an open-weight model achieving state-of-the-art benchmark performance with significantly less training compute than comparable models, incorporates three major architectural improvements to the vanilla Transformer.

First, Multi-head Latent Attention (MLA) addresses the prohibitive cost of the Key-Value (KV) cache in long-context inference. It represents key and value vectors as the product of two matrices involving a lower-dimensional latent vector, which amounts to a low-rank compression of the KV cache across all attention heads. This drastically reduces cache size while maintaining quality, unlike less effective methods such as grouped-query attention.
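The low-rank idea can be sketched in a few lines of NumPy. This is a simplified illustration under assumed toy dimensions, with random weights and hypothetical matrix names (`W_down`, `W_up_k`, `W_up_v`); it leaves out details of the real architecture such as positional-encoding handling.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes, chosen only to make the shapes easy to read.
d_model, n_heads, d_head, d_latent = 64, 4, 16, 8
n_tokens = 10

W_down = rng.normal(size=(d_model, d_latent))           # hidden -> shared latent
W_up_k = rng.normal(size=(d_latent, n_heads * d_head))  # latent -> keys, all heads
W_up_v = rng.normal(size=(d_latent, n_heads * d_head))  # latent -> values, all heads

hidden = rng.normal(size=(n_tokens, d_model))

# Only the small latent vector is cached for each token.
latent_cache = hidden @ W_down

# Keys and values for every head are reconstructed from the latent when needed.
keys = latent_cache @ W_up_k
values = latent_cache @ W_up_v

full_cache_floats = keys.size + values.size  # what a plain KV cache would hold
latent_cache_floats = latent_cache.size      # what the latent cache holds

print(f"plain KV cache: {full_cache_floats} floats")
print(f"latent cache:   {latent_cache_floats} floats")
print(f"rank of keys:   {np.linalg.matrix_rank(keys)} (at most d_latent = {d_latent})")
```

The keys and values span at most `d_latent` dimensions, which is what "low-rank compression" means here: the model trades some expressiveness in the cache for a much smaller memory footprint.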

Second, DeepSeekMoE introduces several Mixture-of-Experts (MoE) innovations to mitigate "routing collapse". It replaces auxiliary loss terms with expert-specific bias terms that are dynamically adjusted to keep the load balanced without compromising model performance. It also uses shared experts that are always routed to, reserving load balancing for the specialized "routed experts", so the model can store common information efficiently without forcing a uniform distribution across all experts.
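A toy simulation can show how dynamically adjusted bias terms rebalance routing without an auxiliary loss. Everything here is an assumption made for illustration: the random affinity scores, the skew toward expert 0, the sign-based update and the `step_size` value. Shared experts are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
n_experts, top_k, n_tokens = 8, 2, 512
step_size = 0.01  # hypothetical bias update speed

# Skewed affinities: expert 0 is favoured, mimicking the onset of routing collapse.
skew = np.zeros(n_experts)
skew[0] = 1.0
bias = np.zeros(n_experts)


def route(scores, bias):
    """Pick the top-k experts per token using biased scores.
    The bias only influences which experts are selected."""
    return np.argsort(scores + bias, axis=1)[:, -top_k:]


def expert_load(chosen):
    """Number of tokens routed to each expert."""
    return np.bincount(chosen.ravel(), minlength=n_experts)


def imbalance(load):
    """Ratio of the busiest expert's load to the average load (1.0 = balanced)."""
    return float(load.max() / load.mean())


eval_scores = rng.normal(size=(n_tokens, n_experts)) + skew
before = imbalance(expert_load(route(eval_scores, bias)))

for _ in range(300):
    scores = rng.normal(size=(n_tokens, n_experts)) + skew
    load = expert_load(route(scores, bias))
    # Overloaded experts get their bias lowered, underloaded ones raised.
    bias -= step_size * np.sign(load - load.mean())

after = imbalance(expert_load(route(eval_scores, bias)))
print(f"imbalance before: {before:.2f}, after: {after:.2f}")
```

The favoured expert ends up with a negative bias that offsets its higher affinity, so the load evens out without adding a balancing term to the training loss.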

Third, Multi-token Prediction lets the model predict the next token and the one after it in a single forward pass by feeding the first prediction's residual stream vector into an additional Transformer block. This enables a multi-token prediction objective during training, which improves performance, and facilitates speculative decoding that nearly doubles inference speed.
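The "nearly double" figure follows from simple arithmetic, sketched below under the assumption that each forward pass always yields one token and that the speculated second token is kept with some acceptance probability. The acceptance rates shown are hypothetical, and the estimate ignores the extra cost of the additional block.

```python
def expected_tokens_per_pass(acceptance_rate: float) -> float:
    """Expected tokens emitted per forward pass with one speculated token.

    One token is always produced; the speculated second token counts
    only when it is accepted.
    """
    if not 0.0 <= acceptance_rate <= 1.0:
        raise ValueError("acceptance_rate must be between 0 and 1")
    return 1.0 + acceptance_rate


# Hypothetical acceptance rates, for illustration only.
for rate in (0.5, 0.8, 0.9):
    print(f"acceptance {rate:.0%}: {expected_tokens_per_pass(rate):.2f} tokens per pass")
```

The speedup approaches 2x only as the acceptance rate approaches 100%, which is why the gain is described as "nearly" double.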

2024-08-30

1514Δ

Systema Robotica

www.lesswrong.com/posts/iy8XANvSr9u3czm7o/systema-robotica