The Large Brain Model

The first model built on what humans experience.

LLMs were trained on what humans write. They were never trained on what humans experience. The LBM is the first foundation model built on the second—15 years of healthy human neural data, researcher-collected across 4,000+ institutions in 140+ countries.

Request API Access

Request the deck →

1–14M parameters. State-of-the-art performance. Runs on-device.

The published EEGM2 architecture outperforms every published EEG foundation model — including models with up to 110M parameters — with linear compute scaling that fits on consumer wearable silicon. arXiv:2502.17873

4,000+

Institutions in the network

Paying research institutions and global enterprises, including Fortune 500

140+

Countries represented

The only EEG corpus with full non-WEIRD demographic coverage

23,000+

Scholarly works indexed by Google Scholar

Across 15 years of organic institutional adoption

15

Years of continuous collection

245M+ channel-minutes — 41× the most-cited EEG foundation model

Built on the Only Corpus That Matters

Most major EEG datasets were collected from patients in clinical settings. Those that were not suffer from the WEIRD problem: they draw only on Western, Educated, Industrialized, Rich, Democratic populations. Neither kind can train a general foundation model. Emotiv's can.

Dcode.AI is built on the Emotiv corpus, the result of 15 years of sensor network deployment, 4,000+ institutional partnerships, and researcher-collected, event-marked data from day one. Every session is event-marked through EmotivPRO, a single software protocol used across every institution and every country.

Maximum population diversity. Uniform technical structure. The corpus cannot be acquired, replicated, or assembled with capital alone. Minimum replication time: 15+ years.

| Model | Training hours | Channel-minutes | Subjects |
|---|---|---|---|
| LaBraM (ICLR 2024 Spotlight) | ~2,500 hrs | ~4.8M | 3,857 |
| EEGFormer (full TUH corpus) | n/a | ~32.5M | 3,394 |
| REVE (2025), largest public-only effort | ~60,000 hrs | ~50–70M est. | 24,274 |
| Emotiv LBM (proprietary + public) | 300,000+ hrs* | 245M+* | 170,000+ |

* Emotiv LBM training hours (300,000+) reflects the total corpus including all major public EEG datasets plus 240,000+ hours of proprietary signal. Channel-minutes figure (245M+) reflects the same combined corpus.
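The two size columns in the table are linked by simple arithmetic: channel-minutes equal recording hours times 60 times the average number of channels per recording. The sketch below inverts that relation to recover the average channel count each row implies. It uses only the approximate figures quoted in the table; the function name is illustrative.

```python
# Channel-minutes = recording hours x 60 x (average channels per recording).
# Inputs are the approximate figures quoted in the comparison table.

def implied_avg_channels(hours: float, channel_minutes: float) -> float:
    """Average channel count implied by a corpus's hours and channel-minutes."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return channel_minutes / (hours * 60)

labram = implied_avg_channels(2_500, 4.8e6)
emotiv = implied_avg_channels(300_000, 245e6)

print(f"LaBraM: ~{labram:.1f} channels on average")
print(f"Emotiv LBM: ~{emotiv:.1f} channels on average")
```

Because the Emotiv figures are lower bounds ("300,000+" and "245M+"), the implied average is only indicative, but it falls inside the 2-channel to 32-channel device range the corpus is described as covering.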

What Corpus Quality Produces in the Model

| Corpus property | What it produces in the model | What it means for the device |
|---|---|---|
| Full device-range training (2ch–32ch) | Spatial representations that transfer across channel densities | A model running on 2-channel in-ear hardware benefits from what it learned at 32-channel resolution |
| 170,000+ unique subjects, 140+ countries | Robust inter- and intra-subject normalization | Works accurately out of the box for a global consumer base, with no per-user calibration |
| Thousands of experimental paradigms | Generalized cognitive state representations across the full waking spectrum | Supports the full application roadmap from a single model |
| Large event-marked subset | The model learns cognitive transitions, not just static states | State-change detection is accurate: the device responds to the moment attention shifts |
| On-device architecture (1–14M parameters) | Frontier accuracy at a size that fits on-device silicon | Real-time inference on the device itself: no cloud round-trip, no latency, no raw neural data exposure |

"The reason no general-purpose neural foundation model exists is not a model architecture problem. It is a data problem. The corpus required to train one has never existed — until now."

— Tan Le, CEO & Founder, Emotiv / Dcode.AI

Why Now

AGI cannot be reached by scraping the internet alone. The frontier labs have exhausted high-quality text. Synthetic data shows diminishing returns. The race is on for a new modality — and cognitive signal is the only one that requires a physical sensor network to capture.

LLMs learn from text. VLMs learn from images. The emerging class of Biological Large Models learns from the body — physiological signal from wrist-worn sensors, and cognitive signal from the brain. The LBM is the cognitive BLM: the modality that heart rate monitors, sleep trackers, and wrist-worn wearables cannot reach, no matter how much data they accumulate. Attention, emotion, fatigue, and intent require a different sensor, a different corpus, and a different model.

The investment wave

Record capital is flowing into the category

More capital has flowed into neurotech over the past five quarters than in any comparable period in the field's history — BCI hardware, neural wearables, and brain-computer interfaces are attracting category-defining investment. Every dollar going into device hardware is building demand for the intelligence layer above it. That layer is not yet built. That is the opportunity.

The benchmark

The field has its pre-ImageNet moment

Meta FAIR published NeuralBench (May 2026) — 36 EEG tasks, 14 architectures, 94 datasets. Foundation models already outperform traditional ML across most EEG tasks. Models are still small, datasets narrow. The unlock is scale and corpus diversity. That corpus exists. It is ours. The LBM will be submitted to NeuralBench when full-corpus training is complete.

The platform wave

Platform health AI creates the market

Every major AI platform now has a health product built on physiological wearable data. When heart rate, sleep, and movement flow through platform AI, the differentiated layer becomes the one they cannot reach: what the brain is doing. The platform health AI wave does not compete with the LBM — it creates the market the LBM licenses into.

What the LBM Learns

The LBM learns the patterns of what the brain does — across cognitive load, attention, fatigue, emotion, motor intent, and neurological state. Five capabilities distinguish it from every other published EEG model.

Cross-device cognitive transfer

Representations learned from one EEG hardware configuration generalize across different sensor layouts, channel counts, and wearable form factors with minimal or no retraining. Because the model is trained across Emotiv's full device range — from 2-channel in-ear through 32-channel research-grade — it learns a unified representation of cognitive state that is not tied to any single electrode configuration. A partner's device, whether a 2-channel earbud, a 6-channel headband, or a 16-channel AR headset, can pass its signal to Emotiv-FM and receive accurate cognitive state outputs without bespoke retraining.

Longitudinal baseline modelling

The LBM is trained on data from the same individuals observed over time — not just a population snapshot. This enables a qualitatively different kind of inference: not just "what is this person's cognitive state right now" but "how has this person's cognitive baseline shifted over the past six months." Intra-individual drift — gradual attention decay, sleep-architecture changes, processing speed trends — is only visible when the same brain is observed across weeks and months, not in a single session. Cross-sectional models cannot detect it. The LBM can.
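As a rough illustration of what "baseline drift" means numerically, the sketch below fits a least-squares line through one person's per-session scores and reports the slope per 30 days. The score scale, the function name, and the sample data are all hypothetical; the LBM's own longitudinal modelling is not described on this page and is certainly richer than a straight-line fit.

```python
from datetime import date, timedelta

def baseline_drift_per_month(sessions: list[tuple[date, float]]) -> float:
    """Least-squares slope of score against time, in score units per 30 days."""
    if len(sessions) < 2:
        raise ValueError("need at least two sessions")
    start = min(day for day, _ in sessions)
    xs = [(day - start).days / 30.0 for day, _ in sessions]
    ys = [score for _, score in sessions]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("sessions must span more than one day")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Hypothetical attention scores from four sessions taken 30 days apart.
first = date(2025, 1, 1)
history = [(first + timedelta(days=30 * k), 0.80 - 0.02 * k) for k in range(4)]
drift = baseline_drift_per_month(history)
print(f"drift: {drift:+.3f} per 30 days")
```

A single-session model sees only one of these points, which is why it cannot report a trend at all.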

Robust real-world inference

Emotiv-FM reconstructs full cognitive-state representations from incomplete or degraded channel sets, using surrounding signal context to fill gaps. The model degrades gracefully rather than failing when electrode contact is imperfect — a property that matters for any device worn in everyday life.
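The input side of this behaviour can be pictured as a per-channel validity mask. The sketch below flags channels whose contact quality falls under a threshold and fills them with the mean of the remaining good channels. This is a deliberately naive stand-in: the model described above learns its reconstruction from context, and the quality scale, threshold, and function name here are assumptions for illustration only.

```python
def mask_and_fill(
    sample: dict[str, float],
    quality: dict[str, float],
    min_quality: float = 0.5,
) -> tuple[dict[str, float], dict[str, bool]]:
    """Return (filled sample, validity mask) for one multi-channel reading.

    Channels with quality below `min_quality` (hypothetical 0-to-1 scale)
    are replaced by the mean of the good channels and marked False.
    """
    good = {ch: v for ch, v in sample.items() if quality.get(ch, 0.0) >= min_quality}
    if not good:
        raise ValueError("no usable channels in this sample")
    fill_value = sum(good.values()) / len(good)
    filled = {ch: (v if ch in good else fill_value) for ch, v in sample.items()}
    mask = {ch: ch in good for ch in sample}
    return filled, mask

# Hypothetical reading in microvolts; T7 has poor electrode contact.
reading = {"AF3": 10.0, "AF4": 14.0, "T7": 99.0}
contact = {"AF3": 0.9, "AF4": 0.8, "T7": 0.1}
filled, mask = mask_and_fill(reading, contact)
print(filled, mask)
```

Passing the mask alongside the filled values lets a downstream model know which inputs were observed and which were imputed, instead of failing on the bad channel.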

No per-user calibration

Inter- and intra-subject normalization is learned from 170,000+ subjects across 140+ countries. The model works accurately out of the box for a global consumer base — no onboarding friction, no per-user setup required.

On-device at frontier accuracy

The 1–14M parameter architecture runs inference directly on consumer wearable silicon without a cloud round-trip. For applications where latency, privacy, or connectivity matter, on-device inference is not a compromise — it is the design intent.

The EEG Data Fragmentation Crisis

The EEG neurotech industry is at an inflection point. Investment is at record levels, hardware form factors are maturing, and the first generation of neural wearables is reaching consumers. The constraint is no longer the hardware. It is the data layer beneath it — where every device speaks a different language, every dataset lives in isolation, and every team rebuilds the same preprocessing infrastructure from scratch.

Inconsistent & Noisy Data

EEG devices vary wildly in their specifications — from 2-channel consumer wearables to 256-channel research systems. Each uses different sensor placements, input impedances, voltage resolutions, and signal-to-noise ratios. This hardware heterogeneity means findings from one device rarely transfer to another, fragmenting the knowledge base and limiting scalability.

Task-Specific Models

Current machine learning approaches in EEG are narrowly focused, with separate models for sleep staging, emotion detection, motor imagery, and seizure prediction. Each model starts from scratch, unable to leverage representations learned from other tasks. This siloed approach wastes resources and prevents breakthroughs across applications.

Inter- and Intra-Subject Variability

Brain signals are inherently personal. Inter-subject differences — cortical folding patterns, skull thickness, individual psychological profiles — often overwhelm the neural signatures researchers seek to measure. Intra-subject variability from circadian rhythms, hormonal cycles, fatigue, and aging compounds the challenge further.

Stuck on Data Preparation

Researchers report spending 60–80% of their time on artifact removal, signal filtering, and data alignment — tedious work that must be repeated for every new dataset and device. This leaves minimal time for actual discovery and development, creating a bottleneck that affects academia and industry alike.

Introducing the Large Brain Model

We're building the industry's first universal EEG foundation model — the intelligence layer every neurotech company can build on. By unifying fragmented neural data into a single intelligent system, the LBM does for brain signals what GPT did for language.

Scientific Validation

Our work is backed by research published in top-tier academic venues, demonstrating superior performance, generalizability, and efficiency.

EEG2Rep: Enhancing Self-supervised EEG Representation Through Informative Masked Inputs

Published at the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD'24)

🏆 KDD '24 Audience Appreciation Award — 1st Place

Novel EEG Architecture

First work to redesign the JEPA architecture specifically for EEG — a teacher–student model tailored to brain recording noise and variability.

Informative Masking

New masking strategy enabling rich representations without device-specific calibration.

Scalable and Noise-Resilient

+10.96% accuracy and +8.18% AUCROC gains over baselines. Strong robustness to noise.

Access Paper

EEGM2: An Efficient Mamba-2-Based Self-Supervised Framework for Long-Sequence EEG Modeling

2025 — Submitted for publication

Efficiency

Mamba-2 architecture offers linear computational complexity for long EEG sequences — outperforms transformer models that struggle with memory and speed.

SOTA Performance

State-of-the-art accuracy on long-sequence tasks.

Robustness

Spatiotemporal loss and multi-branch input embedding improve robustness across subjects and sequence lengths.

Access Paper
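To make the linear-versus-quadratic point concrete, the sketch below counts the work each approach does as a recording gets longer: full self-attention touches every pair of time steps, while a state-space scan makes one state update per step. The 128 Hz sample rate and the two window lengths are assumptions chosen for illustration, and the counts ignore constant factors, so they show scaling rather than real runtimes.

```python
def sequence_length(seconds: float, sample_rate_hz: int) -> int:
    """Number of time steps in a recording of the given duration."""
    return int(seconds * sample_rate_hz)

def attention_pairs(n_steps: int) -> int:
    """Pairwise interactions in full self-attention: grows as n^2."""
    return n_steps * n_steps

def scan_steps(n_steps: int) -> int:
    """State updates in a recurrent state-space scan: grows as n."""
    return n_steps

short_window = sequence_length(10, 128)
long_window = sequence_length(100, 128)

attention_growth = attention_pairs(long_window) // attention_pairs(short_window)
scan_growth = scan_steps(long_window) // scan_steps(short_window)

print(f"10x longer recording: attention work x{attention_growth}, scan work x{scan_growth}")
```

A tenfold longer recording costs a scan ten times the work and full attention a hundred times the work, which is the gap that matters on memory-constrained wearable silicon.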

EEG-X: Device-Agnostic Foundation Model for EEG

2025 — Submitted for publication

Device-Agnostic Design

Generalizes across EEG devices and channel layouts using location-based channel embeddings covering 10-05, 10-10, and 10-20 systems.

Zero-Shot Inference

Pretrain once, then run inference on completely unseen headsets: truly universal EEG analysis.

Proven Performance

+2.41% accuracy and +2.32% AUCROC over state-of-the-art baselines.

SAMBA: Toward a Long-Context EEG Foundation Model via Spatial Embedding and Differential Mamba

2025 — Submitted for publication

Scalable Long-Context Modeling

Linear time complexity for efficient modelling of long EEG recordings (e.g. 100 seconds) without high memory usage.

Device-Agnostic Spatial Embedding

SAIE uses 3D electrode coordinates to unify data from different montages and channel counts.

Superior Performance

Outperforms state-of-the-art methods across 13 datasets while maintaining low memory and fast inference speed.
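The idea of unifying montages through electrode coordinates can be sketched in a few lines: if each channel is represented by features of its 3D position instead of its index, two headsets that share an electrode location get the same representation for it. The code below is a generic Fourier-feature position encoding, not the SAIE module from the paper. The electrode positions are idealized unit-sphere approximations of 10-20 sites, not measured coordinates, and all names are illustrative.

```python
import math

def sphere_point(tilt_deg: float, azimuth_deg: float) -> tuple[float, float, float]:
    """Point on a unit sphere, tilted away from the vertex (Cz).

    Axes: x = right, y = front, z = up. Azimuth 0 = right, 90 = front.
    """
    t, a = math.radians(tilt_deg), math.radians(azimuth_deg)
    return (math.sin(t) * math.cos(a), math.sin(t) * math.sin(a), math.cos(t))

# Idealized 10-20 positions (20% of a 180-degree arc = 36 degrees per step).
ELECTRODES = {
    "Cz": sphere_point(0, 0),
    "Fz": sphere_point(36, 90),
    "Pz": sphere_point(36, 270),
    "C3": sphere_point(36, 180),
    "C4": sphere_point(36, 0),
    "T7": sphere_point(72, 180),
    "T8": sphere_point(72, 0),
}

def position_embedding(xyz: tuple[float, float, float], n_freqs: int = 4) -> list[float]:
    """Sin/cos features of each coordinate at doubling frequencies."""
    features: list[float] = []
    for coord in xyz:
        for k in range(n_freqs):
            angle = (2 ** k) * math.pi * coord
            features.extend((math.sin(angle), math.cos(angle)))
    return features

def embed_montage(channel_names: list[str]) -> list[list[float]]:
    """One position embedding per channel, in the order given."""
    return [position_embedding(ELECTRODES[name]) for name in channel_names]

earbud = embed_montage(["T7", "T8"])                               # 2 channels
cap = embed_montage(["Fz", "Cz", "Pz", "C3", "C4", "T7", "T8"])    # 7 channels
print(len(earbud[0]), earbud[0] == cap[5])
```

The 2-channel and 7-channel montages differ in size and ordering, yet T7 and T8 carry identical embeddings in both, so a model consuming these vectors needs no per-device channel table.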

SpellerSSL: Self-Supervised Learning with P300 Aggregation for Speller BCIs

2025 — Submitted for publication

First SSL framework for P300 spellers

Significantly reduces time-consuming calibration and improves robustness across subjects.

SOTA Performance

94% character recognition rate with only 7 repetitions and highest information transfer rate of 21.86 bits/min.
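Information transfer rate in speller BCIs is commonly computed with the Wolpaw formula, which combines the number of selectable targets, the selection accuracy, and the time per selection. The sketch below implements that standard formula. The 36-target grid and the 12-second selection time in the usage lines are assumptions for illustration, not the paper's experimental settings, so the result is not expected to reproduce the 21.86 bits/min figure.

```python
import math

def wolpaw_bits_per_selection(n_targets: int, accuracy: float) -> float:
    """Bits conveyed by one selection among n_targets at the given accuracy."""
    if n_targets < 2:
        raise ValueError("need at least two targets")
    if not 0.0 < accuracy <= 1.0:
        raise ValueError("accuracy must be in (0, 1]")
    if accuracy == 1.0:
        return math.log2(n_targets)
    p = accuracy
    return (
        math.log2(n_targets)
        + p * math.log2(p)
        + (1 - p) * math.log2((1 - p) / (n_targets - 1))
    )

def itr_bits_per_minute(n_targets: int, accuracy: float, seconds_per_selection: float) -> float:
    """Wolpaw information transfer rate in bits per minute."""
    if seconds_per_selection <= 0:
        raise ValueError("seconds_per_selection must be positive")
    return wolpaw_bits_per_selection(n_targets, accuracy) * 60.0 / seconds_per_selection

# Hypothetical setup: 36-character grid, 94% accuracy, 12 s per character.
bits = wolpaw_bits_per_selection(36, 0.94)
rate = itr_bits_per_minute(36, 0.94, 12.0)
print(f"{bits:.2f} bits per selection, {rate:.2f} bits/min")
```

The formula shows why fewer repetitions matter: at fixed accuracy, halving the time per selection doubles the transfer rate.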

How It Works

Emotiv-FM is the inference layer built on the Large Brain Model. A two-stage architecture splits responsibility between the device and the cloud — on-device inference for latency-critical outputs, cloud inference for richer downstream applications.

On-device

Mamba-2 U-Net (1–4M parameters)

A lightweight state-space architecture purpose-built for streaming EEG. Linear compute scaling and a memory footprint that fits on consumer wearable silicon. Runs the latency-critical primitives — attention, cognitive load, fatigue, drowsiness, intent — directly on the device, with no cloud round-trip and no raw neural data leaving the user.

Cloud

EEGX transformer (~14M parameters)

A larger transformer trained on the full multi-channel corpus, hosted on Dcode.AI infrastructure. Handles the representational work behind richer downstream tasks — neurological state classification, longitudinal trend analysis, multi-modal fusion — and is the surface that partners access through the API.

Any compatible device streams signal in

From a 2-channel in-ear earbud to a 32-channel research cap. The model normalizes across channel counts, electrode positions, and form factors — the partner does not retrain for each new device.

Emotiv-FM produces standardised cognitive state outputs

Device-specific noise, inter-subject variability, and artifacts are suppressed. The output is a clean, model-ready representation of cognitive state — attention, load, fatigue, emotional valence, motor intent — calibrated against the full LBM corpus.

Partners build the application layer

Dashboards, classifiers, adaptive interfaces, longitudinal cohort tools — anything that consumes cognitive state. Because the upstream representation is consistent, the application layer sees brain-state differences, not inter-subject differences or hardware noise.
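The two-stage split described above can be read as a routing rule: latency-critical primitives stay on the device and richer tasks go to the cloud model. The sketch below encodes that rule using the task names listed in this section. The identifiers and the `route` function are hypothetical and do not represent an Emotiv-FM interface.

```python
# Task names taken from the on-device and cloud descriptions in this section.
# The snake_case identifiers are illustrative, not an official vocabulary.
ON_DEVICE_TASKS = frozenset(
    {"attention", "cognitive_load", "fatigue", "drowsiness", "intent"}
)
CLOUD_TASKS = frozenset(
    {
        "neurological_state_classification",
        "longitudinal_trend_analysis",
        "multimodal_fusion",
    }
)

def route(task: str) -> str:
    """Return which stage of the two-stage architecture would serve a task."""
    if task in ON_DEVICE_TASKS:
        return "on-device"
    if task in CLOUD_TASKS:
        return "cloud"
    raise ValueError(f"unknown task: {task}")

for name in ("fatigue", "longitudinal_trend_analysis"):
    print(f"{name} -> {route(name)}")
```

Keeping the two task sets disjoint makes the privacy property easy to state: anything routed on-device never requires raw signal to leave the user.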

How Partners Access the LBM

Dcode.AI is designed to serve four distinct partner types. The intelligence layer is the same — the access model is calibrated to the partner's hardware, scale, and use case.

Device Manufacturers

Embedded SDK licensing

The LBM runs inside your hardware via an embedded inference SDK — on-device, no cloud dependency, no raw neural data leaving the device. Designed for consumer wearables, AR glasses, hearables, and industrial hardware. Per-unit licensing scales with shipment volume.

Platform & Enterprise

Annual platform licensing

Annual licensing for neurotech platforms, pharmaceutical research, clinical research organizations, enterprise productivity tools, and defense and government contractors. Private on-premises deployment available for data-sovereignty requirements.

Developers & Researchers

Usage-based API

One endpoint. Standardized outputs: attention, cognitive load, fatigue, emotional valence, motor intent. Hardware-agnostic — Dcode.AI handles EEG preprocessing, artifact rejection, and cross-device normalization. No domain expertise required to get started.
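For a sense of what "one endpoint, standardized outputs" could look like in client code, the sketch below builds a request payload and parses a response carrying the five outputs named above. The wire format is not specified on this page, so every field name, the JSON layout, and both helper functions are hypothetical. No network call is made.

```python
import json
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class CognitiveState:
    """The five standardized outputs named in the text (hypothetical schema)."""
    attention: float
    cognitive_load: float
    fatigue: float
    emotional_valence: float
    motor_intent: float

def build_request(channels: list[str], sample_rate_hz: int, samples: list[list[float]]) -> str:
    """Serialize raw EEG as JSON: one row per time step, one value per channel."""
    if any(len(row) != len(channels) for row in samples):
        raise ValueError("every sample row must have one value per channel")
    return json.dumps(
        {"channels": channels, "sample_rate_hz": sample_rate_hz, "samples": samples}
    )

def parse_response(body: str) -> CognitiveState:
    """Parse a JSON response into a typed CognitiveState."""
    data = json.loads(body)
    return CognitiveState(**{f.name: float(data[f.name]) for f in fields(CognitiveState)})

# Hypothetical 2-channel earbud request and a made-up response body.
payload = build_request(["T7", "T8"], 128, [[1.5, -0.5], [2.0, 0.25]])
example_body = json.dumps(
    {"attention": 0.7, "cognitive_load": 0.4, "fatigue": 0.2,
     "emotional_valence": 0.1, "motor_intent": 0.0}
)
state = parse_response(example_body)
print(state)
```

The client sends raw samples and channel names only, which reflects the claim that preprocessing, artifact rejection, and cross-device normalization happen on the service side.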

Direct & Consumer

Direct platform access

Users who want more than their device manufacturer exposes access Dcode.AI directly — personal cognitive data, custom protocols, and the ability to build personal applications on their own neural signal.

AI-powered neuro research

The LBM's training across thousands of experimental paradigms enables predicted neural response distributions for stimuli — packaging designs, content concepts, product formulations — without participant recruitment or hardware deployment. Results in hours rather than weeks, at a fraction of conventional EEG study costs. Built for consumer insights, pharmaceutical research, and advertising effectiveness.

Enquire

What the LBM Enables

The LBM is a general-purpose foundation model for cognitive state. The applications below are not aspirational — they are the categories of product that the corpus and the model architecture make tractable today.

| Application category | What the model provides | Why the LBM corpus matters |
|---|---|---|
| Consumer wearables: attention, focus, fatigue | On-device cognitive state stream usable in earbuds, headbands, AR glasses | No per-user calibration; works across 140+ countries of demographic variation |
| Workplace safety and operator monitoring | Real-time drowsiness and cognitive overload detection in industrial settings | Event-marked corpus produces accurate state-transition detection, not lagging averages |
| Clinical and neurological assessment | Standardized cognitive markers for screening, monitoring, and longitudinal studies | 15 years of paradigm diversity covers the full waking cognitive spectrum |
| Adaptive learning and education | Continuous attention and comprehension signal for learning platforms | Subject diversity ensures generalization across learner populations, not WEIRD-only |
| Gaming, XR, and human-computer interaction | Intent and engagement signal as a control surface for next-generation interfaces | Cross-device transfer means a single model serves the full hardware ecosystem |
| AI-powered neuro research | Predicted neural response distributions for stimuli (packaging, content, formulations) without participant recruitment | Training across thousands of paradigms produces response predictions that generalize across the full waking cognitive spectrum |
| Research infrastructure for academic and pharma | A foundation model the field can fine-tune against, replacing one-off pipelines | The 4,000+ institution network is already the de facto standard substrate |
| Multi-modal AI fusion | EEG as a complementary signal to vision, language, and physiological models | Standardized, model-ready representations make EEG composable with other AI stacks |

Request API Access

What we bring

Partners do not need to build the corpus, the model, or the infrastructure. We bring the full stack — and a 15-year head start that cannot be replicated on a product timeline.

| Layer | What Dcode.AI provides |
|---|---|
| Corpus | 300,000+ hours of EEG signal across 170,000+ subjects in 140+ countries: the largest and most demographically diverse training corpus in the field |
| Foundation model | Emotiv-FM: an on-device Mamba-2 U-Net (1–4M parameters) paired with a cloud EEGX transformer (~14M parameters) |
| Inference infrastructure | On-device runtime SDK plus the cloud inference API, with the engineering organization to maintain both at production scale |
| Hardware reference | 15 years of EEG hardware design (EPOC through MN8) as the reference for sensor placement, noise envelope, and form-factor constraints; partners can use Emotiv hardware, design their own, or both |
| Scientific authority | 23,000+ scholarly works indexed by Google Scholar across the user network, plus 5 foundation-model papers: EEG2Rep (published at KDD '24) and SpellerSSL, EEGM2, EEG-X, and SAMBA (submitted for publication in 2025) |
| Regulatory and ethical posture | User-owned data architecture, on-device inference for privacy-critical workloads, and 15 years of institutional review experience across the global research network |

Meet Our Team

We are a group of neuroscientists, AI researchers, and engineers committed to building the future of non-invasive brain decoding.

Tan

CEO

Geoff

CTO

Patrick

VP of Software Engineering

Navid

Foundation Model Architect & Lead

Jiazhen

AI Research Scientist

Nam

Director of Corpus Curation

Nikolas

Director of Neuroscience & Research

Cuong

Senior ML Ops Engineer

Quoc

ML Ops Engineer

More About the Team

How we work together

A partnership with Dcode.AI is a structured pathway, not a one-off integration. The stages below describe how we have run every serious partner engagement to date.

Scoping conversation

A short technical discussion to map the partner's hardware, target form factor, and the cognitive primitives the product requires. Output: a written scope and a recommendation on which model surface (on-device, cloud, or both) the partner will consume.

Technical evaluation

Access to the API and the on-device SDK against the partner's hardware and a representative data sample. Output: measured accuracy on the partner's target tasks, on the partner's silicon — not a benchmark in our environment.

Integration and co-development

A scoped engineering engagement: SDK integration, model tuning where warranted, joint validation on the partner's user population. Co-development is the norm — the LBM gets sharper with every serious integration, and partners get a model calibrated to their deployment.

Production and ongoing model access

Licensed access to the production model, with versioned releases, SLAs, and a model-improvement loop that flows back into the corpus. Partners ship on a foundation model that continues to improve underneath their product.

The user owns their data.

Our architecture is built around on-device inference for privacy-critical workloads and a user-owned data model end-to-end. Partners do not need to take on the regulatory and reputational risk of routing raw neural signal through their own infrastructure — and end users do not need to surrender their brain data to use the product.

Brainwear by Emotiv

The consumer product built on the LBM — in-ear EEG that measures cognitive state directly, not inferred from heart rate or movement. Every Brainwear session adds longitudinal real-world data to the training corpus: naturalistic, everyday neural signal in the modality with the least public dataset coverage in the field.

Learn About Brainwear

Get Early Access

Be among the first to build on the Large Brain Model. Request API access for updates on our platform launch — or request the deck if you are evaluating an investment.

Request API access

For researchers & developers

Request the deck →

For investors

dcode.ai · built on the Emotiv corpus · emotiv.com
