

The Large Brain Model
LLMs were trained on what humans write. They were never trained on what humans experience. The LBM is the first foundation model built on the second—15 years of healthy human neural data, researcher-collected across 4,000+ institutions in 140+ countries.

1–14M parameters. State-of-the-art performance. Runs on-device.
The EEGM2 architecture (arXiv:2502.17873) outperforms every published EEG foundation model, including models with up to 110M parameters, with linear compute scaling that fits on consumer wearable silicon.
Every major EEG dataset was collected from patients in clinical settings. Those that weren't suffer from the WEIRD problem — Western, Educated, Industrialised, Rich, Democratic populations only. Neither can train a general foundation model. Emotiv's can.
4,000+ institutions in the network: paying research institutions and global enterprises, including Fortune 500 companies
140+ countries represented: the only EEG corpus with full non-WEIRD demographic coverage
23,000+ scholarly works indexed by Google Scholar, across 15 years of organic institutional adoption
15 years of continuous collection: 245M+ channel-minutes, 41× the most-cited EEG foundation model
Dcode.AI is built on the Emotiv corpus: the result of 15 years of sensor-network deployment, 4,000+ institutional partnerships, and researcher-collected, event-marked data from day one. Every session is event-marked through EmotivPRO, a single software protocol used across every institution and every country.
Maximum population diversity. Uniform technical structure. The corpus cannot be acquired, replicated, or assembled with capital alone. Minimum replication time: 15+ years.
Every other EEG foundation model trains on public datasets only. The LBM trains on all of those datasets plus 240,000+ hours of proprietary signal that no other model can access.
| Model | Training hours | Channel-minutes | Subjects |
|---|---|---|---|
| LaBraM (ICLR 2024 Spotlight) | ~2,500 | ~4.8M | 3,857 |
| EEGFormer (full TUH corpus) | n/a | ~32.5M | 3,394 |
| REVE (2025), largest public-only effort | ~60,000 | ~50–70M (est.) | 24,274 |
| Emotiv LBM (proprietary + public)* | 300,000+ | 245M+ | 170,000+ |
* The Emotiv LBM training-hours figure (300,000+) covers the total corpus: all major public EEG datasets plus 240,000+ hours of proprietary signal. The channel-minutes figure (245M+) covers the same combined corpus.
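Channel-minutes multiply recording time by the number of channels recorded, so dividing a corpus's channel-minutes by its training hours gives the average channel count it implies. A short worked check in Python, using only the approximate figures stated in the table and footnote:

```python
# Channel-minutes = hours x 60 (minutes per hour) x channels recorded.
# Inverting that gives the average channel count a corpus implies.
# Inputs are the approximate figures from the table and footnote above.

def implied_mean_channels(hours: float, channel_minutes: float) -> float:
    """Average number of channels per recorded minute."""
    return channel_minutes / (hours * 60)

corpora = {
    "LaBraM": (2_500, 4.8e6),
    "Emotiv LBM": (300_000, 245e6),
}
for name, (hours, ch_min) in corpora.items():
    print(f"{name}: ~{implied_mean_channels(hours, ch_min):.1f} channels on average")

# Footnote split: total corpus hours minus proprietary hours = public hours.
public_hours = 300_000 - 240_000
print(f"Public-dataset share of the LBM corpus: ~{public_hours:,} hours")
```

This prints about 32 channels for LaBraM and about 13.6 for the LBM. A lower average is what a corpus spanning 2- to 32-channel devices would be expected to produce, and the roughly 60,000 public hours sit at the same scale as REVE's public-only corpus.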
| Corpus property | What it produces in the model | What it means for the device |
|---|---|---|
| Full device-range training (2ch–32ch) | Spatial representations that transfer across channel densities | A model running on 2-channel in-ear hardware benefits from what it learned at 32-channel resolution |
| 170,000+ unique subjects, 140+ countries | Robust inter- and intra-subject normalisation | Works accurately out of the box for a global consumer base, with no per-user calibration |
| Thousands of experimental paradigms | Generalised cognitive state representations across the full waking spectrum | Supports the full application roadmap from a single model |
| Large event-marked subset | The model learns cognitive transitions, not just static states | Accurate state-change detection: the device responds the moment attention shifts |
| On-device architecture (1–14M parameters) | Frontier accuracy at a size that fits on-device silicon | Real-time inference on the device itself: no cloud round-trip, no network latency, no raw neural data exposure |
"The reason no general-purpose neural foundation model exists is not a model architecture problem. It is a data problem. The corpus required to train one has never existed — until now."
— Tan Le, CEO & Founder, Emotiv / Dcode.AI
AGI cannot be reached by scraping the internet alone. The frontier labs have exhausted high-quality text. Synthetic data shows diminishing returns. The race is on for a new modality — and cognitive signal is the only one that requires a physical sensor network to capture.
LLMs learn from text. VLMs learn from images. The emerging class of Biological Large Models (BLMs) learns from the body: physiological signal from wrist-worn sensors, and cognitive signal from the brain. The LBM is the cognitive BLM. It models the signal that heart rate monitors, sleep trackers, and wrist-worn wearables cannot reach, no matter how much data they accumulate. Attention, emotion, fatigue, and intent require a different sensor, a different corpus, and a different model.
1. The investment wave
Record capital is flowing into the category
More capital has flowed into neurotech over the past five quarters than in any comparable period in the field's history — BCI hardware, neural wearables, and brain-computer interfaces are attracting category-defining investment. Every dollar going into device hardware is building demand for the intelligence layer above it. That layer is not yet built. That is the opportunity.
2. The benchmark
The field has its pre-ImageNet moment
Meta FAIR published NeuralBench (May 2026) — 36 EEG tasks, 14 architectures, 94 datasets. Foundation models already outperform traditional ML across most EEG tasks. Models are still small, datasets narrow. The unlock is scale and corpus diversity. That corpus exists. It is ours. The LBM will be submitted to NeuralBench when full-corpus training is complete.
3. The platform wave
Platform health AI creates the market
Every major AI platform now has a health product built on physiological wearable data. When heart rate, sleep, and movement flow through platform AI, the differentiated layer becomes the one they cannot reach: what the brain is doing. The platform health AI wave does not compete with the LBM — it creates the market the LBM licenses into.
The LBM learns the patterns of what the brain does — across cognitive load, attention, fatigue, emotion, motor intent, and neurological state. Five capabilities distinguish it from every other published EEG model.
Cross-device cognitive transfer
Representations learned from one EEG hardware configuration generalise across different sensor layouts, channel counts, and wearable form factors with minimal or no retraining. Because the model is trained across Emotiv's full device range, from 2-channel in-ear through 32-channel research-grade, it learns a unified representation of cognitive state that is not tied to any single electrode configuration. A partner's device, whether a 2-channel earbud, a 6-channel headband, or a 16-channel AR headset, can pass its signal to Emotiv-FM, the inference layer built on the LBM, and receive accurate cognitive state outputs without bespoke retraining.
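One common way to decouple a model from any single electrode layout is to map each device's channels onto a shared electrode vocabulary and pass a presence mask with the signal. The sketch below assumes that approach. `VOCAB`, `to_shared_layout`, and the tensor shapes are hypothetical and are not Emotiv-FM's documented input format. The T3/T4/T5/T6 aliases reflect the standard renaming from 10-20 to 10-10 nomenclature.

```python
import numpy as np

# Hypothetical shared electrode vocabulary. A production system would cover
# the full 10-05 / 10-10 / 10-20 position sets; this short list is illustrative.
VOCAB = ["Fp1", "Fp2", "F3", "F4", "C3", "C4", "T7", "T8", "P7", "P8", "O1", "O2"]
INDEX = {name: i for i, name in enumerate(VOCAB)}

# Older 10-20 temporal labels and their 10-10 equivalents.
ALIASES = {"T3": "T7", "T4": "T8", "T5": "P7", "T6": "P8"}

def to_shared_layout(signal: np.ndarray, channel_names: list):
    """Scatter a (channels, samples) recording into the shared vocabulary.

    Returns a (len(VOCAB), samples) array and a boolean mask marking which
    vocabulary positions were actually recorded. Channels whose names are not
    in the vocabulary are dropped.
    """
    padded = np.zeros((len(VOCAB), signal.shape[1]), dtype=signal.dtype)
    mask = np.zeros(len(VOCAB), dtype=bool)
    for row, name in zip(signal, channel_names):
        name = ALIASES.get(name, name)
        if name in INDEX:
            padded[INDEX[name]] = row
            mask[INDEX[name]] = True
    return padded, mask

# Example: a hypothetical 4-channel headband that uses legacy labels.
headband = np.random.default_rng(0).normal(size=(4, 256))
x, present = to_shared_layout(headband, ["Fp1", "Fp2", "T3", "T4"])
print(x.shape, [VOCAB[i] for i in np.flatnonzero(present)])
```

The model then sees the same tensor shape from every device, and the mask tells it which positions carry real signal. In-ear electrodes sit outside the standard 10-20 grid, so they would need their own vocabulary entries or a coordinate-based embedding.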
Longitudinal baseline modelling
The LBM is trained on data from the same individuals observed over time — not just a population snapshot. This enables a qualitatively different kind of inference: not just "what is this person's cognitive state right now" but "how has this person's cognitive baseline shifted over the past six months." Intra-individual drift — gradual attention decay, sleep-architecture changes, processing speed trends — is only visible when the same brain is observed across weeks and months, not in a single session. Cross-sectional models cannot detect it. The LBM can.
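Longitudinal inference rests on comparing a person with their own earlier sessions rather than with a population norm. A minimal sketch of that comparison, assuming one summary score per session (for example, a daily mean attention score). `baseline_shift` and the window sizes are hypothetical, and this page does not describe how the LBM itself models longitudinal change.

```python
import numpy as np

def baseline_shift(scores: np.ndarray, baseline_n: int, recent_n: int) -> float:
    """How far a person's recent sessions sit from their own early baseline.

    `scores` holds one summary value per session, oldest first. Returns the
    mean of the last `recent_n` sessions minus the mean of the first
    `baseline_n`, in units of the baseline's standard deviation.
    """
    if baseline_n + recent_n > len(scores):
        raise ValueError("not enough sessions for the requested windows")
    baseline, recent = scores[:baseline_n], scores[-recent_n:]
    spread = baseline.std(ddof=1)
    if spread == 0:
        raise ValueError("baseline has zero variance")
    return float((recent.mean() - baseline.mean()) / spread)

# Simulated six months of daily scores with a slow downward drift.
rng = np.random.default_rng(0)
days = np.arange(180)
scores = 0.7 - 0.001 * days + rng.normal(0.0, 0.02, days.size)
print(f"shift vs. first month: {baseline_shift(scores, 30, 30):+.1f} SD")
```

Each day's change in the simulation is small relative to session-to-session noise. The drift only stands out once the last month is compared with the first.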
Robust real-world inference
Emotiv-FM reconstructs full cognitive-state representations from incomplete or degraded channel sets, using surrounding signal context to fill gaps. The model degrades gracefully rather than failing when electrode contact is imperfect — a property that matters for any device worn in everyday life.
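Classical EEG pipelines repair a poorly contacting electrode by interpolating it from its neighbours, whereas Emotiv-FM is described as learning this from signal context. For comparison, here is a minimal inverse-distance-weighted interpolation, a simpler stand-in for the spherical-spline interpolation common in EEG toolkits. The function name and coordinates are hypothetical.

```python
import numpy as np

def interpolate_bad_channels(signal, coords, bad, power=2.0):
    """Replace bad channels with an inverse-distance-weighted mix of good ones.

    signal: (channels, samples) array
    coords: (channels, 3) electrode positions
    bad:    boolean array (channels,), True where contact is poor
    """
    bad = np.asarray(bad, dtype=bool)
    good = ~bad
    if not good.any():
        raise ValueError("no good channels to interpolate from")
    repaired = np.array(signal, dtype=float)  # copy, so the input is untouched
    for ch in np.flatnonzero(bad):
        dist = np.linalg.norm(coords[good] - coords[ch], axis=1)
        weights = 1.0 / np.maximum(dist, 1e-9) ** power  # nearer = heavier
        weights /= weights.sum()
        repaired[ch] = weights @ repaired[good]
    return repaired
```

A fixed spatial rule like this cannot draw on temporal context or corpus-level statistics. That limitation is what a learned reconstruction is meant to overcome when several electrodes degrade at once.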
No per-user calibration
Inter- and intra-subject normalisation is learned from 170,000+ subjects across 140+ countries. The model works accurately out of the box for a global consumer base — no onboarding friction, no per-user setup required.
On-device at frontier accuracy
The on-device Mamba-2 U-Net (1–4M parameters) runs inference directly on consumer wearable silicon without a cloud round-trip. For applications where latency, privacy, or connectivity matter, on-device inference is not a compromise; it is the design intent.
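The on-device claim is easiest to sanity-check with weight-storage arithmetic, using the parameter counts stated elsewhere on this page: 1–4M for the on-device Mamba-2 U-Net and ~14M for the cloud EEGX transformer. The numeric precisions below are assumptions, since the page does not say which format Emotiv-FM ships in.

```python
# Weight storage = parameter count x bytes per parameter.
# Activations, streaming state, and runtime code are extra and ignored here.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}

def weight_megabytes(params: float, dtype: str) -> float:
    """Megabytes (10^6 bytes) needed to store the weights alone."""
    return params * BYTES_PER_PARAM[dtype] / 1e6

for params in (1e6, 4e6, 14e6):
    sizes = ", ".join(
        f"{dtype} {weight_megabytes(params, dtype):.0f} MB" for dtype in BYTES_PER_PARAM
    )
    print(f"{params / 1e6:.0f}M parameters: {sizes}")
```

At 32-bit precision, the 1–4M-parameter on-device model needs roughly 4–16 MB for its weights, and 8-bit quantisation would cut that to 1–4 MB. The ~14M-parameter cloud transformer needs about 56 MB at 32-bit precision.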
The EEG neurotech industry is at an inflection point. Investment is at record levels, hardware form factors are maturing, and the first generation of neural wearables is reaching consumers. The constraint is no longer the hardware. It is the data layer beneath it — where every device speaks a different language, every dataset lives in isolation, and every team rebuilds the same preprocessing infrastructure from scratch.
Inconsistent & Noisy Data
EEG devices vary wildly in their specifications — from 2-channel consumer wearables to 256-channel research systems. Each uses different sensor placements, input impedances, voltage resolutions, and signal-to-noise ratios. This hardware heterogeneity means findings from one device rarely transfer to another, fragmenting the knowledge base and limiting scalability.
Task-Specific Models
Current machine learning approaches in EEG are narrowly focused, with separate models for sleep staging, emotion detection, motor imagery, and seizure prediction. Each model starts from scratch, unable to leverage representations learned from other tasks. This siloed approach wastes resources and prevents breakthroughs across applications.
Inter- and Intra-Subject Variability
Brain signals are inherently personal. Inter-subject differences — cortical folding patterns, skull thickness, individual psychological profiles — often overwhelm the neural signatures researchers seek to measure. Intra-subject variability from circadian rhythms, hormonal cycles, fatigue, and aging compounds the challenge further.
Stuck on Data Preparation
Researchers report spending 60–80% of their time on artifact removal, signal filtering, and data alignment — tedious work that must be repeated for every new dataset and device. This leaves minimal time for actual discovery and development, creating a bottleneck that affects academia and industry alike.
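The filtering part of that preparation work typically looks like the sketch below: a zero-phase band-pass plus a notch at the mains frequency, written with SciPy. The 1–40 Hz band, 4th-order Butterworth design, and notch Q are common illustrative choices, not Emotiv's pipeline. The mains frequency is 50 Hz or 60 Hz depending on region.

```python
import numpy as np
from scipy import signal

def basic_clean(eeg: np.ndarray, fs: float, band=(1.0, 40.0), mains_hz=50.0) -> np.ndarray:
    """Zero-phase band-pass and mains notch along the last (time) axis."""
    sos = signal.butter(4, band, btype="bandpass", fs=fs, output="sos")
    filtered = signal.sosfiltfilt(sos, eeg, axis=-1)
    b, a = signal.iirnotch(mains_hz, Q=30.0, fs=fs)
    return signal.filtfilt(b, a, filtered, axis=-1)

# Synthetic check: a 10 Hz rhythm contaminated by 50 Hz mains interference.
fs = 256.0
t = np.arange(0, 20, 1 / fs)
raw = np.sin(2 * np.pi * 10 * t) + 0.5 * np.sin(2 * np.pi * 50 * t)
clean = basic_clean(raw[np.newaxis, :], fs)

mid = slice(1024, -1024)  # skip 4 s at each edge, where filter transients live
err = np.abs(clean[0, mid] - np.sin(2 * np.pi * 10 * t[mid])).max()
print(f"max deviation from the pure 10 Hz rhythm: {err:.4f}")
```

Artifact removal and cross-dataset alignment come on top of this, and the parameters above typically need re-tuning per device and dataset.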

We're building the industry's first universal EEG foundation model — the intelligence layer every neurotech company can build on. By unifying fragmented neural data into a single intelligent system, the LBM does for brain signals what GPT did for language.
Our work is backed by peer-reviewed research at top-tier academic venues and a series of papers submitted for publication, which together demonstrate superior performance, generalisability, and efficiency.
EEG2Rep: Enhancing Self-supervised EEG Representation Through Informative Masked Inputs
Published at the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD'24)
🏆 KDD '24 Audience Appreciation Award — 1st Place
Novel EEG Architecture
First work to redesign the JEPA architecture specifically for EEG — a teacher–student model tailored to brain recording noise and variability.
Informative Masking
New masking strategy enabling rich representations without device-specific calibration.
Scalable and Noise-Resilient
+10.96% accuracy and +8.18% AUCROC gains over baselines. Strong robustness to noise.
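JEPA-style teacher–student training typically has two moving parts. The first is a teacher whose weights are an exponential moving average (EMA) of the student's. The second is a masking rule that decides which parts of the input the student must predict in the teacher's representation space. A generic sketch of both follows. The decay value, mask ratio, and single contiguous block are assumptions, and EEG2Rep's informative masking strategy is more specific than this.

```python
import numpy as np

def ema_update(teacher: dict, student: dict, decay: float = 0.996) -> dict:
    """Move each teacher weight a small step toward the matching student weight."""
    return {k: decay * teacher[k] + (1.0 - decay) * student[k] for k in teacher}

def contiguous_mask(n_tokens: int, mask_ratio: float, rng: np.random.Generator) -> np.ndarray:
    """Mask one contiguous run of tokens.

    Hiding a whole stretch forces the student to predict it from context,
    rather than interpolating from immediately adjacent samples.
    """
    n_masked = int(round(n_tokens * mask_ratio))
    start = rng.integers(0, n_tokens - n_masked + 1)
    mask = np.zeros(n_tokens, dtype=bool)
    mask[start:start + n_masked] = True
    return mask

rng = np.random.default_rng(0)
mask = contiguous_mask(n_tokens=64, mask_ratio=0.5, rng=rng)
teacher = {"w": np.zeros(3)}
student = {"w": np.ones(3)}
teacher = ema_update(teacher, student, decay=0.9)
print(mask.sum(), teacher["w"])
```

Because the teacher changes slowly, its targets stay stable while the student learns. This is the usual motivation for the EMA design.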
EEGM2: An Efficient Mamba-2-Based Self-Supervised Framework for Long-Sequence EEG Modeling
2025 — Submitted for publication
Efficiency
Mamba-2 offers linear computational complexity in sequence length and outperforms transformer models, whose quadratic-cost attention struggles with memory and speed on long EEG sequences.
SOTA Performance
State-of-the-art accuracy on long-sequence tasks.
Robustness
Spatiotemporal loss and multi-branch input embedding improve robustness across subjects and sequence lengths.
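The linear-complexity claim comes from the state-space recurrence at the core of Mamba-2. Each new sample updates a fixed-size hidden state instead of attending to every earlier sample. The following simplified sketch makes several assumptions: one input channel, a scalar decay per step (as in Mamba-2's scalar-times-identity state matrix), and no discretisation, gating, or multi-head structure. It is checked against the equivalent quadratic, attention-like form.

```python
import numpy as np

def ssm_scan(x, a, B, C):
    """Linear-time recurrence over T steps with an N-dimensional state:

        h_t = a_t * h_{t-1} + B_t * x_t,   y_t = C_t . h_t

    x: (T,) inputs; a: (T,) per-step scalar decays in (0, 1];
    B, C: (T, N) input-dependent projections. Cost is O(T * N).
    """
    h = np.zeros(B.shape[1])
    y = np.empty(len(x))
    for t in range(len(x)):
        h = a[t] * h + B[t] * x[t]
        y[t] = C[t] @ h
    return y

def ssm_quadratic(x, a, B, C):
    """The same outputs computed in the attention-like form, O(T^2 * N):

        y_t = sum_{s <= t} (C_t . B_s) * (a_{s+1} * ... * a_t) * x_s
    """
    y = np.zeros(len(x))
    for t in range(len(x)):
        for s in range(t + 1):
            decay = np.prod(a[s + 1:t + 1])  # empty product = 1 when s == t
            y[t] += (C[t] @ B[s]) * decay * x[s]
    return y

rng = np.random.default_rng(0)
T, N = 32, 4
x = rng.normal(size=T)
a = rng.uniform(0.5, 1.0, size=T)
B, C = rng.normal(size=(T, N)), rng.normal(size=(T, N))
print(np.allclose(ssm_scan(x, a, B, C), ssm_quadratic(x, a, B, C)))
```

Both functions produce the same outputs. Only the scan's cost grows linearly with sequence length, which is what keeps long recordings tractable.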
EEG-X: Device-Agnostic Foundation Model for EEG
2025 — Submitted for publication
Device-Agnostic Design
Generalises across EEG devices and channel layouts using location-based channel embeddings covering 10-05, 10-10, and 10-20 systems.
Zero-Shot Inference
Pretrain and infer on completely unseen headsets — truly universal EEG analysis.
Proven Performance
+2.41% accuracy and +2.32% AUCROC over state-of-the-art baselines.
SAMBA: Toward a Long-Context EEG Foundation Model via Spatial Embedding and Differential Mamba
2025 — Submitted for publication
Scalable Long-Context Modeling
Linear time complexity for efficient modelling of long EEG recordings (e.g. 100 seconds) without high memory usage.
Device-Agnostic Spatial Embedding
SAIE uses 3D electrode coordinates to unify data from different montages and channel counts.
Superior Performance
Outperforms state-of-the-art methods across 13 datasets while maintaining low memory and fast inference speed.
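Coordinate-based spatial embeddings work because a channel's position on the head, not its index in a particular device's channel list, identifies its signal. Below is a minimal sketch of one generic way to turn 3D electrode coordinates into embeddings, using fixed sinusoidal features in the style of transformer positional encodings. It illustrates the idea only and is not the SAIE module, and the coordinates are hypothetical.

```python
import numpy as np

def coordinate_embedding(coords: np.ndarray, dim_per_axis: int = 8) -> np.ndarray:
    """Map (channels, 3) electrode positions to (channels, 3 * dim_per_axis)
    fixed sinusoidal features: sines and cosines of each x/y/z coordinate
    at a geometric ladder of frequencies.

    The embedding depends only on position, so any montage, at any channel
    count, is embedded by the same function.
    """
    if dim_per_axis % 2:
        raise ValueError("dim_per_axis must be even")
    freqs = 2.0 ** np.arange(dim_per_axis // 2)       # 1, 2, 4, ...
    angles = coords[:, :, None] * freqs * np.pi         # (channels, 3, n_freq)
    feats = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return feats.reshape(coords.shape[0], -1)

# Hypothetical head-centred coordinates, normalised to roughly [-1, 1].
montage_a = np.array([[-0.3, 0.9, 0.3], [0.3, 0.9, 0.3], [0.0, 0.0, 1.0]])
montage_b = np.array([[0.0, 0.0, 1.0]])  # a single-electrode device
emb_a, emb_b = coordinate_embedding(montage_a), coordinate_embedding(montage_b)
print(emb_a.shape, emb_b.shape, np.allclose(emb_a[2], emb_b[0]))
```

The same position yields the same embedding whether it comes from a three-channel or a one-channel device. This is what lets a single model accept different montages.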
SpellerSSL: Self-Supervised Learning with P300 Aggregation for Speller BCIs
2025 — Submitted for publication
First SSL framework for P300 spellers
Significantly reduces time-consuming calibration and improves robustness across subjects.
SOTA Performance
A 94% character recognition rate with only 7 repetitions, and the highest information transfer rate, 21.86 bits/min.
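Information transfer rate combines accuracy, the number of selectable symbols, and the time each selection takes. Below is a sketch of the widely used Wolpaw definition. This page does not state which ITR definition, matrix size, or timing SpellerSSL used, so the 36-symbol matrix is an assumption.

```python
import math

def wolpaw_bits_per_selection(n_classes: int, accuracy: float) -> float:
    """Bits per selection under the Wolpaw ITR definition:

        B = log2 N + P log2 P + (1 - P) log2((1 - P) / (N - 1))
    """
    n, p = n_classes, accuracy
    bits = math.log2(n)
    if p > 0:
        bits += p * math.log2(p)
    if p < 1:
        bits += (1 - p) * math.log2((1 - p) / (n - 1))
    return bits

def itr_bits_per_minute(n_classes: int, accuracy: float, seconds_per_selection: float) -> float:
    """Scale bits per selection to bits per minute."""
    return wolpaw_bits_per_selection(n_classes, accuracy) * 60.0 / seconds_per_selection

# Assumption: a standard 6 x 6 speller matrix (36 symbols).
bits = wolpaw_bits_per_selection(36, 0.94)
print(f"{bits:.2f} bits per character at 94% accuracy")
print(f"implied time per character at 21.86 bits/min: {bits * 60 / 21.86:.1f} s")
```

Under those assumptions, 94% accuracy carries about 4.5 bits per character. A rate of 21.86 bits/min would then correspond to roughly one character every 12 seconds. A different matrix size or ITR definition changes that figure.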
Emotiv-FM is the inference layer built on the Large Brain Model. A two-stage architecture splits responsibility between the device and the cloud — on-device inference for latency-critical outputs, cloud inference for richer downstream applications.
On-device
Mamba-2 U-Net (1–4M parameters)
A lightweight state-space architecture purpose-built for streaming EEG. Linear compute scaling and a memory footprint that fits on consumer wearable silicon. Runs the latency-critical primitives (attention, cognitive load, fatigue, drowsiness, intent) directly on the device, with no cloud round-trip and no raw neural data leaving the device.
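Streaming suits a state-space model because the hidden state is all that needs to persist between chunks of incoming signal. Below is a minimal sketch of chunked streaming under simplifying assumptions: one input channel, a scalar decay per step, and no gating or U-Net structure. `scan_chunk` and `stream` are hypothetical names, not the on-device SDK's API.

```python
import numpy as np

def scan_chunk(x, a, B, C, h):
    """Advance the recurrence h_t = a_t * h_{t-1} + B_t * x_t over one chunk,
    emitting y_t = C_t . h_t, and return the state to carry forward."""
    y = np.empty(len(x))
    for t in range(len(x)):
        h = a[t] * h + B[t] * x[t]
        y[t] = C[t] @ h
    return y, h

def stream(x, a, B, C, chunk):
    """Feed a recording through in fixed-size chunks, as a device would.
    Only the N-dimensional state h persists between chunks."""
    h = np.zeros(B.shape[1])
    outputs = []
    for start in range(0, len(x), chunk):
        part = slice(start, start + chunk)
        y, h = scan_chunk(x[part], a[part], B[part], C[part], h)
        outputs.append(y)
    return np.concatenate(outputs)

rng = np.random.default_rng(1)
T, N = 100, 8
x, a = rng.normal(size=T), rng.uniform(0.5, 1.0, size=T)
B, C = rng.normal(size=(T, N)), rng.normal(size=(T, N))
whole = stream(x, a, B, C, chunk=T)    # one pass over the full recording
live = stream(x, a, B, C, chunk=16)    # 16-sample chunks, state carried over
print(np.allclose(whole, live))
```

Chunked and whole-recording outputs match. The memory held between chunks is a single N-dimensional vector no matter how long the session runs, which is the property that fits a fixed memory budget.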
Cloud
EEGX transformer (~14M parameters)
A larger transformer trained on the full multi-channel corpus, hosted on Dcode.AI infrastructure. Handles the representational work behind richer downstream tasks — neurological state classification, longitudinal trend analysis, multi-modal fusion — and is the surface that partners access through the API.
1. Any compatible device streams signal in
From a 2-channel in-ear device to a 32-channel research cap, the model normalises across channel counts, electrode positions, and form factors, so the partner does not retrain for each new device.
2. Emotiv-FM produces standardised cognitive state outputs
Device-specific noise, inter-subject variability, and artifacts are suppressed. The output is a clean, model-ready representation of cognitive state — attention, load, fatigue, emotional valence, motor intent — calibrated against the full LBM corpus.
3. Partners build the application layer
Dashboards, classifiers, adaptive interfaces, longitudinal cohort tools — anything that consumes cognitive state. Because the upstream representation is consistent, the application layer sees brain-state differences, not inter-subject differences or hardware noise.
Dcode.AI is designed to serve four distinct partner types. The intelligence layer is the same — the access model is calibrated to the partner's hardware, scale, and use case.
Device Manufacturers
Embedded SDK licensing
The LBM runs inside your hardware via an embedded inference SDK — on-device, no cloud dependency, no raw neural data leaving the device. Designed for consumer wearables, AR glasses, hearables, and industrial hardware. Per-unit licensing scales with shipment volume.
Platform & Enterprise
Annual platform licensing
Annual licensing for neurotech platforms, pharmaceutical research, clinical research organisations, enterprise productivity tools, and defence and government contractors. Private on-premises deployment available for data-sovereignty requirements.
Developers & Researchers
Usage-based API
One endpoint. Standardised outputs: attention, cognitive load, fatigue, emotional valence, motor intent. Hardware-agnostic — Dcode.AI handles EEG preprocessing, artifact rejection, and cross-device normalisation. No domain expertise required to get started.
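From the developer's side, a hardware-agnostic endpoint means the request carries raw samples plus enough metadata (channel labels, sample rate) for the service to do the preprocessing. The sketch below only builds such a request body. The field names, units, and `build_request` helper are hypothetical, since this page does not publish the API schema.

```python
import json

# Hypothetical output names mirroring the standardised outputs listed above.
OUTPUTS = ["attention", "cognitive_load", "fatigue", "emotional_valence", "motor_intent"]

def build_request(samples, channel_names, sample_rate_hz, device_id):
    """Serialise one window of raw EEG into a JSON request body.

    samples: one list of raw readings per channel. Preprocessing, artifact
    rejection, and cross-device normalisation are left to the service.
    """
    if len(samples) != len(channel_names):
        raise ValueError("need exactly one sample row per channel")
    if len({len(row) for row in samples}) > 1:
        raise ValueError("all channels must have the same number of samples")
    return json.dumps({
        "device_id": device_id,
        "sample_rate_hz": sample_rate_hz,
        "channels": channel_names,
        "samples": samples,
        "outputs": OUTPUTS,
    })

body = build_request([[0.1, 0.2, 0.3], [0.0, -0.1, 0.2]], ["T7", "T8"], 256, "demo-earbud")
print(body)
```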
Direct & Consumer
Direct platform access
Users who want more than their device manufacturer exposes can access Dcode.AI directly: personal cognitive data, custom protocols, and the ability to build personal applications on their own neural signal.
AI-powered neuro research
The LBM's training across thousands of experimental paradigms enables predicted neural response distributions for stimuli — packaging designs, content concepts, product formulations — without participant recruitment or hardware deployment. Results in hours rather than weeks, at a fraction of conventional EEG study costs. Built for consumer insights, pharmaceutical research, and advertising effectiveness.
The LBM is a general-purpose foundation model for cognitive state. The applications below are not aspirational — they are the categories of product that the corpus and the model architecture make tractable today.
| Application category | What the model provides | Why the LBM corpus matters |
|---|---|---|
| Consumer wearables: attention, focus, fatigue | On-device cognitive state stream usable in earbuds, headbands, AR glasses | No per-user calibration; works across 140+ countries of demographic variation |
| Workplace safety and operator monitoring | Real-time drowsiness and cognitive overload detection in industrial settings | Event-marked corpus produces accurate state-transition detection, not lagging averages |
| Clinical and neurological assessment | Standardised cognitive markers for screening, monitoring, and longitudinal studies | 15 years of paradigm diversity covers the full waking cognitive spectrum |
| Adaptive learning and education | Continuous attention and comprehension signal for learning platforms | Subject diversity ensures generalisation across learner populations, not WEIRD-only |
| Gaming, XR, and human-computer interaction | Intent and engagement signal as a control surface for next-generation interfaces | Cross-device transfer means a single model serves the full hardware ecosystem |
| AI-powered neuro research | Predicted neural response distributions for stimuli (packaging, content, formulations) without participant recruitment | Training across thousands of paradigms produces response predictions that generalise across the full waking cognitive spectrum |
| Research infrastructure for academia and pharma | A foundation model the field can fine-tune against, replacing one-off pipelines | The 4,000+ institution network is already the de facto standard substrate |
| Multi-modal AI fusion | EEG as a complementary signal to vision, language, and physiological models | Standardised, model-ready representations make EEG composable with other AI stacks |
Partners do not need to build the corpus, the model, or the infrastructure. We bring the full stack — and a 15-year head start that cannot be replicated on a product timeline.
| Layer | What Dcode.AI provides |
|---|---|
| Corpus | 300,000+ hours of EEG signal across 170,000+ subjects in 140+ countries: the largest and most demographically diverse training corpus in the field |
| Foundation model | Emotiv-FM: an on-device Mamba-2 U-Net (1–4M parameters) paired with a cloud EEGX transformer (~14M parameters) |
| Inference infrastructure | On-device runtime SDK plus the cloud inference API, with the engineering organisation to maintain both at production scale |
| Hardware reference | 15 years of EEG hardware design (EPOC through MN8) as the reference for sensor placement, noise envelope, and form-factor constraints; partners can use Emotiv hardware, design their own, or both |
| Scientific authority | 23,000+ scholarly works indexed by Google Scholar across the user network, the de facto standard the field cites, plus 5 foundation-model papers (EEG2Rep, published at KDD '24; SpellerSSL, EEGM2, EEG-X, and SAMBA, submitted for publication) |
| Regulatory and ethical posture | User-owned data architecture, on-device inference for privacy-critical workloads, and 15 years of institutional review experience across the global research network |
We are a group of neuroscientists, AI researchers, and engineers committed to building the future of non-invasive brain decoding.
A partnership with Dcode.AI is a structured pathway, not a one-off integration. The stages below are how we have engaged with every serious partner conversation to date.
1. Scoping conversation
A short technical discussion to map the partner's hardware, target form factor, and the cognitive primitives the product requires. Output: a written scope and a recommendation on which model surface (on-device, cloud, or both) the partner will consume.
2. Technical evaluation
Access to the API and the on-device SDK against the partner's hardware and a representative data sample. Output: measured accuracy on the partner's target tasks, on the partner's silicon — not a benchmark in our environment.
3. Integration and co-development
A scoped engineering engagement: SDK integration, model tuning where warranted, joint validation on the partner's user population. Co-development is the norm — the LBM gets sharper with every serious integration, and partners get a model calibrated to their deployment.
4. Production and ongoing model access
Licensed access to the production model, with versioned releases, SLAs, and a model-improvement loop that flows back into the corpus. Partners ship on a foundation model that continues to improve underneath their product.
The user owns their data.
Our architecture is built around on-device inference for privacy-critical workloads and a user-owned data model end-to-end. Partners do not need to take on the regulatory and reputational risk of routing raw neural signal through their own infrastructure — and end users do not need to surrender their brain data to use the product.
Brainwear by Emotiv
The consumer product built on the LBM — in-ear EEG that measures cognitive state directly, not inferred from heart rate or movement. Every Brainwear session adds longitudinal real-world data to the training corpus: naturalistic, everyday neural signal in the modality with the least public dataset coverage in the field.
Be among the first to build on the Large Brain Model. Request API access for updates on our platform launch, or request the deck if you are evaluating an investment.
dcode.ai · built on the Emotiv corpus · emotiv.com
© 2026 Dcode AI. All rights reserved.







