We're building the future of brain decoding: models that generalize across people, contexts, and applications.

The first Large Brain Model.
Built on the only corpus that matters.

LLMs were trained on what humans write. They were never trained on what humans experience. The LBM is the first model built on the second — 15 years of healthy human neural data, researcher-collected across 4,000+ institutions in 140+ countries.

The intelligence layer every neurotech company can build on.

Request API access

4.5M parameters. State-of-the-art performance. Runs on-device.

The published EEGM2 architecture outperforms every published EEG foundation model — including models with up to 110M parameters — with linear compute scaling that fits on consumer wearable silicon. arXiv:2502.17873

Built on the only corpus that matters

Every major EEG dataset was collected from patients in clinical settings. Those that weren't suffer from the WEIRD problem — Western, Educated, Industrialised, Rich, Democratic populations only. Neither can train a general foundation model. Emotiv's can.

4,000+

institutions in the network

Paying research institutions and global enterprises, including Fortune 500

140+

countries represented

The only EEG corpus with full non-WEIRD demographic coverage

20,000+

Google Scholar citations

Across 15 years of organic institutional adoption

15

years of continuous collection

197.5 million channel-minutes — 41× the most-cited EEG foundation model

Dcode.AI is built on the Emotiv corpus — the result of 15 years of sensor network deployment, 4,000+ institutional partnerships, and researcher-collected, event-marked data from day one. Every session event-marked through EmotivPRO — a single software protocol across every institution and every country. Maximum population diversity. Uniform technical structure. The corpus cannot be acquired, replicated, or assembled with capital alone. Minimum replication time: 15+ years.

Why now

AGI cannot be reached by scraping the internet alone. OpenAI, Google, and Anthropic have exhausted high-quality text. Synthetic data shows diminishing returns. The race is on for a new modality — and cognitive signal is the only one that requires a physical sensor network to capture.

$2.36 billion flowed into neurotech in five quarters. Neuralink raised $650M. BrainCo raised $286M. Merge Labs raised $250M at $850M valuation. The category is real. The intelligence layer is not built.

Meta FAIR published NeuralBench (May 2026) — 36 EEG tasks, 14 architectures, 94 datasets. Foundation models already outperform traditional ML across most EEG tasks. The field is in its pre-ImageNet moment: models are still small, datasets narrow. The unlock is scale and corpus diversity. That corpus exists. It is ours.

The window is 18–24 months.

What the LBM learns

The LBM learns the patterns of what the brain does — across cognitive load, attention, fatigue, emotion, motor intent, and neurological state. Four capabilities distinguish it from every other published EEG model.

Cross-device cognitive transfer

Representations learned from one EEG hardware configuration generalise across different sensor layouts, channel counts, and wearable form factors with minimal or no retraining. Because the model is trained across Emotiv's full device range — from 2-channel in-ear through 32-channel research-grade — it learns a unified representation of cognitive state that is not tied to any single electrode configuration. A partner's device, whether a 2-channel earbud, a 6-channel headband, or a 16-channel AR headset, can pass its signal to Emotiv-FM and receive accurate cognitive state outputs without bespoke retraining.

No per-user calibration

Inter- and intra-subject normalisation is learned from 150,000+ subjects across 140+ countries. The model works accurately out of the box for a global consumer base — no onboarding friction, no per-user setup required.

Robust real-world inference

Emotiv-FM reconstructs full cognitive-state representations from incomplete or degraded channel sets, using surrounding signal context to fill gaps. The model degrades gracefully rather than failing when electrode contact is imperfect — a property that matters for any device worn in everyday life.
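One simple way to picture graceful degradation is to carry an explicit per-channel validity mask alongside the signal, so that electrodes with poor contact are left out of any pooled statistic instead of corrupting it. The sketch below is an illustration under stated assumptions, not Emotiv-FM's actual method: the function names `masked_channel_mean` and `detect_flat_channels` and the flat-line threshold `min_std` are hypothetical.

```python
import numpy as np

def detect_flat_channels(signal, min_std=1e-6):
    """Hypothetical contact check: True where a channel carries a live signal.

    signal: array of shape (channels, samples).
    A channel whose standard deviation is at or below min_std is treated as dropped.
    """
    signal = np.asarray(signal, dtype=float)
    return signal.std(axis=1) > min_std

def masked_channel_mean(features, valid):
    """Average per-channel feature vectors over valid channels only.

    features: array of shape (channels, dim).
    valid:    boolean array of shape (channels,), False for dropped electrodes.
    """
    features = np.asarray(features, dtype=float)
    valid = np.asarray(valid, dtype=bool)
    if not valid.any():
        raise ValueError("no valid channels to pool over")
    # Select valid rows first so NaNs in dropped channels cannot leak into the mean.
    return features[valid].mean(axis=0)
```

With this shape, losing one electrode changes how many rows are averaged rather than producing an invalid output, which is the behaviour the paragraph above describes at a much larger scale.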

On-device at frontier accuracy

The 1–14M parameter architecture runs inference directly on consumer wearable silicon without a cloud round-trip. For applications where latency, privacy, or connectivity matter, on-device inference is not a compromise — it is the design intent.

How the corpus compares

Every other EEG foundation model trains on public datasets only. The LBM trains on all of those datasets plus 240,000+ hours of proprietary signal that no other model can access.

| Model | Training hours | Channel-minutes | Subjects |
|---|---|---|---|
| LaBraM (ICLR 2024 Spotlight) | ~2,500 hrs | ~4.8M | 3,857 |
| EEGFormer (full TUH corpus) | | ~32.5M | 3,394 |
| REVE (2025), largest public-only effort | ~60,000 hrs | ~50–70M est. | 24,274 |
| Emotiv LBM (proprietary + public) | 300,000+ hrs | 245M+ | 170,000+ |
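The figures in this comparison can be sanity-checked with simple arithmetic, since channel-minutes are recording minutes multiplied by the number of channels. The sketch below uses only numbers quoted on this page. It assumes the "41×" comparison made earlier refers to LaBraM, which the page does not state explicitly, and the helper name `implied_mean_channels` is hypothetical.

```python
def implied_mean_channels(hours, channel_minutes):
    """Average channel count implied by a (training hours, channel-minutes) pair."""
    return channel_minutes / (hours * 60)

# LaBraM row: ~2,500 hours and ~4.8M channel-minutes.
labram_channels = implied_mean_channels(2_500, 4.8e6)

# Emotiv LBM row: 300,000+ hours and 245M+ channel-minutes (both are lower bounds).
lbm_channels = implied_mean_channels(300_000, 245e6)

# The 197.5 million channel-minute figure quoted earlier, relative to LaBraM (assumed).
ratio_vs_labram = 197.5e6 / 4.8e6

print(f"LaBraM implied mean channels: {labram_channels:.1f}")
print(f"LBM implied mean channels:    {lbm_channels:.1f}")
print(f"197.5M / 4.8M:                {ratio_vs_labram:.1f}x")
```

The LaBraM row works out to an average of about 32 channels per recording, and the ratio comes out at roughly 41, consistent with the claim above. The lower implied channel count for the LBM row fits a corpus that includes low-channel wearables.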

What corpus quality produces

| Corpus property | What it produces in the model | What it means for the device |
|---|---|---|
| Full device-range training (2ch–32ch) | Spatial representations that transfer across channel densities | A model running on 2-channel in-ear hardware benefits from what it learned at 32-channel resolution |
| 150,000+ unique subjects, 140+ countries | Robust inter- and intra-subject normalisation | Works accurately out of the box for a global consumer base, with no per-user calibration |
| Thousands of experimental paradigms | Generalised cognitive state representations across the full waking spectrum | Supports the full application roadmap from a single model, without bespoke retraining for each feature |
| Large event-marked subset | The model learns cognitive transitions, not just static states | State-change detection is accurate: the device responds to the moment attention shifts, not an average over a window |
| On-device architecture (1–14M parameters) | Frontier accuracy at a size that fits on-device silicon | Real-time inference on the device itself: no cloud round-trip, no latency, no raw neural data exposure |

"The reason no general-purpose neural foundation model exists is not a model architecture problem. It is a data problem. The corpus required to train one has never existed — until now."

— Tan Le, CEO & Founder, Emotiv / Dcode.AI

The EEG Data Fragmentation Crisis

The EEG neurotech industry stands at a critical inflection point. Despite billions in investment and decades of research, the promise of brain-computer interfaces remains largely unfulfilled. Companies struggle to scale beyond proof-of-concept, researchers can't replicate findings across labs, and developers waste months reinventing the wheel. The root cause? A fractured ecosystem where every device speaks its own language, every dataset lives in isolation, and every breakthrough stays locked in its silo. This fragmentation isn't just slowing progress—it's preventing the entire industry from reaching its transformative potential in healthcare, wellness, and human augmentation.

Inconsistent & Noisy Data

The technical reality is stark: EEG devices vary wildly in their specifications, from 2-channel consumer wearables to 256-channel research systems. Each device uses different sensor placements (montages), input impedances, voltage resolutions, and signal-to-noise ratios. This hardware heterogeneity means findings from one device rarely transfer to another, which fragments the knowledge base and limits the scalability of solutions.

Task-Specific Models

Current machine learning approaches in EEG are narrowly focused, with separate models for sleep staging, emotion detection, motor imagery, and seizure prediction. Each model starts from scratch, unable to leverage representations learned from other tasks. This siloed approach wastes computational resources and prevents the cross-pollination of insights that could accelerate breakthroughs across multiple applications.

Inter- and Intra-Subject Variability

Brain signals are inherently personal. Inter-subject differences, including unique cortical folding patterns, skull thickness variations, and individual psychological profiles, often overwhelm the neural signatures researchers seek to measure. Intra-subject variability compounds the problem. The brain changes over the long term through aging, development, and learning, and over the short term through circadian rhythms, hormonal cycles, medication effects, fatigue, and emotional state. As a result, the same person's brain can appear drastically different from hour to hour, compromising repeatability and reliability.

Stuck on Data Preparation

The preprocessing burden is crushing innovation. Researchers report spending 60-80% of their time on artifact removal, signal filtering, and data alignment: tedious work that must be repeated for every new dataset and device. That leaves minimal time for actual discovery and development, creating a bottleneck that affects academia and industry alike.

Introducing the Large Brain Model

We're building the industry's first universal EEG foundation model — the intelligence layer every neurotech company can build on. By unifying fragmented neural data into a single intelligent system, the LBM aims to do for brain signals what GPT did for language.

Our Impact & Scientific Validation

Our work is backed by groundbreaking research published in top-tier academic venues, demonstrating our models' superior performance, generalizability, and efficiency.

EEG2Rep: Enhancing Self-supervised EEG Representation Through Informative Masked Inputs

Published at the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '24)

🏆 Winner of the Audience Appreciation Award (1st Place)

  • Novel EEG Architecture: EEG2Rep is the first work to redesign the JEPA architecture specifically for EEG, creating a teacher–student model tailored to handle the noise and variability of brain recordings.
  • Informative Masking: We introduce a new masking strategy that enables rich and generalized representations without relying on data augmentations or device-specific calibration.
  • Scalable and Noise-Resilient: Our encoder-only design supports large-scale pretraining, achieving +10.96% accuracy and +8.18% AUCROC gains over baselines. Most importantly, EEG2Rep demonstrates strong robustness to noise, making it a solid foundation for universal EEG representation learning.
  • Access Paper
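For readers unfamiliar with teacher–student pretraining, the sketch below shows the two generic ingredients of that family of methods: a teacher whose parameters track the student through an exponential moving average, and a mask that hides part of the input from the student. It is a minimal illustration, not EEG2Rep's code. The function names are hypothetical, and the plain contiguous mask shown here is a stand-in, not the paper's informative masking strategy.

```python
import numpy as np

def ema_update(teacher, student, momentum=0.99):
    """Move each teacher parameter a small step toward the student's value.

    teacher, student: dicts mapping parameter names to NumPy arrays of equal shape.
    """
    return {
        name: momentum * teacher[name] + (1.0 - momentum) * student[name]
        for name in teacher
    }

def contiguous_block_mask(n_tokens, mask_ratio, rng):
    """Boolean mask hiding one contiguous block of tokens (True = hidden)."""
    n_hidden = int(round(n_tokens * mask_ratio))
    # Upper bound is exclusive, so the block always fits inside the sequence.
    start = rng.integers(0, n_tokens - n_hidden + 1)
    mask = np.zeros(n_tokens, dtype=bool)
    mask[start:start + n_hidden] = True
    return mask

# Toy usage: the student sees only unmasked tokens, the teacher sees all of them.
rng = np.random.default_rng(0)
mask = contiguous_block_mask(n_tokens=10, mask_ratio=0.5, rng=rng)
teacher = ema_update({"w": np.zeros(3)}, {"w": np.ones(3)}, momentum=0.9)
```

In training, the student would be optimised to predict the teacher's representations of the hidden tokens, and the update above would be applied after every step.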

SpellerSSL: Self-Supervised Learning with P300 Aggregation for Speller BCIs

(2025) Manuscript submitted for publication

  • Relevance to P300 ERP: SpellerSSL is designed specifically for P300 ERP tasks, whose discriminative features differ in nature (frequency components versus time-domain features) from those of other EEG applications.
  • Data Preparation: The paper demonstrates a novel method for preparing data and designing a model that can leverage self-supervised techniques (like EEG-X and EEGM2) for improved accuracy and low-latency inference.
  • Efficiency & Generalization: This is the first framework to apply self-supervised learning to P300 spellers, significantly reducing time-consuming calibration and improving robustness across subjects.
  • SOTA Performance: The model achieves a 94% character recognition rate with only 7 repetitions and the highest information transfer rate (ITR) of 21.86 bits/min.
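Information transfer rate is commonly computed with the Wolpaw formula, which converts the number of selectable targets and the selection accuracy into bits per selection, then scales by selections per minute. The sketch below implements that standard formula. Whether SpellerSSL uses exactly this definition is an assumption, and the page does not give the per-character timing, so the example does not attempt to reproduce the 21.86 bits/min figure. The 36-target value in the usage lines is an assumed 6×6 speller grid.

```python
import math

def wolpaw_bits_per_selection(n_classes, accuracy):
    """Wolpaw bits per selection for n_classes targets at the given accuracy."""
    if n_classes < 2:
        raise ValueError("n_classes must be at least 2")
    if not 0.0 < accuracy <= 1.0:
        raise ValueError("accuracy must be in (0, 1]")
    bits = math.log2(n_classes)
    if accuracy < 1.0:
        # Penalty terms vanish at perfect accuracy, so they are skipped there.
        bits += accuracy * math.log2(accuracy)
        bits += (1.0 - accuracy) * math.log2((1.0 - accuracy) / (n_classes - 1))
    return bits

def itr_bits_per_minute(n_classes, accuracy, seconds_per_selection):
    """Scale bits per selection by the number of selections made per minute."""
    return wolpaw_bits_per_selection(n_classes, accuracy) * 60.0 / seconds_per_selection

# Usage with assumed values: 36 targets, 94% accuracy, hypothetical 12 s per character.
bits = wolpaw_bits_per_selection(n_classes=36, accuracy=0.94)
rate = itr_bits_per_minute(n_classes=36, accuracy=0.94, seconds_per_selection=12.0)
print(f"{bits:.2f} bits per selection, {rate:.2f} bits/min")
```

The formula makes the trade-off behind the reported result visible: fewer stimulus repetitions shorten each selection and raise the rate, provided accuracy holds up.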

EEGM2: An Efficient Mamba-2-Based Self-Supervised Framework for Long-Sequence EEG Modeling

(2025) Manuscript submitted for publication

  • Efficiency: Our Mamba-2-based architecture overcomes the limitations of older Transformer models, offering linear computational complexity for long EEG sequences.
  • SOTA Performance: EEGM2 achieves state-of-the-art accuracy on long-sequence tasks, outperforming conventional models that struggle with memory and speed.
  • Robustness & Generalization: The framework's spatiotemporal loss and multi-branch input embedding enhance robustness to noise and improve generalization across subjects and varying sequence lengths.
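The linear-complexity claim follows from how state-space models process a sequence: one fixed-size state is updated once per time step, so cost grows in proportion to sequence length, whereas self-attention compares every pair of time steps. The sketch below is a fixed-parameter diagonal recurrence written to show that scaling. It is not Mamba-2 or EEGM2, which use input-dependent parameters and a more efficient algorithm, and the function name is hypothetical.

```python
import numpy as np

def diagonal_ssm_scan(x, a, b, c):
    """Run a diagonal linear state-space recurrence over a 1-D sequence.

        h[t] = a * h[t-1] + b * x[t]    (elementwise over the state dimension)
        y[t] = c . h[t]

    One pass over the input: time grows linearly with len(x), and the only
    memory carried between steps is the state vector h.
    """
    x = np.asarray(x, dtype=float)
    a, b, c = (np.asarray(p, dtype=float) for p in (a, b, c))
    h = np.zeros_like(a)
    y = np.empty_like(x)
    for t, x_t in enumerate(x):
        h = a * h + b * x_t
        y[t] = c @ h
    return y

# Impulse response of a single decaying state.
impulse_response = diagonal_ssm_scan([1.0, 0.0, 0.0], a=[0.5], b=[1.0], c=[1.0])
```

Doubling the recording length doubles the number of loop iterations and nothing else, which is what makes long EEG windows tractable on small hardware.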

EEG-X: Device-Agnostic Foundation Model for EEG

(2025) Manuscript submitted for publication

  • Device-Agnostic Design: EEG-X generalizes across a wide variety of EEG devices and channel layouts. Using location-based channel embeddings, it covers standard systems including 10-05, 10-10, and 10-20. The Quant-based context encoding allows the model to handle variable-length inputs, producing meaningful tokens for robust representation learning.
  • Noise-Free Reconstruction: Dual reconstruction captures rich information from noisy data without removing brain signals:
    • Artifact-removed raw signal reconstruction
    • Noise-free latent space reconstruction for enhanced representation learning
  • Zero-Shot Inference: Pretrain once, then run inference on headsets never seen during training, enabling truly universal EEG analysis.
  • Proven Performance: EEG-X delivers +2.41% accuracy and +2.32% AUCROC improvements over state-of-the-art baselines across diverse EEG datasets.

SAMBA: Toward a Long-Context EEG Foundation Model via Spatial Embedding and Differential Mamba

(2025) Manuscript submitted for publication

  • Scalable Long-Context Modeling: SAMBA leverages a Mamba-based architecture to achieve linear time complexity, allowing it to efficiently model long EEG recordings (e.g., 100 seconds) without high memory usage.
  • Device-Agnostic Spatial Embedding: The Spatial-Adaptive Input Embedding (SAIE) uses 3D electrode coordinates to unify data from different montages and channel counts, enabling generalization to unseen devices.
  • Superior Performance: Experiments across 13 datasets show SAMBA consistently outperforms state-of-the-art methods while maintaining low memory and fast inference speed.
  • Transferability: The model demonstrates strong transferability, with pretraining on long sequences improving performance on tasks with shorter durations.
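The idea of unifying montages through electrode coordinates can be sketched in a few lines: if each channel is described by where it sits on the head and not by its index in a device-specific list, then devices with different channel counts map into the same feature space. The example below uses generic sinusoidal features of 3D coordinates. It is an assumption-laden illustration, not SAMBA's SAIE or EEG-X's channel embedding, and the coordinates and function name are hypothetical.

```python
import numpy as np

def coordinate_embedding(xyz, n_freqs=4):
    """Map (channels, 3) electrode coordinates to sinusoidal position features.

    Returns an array of shape (channels, 3 * 2 * n_freqs). The feature size
    depends only on n_freqs, never on how many channels the device has.
    """
    xyz = np.asarray(xyz, dtype=float)
    freqs = 2.0 ** np.arange(n_freqs)                       # 1, 2, 4, ...
    angles = np.pi * xyz[:, :, None] * freqs[None, None, :]  # (channels, 3, n_freqs)
    feats = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return feats.reshape(xyz.shape[0], -1)

# Hypothetical layouts on a unit-radius head model.
two_channel = [[-0.9, 0.0, 0.1], [0.9, 0.0, 0.1]]
four_channel = two_channel + [[0.0, 0.9, 0.3], [0.0, -0.9, 0.3]]

emb_two = coordinate_embedding(two_channel)
emb_four = coordinate_embedding(four_channel)
```

An electrode at the same position receives the same embedding on both devices, so a downstream model can treat channels as a variable-length set of located sensors.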

How it works

Emotiv-FM is the inference layer built on the Large Brain Model. A two-stage architecture splits responsibility between the device and the cloud — on-device inference for latency-critical cognitive state outputs, cloud inference for the heavier representational work that powers downstream applications.

On-device

Mamba-2 U-Net (1–4M parameters)

A lightweight state-space architecture purpose-built for streaming EEG. Linear compute scaling and a memory footprint that fits on consumer wearable silicon. Runs the latency-critical primitives — attention, cognitive load, fatigue, drowsiness, intent — directly on the device, with no cloud round-trip and no raw neural data leaving the user.

Cloud

EEGX transformer (~14M parameters)

A larger transformer trained on the full multi-channel corpus, hosted on Dcode.AI infrastructure. Handles the representational work behind richer downstream tasks — neurological state classification, longitudinal trend analysis, multi-modal fusion with other biosignals — and is the surface that partners access through the API.
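The split between the two stages can be pictured as a routing table keyed by task. The sketch below encodes only what the two descriptions above say: which tasks are listed for the device and which for the cloud, and which model is named for each. Every identifier in it (`Route`, `route_task`, the task strings) is hypothetical and does not reflect the actual SDK or API.

```python
from dataclasses import dataclass

# Task lists taken from the on-device and cloud descriptions above.
ON_DEVICE_PRIMITIVES = frozenset(
    {"attention", "cognitive_load", "fatigue", "drowsiness", "intent"}
)
CLOUD_TASKS = frozenset(
    {"neurological_state_classification", "longitudinal_trend_analysis", "multimodal_fusion"}
)

@dataclass(frozen=True)
class Route:
    task: str
    target: str   # "device" or "cloud"
    model: str    # model family named in the text for that stage

def route_task(task):
    """Return where a task would run under the two-stage split described above."""
    if task in ON_DEVICE_PRIMITIVES:
        return Route(task, "device", "Mamba-2 U-Net")
    if task in CLOUD_TASKS:
        return Route(task, "cloud", "EEGX transformer")
    raise ValueError(f"unknown task: {task!r}")

print(route_task("fatigue"))
print(route_task("multimodal_fusion"))
```

Keeping the routing decision explicit like this makes it easy for a partner to see which features depend on connectivity and which do not.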

  1. Any compatible device streams signal in

    From a 2-channel in-ear earbud to a 32-channel research cap. The model normalises across channel counts, electrode positions, and form factors, so the partner does not retrain for each new device.

  2. Emotiv-FM produces standardised cognitive state outputs

    Device-specific noise, inter-subject variability, and artifacts are suppressed. The output is a clean, model-ready representation of cognitive state (attention, load, fatigue, emotional valence, motor intent), calibrated against the full LBM corpus.

  3. Partners build the application layer

    Dashboards, classifiers, adaptive interfaces, longitudinal cohort tools: anything that consumes cognitive state. Because the upstream representation is consistent, the application layer sees brain-state differences, not inter-subject differences or hardware noise.

What the LBM enables

The LBM is a general-purpose foundation model for cognitive state. The applications below are not aspirational — they are the categories of product that the corpus and the model architecture make tractable today.

| Application category | What the model provides | Why the LBM corpus matters |
|---|---|---|
| Consumer wearables: attention, focus, fatigue | On-device cognitive state stream usable in earbuds, headbands, AR glasses | No per-user calibration; works across 140+ countries of demographic variation |
| Workplace safety and operator monitoring | Real-time drowsiness and cognitive overload detection in industrial settings | Event-marked corpus produces accurate state-transition detection, not lagging averages |
| Clinical and neurological assessment | Standardised cognitive markers for screening, monitoring, and longitudinal studies | 15 years of paradigm diversity covers the full waking cognitive spectrum |
| Adaptive learning and education | Continuous attention and comprehension signal for learning platforms | Subject diversity ensures generalisation across learner populations, not WEIRD-only |
| Gaming, XR, and human-computer interaction | Intent and engagement signal as a control surface for next-generation interfaces | Cross-device transfer means a single model serves the full hardware ecosystem |
| Research infrastructure for academic and pharma | A foundation model the field can fine-tune against, replacing one-off pipelines | The 4,000+ institution network is already the de facto standard substrate |
| Multi-modal AI fusion | EEG as a complementary signal to vision, language, and physiological models | Standardised, model-ready representations make EEG composable with other AI stacks |

What we bring

Partners do not need to build the corpus, the model, or the infrastructure. We bring the full stack — and a 15-year head start that cannot be replicated on a product timeline.

| Layer | What Dcode.AI provides |
|---|---|
| Corpus | 300,000+ hours of EEG signal across 150,000+ subjects in 140+ countries: the largest and most demographically diverse training corpus in the field |
| Foundation model | Emotiv-FM: an on-device Mamba-2 U-Net (1–4M parameters) paired with a cloud EEGX transformer (~14M parameters) |
| Inference infrastructure | On-device runtime SDK plus the cloud inference API, with the engineering organisation to maintain both at production scale |
| Hardware reference | 15 years of EEG hardware design (EPOC through MN8) as the reference for sensor placement, noise envelope, and form-factor constraints; partners can use Emotiv hardware, design their own, or both |
| Scientific authority | 20,000+ Google Scholar citations across the user network and 5 foundation-model papers (EEG2Rep, published at KDD '24; SpellerSSL, EEGM2, EEG-X, and SAMBA, submitted for publication): the de facto standard the field cites |
| Regulatory and ethical posture | User-owned data architecture, on-device inference for privacy-critical workloads, and 15 years of institutional review experience across the global research network |

Meet Our Team

We are a group of neuroscientists, AI researchers, and engineers committed to building the future of non-invasive brain decoding.

How we work together

A partnership with Dcode.AI is a structured pathway, not a one-off integration. The stages below are how we have engaged with every serious partner conversation to date.

  1. Scoping conversation

    A short technical discussion to map the partner's hardware, target form factor, and the cognitive primitives the product requires. Output: a written scope and a recommendation on which model surface (on-device, cloud, or both) the partner will consume.

  2. Technical evaluation

    Access to the API and the on-device SDK against the partner's hardware and a representative data sample. Output: measured accuracy on the partner's target tasks, on the partner's silicon, not a benchmark in our environment.

  3. Integration and co-development

    A scoped engineering engagement: SDK integration, model tuning where warranted, joint validation on the partner's user population. Co-development is the norm: the LBM gets sharper with every serious integration, and partners get a model calibrated to their deployment.

  4. Production and ongoing model access

    Licensed access to the production model, with versioned releases, SLAs, and a model-improvement loop that flows back into the corpus. Partners ship on a foundation model that continues to improve underneath their product.

The user owns their data.

Our architecture is built around on-device inference for privacy-critical workloads and a user-owned data model end-to-end. Partners do not need to take on the regulatory and reputational risk of routing raw neural signal through their own infrastructure — and end users do not need to surrender their brain data to use the product.

Get Early Access

Be among the first to build on the Large Brain Model. Request API access for updates on our platform launch — or request the deck if you are evaluating an investment.

Request API access (for researchers & developers)