We're building the future of brain decoding to generalize across people, contexts, and applications.

The first Large Brain Model.
Built on the only corpus that matters.

LLMs were trained on what humans write. They were never trained on what humans experience. The LBM is the first model built on the second — 15 years of healthy human neural data, researcher-collected across 4,000+ institutions in 140+ countries.

The intelligence layer every neurotech company can build on.

Request API access

Built on the only corpus that matters

Every major EEG dataset was collected from patients in clinical settings. Those that weren't suffer from the WEIRD problem — Western, Educated, Industrialised, Rich, Democratic populations only. Neither can train a general foundation model. Emotiv's corpus can.

4,000+

institutions in the network

140+

countries represented

20,000+

Google Scholar citations

15

years of continuous collection

Dcode.AI is built on the Emotiv corpus — the result of 15 years of sensor network deployment, 4,000+ institutional partnerships, and researcher-collected, event-marked data from day one. Every session event-marked through EmotivPRO — a single software protocol across every institution and every country. Maximum population diversity. Uniform technical structure. The corpus cannot be acquired, replicated, or assembled with capital alone. Minimum replication time: 15+ years.

Why now

AGI cannot be reached by scraping the internet alone. OpenAI, Google, and Anthropic have exhausted high-quality text. Synthetic data shows diminishing returns. The race is on for a new modality — and cognitive signal is the only one that requires a physical sensor network to capture.

$2.36 billion flowed into neurotech in five quarters. Neuralink raised $650M. BrainCo raised $286M. Merge Labs raised $250M at an $850M valuation. The category is real. The intelligence layer is not built.

Meta FAIR published NeuralBench (May 2026) — 36 EEG tasks, 14 architectures, 94 datasets. Foundation models already outperform traditional ML across most EEG tasks. The field is in its pre-ImageNet moment: models are still small, datasets narrow. The unlock is scale and corpus diversity. That corpus exists. It is ours.

The window is 18–24 months.

"The reason no general-purpose neural foundation model exists is not a model architecture problem. It is a data problem. The corpus required to train one has never existed — until now."

— Tan Le, CEO & Founder, Emotiv / Dcode.AI

The EEG Data Fragmentation Crisis

The EEG neurotech industry stands at a critical inflection point. Despite billions in investment and decades of research, the promise of brain-computer interfaces remains largely unfulfilled. Companies struggle to scale beyond proof-of-concept, researchers can't replicate findings across labs, and developers waste months reinventing the wheel. The root cause? A fractured ecosystem where every device speaks its own language, every dataset lives in isolation, and every breakthrough stays locked in its silo. This fragmentation isn't just slowing progress—it's preventing the entire industry from reaching its transformative potential in healthcare, wellness, and human augmentation.

Inconsistent & Noisy Data

The technical reality is stark: EEG devices vary wildly in their specifications, from 2-channel consumer wearables to 256-channel research systems. Each device uses different sensor placements (montages), input impedances, voltage resolutions, and signal-to-noise ratios. This hardware heterogeneity means findings from one device rarely transfer to another, effectively fragmenting the knowledge base and limiting the scalability of solutions.
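To make the heterogeneity concrete, the sketch below (not Dcode.AI code; the function name and the 128 Hz common rate are illustrative choices) shows the most basic alignment step any cross-device pipeline must start with: resampling recordings captured at arbitrary rates onto a shared time base.

```python
import numpy as np

def to_common_rate(eeg, fs_in, fs_out=128):
    """Linearly resample (n_channels, n_samples) EEG onto a shared rate.

    Illustrative only: production resampling would also apply an
    anti-aliasing filter before downsampling.
    """
    n_out = int(round(eeg.shape[-1] * fs_out / fs_in))
    t_in = np.arange(eeg.shape[-1]) / fs_in     # original sample times
    t_out = np.arange(n_out) / fs_out           # target sample times
    return np.stack([np.interp(t_out, t_in, ch) for ch in eeg])
```

A 256 Hz consumer headset and a 1 kHz research amplifier then land on the same grid, which is the precondition for pooling their data at all; montage differences still remain and need their own treatment.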

Task-Specific Models

Current machine learning approaches in EEG are narrowly focused, with separate models for sleep staging, emotion detection, motor imagery, and seizure prediction. Each model starts from scratch, unable to leverage representations learned from other tasks. This siloed approach wastes computational resources and prevents the cross-pollination of insights that could accelerate breakthroughs across multiple applications.

Inter- and Intra-Subject Variability

Brain signals are inherently personal. Inter-subject differences, from unique cortical folding patterns and skull-thickness variations to individual psychological profiles, often overwhelm the neural signatures researchers seek to measure. Intra-subject variability compounds the problem: the brain changes over time through aging, development, and learning, and short-term physiological fluctuations driven by circadian rhythms, hormonal cycles, medication effects, fatigue, and emotional state can make the same person's brain appear drastically different from hour to hour, compromising repeatability and reliability.

Stuck on Data Preparation

The preprocessing burden is crushing innovation. Researchers report spending 60-80% of their time on artifact removal, signal filtering, and data alignment: tedious work that must be repeated for every new dataset and device. That leaves minimal time for actual discovery and development, creating a bottleneck that affects academia and industry alike.
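Even the most minimal cleaning pass shows where the time goes. The sketch below is illustrative only (the 1-40 Hz band and the 150 µV rejection threshold are assumed defaults, and real pipelines layer re-referencing, ICA, and epoching on top): it band-pass filters a recording and flags channels whose peak amplitude suggests artifact contamination.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def preprocess(eeg, fs, band=(1.0, 40.0), reject_uv=150.0):
    """Band-pass filter an EEG array (n_channels, n_samples, in microvolts)
    and flag channels with implausibly large peak amplitudes."""
    nyq = fs / 2.0
    b, a = butter(4, [band[0] / nyq, band[1] / nyq], btype="band")
    filtered = filtfilt(b, a, eeg, axis=-1)     # zero-phase filtering
    bad = np.abs(filtered).max(axis=-1) > reject_uv
    return filtered, bad
```

Every parameter here (band edges, filter order, threshold) is a per-dataset, per-device decision, which is exactly why this step keeps being redone by hand.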

Introducing the Large Brain Model

We're building the industry's first universal EEG foundation model — the intelligence layer every neurotech company can build on. By unifying fragmented neural data into a single intelligent system, the LBM aims to do for brain signals what GPT did for language.

Our Impact & Scientific Validation

Our work is backed by research published in, and under review at, top-tier academic venues, demonstrating our models' performance, generalizability, and efficiency.

EEG2Rep: Enhancing Self-supervised EEG Representation Through Informative Masked Inputs

Published at the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '24)

🏆 Winner of the Audience Appreciation Award (1st Place)

  • Novel EEG Architecture: EEG2Rep is the first work to redesign the JEPA architecture specifically for EEG, creating a teacher–student model tailored to handle the noise and variability of brain recordings.
  • Informative Masking: We introduce a new masking strategy that enables rich and generalized representations without relying on data augmentations or device-specific calibration.
  • Scalable and Noise-Resilient: Our encoder-only design supports large-scale pretraining, achieving +10.96% accuracy and +8.18% AUCROC gains over baselines. Most importantly, EEG2Rep demonstrates strong robustness to noise, making it a solid foundation for universal EEG representation learning.
  • Access Paper
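To make the masking idea concrete, here is a generic span-masking routine in the spirit of informative masking. The exact EEG2Rep strategy differs; the span length, mask ratio, and uniform start sampling below are assumptions. The point it illustrates: masking contiguous stretches forces the model to reconstruct meaningful segments rather than trivially interpolating isolated points.

```python
import numpy as np

def span_mask(n_tokens, mask_ratio=0.5, span=4, rng=None):
    """Mask contiguous spans of a token sequence instead of single points.

    Returns a boolean mask of shape (n_tokens,) with roughly
    mask_ratio of entries True (overlapping spans may mask fewer).
    """
    rng = np.random.default_rng(rng)
    mask = np.zeros(n_tokens, dtype=bool)
    n_spans = max(1, int(n_tokens * mask_ratio / span))
    starts = rng.choice(n_tokens - span + 1, size=n_spans, replace=False)
    for s in starts:
        mask[s:s + span] = True
    return mask
```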

SpellerSSL: Self-Supervised Learning with P300 Aggregation for Speller BCIs

(2025) Manuscript submitted for publication

  • Relevance to P300 ERP: The SpellerSSL model is designed specifically for P300 ERP tasks, which rely on time-domain waveform features rather than the frequency-domain features common in other EEG applications.
  • Data Preparation: The paper demonstrates a novel method for preparing data and designing a model that can leverage self-supervised techniques (like EEG-X and EEGM2) for improved accuracy and low-latency inference.
  • Efficiency & Generalization: This is the first framework to apply self-supervised learning to P300 spellers, significantly reducing time-consuming calibration and improving robustness across subjects.
  • SOTA Performance: The model achieves a 94% character recognition rate with only 7 repetitions and the highest information transfer rate (ITR) of 21.86 bits/min.
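The ITR figure can be sanity-checked with the standard Wolpaw formula, the conventional way such numbers are computed. Assumptions in this sketch: a standard 6×6 speller matrix (36 targets) and treating the 94% character recognition rate as the selection accuracy P; the time per selection is not stated above, so it is left as a parameter.

```python
import math

def itr_bits_per_selection(n_targets, p):
    """Wolpaw ITR per selection for n_targets classes at accuracy p:
    log2(N) + P*log2(P) + (1-P)*log2((1-P)/(N-1))."""
    if p >= 1.0:
        return math.log2(n_targets)
    return (math.log2(n_targets)
            + p * math.log2(p)
            + (1 - p) * math.log2((1 - p) / (n_targets - 1)))

def itr_bits_per_min(n_targets, p, seconds_per_selection):
    """Scale per-selection information by selections per minute."""
    return itr_bits_per_selection(n_targets, p) * 60.0 / seconds_per_selection
```

Under these assumptions, N = 36 and P = 0.94 give about 4.53 bits per selection, so the reported 21.86 bits/min would correspond to roughly 12.4 s per selection.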

EEGM2: An Efficient Mamba-2-Based Self-Supervised Framework for Long-Sequence EEG Modeling

(2025) Manuscript submitted for publication

  • Efficiency: Our Mamba-2-based architecture overcomes the limitations of older Transformer models, offering linear computational complexity for long EEG sequences.
  • SOTA Performance: EEGM2 achieves state-of-the-art accuracy on long-sequence tasks, outperforming conventional models that struggle with memory and speed.
  • Robustness & Generalization: The framework's spatiotemporal loss and multi-branch input embedding enhance robustness to noise and improve generalization across subjects and varying sequence lengths.

EEG-X: Device-Agnostic Foundation Model for EEG

(2025) Manuscript submitted for publication

  • Device-Agnostic Design: EEG-X generalizes across a wide variety of EEG devices and channel layouts. Using location-based channel embeddings, it covers standard systems including 10-05, 10-10, and 10-20. The Quant-based context encoding allows the model to handle variable-length inputs, producing meaningful tokens for robust representation learning.
  • Noise-Free Reconstruction: Dual reconstruction captures rich information from noisy data without removing brain signals:
    • Artifact-removed raw signal reconstruction
    • Noise-free latent space reconstruction for enhanced representation learning
  • Zero-Shot Inference: Pretrain and infer on completely unseen headsets, enabling truly universal EEG analysis.
  • Proven Performance: EEG-X delivers +2.41% accuracy and +2.32% AUCROC improvements over state-of-the-art baselines across diverse EEG datasets.
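A minimal sketch of what a location-based channel embedding can look like (the actual EEG-X encoder is not public; the sinusoidal encoding and dimensionality here are illustrative): each electrode's 3D head coordinate maps to a fixed feature vector, so the same physical location lands on the same embedding whether it comes from a 2-channel wearable or a 256-channel cap.

```python
import numpy as np

def channel_embedding(coords, n_freqs=4):
    """Sinusoidal embedding of 3D electrode positions.

    coords: (n_channels, 3) head-centered coordinates.
    Returns (n_channels, 6 * n_freqs); identical positions yield
    identical vectors regardless of montage or channel count.
    """
    coords = np.asarray(coords, dtype=float)
    freqs = 2.0 ** np.arange(n_freqs)       # geometric frequency ladder
    angles = coords[:, :, None] * freqs     # (n_channels, 3, n_freqs)
    emb = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return emb.reshape(coords.shape[0], -1)
```

Because the embedding depends only on position, a model trained with it never needs a fixed channel ordering or count, which is the property that makes zero-shot transfer to unseen headsets possible in principle.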

SAMBA: Toward a Long-Context EEG Foundation Model via Spatial Embedding and Differential Mamba

(2025) Manuscript submitted for publication

  • Scalable Long-Context Modeling: SAMBA leverages a Mamba-based architecture to achieve linear time complexity, allowing it to efficiently model long EEG recordings (e.g., 100 seconds) without high memory usage.
  • Device-Agnostic Spatial Embedding: The Spatial-Adaptive Input Embedding (SAIE) uses 3D electrode coordinates to unify data from different montages and channel counts, enabling generalization to unseen devices.
  • Superior Performance: Experiments across 13 datasets show SAMBA consistently outperforms state-of-the-art methods while maintaining low memory and fast inference speed.
  • Transferability: The model demonstrates strong transferability, with pretraining on long sequences improving performance on tasks with shorter durations.

How It Works

Our platform abstracts away the complexity of raw brain data, providing a clean, consistent, and powerful vector for your applications.

1

Connect Any Device

Connect any EEG device, from a 2-channel consumer wearable to a high-density research-grade cap, or anything in between. Tell us where the sensors are placed, and our platform can handle the rest.

2

Our API Vectorizes Data

Our foundation model processes raw EEG signals and outputs a standardised brain-state representation vector — device-specific noise, inter-subject variability, and artifacts suppressed — a clean cognitive representation developers can build on directly.

3

Build Your Model

We provide the environment and tools for you to build your own application-layer components, such as dashboards, classifiers, and other services. Data clusters now reflect differences in brain state, not inter-subject variability or noise.
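In code, steps 1 and 2 reduce to "describe your montage, send samples, get a vector back." The sketch below shows only the client-side packaging; the field names, and any endpoint it would be posted to, are hypothetical, since the actual API is not yet public.

```python
import numpy as np

def build_request(eeg, fs_hz, channel_labels):
    """Package one raw EEG window for a (hypothetical) vectorization endpoint.

    eeg: (n_channels, n_samples); channel_labels tells the platform
    where the sensors are placed, per step 1 above.
    """
    eeg = np.asarray(eeg, dtype=float)
    assert eeg.shape[0] == len(channel_labels), "one label per channel"
    return {
        "sampling_rate_hz": fs_hz,
        "channels": list(channel_labels),
        "samples": eeg.tolist(),
    }
```

In this hypothetical flow, posting the payload would return the standardized brain-state representation vector of step 2, ready for the application-layer models of step 3.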

Unlock Limitless Possibilities

Our platform is the universal language for the brain. See where our technology can take you.

Medical & Clinical

Develop advanced neurological diagnostics, rehabilitation tools, and mental health applications with unparalleled accuracy.

Academic Research

Accelerate your research in cognitive science, psychology, and neuroscience by removing hardware constraints.

Human-Computer Interaction

Create next-generation interfaces for gaming, virtual reality, and productivity by using thought as input.

Meet Our Team

We are a group of neuroscientists, AI researchers, and engineers committed to building the future of non-invasive brain decoding.

Get Early Access

Be among the first to build on the Large Brain Model. Request API access for updates on our platform launch — or request the deck if you are evaluating an investment.

Request API access
For researchers & developers