We're building brain decoding that generalizes across people, contexts, and applications.

The first Large Brain Model.
Built on the only corpus that matters.

AGI labs have exhausted the internet. The next modality is biological. Dcode.AI is the foundation model layer built on 15 years and 100 million+ minutes of healthy human neural data — assembled by the world's largest EEG sensor network.

The intelligence layer every neurotech company can build on.

Request API access

Built on the only corpus that matters

Every major EEG dataset was collected from patients in clinical settings. Emotiv's corpus was built differently — healthy subjects, real-world conditions, 15 years of continuous collection from a live, distributed physical sensor network across 4,000+ institutions in 140+ countries.

Existing datasets are small, fragmented, and suffer from the WEIRD problem — Western, Educated, Industrialised, Rich, Democratic populations only. None can train a general foundation model. Emotiv's can.

100M+

minutes of neural data

4,000+

institutions in the network

20,000+

Google Scholar citations

140+

countries represented

Dcode.AI is built on the Emotiv corpus — the result of 15 years of sensor network deployment, 4,000+ institutional partnerships, and neuroscientist-supervised data collection from day one. The corpus cannot be acquired, replicated, or assembled with capital alone. Minimum replication time: 12+ years.

Why now

AGI cannot be reached by scraping the internet alone. OpenAI, Google, and Anthropic have exhausted high-quality text. Synthetic data shows diminishing returns. The race is on for a new modality — and cognitive signal is the only one that requires a physical sensor network to capture.

$2.36 billion flowed into neurotech in five quarters. Neuralink raised $650M. BrainCo raised $286M. Merge Labs raised $250M at $850M valuation. The category is real. The intelligence layer is not built.

Meta FAIR's NeuralBench (May 2026) confirmed what Emotiv has known for years: current EEG foundation models barely outperform task-specific baselines. The field has diagnosed the problem. The Large Brain Model is the solution — trained on the only corpus that can support it.

The window is 18–24 months.

“The reason no general-purpose neural foundation model exists is not a model architecture problem. It is a data problem. The corpus required to train one has never existed — until now.”

The EEG Data Fragmentation Crisis

The EEG neurotech industry stands at a critical inflection point. Despite billions in investment and decades of research, the promise of brain-computer interfaces remains largely unfulfilled. Companies struggle to scale beyond proof-of-concept, researchers can't replicate findings across labs, and developers waste months reinventing the wheel. The root cause? A fractured ecosystem where every device speaks its own language, every dataset lives in isolation, and every breakthrough stays locked in its silo. This fragmentation isn't just slowing progress—it's preventing the entire industry from reaching its transformative potential in healthcare, wellness, and human augmentation.

Inconsistent & Noisy Data

The technical reality is stark: EEG devices vary wildly in their specifications, from 2-channel consumer wearables to 256-channel research systems. Each device uses different sensor placements (montages), input impedances, voltage resolutions, and signal-to-noise ratios. This hardware heterogeneity means findings from one device rarely transfer to another, effectively fragmenting the knowledge base and limiting the scalability of solutions.
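To make the heterogeneity concrete, here is a minimal NumPy/SciPy sketch of the harmonization step every cross-device study must start with: resampling to a shared rate and intersecting channel sets. The device specs, rates, and channel names are invented for illustration and are not part of any real pipeline described on this page.

```python
# Illustrative sketch: harmonizing two hypothetical devices to a shared
# sampling rate and channel set before any model sees the data.
import numpy as np
from scipy.signal import resample_poly

def harmonize(signal, ch_names, fs, target_fs=128, target_chs=("Fz", "Cz", "Pz")):
    """Keep only channels present in target_chs, then resample to target_fs."""
    idx = [ch_names.index(c) for c in target_chs if c in ch_names]
    kept = signal[idx]                      # (channels, samples)
    # Rational resampling: upsample by target_fs, downsample by fs
    return resample_poly(kept, target_fs, fs, axis=1)

# A 2-channel wearable at 256 Hz and a 32-channel cap at 500 Hz
wearable = np.random.randn(2, 256 * 10)     # 10 s of data
cap = np.random.randn(32, 500 * 10)
a = harmonize(wearable, ["Fz", "Cz"], 256)
b = harmonize(cap, [f"ch{i}" for i in range(29)] + ["Fz", "Cz", "Pz"], 500)
print(a.shape, b.shape)                     # both now at 128 Hz
```

Even this toy version shows the cost: every pairing of devices forces a lowest-common-denominator choice of rate and channels, discarding information the richer device captured.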

Task-Specific Models

Current machine learning approaches in EEG are narrowly focused, with separate models for sleep staging, emotion detection, motor imagery, and seizure prediction. Each model starts from scratch, unable to leverage representations learned from other tasks. This siloed approach wastes computational resources and prevents the cross-pollination of insights that could accelerate breakthroughs across multiple applications.

Inter- and Intra-Subject Variability

Brain signals are inherently personal. Inter-subject differences, including unique cortical folding patterns, skull-thickness variations, and individual psychological profiles, often overwhelm the neural signatures researchers seek to measure. Meanwhile, intra-subject variability, driven by long-term brain changes from aging, development, and learning as well as short-term physiological fluctuations from circadian rhythms, hormonal cycles, medication effects, fatigue, and emotional state, can make the same person's brain appear drastically different from hour to hour, compromising repeatability and reliability.

Stuck on Data Preparation

The preprocessing burden is crushing innovation. Researchers report spending 60-80% of their time on artifact removal, signal filtering, and data alignment, tedious work that must be repeated for every new dataset and device. That leaves minimal time for actual discovery and development, creating a bottleneck that affects academia and industry alike.
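A minimal sketch of the routine that consumes those 60-80% hours: band-pass filtering plus crude amplitude-based epoch rejection. The filter band and rejection threshold are illustrative defaults, not a recommended pipeline, and the data here is synthetic.

```python
# A taste of the preprocessing every lab repeats: band-pass filtering and
# simple peak-to-peak artifact rejection. Thresholds are illustrative only.
import numpy as np
from scipy.signal import butter, filtfilt

def preprocess(eeg, fs, band=(1.0, 40.0), reject_uv=150.0):
    """Band-pass filter each channel, then drop 1 s epochs whose peak-to-peak
    amplitude exceeds reject_uv (a crude artifact criterion)."""
    b, a = butter(4, band, btype="bandpass", fs=fs)
    filtered = filtfilt(b, a, eeg, axis=1)
    epochs = filtered.reshape(eeg.shape[0], -1, fs)   # (ch, epochs, samples)
    ptp = epochs.max(axis=2) - epochs.min(axis=2)
    keep = (ptp < reject_uv).all(axis=0)              # reject if any channel is bad
    return epochs[:, keep, :]

fs = 128
rng = np.random.default_rng(0)
eeg = 20 * rng.standard_normal((4, fs * 60))   # 4 channels, 60 s, ~20 µV noise
eeg[0, 1000:1100] += 500                       # inject a blink-like artifact
clean = preprocess(eeg, fs)
print(clean.shape)                             # fewer than 60 epochs survive
```

This is only the first rung of a real pipeline (no ICA, no re-referencing, no bad-channel interpolation), which is exactly why the burden scales so badly across datasets and devices.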

Introducing the Large Brain Model

We're revolutionizing brain-computer interfaces with the industry's first universal EEG foundation model. By unifying fragmented neural data into a single intelligent system, we're doing for brain signals what GPT did for language—creating the intelligence layer every neurotech company can build on. The LBM transforms noisy, device-specific recordings into actionable insights, enabling developers to build in days what used to take years.

Our Impact & Scientific Validation

Our work is backed by groundbreaking research published in top-tier academic venues, demonstrating our models' superior performance, generalizability, and efficiency.

EEG2Rep: Enhancing Self-supervised EEG Representation Through Informative Masked Inputs

Published at the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '24)

🏆 Winner of the Audience Appreciation Award (1st Place)

  • Novel EEG Architecture: EEG2Rep is the first work to redesign the JEPA architecture specifically for EEG, creating a teacher–student model tailored to handle the noise and variability of brain recordings.
  • Informative Masking: We introduce a new masking strategy that enables rich and generalized representations without relying on data augmentations or device-specific calibration.
  • Scalable and Noise-Resilient: Our encoder-only design supports large-scale pretraining, achieving +10.96% accuracy and +8.18% AUCROC gains over baselines. Most importantly, EEG2Rep demonstrates strong robustness to noise, making it a solid foundation for universal EEG representation learning.
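For readers unfamiliar with the JEPA family, the teacher-student idea can be sketched in a few lines of NumPy. This is a generic illustration of the pattern, not EEG2Rep's architecture: the linear-plus-tanh encoder, masking rate, and EMA coefficient are all placeholders.

```python
# Generic sketch of the JEPA-style teacher-student pattern (not EEG2Rep code):
# the student encodes a masked view, the teacher encodes the full view, and
# the student is trained to predict the teacher's latents for masked patches.
# The teacher is an exponential moving average (EMA) of the student.
import numpy as np

rng = np.random.default_rng(0)
D_in, D_lat = 8, 4
student_W = rng.normal(size=(D_in, D_lat))
teacher_W = student_W.copy()

def encode(W, x):
    return np.tanh(x @ W)                    # stand-in for a real encoder

def ema_update(teacher, student, tau=0.996):
    return tau * teacher + (1 - tau) * student

x = rng.normal(size=(16, D_in))              # 16 time patches of an EEG window
mask = rng.random(16) < 0.5                  # patches hidden from the student
x_masked = np.where(mask[:, None], 0.0, x)

target = encode(teacher_W, x)[mask]          # teacher sees everything
pred = encode(student_W, x_masked)[mask]     # student must infer masked latents
loss = np.mean((pred - target) ** 2)         # would drive the student's update
teacher_W = ema_update(teacher_W, student_W)
print(round(float(loss), 4))
```

Because the prediction target lives in latent space rather than raw signal space, the student is never asked to reconstruct noise, which is the property that makes this family attractive for EEG.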

SpellerSSL: Self-Supervised Learning with P300 Aggregation for Speller BCIs

(2025) Manuscript submitted for publication

  • Relevance to P300 ERP: The SpellerSSL model is specifically designed for P300 ERP tasks, which rely on time-domain waveform features rather than the frequency-domain features common in other EEG applications.
  • Data Preparation: The paper demonstrates a novel method for preparing data and designing a model that can leverage self-supervised techniques (like EEG-X and EEGM2) for improved accuracy and low-latency inference.
  • Efficiency & Generalization: This is the first framework to apply self-supervised learning to P300 spellers, significantly reducing time-consuming calibration and improving robustness across subjects.
  • SOTA Performance: The model achieves a 94% character recognition rate with only 7 repetitions and the highest information transfer rate (ITR) among compared approaches: 21.86 bits/min.
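For context, ITR for a selection task is conventionally computed with the Wolpaw formula from accuracy, alphabet size, and time per selection. The sketch below assumes a 36-character speller matrix and an illustrative selection time; it is not the paper's exact protocol.

```python
# Wolpaw information transfer rate for an n-class selection task.
import math

def itr_bits_per_min(p, n_classes, seconds_per_selection):
    """Bits per selection scaled to bits/min; p is selection accuracy."""
    if p <= 0 or p >= 1:
        bits = math.log2(n_classes) if p == 1 else 0.0
    else:
        bits = (math.log2(n_classes)
                + p * math.log2(p)
                + (1 - p) * math.log2((1 - p) / (n_classes - 1)))
    return bits * 60.0 / seconds_per_selection

# Illustrative: a 36-character speller at 94% accuracy; the 12 s selection
# time is an assumption, not a figure taken from the paper.
print(round(itr_bits_per_min(0.94, 36, 12.0), 2))   # → 22.67
```

The formula makes the trade-off explicit: fewer stimulus repetitions shorten the selection time (raising ITR) but lower accuracy (cutting bits per selection), which is why reducing repetitions without losing accuracy is the headline result.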

EEGM2: An Efficient Mamba-2-Based Self-Supervised Framework for Long-Sequence EEG Modeling

(2025) Manuscript submitted for publication

  • Efficiency: Our Mamba-2-based architecture overcomes the limitations of older Transformer models, offering linear computational complexity for long EEG sequences.
  • SOTA Performance: EEGM2 achieves state-of-the-art accuracy on long-sequence tasks, outperforming conventional models that struggle with memory and speed.
  • Robustness & Generalization: The framework's spatiotemporal loss and multi-branch input embedding enhance robustness to noise and improve generalization across subjects and varying sequence lengths.
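The linear-complexity claim comes from the recurrent view of state-space models: each step updates a fixed-size state, so cost grows with sequence length L rather than the L² pairwise interactions of self-attention. A toy diagonal SSM scan (not Mamba-2 itself, whose parameterization is input-dependent) makes the point:

```python
# Toy diagonal linear state-space recurrence: h_t = A*h_{t-1} + B*x_t,
# y_t = C·h_t. One O(state) update per step, so O(L) total over the sequence.
import numpy as np

def ssm_scan(x, A, B, C):
    h = np.zeros_like(A)
    y = np.empty(len(x))
    for t in range(len(x)):
        h = A * h + B * x[t]     # fixed-size state carries all history
        y[t] = C @ h
    return y

rng = np.random.default_rng(0)
state = 16
A = 0.9 * rng.random(state)      # per-dimension decay < 1 keeps the scan stable
B = rng.normal(size=state)
C = rng.normal(size=state)
x = rng.normal(size=12_800)      # 100 s of single-channel EEG at 128 Hz
y = ssm_scan(x, A, B, C)
print(y.shape)
```

A 100 s window at 128 Hz is 12,800 steps per channel; a quadratic-attention model would need ~164 million pairwise scores for the same window, which is the memory wall the bullet points above refer to.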

EEG-X: Device-Agnostic Foundation Model for EEG

(2025) Manuscript submitted for publication

  • Device-Agnostic Design: EEG-X generalizes across a wide variety of EEG devices and channel layouts. Using location-based channel embeddings, it covers standard systems including 10-05, 10-10, and 10-20. The Quant-based context encoding allows the model to handle variable-length inputs, producing meaningful tokens for robust representation learning.
  • Noise-Free Reconstruction: Dual reconstruction captures rich information from noisy data without removing brain signals:
    • Artifact-removed raw signal reconstruction
    • Noise-free latent space reconstruction for enhanced representation learning
  • Zero-Shot Inference: Pretrain and infer on completely unseen headsets, enabling truly universal EEG analysis.
  • Proven Performance: EEG-X delivers +2.41% accuracy and +2.32% AUCROC improvements over state-of-the-art baselines across diverse EEG datasets.
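Location-based channel embeddings can be pictured as a small network applied to each electrode's 3D head coordinate, so any montage that reports positions produces compatible tokens. The coordinates and the two-layer projection below are rough illustrations, not EEG-X internals.

```python
# Illustrative location-based channel embedding: map each electrode's 3D
# position to a fixed-size vector, independent of montage or channel count.
import numpy as np

# Rough unit-sphere 10-20 positions (x: left-right, y: back-front, z: up).
# Textbook approximations, for illustration only.
COORDS = {
    "Fz": (0.00, 0.71, 0.71),
    "Cz": (0.00, 0.00, 1.00),
    "Pz": (0.00, -0.71, 0.71),
    "C3": (-0.71, 0.00, 0.71),
    "C4": (0.71, 0.00, 0.71),
}

rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 32))    # tiny MLP weights, random for the sketch
W2 = rng.normal(size=(32, 16))

def channel_embedding(name):
    """3D location -> 16-d embedding via a two-layer projection."""
    xyz = np.array(COORDS[name])
    return np.tanh(xyz @ W1) @ W2

emb = np.stack([channel_embedding(c) for c in ["Fz", "Cz", "C4"]])
print(emb.shape)                 # one 16-d vector per channel
```

Because the embedding is a function of position rather than a lookup per named channel, an unseen headset only needs to report where its sensors sit, which is what makes zero-shot inference on new devices plausible.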

SAMBA: Toward a Long-Context EEG Foundation Model via Spatial Embedding and Differential Mamba

(2025) Manuscript submitted for publication

  • Scalable Long-Context Modeling: SAMBA leverages a Mamba-based architecture to achieve linear time complexity, allowing it to efficiently model long EEG recordings (e.g., 100 seconds) without high memory usage.
  • Device-Agnostic Spatial Embedding: The Spatial-Adaptive Input Embedding (SAIE) uses 3D electrode coordinates to unify data from different montages and channel counts, enabling generalization to unseen devices.
  • Superior Performance: Experiments across 13 datasets show SAMBA consistently outperforms state-of-the-art methods while maintaining low memory and fast inference speed.
  • Transferability: The model demonstrates strong transferability, with pretraining on long sequences improving performance on tasks with shorter durations.

How It Works

Our platform abstracts away the complexity of raw brain data, providing a clean, consistent, and powerful vector for your applications.

1

Connect Any Device

Connect any EEG device, from a 2-channel consumer wearable to a high-density research-grade cap and anything in between. Tell us where the sensors are placed, and our platform handles the rest.

2

Our API Vectorizes Data

Our foundation model processes raw EEG signals and returns a standardized, high-quality vector representation of real brain activity for any time window you choose. Inter-subject differences and noise artifacts are a thing of the past.
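In client code, step 2 might look something like the sketch below. Every name here (`Client`, `vectorize`, the 256-d output) is hypothetical; the stub fakes the service's response so only the shape of the workflow is shown, not a real API.

```python
# Hypothetical client sketch: dcode's real API may differ in every detail.
import numpy as np

class Client:
    """Stand-in for an HTTP client that would POST raw EEG plus montage
    metadata (channel names, sampling rate) to a vectorization endpoint."""
    def vectorize(self, eeg, channel_names, fs):
        # A real service would return one embedding per time window;
        # here we fake a 256-d vector per 1 s window.
        n_windows = eeg.shape[1] // fs
        return np.random.randn(n_windows, 256)

client = Client()
eeg = np.random.randn(8, 128 * 30)          # 8 channels, 30 s at 128 Hz
vectors = client.vectorize(
    eeg, ["Fz", "Cz", "Pz", "Oz", "C3", "C4", "P3", "P4"], fs=128)
print(vectors.shape)                        # (30, 256): one vector per second
```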

3

Build Your Model

We provide the environment and tools for you to build your own application-layer components, such as dashboards, classifiers, and other services. Data clusters now reflect genuine differences in brain state, not inter-subject variability or noise.
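Once windows are vectors, the application layer can be as simple as a classifier over embeddings. The sketch below trains a nearest-centroid classifier on synthetic 256-d vectors standing in for model output; the two "cognitive states" and their separation are fabricated for illustration.

```python
# Sketch of the application layer: a trivial classifier on window embeddings.
# The embeddings are synthetic stand-ins for foundation-model output.
import numpy as np

rng = np.random.default_rng(0)
# Two fake cognitive states, 256-d embeddings clustered around two centroids
focus = rng.normal(loc=0.5, size=(100, 256))
rest = rng.normal(loc=-0.5, size=(100, 256))
X = np.vstack([focus, rest])
y = np.array([1] * 100 + [0] * 100)

# Nearest-centroid: about as simple as an application layer gets
centroids = np.stack([X[y == k].mean(axis=0) for k in (0, 1)])

def predict(x):
    d = ((centroids - x) ** 2).sum(axis=1)
    return int(d.argmin())

acc = np.mean([predict(x) == label for x, label in zip(X, y)])
print(acc)     # well-separated clusters, so accuracy is near 1.0
```

The point of the sketch: when the hard problems (device variance, subject variance, noise) are handled upstream, the downstream model can afford to be this simple.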

Unlock Limitless Possibilities

Our platform is the universal language for the brain. See where our technology can take you.

Medical & Clinical

Develop advanced neurological diagnostics, rehabilitation tools, and mental health applications with unparalleled accuracy.

Academic Research

Accelerate your research in cognitive science, psychology, and neuroscience by removing hardware constraints.

Human-Computer Interaction

Create next-generation interfaces for gaming, virtual reality, and productivity by using thought as input.

Meet Our Team

We are a group of neuroscientists, AI researchers, and engineers committed to building the future of non-invasive brain decoding.

Get Early Access

Be among the first to build on the Large Brain Model. Request API access for updates on our platform launch — or request the deck if you are evaluating an investment.

Request API access
For researchers & developers