BCI DATASET PROVENANCE SCORECARD
================================

If you claim to hear the brain, publish the score.

I built this as a practical rubric for EEG, ear-EEG, MEG, fNIRS, EMG-assisted BCI, and hybrid neurotech releases. The standard is simple: no more vibes-based cognition claims, no more empty repositories, no more "trust us" telemetry.

QUICK RUBRIC
------------

Score each of the 14 dimensions 0, 1, or 2, for a maximum of 28.

1) Dataset identity
0 = no clear version / DOI / URL
1 = some identifiers
2 = versioned, cited, stable DOI / URL

2) License
0 = no license
1 = custom or ambiguous terms
2 = clear standard license (e.g. Creative Commons or Open Data Commons for data, OSI-approved for code) or explicit research terms

3) Integrity
0 = no checksums
1 = partial hashes
2 = full SHA256.manifest for all artifacts

4) Raw signals
0 = no raw release
1 = raw subset only
2 = raw data released in structured layout

5) Preprocessing
0 = hand-wavy methods
1 = some scripts / settings
2 = exact pipeline with scripts and parameters

6) Timing / event map
0 = no trigger map
1 = incomplete events
2 = trigger schema, timing diagram, synced annotations

7) Hardware chain
0 = device unclear
1 = partial hardware info
2 = sensor model, montage, reference, gain, sampling, firmware

8) Electrode / fit quality
0 = no impedance / contact info
1 = one-time snapshot
2 = per-session impedance / contact quality logs

9) Noise accounting
0 = no controls
1 = mentions artifact rejection
2 = explicit controls for jaw, blink, motion, heartbeat, room vibration

10) Code provenance
0 = no code commit / container
1 = repo only
2 = commit hash plus lockfile or container digest

11) Subject / task metadata
0 = thin prose only
1 = partial protocol
2 = full task protocol, timing, conditions, exclusions

12) Governance
0 = no consent / de-ID note
1 = minimal ethics note
2 = consent version, de-ID method, usage boundaries

13) Null artifacts
0 = missing files vanish silently
1 = missingness noted informally
2 = missing artifacts logged explicitly as missing

14) Reproducibility
0 = no rerun path
1 = manual recreation possible
2 = seeds, environment, scripts, and either deterministic replay or replay within stated tolerances

SCORE INTERPRETATION
--------------------

0-10  = marketing fog
11-18 = interesting, but not trustworthy enough
19-24 = usable with caution
25-28 = serious scientific release
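
A minimal sketch of the tally, assuming scores arrive as a dict keyed by dimension. The key names and the helper functions are my own shorthand for illustration, not part of any standard; only the 0/1/2 scale and the four bands come from this scorecard.

```python
# Hypothetical shorthand keys for the 14 rubric dimensions, in rubric order.
DIMENSIONS = [
    "dataset_identity", "license", "integrity", "raw_signals",
    "preprocessing", "timing_event_map", "hardware_chain",
    "electrode_fit_quality", "noise_accounting", "code_provenance",
    "subject_task_metadata", "governance", "null_artifacts",
    "reproducibility",
]

# (inclusive upper bound, label) pairs copied from the bands above.
BANDS = [
    (10, "marketing fog"),
    (18, "interesting, but not trustworthy enough"),
    (24, "usable with caution"),
    (28, "serious scientific release"),
]


def total_score(scores):
    """Sum per-dimension scores after checking each one is 0, 1, or 2."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"unscored dimensions: {missing}")
    for name in DIMENSIONS:
        if scores[name] not in (0, 1, 2):
            raise ValueError(f"{name} must be 0, 1, or 2, got {scores[name]!r}")
    return sum(scores[name] for name in DIMENSIONS)


def interpret(total):
    """Map a total in 0..28 to its band label."""
    if not 0 <= total <= 28:
        raise ValueError(f"total out of range: {total}")
    for upper, label in BANDS:
        if total <= upper:
            return label
```

Refusing to total an incomplete scorecard is deliberate: an unscored dimension should not quietly count as zero, for the same reason a missing file should not vanish silently.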

MINIMUM ARTIFACT BUNDLE
-----------------------

A release is not complete unless it includes, at minimum:

- README describing task, protocol, exclusions, and intended use
- LICENSE
- SHA256.manifest
- raw signal files in BIDS or a clearly documented equivalent
- derivatives or preprocessed outputs with exact generation steps
- event annotations and trigger mapping
- hardware manifest
- environment manifest
- code commit hash and dependency lockfile
- consent / de-identification statement
- explicit record of anything missing
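
One way to produce the SHA256.manifest item is sketched below. I am assuming a line format of hex digest, two spaces, then a path relative to the release root, which is the layout the common sha256sum tool writes; this scorecard does not mandate a particular format, and the function names are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large recordings need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root, manifest_name="SHA256.manifest"):
    """Return one 'digest  relative/path' line per file under root.

    The manifest file itself is skipped so that rebuilding is stable.
    """
    root = Path(root)
    lines = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root).as_posix()
        if rel == manifest_name:
            continue
        lines.append(f"{sha256_of(path)}  {rel}")
    return lines


def write_manifest(root, manifest_name="SHA256.manifest"):
    """Write the manifest into root and return the lines that were written."""
    lines = build_manifest(root, manifest_name)
    (Path(root) / manifest_name).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines
```

Sorting the paths keeps the manifest byte-identical across runs on the same files, which makes the manifest itself easy to diff between versions.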

HARDWARE MANIFEST: WHAT TO PUBLISH
----------------------------------

If the acquisition chain is vague, the result is theater.

Publish:
- device manufacturer and exact model
- firmware version
- channel count
- sampling rate
- reference scheme
- electrode positions / montage
- earbud fit method if ear-EEG
- amplifier gain and filters
- impedance / contact quality logs
- auxiliary channels present or absent: EMG, ECG or PPG, accelerometer, gyro, audio, respiration

MECHANOSENSITIVE NOISE QUARANTINE
---------------------------------

Before calling a signal cognitive, release controls for:
- jaw clench
- swallowing
- speech / subvocalization
- blink / saccade
- head movement
- posture shift
- heartbeat leakage
- ambient audio
- room vibration / desk coupling
- cable motion / connector noise
- impedance drift across the session

If you do not rule these out, you may be publishing a beautifully branded microphone for the skull.
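
As a crude first screen, and only that, a candidate "cognitive" channel can be checked for zero-lag linear correlation with each auxiliary channel. This is a sketch under loud assumptions: the signals are already time-aligned and equally sampled, the 0.3 threshold is an arbitrary placeholder, and the function names are hypothetical. A low correlation does not rule an artifact out (lagged or nonlinear coupling will slip past it); a high one is a reason to stop and look.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def flag_contaminated(signal, references, threshold=0.3):
    """Return names of reference channels whose |r| with signal meets the threshold.

    references maps a channel name (e.g. "accelerometer") to its samples.
    The threshold is an illustrative default, not a validated cutoff.
    """
    flagged = []
    for name, samples in references.items():
        if abs(pearson(signal, samples)) >= threshold:
            flagged.append(name)
    return sorted(flagged)
```

This only works if the auxiliary channels were recorded in the first place, which is why the hardware manifest asks whether each one is present or absent.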

MISSINGNESS MUST BE EXPLICIT
----------------------------

Silence is not consent, and absence is not proof of cleanliness.

Example missing-artifact record:

{
  "artifact": "raw/session_07_eeg.edf",
  "status": "missing",
  "reason": "corrupt upload",
  "discovered_at": "2026-03-05T20:00:00Z",
  "replacement_expected": false
}
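
Records like this can be generated rather than hand-written. The sketch below assumes you keep a list of paths the release is supposed to contain and compares it against what is actually on disk; the function name and the default reason string are hypothetical, and the record fields mirror the example above.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def find_missing(root, expected_paths, reason="not found at release time"):
    """Return one missing-artifact record per expected path absent under root."""
    root = Path(root)
    # UTC timestamp in the same ISO 8601 'Z' form as the example record.
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    records = []
    for rel in expected_paths:
        if not (root / rel).is_file():
            records.append({
                "artifact": rel,
                "status": "missing",
                "reason": reason,
                "discovered_at": now,
                "replacement_expected": False,
            })
    return records


def dump_records(records):
    """Serialize records as indented JSON for inclusion in the release."""
    return json.dumps(records, indent=2)
```

The generic default reason is a placeholder. When the real cause is known, as with the corrupt upload above, pass it in so the record says so.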

MINIMAL METADATA SCHEMA
-----------------------

{
  "dataset_id": "example_bci_release_v1",
  "version": "1.0.0",
  "doi_or_url": "",
  "license": "",
  "modality": ["EEG"],
  "task": "motor imagery / auditory imagery / preference classification / etc",
  "subjects": 0,
  "sessions": 0,
  "sampling_rate_hz": 0,
  "hardware": {
    "manufacturer": "",
    "model": "",
    "firmware": "",
    "reference_scheme": "",
    "montage": "",
    "impedance_log_present": false
  },
  "aux_channels": {
    "emg": false,
    "ecg_or_ppg": false,
    "accelerometer": false,
    "gyro": false,
    "audio": false,
    "respiration": false
  },
  "artifacts": {
    "raw_data_present": false,
    "derivatives_present": false,
    "preprocessing_scripts_present": false,
    "event_map_present": false,
    "sha256_manifest_present": false,
    "code_commit_present": false,
    "environment_lockfile_present": false
  },
  "noise_controls": {
    "jaw_control": false,
    "blink_control": false,
    "motion_control": false,
    "heartbeat_control": false,
    "room_vibration_control": false
  },
  "governance": {
    "consent_version": "",
    "deidentification_method": "",
    "usage_restrictions": "",
    "missing_artifacts_logged": false
  }
}
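
A filled-in copy of this schema can be audited mechanically. The sketch below is one possible check, not a formal validator: which fields count as required is my assumption, and the function name is hypothetical. It treats empty strings and zero counts as unfilled template values and reports every boolean under "artifacts" and "noise_controls" that is not true.

```python
def audit_metadata(meta):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    # Top-level strings left at their empty template value.
    for key in ("dataset_id", "version", "doi_or_url", "license", "task"):
        if not str(meta.get(key, "")).strip():
            problems.append(f"{key} is empty")
    # Counts and rates left at the template's zero.
    for key in ("subjects", "sessions", "sampling_rate_hz"):
        value = meta.get(key, 0)
        if not isinstance(value, (int, float)) or value <= 0:
            problems.append(f"{key} must be a positive number")
    if not meta.get("modality"):
        problems.append("modality is empty")
    # Nested string fields that were never filled in.
    for section in ("hardware", "governance"):
        for key, value in meta.get(section, {}).items():
            if isinstance(value, str) and not value.strip():
                problems.append(f"{section}.{key} is empty")
    # Flags that admit something is absent; reported, not hidden.
    for section in ("artifacts", "noise_controls"):
        for key, value in meta.get(section, {}).items():
            if value is not True:
                problems.append(f"{section}.{key} is not true")
    return problems
```

A false flag is not an error in itself. An honest false is worth more than an unverified true, and the audit simply puts every gap in one list where a reviewer can see it.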

MY BIAS, STATED OPENLY
----------------------

I care less about whether a dataset is fashionable than whether it is inspectable.

A mediocre open dataset with timing maps, impedance logs, hashes, and honest caveats is more valuable than a glamorous closed demo claiming to decode desire from jaw tremor. If the future of BCI is going to sing, it needs tuning forks, not incense.