Using YAML to specify metadata

While it is possible to specify all metadata programmatically in Python, it is often more convenient to use YAML files. These files can store data persistently, can be reused across different conversion projects, and can be easily inspected and edited without changing any of the conversion software.

Below is the metadata for NWBFile and Subject, which applies to all NWB conversions:

NWBFile:
  session_id:  # required by DANDI
  session_description: # required by NWB
  session_start_time: "1900-01-01T08:15:30-05:00" # required by NWB
  lab:
  institution:
  experimenter:
    - "Last, First"
    - "Last, First M."
  experiment_description:
  keywords:
    - "olfaction"
    - "neuropixels"
  notes:
  pharmacology:
  protocol:
  related_publications:
    - "https://doi.org/10.7554/eLife.78362"
  source_script:
  source_script_file_name:
  data_collection:
  surgery:
  virus:
  stimulus_notes:
Subject:
  subject_id:  # required by DANDI
  description:
  species:  # Latin binomial, e.g. "Mus musculus" or "Homo sapiens"; required by DANDI
  genotype:
  strain:
  sex:  # required by DANDI
  age:  # required by DANDI
  weight:
  date_of_birth:  # can be used instead of age
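
For concreteness, here is a hypothetical filled-in version of a few of these fields. All values are illustrative; the formats for species, sex, and age follow the conventions noted in the template above (Latin binomial, single-letter sex code, ISO 8601 duration):

NWBFile:
  session_id: "session-001"
  session_description: "Mouse running the towers task in virtual reality"
  session_start_time: "2023-04-01T10:00:00-05:00"
  lab: "My Lab"
  institution: "My University"
  experimenter:
    - "Doe, Jane"
Subject:
  subject_id: "mouse-001"
  species: "Mus musculus"
  sex: "M"
  age: "P90D"  # ISO 8601 duration, here 90 days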

See the API documentation for NWBFile and Subject for the intended use and form of each of these fields.

The fields marked as “required” will be needed later when converting to NWB or uploading to DANDI. It is sometimes possible to extract these fields from the source data files or to gather them from other sources, in which case they do not need to be populated here.
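
If you want to catch missing required fields early, a minimal check along these lines can be run on the final metadata dictionary before conversion. The helper below is hypothetical, not part of NeuroConv; the field names follow the template above:

def check_dandi_required_fields(metadata: dict) -> None:
    """Raise if a DANDI-required field from the template above is empty."""
    if not metadata.get("NWBFile", {}).get("session_id"):
        raise ValueError("NWBFile.session_id is required by DANDI but is empty")
    subject = metadata.get("Subject", {})
    for field in ("subject_id", "species", "sex"):
        if not subject.get(field):
            raise ValueError(f"Subject.{field} is required by DANDI but is empty")
    # Either age or date_of_birth must be provided
    if not (subject.get("age") or subject.get("date_of_birth")):
        raise ValueError("Subject.age or Subject.date_of_birth is required by DANDI")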

This metadata can easily be added to any conversion pipeline. The content of the YAML file can be loaded as a dictionary using load_dict_from_file(), and then merged into the automatically extracted metadata using dict_deep_update().

from neuroconv.datainterfaces import SpikeGLXRecordingInterface
from neuroconv.utils.dict import load_dict_from_file, dict_deep_update

spikeglx_interface = SpikeGLXRecordingInterface(file_path="path/to/towersTask_g0_t0.imec0.ap.bin")

# Metadata automatically extracted from the source data files
metadata = spikeglx_interface.get_metadata()

# Metadata specified by hand in the YAML file
metadata_path = "my_lab_metadata.yml"
metadata_from_yaml = load_dict_from_file(file_path=metadata_path)

# Merge the two dictionaries; values from the YAML file take precedence
metadata = dict_deep_update(metadata, metadata_from_yaml)

spikeglx_interface.run_conversion(
    save_path="path/to/destination.nwb",
    metadata=metadata,
)

Note that any metadata extracted by spikeglx_interface.get_metadata() will be overwritten by the YAML values wherever the two specify the same field.
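
As a minimal sketch of those merge semantics (the dictionary contents below are illustrative):

from neuroconv.utils.dict import dict_deep_update

extracted = {"NWBFile": {"session_start_time": "2023-04-01T10:00:00-05:00", "session_description": "auto-generated"}}
from_yaml = {"NWBFile": {"session_description": "Mouse running the towers task"}}

merged = dict_deep_update(extracted, from_yaml)
# Nested keys are merged rather than replaced wholesale:
# merged["NWBFile"]["session_description"] == "Mouse running the towers task"
# merged["NWBFile"]["session_start_time"] is preserved from the extracted metadata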

The above YAML applies to any BaseDataInterface, NWBConverter, or ConverterPipe, and an analogous workflow for incorporating this metadata works for each. Specific interfaces and converters have additional fields, which you can inspect using DataInterface.get_metadata_schema() or NWBConverter.get_metadata_schema().
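
For example, to inspect the fields a particular interface expects, continuing with the SpikeGLX interface from above:

import json

# The metadata schema is a JSON-schema-like dictionary describing all the
# fields this interface accepts, including interface-specific sections
# (for SpikeGLX, an Ecephys section) alongside NWBFile and Subject.
schema = spikeglx_interface.get_metadata_schema()
print(json.dumps(schema, indent=2))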