Path Expansion =============== Source data are often stored in an organizational structure where some of the metadata of the experiment, such as the subject and session IDs, are included in the folder names and file paths of the data. The :py:mod:`~neuroconv.tools.path_expansion` module allows a user to specify this path pattern so that NeuroConv can identify all matching data and automatically extract the relevant metadata. Local Path Expander ------------------- Use the :py:class:`~neuroconv.tools.path_expansion.LocalPathExpander` to find matching paths in your local filesystem. This class requires a source data specification, in which you can list multiple data interfaces by name, and for each provide a ``base_directory`` path and a "``file_path``" or "``folder_path``" argument in f-string format. The path expander will find all matching paths and automatically extract the specified metadata. .. code-block:: python from pathlib import Path from typing import Dict from neuroconv.tools.path_expansion import LocalPathExpander # Specify source data source_data_spec = { "spikeglx": { "base_directory": "/path/to/raw_data", "file_path": "{subject_id}/{session_id}/{session_id}_g0_imec0/{session_id}_g0_imec0.ap.bin" }, "phy" : { "base_directory": "/path/to/processed_data" "folder_path": "{subject_id}/{session_id}/phy" } } # Instantiate LocalPathExpander path_expander = LocalPathExpander() # Expand paths and extract metadata metadata_list = path_expander.expand_paths(source_data_spec) # Print the results for metadata in metadata_list: print(metadata) The ``expand_paths`` method returns a list of :py:class:`~neuroconv.utils.dict.DeepDict` objects that contain two dictionaries: ``source_data`` and ``metadata``. The ``source_data`` dictionary contains the resolved path of each interface, while the ``metadata`` dictionary contains the metadata extracted from the filepaths. Currently, only ``subject_id``, ``session_id``, and ``session_start_time`` are supported, but this approach could in principle be extended to support extraction of more metadata. Specifying Metadata Format -------------------------- The f-string format allows you to constrain the search for more precise metadata matching using the `Format Specification Mini-Language`_ and the `1989 C standard format codes`_ for datetimes. Below are some common examples. Length ~~~~~~ For example, you might have a data path where the ``subject_id`` and ``session_id`` are next to each other, and they can be disambiguated because the ``subject_id`` is always 4 characters and the ``session_id`` is always 5 characters. This can be expressed as ``"{subject_id:4}{session_id:5}"``. Character type ~~~~~~~~~~~~~~ If the ``subject_id`` is always numeric, you could use ``n``, e.g. ``"{subject_id:n}"``, which will match only digits. Datetimes ~~~~~~~~~ If your session start time is present in your data path, you can indicate this following the `1989 C standard format codes`_ for datetimes. For example, ``"{session_start_time:%Y-%m-%d}"`` will match ``"2021-01-02"`` and evaluate it to ``datetime.datetime(2021, 1, 2)``. Example Usage ---------------- Below are some full examples of how this feature can be used on some organizational patterns taken from real datasets. Example 1: `Allen Institute Visual Coding Dataset `_ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The Allen Institute's Visual Coding dataset contains, among other data, motion-corrected videos of each experimental session, with the directory structure shown below. .. code-block:: bash allen-brain-observatory/ ¦ visual-coding-2p/ ¦ +-- ophys_movies/ ¦ ¦ +-- ophys_experiment_496908818.h5 ¦ ¦ +-- ophys_experiment_496934409.h5 ¦ ¦ +-- ophys_experiment_496935917.h5 ¦ ¦ +-- ... The video files are all stored in the directory ``ophys_movies/``, and their file names follow the pattern ``ophys_experiment_`` plus a 9-digit session ID. We can use :py:class:`~neuroconv.tools.path_expansion.LocalPathExpander` to find all of these ``ophys_movies`` files and extract their session IDs with the following code block. .. code-block:: python source_data_spec = { "allen-visual-coding": { "base_directory": "/allen-brain-observatory/visual-coding-2p", "file_path": "ophys_movies/ophys_experiment_{session_id}.h5" } } path_expander = LocalPathExpander() metadata_list = path_expander.expand_paths(source_data_spec) The ``metadata_list`` now contains the information extracted for each matching file found by :py:class:`~neuroconv.tools.path_expansion.LocalPathExpander`. The information for the first file is shown below. .. code-block:: python { "source_data": { "allen-visual-coding": { "file_path": "/allen-brain-observatory/visual-coding-2p/ophys_movies/ophys_experiment_496908818.h5" } }, "metadata": { "NWBFile": { "session_id": "496908818" } } } Example 2: `Buszaki Lab SenzaiY Dataset `_ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The Buszaki Lab's SenzaiY dataset contains spiking and LFP data from mouse V1 with the directory structure shown below. Sorted unit spiking data are stored in the ``.res.1`` and ``.clu.1`` files, while the LFP data are stored in the ``.eeg`` files. .. code-block:: bash SenzaiY/ ¦ YMV01/ ¦ +-- YMV01_170818/ ¦ ¦ +-- YMV01_170818.eeg ¦ ¦ +-- YMV01_170818.res.1 ¦ ¦ +-- YMV01_170818.clu.1 ¦ ¦ +-- ... ¦ YMV02/ ¦ +-- YMV02_170815/ ¦ ¦ +-- YMV01_170815.eeg ¦ ¦ +-- YMV01_170815.res.1 ¦ ¦ +-- YMV01_170815.clu.1 ¦ ¦ +-- ... ¦ ... The data are organized into folders first by subject (``YMV01``, ``YMV02``, etc.) and then by session start times in the format ``yymmdd`` (``170818``, ``170815``, etc). We can use :py:class:`~neuroconv.tools.path_expansion.LocalPathExpander` to find both the LFP data files and the sorted unit spiking and extract their corresponding subject IDs and session start times. For the sorted unit spiking, we'll search for a matching ``folder_path`` instead of a ``file_path``, as ``neuroconv`` interfaces for such data, like :py:class:`~neuroconv.datainterfaces.ecephys.neuroscope.neuroscopedatainterface.NeuroScopeSortingInterface`, expect a ``folder_path`` as input. .. code-block:: python source_data_spec = { "SenzaiY_LFP": { "base_directory": "/SenzaiY/", "file_path": "{subject_id}/{subject_id}_{session_start_time:%y%m%d}/{subject_id}_{session_start_time:%y%m%d}.eeg" }, "SenzaiY_Spiking": { "base_directory": "/SenzaiY/", "folder_path": "{subject_id}/{subject_id}_{session_start_time:%y%m%d}/" } } path_expander = LocalPathExpander() metadata_list = path_expander.expand_paths(source_data_spec) The ``metadata_list`` now contains the information extracted for each matching file and directory found by :py:class:`~neuroconv.tools.path_expansion.LocalPathExpander`. The information for the first file is shown below. .. code-block:: python { "source_data": { "SenzaiY_LFP": { "file_path": "/SenzaiY/YMV01/YMV01_170818/YMV01_170818.eeg" } }, "metadata": { "NWBFile": { "session_start_time": datetime.datetime(2017, 8, 18, 0, 0) }, "Subject": { "subject_id": "YMV01" } } } The information found for the first matching directory is similar. .. code-block:: python { "source_data": { "SenzaiY_Spiking": { "folder_path": "/SenzaiY/YMV01/YMV01_170818/" } }, "metadata": { "NWBFile": { "session_start_time": datetime.datetime(2017, 8, 18, 0, 0) }, "Subject": { "subject_id": "YMV01" } } } Example 3: `IBL Brain Wide Map Data `_ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The IBL's Brain Wide Map features data from several labs of mice performing a visual decision-making task. Some experimental sessions, such as those from the Steinmetz Lab, include video recordings of the experiments from three cameras, stored in the following directory structure. .. code-block:: bash steinmetzlab/ ¦ Subjects/ ¦ +-- NR_0017/ ¦ ¦ +-- 2022-03-22/ ¦ ¦ ¦ +-- 001/ ¦ ¦ ¦ ¦ +-- raw_video_data/ ¦ ¦ ¦ ¦ ¦ +-- _iblrig_leftCamera.raw.6252a2f0-c10f-4e49-b085-75749ba29c35.mp4 ¦ ¦ ¦ ¦ ¦ +-- ... ¦ ¦ ¦ ¦ +-- ... ¦ +-- NR_0019/ ¦ ¦ +-- 2022-04-29/ ¦ ¦ ¦ +-- 001/ ¦ ¦ ¦ ¦ +-- raw_video_data/ ¦ ¦ ¦ ¦ ¦ +-- _iblrig_leftCamera.raw.9041b63e-02e2-480e-aaa7-4f6b776a647f.mp4 ¦ ¦ ¦ ¦ ¦ +-- ... ¦ ¦ ¦ ¦ +-- ... ¦ ... We can use :py:class:`~neuroconv.tools.path_expansion.LocalPathExpander` to find these left camera video files and extract the subject ID, the session start time (formatted as ``yyyy-mm-dd``), and a session number (``001`` for both files shown). .. code-block:: python source_data_spec = { "IBL_video": { "base_directory": "/steinmetzlab/", "file_path": "Subjects/{subject_id}/{session_start_time:%Y-%m-%d}/{session_id}/raw_video_data/_iblrig_leftCamera.raw.{}.mp4" } } path_expander = LocalPathExpander() metadata_list = path_expander.expand_paths(source_data_spec) The ``metadata_list`` now contains the information extracted for each matching file found by :py:class:`~neuroconv.tools.path_expansion.LocalPathExpander`. The information for the first file is shown below. .. code-block:: python { "source_data": { "IBL_video": { "file_path": "/steinmetzlab/Subjects/NR_0017/2022-03-22/001/raw_video_data/_iblrig_leftCamera.raw.6252a2f0-c10f-4e49-b085-75749ba29c35.mp4" } }, "metadata": { "NWBFile": { "session_id": "001", "session_start_time": datetime.datetime(2022, 3, 22, 0, 0) }, "Subject": { "subject_id": "NR_0017" } } } If you would like to experiment locally with :py:class:`~neuroconv.tools.path_expansion.LocalPathExpander`, we provide a helper method in :py:mod:`neuroconv.tools.testing ` that partially replicates the directory structure of the IBL data with dummy files on your machine. .. code-block:: python from neuroconv.tools.testing import generate_path_expander_demo_ibl generate_path_expander_demo_ibl(folder_path="path/to/generate/dummy/files") Non-local Path Expansion ------------------------ Note that :py:class:`~neuroconv.tools.path_expansion.LocalPathExpander` expands file paths locally, so it can only expand file paths that are on the same system as the code. Other types of path expanders could be implemented to support different platforms, such as Google Drive, Dropbox, or S3. These tools have not yet been developed, but would extend from the :py:class:`~neuroconv.tools.path_expansion.AbstractPathExpander` .. _Format Specification Mini-Language: https://docs.python.org/3/library/string.html#formatspec .. _`1989 C standard format codes`: https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes