Data Transfers
Collection of helper functions for assessing and performing automated data transfers.
- estimate_s3_conversion_cost(total_mb: float, transfer_rate_mb: float = 20.0, conversion_rate_mb: float = 17.0, upload_rate_mb: float = 40.0, compression_ratio: float = 1.7)
Estimate potential cost of performing an entire conversion on S3 using full automation.
- Parameters
total_mb (float) – The total amount of data (in MB) that will be transferred, converted, and uploaded to DANDI.
transfer_rate_mb (float, default: 20.0) – Estimate of the transfer rate for the data.
conversion_rate_mb (float, default: 17.0) – Estimate of the conversion rate for the data. Can vary widely depending on conversion options and type of data. The default of 17 MB/s is based on extensive compression of high-volume, high-resolution ecephys.
upload_rate_mb (float, default: 40.0) – Estimate of the upload rate of a single file to the DANDI Archive.
compression_ratio (float, default: 1.7) – Estimate of the final average compression ratio for datasets in the file. Can vary widely.
- automatic_dandi_upload(dandiset_id: str, nwb_folder_path: FolderPathType, dandiset_folder_path: Optional[FolderPathType] = None, version: str = 'draft', staging: bool = False, cleanup: bool = False, number_of_jobs: Optional[int] = None, number_of_threads: Optional[int] = None)
Fully automated upload of NWBFiles to a DANDISet.
Requires an API token set as an environment variable named DANDI_API_KEY.
- To set this in your bash terminal in Linux or macOS, run
export DANDI_API_KEY=…
- or in Windows
set DANDI_API_KEY=…
DO NOT STORE THIS IN ANY PUBLICLY SHARED CODE.
- Parameters
dandiset_id (str) – Six-digit string identifier for the DANDISet the NWBFiles will be uploaded to.
nwb_folder_path (folder path) – Folder containing the NWBFiles to be uploaded.
dandiset_folder_path (folder path, optional) – A separate folder location within which to download the dandiset. Used in cases where you do not have write permissions for the parent of the ‘nwb_folder_path’ directory. Default behavior downloads the DANDISet to a folder adjacent to the ‘nwb_folder_path’.
version ({None, “draft”, “version”}) – The default is “draft”.
staging (bool, default: False) – Whether the DANDISet is hosted on the staging server. This is mostly for testing purposes.
cleanup (bool, default: False) – Whether to remove the dandiset folder path and nwb_folder_path.
number_of_jobs (int, optional) – The number of jobs to use in the DANDI upload process.
number_of_threads (int, optional) – The number of threads to use in the DANDI upload process.
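The requirements above (an API token in DANDI_API_KEY, a six-digit dandiset_id, and a folder of NWB files) can be checked before starting an upload. The following is a minimal sketch only: check_upload_preconditions and upload_if_ready are hypothetical helpers, not part of the library, and the import path used inside upload_if_ready is an assumption that may differ in your installation.

```python
import os
import re
from pathlib import Path


def check_upload_preconditions(dandiset_id, nwb_folder_path, environ=None):
    """Hypothetical helper: return a list of problems; an empty list means ready."""
    environ = os.environ if environ is None else environ
    problems = []
    # The token is read from the environment, never hard-coded in shared code.
    if not environ.get("DANDI_API_KEY"):
        problems.append("DANDI_API_KEY is not set")
    if not re.fullmatch(r"\d{6}", dandiset_id):
        problems.append("dandiset_id must be a six-digit string")
    if not Path(nwb_folder_path).is_dir():
        problems.append("nwb_folder_path is not an existing folder")
    return problems


def upload_if_ready(dandiset_id, nwb_folder_path):
    """Hypothetical wrapper: run the checks, then call automatic_dandi_upload."""
    problems = check_upload_preconditions(dandiset_id, nwb_folder_path)
    if problems:
        raise RuntimeError("; ".join(problems))
    # Assumed import path; adjust it to wherever the function lives for you.
    from neuroconv.tools.data_transfers import automatic_dandi_upload

    # staging=True targets the staging server, which is mostly for testing.
    automatic_dandi_upload(
        dandiset_id=dandiset_id,
        nwb_folder_path=nwb_folder_path,
        staging=True,
    )
```

Passing environ explicitly makes the check easy to exercise without touching the real environment.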
- get_globus_dataset_content_sizes(globus_endpoint_id: str, path: str, recursive: bool = True, timeout: float = 120.0) → Dict[str, int]
May require external login via ‘globus login’ from CLI.
Returns a dictionary whose keys are file names and whose values are sizes in bytes.
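Because the returned values are in bytes while the estimation helpers on this page take total_mb, a small conversion step is usually needed. A minimal sketch, assuming decimal megabytes (1 MB = 1e6 bytes); total_size_mb is a hypothetical helper and example_sizes is made-up data standing in for a real return value:

```python
def total_size_mb(content_sizes):
    """Sum a {file name: size in bytes} mapping and convert to decimal MB."""
    return sum(content_sizes.values()) / 1e6


# Made-up sizes with the same shape as the documented return value.
example_sizes = {
    "session_01.dat": 2_500_000_000,
    "session_01.xml": 40_000,
    "session_02.dat": 1_499_960_000,
}

total_mb = total_size_mb(example_sizes)

# File names ordered from largest to smallest, useful when planning transfers.
largest_first = sorted(example_sizes, key=example_sizes.get, reverse=True)
```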
- transfer_globus_content(source_endpoint_id: str, source_files: Union[str, List[List[str]]], destination_endpoint_id: str, destination_folder: DirectoryPath, display_progress: bool = True, progress_update_rate: float = 60.0, progress_update_timeout: float = 600.0) → Tuple[bool, List[str]]
Track progress for transferring content from source_endpoint_id to destination_endpoint_id:destination_folder.
- Parameters
source_endpoint_id (str) – Source Globus ID.
source_files (string, or list of strings, or list of lists of strings) – A string path or list-of-lists of string paths of files to transfer from the source_endpoint_id. If using a nested list, the outer level indicates which requests will be batched together, and all items in a single batch must be from the same common directory.
It is recommended to transfer the largest file(s) with minimal batching, and to batch a large number of very small files together.
It is also generally recommended to submit up to 3 simultaneous transfers, i.e., source_files is recommended to have at most 3 items, all of similar total byte size.
destination_endpoint_id (str) – Destination Globus ID.
destination_folder (FolderPathType) – Absolute path to the local folder to which all content will be transferred.
display_progress (bool, default: True) – Whether to display the transfer as progress bars using tqdm.
progress_update_rate (float, default: 60.0) – How frequently (in seconds) to update the progress bar display tracking the data transfer.
progress_update_timeout (float, default: 600.0) – Maximum amount of time to monitor the transfer progress. You may wish to set this to be longer when transferring very large files.
- Returns
success (bool) – The overall status of all transfers once they either finish or the progress tracking times out.
task_ids (list of strings) – List of the task IDs submitted to Globus, in case further information is needed to reestablish tracking or terminate the transfers.
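The batching advice for source_files (at most 3 batches of similar total byte size) can be approximated with a greedy split. This is a sketch under stated assumptions, not the library's own logic: batch_by_size is a hypothetical helper, it takes a {file name: size in bytes} mapping, and it assumes every file lives in the same common directory so that any grouping is a valid batch.

```python
def batch_by_size(content_sizes, max_batches=3):
    """Greedily split file names into up to max_batches groups of similar size."""
    number_of_batches = min(max_batches, len(content_sizes))
    batches = [[] for _ in range(number_of_batches)]
    totals = [0] * number_of_batches
    # Place the largest files first, each into the currently lightest batch.
    by_size = sorted(content_sizes.items(), key=lambda item: item[1], reverse=True)
    for name, size in by_size:
        lightest = totals.index(min(totals))
        batches[lightest].append(name)
        totals[lightest] += size
    return batches


# Made-up sizes in bytes; the large files end up alone, the small ones grouped.
example_sizes = {"a.dat": 900, "b.dat": 500, "c.dat": 400, "d.txt": 50, "e.txt": 50}
example_batches = batch_by_size(example_sizes)
```

The resulting nested list has the list-of-lists shape described for source_files; the file names would still need to be turned into full paths on the source endpoint before being passed in.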
- estimate_total_conversion_runtime(total_mb: float, transfer_rate_mb: float = 20.0, conversion_rate_mb: float = 17.0, upload_rate_mb: float = 40, compression_ratio: float = 1.7)
Estimate how long the combined process of data transfer, conversion, and upload is expected to take.
- Parameters
total_mb (float) – The total amount of data (in MB) that will be transferred, converted, and uploaded to DANDI.
transfer_rate_mb (float, default: 20.0) – Estimate of the transfer rate for the data.
conversion_rate_mb (float, default: 17.0) – Estimate of the conversion rate for the data. Can vary widely depending on conversion options and type of data. The default of 17 MB/s is based on extensive compression of high-volume, high-resolution ecephys.
upload_rate_mb (float, default: 40.0) – Estimate of the upload rate of a single file to the DANDI Archive.
compression_ratio (float, default: 1.7) – Estimate of the final average compression ratio for datasets in the file. Can vary widely.
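To make the roles of these parameters concrete, here is a simplified model of how such an estimate could be formed. It is an illustration only and not necessarily the formula the function uses: sketch_total_runtime_seconds is a hypothetical name, the three stages are assumed to run one after another, all rates are assumed to be in MB/s, and the upload is assumed to act on the compressed size (total_mb divided by compression_ratio).

```python
def sketch_total_runtime_seconds(
    total_mb,
    transfer_rate_mb=20.0,
    conversion_rate_mb=17.0,
    upload_rate_mb=40.0,
    compression_ratio=1.7,
):
    """Simplified sequential model: transfer, then convert, then upload."""
    transfer_seconds = total_mb / transfer_rate_mb
    conversion_seconds = total_mb / conversion_rate_mb
    # Assumption: only the compressed output is uploaded.
    upload_seconds = (total_mb / compression_ratio) / upload_rate_mb
    return transfer_seconds + conversion_seconds + upload_seconds


# With the default rates, 1000 MB of source data under this model.
example_seconds = sketch_total_runtime_seconds(total_mb=1000.0)
```

Under this model the conversion stage dominates at the default rates, which is why the conversion_rate_mb estimate deserves the most care.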