Adopting Datalad for collaboration¶
Datalad is a powerful tool for versioning and sharing raw and processed data, as well as for tracking data provenance (i.e. recording how data was processed). This page shares how we adopted Datalad datasets with the Connectome Mapper in our lab at the time this document was created (2019 Jan 8). For more details and tutorials on Datalad, please check the recent Datalad Handbook.
Warning
This was tested on Ubuntu 16.04 with Datalad 0.11.3 and its extensions datalad-container 0.3.1, datalad_neuroimaging 0.2.0 and datalad_revolution 0.6.0. This example might not work with their latest versions, as they are under intensive development and a number of new versions with minor and major changes have been released in the meantime.
Move original BIDS dataset to server¶
rsync -P -v -avz -e 'ssh' --exclude 'derivatives' --exclude 'code' --exclude '.datalad' --exclude '.git' --exclude '.gitattributes' /media/localadmin/HagmannHDD/Seb/ds-newtest2/* tourbier@<SERVER_IP_ADDRESS>:/home/tourbier/Data/ds-newtest2
Datalad setup and dataset creation on Server (accessible via ssh)¶
Connect to server¶
ssh tourbier@<SERVER_IP_ADDRESS>
Install git-annex and liblzma-dev (datalad dependencies), Datalad and its extensions¶
sudo apt-get install git-annex liblzma-dev
pip install datalad[all]==0.11.3
pip install datalad-container==0.3.1
pip install datalad_neuroimaging==0.2.0
pip install datalad_revolution==0.6.0
Note
Tested using git-annex version 7.20190219-gad7c11b
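Since the commands on this page depend on specific versions, it can help to check what is installed before going further. The following is a minimal sketch, not part of the original workflow: `check_tool` is a hypothetical helper name, and it only reports the first line each tool prints for its version.

```shell
# Print the installed version of a tool, or note that it is missing.
# check_tool is a hypothetical helper introduced for this sketch.
check_tool() {
  # $1: executable name; remaining arguments: the command that prints its version
  name=$1
  shift
  if command -v "$name" >/dev/null 2>&1; then
    echo "$name: $("$@" 2>&1 | head -n 1)"
  else
    echo "$name: not found on PATH"
  fi
}

check_tool git-annex git-annex version   # this page was tested with 7.20190219-gad7c11b
check_tool datalad datalad --version     # this page was tested with 0.11.3
```

If a reported version differs from the ones listed above, the `rev-*` commands used below may behave differently or be unavailable.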
Go to source dataset directory, create a Datalad dataset and save all¶
cd /home/tourbier/Data/ds-newtest2
datalad rev-create -f -D "Original test dataset on lab server"
datalad rev-save -m 'Source (Origin) BIDS dataset' --version-tag origin
Report on the state of dataset content¶
datalad rev-status --recursive
Processing using the Connectome Mapper BIDS App on a local workstation¶
Dataset installation¶
datalad install -s ssh://tourbier@<SERVER_IP_ADDRESS>:/home/tourbier/Data/ds-newtest2 \
/home/localadmin/Data/ds-newtest2
cd /home/localadmin/Data/ds-newtest2
Get T1w and Diffusion images to be processed, written in a bash script for reproducibility¶
datalad get -J 4 sub-*/ses-*/anat/sub-*_T1w.nii.gz
datalad get -J 4 sub-*/ses-*/dwi/sub-*_dwi.nii.gz
datalad get -J 4 sub-*/ses-*/dwi/sub-*_dwi.bvec
datalad get -J 4 sub-*/ses-*/dwi/sub-*_dwi.bval
Write datalad get commands to get_required_files_for_analysis.sh:
mkdir code
echo "datalad get -J 4 sub-*/ses-*/anat/sub-*_T1w.nii.gz" > code/get_required_files_for_analysis.sh
echo "datalad get -J 4 sub-*/ses-*/dwi/sub-*_dwi.nii.gz" >> code/get_required_files_for_analysis.sh
echo "datalad get -J 4 sub-*/ses-*/dwi/sub-*_dwi.bvec" >> code/get_required_files_for_analysis.sh
echo "datalad get -J 4 sub-*/ses-*/dwi/sub-*_dwi.bval" >> code/get_required_files_for_analysis.sh
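As an alternative to the four echo commands, the same script can be written in one step with a here-document. This is a sketch under a few assumptions that are not in the original: the shebang line, `set -e` and the executable bit are additions. Quoting the `EOF` delimiter keeps the glob patterns literal, so they are expanded only when the script is run inside the dataset.

```shell
mkdir -p code

# Quoted 'EOF' prevents the shell from expanding the patterns while writing the file.
cat > code/get_required_files_for_analysis.sh <<'EOF'
#!/bin/bash
# Fetch the T1w and diffusion files needed by the analysis.
set -e
datalad get -J 4 sub-*/ses-*/anat/sub-*_T1w.nii.gz
datalad get -J 4 sub-*/ses-*/dwi/sub-*_dwi.nii.gz
datalad get -J 4 sub-*/ses-*/dwi/sub-*_dwi.bvec
datalad get -J 4 sub-*/ses-*/dwi/sub-*_dwi.bval
EOF

# Make the script directly runnable (an addition, not required by the original workflow).
chmod +x code/get_required_files_for_analysis.sh
```

With `set -e`, the script stops at the first `datalad get` that fails instead of continuing with incomplete inputs.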
Add all content in the code/ directory directly to git:
datalad add --to-git code
Add the container image of the connectome mapper to the dataset¶
datalad containers-add connectomemapper-bidsapp-|release| \
--url dhub://sebastientourbier/connectomemapper-bidsapp:|release| \
--update
Save the state of the dataset prior to analysis¶
datalad rev-save -m "Seb's test dataset on local \
workstation ready for analysis with connectomemapper-bidsapp:|release|" \
--version-tag ready4analysis-<date>-<time>
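The `<date>-<time>` part of the version tag can be generated rather than typed by hand. The sketch below assumes a `YYYYMMDD-HHMMSS` layout, which is our choice for illustration; the original does not prescribe a format, and any string that is valid in a git tag name should work.

```shell
# Build the version tag from the current local time.
# The YYYYMMDD-HHMMSS layout is an assumption made for this sketch.
stamp=$(date +%Y%m%d-%H%M%S)
tag="ready4analysis-${stamp}"
echo "$tag"

# The tag can then be passed to the save command shown above:
#   datalad rev-save -m "..." --version-tag "$tag"
```

Avoiding spaces and colons in the timestamp keeps the tag usable as a git reference.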
Run Connectome Mapper on all subjects¶
datalad containers-run --container-name connectomemapper-bidsapp-|release| \
'/tmp' '/tmp/derivatives' participant \
--anat_pipeline_config '/tmp/code/ref_anatomical_config.ini' \
--dwi_pipeline_config '/tmp/code/ref_diffusion_config.ini'
Save the state¶
datalad rev-save -m "Seb's test dataset on local \
workstation processed by connectomemapper-bidsapp:|release|, {Date/Time}" \
--version-tag processed-<date>-<time>
Report on the state of dataset content¶
datalad rev-status --recursive
With DataLad we don’t have to keep those inputs around – without losing the ability to reproduce an analysis.¶
Let’s uninstall them – checking the size on disk before and after:
datalad uninstall sub-*/*
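The size check mentioned above can be done with `du` around the uninstall command. The following is a minimal sketch: `report_freed_kib` is a hypothetical helper name, and it measures the whole current directory (including `.git`, where annexed file content is stored) before and after running whatever command it is given.

```shell
# Run a command and report disk usage of the current directory before and after it.
# report_freed_kib is a hypothetical helper introduced for this sketch.
report_freed_kib() {
  before=$(du -sk . | cut -f1)   # size in KiB before the command
  "$@"                           # run the command passed as arguments
  after=$(du -sk . | cut -f1)    # size in KiB after the command
  echo "before: ${before} KiB, after: ${after} KiB, freed: $((before - after)) KiB"
}

# Usage from the dataset root, wrapping the command shown above:
#   report_freed_kib datalad uninstall sub-*/*
```

Measuring the dataset root rather than the `sub-*` folders matters here, because the files in those folders are links to content kept under `.git/annex`.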
Local collaboration with Bob for Electrical Source Imaging¶
Processed dataset installation on Bob’s workstation¶
datalad install -s ssh://localadmin@HOS51827:/home/localadmin/Data/ds-newtest2 \
/home/bob/Data/ds-newtest2
cd /home/bob/Data/ds-newtest2
Get connectome mapper output files (Brain Segmentation and Multi-scale Parcellation) used by Bob in his analysis¶
datalad get -J 4 derivatives/cmp/sub-*/ses-*/anat/sub-*_mask.nii.gz
datalad get -J 4 derivatives/cmp/sub-*/ses-*/anat/sub-*_class-*_dseg.nii.gz
datalad get -J 4 derivatives/cmp/sub-*/ses-*/anat/sub-*_scale*_atlas.nii.gz
Write datalad get commands to get_required_files_for_analysis_by_bob.sh for reproducibility:
echo "datalad get -J 4 derivatives/cmp/sub-*/ses-*/anat/sub-*_mask.nii.gz" > code/get_required_files_for_analysis_by_bob.sh
echo "datalad get -J 4 derivatives/cmp/sub-*/ses-*/anat/sub-*_class-*_dseg.nii.gz" >> code/get_required_files_for_analysis_by_bob.sh
echo "datalad get -J 4 derivatives/cmp/sub-*/ses-*/anat/sub-*_scale*_atlas.nii.gz" >> code/get_required_files_for_analysis_by_bob.sh
Add all content in the code/ directory directly to git:
datalad add --to-git code
Update derivatives¶
cd /home/bob/Data/ds-newtest2
mkdir derivatives/cartool ...
Save the state¶
datalad rev-save -m "Bob's test dataset on local \
workstation processed by cartool:|release|, {Date/Time}" \
--version-tag processed-<date>-<time>
Report on the state of dataset content¶
datalad rev-status --recursive
With DataLad we don’t have to keep those inputs around – without losing the ability to reproduce an analysis.¶
Let’s uninstall them – checking the size on disk before and after:
datalad uninstall sub-*/*
datalad uninstall derivatives/cmp/*
datalad uninstall derivatives/freesurfer/*
datalad uninstall derivatives/nipype/*
- Created by Sebastien Tourbier - 2019 Jan 8