Prepare datasets

You have to download the datasets you needed to their path respectively. Datasets are stored in data/ folder. For example, ImageNet100 is stored at data/imagenet. If data/ folder is not yet created, you may create it in the project root by:

mkdir data

Warning

We do not own these datasets, please read and agree to the original authors’ license before download and use.

CIFAR10

There is no need of extra downloading for CIFAR10 dataset. The dataset will be downloaded automatically at the first time of running.

ImageNet100

This is a subset/variant of original ImageNet dataset, we do not own the dataset. Please head to ImageNet official website to read and accept their term of uses and license before download the dataset here (updated Aug 2022) or huggingface (updated Feb 2026). This is a subset with most common 100 classes.

Download and extract the archive at data/ directory:

# at <project_root>/data/
tar -xzf imagenet.tar.gz

NUS-WIDE

This is a subset/variant of original NUS-WIDE dataset, we do not own the dataset. Please head to NUS-WIDE official website to read and accept their term of uses and license before download the dataset here or here (updated Feb 2023) or huggingface (updated Feb 2026). This is a subset with 21 most common classes.

Download and extract the archive at data/ directory:

# at <project_root>/data/
tar -xzf nuswide_v2_256.tar.gz

MS-COCO

This is a subset/variant of original MS-COCO dataset, we do not own the dataset. Please head to MS-COCO official website to read and accept their term of uses and license before download the dataset here (updated Aug 2022) or huggingface (updated Feb 2026).

Download and extract the archive at data/ directory:

# at <project_root>/data/
tar -xzf coco.tar.gz

MIRFlickr-25000

This is a subset/variant of original MIRFLICKR dataset, we do not own the dataset. Please head to MIRFLICKR official website to read and accept their term of uses and license before download the dataset here (updated Aug 2022) or huggingface (updated Feb 2026).

Download and extract the archive at data/ directory:

# at <project_root>/data/
tar -xzf mirflickr.tar.gz

Stanford Online Product

This is a subset/variant of original Stanford Online Product dataset, we do not own the dataset. Please head to Stanford Online Product official website to read and accept their term of uses and license before download the dataset here (updated Aug 2022) or huggingface (updated Feb 2026).

Download and extract the archive at data/ directory:

# at <project_root>/data/
tar -xzf sop.tar.gz

References

We thank the following repository for the processed datasets:

https://github.com/TreezzZ/DSDH_PyTorch (imagenet100,nuswide)
https://github.com/thuml/HashNet/tree/master/pytorch (coco)