Hugging Face (🤗) is a platform that allows developers to train and deploy open-source AI models. Like GitHub, it gives developers a space to build and deploy their work, in this case AI applications such as language models, transformers, text-to-image models, and more.
One of the stand-out features of the platform is 🤗 Datasets, a collection of over 5,000 ML datasets that are available for use.
Familiarity with Hugging Face and a Hugging Face account (see Quick Start Guide)
A Storj S3-compatible access key and secret key (see Getting started)
A bucket created on Storj (see Create buckets)
This walk-through uses s3fs, a Python filesystem interface for S3-compatible storage, so that the Hugging Face APIs can read from and write to Storj.
First, install the required dependencies.
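As a quick sanity check after installation, the sketch below reports which packages are still missing. The package list (`datasets` and `s3fs`) and the `pip` command in the comment are assumptions based on the libraries this guide mentions, and `missing_packages` is a hypothetical helper, not part of either library.

```python
import importlib.util

# Assumed package list for this guide; typically installed with:
#   pip install -U datasets s3fs
REQUIRED_PACKAGES = ("datasets", "s3fs")


def missing_packages(names=REQUIRED_PACKAGES):
    """Return the top-level packages from `names` that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# An empty list means every listed package is importable.
print("Missing packages:", missing_packages())
```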
Next, enter your Storj S3-compatible access key and secret key (see Getting started). These credentials make up the storage_options used in the later steps.
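A minimal sketch of this step might look like the following. The endpoint URL is assumed to be Storj's hosted S3-compatible gateway, the key values are placeholders, and `build_storage_options` and `make_filesystem` are hypothetical helper names introduced here for illustration.

```python
# Assumed endpoint of Storj's hosted S3-compatible gateway; confirm it for your account.
STORJ_ENDPOINT = "https://gateway.storjshare.io"


def build_storage_options(access_key, secret_key, endpoint_url=STORJ_ENDPOINT):
    """Build the options dict that s3fs (and Hugging Face Datasets) accept."""
    return {
        "key": access_key,
        "secret": secret_key,
        "client_kwargs": {"endpoint_url": endpoint_url},
    }


def make_filesystem(storage_options):
    """Create an s3fs filesystem object pointed at Storj."""
    import s3fs  # imported lazily so the helper above works without s3fs installed

    return s3fs.S3FileSystem(**storage_options)


# Placeholder credentials; replace them with your own access key and secret key.
storage_options = build_storage_options("access_key", "secret_key")
```

Keeping the credentials in one dictionary means the same `storage_options` value can be passed to every later call that reads from or writes to Storj.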
Create a bucket (see Create buckets) for the dataset to be stored in. In this walk-through, the bucket will be called
If your dataset is already on the Hugging Face Hub, you can use the load_dataset_builder function to download it and transfer it to Storj. It first downloads the raw dataset to your specified cache_dir, then prepares it and uploads it to Storj using the storage_options defined previously.
Here we transfer the dataset imdb to Storj.
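The transfer could be sketched roughly as follows. The bucket name `my-dataset-bucket` in the usage comment is a placeholder, the choice of Parquet as the output format is an assumption, and `dataset_uri` and `transfer_dataset` are hypothetical helper names. The `datasets` calls used are `load_dataset_builder` and the builder's `download_and_prepare` method.

```python
def dataset_uri(bucket, name):
    """Return the s3:// URI under which the prepared dataset is stored."""
    return f"s3://{bucket}/{name}"


def transfer_dataset(name, bucket, storage_options, cache_dir=None):
    """Download a dataset from the Hugging Face Hub and write it to Storj."""
    from datasets import load_dataset_builder  # lazy import: requires `datasets`

    # The raw files are downloaded to cache_dir (the library default when None).
    builder = load_dataset_builder(name, cache_dir=cache_dir)
    output_dir = dataset_uri(bucket, name)
    # Prepare the dataset and write it to the bucket as Parquet files.
    builder.download_and_prepare(
        output_dir,
        storage_options=storage_options,
        file_format="parquet",
    )
    return output_dir


# Example call (needs network access and valid credentials):
# transfer_dataset("imdb", "my-dataset-bucket", storage_options)
```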
Once you've encoded a dataset, you can persist it to Storj using the save_to_disk method, and later use the load_from_disk method to download your datasets.
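As a rough sketch, saving and reloading could look like this. It assumes a recent `datasets` release in which both `save_to_disk` and `load_from_disk` accept a `storage_options` argument. The URI in the usage comment is a placeholder, and the two wrapper functions are hypothetical names added for illustration.

```python
def save_dataset(dataset, uri, storage_options):
    """Persist a Dataset or DatasetDict to Storj at the given s3:// URI."""
    dataset.save_to_disk(uri, storage_options=storage_options)


def load_dataset_from_storj(uri, storage_options):
    """Load a dataset that was previously written with save_to_disk."""
    from datasets import load_from_disk  # lazy import: requires `datasets`

    return load_from_disk(uri, storage_options=storage_options)


# Example calls (need network access and valid credentials):
# save_dataset(encoded_dataset, "s3://my-dataset-bucket/imdb/train", storage_options)
# reloaded = load_dataset_from_storj("s3://my-dataset-bucket/imdb/train", storage_options)
```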