Hugging Face

Hugging Face (🤗) is a platform that allows developers to train and deploy open-source AI models. It's similar to GitHub in that it provides a space for developers to code and deploy AI applications, including language models, transformers, text2image, and more. One of the stand-out features of the platform is “🤗 Datasets”, a collection of over 5,000 ML datasets that are available for use. This guide walks through configuring Hugging Face Datasets with Storj using s3fs, an interim approach until a Storj-native integration pattern is defined.

Prerequisites

  • Familiarity with Hugging Face and a Hugging Face account (see Quick Start Guide)
  • Familiarity with Colab or an equivalent environment to run code in (see Notebooks)
  • Storj S3-compatible access key and secret key (see Storj with AWS SDK)
  • A bucket created on Storj (see Create a Bucket)

Set up Storj with s3fs

Hugging Face Datasets reaches Storj through s3fs, a Python filesystem interface for S3-compatible storage. First, install the required dependencies.

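A minimal sketch of the install step, assuming the only packages needed are `datasets` and `s3fs` from PyPI (the exact package set and versions are an assumption):

```shell
# Install Hugging Face Datasets and s3fs (assumed dependency set)
python -m pip install --upgrade datasets s3fs
```

In a Colab notebook cell, the same command can be run by prefixing it with `!`.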

Next, enter your Storj S3-compatible access key and secret key (see Storj with AWS SDK).

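One way this configuration could look, as a sketch: the options dictionary uses the `key`, `secret`, and `client_kwargs` arguments that `s3fs.S3FileSystem` accepts. The endpoint is assumed to be Storj's hosted gateway, and the environment variable names and the helper functions `storj_storage_options` and `storj_filesystem` are hypothetical names introduced for illustration.

```python
import os

# Assumption: Storj's hosted S3-compatible gateway endpoint.
STORJ_ENDPOINT = "https://gateway.storjshare.io"


def storj_storage_options(access_key, secret_key, endpoint_url=STORJ_ENDPOINT):
    """Build the options dict passed to s3fs and to Hugging Face Datasets."""
    return {
        "key": access_key,
        "secret": secret_key,
        "client_kwargs": {"endpoint_url": endpoint_url},
    }


# Hypothetical environment variable names; placeholders are used when unset.
storage_options = storj_storage_options(
    os.environ.get("STORJ_ACCESS_KEY", "<access key>"),
    os.environ.get("STORJ_SECRET_KEY", "<secret key>"),
)


def storj_filesystem(options):
    """Create an s3fs filesystem pointed at Storj (requires s3fs to be installed)."""
    import s3fs  # imported lazily so the options can be built without s3fs

    return s3fs.S3FileSystem(**options)
```

With real credentials in place, `fs = storj_filesystem(storage_options)` followed by `fs.ls("my-dataset-bucket")` should list the bucket contents, which is a quick way to confirm the keys and endpoint are correct.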

Create a bucket (see Create a Bucket) for the dataset to be stored in. In this walk-through, the bucket is called my-dataset-bucket.

Transfer existing Hugging Face dataset to Storj

If your dataset is already on the Hugging Face Hub, you can use the load_dataset_builder function to download and transfer it to Storj. It first downloads the raw dataset to your specified cache_dir, then prepares it and uploads it to Storj using the storage_options defined previously. Here we transfer the imdb dataset to Storj.

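The transfer could be sketched as follows, using `load_dataset_builder` and the builder's `download_and_prepare` method from the `datasets` library. The helpers `storj_uri` and `transfer_to_storj`, the `imdb` output prefix, and the choice of Parquet as the file format are assumptions made for illustration; `storage_options` is the credentials dictionary described earlier.

```python
def storj_uri(bucket, *parts):
    """Join a bucket name and path segments into an s3:// URI."""
    return "s3://" + "/".join([bucket, *parts])


def transfer_to_storj(dataset_name, output_dir, storage_options, cache_dir=None):
    """Download a Hub dataset and write the prepared files to Storj."""
    from datasets import load_dataset_builder  # requires datasets and s3fs

    # Raw files are downloaded to cache_dir (the library default when None).
    builder = load_dataset_builder(dataset_name, cache_dir=cache_dir)
    # Prepared files are written to output_dir through s3fs.
    builder.download_and_prepare(
        output_dir,
        storage_options=storage_options,
        file_format="parquet",  # assumed format choice
    )
    return output_dir


# Example call (needs valid Storj credentials in storage_options):
# transfer_to_storj("imdb", storj_uri("my-dataset-bucket", "imdb"), storage_options)
```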

Save dataset to Storj

Once you've encoded a dataset, you can persist it using the save_to_disk method.

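A minimal sketch of saving to Storj, assuming a recent `datasets` release in which `save_to_disk` accepts `storage_options` (older releases took an `fs=` argument instead). The wrapper `save_dataset_to_storj` and the `imdb/train` path are hypothetical.

```python
def save_dataset_to_storj(dataset, bucket, path, storage_options):
    """Persist a datasets.Dataset (or DatasetDict) under s3://bucket/path on Storj."""
    uri = f"s3://{bucket}/{path}"
    # save_to_disk writes the Arrow files and metadata through s3fs.
    dataset.save_to_disk(uri, storage_options=storage_options)
    return uri


# Example call, where encoded_dataset is a dataset you have already processed:
# save_dataset_to_storj(encoded_dataset, "my-dataset-bucket", "imdb/train", storage_options)
```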

Load dataset from Storj

Use the load_from_disk method to download your datasets from Storj.

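Loading could look like the sketch below, assuming the dataset was previously written with `save_to_disk` and that the installed `datasets` release accepts `storage_options` in `load_from_disk`. The helpers `storj_dataset_uri` and `load_dataset_from_storj` and the `imdb/train` path are hypothetical.

```python
def storj_dataset_uri(bucket, path):
    """Build the s3:// URI for a dataset path, ignoring stray slashes."""
    return f"s3://{bucket}/{path.strip('/')}"


def load_dataset_from_storj(bucket, path, storage_options):
    """Load a dataset that was saved to Storj with save_to_disk."""
    from datasets import load_from_disk  # requires datasets and s3fs

    return load_from_disk(
        storj_dataset_uri(bucket, path),
        storage_options=storage_options,
    )


# Example call (needs valid Storj credentials in storage_options):
# dataset = load_dataset_from_storj("my-dataset-bucket", "imdb/train", storage_options)
```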
Updated 24 Jan 2023