DuckDB with IPFS CID's

Wed Feb 22 2023

Thanks to fsspec, you can query arbitrary filesystems with DuckDB quite easily.

To do so, you need to register a fsspec filesystem on DuckDB. Since IPFS has a supported fsspec plugin, ipfsspec, we can register it and start to query directly it with SQL.

If you want to follow along, you’ll need to install ipfsspec, duckdb and fsspec. You can do so with:

pip install git+https://github.com/fsspec/ipfsspec duckdb fsspec

Now, let’s register the IPFS filesystem on DuckDB:

import duckdb
from ipfsspec import AsyncIPFSFileSystem

ipfs_fs = AsyncIPFSFileSystem()

duckdb.register_filesystem(ipfs_fs)

Once the filesystem is registered, you can use CIDs as URIs inside read_csv_auto or read_parquet!

The bafybeif5reawvqtsoybj5fhdl4ghaq3oc7kzepuws26zawkjm4johlv3uq CID is a CSV file. Querying it is as simple as:

>> cid = 'bafybeif5reawvqtsoybj5fhdl4ghaq3oc7kzepuws26zawkjm4johlv3uq'
>> duckdb.sql(f"select * from read_csv_auto('ipfs://{cid}')")
┌────────┐
│   c    │
│ int64  │
├────────┤
143732
└────────┘

For Parquet files, you can do the same with read_parquet:

>> cid = 'bafkreibnx5q6qwxobozkdm6xt7ktvwciyfvtkgy7fud67w5oyxnf5tch4e'
>> duckdb.sql(f"select * from read_parquet('ipfs://{cid}')")
┌─────────────────────┬───────┬───────────────┐
│       entity        │ year  │ literacy_rate │
│       varchar       │ int32 │    double     │
├─────────────────────┼───────┼───────────────┤
│ Afghanistan         │  200028.1
│ Albania             │  201196.8
│ Algeria             │  200672.6
│ American Samoa      │  198097.0
│ Andorra             │  2011100.0
│ Angola              │  201170.4
│ Anguilla            │  198495.0
│ Antigua and Barbuda │  201199.0
│ Argentina           │  201197.9
│ Armenia             │  201199.6
│    ·                │    ·  │            ·  │
│    ·                │    ·  │            ·  │
│    ·                │    ·  │            ·  │
│ Uruguay             │  201098.1
│ Uzbekistan          │  201199.4
│ Vanuatu             │  201183.2
│ Vatican             │  2011100.0
│ Venezuela           │  200995.5
│ Vietnam             │  201193.4
│ Wallis and Futuna   │  196950.0
│ Yemen               │  201165.3
│ Zambia              │  200761.4
│ Zimbabwe            │  201183.6
├─────────────────────┴───────┴───────────────┤
215 rows (20 shown)               3 columns │
└─────────────────────────────────────────────┘

Voilà!

← Back to home!