Gitcoin Data Portal

Mon Sep 11 2023

Last week, I went on a rabbit hole after coming across RegenData.xyz, an initiative to collect and surface grants data. I wanted to explore the idea of a fully local and open data portal.

If you don’t like prose that much, code is available at GitHub. Check it out!

At Protocol Labs, I’ve done something similar with the Filecoin Data Portal; an open source and local friendly data hub for Filecoin data (chain and off chain). Think of Dune, but in your laptop! This portal at the same time was based in the ideas I started exploring with Datadex a while back. Turns out that chain data is such a great fit for testing things that might work in the the larger “open data movement”. Chain data is useful/needed, open, and, immutable! All great properties for this kind of approach.

Gitcoin Grant data is being exposed via the Indexer Grants Stack Data API, so, the work was mostly porting things out and dealing with the nitty gritty of the data!

Exploring the Data

Before jumping into the portal itself, I wanted to see the kind of data exposed and which format/schemas were available. That means downloading the entire indexed dataset to my laptop. The fastets solution I found was to use lftp. In this case, the command was:

lftp -c 'mirror --parallel=200 https://indexer-grants-stack.gitcoin.co/data/1/ ;exit'

Is not the fastest thing in the world as it has to paginate through all the folders, but will download all the JSONs to your machine!

Then, I joined the JSONs locally (doing ugly things like cat */rounds/*/votes.json | jq .[] -c > round_votes.json) and started running SQL queries on top of them with DuckDB.

These steps alone give you the ability to query all Gitcoin Grant data with SQL in your laptop. Easy and fast!

The two main takeaways (which could probably derived from the Allo Indexer repository) I came up are:

The Data Portal

Since I already worked on the Filecoin Data Portal, I went with the same approach, reuse the Datadex ideas and stack.

Gitcoin Data Portal Repository in Github

The first thing was to create the relevant Dagster assets. These act as the extract and load part of the pipeline and are later transformed by dbt with SQL queries. Data is finally exposed via Parquet files on top of IPFS. A notebook is rendered as the project website with each push too.

All of the avobe unlocks some interesting properties relevant to Gitcoin and decentralized data in general:

Pretty cool, right?

Example

The easiest thing you can do right now is explore the generated website with some query examples. You can reproduce them by running the following query (computing the top rounds by votes) in shell.duckdb.org.

select
    round_id,
    count(id)
from read_parquet('https://bafybeieaztvldk23xghlpmzjz5ppry5jrd6bi2kag6q73huckhfrlrabby.ipfs.w3s.link/round_votes.parquet')
group by 1 order by 2 desc limit 10;

DuckDB Example

Next Steps

The main thing I’d love to explore next is how to remove the Allo Indexing data alltogether and read everything from on-chain sources. Not sure how efficient can that be, but it would be a great experiment!


That’s all for now. Reach out if you have any feedback or ideas!

← Back to home!