Atmospheric Data Portals

Thu Mar 26 2026

I wrote this quickly, more as a sketch for people already familiar with atproto and open data discovery challenges than as a fully self-contained post.

This post can be seen as a continuation of my Open Data wranglings. Check them out if you’re interested in ideas for solving issues at earlier stages of the Open Data pipeline.

In the last couple of days, I’ve come across two interesting projects that are working on making “datasets discoverable” using the AT Protocol: Matadisco and atdata. This is something I’ve been thinking about for a while, and I wanted to write down some thoughts and ideas!

The AT Protocol has a real chance to address the main issue these projects point to: finding useful data is hard. Much of the difficulty comes from the current state of things: isolated portals, weird APIs, a lack of reputation and usage signals, … It turns out data discovery is also a people problem! And the best hammer (well, protocol) we have for this kind of social problem is indeed the AT Protocol.

The beauty of designing data portals on top of atproto is that we get packaging and indexing at the same time, more or less for free. Until now, data package managers have had to handle both on their own [1].
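To make the “packaging and indexing for free” point concrete, here is a minimal TypeScript sketch. It assumes a hypothetical `com.example.dataset` record type (a placeholder NSID, not an existing lexicon) that people write to their own repos, pointing at data wherever it already lives. Anyone reading those public records can then build a search index of their own. The record fields, DIDs, and URLs are illustrative, and fetching records from the network is left out.

```typescript
// Hypothetical record describing a dataset. The NSID "com.example.dataset"
// is a placeholder for illustration, not a published lexicon.
interface DatasetRecord {
  $type: "com.example.dataset";
  name: string;
  description: string;
  keywords: string[];
  // The data stays wherever it already lives; the record only points at it.
  distributions: { url: string; mediaType: string }[];
}

// What an indexer sees: the record plus its AT URI (at://<did>/<collection>/<rkey>).
interface IndexedDataset {
  uri: string;
  record: DatasetRecord;
}

// Minimal keyword index built from records collected across many repos.
// It ignores updates and deletions to stay short.
class DatasetIndex {
  private byTerm = new Map<string, Set<string>>();
  private byUri = new Map<string, IndexedDataset>();

  add(entry: IndexedDataset): void {
    this.byUri.set(entry.uri, entry);
    // Index both explicit keywords and the words in the dataset name.
    const terms = [...entry.record.keywords, ...entry.record.name.split(/\s+/)]
      .map((t) => t.toLowerCase())
      .filter((t) => t.length > 0);
    for (const term of terms) {
      const uris = this.byTerm.get(term) ?? new Set<string>();
      uris.add(entry.uri);
      this.byTerm.set(term, uris);
    }
  }

  search(term: string): IndexedDataset[] {
    const uris = this.byTerm.get(term.toLowerCase()) ?? new Set<string>();
    return Array.from(uris).map((uri) => this.byUri.get(uri)!);
  }
}

// Usage with two records from two different (example) accounts.
const index = new DatasetIndex();
index.add({
  uri: "at://did:example:alice/com.example.dataset/air-quality",
  record: {
    $type: "com.example.dataset",
    name: "City Air Quality",
    description: "Hourly PM2.5 readings",
    keywords: ["air", "pollution"],
    distributions: [{ url: "https://example.org/air.csv", mediaType: "text/csv" }],
  },
});
index.add({
  uri: "at://did:example:bob/com.example.dataset/rivers",
  record: {
    $type: "com.example.dataset",
    name: "River Levels",
    description: "Daily gauge readings",
    keywords: ["water", "rivers"],
    distributions: [{ url: "https://example.net/rivers.json", mediaType: "application/json" }],
  },
});
const hits = index.search("Air"); // matches Alice's record only
```

The record carries only metadata and links; the bytes stay on whatever hosting the publisher already uses. Since the index is derived purely from public records, nothing stops several portals from building differently ranked indexes over the same data.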

So, how would I do it? Here are some ideas I haven’t seen in these projects that I think are interesting.

Basically, whatever comes out of this should fit existing storage, file formats, and publishing habits rather than require migrating to a blessed stack (programming language, platform, …). It should allow anyone to mirror, fix, annotate, and republish datasets.
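One way the “anyone can mirror, fix, annotate, and republish” requirement could look: instead of editing the original, other accounts publish their own records that point at it. This sketch assumes hypothetical `com.example.dataset.mirror` and `com.example.dataset.annotation` record types (placeholder NSIDs). The only piece taken from atproto itself is the shape of `com.atproto.repo.strongRef`, a `uri` plus a `cid`. The DIDs, URLs, and CID are made-up examples.

```typescript
// Same shape as com.atproto.repo.strongRef: an AT URI plus the CID of the
// exact record version being referenced.
interface StrongRef {
  uri: string;
  cid: string;
}

// Hypothetical derived-record types; the NSIDs are placeholders.
type DerivedRecord =
  | { $type: "com.example.dataset.mirror"; subject: StrongRef; url: string }
  | { $type: "com.example.dataset.annotation"; subject: StrongRef; text: string };

interface DerivedEntry {
  uri: string; // AT URI of the mirror/annotation record itself
  record: DerivedRecord;
}

// Split a record AT URI (at://<authority>/<collection>/<rkey>) into parts.
function parseAtUri(uri: string): { authority: string; collection: string; rkey: string } {
  const m = /^at:\/\/([^/]+)\/([^/]+)\/([^/]+)$/.exec(uri);
  if (!m) throw new Error(`not a record AT URI: ${uri}`);
  return { authority: m[1], collection: m[2], rkey: m[3] };
}

// Gather everything other accounts have published about one dataset,
// without needing any permission from the original author.
function derivedFrom(datasetUri: string, entries: DerivedEntry[]) {
  const mirrors: { by: string; url: string }[] = [];
  const annotations: { by: string; text: string }[] = [];
  for (const e of entries) {
    if (e.record.subject.uri !== datasetUri) continue;
    const by = parseAtUri(e.uri).authority;
    if (e.record.$type === "com.example.dataset.mirror") {
      mirrors.push({ by, url: e.record.url });
    } else {
      annotations.push({ by, text: e.record.text });
    }
  }
  return { mirrors, annotations };
}

// Usage: Bob mirrors Alice's dataset and Carol annotates it.
const original = "at://did:example:alice/com.example.dataset/air-quality";
const ref: StrongRef = { uri: original, cid: "bafyreiexamplecid" }; // placeholder CID
const result = derivedFrom(original, [
  {
    uri: "at://did:example:bob/com.example.dataset.mirror/m1",
    record: { $type: "com.example.dataset.mirror", subject: ref, url: "https://mirror.example.com/air.csv" },
  },
  {
    uri: "at://did:example:carol/com.example.dataset.annotation/a1",
    record: { $type: "com.example.dataset.annotation", subject: ref, text: "Station 12 readings look off in March." },
  },
]);
```

The pinned `cid` would let a consumer notice when a mirror or annotation refers to an older version of the dataset record; this sketch doesn't check it.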

There are many interesting ideas to follow up on, too: curating datasets into collections, reputation, or simple things like linking datasets. I’m very excited to continue these discussions and see where we go! For now, I think starting with something like this would be enough to see some interesting atmospheric data portals pop up.

Footnotes

  1. For example, Hugging Face Datasets offers a git-repo-like space where you put the data and a metadata file. Since they own that space, they can search across all their datasets. Easy, but neither open nor decentralized. On the other hand, there’s the Data Package spec, used by organizations like OWID and maintained by a more neutral actor. The issue there is discovery: the best you can do today is search GitHub. There are many more examples here.
