Data Engineering

Data Pipelines

Data pipelines are sets of actions that extract data, transform it, and then load the result somewhere. Like any distributed system, they're tricky to work with. The principles below are worth keeping in mind, because production data engineering is mostly computer science.

Systems tend towards production, and data pipelines are no exception. Valuable data work and its outputs end up being consumed in use cases that are increasingly important and production-grade.
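To make the extract, transform, load steps concrete, here is a minimal sketch using only the Python standard library. The sample CSV, the `user_totals` table, and the function names are assumptions made up for illustration; a real pipeline would read from and write to external systems.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; in practice this would come from a file or an API.
RAW_CSV = """user_id,amount
1,10.50
2,
3,7.00
1,4.25
"""


def extract(text):
    """Parse the raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Drop rows with a missing amount and sum the amounts per user."""
    totals = {}
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records
        user_id = int(row["user_id"])
        totals[user_id] = totals.get(user_id, 0.0) + float(row["amount"])
    return totals


def load(totals, conn):
    """Write the totals to a table; INSERT OR REPLACE keeps reruns idempotent."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS user_totals "
        "(user_id INTEGER PRIMARY KEY, total REAL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO user_totals VALUES (?, ?)",
        sorted(totals.items()),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
rows = conn.execute(
    "SELECT user_id, total FROM user_totals ORDER BY user_id"
).fetchall()
```

Keeping each stage a separate function means each one can be tested on its own, and the load step can be rerun without duplicating rows.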

Basic Principles

Data Flow

graph LR;
 A-->B;
 B-->D;
 C-->D;
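
The graph above is a directed acyclic graph (DAG): D can only run once both B and C have finished, and B depends on A. As a rough sketch, a valid execution order can be derived with `graphlib.TopologicalSorter` from the Python standard library (3.9+). The `dependencies` mapping and the `run_order` helper are illustrative names, not part of any particular orchestrator.

```python
from graphlib import TopologicalSorter

# Each key maps to the set of upstream nodes it depends on.
# This mirrors the edges in the graph: A-->B, B-->D, C-->D.
dependencies = {
    "B": {"A"},
    "D": {"B", "C"},
}


def run_order(deps):
    """Return one valid execution order in which upstream nodes come first."""
    return list(TopologicalSorter(deps).static_order())


order = run_order(dependencies)
```

Several orders can be valid here (A and C are independent of each other), so a scheduler is free to run them in parallel. `TopologicalSorter` raises `graphlib.CycleError` if the dependencies contain a cycle.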

Great Blog Posts