- Analytics engineering is the practice of organizing an organization’s information.
- Real impact comes from continuous decision-making and implementing actions with feedback.
- Accept that analytics is a mess.
- Explicitly create separate workspaces for curated (production) and messy (experimental) work.
- Reports are rarely read, and often forgotten. [[Data Culture|Decision-making involves getting data, summarizing and predicting and then taking action]].
- One of the best ways to communicate data is telling stories. Stories are more captivating and present a coherent view around a topic.
- The analytics engineer workload is a lot like being a data librarian.
- If you run a library, books keep coming in and people keep looking for books on specific topics, so you have to organize the books in a way that lets everyone find what they need. There are many different ways to organize books, not just one perfect solution. A librarian is interested in helping people find the books they’re looking for, and also in helping them discover books they didn’t realize they were looking for.
- Analytics code should be version controlled, tested, modular and maintainable.
- Define all resources ([[Dashboards]] in YAML, Cohorts in SQL, …) as code.
- Treat data the same way engineers treat code. That means CI/CD, tests, frequent PRs, …
- Use [[Data Practices#Data Request Template]] when getting questions.
- Analytics work can be roughly split into two buckets:
  - Building automated [[Systems]], from metrics to [[Dashboards]], to enable self-service use cases for business users. This is what we now typically call analytics engineering.
  - Doing ad-hoc analyses, to answer some questions directly.
- Make your modeling approach explicit (e.g. Dimensional Modeling).
- Modeling reality gets complex quickly. There are small nuances, special conditions, things that changed, edge cases and, of course, errors.
- Imagine your company today as a human society where only half the population can read (understand the data), one tenth can write (SQL queries), where half a dozen languages are spoken, and where most of the books ([[Dashboards]]/insight reports) in the library contain things that once were true but have since become outdated (and you don’t know which ones). Not a highly productive information ecosystem.
- Domain knowledge is more important than your coding skills.
- Ground truth isn’t a single place. Start by joining on common unique keys and counting things, then figure out what’s different and why.
- [[Teamwork|Collaborate with your team]] and break down complex models into reusable pieces.
- Working with data is like exploring the horizon. It changes as soon as you look at it from a higher place (more data).
- Find out what decisions your stakeholders need to make, repeatedly, and help with those.
- Attach a date to your team output resources ([[Dashboards]], analysis, …) so they exist as artifacts that were true at a certain point in time.
- Reduce the areas where business logic can be injected, create “time to live” policies on last-mile transforms, and build a culture of standardizing and celebrating access to cross-functional codebases.
- Modern data warehouses might need new model design paradigms.
- Good data models make good products.
- Behind every null value there is a story.
- Take credit for your work. Get close to the business, figure out what the executives care about, and communicate the value, not the work.
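
The note above about ground truth (join on common unique keys, count things, then look at what differs) can be sketched in a few lines. This is only an illustration under assumptions: the `reconcile` helper, the `billing` and `crm` sources, and the customer IDs are all hypothetical, and each source is simplified to a dictionary mapping a unique key to a single value.

```python
def reconcile(source_a, source_b):
    """Compare two record sets that share a unique key.

    Each argument is a dict mapping the unique key to that source's value.
    Returns counts per source, keys present in only one source, and shared
    keys whose values disagree.
    """
    keys_a, keys_b = set(source_a), set(source_b)
    shared = keys_a & keys_b
    return {
        "count_a": len(keys_a),
        "count_b": len(keys_b),
        "only_in_a": sorted(keys_a - keys_b),
        "only_in_b": sorted(keys_b - keys_a),
        "mismatched": sorted(k for k in shared if source_a[k] != source_b[k]),
    }


# Hypothetical data: revenue per customer ID as seen by two systems.
billing = {"c1": 100, "c2": 250, "c3": 80}
crm = {"c1": 100, "c2": 200, "c4": 40}

report = reconcile(billing, crm)
for name, value in report.items():
    print(f"{name}: {value}")
```

The report does not say which source is right. It only narrows the question to the keys worth investigating, which is where domain knowledge comes in.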