Machine Learning

Production Project Checklist

  1. Frame the problem. Define a concise objective with clear metrics. Write it as a design doc.
  2. Get the data. Make the data tidy. Machine learning models are only as reliable as the data used to train them. The data matters more than the model.
  3. Explore the data. Verify any assumptions. Garbage in, garbage out.
  4. Create a model. Start with the simplest model! It becomes the baseline that later models must beat. Evaluate it with the metric defined in step 1.
  5. Make sure everything works end to end. You design it, you train it, you deploy it. Deploy the model quickly and automatically. Add a clear description of the model. Monitor model performance in production.
  6. Make results (models, analysis, graphs, …) reproducible (code, environment and data). Version your code, data and configuration. Make feature dependencies explicit in the code. Separate code from configuration.
  7. Test every part of the system (ML Test Score): data (distributions, unexpected values, biases, …).
  8. Iterate. Deliver value first, then iterate. Go back to the first point and change one thing at a time. Machine Learning progress is nonlinear. It’s really hard to tell in advance what’s hard and what’s easy.
  9. Explain your results in terms your audience cares about. [[Data Culture|Data is only useful as long as it’s being used]].
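
As an illustration of step 4, here is a minimal sketch of about the simplest baseline possible: always predict the most frequent training label, then score it with the chosen metric. The function names, the toy labels, and the choice of accuracy as the metric are all assumptions made for this example, not part of the checklist.

```python
from collections import Counter


def fit_majority_baseline(labels):
    """Return the most frequent label seen during training."""
    if not labels:
        raise ValueError("labels must not be empty")
    return Counter(labels).most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy data, invented for illustration only.
train_labels = ["ham", "ham", "spam", "ham"]
test_labels = ["ham", "spam", "ham", "ham"]

majority = fit_majority_baseline(train_labels)
baseline_predictions = [majority] * len(test_labels)
baseline_accuracy = accuracy(test_labels, baseline_predictions)

print(f"baseline always predicts {majority!r}, accuracy={baseline_accuracy:.2f}")
```

Any later model can then be compared against `baseline_accuracy`; if it cannot beat that number, the extra complexity is probably not paying for itself.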

These points are expanded with more details in courses like Made With ML.
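
The data part of step 7 (unexpected values, out-of-range values) can be sketched as a plain function that returns a list of problems, which a test suite or a pipeline step can then assert to be empty. The column names `age` and `label`, the allowed label set, and the age range below are hypothetical and chosen only for this example.

```python
def check_rows(rows, allowed_labels, age_range=(0, 120)):
    """Return human-readable descriptions of the problems found in rows."""
    problems = []
    low, high = age_range
    for i, row in enumerate(rows):
        age = row.get("age")
        if age is None:
            problems.append(f"row {i}: missing age")
        elif not low <= age <= high:
            problems.append(f"row {i}: age {age} outside [{low}, {high}]")
        label = row.get("label")
        if label not in allowed_labels:
            problems.append(f"row {i}: unexpected label {label!r}")
    return problems


# Toy rows, invented for illustration only.
rows = [
    {"age": 34, "label": "churn"},
    {"age": -1, "label": "stay"},
    {"age": None, "label": "unknown"},
]

problems = check_rows(rows, allowed_labels={"churn", "stay"})
for problem in problems:
    print(problem)
```

Running a check like this before training, and again on production inputs, is one way to catch "garbage in" early. Distribution and bias checks would need more than this sketch covers.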

Tips

ML In Production Resources

Machine Learning Technical Debt

Technical debt is a metaphor for the long-term costs that build up when engineers prioritize speed of deployment over every other design consideration. Paying that debt down can take a lot of work.

Resources