mlip-cmu

Machine Learning in Production @ CMU

Here you can find resources for teaching and research on how to build, deploy, assure, and maintain software products with machine-learned models, for example, how to integrate a voice-to-text model and an LLM into a video conferencing product to create automated meeting summaries. We cover the entire lifecycle from a prototype ML model to a complete product deployed in the real world, not just models or notebooks. We also cover the responsible engineering of such systems (safety, security, fairness, transparency) and MLOps.

All materials (book, slides, assignments, bibliography) are released under Creative Commons licenses. We hope that this fosters teaching and research on these topics.

Maintained by Christian Kaestner.

The Pitch

The following talk motivates the entire endeavor, explains the need to focus on engineering the entire system, not just the model, and runs through what this means through the lens of quality assurance (from model testing to system testing):

The Book

Print & Ebook version Online version

A book has been published by MIT Press as open access. All author proceeds are donated to Evidence Action. The complete book is available online under a Creative Commons license.

The Course

Spring 2025 website

We teach a 12-unit course at Carnegie Mellon University on this topic, open to undergraduate and graduate students. We expect some minimal machine learning background and some programming skills, but no prior software engineering experience. The course is always offered in the spring semester and often also in the fall.

For a description of topics covered and course structure, see learning goals.

Annotated Bibliography

An (opinionated) annotated bibliography of academic papers in this space, covering a wide range of topics from research on testing to requirements to notebooks: https://github.com/ckaestne/seaibib

Open-Source ML Products

Awesome ML Products

A curated set of open-source products that use machine learning. These are all end-user products that incorporate machine learning models, not libraries, research prototypes, or notebooks. We hope that this list facilitates research on building products beyond the model-centric view of analyzing ML components: https://github.com/mlip-cmu/awesome-ml-products.