Machine Learning in Production @ CMU
Find resources related to teaching and research on how to build, deploy, assure, and maintain software products with machine-learned models. These cover the entire lifecycle from a prototype ML model to an entire system deployed in production, not just models or notebooks. They also cover the responsible engineering of such systems (safety, security, fairness, transparency) and MLOps.
All materials (book, slides, assignments, bibliography) are released under Creative Commons licenses. We hope that this fosters teaching and research on these topics.
Maintained by Christian Kaestner.
The Course
Spring 2025 website
We teach a 12-unit course at Carnegie Mellon University on this topic, open to undergraduate and graduate students. We expect some minimal machine learning background and some programming skills, but no prior software engineering experience. The course is always offered in the spring semester and often also in the fall.
For a description of topics covered and course structure, see learning goals.
The Book
Online version
A book on these topics will be published by MIT Press later this year under a Creative Commons license. The complete online version of the book is available here.
Annotated Bibliography
Annotated Bibliography
An (opinionated) annotated bibliography of academic papers in this space, covering a wide range of topics from research on testing to requirements to notebooks: https://github.com/ckaestne/seaibib
Open-Source ML Products
Awesome ML Products
A curated set of open-source products that use machine learning. These are all end-user products that incorporate machine learning models, not libraries, research prototypes, or notebooks. We hope that this list facilitates research on building products beyond the model-centric view of analyzing ML components: https://github.com/mlip-cmu/awesome-ml-products.
- 17-649 Artificial Intelligence for Software Engineering: This course focuses on how AI techniques can be used to build better software engineering tools and goes into more depth with regard to specific AI techniques, whereas we focus on how software engineering techniques can be used to build AI-enabled systems. Our application scenarios are typical web-based systems for end users, rather than tools for software developers.
- 05-318 Human-AI Interaction: Focuses on the HCI angle of designing AI-enabled products. It overlaps with this course in some coverage of fairness but covers user interface design and how to involve humans in ML-supported decisions in much more detail, whereas this course focuses more on architecture design, requirements engineering, and deploying systems in production. The two courses are complementary.
- 17-646 DevOps: Modern Deployment, 17-647 Engineering Data Intensive Scalable Systems, and similar: These courses cover techniques to build scalable, reactive, and reliable systems in depth. We will survey DevOps and big data systems in the context of designing and deploying systems, but will not explore them in as much detail as a dedicated course can. We will look at MLOps as an ML-specific variant of DevOps.
- 10-601 Machine Learning, 15-381 Artificial Intelligence: Representation and Problem Solving, 05-834 Applied Machine Learning, 95-865 Unstructured Data Analytics, 10-718 Machine Learning in Practice, and many others: CMU offers many courses that teach how machine learning and artificial intelligence techniques work internally or how to apply them to specific problems (including feature engineering and model evaluation), often on static data sets. We assume a basic understanding of such techniques and processes (see prerequisites) but focus on the engineering process for production ML systems.
- 17-691 Machine Learning in Practice: Applied machine learning with a focus on deployment and monitoring of models.
- 15-884 Machine Learning Systems, 10-714 Deep Learning Systems: Courses that focus on the systems aspect of building ML libraries, such as distributed learning and using hardware acceleration.
- 17-630 Prompt Engineering: Course focused specifically on prompt engineering for large language models.
- 10-613 Machine Learning, Ethics and Society, 16-735 Ethics and Robotics, 05-899 Fairness, Accountability, Transparency, & Ethics (FATE) in Sociotechnical Systems, and others: These courses dive much deeper into ethical issues and fairness in machine learning, in some cases going further into statistical notions or policy. We will cover these topics in a two-week segment among many others.