Effective Machine Learning Teams 9781098144630

Gain the valuable skills and techniques you need to accelerate the delivery of machine learning solutions. With this pra

Language: English. Pages: 193. Year: 2023.

Table of contents:
Who Is This Book For
How This Book Is Organized
Part 1: Engineering Practices
Part 2: Product and Delivery Practices
Some Parting Thoughts
Conventions Used in This Book
Using Code Examples
O’Reilly Online Learning
How to Contact Us
1. Challenges and Better Paths in Delivering Machine Learning Solutions
Machine Learning: Promises and Disappointments
Continued Optimism in Machine Learning
Why ML Projects Fail
Macro-level view: barriers to success
Micro-level view: everyday impediments to success
Lifecycle of a story in a low effectiveness environment
Lifecycle of a story in a high effectiveness environment
Is There a Better Way? How Lean and Systems Thinking Can Help
But First, You Can’t “MLOps” Your Problems Away
See the Whole: A Systems Thinking Lens for Effective ML Delivery
Using Lean to Improve ML Delivery Systems
What is Lean, and why should ML practitioners care?
Prototype testing
Vertically sliced work
Vertically sliced teams, or cross functional teams
Ways of working
Measuring delivery metrics
Automated testing
Code editor effectiveness
Continuous delivery for machine learning (CD4ML)
Machine learning
Framing ML problems
ML systems design
Responsible AI
ML governance
Closing the data collection loop
Reducing data distribution shifts
Data security and privacy
An Invitation to Journey with Us
2. Effective Dependency Management: Principles and Tools
What if Our Code Worked Everywhere, Every Time?
A Better Way: Check Out and Go
Principles for Effective Dependency Management
Reproducible environments
Production-like development environments from day one
Application-level environment isolation
OS-level environment isolation
Tools for Dependency Management
Managing OS-level dependencies (with Docker)
Misconception 1: Docker is over-complicated and unnecessary
Misconception 2: I don’t need Docker because I already use X (e.g. conda)
Misconception 3: Docker will have a significant performance impact
Managing application-level dependencies (with Poetry)
A Crash Course on Docker and batect
What are Containers?
Where Will We Use Docker?
Reduce the Number of Moving Parts in Docker with batect
Benefit 1: Simpler command-line interface
Benefit 2: Local-CI symmetry
Benefit 3: Faster builds with caches
How to use batect in your projects
3. Effective Dependency Management in Practice
In Context: ML Development Workflow
What Exactly Are We Containerizing?
Hands-on Exercise: Reproducible Development Environments, Aided by Containers
1. Check out and go: Installing prerequisite dependencies
2. Create our local development environment
3. Start our local development environment
4. Serving the ML model locally as a web API
5. Configure our code editor
6. Training model on the cloud
7. Deploying model web API
Secure Dependency Management
Remove Unnecessary Dependencies
Automate checks for security vulnerabilities
Further Reading
4. Automated Testing: Move Fast Without Breaking Things
Automated Tests: The Foundation for Iterating Quickly and Reliably
Starting with Why: Benefits of Test Automation
If Automated Testing is so Important, Why Aren’t We Doing It?
Reason 1: We think writing automated tests slows us down
Reason 2: “We have CI/CD”
Reason 3: We just don’t know how to fully test ML systems
Building Blocks for a Comprehensive Test Strategy
The What: Identifying Components For Testing
Software logic
ML models
The How: Structure of a Test
Characteristics of a Good Test and Pitfalls to Avoid
Tests should be independent and idempotent
Tests should fail fast and fail loudly
Tests should check behavior, not implementation
Tests should be runnable locally
Tests must be part of feature development
Tests let us “catch bugs once”
Software Tests
Unit Tests
Designing unit-testable code
How do I write a unit test?
Training Smoke Tests
How do I write these tests?
API Tests
How do I write these tests?
Recommended practice: Assert on “the whole elephant”
Post-deployment Tests
How do I write these tests?
5. Automated Testing: ML Model Tests
Model Tests
The Necessity of Model Tests
Challenges of Testing ML Models
Fitness Functions for ML Models
Model Metrics Tests (Global and Stratified)
How do I write these tests?
Advantages and limitations of metrics tests
Behavioral Tests
Complementary Practices for Model Tests
Error Analysis and Visualization
Learn from Production by Closing the Data Collection Loop
Open-closed Test Design
Exploratory Testing
Means to Improve the Model
Designing for Failures
Monitoring in Production
Bringing It All Together
Next Steps: Applying What You’ve Learned
Make incremental improvements
Demonstrate value
6. Supercharging Your Code Editor with Simple Techniques
Why Should I Care? The Benefits (and Surprising Simplicity) of Knowing our IDE
If It’s so Important, Why Haven’t I Learned It Yet?
The Plan: Getting Productive In Two Stages
Stage 1: Configuring our IDE
Install IDE and basic navigation shortcuts
Create a virtual environment
Configure virtual environment: PyCharm
Configure virtual environment: VS Code
Testing that we’ve configured everything correctly
Stage 2: The Star of the Show – Keyboard Shortcuts
Code completion suggestions
Inline documentation / Parameter information
Auto fix errors
Move / copy lines
Rename variable
Extract variable / method / function
Reformat code
Navigating code without getting lost
Opening things (files, classes, methods, functions) by name
Navigating the flow of code
Screen real estate management
That’s it: You Did It!
Guidelines for setting up a code repository for your team
Additional tools and techniques
