Newsletter #085

This week a data scientist on my team discovered a pretty bad bug in one of our production applications. While it’s important to diagnose and fix every bug, what made this one special was how it was diagnosed. The data scientist was analyzing historical outputs of the machine learning models powering the application when he noticed that the average model prediction had suddenly shifted by several orders of magnitude at some point :O . We traced the issue back to a specific update to the codebase, and a few teammates pitched in to fix the bug.

There are so many lessons to unpack from this experience, and a lot to learn from our mistakes.

First, this is an excellent example of what can go wrong in data-driven applications. The app is responsible for rendering real-time decisions based on the outputs of several machine learning models. Since we didn’t have proper guardrail metrics in place, the system kept rendering decisions even though the model outputs were wrong. I’d classify this as a silent failure – a system error that doesn’t explicitly raise an exception. How do we prevent this from happening again? One way is to monitor the model predictions themselves. While monitoring ML is a deep topic, we would have caught this issue immediately had we monitored even a simple metric like the average model prediction.
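Here’s a minimal sketch of such a guardrail in Python. Everything in it is hypothetical: the baseline and threshold are placeholders, and it assumes positive-valued predictions pulled from your prediction logs.

```python
# Guardrail sketch: compare a batch's average prediction to a known-good
# baseline and fail loudly on a large shift. Assumes positive-valued
# predictions; the baseline and threshold are placeholders.
import statistics

def check_average_prediction(predictions, baseline_mean, max_ratio=10.0):
    """Raise if the batch mean drifts more than max_ratio from the baseline."""
    batch_mean = statistics.fmean(predictions)
    ratio = batch_mean / baseline_mean
    if not (1.0 / max_ratio <= ratio <= max_ratio):
        # A several-orders-of-magnitude shift, like the bug above, trips
        # this check immediately instead of failing silently.
        raise RuntimeError(
            f"Guardrail tripped: batch mean {batch_mean:.4g} vs "
            f"baseline {baseline_mean:.4g}"
        )
    return batch_mean

# A healthy batch passes; a corrupted one raises.
check_average_prediction([0.21, 0.34, 0.28], baseline_mean=0.3)      # ok
# check_average_prediction([2100.0, 3400.0], baseline_mean=0.3)      # raises
```

A check like this can run as a scheduled job over each day’s prediction logs, alerting before anyone has to stumble on the problem by hand.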

Other issues we discovered while diagnosing the bug relate to application logging and software testing. Although these are usually thought of as "software topics", data scientists and ML engineers need to care about them too. If you’re deploying or operating an ML-powered application, you’d better realize that you’re in the business of software development. That means that all the things that matter to software engineers – clean and modular code, comprehensive test suites, detailed application logs – should matter to you. Maybe you have a highly specialized team of software engineers that helps you build applications, but maybe you don’t. Either way, you need someone with this expertise in the room when designing and implementing your application.
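To make this concrete, here’s a hedged sketch of the kind of logging and testing that would have surfaced our bug much earlier. The model and sample_batch arguments are hypothetical pytest fixtures, and the 0-to-1 output range is an assumption you’d replace with whatever is plausible for your models.

```python
# Sketch only: logging and a sanity test around an ML scoring path.
# model and sample_batch stand in for hypothetical pytest fixtures, and
# the 0.0-1.0 range is a placeholder for your models' plausible outputs.
import logging

logger = logging.getLogger("prediction_service")

def predict_with_logging(model, features):
    """Score one request and log enough context to diagnose silent failures."""
    prediction = model.predict(features)
    logger.info("prediction=%s n_features=%d", prediction, len(features))
    return prediction

def test_predictions_fall_in_expected_range(model, sample_batch):
    # Pin down the plausible output range so a several-orders-of-magnitude
    # shift fails in CI instead of shipping to production.
    preds = [predict_with_logging(model, row) for row in sample_batch]
    assert all(0.0 <= p <= 1.0 for p in preds), "prediction out of range"
```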

But this leads to a tricky situation. What if no one on your team has software engineering experience? How would you know whether you’re following best practices?

Chances are there are plenty of software engineers at your company who would be willing to examine your code and offer advice. Try to enlist a few senior engineers that other devs look up to. These people have seen all types of application issues and can quickly spot potential problems and suggest ways to improve your codebase. Frankly, good software engineers love to point out flaws in code, so you shouldn’t have to search too hard to find one or two of them ; )


Here’s what I’ve been reading/watching/listening to recently:

  • Solving the time-travel problem in machine learning – When ML models are used to predict future events, data scientists need access to snapshots of past feature data to prevent data leakage during training. One way to access these snapshots is to "log and wait" for feature values to accumulate until there’s enough data for model training. Another approach is to backfill the data by efficiently computing historical feature values (see the sketch after this list).
  • Bringing an AI Product to Market – The third post in O’Reilly’s series on AI Product Management covers how to bring an AI-powered product to market. Core responsibilities of an AI PM include identifying the right problem and agreeing on metrics, planning and managing the project, and executing on the project roadmap by working on interface design, developing prototypes, and partnering with technical leaders. The post emphasizes the importance of experimentation in building AI products: "Lack of clarity about metrics is technical debt worth paying down. Without clarity in metrics, it’s impossible to do meaningful experimentation."
  • AI Product Management After Deployment – O’Reilly’s series on AI Product Management concludes with a post describing an AI PM’s responsibilities after the product is deployed. Unlike in traditional software engineering, the PM and the development team should remain heavily involved in managing operations, improving the model and pipeline and ensuring that the product functions as expected and desired over time. This debugging process relies on logging and monitoring tools to detect and resolve the issues that arise in a production environment. From my own experience managing products, I can confidently say that AI products cannot just be handed off to operations teams that don’t have ML expertise.
  • A step by step process to fix the root causes of most event analytics mistakes – Written by the former SVP Growth and Business Intelligence at Gojek, this post describes a process for planning and implementing an event tracking system to enable data-driven decision making. This isn’t a machine learning-specific issue, but it does have serious ramifications for the kind of data you’ll be able to use for training ML models down the road. If part of your job is figuring out which parts of your company’s products to instrument, then this post is for you. Here’s a quote: "Beyond all of the tooling, there is one foundational thing that makes or breaks any data initiative within a company – how you think about what to track, how to track it, and manage it over time. If you get these things wrong, the best tooling in the world won’t save you."
  • Scaling Airbnb’s Experimentation Platform – Over the last several years, Airbnb has seen exponential growth in the number of experiments used to optimize the look and feel of their website and native apps, their smart pricing and search ranking algorithms, and the targeting of their email campaigns. Along the way, they’ve evolved their Experimentation Reporting Framework (ERF) from a Ruby script into an application with a full UI, a domain language, and a series of Airflow jobs. Here they describe how the system evolved and discuss specific features.
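The time-travel problem from the first link lends itself to a quick illustration. Below is a minimal sketch of the backfill approach, assuming feature values live in a timestamped history table: a point-in-time join that, for each labeled event, picks the most recent feature value observed at or before the event. The table and column names are made up for illustration; pandas’ merge_asof does the heavy lifting.

```python
# Sketch of a point-in-time ("backfill") join. All data and column names
# here are hypothetical; the key idea is that each training row only sees
# feature values with feature_ts <= event_ts, preventing leakage.
import pandas as pd

labels = pd.DataFrame({
    "user_id": [1, 1, 2],
    "event_ts": pd.to_datetime(["2024-01-05", "2024-02-01", "2024-01-20"]),
    "label": [0, 1, 0],
})

feature_history = pd.DataFrame({
    "user_id": [1, 2, 1],
    "feature_ts": pd.to_datetime(["2024-01-01", "2024-01-10", "2024-01-15"]),
    "avg_order_value": [20.0, 50.0, 35.0],
})

# merge_asof picks, for each label row, the most recent feature value at or
# before event_ts (direction="backward" is the default).
training_set = pd.merge_asof(
    labels.sort_values("event_ts"),
    feature_history.sort_values("feature_ts"),
    left_on="event_ts",
    right_on="feature_ts",
    by="user_id",
)
print(training_set)
```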

That does it for this week’s issue. If you have any thoughts, I’d love to hear them in the comment section below!
