After a lot of thought, I’ve decided to stop publishing my weekly newsletter for the time being. It wasn’t an easy decision to make, but I feel that it’s the right one for me, given where I’m at in my life right now. I won’t go into too much detail, but I’d like to briefly explain the reasoning behind my decision.
When I published the first issue of the newsletter I was a data scientist at 2U. As an individual contributor, I was focused on the technical work of building, deploying, and operating machine learning models in production settings. When I was promoted to Director, my focus began to shift away from purely technical aspects of ML. However, my responsibilities were split about 50-50 between managing a small team and continuing to work on implementation.
I’m proud to say that I’ve continued to grow at 2U and so have my responsibilities. Today I’m managing a much larger team, overseeing several more projects, and thinking through how we can expand the data science practice at the company. One reason I’m choosing to pause my newsletter is to focus on my new responsibilities at 2U. I believe that a manager is one of the most important aspects, if not the most important aspect, of an employee’s work life. Because of that, I want to be the best manager possible for my team. I’m obsessed with the concept of mastery, and I know I have a lot to learn about being a great manager. Therefore, I want to focus as much as possible on improving in this role.
There are also two personal reasons behind my decision.
As the son of immigrants, I’m a serious believer in hard work. Taking time off is very hard for me. Sometimes this is good, but other times this can really detract from my life. For instance, I wrote about experiencing burnout and depression in my post on why I started MLinProduction. I’m super proud of how MLinProduction has evolved over the last 2 years, but a few months ago I found myself on the brink of another burnout. I was juggling my course on SageMaker, a consulting engagement, a few sponsored blog posts, my full-time job, and a newborn. I took all the time I gained from coronavirus shutdowns and reinvested it into MLinProduction. Luckily, I’ve experienced enough to know that I can’t sustain this pace. So I’m consciously choosing to step back.
The second personal reason, and arguably the most important reason of them all, is my son. I’m the proud father of a beautiful 7-month-old boy and I want to spend as much time as possible with him right now. I’ve been told repeatedly that kids grow up fast. I’m choosing to heed this counsel and enjoy my moments with him. Plus, he’s a hell of a lot cuter than my Sublime text editor.
So for the time being, this will be my last weekly email. It’s been a true pleasure writing to you for the last 2 years.
Here’s what I’ve been reading/watching/listening to recently:
- The definitive guide to AI monitoring – If you can get past the click-bait-y title, this blog post presents 5 useful classes of metrics to track for ML systems. These metrics seem to be ordered in terms of increasing complexity and required infrastructure – beginning with model performance metrics where predictions are compared to actuals and ending with metadata and performance measures captured during training and testing time. Useful if you’re thinking through what metrics you should monitor for your production ML systems.
- Data Quality at Airbnb – As Airbnb scaled from a startup to a mature organization with thousands of employees, the company realized it had to revamp its entire process to guarantee data timeliness and quality. This post summarizes their Data Quality Initiative, a company-wide investment in data ownership, architecture, and governance. If your company is growing rapidly and needs to scale its data infrastructure, this is a super valuable post.
- Bootstrapping prediction intervals – While confidence intervals measure uncertainty around statistics like means or model parameters, prediction intervals measure uncertainty around single values and can be used to estimate the probable interval in which the outputs of a regression model can be expected to occur. The author of this post does a really nice job of explaining a general way to compute prediction intervals for generic regression models (i.e. nonlinear models like random forests). +1 to the author for including a discussion of the math, a Python implementation of the algorithm, and a simulation all in a single, easy-to-read post!
- 4 Principles for Making Experimentation Count – A data scientist in Growth at Airbnb details 4 principles that helped the company scale from running 100 experiments a week to over 700. Key pieces of advice include adding sanity metrics to experiments to ensure proper user exposure and understanding base rates of the phenomenon you’re testing (this one is something I’ve been focusing on recently).
- The Engineering Problem of A/B Testing – It’s one thing to say you want to A/B test a product change; it’s a whole other thing to actually perform the tests. Engineering requirements for running tests vary widely depending on the type of application you need to test (e.g. mobile app vs single page site vs microservice). This brief post lists several key engineering features needed and five ways to implement A/B testing. (Regular readers will note that I’ve been sharing a lot of A/B testing resources lately. We’re ramping up experimentation at 2U so I’ve been catching up on best practices. Here’s another engineering-style link demonstrating how to implement A/B testing with Kubernetes and Istio.)
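If the prediction-interval idea in the list above feels abstract, here is a minimal sketch of one way it could look in plain Python. To be clear, this is not the algorithm from the linked post: it is a simplified variant I am assuming for illustration, which refits a straight line on resampled data (to capture model uncertainty) and adds a randomly drawn training residual (to capture noise). The function names `fit_line`, `percentile`, and `bootstrap_prediction_interval` are hypothetical, and a real use case would likely swap the straight line for whatever regression model you actually run.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def percentile(sorted_vals, q):
    """Linearly interpolated quantile of a sorted list, with q in [0, 1]."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def bootstrap_prediction_interval(xs, ys, x_new, alpha=0.05, n_boot=1000, seed=0):
    """Return (point, lower, upper) for a new observation at x_new.

    Simplified sketch: each bootstrap round refits the line on a resampled
    dataset and adds one resampled training residual to its prediction.
    """
    rng = random.Random(seed)
    n = len(xs)

    # Fit once on the full data to get the point prediction and residuals.
    a, b = fit_line(xs, ys)
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    point = a + b * x_new

    simulated = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        boot_x = [xs[i] for i in idx]
        boot_y = [ys[i] for i in idx]
        if len(set(boot_x)) < 2:
            continue  # cannot fit a line through a single distinct x value
        boot_a, boot_b = fit_line(boot_x, boot_y)
        simulated.append(boot_a + boot_b * x_new + rng.choice(residuals))

    simulated.sort()
    return point, percentile(simulated, alpha / 2), percentile(simulated, 1 - alpha / 2)


# Toy data (made up for this example): y = 2 + 3x plus unit Gaussian noise.
data_rng = random.Random(42)
xs = [float(i) for i in range(50)]
ys = [2.0 + 3.0 * x + data_rng.gauss(0, 1) for x in xs]

point, lower, upper = bootstrap_prediction_interval(xs, ys, x_new=25.0)
print(f"prediction: {point:.2f}, 95% interval: [{lower:.2f}, {upper:.2f}]")
```

One caveat with this shortcut: training residuals tend to understate the true noise a little, so the interval can come out slightly too narrow. The linked post covers a more careful treatment.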
That does it for this week’s issue. If you have any thoughts, I’d love to hear them in the comment section below!
Hello Luigi. I wanted to ask if there’s a way to read up on your newsletters from the first one you sent. I can see links for the last 5 you wrote. I am interested in the Monitoring and Deployment side of ML so I find your blog very valuable. This is why I’m asking if there’s a way to start reading your newsletters from the first. The M&D side of machine learning is something I have very little knowledge in.
Hi Joshua. Thanks for your question. There’s currently no way to see an archive, but I will be publishing an archive in the next few months. I’ll email my subscribers when the archive is published.