Top ML Resources: Interview with Erik Bernhardsson

Below is an interview with Erik Bernhardsson, Chief Technology Officer at Better.com.

Our interview with Erik Bernhardsson is part of our interview series with the creators of some of the top ML resources we’ve come across online.

Erik’s top ML resource, "Modeling conversion rates and saving millions of dollars using Kaplan-Meier and gamma distributions", is a piece on user acquisition that helps you calculate the signal earlier and make business decisions faster.
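
For readers unfamiliar with the Kaplan-Meier idea mentioned in the title, here is a minimal sketch of how it can be applied to conversion data. It is not Erik's implementation and does not cover the gamma-distribution part of his article; the function name `kaplan_meier_conversion` and the toy data are made up for illustration. The point is that users who have not converted yet are treated as "still observing" (censored) rather than as failures, which is what lets you read the signal earlier.

```python
from collections import Counter

def kaplan_meier_conversion(durations, converted):
    """Estimate the fraction of users converted by each conversion time.

    durations[i]: days user i has been observed (until conversion or until today)
    converted[i]: True if user i converted at durations[i], False if still pending
    (Hypothetical helper for illustration only.)
    """
    events = Counter(t for t, c in zip(durations, converted) if c)
    exits = Counter(durations)   # users leaving the at-risk set at time t
    at_risk = len(durations)     # users still under observation
    not_converted = 1.0          # Kaplan-Meier "survival" estimate
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Product-limit step: conversions at t among users still at risk.
            not_converted *= 1.0 - d / at_risk
            curve.append((t, 1.0 - not_converted))
        at_risk -= exits[t]      # drop converted and censored users after t
    return curve

# Toy cohort (made-up numbers): three conversions, three users still pending.
durations = [1, 2, 2, 3, 5, 5]
converted = [True, True, False, True, False, False]
curve = kaplan_meier_conversion(durations, converted)
naive_rate = sum(converted) / len(converted)
```

On this toy cohort the naive rate is 3 out of 6, while the last point of `curve` is higher, because the user who was only observed for two days is not counted as a non-converter.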

1. How did you get started working on machine learning? How have you progressed through the ML space?

I grew up programming, but studied Physics, so I always considered myself a programmer. At some point it became clear that tech is a pretty good industry to be in, so I doubled down and focused on sharpening those skills. But coming from a Physics background, I always loved math, and it felt like one way I could solve "hard" problems that maybe other programmers would not be able to. In 2006, some friends and I got interested in recommendation systems, and built a book recommendation system as a school project. The Netflix Prize was a huge thing back then as well and we played around building some algorithms for it. I managed to convince Spotify to hire me in 2008 to build a music recommendation system despite not having much experience with ML. This was before deep learning, and machine learning was a bit more "underground", which meant that I had to build pretty much everything from scratch, so I think being a coder at heart was very helpful.

2. What kinds of machine learning problems do you work on today?

Not a ton. I used to love tinkering with hard problems, and still do, but spending 7 years at Spotify, I realized that I actually love being part of a startup journey even more. So I left in 2015 and took a job as CTO of a small (7-person) company building an online mortgage experience. Five years later, we’re now 1,300 people and the company is doing quite well. I’ve started building up a data team here that I’m running in the interim, so I still get to do some data work, but it’s somewhat limited. I’ve done some work building models for user conversion and have been playing around with probabilistic methods a bit just as a way for me to learn.

What challenges do you face as a machine learning practitioner?

Libraries are 100x better than they were 10 years ago, but there are still things that are a bit annoying to do. I find probabilistic stuff much harder than I think it should be, for instance. On the more infrastructure side, building complex pipelines is still a bit too hard. I built Luigi at Spotify for this purpose, and I’ve looked at Airflow, but I feel like it’s an open problem to solve well. I actually think visualization is another area where there are a lot of unsolved problems.

3. What’s your favorite machine learning tool? What problem does it solve?

Recently I’ve been using autograd as my go-to tool because it’s simple and I haven’t worked on very large-scale problems. I’ve been meaning to look at JAX as well, which seems really solid to me; scikit-learn is always solid as well.

4. What differentiates successful industry ML projects from unsuccessful projects?

I think any successful project has to be built with a super commercial mindset. No one cares about whether you can get z% accuracy in your churn prediction model. Can you use the model to give decision makers insights into how to reduce churn? That’s how you drive dollars for the business, and almost any project will be far more successful if it’s built with that goal in mind.

I’ve also seen issues where ML teams are their own isolated teams that are coming up with solutions no one asked for. Maybe they are great, but it’s very hard to sell something to another team which has their own priorities and backlog. It’s much better in that case to have ML practitioners embedded into teams and work as a part of that team’s prioritization.

Another issue is separating people building models from people implementing them in production. I don’t understand why you would deliberately do this – all the iteration speed goes out the window. I have pretty strong feelings that teams building models should also put them into production (doesn’t have to be the same person, but at least the same team).

5. What advice do you have for ML practitioners who are struggling to build machine learning solutions into products?

Solve for the business need first and make sure you’re building something that is valuable for the business. Then build something super incrementally. Start with an end-to-end prototype that uses the simplest model you can think of that’s still enough to show some potential. Try to get that in front of people as soon as you can. Then iterate and make things better.

You can follow Erik Bernhardsson on Twitter here.
