Top ML Resources: Interview with Veronika Megler, PhD

Below is an interview with Veronika Megler, PhD, Principal Data Scientist, Big Data & Analytics at Amazon.

Our interview with Veronika Megler, PhD is part of our interview series with the creators of some of the top ML resources we’ve come across online.

Veronika’s top ML resource, “Training models with unequal economic error costs using Amazon SageMaker”, thoroughly demonstrates how to use a custom cost function to incorporate the cost of different types of errors in the model training process.
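For readers new to the idea, here is a minimal sketch of what a cost-sensitive loss can look like. It is not taken from Veronika’s resource and does not use SageMaker; the function names and the 5-to-1 cost ratio are assumptions chosen purely for illustration. It weights a standard binary log loss so that missing a positive case (a false negative) is penalized more heavily than a false alarm (a false positive), and adds a helper that totals the cost of the errors a set of predictions actually makes.

```python
import numpy as np


def cost_weighted_log_loss(y_true, p_pred, fn_cost=5.0, fp_cost=1.0, eps=1e-12):
    """Binary log loss with separate weights per error type.

    fn_cost and fp_cost are illustrative assumptions, not values from the
    referenced resource. With both set to 1.0 this is the ordinary log loss.
    """
    y = np.asarray(y_true, dtype=float)
    # Clip probabilities so log() never sees exactly 0 or 1.
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1.0 - eps)
    # The positive-class term is scaled by the false-negative cost,
    # the negative-class term by the false-positive cost.
    per_example = -(fn_cost * y * np.log(p)
                    + fp_cost * (1.0 - y) * np.log(1.0 - p))
    return float(per_example.mean())


def total_error_cost(y_true, y_pred, fn_cost=5.0, fp_cost=1.0):
    """Total cost of the errors made by hard 0/1 predictions."""
    y = np.asarray(y_true).astype(int)
    y_hat = np.asarray(y_pred).astype(int)
    false_negatives = int(np.sum((y == 1) & (y_hat == 0)))
    false_positives = int(np.sum((y == 0) & (y_hat == 1)))
    return fn_cost * false_negatives + fp_cost * false_positives


if __name__ == "__main__":
    labels = [1, 1, 0, 0]
    scores = [0.2, 0.9, 0.7, 0.1]
    predictions = [int(s >= 0.5) for s in scores]
    print("weighted loss:", cost_weighted_log_loss(labels, scores))
    print("error cost:", total_error_cost(labels, predictions))
```

In a real project the two costs would come from the business, for example the money lost on a missed case versus the cost of an unnecessary follow-up, rather than from a guess like the one above.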

1. How did you get started working on machine learning? How have you progressed through the ML space?

I’ve cycled regularly through programming, data analytics, and technology strategy/consulting roles. In one of my recent cycles back to the technical side I earned my PhD, focused on data search. That was followed by a post-doc that was pure IoT data science, blending data analytics and programming aspects – and refreshing my understanding of the plentiful sources of error. At AWS ProServe I moved back towards Big Data, then back into data science as ML adoption started picking up. My more recent work on managing larger and higher-impact machine learning projects then pulled my strategy and executive consulting expertise back in.

2. What kinds of machine learning problems do you work on today?

I work with large corporations on their highest-impact problems, where the financial consequences are very real. I focus less on the actual model – these days they’re becoming increasingly commoditized – and more on what’s around it. Understanding the data, how it’s produced and what assumptions it makes. Whether the model produced is solid, or merely providing a good-looking result due to multiple testing or other fallacies. Then I focus on whether the model results meaningfully affect the business outcomes – and what risks are being taken in the process.

3. What challenges do you face as a machine learning practitioner?

The most interesting problems to me are where models based on flawed data are used to make consequential business decisions. Every dataset provides a partial and imperfect model of reality. Business managers have their own model of reality in their heads – frequently a causal model. The ML model is itself an (imperfect) model, inferred from the data. How do you blend these three imperfect models together in a practical way? How do you maximize the overlap between these models? How do you understand and manage the risks you’re taking?

I’ve recently started to fully understand how frequently business management expects an ML model to represent causes and predict the results of interventions. They themselves exist in a probabilistic world – “this decision will probably give us better results than the alternatives”. I’m excited about some of the work I’m doing with our customers in identifying methods to address these challenges.

4. What’s your favorite machine learning tool? What problem does it solve?

My favorite machine learning tool is Jupyter notebooks in SageMaker. It lets me easily iterate and visualize my data and my results so far, while also letting me scale my models and analyze their results. I can then build my production model without leaving the environment. I can spend more time thinking about and solving business problems, and less about the technology.

5. What differentiates successful industry ML projects from unsuccessful projects?

I’d say, use of the best practices outlined in Managing Machine Learning Projects! Seriously, though, those best practices are described there for a reason. They allow ML practitioners to communicate more effectively with the stakeholders about the opportunities, the challenges and the risks. That allows the stakeholders to adjust their expectations, provide appropriate resources, and to communicate back decisions or modifications needed in the model, the use of the inferences or the business processes surrounding them. These are key factors in reliably producing successful ML projects. Even killing an ML project for the right reasons – it won’t fulfill the expectations, or the data is discovered to be flawed – can be a success.

6. What advice do you have for ML practitioners who are struggling to build machine learning solutions into products?

Think carefully about what you’re doing and why, and how the ML solution fits into the business process and solves the business problem. Then, question your assumptions. It’s easy to get caught up in the technology and in automating it, but the real value comes in fitting thoughtfully and seamlessly into your environment.

You can connect with Veronika Megler, PhD on LinkedIn here.
