One of the most rewarding aspects of my role as a Director of Data Science is leading the hiring process for new data scientists. Collaborating with other leaders to craft the interview process is extremely challenging and fascinating at the same time.
Generally speaking, hiring is tough because of time and resource constraints. At the aggregate level, the HR team receives hundreds to thousands (I’m not kidding) of resumes for a single data scientist position. Interviewing each of these candidates isn’t realistic, so the HR team intensely filters which candidates make it to the next round of interviews.
At the individual level, hiring managers have relatively little time to determine whether a given candidate will add significant value to their organization. "Relatively little time" might seem ridiculous if you’re a candidate who’s been subjected to multiple rounds of grueling technical interviews. But suspend this thought for a moment and think like an owner. Consider the organization’s challenge: to extract enough data through an imperfect interview process to predict the long-term success of an individual at this company.
While hiring is difficult in general, hiring in data science has its own set of specific challenges. One factor that challenges both interviewers and interviewees is how much the responsibilities of data scientists vary across companies. What’s expected of a data scientist at one company is typically very different from what’s expected at another. This results in frustration on both sides. The data scientist with years of A/B testing experience wonders why he/she never heard back from the interviewer. The hiring manager seeking to add machine learning expertise to the team continues the search.
This heterogeneity stems somewhat from how new the role of data scientist is. The role has only been around for the last 5-10 years, mostly in tech. As companies across industries begin to hire data scientists and adapt the role to their needs, we should continue to see diversity in job descriptions, required skill sets, and experience.
Combine these challenges and it’s clear that hiring in data science remains difficult, in large part because best practices haven’t been established yet. So let me close with a few recommendations for both hiring managers and individual data scientists.
My Advice For Hiring Managers
If you’re in charge of hiring data scientists, figure out the type of data scientists you need on your team before posting a job description. Spend time writing down the profile of your ideal candidate. Are you looking for a deep learning expert or someone who’s well versed in A/B testing methodology? Do you want generalists who can transition between modeling and engineering or specialists with advanced academic degrees? Beyond specific characteristics, take the time to understand which trade-offs you’d be willing to accept. You probably won’t find the perfect candidate, but you should know what a minimum viable candidate looks like.
With this information in hand, craft job descriptions specific to these profiles. These descriptions will help prevent unqualified candidates from applying. They’ll also make it easier for HR to filter out the overzealous, yet unqualified, candidates who decide to test their luck and apply anyway.
Use your ideal candidate profile to craft specific questions that test for the qualities you desire. Coming up with good questions takes time. Keep your list of questions consistent across interviews to accurately compare candidates, but don’t be afraid to ask each candidate a couple of additional questions about their specific experiences. Exploit and explore.
My Advice For Data Scientists
The main piece of advice I have for data scientists looking for a new position is to know thyself. What I mean by this is to determine objectively what skills you possess, what previous experiences you’ve had (and what you’ve learned from them), and what new skills and abilities you desire to learn.
Be specific by writing a short blog post about yourself. Don’t worry about sharing this post with others (although I’m not discouraging that either). Writing is thinking. If you can’t write down what you bring to the table as a data scientist, chances are you won’t be able to explain your abilities to a potential employer either.
Another tip is to research the company you want to work for. Think about their data problems. What skills do you possess that would help solve these problems? If a company has a data science or tech blog, read their blog posts (every single one of them)! And reference these posts during the interview process. As I mentioned above, data science roles at different companies require varied skill sets, so you should expect the interviews to differ as well. Knowing what problems a company is working on prepares you to speak to their specific issues and concerns.
If you have any questions about hiring in data science, either from an employer or employee perspective, leave a comment on this post or send me a tweet @MLinProduction.
Great article and highly informative whether you are a hiring manager or a job seeker.
Thanks, Matt!
Hi Luigi,
I have been enjoying your posts since the start of this year. Thank you for bringing different perspectives to this crowded blog market.
I have a question. What skills do hiring managers look for in machine learning engineers (my fields of interest are connected autonomous systems and financial systems)? Could you also shed light on how to develop them specifically, since doing projects in these fields is quite difficult?
I’m glad you’re enjoying the blog, Srujan! Thanks for the kind words.
The advice I provided for data scientists definitely applies to machine learning engineers as well. In terms of skills that are specific to MLEs, I’d say an understanding of machine learning fundamentals coupled with strong software/data engineering skills is very important. You could develop these skills in a number of ways. One of my favorites is thinking about how to build a product that’s out in the market and powered by machine learning. An example I like to use is UberEats’ estimated-time-to-delivery feature. Think through how the data flows through the application. Where does the training data come from? What about the inference data? What kind of components would be necessary to build this up from scratch? Can you outline the key components of this?
Of course, that’s a complicated system. You can start with something much smaller.
Thank you very much, Luigi, for replying.
What you mentioned is interesting. I used to think along those lines, if not about complete system design, when you shared posts about different companies’ machine learning production systems. But I never knew how to start, approach, and implement those processes. Coincidentally, I came across a blog where the author implemented the Airbnb Amenity paper from data collection to a production model, following the Airbnb data science team’s Medium article.
I hope to follow that and make another project this summer.
I hope to continue reading your articles too.
Hi Srujan. That sounds great. The fact that he wrote a blog post about it is even better. Can you provide a link for that? I’d like to feature it in my newsletter!
This may not be what Srujan was referring to, but here’s an entertaining vlog on the Airbnb amenity detection problem: https://www.youtube.com/watch?v=C_lIenSJb3c
Another great article, Luigi. I am glad that I stumbled across your blog. Your posts are full of wisdom and I am learning a lot from them. There is so much information in blogs these days, but only a few are of practical importance and go beyond the basics. Yours is among the very few. Please keep writing.