Building An Effective Experimentation Program (Experimentation Program Series: Guide 01)

Seamless. Frictionless. Elegant. Efficient.

These are some of the words used to describe the products and services offered by the world’s largest and most successful businesses. For instance, unwrapping a package and holding the physical copy of a book whose image you clicked on in an unprompted email from Amazon the previous day is nothing short of magical.

A bit too magical to have happened by chance. But it’s very likely that this experience wasn’t crafted in some sort of top-down, divine-intervention-like manner either.

Instead, many of these products and services went through hundreds, sometimes thousands, of iterations, with teams of engineers, designers, product managers, data scientists and others, tweaking every conceivable aspect of the experience to arrive at a new version that’s slightly better than the previous iteration.

In each case teams ran randomized controlled experiments, tweaking particular parts of their products, releasing them into the wild to be consumed by users, and measuring the results in a statistically rigorous manner. These teams can be thought of as demigods of multiple parallel universes, where the only difference across such universes might be the color of a button on a webpage. Winning universes are selected on the basis of optimizing some metric, such as total revenue generated from a webpage, yielding a lineage of product enhancements determined through an evolutionary process.
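To make the mechanics of such an experiment concrete, here is a minimal sketch of its two core steps: assigning units to variants and comparing a conversion metric between them. It is an illustration under stated assumptions, not a description of any particular company's system. The function names, the hash-based assignment scheme, and the choice of a pooled two-proportion z-test are all assumptions made for this example.

```python
import hashlib
import math


def assign_variant(unit_id: str, experiment: str,
                   variants=("control", "treatment")) -> str:
    """Deterministically map a unit to a variant.

    Hashing the unit id together with the experiment name (both hypothetical
    inputs) gives a stable assignment that looks random across units.
    """
    digest = hashlib.sha256(f"{experiment}:{unit_id}".encode("utf-8")).hexdigest()
    return variants[int(digest, 16) % len(variants)]


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Two-sided z-test for a difference in conversion rates (pooled SE)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("No variation in outcomes; the test is undefined.")
    z = (p_b - p_a) / se
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Illustrative, made-up counts: 100/1000 vs. 130/1000 conversions.
    print(assign_variant("user-42", "button-color"))
    print(two_proportion_z_test(100, 1000, 130, 1000))
```

Hashing rather than drawing a fresh random number means the same unit always lands in the same variant, which matters when a user returns to the page more than once.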

Many business leaders today are familiar with examples of companies that have evolved their products and services, and correspondingly optimized their profit-and-loss statements, through experimentation. These executives are eager to improve the performance of their own business units, but don’t have the know-how or technical resources to run experiments without external support.

Data science teams don’t need to be convinced of the benefits of running experiments. But often they lack the business knowledge, cross-team relationships, and structured processes for engaging with business teams and helping them optimize their products through experimentation.

Building an Experimentation Program from Scratch

Over the past 16 months I’ve been heavily involved in the development and execution of an experimentation program at 2U. Our goal has been to drive positive business outcomes through the use of trustworthy online controlled experiments, which has required us to develop the necessary processes, infrastructure, institutional knowledge, and relationships.

While a multitude of sources exist for learning how to perform the required technical parts of an A/B test, for example, how to randomize units into variants or how to conduct statistical hypothesis tests, there’s very little content out there that comprehensively describes the process of creating an experimentation program from scratch. My goal in this blog series is to help fill that void by describing my experiences designing, implementing, and iteratively improving an experimentation program (which I’ll refer to as an ExPr for short).

These posts are mainly intended to help data science teams, specifically the leadership of those teams (including chief data scientists, VPs, and senior managers) and data science product managers. But I believe that individual contributors (ICs), i.e., data scientists, data analysts, and data engineers, will also find the content valuable. This is especially true for ICs who are leading experimentation initiatives or who are considering paths into data science management.

This Blog Series is a Summary of My Experiences

This series of posts is based on, and subject to the limitations of, my experience over the last 16 months:

I manage a medium-sized team of 5 data scientists and 3 engineers and work very closely with a dedicated PM.
We’ve focused on optimizing the operational efficiency of our tech-enabled service stack by testing the impact of specific human interventions. These experiments are more appropriately classified as offline field tests rather than online, software-based experiments. That said, we are also running software-based (i.e. "online") experiments that are more familiar to folks in the tech industry.
Typical sample sizes for these experiments range from thousands to tens of thousands of units. We’ve focused on understanding how to run small-to-medium-sized experiments.
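To give a rough sense of what sample sizes in this range can detect, the sketch below computes an approximate minimum detectable effect for a two-arm test on a conversion rate. It uses the standard normal-approximation formula. The 10% baseline rate, the 5% significance level, and the 80% power are hypothetical values chosen for illustration, not figures from our experiments.

```python
import math
from statistics import NormalDist


def minimum_detectable_effect(baseline_rate: float, n_per_arm: int,
                              alpha: float = 0.05, power: float = 0.8) -> float:
    """Approximate absolute difference in rates detectable by a two-sided test.

    Assumes equal-sized arms and uses the baseline rate for the variance of
    both arms (a common simplification).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    se = math.sqrt(2 * baseline_rate * (1 - baseline_rate) / n_per_arm)
    return (z_alpha + z_power) * se


if __name__ == "__main__":
    # Hypothetical 10% baseline rate at two per-arm sample sizes.
    for n in (1_000, 10_000):
        mde = minimum_detectable_effect(0.10, n)
        print(f"n per arm = {n:>6}: MDE is about {mde:.4f}")
```

Because the detectable effect shrinks with the square root of the sample size, a tenfold increase in units reduces it only by a factor of about 3.2.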

Prior to developing our ExPr, we had no direct experience with A/B testing. No internal processes for managing experiments existed, nor was there any infrastructure in place for running experiments. But after 16 months of research, development, mistakes, and iterative improvements, we’ve:

Run tens of experiments.
Operationalized the results of multiple successful experiments, which senior leadership has credited with substantially increasing operational efficiency.
Developed an effective working model for interacting with diverse stakeholders from across our business to ideate, prioritize, and implement experiments.
Developed an infrastructure that helps us easily design, launch, and analyze controlled experiments.

The Plan for this Series

With that background out of the way, my plan for future posts is to discuss:

What is an experimentation program?
What stakeholders should be involved in an ExPr? What level of involvement should each of these stakeholders be expected to provide?
How should these stakeholders interact to drive the ExPr forward? How often should they meet? How should they interact outside of meetings?
What is each stakeholder responsible for? How should you manage accountability?
How do you go from an idea or hypothesis to an experimentally driven conclusion? What is the end-to-end process for running a controlled experiment?
What is required of a data science team? How should that team manage its efforts?
What infrastructure and tooling do data scientists need in order to run trustworthy experiments?
How should you measure the results of an experiment?
How do you measure the success of the experimentation program?

In the next post of this series, I’ll discuss what an experimentation program is and which stakeholders should be involved in order to maximize successful outcomes. If you’d like to be notified when I publish new posts in this series, sign up below and I’ll email you each post as I publish it.

Opinions expressed here are my own and do not express the views or opinions of my employers.

