Enabling Machine-Learning-as-a-Service Through Privacy Preserving Machine Learning

Companies are investing significant resources into AI development to generate business value. For the most part, this development has focused on using machine learning to optimize operations or to build and improve consumer-facing applications. Recently, some organizations have begun to offer access to the trained ML models themselves as a commercial service. Dubbed Machine Learning as a Service (MLaaS), these offerings enable companies to profit directly from their proprietary data and internal AI capabilities.

While MLaaS provides a compelling business model for buyers and sellers, significant challenges have prevented its rise, including privacy and regulatory risks, exposure of intellectual property (IP), and infrastructure hurdles. Here we describe these challenges and introduce decentriq, a company making data collaboration simple and safe by leveraging confidential computing.

The Business of Machine Learning as a Service

Machine learning as a service is a term that describes a wide range of cloud-based software tools that perform tasks such as data preprocessing, model training, and inference. These tools serve a variety of audiences, including non-technical users, software engineers lacking ML expertise, and data scientists and ML engineers. These different users interact with ML services through interfaces that match their level of technical competency: business users might upload data through browser-based UIs, while engineers and data scientists exchange data programmatically through REST APIs.
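As a minimal sketch of that programmatic path, the snippet below stands up a local stub in place of a real provider's endpoint; the `/v1/predict` path and JSON payload shape are assumptions for illustration, not any particular vendor's API. The client POSTs raw features and receives a prediction back as JSON.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class StubModelHandler(BaseHTTPRequestHandler):
    """Local stand-in for an MLaaS provider's inference endpoint."""

    def do_POST(self):
        body = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        # Stand-in "model": the score is just the mean of the input features.
        score = sum(body["features"]) / len(body["features"])
        payload = json.dumps({"prediction": score}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):
        pass  # keep the example's output quiet

# Serve the stub on an ephemeral local port in a background thread.
server = HTTPServer(("127.0.0.1", 0), StubModelHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: exchange raw data for an AI-generated insight over REST.
req = urllib.request.Request(
    f"http://127.0.0.1:{server.server_port}/v1/predict",
    data=json.dumps({"features": [1.0, 2.0, 3.0]}).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    result = json.loads(resp.read())
server.shutdown()
print(result["prediction"])  # 2.0
```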

This ability to exchange raw data for AI-generated insights provides the basis for an attractive new business model. Organizations with access to large, diverse datasets and data science competency can offer access to a suite of internally trained ML models. Large tech companies like Amazon, Microsoft, Google and IBM each provide a catalog of high-level MLaaS APIs for use cases like text recognition, language translation, sentiment analysis, and other natural language processing (NLP) and computer vision (CV) related functionality.

Since few companies have the capabilities, both in terms of training datasets and AI skills, to train these models themselves, MLaaS offers clients access to advanced analytics at low cost. Utilizing these services, i.e. paying for access to third-party models, gives companies a way to strategically incorporate AI into their own products and services by buying rather than building.

Although MLaaS offers a compelling value proposition for both sellers and buyers, few machine learning services have emerged outside of the major tech companies. One reason is the concentration of ML talent at companies like Google and Amazon. Another is the access these companies have to huge datasets generated by their consumers.

But while these players have released useful tools in NLP and computer vision, we have yet to see more specialized models. One can argue that these companies are simply targeting low-hanging fruit with large addressable markets, but I argue that other challenges prevent more valuable machine learning services from emerging.

Challenges of MLaaS

MLaaS hasn’t emerged outside of the tech giants for several reasons including privacy and regulatory risks, inconsistent and diverse infrastructure, and the cost of model maintenance.

Privacy-wise, organizations with access to large, diverse datasets can train highly predictive models. But when it comes to deployment, these companies either aren’t willing to provide unfettered access to those models for fear of revealing valuable intellectual property, or find it difficult to deploy them appropriately on their clients’ limited infrastructure. To maintain model governance and minimize integration friction, these organizations prefer to host models on their own private infrastructure.

But this preference leads to privacy hurdles for clients seeking to access the models for predictions. If a model is hosted on the owner’s infrastructure, clients are forced to transmit their sensitive data to that system. In many cases this is a regulatory impossibility. If model owners decide to host their models on client premises, then they must deal with deploying on diverse client infrastructure. And since models must be periodically retrained to remain accurate, owners must also deal with updating and maintaining these models.

This leads to two unsatisfactory options:

  • Option 1: The model owner deploys the model on its own premises. This puts the burden of privacy risk on the client by forcing the client to transmit its sensitive data.
  • Option 2: The model owner can deploy on the client’s infrastructure. This forces the owner to risk its IP, install its technology on diverse client infrastructure, and bear the cost of updating models on external systems.

Enabling Machine Learning Services through Confidential Inference

The most effective way to solve these privacy challenges is to keep all data confidential. For instance, suppose that a business could expose its trained models to clients while protecting its IP by preventing inspection of those models. Further, imagine that clients could use those models with the guarantee that their input data is observed only by the model performing inference, never by the model’s owner. If models and data could be securely and reliably shared in this way, fundamentally new ways for model owners to utilize and monetize their models would emerge that protect both their valuable IP and their users’ sensitive data.

decentriq, a startup based in Switzerland, is tackling this challenge head-on through its avato platform. avato is a cloud-based software solution that enables organizations to run ML inference on sensitive input data using public cloud infrastructure without exposing the model or the input data to any party, including decentriq and the infrastructure provider. The avato software runs on Intel Software Guard Extensions (SGX), which uses hardware-level memory encryption to protect data in use from anyone trying to “see” inside.
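To make the idea concrete, here is a toy sketch of confidential inference, not decentriq’s actual protocol: an “enclave” holds per-party session keys that, under SGX, would be established during remote attestation, so the model owner never sees the client’s input and the client never sees the weights. XOR with a random pad stands in for real authenticated encryption; everything here is a simplifying assumption.

```python
import secrets
import struct

def xor(data: bytes, key: bytes) -> bytes:
    # Toy cipher: XOR against a random pad (NOT real cryptography).
    return bytes(b ^ k for b, k in zip(data, key))

def pack(values):
    return struct.pack(f"{len(values)}d", *values)  # floats -> bytes

def unpack(blob):
    return list(struct.unpack(f"{len(blob) // 8}d", blob))  # bytes -> floats

class ToyEnclave:
    def __init__(self):
        # One session key per party; with SGX these would be negotiated
        # during remote attestation, proving only enclave code holds them.
        self.owner_key = secrets.token_bytes(64)
        self.client_key = secrets.token_bytes(64)

    def infer(self, sealed_weights: bytes, sealed_input: bytes) -> bytes:
        # Both plaintexts exist only inside the enclave boundary.
        weights = unpack(xor(sealed_weights, self.owner_key))
        features = unpack(xor(sealed_input, self.client_key))
        score = sum(w * x for w, x in zip(weights, features))
        # The result is readable only by the client.
        return xor(pack([score]), self.client_key)

enclave = ToyEnclave()
# Model owner seals a linear model; the weights never leave their side in the clear.
sealed_weights = xor(pack([0.5, -1.0, 2.0]), enclave.owner_key)
# Client seals its sensitive input the same way.
sealed_input = xor(pack([4.0, 1.0, 0.25]), enclave.client_key)
# Neither ciphertext is intelligible to the other party or to the host.
result = unpack(xor(enclave.infer(sealed_weights, sealed_input), enclave.client_key))[0]
print(result)  # 1.5
```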

decentriq’s core thesis is that confidential computing – i.e. computation on data while provably keeping the data confidential – can unlock upwards of $3 trillion annually by enabling new business models for companies and bringing new value to users in a completely privacy-preserving way.

“In some sense, data is not the new oil,” says David Sturzenegger, Product Manager at decentriq, “Once I’ve used a liter of oil, I need to buy more. Data however can be reused indefinitely, diminishing the incentive to refine unstructured information into valuable and shareable datasets. Confidential computing enables owners to share data in a use-case specific, privacy-preserving way. We believe that this has the potential to revolutionize the data economy.”

Use Case: Keeping Medical Data Confidential

The avato platform can enable MLaaS use-cases that are currently impossible due to privacy and ethical issues. As an example, consider a pharmaceutical company that’s developed a model to predict health outcomes from individual patient data.

The pharmaceutical company wishes to license the model to hospitals but can’t risk exposing its IP by running the models on hospital infrastructure. Similarly, hospitals may wish to use the model for inference, but aren’t willing to transfer patient data outside of their systems due to legitimate privacy and regulation concerns.

With avato, the private data of both the pharmaceutical company and the hospital is kept confidential from both parties, decentriq, and the cloud infrastructure provider hosting the model. Using a Python API, the pharmaceutical company can host its model in the cloud, provision access, and scale up according to demand. The hospital can leverage the predictive power of the model while preserving the privacy of its patients’ sensitive data.

Recently I had the opportunity to test out decentriq’s confidential inference capability. The company provides a Python SDK that allows model publishers to upload their models; clients use the same SDK to upload their data securely and retrieve predictions. Using the SDK was a seamless experience. Installation was quick and easy (think pip install). I then uploaded a trained TensorFlow model in one process and performed inference in a separate process. Check out this YouTube video for a demo of the capability.
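The class and method names below are hypothetical (the real avato SDK may look quite different); this in-memory stub only mirrors the publish-then-infer workflow described above.

```python
class FakeAvatoClient:
    """In-memory stand-in for a confidential-inference service (names invented)."""
    _models = {}

    def publish_model(self, model_id, predict_fn):
        # Real service: the model would be encrypted end-to-end into an enclave.
        self._models[model_id] = predict_fn

    def infer(self, model_id, features):
        # Real service: the input would be encrypted so only the enclave reads it.
        return self._models[model_id](features)

client = FakeAvatoClient()
# Publisher side: upload a trained model (here, a trivial threshold rule).
client.publish_model("risk-v1", lambda x: "high" if sum(x) > 1.0 else "low")
# Client side: request a prediction without ever seeing the model itself.
print(client.infer("risk-v1", [0.7, 0.6]))  # high
```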

Confidential Inference is Just the Beginning

Although we’ve discussed the benefits of confidential inference, it’s important to note that privacy-preserving model training also has the potential to generate massive value.

Google has published several papers over the last few years on federated learning, a distributed ML approach that trains models on decentralized data residing on devices like mobile phones. With federated learning, all the training data remains on the device and no individual updates are stored in the cloud, which leads to lower latency and less power consumption while ensuring privacy.
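A bare-bones federated-averaging sketch makes the pattern concrete: each client fits a parameter on data that never leaves it, and the server aggregates only the parameters, weighted by local dataset size. The "model" here is just a sample mean, a deliberate simplification.

```python
def local_update(data):
    # Each device trains locally; here the "model" is just the sample mean.
    return sum(data) / len(data), len(data)

def federated_average(client_datasets):
    # Only (parameter, count) pairs reach the server, never raw records.
    updates = [local_update(d) for d in client_datasets]
    total = sum(n for _, n in updates)
    # Server aggregates parameters weighted by local dataset size.
    return sum(mean * n for mean, n in updates) / total

clients = [[1.0, 2.0, 3.0], [10.0], [4.0, 6.0]]
global_model = federated_average(clients)
print(global_model)  # 26/6 ≈ 4.33
```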

Nvidia claims that federated learning could transform healthcare by enabling large volumes of diverse data from across different organizations to be included in model development, while complying with local governance of the clinical data.

decentriq believes that its confidential computing approach to machine learning inference can also transform model training. While the company is currently focused on the inference use case, it plans to tackle model training next. “This is just the start. Any device capable of basic crypto can verify the confidentiality proofs. We want to help move more sensitive data computation from edge devices to the cloud and extend it to general confidential services,” says Nikolas Molyndris, a Product Manager at decentriq.

David Sturzenegger continues, “Our mission is to make data collaborations simple and safe. We want to dramatically improve data access for data scientists and engineers. Confidential training with SGX is on its way, bringing many advantages over the known federated learning approaches, not the least time-to-deployment.”

Conclusion

While machine learning and AI promise to enable entirely new products, services, and industries, many regulatory and privacy hurdles remain, and these will continue to hold back MLaaS offerings. decentriq is working to solve these challenges by enabling privacy-preserving machine learning through its avato platform. What remains to be seen is just how effective the AI can become when the data and models are hidden from end users. If decentriq is successful, it will have a hand in unleashing tremendous amounts of new business value.
