Machine learning in the cloud

Setting up a complete machine learning stack that can scale with a growing amount of data can be challenging. The recent wave of Software as a Service (SaaS) and Infrastructure as a Service (IaaS) paradigms has spilled over into the machine learning domain as well. The trend today is to move the actual data preprocessing, modeling, and prediction to cloud environments and focus on the modeling task only.

In this section, we'll review some of the promising services offering algorithms, predictive models already trained for a specific domain, and environments empowering collaborative workflows in data science teams.

Machine learning as a service

The first category is algorithms as a service, where you are provided with an API or even a graphical user interface to connect pre-programmed components of a data science pipeline together:

  • Google was one of the first companies to introduce prediction services through a web API, the Google Prediction API. The service is integrated with Google Cloud Storage, which serves as data storage. The user can build a model and call an API to get predictions.
  • BigML implements a user-friendly graphical interface, supports many storage providers (for instance, Amazon S3) and offers a wide variety of data processing tools, algorithms, and powerful visualizations.
  • Microsoft Azure Machine Learning provides a large library of machine learning algorithms and data processing functions, as well as a graphical user interface, to connect these components to an application. Additionally, it offers a fully managed service that you can use to deploy your predictive models as ready-to-consume web services.
  • Amazon Machine Learning entered the market quite late. Its main strength is seamless integration with other Amazon services, while the number of algorithms and the user interface need further improvement.
  • IBM Watson Analytics focuses on providing models that are already hand-crafted for a particular domain, such as speech recognition, machine translation, and anomaly detection. It targets a wide range of industries by solving specific use cases.
  • Prediction.IO is a self-hosted open source platform, providing the full stack from data storage to modeling to serving the predictions. Prediction.IO can talk to Apache Spark to leverage its learning algorithms. In addition, it is shipped with a wide variety of models targeting specific domains, for instance, recommender systems, churn prediction, and others.
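Most of these services follow the same basic pattern: upload training data, build a model, and then call an API to get predictions. The following is only a minimal sketch of that round trip in Python. It does not use the API of any service listed above; the FakePredictionService class, the action, model_id, rows, and features fields, and the "most frequent label" model are all hypothetical stand-ins that run in-process so that the example needs no network access or account.

```python
import json
from collections import Counter


class FakePredictionService:
    """Hypothetical in-process stand-in for a hosted prediction API."""

    def __init__(self):
        self._models = {}  # model_id -> label that the trained "model" predicts

    def handle(self, request_json):
        """Accept a JSON request string and return a JSON response string."""
        request = json.loads(request_json)
        if request["action"] == "train":
            labels = [row["label"] for row in request["rows"]]
            # Toy "model": always predict the most frequent training label.
            majority_label = Counter(labels).most_common(1)[0][0]
            self._models[request["model_id"]] = majority_label
            return json.dumps({"status": "trained",
                               "model_id": request["model_id"]})
        if request["action"] == "predict":
            # The toy model ignores the features; a real service would use them.
            return json.dumps({"label": self._models[request["model_id"]]})
        return json.dumps({"error": "unknown action"})


service = FakePredictionService()

# Step 1: send labeled training data and ask the service to build a model.
train_request = {
    "action": "train",
    "model_id": "spam-filter",
    "rows": [
        {"features": {"text": "win a prize"}, "label": "spam"},
        {"features": {"text": "cheap pills"}, "label": "spam"},
        {"features": {"text": "meeting at noon"}, "label": "ham"},
    ],
}
train_response = json.loads(service.handle(json.dumps(train_request)))

# Step 2: call the API with new features to get a prediction.
predict_request = {
    "action": "predict",
    "model_id": "spam-filter",
    "features": {"text": "free prize inside"},
}
prediction = json.loads(service.handle(json.dumps(predict_request)))
print(train_response["status"], prediction["label"])
```

With a real service, the handle call would typically be replaced by an authenticated HTTP request, and the request and response formats would be whatever that provider documents.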

Predictive APIs are an emerging field, so these are just some of the well-known examples; KDnuggets has compiled a list of 50 machine learning APIs.


To learn more about it, you can visit PAPI, the International Conference on Predictive APIs and Apps, or take a look at a book by Louis Dorard, Bootstrapping Machine Learning (L. Dorard, 2014).
