Web resources and competitions

In this section, we'll review where to find additional resources for learning, discussing, presenting, or sharpening our data science skills.

Datasets

One of the best-known repositories of machine learning datasets is hosted by the University of California, Irvine. The UCI repository contains over 300 datasets covering a wide variety of challenges, including poker, movies, wine quality, activity recognition, stocks, taxi service trajectories, advertisements, and many others. Each dataset usually comes with references to research papers in which it was used. These can hint at how to approach the problem and what the prediction baseline is.

The UCI machine learning repository can be accessed at https://archive.ics.uci.edu.


Another well-maintained collection, by Xiaming Chen, is hosted on GitHub.


The Awesome Public Datasets repository maintains links to more than 400 data sources from a variety of domains, ranging from agriculture, biology, economics, and psychology to museums and transportation. Datasets specifically targeting machine learning are collected under the image processing, machine learning, and data challenges sections.

Online courses

Learning how to become a data scientist has become much more accessible thanks to the availability of online courses. The following is a list of free resources for learning different skills online:

Competitions

The best way to sharpen your knowledge is to work on real problems. If you want to build a proven portfolio of projects, machine learning competitions are a good place to start:

  • Kaggle: This is the leading competition platform, hosting a wide variety of challenges with large prizes, a strong data science community, and lots of helpful resources. You can check it out at https://www.kaggle.com/.
  • CrowdANALYTIX: This is a crowdsourced data analytics service focused on the life sciences and financial services industries. You can find it at https://www.crowdanalytix.com.
  • DrivenData: This hosts data science competitions for social good. You can find it at http://www.drivendata.org/.

Websites and blogs

In addition to online courses and competitions, there are numerous websites and blogs that publish the latest developments in the data science community, their authors' experience in tackling different problems, or simply best practices. Some good starting points are as follows:

Venues and conferences

The following are a few top-tier academic conferences where the latest algorithms are presented:

  • Knowledge Discovery in Databases (KDD)
  • Computer Vision and Pattern Recognition (CVPR)
  • Annual Conference on Neural Information Processing Systems (NIPS, now NeurIPS)
  • International Conference on Machine Learning (ICML)
  • IEEE International Conference on Data Mining (ICDM)
  • International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp)
  • International Joint Conference on Artificial Intelligence (IJCAI)

Some business conferences are as follows:

  • O'Reilly Strata Conference
  • The Strata + Hadoop World Conferences
  • Predictive Analytics World
  • MLconf

You can also check local meetup groups.

