In this section, we'll review where to find additional resources for learning, discussing, and presenting data science, and for sharpening our skills.
One of the best-known repositories of machine learning datasets is hosted by the University of California, Irvine. The UCI repository contains over 300 datasets covering a wide variety of challenges, including poker, movies, wine quality, activity recognition, stocks, taxi service trajectories, advertisements, and many others. Each dataset is usually accompanied by a research paper in which the dataset was used, which can give you a hint on how to start and what the prediction baseline is.
The UCI machine learning repository can be accessed at https://archive.ics.uci.edu.
Another well-maintained collection by Xiaming Chen is hosted on GitHub:
https://github.com/caesar0301/awesome-public-datasets
The Awesome Public Datasets repository maintains links to more than 400 data sources from a variety of domains, including agriculture, biology, economics, psychology, museums, and transportation. Datasets specifically targeting machine learning are collected under the image processing, machine learning, and data challenges sections.
Learning how to become a data scientist has become much more accessible thanks to the availability of online courses. The following is a list of free resources for learning different skills online:
The best way to sharpen your knowledge is to work on real problems, and if you want to build a proven portfolio of your projects, machine learning competitions are a viable place to start:
In addition to online courses and competitions, there are numerous websites and blogs that publish the latest developments in the data science community, share experience in tackling different problems, or simply describe best practices. Some good starting points are as follows:
The following are a few top-tier academic conferences where the latest algorithms are presented:
Some business conferences are as follows:
You can also check local meetup groups.