Chapter 21
In This Chapter
Getting the lowdown on essential learning resources at Data Science Weekly
Finding resources at U Climb Higher
Learning about data mining and data science with KDnuggets
Locating an obscure resource on Data Science Central
Educating yourself about open source data science through Masters
Obtaining a free education with Quora
Discovering answers for advanced topics at Conductrics
Reading the Aspirational Data Scientist blog posts
Discovering data intelligence and analytics resources at AnalyticBridge
Getting the developer resource you need with Jonathan Bower
In reading this book, you discover quite a lot about data science and Python. Before your head explodes from all the new knowledge you gain, it’s important to realize that this book is really just the tip of the iceberg. Yes, there really is more information available out there, and that’s what this chapter is all about. The following sections introduce you to a wealth of data science resource collections that you really need to make the best use of your new knowledge.
In this case, a resource collection is simply a listing of really cool links with some text to tell you why they’re so great. In some cases, you gain access to articles about data science; in other cases, you’re exposed to new tools. In fact, data science is such a huge topic that you could easily find more resources than those discussed here, but the following sections provide a good place to start.
The Data Science Weekly is a free newsletter that you can sign up for to obtain the latest information about changes in data science. However, for this chapter, the most important element is the list of resources you find at http://www.datascienceweekly.org/data-science-resources. The resources cover a broad range of topics.
Even with the right connections online and a good search engine, trying to find just the right resource can be hard. U Climb Higher has published a list of 24 data science resources at http://blog.udacity.com/2014/12/24-data-science-resources-keep-finger-pulse.html that is meant to help you keep your finger on the pulse of new strategies and technologies. This resource covers the following topics: trends and happenings; places to learn more about data science; joining a community; data science news; people who really know data science well; and all the latest research.
Learning about data mining and data science is a process. KDnuggets breaks the learning process down into a series of steps at http://www.kdnuggets.com/faq/learning-data-mining-data-science.html. Each step provides you with an overview of what you should be doing and why. You also find links to a variety of resources online to make the learning process considerably easier. Even though the site emphasizes the use of R, Python (clicking the Getting Started With Python For Data Science link shows that even Kaggle prefers Python 2.7), and SQL (in that order) to perform data science tasks, the steps will actually work for any of a number of approaches that you might take.
Many of the resources you find online cover mainstream topics. Data Science Central (http://www.datasciencecentral.com/) provides access to a relatively large number of data science experts who can tell you about the most obscure facts of data science. One of the more interesting blog posts appears at http://www.datasciencecentral.com/profiles/blogs/huge-trello-list-of-great-data-science-resources.
This resource points you to a Trello list (https://trello.com/) of some truly amazing resources. Navigating the huge list can be a bit difficult, but the process is aided by the treelike structure that Trello provides for organizing information. You want to meander through this sort of list when you have time and simply want to see what is available. The list is organized into categories, and more may have been added by the time you read this book.
Many organizations now focus on open source for data science solutions. The focus has become so prevalent that you can now get an Open-Source Data Science Masters (OSDSM) education at http://datasciencemasters.org/. The emphasis is on providing you with the materials that are normally lacking from a purely academic education. In other words, the site provides pointers to courses that fill in gaps in your education so that you become more marketable in today’s computing environment. The various links provide you with access to online courses, books, and other resources that help you gain a better understanding of just how OSDSM works.
It’s really hard to resist the word free, especially when it comes to education, which normally costs many thousands of dollars. The Quora site at http://www.quora.com/What-are-the-best-free-resources-to-learn-data-science provides a listing of the best nonpaid learning resources for data science.
Most of the links take on a question format, such as, “How do I become a data scientist?” The question-and-answer format is helpful because you might be asking the questions that the site answers. The resulting list of sites, courses, and resources is introductory, for the most part, but it is a good way to get started working in the data science field.
The Conductrics site (http://conductrics.com/) as a whole is devoted to selling products that help you perform various data science tasks. However, the site includes a blog that contains a couple of useful posts that answer the sorts of advanced questions that you might find difficult to answer elsewhere. The two posts appear at http://conductrics.com/data-science-resources/ and http://conductrics.com/data-science-resources-2.
The author of the blog posts, Matt Gershoff, makes it clear that the listings are the result of answering people’s questions in the past. The list is huge, which is why it appears in two posts rather than one, so Matt must answer many questions. The list focuses mostly on machine learning rather than hardware or specific coding issues. Therefore, you can expect to see entries for topics such as Latent Semantic Indexing (LSI); Singular Value Decomposition (SVD); Linear Discriminant Analysis (LDA); non-parametric Bayesian approaches; statistical machine translation; Reinforcement Learning (RL); Temporal Difference (TD) learning; and contextual bandits.
The Aspirational Data Scientist (http://newdatascientist.blogspot.com/) blog site provides you with an amazing array of essays on various data science topics. The author splits the posts into these areas: data science commentary; online course reviews; and becoming a data scientist.
Data science attracts practitioners from all sorts of existing fields. The site seems mainly devoted to serving the needs of social scientists moving into the data science field. In fact, the most interesting post, which appears at http://newdatascientist.blogspot.com/p/useful-links.html, provides a listing of resources to help the social scientist move into the data science field. The list of resources is organized by author, so you may find names that you already recognize as potential informational resources.
The AnalyticBridge site (http://www.analyticbridge.com/) contains an amazing array of helpful resources for the data scientist. One of the more helpful resources is the list of data intelligence and analytics resources at http://www.analyticbridge.com/page/links. This page contains a wealth of resources that you won’t find anywhere else, organized into the following categories: general resources; big data; visualization; best and worst of data science; new analytics startup ideas; rants about healthcare, education, and other topics; career stuff, training, and salary surveys; and miscellaneous.
More than a few interesting resources appear on GitHub (https://github.com/), a site devoted to collaboration, code review, and code management. One of the sites you need to check out is Jonathan Bower’s listing of data science resources at https://github.com/jonathan-bower/DataScienceResources. The majority of these resources will appeal to the developer, but just about anyone can benefit from them. The resources are categorized by topic.
The hierarchical formatting of the various topics makes finding just what you need easier. Each major category divides into a list of topics. Within each topic, you find a list of resources that apply to that topic. For example, within Data Pipeline & Tools, you find Python, which includes a link for Anyone Can Code. This is one of the most usable sites in the list.