Foreword

If you’re a beginning data scientist, or want to be one, Practical Data Science with R (PDSwR) is the place to start. If you’re already doing data science, PDSwR will fill in gaps in your knowledge and even give you a fresh look at tools you use on a daily basis—it did for me.

While there are many excellent books on statistics and modeling with R, and a few good management books on applying data science in your organization, this book is unique in that it combines solid technical content with practical, down-to-earth advice on how to practice the craft. I would expect no less from Nina and John.

I first met John when he presented at an early Bay Area R Users Group meeting about his joys and frustrations with R. Since then, Nina, John, and I have collaborated on a couple of projects for my former employer. And John has presented early ideas from PDSwR, both to the “big” group and to our Berkeley R-Beginners meetup. Based on his experience as a practicing data scientist, John is outspoken and has strong views about how to do things. PDSwR reflects Nina and John’s definite views on how to do data science: what tools to use, the process to follow, the important methods, and the importance of interpersonal communication. There are no ambiguities in PDSwR.

This, as far as I’m concerned, is perfectly fine, especially since I agree with 98% of their views. (My only quibble is around SQL—but that’s more an issue of my upbringing than of disagreement.) What their unambiguous writing means is that you can focus on the craft and art of data science and not be distracted by choices of which tools and methods to use. This precision is what makes PDSwR practical. Let’s look at some specifics.

Practical tool set: R is a given. In addition, RStudio is the IDE of choice; I’ve been using RStudio since it came out. It has evolved into a remarkable tool—integrated debugging is in the latest version. The third major tool choice in PDSwR is Hadley Wickham’s ggplot2. While R has traditionally included excellent graphics and visualization tools, ggplot2 takes R visualization to the next level. (My practical hint: take a close look at any of Hadley’s R packages, or those of his students.) In addition to those main tools, PDSwR introduces necessary secondary tools: a proper SQL DBMS for larger datasets; Git and GitHub for source code version control; and knitr for documentation generation.

Practical datasets: The only way to learn data science is by doing it. There’s a big leap from the typical teaching datasets to the real world. PDSwR strikes a good balance between the need for a practical (simple) dataset for learning and the messiness of the real world. PDSwR walks you through exploring a new dataset to find problems in the data, and cleaning and transforming it when necessary.

Practical human relations: Data science is all about solving real-world problems for your client—either as a consultant or within your organization. In either case, you’ll work with a multifaceted group of people, each with their own motivations, skills, and responsibilities. As practicing consultants, Nina and John understand this well. PDSwR is unique in stressing the importance of understanding these roles while working through your data science project.

Practical modeling: The bulk of PDSwR is about modeling, starting with an excellent overview of the modeling process, including how to pick the modeling method to use and, when done, gauge the model’s quality. The book walks you through the most practical modeling methods you’re likely to need. The theory behind each method is intuitively explained. A specific example is worked through—the code and data are available on the authors’ GitHub site. Most importantly, tricks and traps are covered. Each section ends with practical takeaways.

In short, Practical Data Science with R is a unique and important addition to any data scientist’s library.

JIM PORZAK

SENIOR DATA SCIENTIST AND COFOUNDER OF THE BAY AREA R USERS GROUP
