Preface

Mango Solutions has been teaching face-to-face R training courses to business professionals and academics alike for over 13 years. In this time, we’ve seen R grow from its early days as a low-cost alternative to S-PLUS and SAS to become the leading analytical programming language in the world today, with several thousand contributors and somewhere upward of a million users. R is widely used throughout academia and is commercially supported by the likes of Microsoft, Google, HP, and Oracle.

In Mango’s face-to-face training program we teach R to statisticians, data scientists, physicists, biologists, chemists, geographers, and psychologists among others. All are looking to R to help improve the way they analyze their data in a professional environment. Our aim with this book was to take tried and tested training material and turn it into a lasting resource for anyone looking to learn R for analysis.

Who Should Read This Book?

This book is designed for professional statisticians, data scientists, and analysts looking to widen the scope of analytical tools available to them by learning R. Although it is expected that you might have some programming knowledge in another analytical application or language for data analysis, such as SAS, Python, or Excel/VBA, this is not a prerequisite. This book is suitable for complete novices in programming. From the start, we do not assume any prior knowledge of R; however, those familiar with the basics may find that they can jump straight to later chapters.

What Should You Expect from This Book?

This book is designed to take you from the basics of the R language through common tasks in data science, including data manipulation, visualization, and modeling, to elements of the language that will allow you to produce high-quality, production-ready code. As with our face-to-face training, this book is structured around simple and easy-to-follow examples, all of which are available to download from the book’s website (http://www.mango-solutions.com/wp/teach-yourself-r-in-24-hours-book). Throughout, we introduce good practices for writing code as well as provide tips and tricks from our combined experience in R development.

By the end of this book, you should have a good understanding of the fundamentals of R as well as many of the most commonly used packages. You should also understand what makes well-written R code and how to write such code yourself.

How Is This Book Organized?

This book is designed to guide you through everything you need to know to get started with the R language and then introduce additional elements of the language for specific tasks.

The following is an outline of each of the hours and what to expect:

Hour 1, “The R Community”: In this hour, we start by looking at how R evolved from the S language to become the all-purpose data science programming language that it is today. The R community offers a plethora of help and support options for users. We look at some of the better-known options during this hour.

Hour 2, “The R Environment”: In this hour, we start a new R session via RStudio, type some basic commands, and explore the idea of an R “object.” You will be more formally introduced to the concept of an R package.

Hour 3, “Single-Mode Data Structures”: In this hour, we describe the standard types of data found in R and introduce three key structures that can be used to store these data types: vectors, matrices, and arrays. We illustrate the ways in which these structures can be created and managed, with a focus on how we can extract data from them.

Hour 4, “Multi-Mode Data Structures”: The majority of data sources contain a mixture of data types, which we need to store together in a simple, effective format. In this hour, we focus on two key data structures that allow us to store “multi-mode” data: lists and data frames. We illustrate the ways in which these structures can be created and managed, with a focus on how we can extract data from them. We also look at how these two data structures can be effectively used in our day-to-day work.

Hour 5, “Dates, Times, and Factors”: In this hour, you learn more about some of the special data types in R that enable us to work with dates and times and with categorical data.

Hour 6, “Common R Utility Functions”: In this hour, we introduce you to some of the most common utility functions in R that you will find yourself using every day.

Hour 7, “Writing Functions: Part I”: One of the strengths of R is that we can extend it by writing our own functions, allowing us to create utilities that can perform a variety of tasks. In this hour, we look at ways in which we can create our own functions, specify inputs, and return results to the user. We also discuss the “if/else” structure in R and use it to control the flow of code within a function.

Hour 8, “Writing Functions: Part II”: This hour looks at a range of advanced function-writing topics, such as returning error messages, checking whether inputs are appropriate to our functions, and the use of function “ellipses.”

Hour 9, “Loops and Summaries”: In this hour, you see how we can apply simple functions and code in a more “applied” fashion. This allows us to perform tasks repeatedly over sections of our data without the need to produce verbose, repetitive code.

Hour 10, “Importing and Exporting”: In this hour, we introduce common methods for importing and exporting data. By the end of the hour you will have seen how R can be used to read and write flat files and connect to database management systems (DBMSs) as well as Microsoft Excel.

Hour 11, “Data Manipulation and Transformation”: As data scientists and statisticians, we rarely get to control the structure and format of our data. Now we will look a little closer at the structure of our data. Several approaches to data manipulation in R have evolved over time. In this hour, we start by looking at what could be called “traditional” approaches to the data manipulation tasks of sorting, setting, and merging. We then look at the popular packages reshape, reshape2, and tidyr for data restructuring.

Hour 12, “Efficient Data Handling in R”: We begin the hour by looking at the incredibly popular dplyr package. The data.table package is a standalone package for data manipulation that offers greater efficiency for very large data.

Hour 13, “Graphics”: After all the manipulations to our data, we want to be able to start to do something with it. In this hour, we look at how we can create graphics using the base graphics functionality, including the standard graphics functions and how to send your graphics to devices such as a PDF. Finally, we look at how to control the layout of graphics on the page.

Hour 14, “The ggplot2 Package for Graphics”: In this hour, we look at the hugely popular ggplot2 package, developed by Hadley Wickham for creating high-quality graphics.

Hour 15, “Lattice Graphics”: Here we will look at a third way of creating graphics: using the lattice package. This graphic system is well suited to graphing highly grouped data, with the code designed to closely resemble the modeling capabilities of R.

Hour 16, “Introduction to R Models and Object Orientation”: In this hour, we see how to fit a simple linear model and assess its performance using a range of textual and graphical methods. Beyond this, we introduce “object orientation” and see how the R statistical modeling framework is built on this concept.

Hour 17, “Common R Models”: In this hour, we extend the ideas of the previous hour to other modeling approaches. Specifically, we look at Generalized Linear Models, nonlinear models, time series models, and survival models.

Hour 18, “Code Efficiency”: In this hour, we look at some of the techniques we can use to improve the efficiency and, importantly, the professionalism of our R code.

Hour 19, “Package Building”: When we put our code into a package, it forces us to ensure that our code is of a high standard and we are adhering to good practices, such as documenting our code. We focus here on making sure our code is well written and documented, the starting point for high-quality, professional code that is easy to share and reuse.

Hour 20, “Advanced Package Building”: There are a number of ways we can extend a package to make it more robust to changes and easier for users to get started with. You learn the most common of these extra components in this hour.

Hour 21, “Writing R Classes”: In this hour, we take a general look at some key features of object-oriented programming before focusing in on R’s S3 implementation.

Hour 22, “Formal Class Systems”: During this hour, we look at the more formal S4 and Reference Class systems in R. Along the way, you will be introduced to concepts such as validity checking, multiple dispatch, message-passing object orientation, and mutable objects.

Hour 23, “Dynamic Reporting”: Up to this point we have seen the fundamentals of the R language as well as the aspects of R that allow us to ensure that we write high-quality, well-documented, and easily shareable code. In this hour, we take a look at one of the ways you can extend your use of R, specifically for simplifying the generation of reports that rely heavily on R-generated output.

Hour 24, “Building Web Applications with Shiny”: Although you may initially be put off by the idea of building a web application, we introduce a package that allows you to generate web applications entirely in R, writing only R code. This is currently one of the most popular packages available in R, with more and more packages being added to CRAN that use this framework.

About the Sample Code

Throughout this book, we have included examples of the concepts that are being introduced. You may notice that the code is prefixed with the symbols “>” and “+”. These are the R prompt and continuation characters and do not need to be entered when writing code. We have used the formatting conventions of function for a function name and package for a package name.

All of the code examples included in this book are available from our web page: http://www.mango-solutions.com/wp/teach-yourself-r-in-24-hours-book/


Note

Code-Continuation Arrows and Listing Line Numbers

You might see code-continuation arrows occasionally in this book to indicate when a line of code is too long to fit on the printed page. Also, some listings have line numbers and some do not. The listings that have line numbers have them so that we can reference code by line; the listings that do not have line numbers are not referenced by line.


Contacting the Authors

If you have any comments or questions about this book, please drop us an email at [email protected].
