Introduction

Welcome to R For Dummies, the book that takes the steepness out of the learning curve for using R.

We can’t guarantee that you’ll be a guru if you read this book, but you should be able to do the following:

- Perform data analysis by using a variety of powerful tools

- Use the power of R to do statistical analysis and other data-processing tasks

- Appreciate the beauty of using vector-based operations rather than loops to do speedy calculations

- Appreciate the meaning of the following line of code:

    knowledge <- apply(theory, 1, sum)

- Know how to find, download, and use code that has been contributed to R by its very active community of developers

- Know where to find extra help and resources to take your R coding skills to the next level

- Create beautiful graphs and visualizations of your data
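The apply() line above is a playful summary, but it is also valid R. As a minimal sketch, assume theory is a numeric matrix (the 3 x 4 matrix below is made up purely for illustration). With its second argument (MARGIN) set to 1, apply() calls sum() on each row. The same row totals can be computed with an explicit for loop or with the vectorized built-in rowSums(). The code is written as a plain script, without the > and + prompts.

```r
# Hypothetical stand-in for 'theory': a 3 x 4 matrix, filled column by column
theory <- matrix(1:12, nrow = 3)

# MARGIN = 1 tells apply() to call sum() once per row
knowledge <- apply(theory, 1, sum)
print(knowledge)   # [1] 22 26 30

# The same row totals with an explicit loop
loop_result <- numeric(nrow(theory))
for (i in seq_len(nrow(theory))) {
  loop_result[i] <- sum(theory[i, ])
}

# And with rowSums(), a vectorized built-in that needs no loop at all
vector_result <- rowSums(theory)
```

All three approaches give 22, 26, and 30 here. On large matrices, rowSums() is typically much faster than either apply() or a hand-written loop.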

About This Book

R For Dummies is an introduction to the statistical programming language known as R. We start by introducing the interface and work our way from the very basic concepts of the language through more sophisticated data manipulation and analysis.

We illustrate every step with easy-to-follow examples. This book contains numerous code snippets, several write-it-yourself functions you can use later on, and complete analysis scripts. All these are for you to try out yourself.

We don’t attempt to give a technical description of how R is programmed internally, but we do focus as much on the why as on the how. R doesn’t work like your average scripting language, and it has plenty of unique features that may seem surprising at first. Instead of just telling you how to talk to R, we believe it’s important to explain how the R engine reads what you tell it to do. After reading this book, you should be able to get your data into the form you want and understand how to use functions we didn’t cover in the book (as well as the ones we do cover).

This book is a reference. You don’t have to read it from beginning to end. Instead, you can use the table of contents and index to find the information you need. In each chapter, we cross-reference other chapters where you can find more information.

Conventions Used in This Book

Code snippets appear like this example, where we simulate 1 million throws of two six-sided dice:

> set.seed(42)
> throws <- 1e6
> dice <- sapply(1:2,
+     function(x) sample(1:6, throws, replace=TRUE)
+ )
> table(rowSums(dice))

     2      3      4      5      6      7      8
 28007  55443  83382 110359 138801 167130 138808
     9     10     11     12
110920  83389  55816  27945

Each line of R code in this example is preceded by one of two symbols:

- >: The prompt symbol, >, is not part of your code, and you should not type it when you try the code yourself.

- +: The continuation symbol, +, indicates that this line of code still belongs to the previous line. R displays it automatically when a command is not yet complete. You don’t actually have to break a line of code in two, but we do so frequently, because it makes the code easier to read and helps it fit on the pages of a book.

The lines that don’t start with either the prompt symbol or the continuation symbol are the output produced by R. In this case, you get the number of throws in which the two dice added up to each total from 2 through 12. For example, in 28,007 of the 1 million throws, the numbers on the dice added up to 2.
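To judge whether counts like 28,007 are plausible, you can compute what a fair pair of dice should produce. The sketch below is our own addition, not part of the original example: it enumerates all 36 equally likely face combinations with outer(), counts how many give each total with tabulate(), and scales those probabilities up to 1 million throws.

```r
# All 36 equally likely outcomes of two dice, as a 6 x 6 matrix of sums
all_sums <- outer(1:6, 1:6, "+")

# Number of outcomes that produce each total; drop total 1, which is impossible
ways <- tabulate(as.vector(all_sums), nbins = 12)[2:12]
names(ways) <- 2:12

# Expected count for each total in 1 million throws
expected <- 1e6 * ways / 36
round(expected)
```

A total of 2 can occur in only 1 of the 36 ways, so you expect about 27,778 of them, close to the 28,007 in the simulation. A total of 7 can occur in 6 ways, giving about 166,667 expected versus 167,130 simulated.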

You can copy these code snippets and run them in R, but you have to type them exactly as shown. There are only three exceptions:

- Don’t type the prompt symbol, >.

- Don’t type the continuation symbol, +.

- Where you put spaces or tabs isn’t critical, as long as they don’t fall in the middle of a keyword, a name, or an operator such as <-. Pay attention to new lines, though.

You get to write some R code in every chapter of the book. Because much of your interaction with the R interpreter is most likely going to be in interactive mode, you need a way of distinguishing between your code and the results of your code. When there is an instruction to type some text into the R console, you’ll see a little > symbol to the left of the text, like this:

> print("Hello world!")

If you type this into a console and press Enter, R responds with the following:

[1] "Hello world!"

For convenience, we collapse these two events into a single block, like this:

> print("Hello world!")
[1] "Hello world!"

This indicates that you type some code (print("Hello world!")) into the console and R responds with [1] "Hello world!".

Finally, many R words are directly derived from English words. To avoid confusion in the text of this book, R functions, arguments, and keywords appear in monofont. For example, to create a plot, you use the plot() function in R. When we talk about functions, the function name is always followed by opening and closing parentheses, as in plot(). We don’t add arguments to the function names mentioned in the text unless it’s really important.

On some occasions we talk about menu commands, such as File⇒Save. This just means that you open the File menu and choose the Save option.

What You’re Not to Read

You can use this book however works best for you, but if you’re pressed for time (or just not interested in the nitty-gritty details), you can safely skip anything marked with a Technical Stuff icon. You also can skip sidebars (text in gray boxes); they contain interesting information, but nothing critical to your understanding of the subject at hand.

Foolish Assumptions

This book makes the following assumptions about you and your computer:

- You know your way around a computer. You know how to download and install software. You know how to find information on the Internet, and you have Internet access.

- You’re not necessarily a programmer. If you are a programmer and you’re used to coding in other languages, you may want to read the notes marked by the Technical Stuff icon; there, we fill you in on how R is similar to, or different from, other common languages.

- You’re not a statistician, but you understand the very basics of statistics. R For Dummies isn’t a statistics book, although we do show you how to do some basic statistics using R. If you want to understand the statistical stuff in more depth, we recommend Statistics For Dummies, 2nd Edition, by Deborah J. Rumsey, PhD (Wiley).

- You want to explore new stuff. You like to solve problems and aren’t afraid of trying things out in the R console.

How This Book Is Organized

The book is organized into six parts. Here’s what each part covers.

Part I: R You Ready?

In this part, we introduce you to R and show you how to write your first script. You get to use the very powerful concept of vectors to perform calculations on many values at once. You get to work with the R workspace (in other words, you find out how to create, modify, and remove variables). You find out how to save your work and how to retrieve and modify script files that you wrote in previous sessions. We also introduce some fundamentals of R (for example, how to extend its functionality by installing packages).

Part II: Getting Down to Work in R

In this part, we fill you in on the three R’s: reading, ’riting, and ’rithmetic. In other words, you work with text and numbers (and dates, for good measure). You also get to use the very important data structures of lists and data frames.

Part III: Coding in R

R is a programming language, so you need to know how to write and understand functions. In this part, we show you how to do this, how to control the logic flow of your scripts by making choices with if statements, and how to loop through your code to perform repetitive actions. We explain how to make sense of and deal with warnings and errors that you may encounter in your code. Finally, we show you some tools for debugging any issues you run into.

Part IV: Making the Data Talk

In this part, we introduce the different data structures that you can use in R, such as lists and data frames. You find out how to get your data in and out of R (for example, by reading data from files or the Clipboard). You also see how to interact with other applications, such as Microsoft Excel.

Then you discover how easy it is to do some advanced data reshaping and manipulation in R. We show you how to select a subset of your data and how to sort and order it. We explain how to merge different datasets based on columns they may have in common. Finally, we show you a very powerful generic strategy of splitting and combining data and applying functions over subsets of your data. When you understand this strategy, you can use it over and over again to do sophisticated data analyses in only a few small steps.
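The split, apply, and combine strategy can be sketched in a few lines of base R. This example is our own illustration using the built-in mtcars dataset (fuel economy in mpg and cylinder count in cyl for 32 cars), not necessarily the approach the book takes: split() breaks the data into groups, sapply() applies mean() to each group and combines the results, and tapply() does all three steps in one call.

```r
# Split: one vector of mpg values for each cylinder count (4, 6, and 8)
pieces <- split(mtcars$mpg, mtcars$cyl)

# Apply and combine: the mean of each piece, collected in a named vector
group_means <- sapply(pieces, mean)
print(group_means)

# tapply() performs the split, apply, and combine steps in a single call
print(tapply(mtcars$mpg, mtcars$cyl, mean))
```

Swapping mean() for any other function that takes a vector (median(), max(), or a function you write yourself) reuses the same pattern unchanged.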

We’re just itching to show you how to do some statistical analysis using R. This is the heritage of R, after all. But we promise to keep it simple. After reading this part, you’ll know how to describe and summarize your variables and data using R. You’ll be able to do some classical tests (for example, performing a t-test). And you’ll know how to use random numbers to simulate some distributions.
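As a taste of both ideas, here is a minimal sketch (our own example, using the built-in mtcars data as an assumption) that runs a classical two-sample t-test with t.test() and then uses rnorm() to simulate draws from a normal distribution. With a formula and default settings, t.test() performs Welch’s test, which does not assume equal variances in the two groups.

```r
# Classical test: compare mean mpg of automatic (am = 0) and manual (am = 1) cars
tt <- t.test(mpg ~ am, data = mtcars)
print(tt)

# Simulation: 10,000 random draws from a standard normal distribution
set.seed(1)   # makes the random draws reproducible
x <- rnorm(10000, mean = 0, sd = 1)

# The sample mean and standard deviation should be close to 0 and 1
print(mean(x))
print(sd(x))
```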

Finally, we show you some of the basics of using linear models (for example, linear regression and analysis of variance). We also show you how to use R to predict the values of new data using some models that you’ve fitted to your data.
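A minimal sketch of fitting a linear model and predicting new values, again using mtcars as assumed example data: lm() fits a linear regression of mpg on car weight (wt, in thousands of pounds), anova() prints the corresponding analysis-of-variance table, and predict() applies the fitted model to a data frame of new, hypothetical weights.

```r
# Linear regression: fuel economy as a linear function of weight
fit <- lm(mpg ~ wt, data = mtcars)
print(summary(fit))

# Analysis of variance table for the fitted model
print(anova(fit))

# Hypothetical new cars weighing 2,500 and 3,500 pounds
new_cars <- data.frame(wt = c(2.5, 3.5))

# Predicted mpg for the new cars
predictions <- predict(fit, newdata = new_cars)
print(predictions)
```

The column names in newdata must match the predictor names used in the model formula (here, wt); otherwise predict() cannot find the values it needs.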

Part V: Working with Graphics

They say that a picture is worth a thousand words. This is certainly the case when you want to share your results with other people. In this part, you discover how to create basic and more sophisticated plots to visualize your data. We move on from bar charts and line charts, and show you how to present cuts of your data using facets.

Part VI: The Part of Tens

In this part, we show you how to do ten things in R that you probably use Microsoft Excel for at the moment (for example, how to do the equivalent of pivot tables and lookup tables). We also give you ten tips for working with packages that are not part of base R.
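To give a flavor of the Excel comparisons, here is a sketch in base R using mtcars (our choice of data; the book’s own examples may differ). tapply() with two grouping variables produces a cross-tabulated summary much like a pivot table, and match() performs a lookup similar to Excel’s VLOOKUP. The small lookup table is hypothetical.

```r
# Work on a copy so the built-in mtcars stays unchanged
cars <- mtcars

# Pivot-table-like summary: mean mpg by cylinders (rows) and transmission (columns)
# In mtcars, am = 0 means automatic and am = 1 means manual
pivot <- tapply(cars$mpg, list(cyl = cars$cyl, am = cars$am), mean)
print(round(pivot, 1))

# Hypothetical lookup table that maps transmission codes to labels
lookup <- data.frame(
  am = c(0, 1),
  transmission = c("automatic", "manual"),
  stringsAsFactors = FALSE
)

# VLOOKUP-style lookup: find each car's am code in the table, return its label
cars$transmission <- lookup$transmission[match(cars$am, lookup$am)]
print(head(cars[, c("am", "transmission")]))
```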

Icons Used in This Book

As you read this book, you’ll find little pictures in the margins. These pictures, or icons, mark certain types of text:

Tip: When you see the Tip icon, you can be sure to find a way to do something more easily or quickly.

Remember: You don’t have to memorize this book, but the Remember icon points out some useful things that you really should remember. Usually this indicates a design pattern or idiom that you’ll encounter in more than one chapter.

Warning: When you see the Warning icon, listen up. It points out something you definitely don’t want to do. Although it’s really unlikely that using R will cause anything disastrous to happen, we use the Warning icon to alert you when something is bound to lead to confusion.

Technical Stuff: The Technical Stuff icon indicates technical information you can merrily skip over. We do our best to make this information as interesting and relevant as possible, but if you’re short on time or you just want the information you absolutely need to know, you can skip right past it.

Where to Go from Here

There’s only one way to learn R: Use it! In this book, we try to make you familiar with the usage of R, but you’ll have to sit down at your PC and start playing around with it yourself. Crack the book open so the pages don’t flip by themselves, and start hitting the keyboard!
