Chapter 1.  Quick Start

Data analysis is difficult without the proper tools. It is almost impossible to extract patterns directly from a large set of numbers aligned in rows and columns and draw any conclusion, even for experts. A suitable tool, such as R, will remarkably boost your productivity in working with data. From my experience, learning a programming language is somewhat like learning a human language. It is probably not a good idea to jump right into the details of vocabulary and grammar before looking at the big picture, getting motivated, and starting small. This chapter gives you a quick start with a high-level overview of the R programming language.

In this chapter, we will cover the following topics:

  • Introducing R
  • The need for R
  • Installing R
  • Tools required to write R code

As soon as the software and tools are ready to go, you will write a simple R program to experience how it basically works. Once this is done, the R journey will unfold from the basics to advanced techniques and applications.

Introducing R

R is a powerful programming language and environment for statistical computing, data exploration, analysis, and visualization. It is free, open source, and has a strong, rapidly growing community where users and developers share their experience and actively contribute to the development of more than 7,500 packages, so that R can deal with problems in a wide range of fields (refer to https://cran.r-project.org/web/views/).

Although the origin of the R programming language dates back to 1993, its adoption in data-related research and industry has grown rapidly in the last decade, and it has become the lingua franca of data science.

In general, R should be viewed as more than just a programming language; it is a comprehensive computing environment, a strong and active community, and a rapidly growing and expanding ecosystem.

R as a programming language

R, as a programming language, has been evolving and developing over the last 20 years. Its goal is quite clear: to make it easy and flexible to perform comprehensive statistical computing, data exploration, and visualization.

However, ease of use and flexibility usually conflict. It can be very easy to click a few buttons to finish a variety of tasks in statistical analysis, but this won't be flexible if you need customization, automation, and reproducibility. It can be very flexible to use tens of functions to transform data and make complicated graphics, but it won't be easy to learn and combine these functions correctly. R stands out for its well-positioned balance between the two.

R as a computing environment

R, as a computing environment, is lightweight and ready to use. Compared to some other well-known statistical software, such as MATLAB and SAS, R is much smaller and easier to deploy.

In this book, we will use RStudio to handle almost all our work in R. This integrated development environment provides rich features such as syntax highlighting, auto-completion, package management, a graphics viewer, a help viewer, an environment viewer, and debugging. These features can greatly boost your productivity.

R as a community

R, as a community, is strong and active. You can visit Try R (http://tryr.codeschool.com/) immediately and get a first impression of R basics through an interactive tutorial. In practice, when you are coding, you probably won't solve every problem by yourself. You may search for an R question online and find that it almost always has answers on Stack Overflow (http://stackoverflow.com/questions/tagged/r). If your question is not fully addressed, you can ask it and probably get an answer in a couple of minutes.

If you need to use a package but also want to see how it works in detail, you can visit the source code at its online repository (or repo). Many repos are hosted on GitHub (https://www.github.com). On GitHub, you can do much more. When you find that a package is not working correctly, you can report a bug by filing an issue. If you need a feature that fits the purpose of the package, you can also request it by filing an issue. If you are interested in contributing to the package by resolving bugs and implementing features, you can fork the project, edit the code, and send pull requests so that your changes can be accepted by the owner. If your changes are accepted, congratulations, you have become a contributor to the package! Amazingly, R and its thousands of packages are built by contributors all over the world.

R as an ecosystem

R, as an ecosystem, is rapidly growing and expanding in all data-related areas beyond the IT industry. The majority of its users are not professional developers but data analysts and statisticians. These users may not write the best-quality code, but they may contribute cutting-edge tools to the ecosystem in the R language, and everyone else has free access to these tools without having to reinvent the wheel.

For example, let's say an econometrician writes an extension package that includes a new method to detect a category of time series patterns; it may attract several users who find it interesting and useful. Some professional users may improve the original code to make it faster and more general-purpose. A while later, a quantitative investor may find it helpful to incorporate this method into a trading strategy because it can detect patterns that usually cause risks in his or her portfolio. At the end of the day, the econometrician's tool is applied in a real-world industry, and the investor finds the portfolio less risky.

That is how the ecosystem works. And that is one of the reasons why R rocks in these areas: it can quickly turn cutting-edge knowledge from outside the IT industry (usually data science, academia, and industry) into generally available and applicable tools in the ecosystem. In other words, it facilitates the conversion of field knowledge and data science into productivity and value.
