Chapter 7. Importing and Exporting Data from Various DBs

In this final chapter, we are going to see how data from different sources can be loaded into R for performing the data analytics operations. Here, we have considered some of the popular databases that are being used as data storage, required for performing data analytics with different applications and technologies. As we know, performing the analytics operations with R is quite easy as compared to the other analytics tools and again, it's free and open source. Since, R has available methods to use customized functions via installing R packages, many database packages are available in CRAN to perform database connection with R. Therefore, the R programming language is becoming more and more popular due to database, as well as operating system, independence.

We have specially designed this chapter to share knowledge of how data from various database systems can be loaded and used into R for performing data modeling. In this chapter, we have included several popular database examples for performing various DB operations.

We have covered various data sources that are popular and are used with R. They are as follows:

  • RData
  • MySQL
  • Excel
  • MongoDB
  • SQLite
  • PostgreSQL
  • Hive
  • HBase
Importing and Exporting Data from Various DBs

From the preceding diagram, we can understand that R is supported with several database systems to perform data analytics related operations over various databases. Since there are a large number of libraries available for R to perform the connection with various DBs, we just need to inherit them.

In the following table, the possible database systems and the related R packages are given for easy understanding of the related R packages:

Database system name

Useful R packages / function utilities

Text files

Text data files such as .csv, .txt, and .r

MySQL

RMySQL

Excel

Xlsx

Mongo

RMongo

SQLlite

RSQLlite

PostgreSQL

RPostgreSQL

HDFS

RHDFS

Hive

RHive

HBase

RHBase

As we know, each of the mentioned databases have their own importance with the features. Each of these data sources will be described with the following points for better understanding:

  • Introduction
  • Features
  • Installation
  • Import the data into R
  • Data manipulation
  • Export the data from R

In this chapter, we are going to install and interact with R packages that will be used for various data operations in R.

Now, we will start understanding about databases and how to perform data-related operations to forward to data analytics for all databases.

Learning about data files as database

While dealing with the data analytics activities, we need to do data importing, loading, or exporting functionalities all the time. Sometimes the same operations need to be iterated with R programming language. So, we can use the available R function for performing the same data activities.

Understanding different types of files

There are commonly four different types of data files used with R for data storage operations. They are as follows:

  • CSV (Comma Separated Values)
  • Txt (with Tab Separated Values)
  • .RDATA (R's native data format)
  • .rda (R's native data format)

Installing R packages

To use the data file with the format specified earlier, we don't need to install extra R packages. We just need to use the built-in functions available with R.

Importing the data into R

To perform analytics-related activities, we need to use the following functions to get the data into R:

  • CSV: read.csv() is intended for reading the comma separated value (CSV) files, where the decimal point is ",". The retrieved data will be stored into one R object, which is considered as Dataframe.
    Dataframe <- read.csv("data.csv",sep=",")
    
  • TXT: To retrieve the tab separated values, the read.table() function will be used with some important parameters and the return type of this function will be Dataframe type.
    Dataframe <- read.table("data.csv", sep="	")
    
  • .RDATA: Here, the .RDATA format is used by R for storing the workspace data for a particular time period. It is considered as image file. This will store/retrieve all of the data available in the workspace.
    load("history.RDATA")
    
  • .rda: This is also R's native data format, which stores the specific data variable as per requirement.
    load("data_variables_a_and_b.rda")
    

Exporting the data from R

To export the existing data object from R and to support data files as per requirements, we need to use the following functions:

  • CSV: Write the dataframe object into the csv data file via the following command:
    write.csv(mydata, "c:/mydata.csv", sep=",", row.names=FALSE)
    
  • TXT: Write the data with the tab delimiters via the following command:
    write.table(mydata, "c:/mydata.txt", sep="	")
    
  • .RDATA: To store the workspace data variables available to R session, use the following command:
    save.image()
    
  • .rda: This function is used to store specific data objects that can be reused later. Use the following code for saving them to the .rda files.
    # column vector
    a <- c(1,2,3)
    
    # column vector
    b <- c(2,4,6)
    
    # saving it to R (.rda) data format
    save(a, b, file=" data_variables_a_and_b.rda")
    
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset