Big Data Analytics with R and Hadoop
by Vignesh Prajapati
Table of Contents
Big Data Analytics with R and Hadoop
Credits
About the Author
Acknowledgment
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers and more
Why Subscribe?
Free Access for Packt account holders
Preface
Introducing R
Understanding features of R
Studying the popularity of R
Introducing Big Data
Getting information about popular organizations that hold Big Data
Introducing Hadoop
Exploring Hadoop features
Studying Hadoop components
Understanding the reason for using R and Hadoop together
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. Getting Ready to Use R and Hadoop
Installing R
Installing RStudio
Understanding the features of R language
Using R packages
Performing data operations
Increasing community support
Performing data modeling in R
Installing Hadoop
Understanding different Hadoop modes
Understanding Hadoop installation steps
Installing Hadoop on Linux, Ubuntu flavor (single node cluster)
Installing Hadoop on Linux, Ubuntu flavor (multinode cluster)
Installing Cloudera Hadoop on Ubuntu
Understanding Hadoop features
Understanding HDFS
Understanding the characteristics of HDFS
Understanding MapReduce
Learning the HDFS and MapReduce architecture
Understanding the HDFS architecture
Understanding HDFS components
Understanding the MapReduce architecture
Understanding MapReduce components
Understanding the HDFS and MapReduce architecture by plot
Understanding Hadoop subprojects
Summary
2. Writing Hadoop MapReduce Programs
Understanding the basics of MapReduce
Introducing Hadoop MapReduce
Listing Hadoop MapReduce entities
Understanding the Hadoop MapReduce scenario
Loading data into HDFS
Executing the Map phase
Shuffling and sorting
Reducing phase execution
Understanding the limitations of MapReduce
Understanding Hadoop's ability to solve problems
Understanding the different Java concepts used in Hadoop programming
Understanding the Hadoop MapReduce fundamentals
Understanding MapReduce objects
Deciding the number of Maps in MapReduce
Deciding the number of Reducers in MapReduce
Understanding MapReduce dataflow
Taking a closer look at Hadoop MapReduce terminologies
Writing a Hadoop MapReduce example
Understanding the steps to run a MapReduce job
Learning to monitor and debug a Hadoop MapReduce job
Exploring HDFS data
Understanding several possible MapReduce definitions to solve business problems
Learning the different ways to write Hadoop MapReduce in R
Learning RHadoop
Learning RHIPE
Learning Hadoop streaming
Summary
3. Integrating R and Hadoop
Introducing RHIPE
Installing RHIPE
Installing Hadoop
Installing R
Installing protocol buffers
Environment variables
The rJava package installation
Installing RHIPE
Understanding the architecture of RHIPE
Understanding RHIPE samples
RHIPE sample program (Map only)
Word count
Understanding the RHIPE function reference
Initialization
HDFS
MapReduce
Introducing RHadoop
Understanding the architecture of RHadoop
Installing RHadoop
Understanding RHadoop examples
Word count
Understanding the RHadoop function reference
The hdfs package
The rmr package
Summary
4. Using Hadoop Streaming with R
Understanding the basics of Hadoop streaming
Understanding how to run Hadoop streaming with R
Understanding a MapReduce application
Understanding how to code a MapReduce application
Understanding how to run a MapReduce application
Executing a Hadoop streaming job from the command prompt
Executing the Hadoop streaming job from R or an RStudio console
Understanding how to explore the output of MapReduce application
Exploring an output from the command prompt
Exploring an output from R or an RStudio console
Understanding basic R functions used in Hadoop MapReduce scripts
Monitoring the Hadoop MapReduce job
Exploring the HadoopStreaming R package
Understanding the hsTableReader function
Understanding the hsKeyValReader function
Understanding the hsLineReader function
Running a Hadoop streaming job
Executing the Hadoop streaming job
Summary
5. Learning Data Analytics with R and Hadoop
Understanding the data analytics project life cycle
Identifying the problem
Designing data requirement
Preprocessing data
Performing analytics over data
Visualizing data
Understanding data analytics problems
Exploring web pages categorization
Identifying the problem
Designing data requirement
Understanding the required Google Analytics data attributes
Collecting data
Preprocessing data
Performing analytics over data
Visualizing data
Computing the frequency of stock market change
Identifying the problem
Designing data requirement
Preprocessing data
Performing analytics over data
Visualizing data
Predicting the sale price of blue book for bulldozers – case study
Identifying the problem
Designing data requirement
Preprocessing data
Performing analytics over data
Understanding Poisson-approximation resampling
Fitting random forests with RHadoop
Summary
6. Understanding Big Data Analysis with Machine Learning
Introduction to machine learning
Types of machine-learning algorithms
Supervised machine-learning algorithms
Linear regression
Linear regression with R
Linear regression with R and Hadoop
Logistic regression
Logistic regression with R
Logistic regression with R and Hadoop
Unsupervised machine learning algorithm
Clustering
Clustering with R
Performing clustering with R and Hadoop
Recommendation algorithms
Steps to generate recommendations in R
Generating recommendations with R and Hadoop
Summary
7. Importing and Exporting Data from Various DBs
Learning about data files as database
Understanding different types of files
Installing R packages
Importing the data into R
Exporting the data from R
Understanding MySQL
Installing MySQL
Installing RMySQL
Learning to list the tables and their structure
Importing the data into R
Understanding data manipulation
Understanding Excel
Installing Excel
Importing data into R
Understanding data manipulation with R and Excel
Exporting the data to Excel
Understanding MongoDB
Installing MongoDB
Mapping SQL to MongoDB
Mapping SQL to MongoQL
Installing rmongodb
Importing the data into R
Understanding data manipulation
Understanding SQLite
Understanding features of SQLite
Installing SQLite
Installing RSQLite
Importing the data into R
Understanding data manipulation
Understanding PostgreSQL
Understanding features of PostgreSQL
Installing PostgreSQL
Installing RPostgreSQL
Exporting the data from R
Understanding Hive
Understanding features of Hive
Installing Hive
Setting up Hive configurations
Installing RHive
Understanding RHive operations
Understanding HBase
Understanding HBase features
Installing HBase
Installing thrift
Installing RHBase
Importing the data into R
Understanding data manipulation
Summary
A. References
R + Hadoop help materials
R groups
Hadoop groups
R + Hadoop groups
Popular R contributors
Popular Hadoop contributors
Index