Index
A
B
C
- caching / Persistence and caching
- catalog
- Catalyst / Architecture of Spark SQL
- checkpointing
- Classic MapReduce
- classification
- Cloudera Distribution for Hadoop (CDH)
- clustering algorithms
- cluster resource managers
- collaborative filtering
- column pruning / Working with ORC
- common Dataset/DataFrame operations
- components, Spark SQL
- SQL / Introducing SQL, Datasources, DataFrame, and Dataset APIs
- Data Sources API / Introducing SQL, Datasources, DataFrame, and Dataset APIs
- DataFrame API / Introducing SQL, Datasources, DataFrame, and Dataset APIs
- Dataset API / Introducing SQL, Datasources, DataFrame, and Dataset APIs
- compression formats
- configuration parameters, for submitting applications
- connected components
- content-based filtering
- continuous bag of words
- CSV
- custom sources
D
- DAG (Directed Acyclic Graph) / Lineage Graph
- data
- dataflows
- DataFrame
- DataFrame API
- DataFrames
- about / Spark's stack
- evolution / Evolution of DataFrames and Datasets
- using, scenarios / When to use RDDs, Datasets, and DataFrames?
- creating / Creating DataFrames
- creating, from structured data files / Creating DataFrames from structured data files
- creating, from RDDs / Creating DataFrames from RDDs
- creating, from Hive tables / Creating DataFrames from tables in Hive
- creating, from external databases / Creating DataFrames from external databases
- converting, to RDDs / Converting DataFrames to RDDs
- converting, to Datasets / Converting a DataFrame to a Dataset
- creating, for recommendation system with MLlib / Exploring the data with DataFrames
- DataFrames, benefits
- data locality / Data locality
- Dataset API
- Datasets
- Data Sources API
- data types, Spark MLlib
- Decision Trees
- dense vector
- Dimensionality Reduction
- direct approach, Kafka
- Directed Acyclic Graph (DAG) / RDD Transformations versus Dataset and DataFrames Transformations, Optimization
- Discretized Stream
- distributed matrix
- Domain Specific Language (DSL) / History of Spark SQL, Common Dataset/DataFrame operations
- Domain Specific Language (DSL) functions
- driver failures
- DStream
E
F
- fault-tolerance, Spark Streaming
- feature extraction and transformation
- file formats
- flight data
G
- Gradient-boosted Trees
- graph
- graph databases
- GraphFrames
- graph processing
- graph processing systems
- graph transformation
- GraphX
- GraphX, algorithms
- GraphX operations
- groupEdges operator
H
- H2O
- H2O Flow
- Hadoop
- Hadoop Distributed File System (HDFS)
- Hadoop Distributed File System (HDFS), features
- Hadoop file formats
- Hadoop plus Spark clusters
- Hadoop User Experience (Hue)
- HBase
- Hive
- Hivemall
- Hivemall for Spark
- Hive on Spark project / Hive on Spark
- Hive query language (HiveQL)
- Hive tables
- Hortonworks DataFlow (HDF)
- Hortonworks Data Platform (HDP)
- Hortonworks Data Platform (HDP) Sandbox
- Hue Notebooks
I
- Idempotent updates
- implicit feedback
- input sources
- integrated development environment (IDE)
- interactive session
- Internet of Things (IoT)
- interpreter binding
- Inverse Document Frequency (IDF)
- IPython kernel
- item-based collaborative filtering
J
- Java Management Extensions (JMX)
- Java serialization / Serialization
- JDBC
- join operation
- JSON
- Jupyter
K
- k-means model
- Kafka
- Kerberos Security Enabled Spark Cluster
- Kinesis Client Library (KCL)
- Kryo serialization / Serialization
L
- Latent Dirichlet Allocation (LDA)
- lazy evaluation / Lazy evaluation
- Lineage Graph / Lineage Graph
- Livy REST job server
- local DataFrame
- logistic regression
M
- machine learning
- machine learning algorithms
- Machine Learning Library (MLlib)
- machine learning pipelines
- Mahout
- Mahout shell
- MapR
- MapR Control System (MCS) / Working with HDP, MapR, and Spark pre-built packages
- MapReduce (MR)
- MapReduce (MR), features
- MapReduce v1
- MapR Sandbox
- mapWithState operation
- Markdown
- mask operator
- Mesos
- Message Passing Interface (MPI)
- metadata
- MLlib
- modes, for running Spark
- motif finding algorithm
- MR job
N
O
- Online Analytical Processing (OLAP) / Tools and techniques
- optimization algorithms
- Optimized Row Columnar (ORC)
- output operations
- output stores
P
R
- R
- Random Forests
- RDD actions
- RDD operations
- RDDs
- RDD transformations
- Read-Eval-Print Loop (REPL)
- real-life use cases
- real-time processing
- receiver-based approach, Kafka
- receivers
- recommendation system, with MLlib
- building / A recommendation system with MLlib
- environment, preparing / Preparing the environment
- RDDs, creating / Creating RDDs
- data, exploring with DataFrames / Exploring the data with DataFrames
- testing dataset, creating / Creating training and testing datasets
- training dataset, creating / Creating training and testing datasets
- model, creating / Creating a model
- predictions, creating / Making predictions
- model, evaluating with testing data / Evaluating the model with testing data
- model accuracy, checking / Checking the accuracy of the model
- explicit feedback, versus implicit feedback / Explicit versus implicit feedback
- recommendation systems
- recommender systems
- Record Columnar File (RCFile)
- regression
- Relational Database Management System (RDBMS) / Evolution of DataFrames and Datasets, Big Data analytics and the role of Hadoop and Spark
- reliable receiver
- REPL (read-eval-print loop) / Spark Shell
- Resilient Distributed Dataset (RDD) / MapReduce issues, Learning Spark core concepts
- ResourceManager
- REST API
- reverse operator
- R project
- RStudio
S
- Samsara
- scheduling modes, Mesos
- Schema-on-Read (SOR) approach / Big Data analytics and the role of Hadoop and Spark
- Schema-on-Write approach / Big Data analytics and the role of Hadoop and Spark
- SchemaRDD
- search tool
- sequence file
- serialization / Serialization
- shared variables / Shared variables
- Shark
- Singular Value Decomposition (SVD)
- skip-gram
- spam detection
- Spark
- Spark-on-HBase connector / DataFrame based Spark-on-HBase connector
- spark-sql CLI
- spark.mllib package
- spark.ml package
- Spark applications
- SparkConf
- Spark configuration
- Spark context
- Spark Core
- Spark daemons
- Sparkling Water
- Sparkling Water project
- SparkMagic
- Spark MLlib
- Spark packages
- Spark pre-built package
- Spark program
- SparkR
- Spark resource managers
- SparkR shell
- Spark Scala shell
- SparkSession
- Spark shell
- Spark SQL
- Spark SQL Thrift Server
- Spark Streaming
- SparkSubmit
- sparse vector
- SQL
- standalone mode, Spark cluster resource managers / Standalone
- standalone resource manager
- standard compression formats
- stateful stream processing
- stateless stream processing
- storage levels, Spark
- storage options, Apache Hadoop
- Streaming DataFrames
- Streaming Datasets
- StreamingListener API
- structured data files
- Structured Streaming
- subgraph operator
- supervised learning
T
- Tachyon
- term frequency (TF)
- terminologies, Spark
- test data
- text files
- Thrift
- training data
- Transactional updates
- transformations
- transformations, Spark Streaming
- transform operation
- triangle counting
- Tungsten
U
- union operation
- universal recommendation system
- unreliable receiver
- unsupervised learning
- updateStateByKey operation
- user-based collaborative filtering
- User Defined Functions (UDFs)
- User Defined Table Functions (UDTFs)
V
- VertexRDD operations
- virtual machine (VM)
W
- web-based notebooks
- window operations
- write-ahead logs (WAL)
- Write Once, Read Many (WORM)
X
Y
- YARN
- YARN settings
- Yet Another Resource Negotiator (YARN)
Z
- Zeppelin
- ZeppelinHub Viewer
- Zeppelin notebooks
- ZooKeeper