Apache Spark 2.x Cookbook
by Rishi Yadav
Table of Contents
www.PacktPub.com
Preface
What this book covers
What you need for this book
Who this book is for
Sections
Getting ready
How to do it...
How it works...
There's more...
See also
Conventions
Reader feedback
Customer support
Downloading the color images of this book
Errata
Piracy
Questions
Getting Started with Apache Spark
Introduction
Leveraging Databricks Cloud
How to do it...
How it works...
Cluster
Notebook
Table
Library
Deploying Spark using Amazon EMR
What it represents is much bigger than it looks
EMR's architecture
How to do it...
How it works...
EC2 instance types
T2 - Free Tier Burstable (EBS only)
M4 - General purpose (EBS only)
C4 - Compute optimized
X1 - Memory optimized
R4 - Memory optimized
P2 - General purpose GPU
I3 - Storage optimized
D2 - Storage optimized
Installing Spark from binaries
Getting ready
How to do it...
Building the Spark source code with Maven
Getting ready
How to do it...
Launching Spark on Amazon EC2
Getting ready
How to do it...
See also
Deploying Spark on a cluster in standalone mode
Getting ready
How to do it...
How it works...
See also
Deploying Spark on a cluster with Mesos
How to do it...
Deploying Spark on a cluster with YARN
Getting ready
How to do it...
How it works...
Understanding SparkContext and SparkSession
SparkContext
SparkSession
Understanding the resilient distributed dataset (RDD)
How to do it...
Developing Applications with Spark
Introduction
Exploring the Spark shell
How to do it...
There's more...
Developing Spark applications in Eclipse with Maven
Getting ready
How to do it...
Developing Spark applications in Eclipse with SBT
How to do it...
Developing a Spark application in IntelliJ IDEA with Maven
How to do it...
Developing a Spark application in IntelliJ IDEA with SBT
How to do it...
Developing applications using the Zeppelin notebook
How to do it...
Setting up Kerberos to do authentication
How to do it...
There's more...
Enabling Kerberos authentication for Spark
How to do it...
There's more...
Securing data at rest
Securing data in transit
Spark SQL
Understanding the evolution of schema awareness
Getting ready
DataFrames
Datasets
Schema-aware file formats
Understanding the Catalyst optimizer
Analysis
Logical plan optimization
Physical planning
Code generation
Inferring schema using case classes
How to do it...
There's more...
Programmatically specifying the schema
How to do it...
How it works...
Understanding the Parquet format
How to do it...
How it works...
Partitioning
Predicate pushdown
Parquet Hive interoperability
Loading and saving data using the JSON format
How to do it...
How it works...
Loading and saving data from relational databases
Getting ready
How to do it...
Loading and saving data from an arbitrary source
How to do it...
There's more...
Understanding joins
Getting ready
How to do it...
How it works...
Shuffle hash join
Broadcast hash join
The Cartesian join
There's more...
Analyzing nested structures
Getting ready
How to do it...
Working with External Data Sources
Introduction
Loading data from the local filesystem
How to do it...
Loading data from HDFS
How to do it...
Loading data from Amazon S3
How to do it...
Loading data from Apache Cassandra
How to do it...
How it works...
CAP Theorem
Cassandra partitions
Consistency levels
Spark Streaming
Introduction
Classic Spark Streaming
Structured Streaming
WordCount using Structured Streaming
How to do it...
Taking a closer look at Structured Streaming
How to do it...
There's more...
Streaming Twitter data
How to do it...
Streaming using Kafka
Getting ready
How to do it...
Understanding streaming challenges
Late arriving/out-of-order data
Maintaining the state in between batches
Message delivery reliability
Streaming is not an island
Getting Started with Machine Learning
Introduction
Creating vectors
Getting ready
How to do it...
How it works...
Calculating correlation
Getting ready
How to do it...
Understanding feature engineering
Feature selection
Quality of features
Number of features
Feature scaling
Feature extraction
TF-IDF
Term frequency
Inverse document frequency
How to do it...
Understanding Spark ML
Getting ready
How to do it...
Understanding hyperparameter tuning
How to do it...
Supervised Learning with MLlib — Regression
Introduction
Using linear regression
Getting ready
How to do it...
There's more...
Understanding the cost function
There's more...
Doing linear regression with lasso
Bias versus variance
How to do it...
Doing ridge regression
Supervised Learning with MLlib — Classification
Introduction
Doing classification using logistic regression
Getting ready
How to do it...
There's more...
What is ROC?
Doing binary classification using SVM
Getting ready
How to do it...
Doing classification using decision trees
Getting ready
How to do it...
How it works...
There's more...
Doing classification using random forest
Getting ready
How to do it...
Doing classification using gradient boosted trees
Getting ready
How to do it...
Doing classification with Naïve Bayes
Getting ready
How to do it...
Unsupervised Learning
Introduction
Clustering using k-means
Getting ready
How to do it...
Dimensionality reduction with principal component analysis
Getting ready
How to do it...
Dimensionality reduction with singular value decomposition
Getting ready
How to do it...
Recommendations Using Collaborative Filtering
Introduction
Collaborative filtering using explicit feedback
Getting ready
How to do it...
Adding my recommendations and then testing predictions
There's more...
Collaborative filtering using implicit feedback
How to do it...
Graph Processing Using GraphX and GraphFrames
Introduction
Fundamental operations on graphs
Getting ready
How to do it...
Using PageRank
Getting ready
How to do it...
Finding connected components
Getting ready
How to do it...
Performing neighborhood aggregation
Getting ready
How to do it...
Understanding GraphFrames
How to do it...
Optimizations and Performance Tuning
Optimizing memory
How to do it...
How it works...
Garbage collection
Mark and sweep
G1
Spark memory allocation
Leveraging speculation
How to do it...
Optimizing joins
How to do it...
Using compression to improve performance
How to do it...
Using serialization to improve performance
How to do it...
There's more...
Optimizing the level of parallelism
How to do it...
Understanding project Tungsten
How to do it...
How it works...
Tungsten phase 1
Bypassing GC
Cache-conscious computation
Code generation for expression evaluation
Tungsten phase 2
Whole-stage code generation
In-memory columnar format
Fundamental operations on graphs
In this recipe, we will learn how to create graphs and do basic operations on them.
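As a rough orientation before the steps, here is a minimal GraphX sketch that builds a small property graph and runs a few basic operations on it: counting vertices and edges, filtering edges by property, reading triplets, computing in-degrees, transforming vertex properties, and taking a subgraph. It is not the recipe's own code. The city names, vertex IDs, and edge distances are hypothetical sample data, and it assumes Spark 2.x with the `spark-graphx` module on the classpath (spark-shell includes it).

```scala
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

// Reuses the existing session inside spark-shell; creates a local one otherwise
val spark = SparkSession.builder()
  .appName("GraphBasics")
  .master("local[*]")
  .getOrCreate()
val sc = spark.sparkContext

// Vertices are (VertexId, property) pairs; the city names are hypothetical sample data
val vertices: RDD[(VertexId, String)] = sc.parallelize(Seq(
  (1L, "Santa Clara"),
  (2L, "Fremont"),
  (3L, "San Francisco")
))

// Directed edges carry a property too; these distances are illustrative, not measured
val edges: RDD[Edge[Int]] = sc.parallelize(Seq(
  Edge(1L, 2L, 20),
  Edge(2L, 3L, 44),
  Edge(3L, 1L, 53)
))

// "Unknown" is the default property for any vertex an edge references but the vertex RDD lacks
val graph: Graph[String, Int] = Graph(vertices, edges, "Unknown")

// Basic structural queries
val vertexCount = graph.numVertices
val edgeCount = graph.numEdges

// Filter edges by their property using the Edge case-class extractor
val longEdges = graph.edges.filter { case Edge(_, _, dist) => dist > 30 }.count()

// Triplets join each edge with the properties of its source and destination vertices
graph.triplets
  .map(t => s"${t.srcAttr} -> ${t.dstAttr}: ${t.attr}")
  .collect()
  .foreach(println)

// In-degree of each vertex (vertices with no incoming edges are omitted)
val inDegrees = graph.inDegrees.collect()

// Transform vertex properties without changing the graph structure
val upperCased = graph.mapVertices((_, name) => name.toUpperCase)

// Keep only edges shorter than 50; all vertices remain because no vertex predicate is given
val shortRoutes = graph.subgraph(epred = t => t.attr < 50)

println(s"vertices=$vertexCount edges=$edgeCount longEdges=$longEdges")
```

The snippet can be pasted into spark-shell as-is. Outside the shell, run it as a Scala script with `spark-core` and `spark-graphx` 2.x available, and call `spark.stop()` when finished.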