Spark for Python Developers
by Amit Nandi
Table of Contents
Credits
About the Author
Acknowledgment
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why subscribe?
Free access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. Setting Up a Spark Virtual Environment
Understanding the architecture of data-intensive applications
Infrastructure layer
Persistence layer
Integration layer
Analytics layer
Engagement layer
Understanding Spark
Spark libraries
PySpark in action
The Resilient Distributed Dataset
Understanding Anaconda
Setting up the Spark powered environment
Setting up an Oracle VirtualBox with Ubuntu
Installing Anaconda with Python 2.7
Installing Java 8
Installing Spark
Enabling IPython Notebook
Building our first app with PySpark
Virtualizing the environment with Vagrant
Moving to the cloud
Deploying apps in Amazon Web Services
Virtualizing the environment with Docker
Summary
2. Building Batch and Streaming Apps with Spark
Architecting data-intensive apps
Processing data at rest
Processing data in motion
Exploring data interactively
Connecting to social networks
Getting Twitter data
Getting GitHub data
Getting Meetup data
Analyzing the data
Discovering the anatomy of tweets
Exploring the GitHub world
Understanding the community through Meetup
Previewing our app
Summary
3. Juggling Data with Spark
Revisiting the data-intensive app architecture
Serializing and deserializing data
Harvesting and storing data
Persisting data in CSV
Persisting data in JSON
Setting up MongoDB
Installing the MongoDB server and client
Running the MongoDB server
Running the Mongo client
Installing the PyMongo driver
Creating the Python client for MongoDB
Harvesting data from Twitter
Exploring data using Blaze
Transferring data using Odo
Exploring data using Spark SQL
Understanding Spark dataframes
Understanding the Spark SQL query optimizer
Loading and processing CSV files with Spark SQL
Querying MongoDB from Spark SQL
Summary
4. Learning from Data Using Spark
Contextualizing Spark MLlib in the app architecture
Classifying Spark MLlib algorithms
Supervised and unsupervised learning
Additional learning algorithms
Spark MLlib data types
Machine learning workflows and data flows
Supervised machine learning workflows
Unsupervised machine learning workflows
Clustering the Twitter dataset
Applying Scikit-Learn on the Twitter dataset
Preprocessing the dataset
Running the clustering algorithm
Evaluating the model and the results
Building machine learning pipelines
Summary
5. Streaming Live Data with Spark
Laying the foundations of streaming architecture
Spark Streaming inner working
Going under the hood of Spark Streaming
Building in fault tolerance
Processing live data with TCP sockets
Setting up TCP sockets
Processing live data
Manipulating Twitter data in real time
Processing Tweets in real time from the Twitter firehose
Building a reliable and scalable streaming app
Setting up Kafka
Installing and testing Kafka
Developing producers
Developing consumers
Developing a Spark Streaming consumer for Kafka
Exploring Flume
Developing data pipelines with Flume, Kafka, and Spark
Closing remarks on the Lambda and Kappa architecture
Understanding Lambda architecture
Understanding Kappa architecture
Summary
6. Visualizing Insights and Trends
Revisiting the data-intensive apps architecture
Preprocessing the data for visualization
Gauging words, moods, and memes at a glance
Setting up wordcloud
Creating wordclouds
Geo-locating tweets and mapping meetups
Geo-locating tweets
Displaying upcoming meetups on Google Maps
Summary
Index