Index

A

Amazon Web Services (AWS)
- apps, deploying with / Deploying apps in Amazon Web Services
- about / Deploying apps in Amazon Web Services
Anaconda
- defining / Understanding Anaconda
Anaconda Installer
- URL / Installing Anaconda with Python 2.7
Anaconda stack
- Anaconda / Understanding Anaconda
- Conda / Understanding Anaconda
- Numba / Understanding Anaconda
- Blaze / Understanding Anaconda
- Bokeh / Understanding Anaconda
- Wakari / Understanding Anaconda
analytics layer / Analytics layer
Apache Kafka
- about / Setting up Kafka
- properties / Setting up Kafka
Apache Spark
- about / Displaying upcoming meetups on Google Maps
APIs (Application Programming Interfaces)
- about / Connecting to social networks
app
- previewing / Previewing our app
apps
- deploying, with Amazon Web Services (AWS) / Deploying apps in Amazon Web Services
architecture, data-intensive applications
- about / Understanding the architecture of data-intensive applications
- infrastructure layer / Infrastructure layer
- persistence layer / Persistence layer
- integration layer / Integration layer
- analytics layer / Analytics layer
- engagement layer / Engagement layer
Asynchronous JavaScript and XML (AJAX)
- about / Processing live data with TCP sockets
AWS console
- URL / Deploying apps in Amazon Web Services

B

Big Data, with Apache Spark
- references / Virtualizing the environment with Vagrant
Blaze
- used, for exploring data / Exploring data using Blaze
BSON (Binary JSON)
- about / Setting up MongoDB

C

Catalyst
- about / Exploring data using Spark SQL
Chef
- about / Infrastructure layer
Clustering
- K-Means / Supervised and unsupervised learning
- Gaussian Mixture / Supervised and unsupervised learning
- Power Iteration Clustering (PIC) / Supervised and unsupervised learning
- Latent Dirichlet Allocation (LDA) / Supervised and unsupervised learning
Cluster manager
- about / The Resilient Distributed Dataset
comma-separated values (CSV)
- about / Harvesting and storing data
Continuum
- URL / Understanding Anaconda
Couchbase
- about / Persistence layer

D

D3.js
- about / Revisiting the data-intensive apps architecture
- URL / Revisiting the data-intensive apps architecture
DAG (Directed Acyclic Graph)
- about / The Resilient Distributed Dataset, Serializing and deserializing data
data
- serializing / Serializing and deserializing data
- deserializing / Serializing and deserializing data
- harvesting / Harvesting and storing data
- storing / Harvesting and storing data
- persisting, in CSV / Persisting data in CSV
- persisting, in JSON / Persisting data in JSON
- MongoDB, setting up / Setting up MongoDB
- harvesting, from Twitter / Harvesting data from Twitter
- exploring, Blaze used / Exploring data using Blaze
- transferring, Odo used / Transferring data using Odo
- exploring, Spark SQL used / Exploring data using Spark SQL
- pre-processing, for visualization / Preprocessing the data for visualization
data-intensive apps
- architecting / Architecting data-intensive apps
- latency / Architecting data-intensive apps
- scalability / Architecting data-intensive apps
- fault tolerance / Architecting data-intensive apps
- flexibility / Architecting data-intensive apps
- data at rest, processing / Processing data at rest
- data in motion, processing / Processing data in motion
- data, exploring / Exploring data interactively
data-intensive apps architecture
- about / Revisiting the data-intensive apps architecture
data analysis
- defining / Analyzing the data
- Tweets anatomy, discovering / Discovering the anatomy of tweets
Data Driven Documents (D3)
- about / Revisiting the data-intensive apps architecture
data flows
- about / Machine learning workflows and data flows
data intensive apps architecture
- defining / Revisiting the data-intensive app architecture
data lifecycle
- Connect / Integration layer
- Correct / Integration layer
- Collect / Integration layer
- Compose / Integration layer
- Consume / Integration layer
- Control / Integration layer
Data Science London
- about / Displaying upcoming meetups on Google Maps
data types, Spark MLlib
- local vector / Spark MLlib data types
- labeled point / Spark MLlib data types
- local matrix / Spark MLlib data types
- distributed matrix / Spark MLlib data types
Decision Trees
- about / Supervised and unsupervised learning
Dimensionality Reduction
- Singular Value Decomposition (SVD) / Supervised and unsupervised learning
- Principal Component Analysis (PCA) / Supervised and unsupervised learning
Docker
- about / Infrastructure layer
- environment, virtualizing with / Virtualizing the environment with Docker
- references / Virtualizing the environment with Docker
DStream (Discretized Stream)
- defining / Going under the hood of Spark Streaming

E

elements, Flume
- Event / Exploring flume
- Client / Exploring flume
- Source / Exploring flume
- Sink / Exploring flume
- Channel / Exploring flume
engagement layer / Engagement layer
Ensembles of trees
- about / Supervised and unsupervised learning
environment
- virtualizing, with Vagrant / Virtualizing the environment with Vagrant
- virtualizing, with Docker / Virtualizing the environment with Docker

F

First App
- building, with PySpark / Building our first app with PySpark
Flume
- about / Exploring flume
- advantages / Exploring flume
- elements / Exploring flume

G

ggplot
- about / Revisiting the data-intensive apps architecture
- URL / Revisiting the data-intensive apps architecture
GitHub
- URL / Getting GitHub data
- about / Exploring the GitHub world
- operating, with Meetup API / Understanding the community through Meetup
Google Maps
- upcoming meetups, displaying on / Displaying upcoming meetups on Google Maps

H

Hadoop MongoDB connector
- URL / Querying MongoDB from Spark SQL
HBase and Cassandra
- about / Persistence layer
HDFS (Hadoop Distributed File System)
- about / Understanding Spark

I

infrastructure layer / Infrastructure layer
Ingest Mode
- Batch Data Transport / Building a reliable and scalable streaming app
- Micro Batch / Building a reliable and scalable streaming app
- Pipelining / Building a reliable and scalable streaming app
- Message Queue / Building a reliable and scalable streaming app
integration layer / Integration layer

J

Java 8
- installing / Installing Java 8
JRE (Java Runtime Environment)
- about / Installing Java 8
JSON (JavaScript Object Notation)
- about / Connecting to social networks, Harvesting and storing data

K

Kafka
- setting up / Setting up Kafka
- installing / Installing and testing Kafka
- testing / Installing and testing Kafka
- URL / Installing and testing Kafka
- producers, developing / Developing producers
- consumers, developing / Developing consumers
- Spark Streaming consumer, developing for / Developing a Spark Streaming consumer for Kafka
Kappa architecture
- defining / Closing remarks on the Lambda and Kappa architecture, Understanding Kappa architecture

L

Lambda architecture
- defining / Closing remarks on the Lambda and Kappa architecture, Understanding Lambda architecture
Linear Regression Models
- about / Supervised and unsupervised learning

M

Machine Learning
- about / Displaying upcoming meetups on Google Maps
machine learning pipelines
- building / Building machine learning pipelines
machine learning workflows
- about / Machine learning workflows and data flows
Massive Open Online Courses (MOOCs)
- about / Virtualizing the environment with Vagrant
Matplotlib
- about / Revisiting the data-intensive apps architecture
- URL / Revisiting the data-intensive apps architecture
Meetup API
- URL / Getting Meetup data
meetups
- mapping / Geo-locating tweets and mapping meetups
MLlib algorithms
- Collaborative filtering / Additional learning algorithms
- feature extraction and transformation / Additional learning algorithms
- optimization / Additional learning algorithms
- Limited-memory BFGS (L-BFGS) / Additional learning algorithms
models
- defining, for processing streams of data / Laying the foundations of streaming architecture
MongoDB
- about / Persistence layer
- setting up / Setting up MongoDB
- server and client, installing / Installing the MongoDB server and client
- server, running / Running the MongoDB server
- Mongo client, running / Running the Mongo client
- PyMongo driver, installing / Installing the PyMongo driver
- Python client, creating for / Creating the Python client for MongoDB
- references / Querying MongoDB from Spark SQL
MongoDB, from Spark SQL
- URL / Querying MongoDB from Spark SQL
Multi-Dimensional Scaling (MDS) algorithm
- about / Applying Scikit-Learn on the Twitter dataset
Mumrah, on GitHub
- URL / Installing and testing Kafka
MySQL
- about / Persistence layer

N

Naive Bayes
- about / Supervised and unsupervised learning
Neo4j
- about / Persistence layer
network_wordcount.py
- URL / Processing live data

O

Odo
- about / Transferring data using Odo
- used, for transferring data / Transferring data using Odo
operations, on RDDs
- transformations / The Resilient Distributed Dataset
- action / The Resilient Distributed Dataset

P

persistence layer / Persistence layer
PIL (Python Imaging Library)
- about / Setting up wordcloud
PostgreSQL
- about / Persistence layer
Puppet
- about / Infrastructure layer
PySpark
- First App, building with / Building our first app with PySpark

R

RDD (Resilient Distributed Dataset)
- about / The Resilient Distributed Dataset
Resilient Distributed Datasets (RDD)
- about / Spark Streaming inner working
REST (Representational State Transfer)
- about / Connecting to social networks
RPC (Remote Procedure Call)
- about / Laying the foundations of streaming architecture

S

SDK (Software Development Kit)
- about / Installing Java 8
Seaborn
- about / Revisiting the data-intensive apps architecture
- URL / Revisiting the data-intensive apps architecture
social networks
- connecting to / Connecting to social networks
- Twitter data, obtaining / Getting Twitter data
- GitHub data, obtaining / Getting GitHub data
- Meetup data, obtaining / Getting Meetup data
Spark
- defining / Understanding Spark
- Batch / Understanding Spark
- Streaming / Understanding Spark
- Iterative / Understanding Spark
- Interactive / Understanding Spark
- libraries / Spark libraries
- URL / Installing Spark
- Clustering / Supervised and unsupervised learning
- Dimensionality Reduction / Supervised and unsupervised learning
- Regression and Classification / Supervised and unsupervised learning
- Isotonic Regression / Supervised and unsupervised learning
- MLlib algorithms / Additional learning algorithms
Spark, on EC2
- URL / Deploying apps in Amazon Web Services
SparkContext
- about / Spark Streaming inner working
Spark dataframes
- defining / Understanding Spark dataframes
Spark libraries
- SparkSQL / Spark libraries
- Spark MLlib / Spark libraries
- Spark Streaming / Spark libraries
- Spark GraphX / Spark libraries
- PySpark, defining / PySpark in action
- RDD (Resilient Distributed Dataset) / The Resilient Distributed Dataset
Spark MLlib
- contextualizing, in app architecture / Contextualizing Spark MLlib in the app architecture
- data types / Spark MLlib data types
Spark MLlib algorithms
- classifying / Classifying Spark MLlib algorithms
- supervised learning / Supervised and unsupervised learning
- unsupervised learning / Supervised and unsupervised learning
- additional learning algorithms / Additional learning algorithms
Spark Powered Environment
- setting up / Setting up the Spark powered environment
- Oracle VirtualBox, setting up with Ubuntu / Setting up an Oracle VirtualBox with Ubuntu
- Anaconda, installing with Python 2.7 / Installing Anaconda with Python 2.7
- Java 8, installing / Installing Java 8
- Spark, installing / Installing Spark
- IPython Notebook, enabling / Enabling IPython Notebook
Spark SQL
- used, for exploring data / Exploring data using Spark SQL
- about / Exploring data using Spark SQL
- CSV files, loading with / Loading and processing CSV files with Spark SQL
- CSV files, processing with / Loading and processing CSV files with Spark SQL
- MongoDB, querying from / Querying MongoDB from Spark SQL
SparkSQL module
- about / Analytics layer
Spark SQL query optimizer
- defining / Understanding the Spark SQL query optimizer
Spark streaming
- defining / Spark Streaming inner working, Going under the hood of Spark Streaming
- fault tolerance, building in / Building in fault tolerance
Stochastic Gradient Descent
- about / Classifying Spark MLlib algorithms
streaming app
- building / Building a reliable and scalable streaming app
- Kafka, setting up / Setting up Kafka
- flume, exploring / Exploring flume
- data pipelines, developing with Flume / Developing data pipelines with Flume, Kafka, and Spark
- data pipelines, developing with Kafka / Developing data pipelines with Flume, Kafka, and Spark
- data pipelines, developing with Spark / Developing data pipelines with Flume, Kafka, and Spark
streaming architecture
- about / Laying the foundations of streaming architecture
StreamingContext
- about / Spark Streaming inner working
supervised machine learning workflow
- about / Supervised machine learning workflows

T

TCP Sockets
- live data, processing with / Processing live data with TCP sockets, Processing live data
- setting up / Setting up TCP sockets
TF-IDF (Term Frequency - Inverse Document Frequency)
- about / Classifying Spark MLlib algorithms
Trident
- about / Laying the foundations of streaming architecture
tweets
- geo-locating / Geo-locating tweets and mapping meetups, Geo-locating tweets
Twitter
- URL / Getting Twitter data
Twitter API, on dev console
- URL / Getting Twitter data
Twitter data
- manipulating / Manipulating Twitter data in real time
- tweets, processing from Twitter firehose / Processing Tweets in real time from the Twitter firehose
Twitter dataset
- clustering / Clustering the Twitter dataset
- Scikit-Learn, applying on / Applying Scikit-Learn on the Twitter dataset
- dataset, preprocessing / Preprocessing the dataset
- clustering algorithm, running / Running the clustering algorithm
- model and results, evaluating / Evaluating the model and the results

U

Ubuntu 14.04.1 LTS release
- URL / Setting up an Oracle VirtualBox with Ubuntu
unified log
- properties / Understanding Kappa architecture
Unified Log
- properties / Building a reliable and scalable streaming app
unsupervised machine learning workflow
- about / Unsupervised machine learning workflows

V

Vagrant
- about / Infrastructure layer
- environment, virtualizing with / Virtualizing the environment with Vagrant
- reference / Virtualizing the environment with Vagrant
VirtualBox VM
- URL / Setting up an Oracle VirtualBox with Ubuntu
visualization
- data, pre-processing for / Preprocessing the data for visualization

W

wordclouds
- creating / Gauging words, moods, and memes at a glance, Creating wordclouds
- setting up / Setting up wordcloud
- URL / Setting up wordcloud
