Index
A
- Amazon Web Services (AWS)
- Anaconda
- Anaconda Installer
- Anaconda stack
- analytics layer / Analytics layer
- Apache Kafka
- Apache Spark
- APIs (Application Programming Interfaces)
- app
- apps
- architecture, data-intensive applications
- Asynchronous JavaScript and XML (AJAX)
- AWS console
B
- Big Data, with Apache Spark
- Blaze
- BSON (Binary JSON)
C
- Catalyst
- Chef
- Clustering
- Cluster manager
- comma-separated values (CSV)
- Continuum
- Couchbase
D
- D3.js
- DAG (Directed Acyclic Graph)
- data
- data-intensive apps
- data-intensive apps architecture
- data analysis
- Data Driven Documents (D3)
- data flows
- data intensive apps architecture
- data lifecycle
- Data Science London
- data types, Spark MLlib
- Decision Trees
- Dimensionality Reduction
- Docker
- DStream (Discretized Stream)
E
- elements, Flume
- engagement layer / Engagement layer
- Ensembles of trees
- environment
F
G
- ggplot
- GitHub
- Google Maps
H
- Hadoop MongoDB connector
- HBase and Cassandra
- HDFS (Hadoop Distributed File System)
I
J
- Java 8
- JRE (Java Runtime Environment)
- JSON (JavaScript Object Notation)
K
L
- Lambda architecture
- Linear Regression Models
M
- Machine Learning
- machine learning pipelines
- machine learning workflows
- Massive Open Online Courses (MOOCs)
- Matplotlib
- Meetup API
- meetups
- MLlib algorithms
- models
- MongoDB
- MongoDB, from Spark SQL
- Multi-Dimensional Scaling (MDS) algorithm
- Mumrah, on GitHub
- MySQL
N
- Naive Bayes
- Neo4j
- network_wordcount.py
O
P
- persistence layer / Persistence layer
- PIL (Python Imaging Library)
- PostgreSQL
- Puppet
- PySpark
R
- RDD (Resilient Distributed Dataset)
- Resilient Distributed Datasets (RDD)
- REST (Representational State Transfer)
- RPC (Remote Procedure Call)
S
- SDK (Software Development Kit)
- Seaborn
- social networks
- Spark
- Spark, on EC2
- SparkContext
- Spark dataframes
- Spark libraries
- Spark MLlib
- Spark MLlib algorithms
- Spark Powered Environment
- Spark SQL
- SparkSQL module
- Spark SQL query optimizer
- Spark streaming
- Stochastic Gradient Descent
- streaming app
  - building / Building a reliable and scalable streaming app
  - Kafka, setting up / Setting up Kafka
  - Flume, exploring / Exploring Flume
  - data pipelines, developing with Flume / Developing data pipelines with Flume, Kafka, and Spark
  - data pipelines, developing with Kafka / Developing data pipelines with Flume, Kafka, and Spark
  - data pipelines, developing with Spark / Developing data pipelines with Flume, Kafka, and Spark
- streaming architecture
- StreamingContext
- supervised machine learning workflow
T
- TCP Sockets
- TF-IDF (Term Frequency - Inverse Document Frequency)
- Trident
- tweets
- Twitter
- Twitter API, on dev console
- Twitter data
- Twitter dataset
U
- Ubuntu 14.04.1 LTS release
- unified log
- unsupervised machine learning workflow
V
- Vagrant
- VirtualBox VM
- visualization
W