Index

A

access keys
- about / Downloading data from a social network
accuracy
- improving, dictionary used / Improving accuracy using a dictionary
activation function
- about / Artificial neural networks
Adult dataset
- URL / Representing reality in models
Advertisements dataset
- URL / Feature creation
affinity analysis
- example / A simple affinity analysis example
- defining / What is affinity analysis?
- product recommendations / Product recommendations
- dataset, loading with NumPy / Loading the dataset with NumPy
- ranking of rules, implementing / Implementing a simple ranking of rules
- ranking, to find best rules / Ranking to find the best rules
- about / Affinity analysis
- algorithms / Algorithms for affinity analysis
- parameters, selecting / Choosing parameters
Amazon S3 console
- URL / Training on Amazon's EMR infrastructure
API endpoint
- URL / Using a Web API to get data
application
- defining / Application
- word counts, extracting / Extracting word counts
- dictionaries, converting to matrix / Converting dictionaries to a matrix
- Naive Bayes classifier, training / Training the Naive Bayes classifier
- about / Application, Application
- data, obtaining / Getting the data, Getting the data
- neural network, creating / Creating the neural network
- neural network, training with training dataset / Putting it all together
- Naive Bayes algorithm / Naive Bayes prediction
apps, Twitter account
- URL / Downloading data from a social network
Apriori algorithm / The Apriori algorithm
Apriori implementation
- about / The Apriori implementation
- Apriori algorithm / The Apriori algorithm
- defining / Implementation
arbitrary websites
- text, extracting from / Extracting text from arbitrary websites, Putting it all together
- stories, finding / Finding the stories in arbitrary websites
- data mining, using / Putting it all together
- nodes, ignoring / Putting it all together
- HTML file, parsing / Putting it all together
Artificial Neural Networks
- about / Artificial neural networks
association rules
- extracting / Extracting association rules
- evaluating / Evaluation
authorship analysis
- defining / Attributing documents to authors
- applications / Applications and use cases
- use cases / Applications and use cases
- about / Applications and use cases
- authorship attribution / Attributing authorship
- data, obtaining / Getting the data
authorship analysis, problems
- authorship profiling / Attributing documents to authors
- authorship verification / Attributing documents to authors
- authorship clustering / Attributing documents to authors
authorship attribution / Attributing authorship
AWS CLI
- installing / Training on Amazon's EMR infrastructure
AWS console
- URL / Running our code on a GPU

B

back propagation (backprop) algorithm / Back propagation
bagging
- about / Random forests
BatchIterator instance
- creating / Creating the neural network
Bayes' theorem / Bayes' theorem
- about / Bayes' theorem
- equation / Bayes' theorem
bias
- about / How do ensembles work?
big data
- about / Big data
- use cases / Application scenario and goals
Bleeding Edge code
- installing / Scalability with the nearest neighbor
- URL / Scalability with the nearest neighbor
blog posts
- extracting / Extracting the blog posts
blogs dataset
- about / Blogs dataset

C

CAPTCHA
- creating / Drawing basic CAPTCHAs
CAPTCHAs
- references / Better (worse?) CAPTCHAs
- defining / Better (worse?) CAPTCHAs
CART (Classification and Regression Trees)
- about / Decision trees
character n-grams
- about / Character n-grams
- extracting / Extracting character n-grams
CIFAR-10
- about / Application scenario and goals
- URL / Application scenario and goals
class
- about / A simple classification example
classification
- example / A simple classification example
- about / What is classification?
- examples / What is classification?
- dataset, loading / Loading and preparing the dataset
- dataset, preparing / Loading and preparing the dataset
- OneR algorithm, implementing / Implementing the OneR algorithm
- algorithm, testing / Testing the algorithm
classifiers
- comparing / Comparing classifiers
closed problem
- about / Attributing authorship
cluster evaluation
- URL / Evaluating the results
clustering
- about / Grouping news articles
coassociation matrix
- defining / Evidence accumulation
complex algorithms
- references / More complex algorithms
complex features
- references / More complex features
confidence
- about / Implementing a simple ranking of rules
- computing / Implementing a simple ranking of rules
connected components
- about / Connected components
Cosine distance
- about / Distance metrics
Coursera
- about / More resources
- references / More resources
Coval font I, Open Font Library
- URL / Drawing basic CAPTCHAs
CPU
- defining / When to use GPUs for computation
cross-fold validation framework
- defining / Running the algorithm
CSV (Comma Separated Values)
- about / Collecting the data

D

data, blogging
- URL / Getting the data
data, Corpus
- URL / Getting the data
DataFrame
- about / Using pandas to load the dataset
data mining
- defining / Introducing data mining
dataset
- loading / Loading the dataset, Loading the dataset, An introduction to Lasagne
- data, collecting / Collecting the data
- URL / Collecting the data
- loading, pandas used / Using pandas to load the dataset
- cleaning up / Cleaning up the dataset
- new features, extracting / Extracting new features
- classifying, with existing model / Classifying with an existing model
- follower information, obtaining from Twitter / Getting follower information from Twitter
- network, building / Building the network
- graph, creating / Creating a graph
- Similarity graph, creating / Creating a similarity graph
- creating / Creating the dataset
- CAPTCHAs, drawing / Drawing basic CAPTCHAs
- image, splitting into individual letters / Splitting the image into individual letters
- training dataset, creating / Creating a training dataset
- training dataset, adjusting to methodology / Adjusting our training dataset to our methodology
datasets
- about / Introducing data mining
- samples / Introducing data mining
- features / Introducing data mining
- example / Introducing data mining
- URL / Obtaining the dataset, Extending the IPython Notebook
- references / New datasets
decision tree implementation
- min_samples_split / Parameters in decision trees
- min_samples_leaf / Parameters in decision trees
decision trees
- about / Decision trees
- parameters / Parameters in decision trees
- Gini impurity / Parameters in decision trees
- Information gain / Parameters in decision trees
- using / Using decision trees
dictionary
- used, for improving accuracy / Improving accuracy using a dictionary
- ranking mechanisms, for words / Ranking mechanisms for words
- improved prediction function, testing / Putting it all together
DictVectorizer class
- about / Converting dictionaries to a matrix
disambiguation
- about / Disambiguation
- data, downloading from social network / Downloading data from a social network
- dataset, loading / Loading and classifying the dataset
- dataset, classifying / Loading and classifying the dataset
- replicable dataset, creating from Twitter / Creating a replicable dataset from Twitter
discretization
- about / Common feature patterns
discretization algorithm
- defining / Loading and preparing the dataset
documents
- attributing, to authors / Attributing documents to authors

E

EC2 service console
- URL / Running our code on a GPU
Eclat algorithm
- about / Algorithms for affinity analysis
- URL / The Eclat algorithm
- implementing / The Eclat algorithm
Elastic Map Reduce (EMR)
- about / Training on Amazon's EMR infrastructure
Enron dataset
- using / Using the Enron dataset
- accessing / Accessing the Enron dataset
- URL / Accessing the Enron dataset
- dataset loader, creating / Creating a dataset loader
- existing parameter space, using / Putting it all together
- classifier, using / Putting it all together
- evaluation / Evaluation
ensembles
- clustering / Clustering ensembles
- evidence accumulation / Evidence accumulation
- working / How it works
- implementing / Implementation
environment
- setting up / Setting up the environment
epochs
- about / Back propagation
Euclidean distance
- about / Distance metrics
evaluation, of clustering algorithms
- references / Evaluation
Evidence Accumulation Clustering (EAC)
- about / Evidence accumulation
- defining / Evidence accumulation
Excel, pandas
- URL / More on pandas

F

f1-score
- about / Evaluation using the F1-score
- computing / Evaluation using the F1-score
- using / Evaluation using the F1-score
feature-based normalization
- about / Standard preprocessing
feature creation
- about / Feature creation
- Principal Component Analysis (PCA) / Principal Component Analysis
feature extraction
- about / Feature extraction
- reality, representing in models / Representing reality in models
- common feature patterns / Common feature patterns
- good features, creating / Creating good features
features, dataset
- URL / More complex pipelines
feature selection
- about / Feature selection
- best individual features, selecting / Selecting the best individual features
feed-forward neural network
- about / An introduction to neural networks
filename, data
- Blogger ID / Getting the data
- Gender / Getting the data
- Age / Getting the data
- Industry / Getting the data
- Star Sign / Getting the data
FP-growth algorithm
- about / Algorithms for affinity analysis
frequent itemsets
- about / Algorithms for affinity analysis
functions, transformer
- fit() / The transformer API
- transform() / The transformer API
function words
- about / Function words
- counting / Counting function words
- classifying with / Classifying with function words

G

GPU
- using, for computation / When to use GPUs for computation
- benefits / When to use GPUs for computation
- avenues, defining / When to use GPUs for computation
- code, running on / Running our code on a GPU
GPU optimization
- about / GPU optimization
graph
- creating / Creating a graph
gzip
- about / Accessing the Enron dataset

H

Hadoop
- about / Hadoop MapReduce
- Distributed File System (HDFS) / Hadoop MapReduce
- YARN / Hadoop MapReduce
- Pig / Hadoop MapReduce
- Hive / Hadoop MapReduce
- HBase / Hadoop MapReduce
- courses / Courses on Hadoop
Hadoop MapReduce
- about / Hadoop MapReduce
hash function
- about / Finding the stories in arbitrary websites
hidden layer
- about / An introduction to neural networks
- creating / An introduction to Lasagne
hierarchical clustering
- about / Evidence accumulation

I

image
- extracting / Application scenario and goals
image datasets
- URL / Mahotas
input layer
- about / An introduction to neural networks
installation instructions, scikit-learn
- URL / Installing scikit-learn
instructions, AWS CLI
- URL / Training on Amazon's EMR infrastructure
intra-cluster distance
- about / Optimizing criteria
Ionosphere
- about / Loading the dataset
- URL / Loading the dataset
Ionosphere Nearest Neighbor
- about / Loading the dataset
IPython
- installing / Installing IPython
- URL / Installing IPython
IPython Notebook
- creating / Downloading data from a social network
- URL / Extending the IPython Notebook
IPython notebook
- using / Using Python and the IPython Notebook
Iris Setosa / Loading and preparing the dataset
Iris Versicolour / Loading and preparing the dataset
Iris Virginica / Loading and preparing the dataset

J

Jaccard Similarity
- about / Creating a similarity graph
JQuery library
- about / Loading and classifying the dataset
JSON
- about / Loading and classifying the dataset
- and dataset, comparing / Loading and classifying the dataset

K

k-means algorithm
- about / The k-means algorithm
- assignment phase / The k-means algorithm
- updating phase / The k-means algorithm
Kaggle
- URL / More resources
- about / More resources
karma
- about / Reddit as a data source
Keras
- URL / Keras and Pylearn2
kernel
- about / Loading and classifying the dataset
kernel parameter
- about / Kernels
kernels / Kernels

L

Lasagne
- about / An introduction to Lasagne
- URL / An introduction to Lasagne
Levenshtein edit distance
- about / Ranking mechanisms for words
- computing / Ranking mechanisms for words
Locality-Sensitive Hashing (LSH)
- about / Scalability with the nearest neighbor
local n-grams
- references / Local n-grams
- about / Local n-grams
local optima
- about / Back propagation
log probabilities
- using / Putting it all together

M

machine-learning workflow
- training / Testing the algorithm
- testing / Testing the algorithm
Mahotas
- about / Mahotas
- references / Mahotas
Manhattan distance
- about / Distance metrics
MapReduce
- about / MapReduce
- defining / Intuition
- WordCount example / A word count example
- Hadoop MapReduce / Hadoop MapReduce
matplotlib
- URL / scikit-learn estimators
MD5 algorithm
- using / Finding the stories in arbitrary websites
metadata
- about / Disambiguation
MiniBatchKMeans
- about / Implementation
Minimum Spanning Tree (MST)
- about / Evidence accumulation
- computing / Evidence accumulation
movie recommendation problem
- about / The movie recommendation problem
- dataset, obtaining / Obtaining the dataset
- loading, with pandas / Loading with pandas
- sparse data formats / Sparse data formats
mrjob
- URL / Training on Amazon's EMR infrastructure
mrjob package / The mrjob package
multiple SVMs
- creating / Classifying with SVMs

N

n-gram
- about / Character n-grams
n-grams
- about / N-grams
- disadvantages / N-grams
- advantages / N-grams
Naive Bayes
- about / Naive Bayes
- Bayes' theorem / Bayes' theorem
- algorithm / Naive Bayes algorithm
- working / How it works
Naive Bayes algorithm
- mrjob package / The mrjob package
- blog posts, extracting / Extracting the blog posts
- Naive Bayes model, training / Training Naive Bayes
- classifier, running / Putting it all together
- Amazon's EMR infrastructure, training / Training on Amazon's EMR infrastructure
Naive Bayes model
- training / Training Naive Bayes
NaN (Not a Number)
- about / Feature creation
National Basketball Association (NBA)
- about / Loading the dataset
- URL / Collecting the data
Natural Language ToolKit (NLTK)
- about / Bag-of-words
nearest neighbor
- about / scikit-learn estimators
nearest neighbor algorithm
- URL / Scalability with the nearest neighbor
Nearest neighbors
- about / Nearest neighbors
network
- building / Building the network
networks
- defining / Deeper networks
NetworkX
- URL / Creating a similarity graph, NetworkX
- defining / NetworkX
NetworkX package
- about / Creating a graph
neural network
- training / Training and classifying
- classifying / Training and classifying
- back propagation (backprop) algorithm / Back propagation
- words, predicting / Predicting words
neural network layers, Lasagne
- network-in-network layers / An introduction to Lasagne
- dropout layers / An introduction to Lasagne
- noise layers / An introduction to Lasagne
Neural networks
- about / An introduction to neural networks
neural networks
- about / scikit-learn estimators, Artificial neural networks, Deep neural networks
- training / Deep neural networks
- defining / Intuition
- implementing / Implementation
- Theano, defining / An introduction to Theano
- Lasagne, defining / An introduction to Lasagne
- implementing, with nolearn / Implementing neural networks with nolearn
- URL / More resources
neurons
- about / Artificial neural networks
news articles
- obtaining / Obtaining news articles
- web API used, for obtaining data / Using a Web API to get data
- Reddit, as data source / Reddit as a data source
- data, obtaining / Getting the data
- clustering / Grouping news articles
- k-means algorithm / The k-means algorithm
- results, evaluating / Evaluating the results
- topic information, extracting from clusters / Extracting topic information from clusters
- clustering algorithms, using as transformers / Using clustering algorithms as transformers
NLTK
- references / Natural language processing and part-of-speech tagging
NLTK installation instructions
- URL / Application
noise
- adding / Adding noise
nolearn package
- neural networks, implementing with / Implementing neural networks with nolearn
nonprogrammers, for Python language
- URL / Installing Python
n_neighbors
- about / Setting parameters

O

object classification
- about / Object classification
one-versus-all classifier
- creating / Classifying with SVMs
OneR
- about / Implementing the OneR algorithm
online learning
- about / Online learning
- defining / An introduction to online learning
- implementing / Implementation
ordinal
- about / Common feature patterns
output layer
- about / An introduction to neural networks
overfitting
- about / Testing the algorithm

P

pagination
- about / Getting follower information from Twitter
pandas
- URL / Collecting the data, More on pandas
- references / More on pandas
pandas (Python Data Analysis)
- about / Collecting the data
pandas.read_csv function
- about / Cleaning up the dataset
pandas documentation
- URL / Engineering new features
parameters, ensemble process
- n_estimators / Parameters in Random forests
- oob_score / Parameters in Random forests
- n_jobs / Parameters in Random forests
petal length / Loading and preparing the dataset
petal width / Loading and preparing the dataset
pip
- about / Installing Python, Creating a graph
Pipeline
- creating / Putting it all together
pipeline
- creating / Application
- NLTKBOW transformer / Putting it all together
- DictVectorizer transformer / Putting it all together
- BernoulliNB classifier / Putting it all together
pipelines
- about / Pipelines
Pipelines
- URL / More complex pipelines
precision
- about / Evaluation using the F1-score
preprocessing, using pipelines
- about / Preprocessing using pipelines
- features / Preprocessing using pipelines
- features, of animal / Preprocessing using pipelines
- example / An example
- standard preprocessing / Standard preprocessing
- workflow, creating / Putting it all together
pricing alerts
- URL / Training on Amazon's EMR infrastructure
Principal Component Analysis (PCA)
- about / Principal Component Analysis
prior belief
- about / Bayes' theorem
probabilistic graphical models
- URL / More resources
probabilities
- computing / Putting it all together
programmers, for Python language
- URL / Installing Python
Project Gutenberg
- URL / Getting the data
Pydoop
- about / Pydoop
- URL / Pydoop
Pylearn2
- about / Keras and Pylearn2
- URL / Keras and Pylearn2
Python
- using / Using Python and the IPython Notebook
- installing / Installing Python
- URL / Installing Python
- defining / Disambiguation
Python 3.4
- about / Installing Python

Q

quotequail package
- about / Creating a dataset loader

R

RandomForestClassifier
- about / Parameters in Random forests
random forests
- about / scikit-learn estimators
- defining / Random forests
- ensembles, working / How do ensembles work?
- parameters / Parameters in Random forests
- applying / Applying Random forests
- new features, engineering / Engineering new features
README
- about / Extracting association rules
real-time clusterings
- about / Real-time clusterings
reasons, feature selection
- complexity, reducing / Feature selection
- noise, reducing / Feature selection
- readable models, creating / Feature selection
recall
- about / Evaluation using the F1-score
recommendation engine
- building / Recommendation engine
- URL / Recommendation engine
reddit
- about / Obtaining news articles, Using a Web API to get data
- references / Using a Web API to get data
Reddit
- about / Reddit as a data source
- URL / Reddit as a data source
regularization
- URL / Principal Component Analysis
reinforcement learning
- URL / Reinforcement learning
RESTful interface (Representational State Transfer)
- about / Using a Web API to get data
rules
- support / Implementing a simple ranking of rules
- confidence / Implementing a simple ranking of rules
- finding / Ranking to find the best rules

S

sample size
- increasing / Increasing the sample size
scikit-learn
- installing / Installing scikit-learn
- URL / Installing scikit-learn
scikit-learn estimators
- defining / scikit-learn estimators
- fit() / scikit-learn estimators
- predict() / scikit-learn estimators
- Nearest neighbors / Nearest neighbors
- distance metrics / Distance metrics
- dataset, loading / Loading the dataset
- standard workflow, defining / Moving towards a standard workflow
- fit() function / Moving towards a standard workflow
- predict() function / Moving towards a standard workflow
- algorithm, running / Running the algorithm
- parameters, setting / Setting parameters
scikit-learn package
- references / Evaluation
Scikit-learn tutorials
- URL / Scikit-learn tutorials
self-posts
- about / Reddit as a data source
sepal length / Loading and preparing the dataset
sepal width / Loading and preparing the dataset
shapes adding, CAPTCHAs
- URL / Better (worse?) CAPTCHAs
Silhouette Coefficient
- about / Optimizing criteria
- computing / Optimizing criteria
- parameters / Optimizing criteria
Similarity graph
- creating / Creating a similarity graph
SNAP
- URL / NetworkX
softmax nonlinearity
- about / An introduction to Lasagne
Spam detection
- references / Spam detection
spam filter
- about / Evaluation using the F1-score
sparse matrix
- about / Distance metrics
sparse matrix format
- about / Sparse data formats
sports outcome prediction
- about / Sports outcome prediction
- features / Sports outcome prediction
stacking
- about / Putting it all together
StackOverflow question
- URL / More on pandas
standings
- loading / Putting it all together
standings data
- obtaining / Putting it all together
- URL / Putting it all together
Stratified K Fold
- about / Running the algorithm
style sheets
- about / Extracting text from arbitrary websites
stylometry
- about / Attributing documents to authors
subgraphs
- finding / Finding subgraphs
- connected components / Connected components
- criteria, optimizing / Optimizing criteria
subreddits
- about / Obtaining news articles, Reddit as a data source
support / Implementing a simple ranking of rules
support vector machines (SVM)
- about / scikit-learn estimators
SVMs
- about / Support vector machines
- URL / Support vector machines
- classifying with / Classifying with SVMs
- kernels / Kernels
system
- building, for taking image as input / Application scenario and goals

T

temporal analysis
- about / Temporal analysis
text
- about / Disambiguation
- extracting, from arbitrary websites / Extracting text from arbitrary websites
text transformers
- defining / Text transformers
- word, counting in dataset / Bag-of-words
- bag-of-words model / Bag-of-words
- n-grams / N-grams
- features / Other features
tf-idf
- about / Bag-of-words
Theano
- about / An introduction to Theano
- using / An introduction to Theano
- URL / Running our code on a GPU
Torch
- URL / Keras and Pylearn2
train_feature_value() function
- about / Implementing the OneR algorithm
transformer
- creating / Creating your own transformer
- API / The transformer API
- implementing / Implementation details
- unit testing / Unit testing
tutorial, Google
- URL / Courses on Hadoop
tutorial, Yahoo
- URL / Courses on Hadoop
tweet
- about / Disambiguation
tweets
- loading / Putting it all together
- F1-score, used for evaluation / Evaluation using the F1-score
- features, obtaining from models / Getting useful features from models
Twitter
- follower information, obtaining from / Getting follower information from Twitter
Twitter account
- URL / Downloading data from a social network
Twitter documentation
- URL / Downloading data from a social network

U

UCI Machine Learning data repository
- URL / Loading the dataset
univariate feature
- about / Selecting the best individual features
unstructured format
- about / Disambiguation
use cases, computer vision
- about / Use cases

V

V's, big data
- volume / Big data
- velocity / Big data
- variety / Big data
- veracity / Big data
variance
- about / How do ensembles work?, Principal Component Analysis
virtualenv
- URL / Setting up the environment, Scalability with the nearest neighbor
vocabulary
- about / Counting function words
Vowpal Wabbit
- about / Vowpal Wabbit
- URL / Vowpal Wabbit

W

web-based API, considerations
- authorization methods / Using a Web API to get data
- rate limiting / Using a Web API to get data
- API Endpoints / Using a Web API to get data
weight
- about / An introduction to neural networks
weighted edge
- about / Creating a similarity graph

Z

7-zip
- URL / Accessing the Enron dataset
