Index
A
- access keys
- accuracy
- activation function
- Adult dataset
- Advertisements dataset
- affinity analysis
- Amazon S3 console
- API endpoint
- application
- apps, Twitter account
- Apriori algorithm / The Apriori algorithm
- Apriori implementation
- arbitrary websites
- Artificial Neural Networks
- association rules
- authorship analysis
- authorship analysis, problems
- authorship attribution / Attributing authorship
- AWS CLI
- AWS console
B
- back propagation (backprop) algorithm / Back propagation
- bagging
- BatchIterator instance
- Bayes' theorem / Bayes' theorem
- bias
- big data
- Bleeding Edge code
- blog posts
- blogs dataset
C
- CAPTCHA
- CAPTCHAs
- CART (Classification and Regression Trees)
- character n-grams
- CIFAR-10
- class
- classification
- classifiers
- closed problem
- cluster evaluation
- clustering
- coassociation matrix
- complex algorithms
- complex features
- confidence
- connected components
- Cosine distance
- Coursera
- Coval font I, Open Font Library
- CPU
- cross-fold validation framework
- CSV (Comma Separated Values)
D
- data, blogging
- data, Corpus
- DataFrame
- data mining
- dataset
  - loading / Loading the dataset, Loading the dataset, An introduction to Lasagne
  - data, collecting / Collecting the data
  - URL / Collecting the data
  - loading, pandas used / Using pandas to load the dataset
  - cleaning up / Cleaning up the dataset
  - new features, extracting / Extracting new features
  - classifying, with existing model / Classifying with an existing model
  - follower information, obtaining from Twitter / Getting follower information from Twitter
  - network, building / Building the network
  - graph, creating / Creating a graph
  - Similarity graph, creating / Creating a similarity graph
  - creating / Creating the dataset
  - CAPTCHAs, drawing / Drawing basic CAPTCHAs
  - image, splitting into individual letters / Splitting the image into individual letters
  - training dataset, creating / Creating a training dataset
  - training dataset, adjusting to methodology / Adjusting our training dataset to our methodology
- datasets
- decision tree implementation
- decision trees
- dictionary
- DictVectorizer class
- disambiguation
- discretization
- discretization algorithm
- documents
E
- EC2 service console
- Eclat algorithm
- Elastic Map Reduce (EMR)
- Enron dataset
- ensembles
- environment
- epochs
- Euclidean distance
- evaluation, of clustering algorithms
- Evidence Accumulation Clustering (EAC)
- Excel, pandas
F
- f1-score
- feature-based normalization
- feature creation
- feature extraction
- features, dataset
- feature selection
- feed-forward neural network
- filename, data
- FP-growth algorithm
- frequent itemsets
- functions, transformer
- function words
G
- GPU
- GPU optimization
- graph
- gzip
H
- Hadoop
- Hadoop MapReduce
- hash function
- hidden layer
- hierarchical clustering
I
J
- Jaccard Similarity
- jQuery library
- JSON
K
- k-means algorithm
- Kaggle
- karma
- Keras
- kernel
- kernel parameter
- kernels / Kernels
L
- Lasagne
- Levenshtein edit distance
- Locality-Sensitive Hashing (LSH)
- local n-grams
- local optima
- log probabilities
M
- machine-learning workflow
- Mahotas
- Manhattan distance
- MapReduce
- matplotlib
- MD5 algorithm
- metadata
- MiniBatchKMeans
- Minimum Spanning Tree (MST)
- movie recommendation problem
- mrjob
- mrjob package / The mrjob package
- multiple SVMs
N
- n-gram
- n-grams
- Naive Bayes
- Naive Bayes algorithm
- Naive Bayes model
- NaN (Not a Number)
- National Basketball Association (NBA)
- Natural Language ToolKit (NLTK)
- nearest neighbor
- nearest neighbor algorithm
- Nearest neighbors
- network
- networks
- NetworkX
- NetworkX package
- neural network
- neural network layers, Lasagne
- Neural networks
- neural networks
- neurons
- news articles
- NLTK
- NLTK installation instructions
- noise
- nolearn package
- nonprogrammers, for Python language
- n_neighbors
O
- object classification
- one-versus-all classifier
- OneR
- online learning
- ordinal
- output layer
- overfitting
P
- pagination
- pandas
- pandas (Python Data Analysis)
- pandas.read_csv function
- pandas documentation
- parameters, ensemble process
- petal length / Loading and preparing the dataset
- petal width / Loading and preparing the dataset
- pip
- Pipeline
- pipeline
- pipelines
- Pipelines
- precision
- preprocessing, using pipelines
- pricing alerts
- Principal Component Analysis (PCA)
- prior belief
- probabilistic graphical models
- probabilities
- programmers, for Python language
- Project Gutenberg
- Pydoop
- Pylearn2
- Python
- Python 3.4
Q
R
- RandomForestClassifier
- random forests
- README
- real-time clusterings
- reasons, feature selection
- recall
- recommendation engine
- reddit
- Reddit
- regularization
- reinforcement learning
- RESTful interface (Representational State Transfer)
- rules
S
- sample size
- scikit-learn
- scikit-learn estimators
- scikit-learn package
- scikit-learn tutorials
- self-posts
- sepal length / Loading and preparing the dataset
- sepal width / Loading and preparing the dataset
- shapes adding, CAPTCHAs
- Silhouette Coefficient
- Similarity graph
- SNAP
- softmax nonlinearity
- Spam detection
- spam filter
- sparse matrix
- sparse matrix format
- sports outcome prediction
- stacking
- StackOverflow question
- standings
- standings data
- Stratified K Fold
- style sheets
- stylometry
- subgraphs
- subreddits
- support / Implementing a simple ranking of rules
- support vector machines (SVM)
- SVMs
- system
T
- temporal analysis
- text
- text transformers
- tf-idf
- Theano
- Torch
- train_feature_value() function
- transformer
- tutorial, Google
- tutorial, Yahoo
- tweet
- tweets
- Twitter
- Twitter account
- Twitter documentation
U
- UCI Machine Learning data repository
- univariate feature
- unstructured format
- use cases, computer vision
V
- V's, big data
- variance
- virtualenv
- vocabulary
- Vowpal Wabbit
W
- web-based API, considerations
- weight
- weighted edge
Z