Semi-Supervised and Unsupervised Machine Learning: Novel Strategies
by Wolfgang Minker and Amparo Albalate

Table of Contents
Cover
Title Page
Copyright
Part 1: State of the Art
Chapter 1: Introduction
1.1. Organization of the book
1.2. Utterance corpus
1.3. Datasets from the UCI repository
1.3.1. Wine dataset (wine)
1.3.2. Wisconsin breast cancer dataset (breast)
1.3.3. Handwritten digits dataset (Pendig)
1.3.4. Pima Indians diabetes (diabetes)
1.3.5. Iris dataset (Iris)
1.4. Microarray dataset
1.5. Simulated datasets
1.5.1. Mixtures of Gaussians
1.5.2. Spatial datasets with non-homogeneous inter-cluster distance
Chapter 2: State of the Art in Clustering and Semi-Supervised Techniques
2.1. Introduction
2.2. Unsupervised machine learning (clustering)
2.3. A brief history of cluster analysis
2.4. Cluster algorithms
2.4.1. Hierarchical algorithms
2.4.1.1. Agglomerative clustering
2.4.1.1.1. Comparison of agglomerative criteria
2.4.1.2. Divisive algorithms
2.4.2. Model-based clustering
2.4.2.1. The expectation maximization (EM) algorithm
2.4.2.1.1. Example: mixtures of Gaussians
2.4.3. Partitional competitive models
2.4.3.1. K-means
2.4.3.1.1. Advantages and drawbacks
2.4.3.2. Neural gas
2.4.3.2.1. Advantages and drawbacks
2.4.3.3. Partitioning Around Medoids (PAM)
2.4.3.3.1. Build step
2.4.3.3.2. Swap phase
2.4.3.3.3. Advantages and drawbacks
2.4.3.4. Self-organizing maps
2.4.3.4.1. Advantages and drawbacks
2.4.4. Density-based clustering
2.4.4.1. Direct density reachability
2.4.4.2. Density reachability
2.4.4.3. Density connection
2.4.4.4. Border points
2.4.4.5. Noise points
2.4.4.6. DBSCAN algorithm
2.4.4.6.1. Advantages and drawbacks
2.4.5. Graph-based clustering
2.4.5.1. Pole-based overlapping clustering
2.4.5.1.1. Definition of a dissimilarity graph
2.4.5.1.2. Pole construction
2.4.5.1.3. Pole restriction
2.4.6. Assignment stage
2.4.6.1. Advantages and drawbacks
2.5. Applications of cluster analysis
2.5.1. Image segmentation
2.5.2. Molecular biology
2.5.2.1. Biological considerations
2.5.3. Information retrieval and document clustering
2.5.3.1. Document pre-processing
2.5.3.1.1. Word selection
2.5.3.1.2. Stop word filtering
2.5.3.1.3. Word lemmatizing/stemming
2.5.3.2. Boolean model representation
2.5.3.3. Vector space model
2.5.3.4. Term weighting
2.5.3.4.1. Term frequency component
2.5.3.4.2. Collection frequency component
2.5.3.4.3. Length normalization component
2.5.3.5. Probabilistic models
2.5.3.5.1. Binary independence retrieval model
2.5.3.5.2. The 2-Poisson model
2.5.3.5.3. Okapi weighting
2.5.4. Clustering documents in information retrieval
2.5.4.1. Clustering of presented results
2.5.4.2. Post-retrieval document browsing (Scatter-Gather)
2.6. Evaluation methods
2.7. External cluster evaluation
2.7.1. Entropy
2.7.2. Purity
2.7.3. Normalized mutual information
2.8. Internal cluster validation
2.8.1. Hartigan
2.8.2. Davies-Bouldin index
2.8.3. Krzanowski and Lai index
2.8.4. Silhouette
2.8.5. Gap statistic
2.9. Semi-supervised learning
2.9.1. Self-training
2.9.2. Co-training
2.9.3. Generative models
2.10. Summary
Part 2: Approaches to Semi-Supervised Classification
Chapter 3: Semi-Supervised Classification Using Prior Word Clustering
3.1. Introduction
3.2. Dataset
3.3. Utterance classification scheme
3.3.1. Pre-processing
3.3.1.1. Utterance vector representation
3.3.2. Utterance classification
3.4. Semi-supervised approach based on term clustering
3.4.1. Term clustering
3.4.2. Semantic term dissimilarity
3.4.2.1. Term vector of lexical co-occurrences
3.4.2.2. Metric of dissimilarity
3.4.3. Term vector truncation
3.4.4. Term clustering
3.4.5. Feature extraction and utterance feature vector
3.4.6. Evaluation
3.5. Disambiguation
3.5.1. Evaluation
3.6. Summary
Chapter 4: Semi-Supervised Classification Using Pattern Clustering
4.1. Introduction
4.2. New semi-supervised algorithm using the cluster and label strategy
4.2.1. Block diagram
4.2.1.1. Dataset
4.2.1.2. Clustering
4.2.1.3. Optimum cluster labeling
4.2.1.4. Classification
4.3. Optimum cluster labeling
4.3.1. Problem definition
4.3.2. The Hungarian algorithm
4.3.2.1. Weighted complete bipartite graph
4.3.2.2. Matching, perfect matching and maximum weight matching
4.3.2.3. Objective of Hungarian method
4.3.2.4. Complexity considerations
4.3.3. Genetic algorithms
4.3.3.1. Reproduction operators
4.3.3.1.1. Crossover
4.3.3.1.2. Mutation
4.3.3.2. Forming the next generation
4.3.3.2.1. Generational replacement
4.3.3.2.2. Elitism with generational replacement
4.3.3.2.3. Steady-state replacement
4.3.3.3. GAs applied to optimum cluster labeling
4.3.3.4. Comparison of methods
4.4. Supervised classification block
4.4.1. Support vector machines
4.4.1.1. The kernel trick for nonlinearly separable classes
4.4.1.2. Multi-class classification
4.4.2. Example
4.5. Datasets
4.5.1. Mixtures of Gaussians
4.5.2. Datasets from the UCI repository
4.5.2.1. Iris dataset (Iris)
4.5.2.2. Wine dataset (wine)
4.5.2.3. Wisconsin breast cancer dataset (breast)
4.5.2.4. Handwritten digits dataset (Pendig)
4.5.2.5. Pima Indians diabetes (diabetes)
4.5.3. Utterance dataset
4.6. An analysis of the bounds for the cluster and label approaches
4.7. Extension through cluster pruning
4.7.1. Determination of silhouette thresholds
4.7.2. Evaluation of the cluster pruning approach
4.8. Simulations and results
4.9. Summary
Part 3: Contributions to Unsupervised Classification — Algorithms to Detect the Optimal Number of Clusters
Chapter 5: Detection of the Number of Clusters through Non-Parametric Clustering Algorithms
5.1. Introduction
5.2. New hierarchical pole-based clustering algorithm
5.2.1. Pole-based clustering basis module
5.2.2. Hierarchical pole-based clustering
5.3. Evaluation
5.3.1. Cluster evaluation metrics
5.4. Datasets
5.4.1. Results
5.4.2. Complexity considerations for large databases
5.5. Summary
Chapter 6: Detecting the Number of Clusters through Cluster Validation
6.1. Introduction
6.2. Cluster validation methods
6.2.1. Dunn index
6.2.2. Hartigan
6.2.3. Davies-Bouldin index
6.2.4. Krzanowski and Lai index
6.2.5. Silhouette
6.2.6. Hubert’s γ
6.2.7. Gap statistic
6.3. Combination approach based on quantiles
6.4. Datasets
6.4.1. Mixtures of Gaussians
6.4.2. Cancer DNA-microarray dataset
6.4.3. Iris dataset
6.5. Results
6.5.1. Validation results for the mixture of five Gaussians
6.5.2. Validation results for the mixture of seven Gaussians
6.5.3. Validation results of the NCI60 dataset
6.5.4. Validation results of the Iris dataset
6.5.5. Discussion
6.6. Application to speech utterances
6.7. Summary
Bibliography
Index
Part 3: Contributions to Unsupervised Classification — Algorithms to Detect the Optimal Number of Clusters