Families of algorithms

There is a vast number of algorithms in the machine learning universe, and more are devised each year. Tremendous research is happening in this space, hence the ever increasing list of algorithms. It is also a fact that the more these algorithms are used, the more improvements to them are discovered. Machine learning is one space where industry and academia work hand in hand.

But, as Spider-Man was told, with great power comes great responsibility, and the reader should understand the responsibility at hand. With so many algorithms available, it is necessary to understand what they are and where they fit. This can feel overwhelming and confusing at first, but that is where categorizing them into families helps.

Machine learning algorithms can be categorized in many ways. The most common way is to group them into supervised learning algorithms and unsupervised learning algorithms.

Supervised learning algorithms

Supervised learning refers to algorithms which are trained on a predefined data set called the training data set. The training data set is usually a two-element tuple consisting of an input element and a desired output element or signal. In general, the input element is a vector. The supervised learning algorithm uses the training data set to produce the desired function. The function so produced (or rather inferred) is then used to correctly map new data, better termed the test data.

An algorithm which has learnt well will be able to correctly determine the outputs for unseen data in a reasonable way. This brings in the concepts of generalization and overfitting.

Briefly, generalization refers to the concept wherein an algorithm generalizes the desired function based upon the (limited) training data to handle unseen data in a correct manner. Overfitting is exactly the opposite concept of generalization, wherein an algorithm infers a function such that it maps exactly to the training data set (including the noise). This may result in huge errors when the function learnt by the algorithm is checked against new/unseen data.

Both generalization and overfitting revolve around the random errors or noise in the input data. While generalization tries to minimize the effect of noise, overfitting does the opposite by fitting in noise as well.
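
To make the distinction concrete, here is a small, purely illustrative R sketch on hypothetical noisy data (none of these numbers come from a real data set): a simple straight-line model generalizes the underlying trend, while a very flexible polynomial model also fits the noise and usually performs worse on unseen data.

#illustrative sketch of generalization versus overfitting (hypothetical data)
set.seed(42)
x <- seq(0, 10, length.out = 20)
y <- 2 * x + rnorm(20, sd = 2)            #underlying linear trend plus noise

simple_model  <- lm(y ~ x)                #generalizes the trend
complex_model <- lm(y ~ poly(x, 15))      #flexible enough to fit the noise too

#unseen data generated from the same process
x_new <- seq(0.25, 9.75, length.out = 20)
y_new <- 2 * x_new + rnorm(20, sd = 2)

#mean squared error on unseen data; the overfitted model is usually far worse
mean((predict(simple_model, data.frame(x = x_new)) - y_new)^2)
mean((predict(complex_model, data.frame(x = x_new)) - y_new)^2)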

Solving a problem using supervised methods can be divided into the following steps:

  1. Prepare training data: Data preparation is the most important step for all machine learning algorithms. Since supervised learning utilizes labeled input data sets (data sets which consist of corresponding outputs for given inputs), this step becomes even more important. This data is usually labeled by human experts or from measurements.
  2. Prepare the model: The model is the representation of the input data set and the learnt pattern. The model representation is affected by factors such as input features and the learning algorithm itself. The accuracy of the inferred function also depends on how this representation is formed.
  3. Choose an algorithm: Based on the problem being solved and the input information, an algorithm is then chosen to learn and solve the problem.
  4. Examine and fine tune: This is an iterative step where the algorithm is run on the input data set and the parameters are fine tuned to achieve the desired level of output. The algorithm is then tested on the test data set to evaluate its performance and measure the error.

Under supervised learning, two major sub categories are:

  1. Regression based machine learning: Learning algorithms which help us answer quantitative questions such as how many? or how much? More formally, these algorithms predict continuous output values for unseen/new data based on the training data and the model formed. Linear regression, multivariate regression, regression trees, and so on are a few supervised regression algorithms.
  2. Classification based machine learning: Learning algorithms which help us answer objective questions or yes-or-no predictions. For example, questions such as is this component faulty? or can this tumor cause cancer? More formally, these algorithms predict the class labels for unseen or new data based upon the training data and model formed. Support Vector Machines (SVM), decision trees, random forests, and so on are a few commonly used supervised classification algorithms.

Let us look at some supervised learning algorithms in detail.

Linear regression

Regression, as mentioned previously, helps us answer quantitative questions. Regression has its roots in the statistics domain. Researchers use a linear relationship to predict the output value Y for a given input value X. This linear relationship is called a linear regression or regression line.

Mathematically, linear regression is represented as:

y = b0 + b1*x

Where, b0 is the intercept or the point where the line crosses the y axis.

b1 is the slope of the line, that is, the change in y over change in x.

The preceding equation is pretty similar to how a straight line is represented and hence the name linear regression.

Now, how do we decide which line we fit for our input so that it predicts well for unknown data? Well, for this we need an error measure. There can be various error measures; the most commonly used is the least squares method.

Before we define the least squares method, we first need to understand the term residual. A residual is simply the deviation of an observed value of Y from its fitted value. Mathematically:

ei = yi − ŷi

Where ŷi is the fitted value of y for the ith observation and ei is the corresponding residual.

The least squares method states that the most optimal fit of a model to data occurs when the sum of the squares of residuals is minimum.

Mathematically:

minimize Σi ei² = Σi (yi − ŷi)²

We use calculus to minimize the sum of squares for residuals and find the corresponding coefficients.
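
As a quick illustration (a minimal sketch using hypothetical numbers, not how the lm function is implemented internally), this minimization has a closed-form solution for simple linear regression: the slope is the covariance of x and y divided by the variance of x, and the intercept follows from the means.

#illustrative sketch with hypothetical data: closed-form least squares coefficients
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 4.2, 5.9, 8.1, 9.8)

b1 <- cov(x, y) / var(x)         #slope
b0 <- mean(y) - b1 * mean(x)     #intercept

c(intercept = b0, slope = b1)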

Now that we understand linear regression, let us take a real world example to see it in action.

Suppose we have data related to the height and weight of school children. The data scientist in you suddenly starts thinking about whether there is any relation between the weights and heights of these children. Formally, could the weight of a child be predicted based upon his/her given height?

To fit in linear regression, the first step is to understand the data and see whether there is a correlation between the two variables (weight and height). Since in this case we are dealing with just two dimensions, visualizing the data using a scatter plot will help us understand it quickly. This will also enable us to determine if the variables have some linear relationship or not.

Let us prepare our data first and visualize it on a scatter plot along with the correlation coefficient.

#Height and weight vectors for 19 children

height <- c(69.1,56.4,65.3,62.8,63,57.3,59.8,62.5,62.5,59.0,51.3,64,56.4,66.5,72.2,65.0,67.0,57.6,66.6)

weight <- c(113,84,99,103,102,83,85,113,84,99,51,90,77,112,150,128,133,85,112)

plot(height,weight)
cor(height,weight)

Output:

[1] 0.8848454

The scatter plot looks like this:

[Figure: scatter plot of the data points across the weight and height dimensions]

The preceding scatter plot supports our intuition that weight and height have a linear relationship. This is further confirmed by the correlation function, which gives us a value of roughly 0.88.

Time to prepare the model for our data set! We use R's built-in lm (linear model) function to find the coefficients b0 and b1.

#Fitting the linear model

model <- lm(weight ~ height) # weight = slope*height + intercept

#get the intercept(b0) and the slope(b1) values
model

The output looks like this:

[Output: the intercept (b0) and slope (b1) values reported by lm]

You can experiment a bit more to find out more details calculated by the lm utility using the following commands. We encourage you to go ahead and try these out.

#check all attributes calculated by lm
attributes(model)

#getting only the intercept
model$coefficients[1] #or model$coefficients[[1]]

#getting only the slope
model$coefficients[2] #or model$coefficients[[2]]

#checking the residuals
residuals(model)

#predicting the weight for a given height, say 60 inches
model$coefficients[[2]]*60 + model$coefficients[[1]]

#detailed information about the model
summary(model)
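
As a side note (a small usage sketch), the same prediction can also be made with R's predict() function instead of multiplying the coefficients by hand:

#equivalent prediction using predict()
predict(model, newdata = data.frame(height = 60))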

As the final piece, let us visualize the regression line on our scatter plot itself.

#plot data points
 plot(height,weight)

#draw the regression line
abline(model)

The scatter plot looks like the following figure:

[Figure: scatter plot with the fitted regression line drawn through the data points]

Thus, we saw how the relationship between two variables can be identified and predictions can be made using a few lines of code. But we are not done yet. There are a couple of caveats which the reader must understand before deciding whether to use linear regression or not.

Linear regression can be used to predict the output values for given inputs, if and only if:

  • The scatter plot forms a linear pattern
  • The correlation between them is moderate to strong (above 0.5 or below -0.5)

Cases where only one of the previous two conditions is met can lead to incorrect predictions or altogether invalid models. For example, if we only check the correlation, find it to be strong, and skip the step where we look at the scatter plot, we may end up with invalid predictions because we have tried to fit a straight line to data which itself follows a curved shape (note that curved data sets can also have high correlation values, hence the mistake).

It is important to remember that correlation does not imply causation. In simple words, correlation between two variables does not necessarily imply that one causes the other. There might be a case where cause and effect are indirectly related through a third variable, termed the confounding variable. The most common example used to describe this issue is the relationship between shoe size and reading ability. From the survey data (if one existed!) it could be inferred that larger shoe sizes relate to higher reading ability, but this clearly does not mean that large feet cause good reading skills to develop. It is rather that young children have small feet and have not yet been taught to read. In such a case, the two variables are more accurately correlated with age.

You should now be coming to a similar conclusion about the weight and height example we used earlier. Well yes, the earlier example also suffers from a similar fallacy, yet it served as an easy-to-follow scenario. Feel free to look around and model cases which do not suffer from this.

Linear regression finds application in the field of finance where it is used for things such as quantification of risks on investments. It is also used widely in the field of economics for trendline analysis and so on.

Apart from linear regression, logistic regression, stepwise regression, Multivariate Adaptive Regression Splines (MARS), and others are a few more supervised regression learning algorithms.

K-Nearest Neighbors (KNN)

K-Nearest Neighbors or KNN algorithms are amongst the simplest algorithms from an implementation and understanding point of view. They are yet another type of supervised learning algorithm which helps us classify data.

KNN can be described using the quote like and like strike together (Plato), that is, similar things are likely to have similar properties. KNN utilizes this very concept to label a data point based upon its similarity to its neighbors.

Formally, KNN can be described as the process to classify unlabeled (or unseen) data points by assigning them the class of the most similar labeled data points (or training samples).

KNN is a supervised learning algorithm. Hence, it begins with a training data set of examples which are classified into different classes. The algorithm then picks each data point in the test data set and, based upon a chosen similarity measure, identifies its k nearest neighbors (where k is specified in advance). The data point is then assigned the class of the majority of the k nearest neighbors.

The trick up KNN's sleeve is the similarity measure. There are various similarity measures available to us. The decision to choose one is based on the complexity of the problem, the type of data, and so on. Euclidean distance is one such measure which is widely used; it is the length of the shortest direct route between two points. Mathematically, it is given as:

d(p, q) = √((p1 − q1)² + (p2 − q2)² + … + (pn − qn)²)
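
As a small illustration (a hypothetical helper written for this explanation, not taken from any package), Euclidean distance between two numeric vectors can be computed in R as follows:

#illustrative helper: Euclidean distance between two numeric vectors
euclidean_distance <- function(p, q) {
  sqrt(sum((p - q)^2))
}

euclidean_distance(c(1, 2), c(4, 6))   #returns 5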

Manhattan distance, Cosine distance, and Minkowski distance are some other types of distance measures which can be used for finding the nearest neighbors.

The next parameter for the KNN algorithm is the k in K-Nearest Neighbors. The value of k determines how well the KNN model generalizes to the test data. The balance between overfitting and underfitting the training data depends upon the value of k. With slight deliberation, it is easy to understand that a large k will minimize the impact of variance caused by noisy data, but at the same time, it will also undermine small but important patterns in the data. This problem is known as the bias-variance tradeoff.

The optimal value of k, even though hard to determine, lies between the extremes of k=1 and k=total number of training samples. A common practice is to set the value of k equal to the square root of the number of training instances, usually between 3 and 10. Though a common practice, the right value of k depends on the complexity of the concept to be learned and the number of training examples.

The next step in pursuit of the KNN algorithm is preparation of data. Features used to prepare the input vectors should be on similar scales. The rationale for this step is that the distance formula is dependent on how the features are measured. For example, if certain features have a much larger range of values than others, the distance measurements will be dominated by those features. The method of rescaling features to a similar scale is called normalization. Very much like the distance measure, there are various normalization methods available. One such method is min-max normalization, given mathematically as:

xnormalized = (x − min(x)) / (max(x) − min(x))

Before we begin with our example to understand KNN, let us outline the steps to be performed to execute KNN:

  1. Collect data and explore data: We need to collect data relevant to the concept to be learnt. We also need to explore the data to understand various features, know the range of their values, and determine the class labels.
  2. Normalize data: As discussed previously, KNN's dependence on distance measure makes it very important that we normalize the data to remove any inconsistency or bias in calculations.
  3. Create training and test data sets: Since it is important to learn a concept and prepare a model which generalizes to acceptable levels for unseen data, we need to prepare training and test data sets. The test data set, even though labeled, is used to determine the accuracy and the ability of the model to generalize the concept learnt. A usual practice is to divide the input sample into two-thirds and one-third portions for the training and test data sets respectively. It is equally important that the two data sets are a good mix of all the class labels and data points, that is, both data sets should be representative subsets of the full data.
  4. Train the model: Now that we have all the things in place, we can use the training data set, test data set, the labels, and the value of k to train our model and label the data points in the test data set.
  5. Evaluate the model: The final step is to evaluate the learnt pattern. In this step, we determine how well the algorithm has predicted the class labels of the test data set as compared to their known labels. Usually a confusion matrix is prepared for the same.

Now, let us see KNN in action. The problem at hand is to classify different species of flowers based upon certain features. For this particular example, we will be using the Iris data set. This data set comes built in with the default installation of R.

Collecting and exploring data

Step one is to collect and explore the data. Let us first gather the data.

To check if your system has the required dataset, type in just the name:

iris
#this should print the contents of data set onto the console.

If you do not have the data set available, no worries! You can download it as follows:

#skip these steps if you already have iris on your system
iris <- read.csv(url("http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"), header = FALSE)

#assign proper headers
names(iris) <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Species")

Now that we have the data, it is time to explore and understand it. For exploring the data set and its attributes, we use the following commands:

#to view top few rows of data
head(iris)

Output:

[Output: the first six rows of the iris data set]

#to view data types, sample values, categorical values, etc
str(iris)

Output:

[Output: the structure of the iris data set, showing the data type and sample values of each attribute]

#detailed view of the data set
summary(iris)

Output:

[Output: summary statistics (min, max, quartiles, and so on) for each attribute of the iris data set]

The summary command helps us understand the data in a better way. It clearly shows different attributes along with min, max, median, and other such statistics. These help us in the coming steps where we might have to scale or normalize the data or features.

Step one is also where we would usually label our input data. Since our current data set is already labeled, we can skip this for this example problem. Let us visually see how the species are spread. We take help of the famous scatter plot again, but this time we use a package called ggvis.

You can install ggvis as:

install.packages("ggvis")

For visualizing petal widths and lengths for all 3 species, we use the following code snippet:

#load the package
library(ggvis)

#plot the species
iris %>% ggvis(~Petal.Length, ~Petal.Width, fill = ~factor(Species)) %>% layer_points()

Note

The ggvis package is an interactive graphics package in R. It follows a unique way of expressing inputs to generate visualizations. The preceding snippet of code uses the pipe operator, %>%, to pass input data to ggvis and again uses the pipe operator to pass on the output to layer_points for final plotting. The ~ operator signifies to ggvis that Petal.Length is a variable in the input dataset (iris). Read more about ggvis at http://ggvis.rstudio.com/ggvis-basics.html.

[Figure: scatter plot of Petal.Length versus Petal.Width, colored by species]

The preceding plot clearly shows that there is a high correlation between petal widths and lengths for Iris-setosa flowers, while it is a little less for the other two species.

Note

Try visualizing sepal width versus sepal length as well and see if you can spot any correlation.

Normalizing data

The next step is to normalize the data so that all the features are on the same scale. As seen from the data exploration step, the values of all the attributes are more or less in a comparable range. But, for the sake of this example, let us write a min-max normalization function:

#normalization function

min_max_normalizer <- function(x) {
  num <- x - min(x)
  denom <- max(x) - min(x)
  return (num / denom)
}

Remember, normalization does not distort the data; it simply rescales it. Therefore, even though our data does not strictly require normalization, doing so will not cause any harm.

Note

In the following steps, we will be using un-normalized data for clarity of output.

#normalizing iris data set
normalized_iris <- as.data.frame(lapply(iris[1:4], min_max_normalizer))

#viewing normalized data
summary(normalized_iris)

The following is a summary of the normalized data frame:

[Output: summary of the normalized data frame, with every attribute rescaled to the 0-1 range]

Creating training and test data sets

Now that we have our data normalized, we can divide it into training and test data sets. We will follow the usual two-thirds/one-third rule for splitting the data in two. As mentioned earlier, both data sets should be representative of the whole data and hence we need to pick proper samples. We will utilize R's sample() function to prepare our samples.

#checking the data constituency
table(iris$Species)

Output:

[Output: the number of observations for each of the three species]

#set seed for randomization
set.seed(1234)

# setting the training-test split to 67% and 33% respectively
random_samples <- sample(2, nrow(iris), replace=TRUE, prob=c(0.67, 0.33))

# training data set
iris.training <- iris[random_samples == 1, 1:4]

# training labels
iris.trainLabels <- iris[random_samples == 1, 5]

# test data set
iris.test <- iris[random_samples == 2, 1:4]

# test labels
iris.testLabels <- iris[random_samples == 2, 5]

Learning from data/training the model

Once we have the data ready in our training and test data sets, we can proceed to the next step and learn from the data using KNN. The KNN implementation in R is provided by the knn() function in the class library. The knn() function takes the following inputs:

  • train: The data frame containing the training data.
  • test: The data frame containing the test data.
  • cl: A factor vector containing the class labels for the training data.
  • k: The value of k-nearest neighbors.

For the current case, let us assume the value of k to be 3. Odd numbers are usually good at breaking ties. KNN is executed as:

#setting library
library(class)

#executing knn for k=3
iris_model <- knn(train = iris.training, test = iris.test, cl = iris.trainLabels, k=3)

#summary of the model learnt
iris_model

Output:

[Output: the species labels predicted by knn for the test data set]

A quick scan of the output shows everything correct except one versicolor label amongst the virginica labels. Though this one was easy to spot, there are better ways to evaluate the model.
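
One quick, programmatic check (a small sketch using base R, assuming the objects created earlier) is the overall accuracy, that is, the fraction of test labels predicted correctly:

#fraction of correctly predicted labels in the test data set
mean(as.character(iris_model) == as.character(iris.testLabels))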

Evaluating the model

This brings us to the last step, where we evaluate the model. We do this by preparing a confusion matrix, or cross table, which helps us understand how the predicted labels compare with the known labels of the test data. R provides us with yet another utility function, CrossTable(), which is present in the gmodels library. Let us see the output:

#setting library
library(gmodels)

#Preparing cross table
CrossTable(x = iris.testLabels, y = iris_model, prop.chisq=FALSE)

Output:

[Output: cross table of the actual (iris.testLabels) versus predicted (iris_model) species labels]

From the preceding output we can conclude that the model has labeled one instance of virginica as versicolor while all the other test data points have been correctly labeled. This also helps us infer that the choice of k=3 was indeed good enough. We urge the reader to try the same example with different values of k and see the change in results.
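
If you want to try this systematically, the following sketch (assuming the training and test objects created earlier) re-runs knn for a few values of k and prints the accuracy for each:

#trying a few values of k and comparing accuracy
library(class)

for (k in c(1, 3, 5, 7, 9)) {
  predictions <- knn(train = iris.training, test = iris.test,
                     cl = iris.trainLabels, k = k)
  accuracy <- mean(as.character(predictions) == as.character(iris.testLabels))
  cat("k =", k, " accuracy =", round(accuracy, 3), "\n")
}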

KNN is a simple yet powerful algorithm which makes no assumptions about the underlying data distribution and hence can be used in cases where relationships between features and classes are complex or difficult to understand.

On the downside, KNN is a resource intensive algorithm as it requires a large amount of memory to process the data. Dependency on distance measures and missing data requires additional processing which is another overhead with this algorithm.

Despite its limitations, KNN is used in a number of real life applications, such as text mining, predicting heart attacks, predicting cancer, and so on. KNN also finds application in the fields of finance and agriculture.

Decision trees, random forests, and support vector machines are some of the most popular and widely used supervised classification algorithms.

Unsupervised learning algorithms

Unsupervised learning refers to algorithms which learn concepts on their own. Now that we are familiar with the concept of supervised learning, let us utilize our knowledge to understand unsupervised learning.

Unlike supervised learning algorithms, which require a labeled input training data set, unsupervised learning algorithms are tasked with finding relationships and patterns in the data without any labeled training data set. These algorithms process the input data to mine for rules, detect patterns, summarize and group the data points which helps in deriving meaningful insights, and describing the data to the users. In the case of unsupervised learning algorithms, there is no concept of training and test data sets. Rather, as mentioned before, input data is analyzed and used to derive patterns and relationships.

Similar to supervised learning, unsupervised learning algorithms can also be divided into two main categories:

  • Association rule based machine learning: These algorithms mine the input data to identify patterns and rules. The rules explain interesting relationships between the variables in the data set to depict frequent itemsets and patterns which occur in the data. These rules in turn help discover useful insights for any business or organization from their huge data repositories. Popular algorithms include Apriori and FP-Growth.
  • Clustering based machine learning: Similar to supervised learning based classification algorithms, the main objective of these algorithms is to cluster or group the input data points into different classes or categories using just features derived from the input data alone and no other external information. Unlike classification, the output labels are not known beforehand in clustering. Some popular clustering algorithms include k-means, k-medoids, and hierarchical clustering.

Let us look at some unsupervised learning algorithms.

Apriori algorithm

This algorithm, which took the world by storm, was proposed by Agrawal and Srikant in 1994. The algorithm is designed to handle transactional data, where each transaction is a set of items, or itemset. In short, the algorithm identifies the itemsets which are contained in at least C transactions in the data set.

Formally, let I = {i1, i2, …, im} be a set of items and D be a set of transactions, where each transaction T is a subset of I. Mathematically:

T ⊆ I

Then an association rule is an implication of the form X → Y, where a transaction T is said to contain X if X is a subset of T, and:

X ⊂ I, Y ⊂ I, and X ∩ Y = ∅

The implication X → Y holds true in the transaction set D with a confidence factor c if c% of the transactions in D that contain X also contain Y. The association rule X → Y is said to have a support factor of s if s% of the transactions in D contain X ∪ Y. Hence, given a set of transactions D, the task of identifying association rules amounts to generating all rules that have confidence and support greater than user defined thresholds called minsup (for minimum support threshold) and minconf (for minimum confidence threshold).
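
As a quick worked example with purely hypothetical numbers: suppose D contains 1,000 transactions, 600 of them contain X, and 450 contain both X and Y. Then the rule X → Y has support s = 450/1000 = 45% and confidence c = 450/600 = 75%, and it would be reported only if minsup ≤ 45% and minconf ≤ 75%.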

Broadly, the algorithm works in two steps. The first one being identification of the itemsets whose occurrence exceeds a predefined threshold. Such itemsets are called Frequent Itemsets. The second step is to generate association rules from the identified frequent itemsets which satisfy the constraints of minimum confidence and support.

The same two steps can be explained better using the following pseudo-code:

[Pseudo-code for the two-step Apriori process]
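
To make step one more concrete, the following is a minimal, purely illustrative R sketch of the level-wise search for frequent itemsets. It is not the arules implementation and it ignores the candidate-pruning optimizations that make the real algorithm efficient. Transactions are assumed to be a list of character vectors and minsup a fraction between 0 and 1.

#illustrative level-wise search for frequent itemsets
find_frequent_itemsets <- function(transactions, minsup) {
  support <- function(itemset) {
    #fraction of transactions containing every item of the itemset
    mean(sapply(transactions, function(t) all(itemset %in% t)))
  }

  #level 1: frequent individual items
  items <- unique(unlist(transactions))
  current <- Filter(function(s) support(s) >= minsup, as.list(items))
  frequent <- current

  #level k: extend frequent (k-1)-itemsets by one item and keep the frequent ones
  while (length(current) > 0) {
    candidates <- list()
    for (s in current) {
      for (i in setdiff(items, s)) {
        cand <- sort(c(s, i))
        candidates[[paste(cand, collapse = ",")]] <- cand
      }
    }
    current <- Filter(function(s) support(s) >= minsup, candidates)
    frequent <- c(frequent, current)
  }
  frequent
}

#toy usage with hypothetical transactions
toy_transactions <- list(c("bread", "milk"),
                         c("bread", "butter"),
                         c("bread", "milk", "butter"))
find_frequent_itemsets(toy_transactions, minsup = 0.6)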

Now let us see the algorithm in action. The data set in consideration is the UCI machine learning repository's Adult data set. The data set contains census data with attributes such as gender, age, marital status, native country, and occupation, along with economic attributes such as work class, income, and so on. We will use this data set to identify if there are association rules between census information and the income of the person.

The Apriori algorithm is implemented in the arules library, and the data set in consideration, named Adult, also ships with the arules package.

# setting the apriori library
library(arules)

# loading data
data("Adult");

Time to explore our data set and see a few sample records:

# summary of data set
summary(Adult);

# Sample 5 records
inspect(Adult[0:5]);

We get to know that the data set contains some 48,000 transactions spread across 115 items (columns). We also get information regarding the distribution of itemsets based on their sizes. The inspect function gives us a peek into sample transactions and the values each of the columns holds.

Now, let us build some relationships:

# executing apriori with support=50% confidence =80%
rules <- apriori(Adult, parameter=list(support=0.5, confidence=0.8,target="rules"));

# view a summary
summary(rules);

#view top 3 rules
as(head(sort(rules, by = c("confidence", "support")), n=3), "data.frame")

The Apriori algorithm uses the Adult data set as input to identify rules and patterns in the transactional data. On viewing the summary, we can see that the algorithm successfully identified 84 rules meeting the support and confidence constraints of 50% and 80% respectively. Now that we have identified the rules, let us see what they are:

[Output: the top three association rules with their corresponding support and confidence values]

The rules are of the form X → Y, where X is the lhs or left-hand side and Y is the rhs or right-hand side. The preceding output displays the corresponding confidence and support values as well. From the output we can infer that if people work full time, their chances of facing a capital loss are almost none (confidence factor 95.8%). Another rule helps us infer that people who work for a private employer also have close to no chance of facing a capital loss. Such rules can be used in preparing policies or schemes for social welfare, economic reforms, and so on.
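
As a short follow-up sketch (using the subset() method and the partial-match operator %pin% provided by arules), you can also narrow the rule set down to rules whose right-hand side mentions income; note that, depending on the chosen support and confidence thresholds, this subset may well be empty.

#rules whose right-hand side relates to income
income_rules <- subset(rules, subset = rhs %pin% "income")

#view the top rules by lift (if any were found)
inspect(head(sort(income_rules, by = "lift"), n = 3))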

Apart from Apriori, there are other association rule mining algorithms, such as FP-Growth, ECLAT, and many others, which have been used for various applications over the years.

K-Means

In the world of unsupervised clustering algorithms, the simplest and most widely used algorithm is K-Means. As we have seen, unsupervised learning algorithms process the input data without any prior labels or training to come up with patterns and relationships. Clustering algorithms in particular help us cluster or partition the data points.

By definition, clustering refers to the task of grouping objects into groups such that elements of a group are more similar to each other than those in other groups. K-Means does the same in an unsupervised manner.

Mathematically, given a set of n observations {x1,x2,…,xn}, where each observation is a d dimensional vector, the algorithm tries to partition these n observations into k (≤ n) sets by minimizing an objective function.

As with the other algorithms, there can be different objective functions. For the sake of simplicity, we will use the most widely used one, called the within-cluster sum of squares or WCSS function.

WCSS = Σ (i = 1 to k) Σ (x in Si) ||x − μi||²

Here μi is the mean of points in the partition Si.

The algorithm follows a simple two step iterative process, where the first step is called the assignment step, followed by the update step.

  • Initialize by setting the means for k partitions: m1,m2…mk
  • Until the means stop changing, or the change falls below a certain threshold:
    1. Assignment step: Assign each observation to the partition for which the within-cluster sum of squares value is minimum, that is, assign the observation to the partition whose mean is closest to the observation.
    2. Update step: For i in 1 to k, update each mean mi based on all the observations in that partition (a minimal R sketch of these two steps follows this list).
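
The following is a minimal, purely illustrative R sketch of a single assignment and update iteration, peeking ahead at the unlabeled iris measurements used in the example below; it is not how the built-in kmeans() function is implemented.

#one illustrative assignment + update iteration of k-means
X <- as.matrix(iris[, 1:4])              #observations, one row per data point
set.seed(1234)
centers <- X[sample(nrow(X), 3), ]       #initial means picked at random (Forgy-style)

#assignment step: squared distance of every point to every mean
squared_distances <- sapply(1:nrow(centers), function(i) {
  rowSums((X - matrix(centers[i, ], nrow(X), ncol(X), byrow = TRUE))^2)
})
assignment <- max.col(-squared_distances)   #index of the closest mean per point

#update step: recompute each mean from the points assigned to it
#(an empty partition would need special handling in a full implementation)
centers <- t(sapply(1:nrow(centers), function(i) {
  colMeans(X[assignment == i, , drop = FALSE])
}))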

The algorithm can use different initialization methods; the most common ones are the Forgy and Random Partition methods. We encourage you to read more on these. Also, apart from the input data set, the algorithm requires the value of k, that is, the number of clusters to be formed. The optimal value may depend on various factors and is generally decided based on the use case.
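
One common heuristic for choosing k (a sketch, not a hard rule, again using the iris measurements from the upcoming example) is the so-called elbow method: run kmeans for a range of k values, plot the total within-cluster sum of squares, and pick the k where the curve starts to flatten.

#elbow heuristic: total within-cluster sum of squares for k = 1 to 10
wcss <- sapply(1:10, function(k) kmeans(iris[, 1:4], centers = k, nstart = 10)$tot.withinss)
plot(1:10, wcss, type = "b", xlab = "k", ylab = "Total within-cluster sum of squares")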

Let us see the algorithm in action.

We will again use the Iris flower data set which we already used for the KNN algorithm. For KNN we already had the species labeled and then tried to learn and classify the data points in the test data set into correct classes.

With K-Means, we also aim to achieve the same partitioning of the data but without any labeled training data set (or supervision).

# prepare a copy of iris data set
kmean_iris <- iris

#Erase/ Nullify species labels
kmean_iris$Species <- NULL

#apply k-means with k=3
(clusters <- kmeans(kmean_iris, 3))

Now that we have the output from k-means, let us see how well it has partitioned the various species. Remember, k-means does not have partition labels and simply groups the data points.

# comparing cluster labels with actual iris  species labels.
table(iris$Species, clusters$cluster)

Output:

[Output: table comparing the actual species labels with the cluster labels assigned by k-means]

The output shows that the species setosa matches cluster label 2, versicolor matches label 3, and so on. Visually, it is easy to make out how the data points are clustered:

# plot the clustered points along sepal length and width
plot(kmean_iris[c("Sepal.Length", "Sepal.Width")], col=clusters$cluster,pch = c(15,16,17)[as.numeric(clusters$cluster)])

points(clusters$centers[,c("Sepal.Length", "Sepal.Width")], col=1:3, pch=8, cex=2)
[Figure: clustered data points plotted along sepal length and width, with cluster centers marked]

K-Means finds wide usage in areas such as computer graphics for color quantization; it is also combined with other algorithms and used for natural language processing, computer vision, and so on.

There are different variations of k-means (R itself provides three different variations). Apart from k-means, other unsupervised clustering algorithms are k-medoids, hierarchical clustering, and others.
