Solution:
The following steps will help you to complete this activity:
def all_moves_from_board_list(board_list, sign):
    move_list = []
    for board in board_list:
        move_list.extend(all_moves_from_board(board, sign))
    return move_list
In the preceding code snippet, we have defined the all_moves_from_board_list function, which enumerates all the possible moves from each board in the list and collects them in a list called move_list.
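The all_moves_from_board helper used here is assumed to have been defined earlier in the chapter. As a reminder, a minimal, self-contained sketch (together with the sign constants it relies on) could look like this:

```python
# Assumed constants from earlier in the chapter
EMPTY_SIGN = '.'
AI_SIGN = 'X'
OPPONENT_SIGN = 'O'

def all_moves_from_board(board, sign):
    # Place `sign` on every empty cell of the 9-character board string
    move_list = []
    for i, v in enumerate(board):
        if v == EMPTY_SIGN:
            move_list.append(board[:i] + sign + board[i+1:])
    return move_list
```

Calling it on an empty board produces the nine opening moves shown in the expected output below.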
board = EMPTY_SIGN * 9
all_moves = all_moves_from_board(board, AI_SIGN)
all_moves
The expected output is this:
['X........',
'.X.......',
'..X......',
'...X.....',
'....X....',
'.....X...',
'......X..',
'.......X.',
'........X']
def filter_wins(move_list, ai_wins, opponent_wins):
    # Iterate over a copy, as we remove items from move_list while looping
    for board in move_list[:]:
        won_by = game_won_by(board)
        if won_by == AI_SIGN:
            ai_wins.append(board)
            move_list.remove(board)
        elif won_by == OPPONENT_SIGN:
            opponent_wins.append(board)
            move_list.remove(board)
In the preceding code snippet, we have defined the filter_wins function, which moves finished (won) boards out of move_list and appends each one to the winning player's list.
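filter_wins relies on the game_won_by function from earlier in the chapter. A minimal sketch of it, assuming the same sign constants, checks every row, column, and diagonal of the board string:

```python
EMPTY_SIGN = '.'

# Indices of the eight winning lines on a 3x3 board stored as a string
combo_indices = [
    [0, 1, 2], [3, 4, 5], [6, 7, 8],  # rows
    [0, 3, 6], [1, 4, 7], [2, 5, 8],  # columns
    [0, 4, 8], [2, 4, 6]              # diagonals
]

def game_won_by(board):
    for index in combo_indices:
        if board[index[0]] == board[index[1]] == board[index[2]] != EMPTY_SIGN:
            return board[index[0]]  # the sign occupying the full line
    return EMPTY_SIGN               # no winner (yet)
```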
def count_possibilities():
    board = EMPTY_SIGN * 9
    move_list = [board]
    ai_wins = []
    opponent_wins = []
    for i in range(9):
        print('step ' + str(i) + '. Moves: ' + str(len(move_list)))
        sign = AI_SIGN if i % 2 == 0 else OPPONENT_SIGN
        move_list = all_moves_from_board_list(move_list, sign)
        filter_wins(move_list, ai_wins, opponent_wins)
    print('First player wins: ' + str(len(ai_wins)))
    print('Second player wins: ' + str(len(opponent_wins)))
    print('Draw', str(len(move_list)))
    print('Total', str(len(ai_wins) + len(opponent_wins)
                       + len(move_list)))
    return (len(ai_wins), len(opponent_wins), len(move_list),
            len(ai_wins) + len(opponent_wins) + len(move_list))
We have up to 9 steps in each state. In the 0th, 2nd, 4th, 6th, and 8th iterations, the AI player moves. In all the other iterations, the opponent moves. We create all possible moves in all steps and take out the completed games from the move list.
first_player, second_player, draw, total = count_possibilities()
The expected output is this:
step 0. Moves: 1
step 1. Moves: 9
step 2. Moves: 72
step 3. Moves: 504
step 4. Moves: 3024
step 5. Moves: 13680
step 6. Moves: 49402
step 7. Moves: 111109
step 8. Moves: 156775
First player wins: 106279
Second player wins: 68644
Draw 91150
Total 266073
As you can see, the tree of the board states consists of a total of 266073 leaves. The count_possibilities function essentially implements a BFS algorithm to traverse all the possible states of the game. Notice that we count these states multiple times because placing an X in the top-right corner in Step 1 and placing an X in the top-left corner in Step 3 leads to the same set of possible states as starting with the top-left corner and then placing an X in the top-right corner. If we implemented the detection of duplicate states, we would have to check fewer nodes. However, at this stage, due to the limited depth of the game, we will omit this step.
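If we did want to detect duplicates, one simple (hypothetical) approach would be to deduplicate the move list at each level of the search, since identical board strings represent identical states:

```python
def unique_boards(move_list):
    # dict.fromkeys keeps the first occurrence of each board, in order
    return list(dict.fromkeys(move_list))

boards = ['XO.......', 'X.O......', 'XO.......']
print(unique_boards(boards))  # ['XO.......', 'X.O......']
```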
The data structure examined by count_possibilities is, in fact, a decision tree. In a decision tree, we explore the utility of each move by investigating all possible future steps up to a certain depth. In our example, we could calculate the utility of the initial moves by observing the number of wins and losses after fixing the first few moves.
Note
The root of the tree is the initial state. An internal state of the tree is a state in which a game has not been ended and moves are still possible. A leaf of the tree contains a state where a game has ended.
To access the source code for this specific section, please refer to https://packt.live/3doxPog.
You can also run this example online at https://packt.live/3dpnuIz.
You must execute the entire Notebook in order to get the desired result.
Solution:
The following steps will help you to complete this activity:
First, we define a helper function that checks whether a given player can win the game with their next move.
def player_can_win(board, sign):
    next_moves = all_moves_from_board(board, sign)
    for next_move in next_moves:
        if game_won_by(next_move) == sign:
            return True
    return False
from random import choice

def ai_move(board):
    new_boards = all_moves_from_board(board, AI_SIGN)
    for new_board in new_boards:
        if game_won_by(new_board) == AI_SIGN:
            return new_board
    safe_moves = []
    for new_board in new_boards:
        if not player_can_win(new_board, OPPONENT_SIGN):
            safe_moves.append(new_board)
    return choice(safe_moves) if len(safe_moves) > 0 else new_boards[0]
In the preceding code snippet, we have defined the ai_move function, which tells the AI how to move by looking at all the possibilities and choosing one after which the player cannot win in the next move. If you test the new application, you will find that the AI makes the correct move.
def all_moves_from_board(board, sign):
    move_list = []
    for i, v in enumerate(board):
        if v == EMPTY_SIGN:
            new_board = board[:i] + sign + board[i+1:]
            move_list.append(new_board)
            if game_won_by(new_board) == AI_SIGN:
                return [new_board]
    if sign == AI_SIGN:
        safe_moves = []
        for move in move_list:
            if not player_can_win(move, OPPONENT_SIGN):
                safe_moves.append(move)
        return safe_moves if len(safe_moves) > 0 else move_list[0:1]
    else:
        return move_list
In the preceding code snippet, we have defined a function that generates all possible moves. As soon as we find a move that wins the game for the AI, we return it as the only option; we do not care whether there are multiple ways to win in one move – we just return the first one. When the AI moves without an immediate win, we keep only the safe moves, that is, those after which the opponent cannot win on their next turn. If no safe move exists, we return the first possible move. For the opponent, we return all possible moves.
Let's see what this means in terms of counting all of the possibilities at each step.
first_player, second_player, draw, total = count_possibilities()
The expected output is this:
step 0. Moves: 1
step 1. Moves: 9
step 2. Moves: 72
step 3. Moves: 504
step 4. Moves: 3024
step 5. Moves: 5197
step 6. Moves: 18606
step 7. Moves: 19592
step 8. Moves: 30936
First player wins: 20843
Second player wins: 962
Draw 20243
Total 42048
We are doing better than before: we have reduced the number of possible games from 266073 to 42048, and, most of the time, the AI player either wins or settles for a draw.
Note
To access the source code for this specific section, please refer to https://packt.live/2B0G9xf.
You can also run this example online at https://packt.live/2V7qLpO.
You must execute the entire Notebook in order to get the desired result.
Solution:
The following steps will help you to complete this activity:
def all_moves_from_board(board, sign):
    if sign == AI_SIGN:
        empty_field_count = board.count(EMPTY_SIGN)
        if empty_field_count == 9:
            return [sign + EMPTY_SIGN * 8]
        elif empty_field_count == 7:
            return [board[:8] + sign if board[8] == EMPTY_SIGN
                    else board[:4] + sign + board[5:]]
    move_list = []
    for i, v in enumerate(board):
        if v == EMPTY_SIGN:
            new_board = board[:i] + sign + board[i+1:]
            move_list.append(new_board)
            if game_won_by(new_board) == AI_SIGN:
                return [new_board]
    if sign == AI_SIGN:
        safe_moves = []
        for move in move_list:
            if not player_can_win(move, OPPONENT_SIGN):
                safe_moves.append(move)
        return safe_moves if len(safe_moves) > 0 else move_list[0:1]
    else:
        return move_list
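To see what the hardcoded branch produces, here is a small, self-contained trace of just that branch (constants assumed as before): on an empty board, the AI takes the top-left corner; on its second turn, it takes the bottom-right corner if it is free, otherwise the center.

```python
EMPTY_SIGN = '.'
AI_SIGN = 'X'

def hardcoded_first_moves(board, sign):
    # Mirrors only the hardcoded opening branch of all_moves_from_board
    empty_field_count = board.count(EMPTY_SIGN)
    if empty_field_count == 9:
        return [sign + EMPTY_SIGN * 8]
    elif empty_field_count == 7:
        return [board[:8] + sign if board[8] == EMPTY_SIGN
                else board[:4] + sign + board[5:]]
    return []

print(hardcoded_first_moves(EMPTY_SIGN * 9, AI_SIGN))  # ['X........']
print(hardcoded_first_moves('X...O....', AI_SIGN))     # ['X...O...X']
print(hardcoded_first_moves('X.......O', AI_SIGN))     # ['X...X...O']
```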
first_player, second_player, draw, total = count_possibilities()
The expected output is this:
step 0. Moves: 1
step 1. Moves: 1
step 2. Moves: 8
step 3. Moves: 8
step 4. Moves: 48
step 5. Moves: 38
step 6. Moves: 108
step 7. Moves: 76
step 8. Moves: 90
First player wins: 128
Second player wins: 0
Draw 60
Total 188
After fixing the first two steps, we only need to deal with 8 possibilities instead of 504, and we have guided the AI into a state where the hardcoded rules are sufficient for it to never lose a game. Fixing the steps matters not because the AI needs hardcoded openings to play, but because it gives us a tool for evaluating and comparing each step. As you can see, the AI is now invincible and will only win or draw.
The best that a player can hope to get against this AI is a draw.
Note
To access the source code for this specific section, please refer to https://packt.live/2YnUcpA.
You can also run this example online at https://packt.live/318TBtq.
You must execute the entire Notebook in order to get the desired result.
Solution:
Let's set up the TwoPlayersGame framework by writing the init method.
from easyAI import TwoPlayersGame, Human_Player

class ConnectFour(TwoPlayersGame):
    def __init__(self, players):
        self.players = players
        self.board = [0 for i in range(42)]
        self.nplayer = 1

        def generate_winning_tuples():
            tuples = []
            # horizontal
            tuples += [list(range(row*7+column, row*7+column+4, 1))
                       for row in range(6)
                       for column in range(4)]
            # vertical
            tuples += [list(range(row*7+column, row*7+column+28, 7))
                       for row in range(3)
                       for column in range(7)]
            # diagonal forward
            tuples += [list(range(row*7+column, row*7+column+32, 8))
                       for row in range(3)
                       for column in range(4)]
            # diagonal backward
            tuples += [list(range(row*7+column, row*7+column+24, 6))
                       for row in range(3)
                       for column in range(3, 7, 1)]
            return tuples

        self.tuples = generate_winning_tuples()
    def possible_moves(self):
        return [column + 1
                for column in range(7)
                if any([self.board[column + row*7] == 0
                        for row in range(6)])]

    def make_move(self, move):
        column = int(move) - 1
        for row in range(5, -1, -1):
            index = column + row*7
            if self.board[index] == 0:
                self.board[index] = self.nplayer
                return

    # optional method (speeds up the AI)
    def unmake_move(self, move):
        column = int(move) - 1
        for row in range(6):
            index = column + row*7
            if self.board[index] != 0:
                self.board[index] = 0
                return

    def lose(self):
        return any([all([(self.board[c] == self.nopponent)
                         for c in line])
                    for line in self.tuples])

    def is_over(self):
        return (self.possible_moves() == []) or self.lose()

    def show(self):
        print('\n' + '\n'.join([
            ' '.join([['.', 'O', 'X'][self.board[7*row + column]]
                      for column in range(7)])
            for row in range(6)]))

    def scoring(self):
        return -100 if self.lose() else 0

if __name__ == "__main__":
    from easyAI import AI_Player, Negamax
    ai_algo = Negamax(6)
    ConnectFour([Human_Player(), AI_Player(ai_algo)]).play()
The expected output is this:
By completing this activity, you have seen that the opponent is not perfect, but that it plays reasonably well. If you have a strong computer, you can increase the depth parameter of the Negamax algorithm. We encourage you to come up with a better heuristic.
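One possible (hypothetical) ingredient for a better heuristic is counting "threats" – winning lines in which a player already has three pieces and the fourth cell is still empty. The sketch below reuses the same line-generation logic as the class above (note that a 6x7 board has exactly 69 winning lines) and could feed into scoring with a suitable weight:

```python
def generate_winning_tuples():
    # Same line generation as in the ConnectFour class
    tuples = []
    tuples += [list(range(row*7+column, row*7+column+4, 1))
               for row in range(6) for column in range(4)]     # horizontal
    tuples += [list(range(row*7+column, row*7+column+28, 7))
               for row in range(3) for column in range(7)]     # vertical
    tuples += [list(range(row*7+column, row*7+column+32, 8))
               for row in range(3) for column in range(4)]     # diagonal forward
    tuples += [list(range(row*7+column, row*7+column+24, 6))
               for row in range(3) for column in range(3, 7)]  # diagonal backward
    return tuples

def count_threats(board, player, tuples):
    # A line with three of `player`'s pieces and one empty cell
    # is one move away from a win
    threats = 0
    for line in tuples:
        cells = [board[c] for c in line]
        if cells.count(player) == 3 and cells.count(0) == 1:
            threats += 1
    return threats

tuples = generate_winning_tuples()
print(len(tuples))  # 69

# Player 1 has three pieces stacked in the leftmost column
board = [0] * 42
for index in (21, 28, 35):
    board[index] = 1
print(count_threats(board, 1, tuples))  # 1
```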
Note
To access the source code for this specific section, please refer to https://packt.live/3esk2hI.
You can also run this example online at https://packt.live/3dnkfS5.
You must execute the entire Notebook in order to get the desired result.
Solution:
import numpy as np
import pandas as pd
from sklearn import preprocessing
from sklearn import model_selection
from sklearn import linear_model
from sklearn.preprocessing import PolynomialFeatures
file_url = 'https://raw.githubusercontent.com/' \
           'PacktWorkshops/' \
           'The-Applied-Artificial-Intelligence-Workshop/' \
           'master/Datasets/boston_house_price.csv'
df = pd.read_csv(file_url)
The output of df is as follows:
Earlier in this chapter, you learned that most of the required packages to perform linear regression come from sklearn. We need to import the preprocessing module to scale the data, the linear_model module to train linear regression, the PolynomialFeatures module to transform the inputs for the polynomial regression, and the model_selection module to evaluate the performance of each model.
features = np.array(df.drop('MEDV', axis=1))
label = np.array(df['MEDV'])
scaled_features = preprocessing.scale(features)
The output for features is as follows:
As you can see, our features have been converted into a NumPy array.
The output for the label is as follows:
As you can see, our labels have been converted into a NumPy array.
The output for scaled_features is as follows:
array([[-0.41978194, 0.28482986, -1.2879095 , ...,
-0.66660821, -1.45900038, -1.0755623 ],
[-0.41733926, -0.48772236, -0.59338101, ...,
-0.98732948, -0.30309415, -0.49243937],
[-0.41734159, -0.48772236, -0.59338101, ...,
-0.98732948, -0.30309415, -1.2087274 ],
...,
[-0.41344658, -0.48772236, 0.11573841, ...,
-0.80321172, 1.17646583, -0.98304761],
[-0.40776407, -0.48772236, 0.11573841, ...,
-0.80321172, 1.17646583, -0.86530163],
[-0.41500016, -0.48772236, 0.11573841, ...,
-0.80321172, 1.17646583, -0.66905833]])
As you can see, our features have been properly scaled.
As we don't have any missing values and we are not trying to predict a future value as we did in Exercise 2.03, Preparing the Quandl Data for Prediction, we can directly convert the label ('MEDV') and features into NumPy arrays. Then, we can scale the arrays of features using the preprocessing.scale() function.
poly_1_scaled_features = PolynomialFeatures(degree=1)\
                         .fit_transform(scaled_features)
poly_2_scaled_features = PolynomialFeatures(degree=2)\
                         .fit_transform(scaled_features)
poly_3_scaled_features = PolynomialFeatures(degree=3)\
                         .fit_transform(scaled_features)
The output for poly_1_scaled_features is as follows:
array([[ 1. , -0.41978194, 0.28482986, ..., -0.66660821,
-1.45900038, -1.0755623 ],
[ 1. , -0.41733926, -0.48772236, ..., -0.98732948,
-0.30309415, -0.49243937],
[ 1. , -0.41734159, -0.48772236, ..., -0.98732948,
-0.30309415, -1.2087274 ],
...,
[ 1. , -0.41344658, -0.48772236, ..., -0.80321172,
1.17646583, -0.98304761],
[ 1. , -0.40776407, -0.48772236, ..., -0.80321172,
1.17646583, -0.86530163],
[ 1. , -0.41500016, -0.48772236, ..., -0.80321172,
1.17646583, -0.66905833]])
Our scaled_features variable has been properly transformed for the polynomial regression of degree 1.
The output for poly_2_scaled_features is as follows:
Our scaled_features variable has been properly transformed for the polynomial regression of degree 2.
The output for poly_3_scaled_features is as follows:
array([[ 1. , -0.41978194, 0.28482986, ..., -2.28953024,
-1.68782164, -1.24424733],
[ 1. , -0.41733926, -0.48772236, ..., -0.04523847,
-0.07349928, -0.11941484],
[ 1. , -0.41734159, -0.48772236, ..., -0.11104103,
-0.4428272 , -1.76597723],
...,
[ 1. , -0.41344658, -0.48772236, ..., -1.36060852,
1.13691611, -0.9500001 ],
[ 1. , -0.40776407, -0.48772236, ..., -1.19763962,
0.88087515, -0.64789192],
[ 1. , -0.41500016, -0.48772236, ..., -0.9260248 ,
0.52663205, -0.29949664]])
Our scaled_features variable has been properly transformed for the polynomial regression of degree 3.
We had to transform the scaled features in three different ways as each degree of polynomial regression required a different input transformation.
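The cost of higher degrees is visible in the number of generated features. Here is a quick, self-contained check with 13 input features (the number of features in the Boston dataset) and a few random rows:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.random.rand(5, 13)  # 13 features, like the Boston dataset
for degree in (1, 2, 3):
    shape = PolynomialFeatures(degree=degree).fit_transform(X).shape
    print(degree, shape)
# degree 1 -> (5, 14), degree 2 -> (5, 105), degree 3 -> (5, 560)
```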
(poly_1_features_train, poly_1_features_test,
 poly_label_train, poly_label_test) = \
    model_selection.train_test_split(poly_1_scaled_features, label,
                                     test_size=0.1, random_state=8)
(poly_2_features_train, poly_2_features_test,
 poly_label_train, poly_label_test) = \
    model_selection.train_test_split(poly_2_scaled_features, label,
                                     test_size=0.1, random_state=8)
(poly_3_features_train, poly_3_features_test,
 poly_label_train, poly_label_test) = \
    model_selection.train_test_split(poly_3_scaled_features, label,
                                     test_size=0.1, random_state=8)
As we have three different sets of scaled transformed features but the same set of labels, we had to perform three different splits. By using the same set of labels and random_state in each splitting, we ensure that we obtain the same poly_label_train and poly_label_test for every split.
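A quick, self-contained sanity check (on made-up data) confirms this behavior: train_test_split produces identical label splits whenever the labels, test_size, and random_state are the same, regardless of the feature matrix passed alongside them.

```python
import numpy as np
from sklearn.model_selection import train_test_split

y = np.arange(20)
X_a = np.random.rand(20, 3)  # two different feature matrices
X_b = np.random.rand(20, 5)

_, _, _, y_test_a = train_test_split(X_a, y, test_size=0.1, random_state=8)
_, _, _, y_test_b = train_test_split(X_b, y, test_size=0.1, random_state=8)
print((y_test_a == y_test_b).all())  # True
```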
model_1 = linear_model.LinearRegression()
model_1.fit(poly_1_features_train, poly_label_train)
model_1_score_train = model_1.score(poly_1_features_train,
poly_label_train)
model_1_score_test = model_1.score(poly_1_features_test,
poly_label_test)
The output for model_1_score_train is as follows:
0.7406006443486721
The output for model_1_score_test is as follows:
0.6772229017901507
To estimate whether a model is overfitting, we need to compare its scores on the training and testing sets. If the score on the training set is much higher than on the testing set, the model is overfitting. That is the case here: the polynomial regression of degree 1 achieved a score of 0.74 on the training set compared to 0.68 on the testing set.
model_2 = linear_model.LinearRegression()
model_2.fit(poly_2_features_train, poly_label_train)
model_2_score_train = model_2.score(poly_2_features_train,
poly_label_train)
model_2_score_test = model_2.score(poly_2_features_test,
poly_label_test)
The output for model_2_score_train is as follows:
0.9251199698832675
The output for model_2_score_test is as follows:
0.8253870684280571
Our polynomial regression of degree 2 is overfitting even more than the degree 1 model, but it achieves better scores on both sets.
model_3 = linear_model.LinearRegression()
model_3.fit(poly_3_features_train, poly_label_train)
model_3_score_train = model_3.score(poly_3_features_train,
poly_label_train)
model_3_score_test = model_3.score(poly_3_features_test,
poly_label_test)
The output for model_3_score_train is as follows:
0.9910498071894897
The output for model_3_score_test is as follows:
-8430.781888645262
These results are very interesting because the polynomial regression of degree 3 managed to achieve a near-perfect score with 0.99 (1 is the maximum). This is a warning sign that our model is overfitting too much. We have the confirmation of this warning when the model is applied to the testing set and achieves a very low negative score of -8430. As a reminder, a score of 0 can be achieved by using the mean of the data as a prediction. This means that our third model managed to make worse predictions than just using the mean.
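The negative value may be surprising if you expect scores between 0 and 1. The score reported here is the R² coefficient: 0 corresponds to always predicting the mean, and any model worse than that gets a negative score. A tiny, made-up illustration:

```python
from sklearn.metrics import r2_score

y_true = [10, 20, 30]
print(r2_score(y_true, [20, 20, 20]))  # 0.0: equivalent to predicting the mean
print(r2_score(y_true, [30, 20, 10]))  # -3.0: worse than predicting the mean
```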
model_1_prediction = model_1.predict(poly_1_features_test)
model_2_prediction = model_2.predict(poly_2_features_test)
model_3_prediction = model_3.predict(poly_3_features_test)
df_prediction = pd.DataFrame(poly_label_test)
df_prediction.rename(columns={0: 'label'}, inplace=True)
df_prediction['model_1_prediction'] = \
    pd.DataFrame(model_1_prediction)
df_prediction['model_2_prediction'] = \
    pd.DataFrame(model_2_prediction)
df_prediction['model_3_prediction'] = \
    pd.DataFrame(model_3_prediction)
The output of df_prediction is as follows:
After applying the predict function for each model on their respective testing set, in order to get the predicted values, we convert them into a single df_prediction DataFrame with the label values. Increasing the number of degrees in polynomial regressions does not necessarily mean that the model will perform better compared to one with a lower degree. In fact, increasing the degree will lead to more overfitting on the training data.
Note
To access the source code for this specific section, please refer to https://packt.live/3eD8gAY.
You can also run this example online at https://packt.live/3etadjp.
You must execute the entire Notebook in order to get the desired result.
In this activity, we learned how to perform polynomial regressions of degrees 1 to 3 with multiple variables on the Boston House Price dataset and saw how increasing the degrees led to overfitted models.
Solution:
from sklearn import neighbors
def fit_knn(k, p, features_train, label_train,
            features_test, label_test):
    classifier = neighbors.KNeighborsClassifier(n_neighbors=k, p=p)
    classifier.fit(features_train, label_train)
    return (classifier.score(features_train, label_train),
            classifier.score(features_test, label_test))
acc_train_1, acc_test_1 = fit_knn(5, 2, features_train,
label_train,
features_test, label_test)
acc_train_1, acc_test_1
The expected output is this:
(0.78625, 0.75)
With k=5 and p=2, KNN achieved a good accuracy score close to 0.78 on the training set. But the scores on the training and testing sets differ noticeably, which means the model is overfitting.
acc_train_2, acc_test_2 = fit_knn(10, 2, features_train,
label_train,
features_test, label_test)
acc_train_2, acc_test_2
The expected output is this:
(0.775, 0.785)
Increasing the number of neighbors to 10 has decreased the accuracy score of the training set, but now it is very close to the testing set.
acc_train_3, acc_test_3 = fit_knn(15, 2, features_train,
label_train,
features_test, label_test)
acc_train_3, acc_test_3
The expected output is this:
(0.76625, 0.79)
With k=15 and p=2, the difference between the training and testing sets has increased.
acc_train_4, acc_test_4 = fit_knn(25, 2, features_train,
label_train,
features_test, label_test)
acc_train_4, acc_test_4
The expected output is this:
(0.7375, 0.77)
Increasing the number of neighbors to 25 has a significant impact on the training set's score, which is now below the testing set's, so the model is no longer overfitting.
acc_train_5, acc_test_5 = fit_knn(50, 2, features_train,
label_train,
features_test, label_test)
acc_train_5, acc_test_5
The expected output is this:
(0.70625, 0.775)
Bringing the number of neighbors to 50 improved neither the model's performance nor the gap between the two scores.
acc_train_6, acc_test_6 = fit_knn(5, 1, features_train,
label_train,
features_test, label_test)
acc_train_6, acc_test_6
The expected output is this:
(0.8, 0.735)
Changing to the Manhattan distance has helped increase the accuracy of the training set, but the model is still overfitting.
acc_train_7, acc_test_7 = fit_knn(10, 1, features_train,
label_train,
features_test, label_test)
acc_train_7, acc_test_7
The expected output is this:
(0.77, 0.785)
With k=10, the accuracy scores for the training and testing sets are quite close to each other: around 0.78.
acc_train_8, acc_test_8 = fit_knn(15, 1, features_train,
label_train,
features_test, label_test)
acc_train_8, acc_test_8
The expected output is this:
(0.7575, 0.775)
Bumping k to 15, the model achieved a better accuracy score and is not overfitting very much.
acc_train_9, acc_test_9 = fit_knn(25, 1, features_train,
label_train,
features_test, label_test)
acc_train_9, acc_test_9
The expected output is this:
(0.745, 0.8)
With k=25, the gap between the training and testing sets' accuracy is increasing again, though this time the testing score is the higher of the two.
acc_train_10, acc_test_10 = fit_knn(50, 1, features_train,
label_train,
features_test, label_test)
acc_train_10, acc_test_10
The expected output is this:
(0.70875, 0.78)
With k=50, the model's performance on the training set dropped significantly; the model is now clearly underfitting.
In this activity, we tried multiple combinations of hyperparameters for n_neighbors and p. The best one we found was for n_neighbors=10 and p=2. With these hyperparameters, the model is not overfitting much and it achieved an accuracy score of around 78% for both the training and testing sets.
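This kind of manual search can also be automated. As a sketch (on synthetic data, not the dataset used in the activity), scikit-learn's GridSearchCV tries every combination of the listed hyperparameters with cross-validation and keeps the best one:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=200, random_state=0)
grid = GridSearchCV(KNeighborsClassifier(),
                    {'n_neighbors': [5, 10, 15, 25, 50], 'p': [1, 2]},
                    cv=5)
grid.fit(X, y)
print(grid.best_params_)  # the best (n_neighbors, p) combination found
```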
Note
To access the source code for this specific section, please refer to https://packt.live/2V5TOtG.
You can also run this example online at https://packt.live/2Bx0yd8.
You must execute the entire Notebook in order to get the desired result.
Solution:
from sklearn import svm
def fit_svm(features_train, label_train,
            features_test, label_test,
            kernel="linear", C=1,
            degree=3, gamma='scale'):
    classifier = svm.SVC(kernel=kernel, C=C,
                         degree=degree, gamma=gamma)
    classifier.fit(features_train, label_train)
    return (classifier.score(features_train, label_train),
            classifier.score(features_test, label_test))
acc_train_1, acc_test_1 = fit_svm(features_train, label_train,
                                  features_test, label_test)
acc_train_1, acc_test_1
The expected output is this:
(0.71625, 0.75)
With the default hyperparameter values (linear model), the performance of the model is quite different between the training and the testing set.
acc_train_2, acc_test_2 = fit_svm(features_train, label_train,
                                  features_test, label_test,
                                  kernel="poly", C=1,
                                  degree=4, gamma=0.05)
acc_train_2, acc_test_2
The expected output is this:
(0.68875, 0.745)
With a fourth-degree polynomial, the model is not performing well on the training set.
acc_train_3, acc_test_3 = fit_svm(features_train, label_train,
                                  features_test, label_test,
                                  kernel="poly", C=2,
                                  degree=4, gamma=0.05)
acc_train_3, acc_test_3
The expected output is this:
(0.68875, 0.745)
Increasing the regularization parameter, C, didn't impact the model's performance at all.
acc_train_4, acc_test_4 = fit_svm(features_train, label_train,
                                  features_test, label_test,
                                  kernel="poly", C=1,
                                  degree=4, gamma=0.25)
acc_train_4, acc_test_4
The expected output is this:
(0.84625, 0.775)
Increasing the value of gamma to 0.25 has significantly improved the model's performance on the training set. However, the accuracy on the testing set is much lower, so the model is overfitting.
acc_train_5, acc_test_5 = fit_svm(features_train, label_train,
                                  features_test, label_test,
                                  kernel="poly", C=1,
                                  degree=4, gamma=0.5)
acc_train_5, acc_test_5
The expected output is this:
(0.9575, 0.73)
Increasing the value of gamma to 0.5 has drastically improved the model's performance on the training set, but it is definitely overfitting as the accuracy score on the testing set is much lower.
acc_train_6, acc_test_6 = fit_svm(features_train, label_train,
                                  features_test, label_test,
                                  kernel="poly", C=1,
                                  degree=4, gamma=0.16)
acc_train_6, acc_test_6
The expected output is this:
(0.76375, 0.785)
With gamma=0.16, the model achieved an accuracy score as good as the best KNN model's: both the training and testing sets score around 0.78.
acc_train_7, acc_test_7 = fit_svm(features_train, label_train,
                                  features_test, label_test,
                                  kernel="sigmoid")
acc_train_7, acc_test_7
The expected output is this:
(0.635, 0.66)
The sigmoid kernel achieved a low accuracy score.
acc_train_8, acc_test_8 = fit_svm(features_train, label_train,
                                  features_test, label_test,
                                  kernel="rbf", gamma=0.15)
acc_train_8, acc_test_8
The expected output is this:
(0.7175, 0.765)
The rbf kernel achieved a good score with gamma=0.15. The model is overfitting a bit, though.
acc_train_9, acc_test_9 = fit_svm(features_train, label_train,
                                  features_test, label_test,
                                  kernel="rbf", gamma=0.25)
acc_train_9, acc_test_9
The expected output is this:
(0.74, 0.765)
The model performance got better with gamma=0.25, but it is still overfitting.
acc_train_10, acc_test_10 = fit_svm(features_train, label_train,
                                    features_test, label_test,
                                    kernel="rbf", gamma=0.35)
acc_train_10, acc_test_10
The expected output is this:
(0.78125, 0.775)
With the rbf kernel and gamma=0.35, we got very similar results for the training and testing sets and the model's performance is higher than the best KNN we trained in the previous activity. This is our best model for the German credit dataset.
Note
To access the source code for this specific section, please refer to https://packt.live/3fPZlMQ.
You can also run this example online at https://packt.live/3hVlEm3.
You must execute the entire Notebook in order to get the desired result.
In this activity, we tried different values for the main hyperparameters of the SVM classifier: kernel, gamma, C, and degrees. We saw how they affected the model's performance and their tendency to overfit. With trial and error, we finally found the best hyperparameter combination and achieved an accuracy score close to 0.78. This process is called hyperparameter tuning and is an important step for any data science project.
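The pattern observed throughout this activity – higher gamma boosting the training score while risking overfitting – can be reproduced on synthetic data (this sketch uses generated data, not the German credit dataset):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
for gamma in (0.01, 0.1, 1.0, 10.0):
    clf = SVC(kernel='rbf', gamma=gamma).fit(X_train, y_train)
    # Training score tends to rise with gamma; the testing score
    # eventually falls off as the model starts to memorize
    print(gamma, clf.score(X_train, y_train), clf.score(X_test, y_test))
```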
Solution:
import pandas as pd
file_url = 'https://raw.githubusercontent.com/' \
           'PacktWorkshops/' \
           'The-Applied-Artificial-Intelligence-Workshop/' \
           'master/Datasets/car.csv'
df = pd.read_csv(file_url)
df.head()
The output will be as follows:
from sklearn import preprocessing

def encode(data_frame, column):
    label_encoder = preprocessing.LabelEncoder()
    label_encoder.fit(data_frame[column].unique())
    return label_encoder.transform(data_frame[column])

for column in df.columns:
    df[column] = encode(df, column)
df.head()
The output will be as follows:
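As a quick, standalone illustration of what encode does internally: LabelEncoder sorts the unique values and maps each value to its position in that sorted order.

```python
from sklearn import preprocessing

label_encoder = preprocessing.LabelEncoder()
label_encoder.fit(['low', 'med', 'high'])
print(list(label_encoder.classes_))                            # ['high', 'low', 'med']
print(label_encoder.transform(['low', 'med', 'high']).tolist())  # [1, 2, 0]
```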
label = df.pop('class')
from sklearn import model_selection
features_train, features_test, label_train, label_test = \
    model_selection.train_test_split(df, label,
                                     test_size=0.1,
                                     random_state=88)
from sklearn.tree import DecisionTreeClassifier
decision_tree = DecisionTreeClassifier()
decision_tree.fit(features_train, label_train)
The output will be as follows:
decision_tree.score(features_test, label_test)
The output will be as follows:
0.953757225433526
The decision tree is achieving an accuracy score of 0.95 for our first try. This is remarkable.
from sklearn.metrics import classification_report
print(classification_report(label_test,
decision_tree.predict(features_test)))
The output will be as follows:
From this classification report, we can see that our model is performing quite well for the precision scores for all four classes. Regarding the recall score, we can see that it didn't perform as well for the last class.
Note
To access the source code for this specific section, please refer to https://packt.live/3hQDLtr.
You can also run this example online at https://packt.live/2NkEEML.
You must execute the entire Notebook in order to get the desired result.
By completing this activity, you have prepared the car dataset and trained a decision tree model. You have learned how to get its accuracy score and a classification report so that you can analyze its precision and recall scores.
Solution:
from sklearn.ensemble import RandomForestClassifier
random_forest_classifier = \
    RandomForestClassifier(n_estimators=100,
                           max_depth=6, random_state=168)
random_forest_classifier.fit(features_train, label_train)
The output will be as follows:
These are the logs of the RandomForest classifier with its hyperparameter values.
rf_preds_test = random_forest_classifier.predict(features_test)
rf_preds_test
The output will be as follows:
from sklearn.metrics import classification_report
print(classification_report(label_test, rf_preds_test))
The output will be as follows:
The F1 scores in the preceding report show us that the random forest is performing well on class 2 but not as well for classes 0 and 3. The model is unable to predict class 1 accurately, but there were only 9 observations for it in the testing set. The accuracy score is 0.84, while the F1 score is 0.82.
from sklearn.metrics import confusion_matrix
confusion_matrix(label_test, rf_preds_test)
The output will be as follows:
array([[ 32, 0, 10, 0],
[ 8, 0, 0, 1],
[ 5, 0, 109, 0],
[ 3, 0, 0, 5]])
From this confusion matrix, we can see that the RandomForest model is having difficulties accurately predicting the first class. It incorrectly predicted 16 cases (8 + 5 + 3) for this class.
rf_varimp = random_forest_classifier.feature_importances_
rf_varimp
The output will be as follows:
array([0.12676384, 0.10366314, 0.02119621, 0.35266673,
0.05915769, 0.33655239])
The preceding output shows us that the most important features are the fourth and sixth ones, which correspond to persons and safety, respectively.
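To map these numbers back to column names, we can zip them together. The names below are the standard column names of the car evaluation dataset; that the CSV uses exactly these names is an assumption:

```python
import numpy as np

# Importances reported above; column names assumed from the car dataset
importances = np.array([0.12676384, 0.10366314, 0.02119621,
                        0.35266673, 0.05915769, 0.33655239])
columns = ['buying', 'maint', 'doors', 'persons', 'lug_boot', 'safety']
ranked = sorted(zip(columns, importances), key=lambda pair: -pair[1])
print([name for name, _ in ranked])
# ['persons', 'safety', 'buying', 'maint', 'lug_boot', 'doors']
```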
from sklearn.ensemble import ExtraTreesClassifier
extra_trees_classifier = \
    ExtraTreesClassifier(n_estimators=100,
                         max_depth=6, random_state=168)
extra_trees_classifier.fit(features_train, label_train)
The output will be as follows:
These are the logs of the extratrees classifier with its hyperparameter values.
et_preds_test = extra_trees_classifier.predict(features_test)
et_preds_test
The output will be as follows:
print(classification_report(label_test,
extra_trees_classifier.predict(features_test)))
The output will be as follows:
The F1 scores shown in the preceding report show us that the extratrees model is performing well on class 2 but not as well for class 0. The model is unable to predict classes 1 and 3 accurately, but there were only 9 and 8 observations for them in the testing set, respectively. The accuracy score is 0.82, while the F1 score is 0.78. So, our RandomForest classifier performed better than the extratrees classifier.
confusion_matrix(label_test, et_preds_test)
The output will be as follows:
array([[ 28, 0, 14, 0],
[ 9, 0, 0, 0],
[ 2, 0, 112, 0],
[ 7, 0, 0, 1]])
From this confusion matrix, we can see that the extratrees model is having difficulties accurately predicting the first and third classes.
et_varimp = extra_trees_classifier.feature_importances_
et_varimp
The output will be as follows:
array([0.08844544, 0.0702334 , 0.01440408, 0.37662014, 0.05965896,
0.39063797])
The preceding output shows us that the most important features are the sixth and fourth ones, which correspond to safety and persons, respectively. It is interesting to see that RandomForest has the same two most important features but in a different order.
Note
To access the source code for this specific section, please refer to https://packt.live/2YoUY5t.
You can also run this example online at https://packt.live/3eswBcW.
You must execute the entire Notebook in order to get the desired result.
Solution:
import pandas as pd
file_url = 'https://raw.githubusercontent.com/' \
           'PacktWorkshops/' \
           'The-Applied-Artificial-Intelligence-Workshop/' \
           'master/Datasets/' \
           'Sales_Transactions_Dataset_Weekly.csv'
df = pd.read_csv(file_url)
df
The output of df is as follows:
If you look at the output, you will notice that our dataset contains 811 rows, with each row representing a product. It also contains 107 columns: the first column is the product code, then 52 columns starting with W represent the sales quantity for each week, followed by the normalized versions of those 52 columns, whose names start with Normalized. The normalized columns are a better choice to work with than the absolute sales columns, W, as they will help our k-means algorithm find the center of each cluster faster. Since we are going to work on the normalized columns, we can remove every W column plus the Product_Code column. We can also remove the MIN and MAX columns as they do not bring any value to our clustering. Also notice that the weeks run from 0 to 51, not 1 to 52.
df2 = df.drop(df.iloc[:, 0:55], inplace = False, axis = 1)
The output of df2 is as follows:
In the preceding code snippet, we used the drop function of the pandas DataFrame in order to remove the first 55 columns. We also set the inplace parameter to False so as not to remove the columns from our original df DataFrame. As a result, df2 should only contain the normalized columns from 0 to 51, and df should remain unchanged.
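An alternative, arguably more readable, way to keep only the normalized columns is to select them by name rather than by position. This sketch assumes, as described above, that all normalized column names start with Normalized; the toy DataFrame mimics the dataset's layout:

```python
import pandas as pd

# Toy frame mimicking the dataset layout: code, weekly sales, min/max, normalized
df = pd.DataFrame({'Product_Code': ['P1'], 'W0': [11], 'W1': [12],
                   'MIN': [11], 'MAX': [12],
                   'Normalized 0': [0.0], 'Normalized 1': [1.0]})

# filter(like=...) keeps only columns whose name contains the given substring
df2 = df.filter(like='Normalized')
print(list(df2.columns))  # ['Normalized 0', 'Normalized 1']
```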
from sklearn.cluster import KMeans
k_means_model = KMeans(n_clusters=8, random_state=8)
k_means_model.fit(df2)
We build a k-means model with the default value for every parameter except for n_clusters=8 with random_state=8 in order to obtain 8 clusters and reproducible results.
labels = k_means_model.labels_
labels
The output of labels will be as follows:
It is very hard to make sense out of this output on its own, but each entry of labels represents the cluster that the corresponding product has been assigned to, based on similar weekly sales trends. We can now use these cluster labels to group products together.
df.drop(df.iloc[:, 53:], inplace = True, axis = 1)
df.drop('Product_Code', inplace = True, axis = 1)
df['label'] = labels
df
In the preceding code snippet, we removed all the unneeded columns and added labels as a new column in the DataFrame.
The output of df will be as follows:
Now that we have the label, we can perform aggregation on the label column in order to calculate the yearly average sales of each cluster.
df_agg = df.groupby('label').sum()
df_final = df[['label','W0']].groupby('label').count()
df_final=df_final.rename(columns = {'W0':'count_product'})
df_final['total_sales'] = df_agg.sum(axis = 1)
df_final['yearly_average_sales'] = \
    df_final['total_sales'] / df_final['count_product']
df_final.sort_values(by='yearly_average_sales',
ascending=False, inplace = True)
df_final
In the preceding code snippet, we first used the groupby function with the sum() method of the DataFrame to calculate the sum of every product's sales for each W column and cluster, and stored the results in df_agg. We then used the groupby function with the count() method on a single column (an arbitrary choice) of df to obtain the total number of products per cluster (note that we also had to rename the W0 column after the aggregation). The next step was to sum all the sales columns of df_agg in order to obtain the total sales for each cluster. Finally, we calculated the yearly_average_sales for each cluster by dividing total_sales by count_product. We also included a final step to sort the clusters by the highest yearly_average_sales.
The output of df_final will be as follows:
Now, with this output, we see that our k-means model has managed to put similarly performing products together. We can easily see that the 115 products in cluster 3 are the best-selling products, whereas the 123 products of cluster 1 are performing very badly. This is very valuable for any business, as it helps them automatically identify and group together a number of similarly performing products without having any bias in the product name or description.
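The per-cluster summary built above can also be sketched in a single pass over a toy DataFrame, which makes the aggregation logic easy to check by hand (the sales figures are made up for illustration):

```python
import pandas as pd

# Toy data: 4 products, 2 weekly sales columns, and a cluster label
df = pd.DataFrame({'W0': [10, 20, 30, 40],
                   'W1': [1, 2, 3, 4],
                   'label': [0, 0, 1, 1]})

grouped = df.groupby('label')
df_final = pd.DataFrame({
    'count_product': grouped.size(),                      # products per cluster
    'total_sales': grouped[['W0', 'W1']].sum().sum(axis=1),  # sales per cluster
})
df_final['yearly_average_sales'] = (df_final['total_sales']
                                    / df_final['count_product'])
df_final.sort_values('yearly_average_sales', ascending=False, inplace=True)
print(df_final)
```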
Note
To access the source code for this specific section, please refer to https://packt.live/3fVpSbT.
You can also run this example online at https://packt.live/3hW24Gk.
You must execute the entire Notebook in order to get the desired result.
By completing this activity, you have learned how to perform k-means clustering on multiple columns for many products. You have also learned how useful clustering can be for a business, even without label data.
Solution:
import pandas as pd
import numpy as np
from sklearn import preprocessing
from sklearn.cluster import MeanShift
from sklearn.cluster import AgglomerativeClustering
from scipy.cluster.hierarchy import dendrogram
import scipy.cluster.hierarchy as sch
from sklearn import metrics
file_url = 'https://raw.githubusercontent.com/' \
           'PacktWorkshops/' \
           'The-Applied-Artificial-Intelligence-Workshop/' \
           'master/Datasets/winequality-red.csv'
df = pd.read_csv(file_url,sep=';')
df
The output of df is as follows:
Note
The output from the preceding screenshot is truncated.
Our dataset contains 1599 rows, with each row representing a red wine. It also contains 12 columns, with the last column being the quality of the wine. We can see that the remaining 11 columns will be our features, and we need to scale them in order to help the accuracy and speed of our models.
features = df.drop('quality', axis=1)
label = df['quality']
scaled_features = preprocessing.scale(features)
In the preceding code snippet, we separated the label (quality) from the features. Then we used the preprocessing.scale function from sklearn to scale our features, as this will improve our models.
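As a quick sanity check, preprocessing.scale standardizes each column to zero mean and unit variance, which is what puts all 11 features on a comparable scale:

```python
import numpy as np
from sklearn import preprocessing

# Two toy features on very different scales
features = np.array([[1.0, 100.0],
                     [2.0, 200.0],
                     [3.0, 300.0]])
scaled = preprocessing.scale(features)

# Each column now has mean ~0 and standard deviation ~1
print(scaled.mean(axis=0).round(6))
print(scaled.std(axis=0).round(6))
```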
mean_shift_model = MeanShift()
mean_shift_model.fit(scaled_features)
n_cluster_mean_shift = len(mean_shift_model.cluster_centers_)
label_mean_shift = mean_shift_model.labels_
n_cluster_mean_shift
The output of n_cluster_mean_shift will be as follows:
10
Our mean shift model has created 10 clusters, which is already more than the number of groups that we have in our quality label. This will probably affect our extrinsic scores and might be an early indicator that wines sharing similar physicochemical properties don't belong in the same quality group.
The output of label_mean_shift will be as follows:
This is a very interesting output because it clearly shows that most wines in our dataset are very similar; there are a lot more wines in cluster 0 than in the other clusters.
dendrogram = sch.dendrogram(sch.linkage(scaled_features,
method='ward'))
agglomerative_model = AgglomerativeClustering(n_clusters=7,
                                              affinity='euclidean',
                                              linkage='ward')
agglomerative_model.fit(scaled_features)
label_agglomerative = agglomerative_model.labels_
The output of dendrogram will be as follows:
From this output, we can see that seven clusters seems to be the optimal number for our model. We get this number by searching for the highest difference on the y axis between the lowest branch and the highest branch. In our case, for seven clusters, the lowest branch has a value of 29 and the highest branch has a value of 41.
The output of label_agglomerative will be as follows:
We can see that we have a predominant cluster, 1, but not as much as was the case in the mean shift model.
a. Begin with the adjusted Rand index:
ARI_mean=metrics.adjusted_rand_score(label, label_mean_shift)
ARI_agg=metrics.adjusted_rand_score(label, label_agglomerative)
ARI_mean
The output of ARI_mean will be as follows:
0.0006771608724007207
Next, enter ARI_agg to get the expected values:
ARI_agg
The output of ARI_agg will be as follows:
0.05358047852603172
Our agglomerative model has a much higher adjusted_rand_score than the mean shift model, but both scores are very close to 0, which means that neither model is performing very well with regard to the true labels.
b. Next, calculate the adjusted mutual information:
AMI_mean = metrics.adjusted_mutual_info_score(label,
label_mean_shift)
AMI_agg = metrics.adjusted_mutual_info_score(label,
label_agglomerative)
AMI_mean
The output of AMI_mean will be as follows:
0.004837187596124968
Next, enter AMI_agg to get the expected values:
AMI_agg
The output of AMI_agg will be as follows:
0.05993098663692826
Our agglomerative model has a much higher adjusted_mutual_info_score than the mean shift model, but both scores are very close to 0, which means that neither model is performing very well with regard to the true labels.
c. Calculate the V-Measure:
V_mean = metrics.v_measure_score(label,
label_mean_shift, beta=1)
V_agg = metrics.v_measure_score(label,
label_agglomerative, beta=1)
V_mean
The output of V_mean will be as follows:
0.021907254751144124
Next, enter V_agg to get the expected values:
V_agg
The output of V_agg will be as follows:
0.07549735446050691
Our agglomerative model has a higher V-Measure than the mean shift model, but both scores are very close to 0, which means that neither model is performing very well with regard to the true labels.
d. Next, find the Fowlkes-Mallows score:
FM_mean = metrics.fowlkes_mallows_score(label,
label_mean_shift)
FM_agg= metrics.fowlkes_mallows_score(label,
label_agglomerative)
FM_mean
The output of FM_mean will be as follows:
0.5721233634622408
Next, enter FM_agg to get the expected values:
FM_agg
The output of FM_agg will be as follows:
0.3300681478007641
This time, our mean shift model has a higher Fowlkes-Mallows score than the agglomerative model, but both scores are still on the lower range of the score, which means that neither model is performing very well with regard to the true labels.
In conclusion, with the extrinsic approach evaluation, neither of our models were able to find clusters containing wines of a similar quality based on their physicochemical properties. We will confirm this by using the intrinsic approach evaluation to ensure that our models' clusters are well defined and are properly grouping similar wines together.
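All four extrinsic scores follow the same (labels_true, labels_pred) calling convention, so they can be gathered in one loop. The toy labels below are hypothetical and only illustrate the pattern:

```python
from sklearn import metrics

# Toy ground truth and two candidate clusterings (hypothetical values)
label = [0, 0, 1, 1, 2, 2]
clusters_a = [0, 0, 1, 1, 2, 2]  # perfect match with the truth
clusters_b = [0, 1, 0, 1, 0, 1]  # unrelated to the truth

extrinsic = {'ARI': metrics.adjusted_rand_score,
             'AMI': metrics.adjusted_mutual_info_score,
             'V-Measure': metrics.v_measure_score,
             'Fowlkes-Mallows': metrics.fowlkes_mallows_score}

for name, score in extrinsic.items():
    print(f'{name}: A={score(label, clusters_a):.3f} '
          f'B={score(label, clusters_b):.3f}')
```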
a. Begin with the Silhouette Coefficient:
Sil_mean = metrics.silhouette_score(scaled_features,
label_mean_shift)
Sil_agg = metrics.silhouette_score(scaled_features,
label_agglomerative)
Sil_mean
The output of Sil_mean will be as follows:
0.32769323700400077
Next, enter Sil_agg to get the expected values:
Sil_agg
The output of Sil_agg will be as follows:
0.1591882574407987
Our mean shift model has a higher Silhouette Coefficient than the agglomerative model, but both scores are very close to 0, which means that both models have overlapping clusters.
b. Next, find the Calinski-Harabasz index:
CH_mean = metrics.calinski_harabasz_score(scaled_features,
label_mean_shift)
CH_agg = metrics.calinski_harabasz_score(scaled_features,
label_agglomerative)
CH_mean
The output of CH_mean will be as follows:
44.62091774102674
Next, enter CH_agg to get the expected values:
CH_agg
The output of CH_agg will be as follows:
223.5171774491095
Our agglomerative model has a much higher Calinski-Harabasz index than the mean shift model, which means that the agglomerative model has much more dense and well-defined clusters than the mean shift model.
c. Finally, find the Davies-Bouldin index:
DB_mean = metrics.davies_bouldin_score(scaled_features,
label_mean_shift)
DB_agg = metrics.davies_bouldin_score(scaled_features,
label_agglomerative)
DB_mean
The output of DB_mean will be as follows:
0.8106334674570222
Next, enter DB_agg to get the expected values:
DB_agg
The output of DB_agg will be as follows:
1.4975443816135114
Our agglomerative model has a higher Davies-Bouldin index than the mean shift model. Since lower values indicate better-defined clusters, the mean shift model's clusters are better separated here, although both scores remain low enough to suggest reasonably well-defined clusters.
Note
To access the source code for this specific section, please refer to https://packt.live/2YXMl0U.
You can also run this example online at https://packt.live/2Bs7sAp.
You must execute the entire Notebook in order to get the desired result.
In conclusion, with the intrinsic approach evaluation, both our models were well defined and confirm our intuition on the red wine dataset, that is, similar physicochemical properties are not associated with similar quality. We were also able to see that in most of our scores, the agglomerative hierarchical model performs better than the mean shift model.
Solution:
import tensorflow.keras.datasets.mnist as mnist
(features_train, label_train), \
    (features_test, label_test) = mnist.load_data()
label_train
The expected output is this:
array([5, 0, 4, ..., 5, 6, 8], dtype=uint8)
The label column contains numeric values that correspond to the 10 handwritten digits: 0 to 9.
features_train.shape
The expected output is this:
(60000, 28, 28)
The training set is composed of 60,000 observations of shape 28 by 28. We will need to flatten the input for our neural network.
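Flattening keeps the batch dimension and merges the two pixel dimensions into one 784-value vector per image, as this NumPy sketch with a dummy batch shows:

```python
import numpy as np

# A dummy batch of 5 grayscale images, 28 by 28 pixels each
images = np.zeros((5, 28, 28))

# Flattening keeps the batch dimension and merges the pixel dimensions
flat = images.reshape(len(images), -1)
print(flat.shape)  # (5, 784)
```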
features_test.shape
The expected output is this:
(10000, 28, 28)
The testing set is composed of 10,000 observations of shape 28 by 28.
features_train = features_train / 255.0
features_test = features_test / 255.0
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers
np.random.seed(8)
tf.random.set_seed(8)
model = tf.keras.Sequential()
input_layer = layers.Flatten(input_shape=(28,28))
layer1 = layers.Dense(128, activation='relu')
final_layer = layers.Dense(10, activation='softmax')
model.add(input_layer)
model.add(layer1)
model.add(layers.Dropout(0.25))
model.add(final_layer)
optimizer = tf.keras.optimizers.Adam(0.001)
model.compile(loss='sparse_categorical_crossentropy',
optimizer=optimizer,
metrics=['accuracy'])
model.summary()
The expected output is this:
This output summarizes the architecture of our neural network. We can see it is composed of four layers: one flatten layer, two dense layers, and one dropout layer.
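The parameter counts in the summary can be verified by hand: a Dense layer has inputs * units weights plus units biases, while Flatten and Dropout add no parameters. A minimal sketch for the architecture defined above:

```python
# Parameters of a Dense layer: weights (inputs * units) + biases (units)
def dense_params(inputs, units):
    return inputs * units + units

layer1 = dense_params(28 * 28, 128)  # after flattening: 784 inputs
final = dense_params(128, 10)
print(layer1, final, layer1 + final)  # 100480 1290 101770
```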
callback = tf.keras.callbacks.EarlyStopping(monitor='val_loss',
patience=5)
model.fit(features_train, label_train, epochs=10,
validation_split = 0.2,
callbacks=[callback], verbose=2)
The expected output is this:
We achieved an accuracy score of 0.9825 on the training set and 0.9779 on the validation set for recognizing handwritten digits after just 10 epochs. These are excellent results. In this section, you learned how to build and train a neural network from scratch using TensorFlow to classify digits.
Note
To access the source code for this specific section, please refer to https://packt.live/37UWf7E.
You can also run this example online at https://packt.live/317R2b3.
You must execute the entire Notebook in order to get the desired result.
Solution:
import tensorflow.keras.datasets.fashion_mnist as fashion_mnist
(features_train, label_train), \
    (features_test, label_test) = fashion_mnist.load_data()
features_train.shape
The expected output is this:
(60000, 28, 28)
The training set is composed of 60,000 images of size 28*28.
features_test.shape
The expected output is this:
(10000, 28, 28)
The testing set is composed of 10,000 images of size 28*28.
features_train = features_train.reshape(60000, 28, 28, 1)
features_test = features_test.reshape(10000, 28, 28, 1)
features_train = features_train / 255.0
features_test = features_test / 255.0
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers
np.random.seed(8)
tf.random.set_seed(8)
model = tf.keras.Sequential()
conv_layer1 = layers.Conv2D(64, (3,3),
activation='relu', input_shape=(28, 28, 1))
conv_layer2 = layers.Conv2D(64, (3,3), activation='relu')
fc_layer1 = layers.Dense(128, activation='relu')
fc_layer2 = layers.Dense(10, activation='softmax')
model.add(conv_layer1)
model.add(layers.MaxPooling2D(2, 2))
model.add(conv_layer2)
model.add(layers.MaxPooling2D(2, 2))
model.add(layers.Flatten())
model.add(fc_layer1)
model.add(fc_layer2)
optimizer = tf.keras.optimizers.Adam(0.001)
model.compile(loss='sparse_categorical_crossentropy',
optimizer=optimizer, metrics=['accuracy'])
model.summary()
The expected output is this:
The summary shows us that there are more than 240,000 parameters to be optimized with this model.
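That total can be verified by hand: a Conv2D layer has kernel_height * kernel_width * input_channels * filters weights plus one bias per filter; each 3x3 convolution shrinks the spatial size by 2 and each 2x2 pooling halves it. A sketch for the architecture above:

```python
def conv_params(kh, kw, in_ch, filters):
    # kernel weights per filter plus one bias per filter
    return kh * kw * in_ch * filters + filters

def dense_params(inputs, units):
    return inputs * units + units

conv1 = conv_params(3, 3, 1, 64)   # 640
conv2 = conv_params(3, 3, 64, 64)  # 36928
# Spatial sizes: 28 -> 26 (conv) -> 13 (pool) -> 11 (conv) -> 5 (pool)
flat = 5 * 5 * 64                  # 1600 inputs to the first Dense layer
fc1 = dense_params(flat, 128)      # 204928
fc2 = dense_params(128, 10)        # 1290
total = conv1 + conv2 + fc1 + fc2
print(total)  # 243786
```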
model.fit(features_train, label_train,
epochs=5, validation_split = 0.2, verbose=2)
The expected output is this:
After training for 5 epochs, we achieved an accuracy score of 0.925 for the training set and 0.9042 for the validation set. Our model is overfitting a bit.
model.evaluate(features_test, label_test)
The expected output is this:
10000/10000 [==============================] - 1s 108us/sample - loss: 0.2746 - accuracy: 0.8976
[0.27461639745235444, 0.8976]
We achieved an accuracy score of 0.8976 on the testing set for predicting images of clothing from the Fashion MNIST dataset. You can try on your own to improve this score and reduce the overfitting.
Note
To access the source code for this specific section, please refer to https://packt.live/2Nzt6pn.
You can also run this example online at https://packt.live/2NlM5nd.
You must execute the entire Notebook in order to get the desired result.
In this activity, we designed and trained a CNN architecture for recognizing images of clothing from the Fashion MNIST dataset.
Solution:
import pandas as pd
import numpy as np
file_url = 'https://raw.githubusercontent.com/' \
           'PacktWorkshops/' \
           'The-Applied-Artificial-Intelligence-Workshop/' \
           'master/Datasets/yahoo_spx.csv'
df = pd.read_csv(file_url)
stock_data = df.iloc[:, 1:2].values
from sklearn.preprocessing import MinMaxScaler
sc = MinMaxScaler()
stock_data_scaled = sc.fit_transform(stock_data)
X_data = []
y_data = []
window = 30
for i in range(window, len(df)):
X_data.append(stock_data_scaled[i - window:i, 0])
y_data.append(stock_data_scaled[i, 0])
y_data will contain the opening stock price for each day and X_data will contain the last 30 days' stock prices.
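The sliding window is easier to see on a tiny series (the values below are made up): with window = 3, each sample holds the 3 previous values and the target is the next one:

```python
series = [10, 11, 12, 13, 14]
window = 3

X_data, y_data = [], []
for i in range(window, len(series)):
    X_data.append(series[i - window:i])  # the previous `window` values
    y_data.append(series[i])             # the value to predict

print(X_data)  # [[10, 11, 12], [11, 12, 13]]
print(y_data)  # [13, 14]
```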
X_data = np.array(X_data)
y_data = np.array(y_data)
X_data = np.reshape(X_data, (X_data.shape[0],
X_data.shape[1], 1))
features_train = X_data[:1000]
label_train = y_data[:1000]
features_test = X_data[1000:]
label_test = y_data[1000:]
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers
np.random.seed(8)
tf.random.set_seed(8)
model = tf.keras.Sequential()
lstm_layer1 = layers.LSTM(units=50,return_sequences=True,
input_shape=(features_train.shape[1], 1))
lstm_layer2 = layers.LSTM(units=50,return_sequences=True)
lstm_layer3 = layers.LSTM(units=50,return_sequences=True)
lstm_layer4 = layers.LSTM(units=50)
fc_layer = layers.Dense(1)
model.add(lstm_layer1)
model.add(layers.Dropout(0.2))
model.add(lstm_layer2)
model.add(layers.Dropout(0.2))
model.add(lstm_layer3)
model.add(layers.Dropout(0.2))
model.add(lstm_layer4)
model.add(layers.Dropout(0.2))
model.add(fc_layer)
optimizer = tf.keras.optimizers.Adam(0.001)
model.compile(loss='mean_squared_error',
optimizer=optimizer, metrics=['mse'])
model.summary()
The expected output is this:
The summary shows us that there are more than 71,051 parameters to be optimized with this model.
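That figure can be verified by hand: an LSTM layer has four gates, each with input weights, recurrent weights, and biases, giving 4 * (units * (input_dim + units) + units) parameters, while Dropout adds none. A sketch for the stack defined above:

```python
def lstm_params(input_dim, units):
    # 4 gates, each with input weights, recurrent weights, and biases
    return 4 * (units * (input_dim + units) + units)

layer1 = lstm_params(1, 50)   # 10400 (the input has one feature)
layer2 = lstm_params(50, 50)  # 20200 (layers 2, 3, and 4 are identical)
fc = 50 * 1 + 1               # 51 (final Dense layer)
total = layer1 + 3 * layer2 + fc
print(total)  # 71051
```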
model.fit(features_train, label_train, epochs=10,
validation_split = 0.2, verbose=2)
The expected output is this:
After training for 10 epochs, we achieved a mean squared error score of 0.0025 for the training set and 0.0033 for the validation set. Our model is overfitting a little bit.
model.evaluate(features_test, label_test)
The expected output is this:
1000/1000 [==============================] - 0s 279us/sample - loss: 0.0016 - mse: 0.0016
[0.00158528157370165, 0.0015852816]
We achieved a mean squared error score of 0.0016 on the testing set, which means we can quite accurately predict the stock price of Yahoo using the last 30 days' stock price data as features.
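Note that these predictions are on the scaled [0, 1] range; to express them in the original price units, the scaler's inverse_transform can be applied. A minimal sketch with made-up prices:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Fit a scaler on toy prices, just as sc was fitted on stock_data above
prices = np.array([[10.0], [20.0], [30.0]])
sc = MinMaxScaler()
scaled = sc.fit_transform(prices)

# Predictions come out in the scaled [0, 1] range; map them back to prices
preds_scaled = np.array([[0.5]])
preds = sc.inverse_transform(preds_scaled)
print(preds)  # [[20.]]
```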
Note
To access the source code for this specific section, please refer to https://packt.live/3804U8P.
You can also run this example online at https://packt.live/3hWtU5l.
You must execute the entire Notebook in order to get the desired result.
In this activity, we designed and trained an RNN model to predict the Yahoo stock price from the previous 30 days of data.