Appendix

About

This section is included to assist the students to perform the activities in the book. It includes detailed steps that are to be performed by the students to achieve the objectives of the activities.

Chapter 1: Principles of AI

In the code, backslash () indicates a line break, where the code does not fit a line. A backslash at the end of the line escapes the newline character. This means that the content in the line following the backslash should be read as if it started where the backslash character is.

Activity 1: Generating All Possible Sequences of Steps in the tic-tac-toe Game

This section will explore the combinatoric explosion possible when two players play randomly. We will be using a program, building on the previous results that generate all possible sequences of moves between a computer player and a human player. Determine the number of different wins, losses, and draws in terms of action sequences. Assume that the human player may make any possible move. In this example, given that the computer player is playing randomly, we will examine the wins, losses, and draws belonging to two randomly playing players:

Create a function that maps the all_moves_from_board function on each element of a list of boards. This way, we will have all of the nodes of a decision tree in each depth:
def all_moves_from_board(board, sign):
    move_list = []
    for i, v in enumerate(board):
        if v == EMPTY_SIGN:
            move_list.append(board[:i] + sign + board[i+1:])
    return move_list
The decision tree starts with [ EMPTY_SIGN * 9 ], and expands after each move:
all_moves_from_board_list( [ EMPTY_SIGN * 9 ], AI_SIGN )
The output is as follows:
['X........',
'.X.......',
'..X......',
'...X.....',
'....X....',
'.....X...',
'......X..',
'.......X.',
'........X']
['XO.......',
'X.O......',
'X..O.....',
'X...O....',
'X....O...',
'X.....O..',
'X......O.',
.
.
.
.
'......OX.',
'.......XO',
'O.......X',
'.O......X',
'..O.....X',
'...O....X',
'....O...X',
'.....O..X',
'......O.X',
'.......OX']
Let's create a filter_wins function that takes the ended games out from the list of moves and appends them in an array containing the board states won by the AI player and the opponent player:
def filter_wins(move_list, ai_wins, opponent_wins):
    for board in move_list:
        won_by = game_won_by(board)
        if won_by == AI_SIGN:
            ai_wins.append(board)
            move_list.remove(board)
        elif won_by == OPPONENT_SIGN:
            opponent_wins.append(board)
            move_list.remove(board)
In this function, the three lists can be considered as reference types. This means that the function does not return a value, instead but it manipulating these three lists without returning them.
Let's finish this section. Then with a count_possibilities function that prints the number of decision tree leaves that ended with a draw, won by the first player, and won by the second player:
def count_possibilities():
    board = EMPTY_SIGN * 9
    move_list = [board]
    ai_wins = []
    opponent_wins = []
    for i in range(9):
        print('step ' + str(i) + '. Moves: ' +         str(len(move_list)))
        sign = AI_SIGN if i % 2 == 0 else OPPONENT_SIGN
        move_list = all_moves_from_board_list(move_list, sign)
        filter_wins(move_list, ai_wins, opponent_wins)
    print('First player wins: ' + str(len(ai_wins)))
    print('Second player wins: ' + str(len(opponent_wins)))
    print('Draw', str(len(move_list)))
    print('Total', str(len(ai_wins) + len(opponent_wins) +     len(move_list)))
We have up to 9 steps in each state. In the 0th, 2nd, 4th, 6th, and 8th iteration, the AI player moves. In all other iterations, the opponent moves. We create all possible moves in all steps and take out the ended games from the move list.
Then execute the number of possibilities to experience the combinatoric explosion.
count_possibilities()
The output is as follows:
step 0. Moves: 1
step 1. Moves: 9
step 2. Moves: 72
step 3. Moves: 504
step 4. Moves: 3024
step 5. Moves: 13680
step 6. Moves: 49402
step 7. Moves: 111109
step 8. Moves: 156775
First player wins: 106279
Second player wins: 68644
Draw 91150
Total 266073

As you can see, the tree of board states consists of 266,073 leaves. The count_possibilities function essentially implements a breadth first search algorithm to traverse all the possible states of the game. Notice that we do count these states multiple times, because placing an X on the top-right corner on step 1 and placing an X on the top-left corner on step 3 leads to similar possible states as starting with the top-left corner and then placing an X on the top-right corner. If we implemented a detection of duplicate states, we would have to check less nodes. However, at this stage, due to the limited depth of the game, we omit this step.

Chapter 2: AI with Search Techniques and Games

Activity 2: Teach the agent realize situations when it defends against losses

Follow these steps to complete the activity:

Create a function player_can_win such that it takes all moves from the board using the all_moves_from_board function and iterates over it using a variable next_move. On each iteration, it checks if the game can be won by the sign, then it return true else false.
def player_can_win(board, sign):
    next_moves = all_moves_from_board(board, sign)
    for next_move in next_moves:
        if game_won_by(next_move) == sign:
            return True
    return False
We will extend the AI move such that it prefers making safe moves. A move is safe if the opponent cannot win the game in the next step.
def ai_move(board):
    new_boards = all_moves_from_board(board, AI_SIGN)
    for new_board in new_boards:
        if game_won_by(new_board) == AI_SIGN:
            return new_board
    safe_moves = []
    for new_board in new_boards:
        if not player_can_win(new_board, OPPONENT_SIGN):
            safe_moves.append(new_board)
    return choice(safe_moves) if len(safe_moves) > 0 else         new_boards[0]
You can test our new application. You will find the AI has made the correct move.
We will now place this logic in the state space generator and check how well the computer player is doing by generating all the possible games.
def all_moves_from_board( board, sign ):
We will now place this logic in the state space generator and check how well the computer player is doing by generating all the possible games.
def all_moves_from_board(board, sign):
    move_list = []
    for i, v in enumerate(board):
        if v == EMPTY_SIGN:
            new_board = board[:i] + sign + board[i+1:]
            move_list.append(new_board)
            if game_won_by(new_board) == AI_SIGN:
                return [new_board]
    if sign == AI_SIGN:
        safe_moves = []
        for move in move_list:
            if not player_can_win(move, OPPONENT_SIGN):
                safe_moves.append(move)
        return safe_moves if len(safe_moves) > 0 else             move_list[0:1]
    else:
        return move_list
Count the possibilities that as possible.
count_possibilities()
The output is as follows:
step 0. Moves: 1
step 1. Moves: 9
step 2. Moves: 72
step 3. Moves: 504
step 4. Moves: 3024
step 5. Moves: 5197
step 6. Moves: 18606
step 7. Moves: 19592
step 8. Moves: 30936
First player wins: 20843
Second player wins: 962
Draw 20243
Total 42048

We are doing better than before. We not only got rid of almost 2/3 of possible games again, but most of the time, the AI player either wins or settles for a draw. Despite our effort to make the AI better, it can still lose in 962 ways. We will eliminate all these losses in the next activity.

Activity 3: Fix the first and second moves of the AI to make it invincible

Follow these steps to complete the activity:

We will count the number of empty fields in the board and make a hard-coded move in case there are 9 or 7 empty fields. You can experiment with different hard coded moves. We found that occupying any corner, then occupying the opposite corner leads to no losses. If the opponent occupied the opposite corner, making a move in the middle results in no losses.
def all_moves_from_board(board, sign):
    if sign == AI_SIGN:
        empty_field_count = board.count(EMPTY_SIGN)
        if empty_field_count == 9:
            return [sign + EMPTY_SIGN * 8]
        elif empty_field_count == 7:
            return [
                board[:8] + sign if board[8] ==                     EMPTY_SIGN else
                board[:4] + sign + board[5:]
            ]
    move_list = []
    for i, v in enumerate(board):
        if v == EMPTY_SIGN:
            new_board = board[:i] + sign + board[i+1:]
            move_list.append(new_board)
            if game_won_by(new_board) == AI_SIGN:
                return [new_board]
    if sign == AI_SIGN:
        safe_moves = []
        for move in move_list:
            if not player_can_win(move, OPPONENT_SIGN):
                safe_moves.append(move)
        return safe_moves if len(safe_moves) > 0 else             move_list[0:1]
    else:
        return move_list
Let's verify the state space
countPossibilities()
The output is as follows:
step 0. Moves: 1
step 1. Moves: 1
step 2. Moves: 8
step 3. Moves: 8
step 4. Moves: 48
step 5. Moves: 38
step 6. Moves: 108
step 7. Moves: 76
step 8. Moves: 90
First player wins: 128
Second player wins: 0
Draw 60
After fixing the first two steps, we only need to deal with 8 possibilities instead of 504. We also guided the AI into a state, where the hard-coded rules were sufficient for never losing a game.
Fixing the steps is not important because we would give the AI hard coded steps to start with, but it is important, because it is a tool to evaluate and compare each step.
After fixing the first two steps, we only need to deal with 8 possibilities instead of 504. We also guided the AI into a state, where the hard-coded rules were sufficient for never losing a game.

Activity 4: Connect Four

This section will practice using the EasyAI library and develop a heuristic. We will be using connect four game. The game board is seven cells wide and cells high. When you make a move, you can only select the column in which you drop your token. Then gravity pulls the token down to the lowest possible empty cell. Your objective is to connect four of your own tokens horizontally, vertically, or diagonally, before your opponent does this, or you run out of empty spaces. The rules of the game can be found at: https://en.wikipedia.org/wiki/Connect_Four

Let's set up the TwoPlayersGame framework:
from easyAI import TwoPlayersGame
from easyAI.Player import Human_Player
class ConnectFour(TwoPlayersGame):
    def __init__(self, players):
        self.players = players
    def possible_moves(self):
        return []
    def make_move(self, move):
        return
    def unmake_move(self, move):
# optional method (speeds up the AI)
        return
    def lose(self):
        return False
    def is_over(self):
        return (self.possible_moves() == []) or self.lose()
    def show(self):
        print ('board')
    def scoring(self):
        return -100 if self.lose() else 0

if __name__ == "__main__":
    from easyAI import AI_Player, Negamax
    ai_algo = Negamax(6)
We can leave a few functions from the definition intact. We have to implement the following methods:
__init__
possible_moves
make_move
unmake_move (optional)
lose
show
We will reuse the basic scoring function from tic-tac-toe. Once you test out the game, you will see that the game is not unbeatable, but plays surprisingly well, even though we are only using basic heuristics.
Let's write the init method. We will define the board as a one-dimensional list, similar to the tic-tac-toe example. We could use a two-dimensional list too, but modeling will not get much easier or harder. Beyond making initializations like we did in the tic-tac-toe game, we will work a bit ahead. We will generate all of the possible winning combinations in the game and save them for future use:
def __init__(self, players):
        self.players = players
        # 0 1 2 3 4 5 6
        # 7 8 9 10 11 12 13
        # ...
        # 35 36 37 38 39 40 41
        self.board = [0 for i in range(42)]
        self.nplayer = 1 # player 1 starts.
        def generate_winning_tuples():
            tuples = []
            # horizontal
            tuples += [
                list(range(row*7+column, row*7+column+4, 1))
                for row in range(6)
                for column in range(4)]
            # vertical
            tuples += [
                list(range(row*7+column, row*7+column+28, 7))
                for row in range(3)
                for column in range(7)
            ]
            # diagonal forward
            tuples += [
                list(range(row*7+column, row*7+column+32, 8))
                for row in range(3)
                for column in range(4)
            ]
            # diagonal backward
            tuples += [
                list(range(row*7+column, row*7+column+24, 6))
                for row in range(3)
                for column in range(3, 7, 1)
            ]
            return tuples
self.tuples=generate_winning_tuples()
Let's handle the moves. The possible moves function is a simple enumeration. Notice we are using column indices from 1 to 7 in the move names, because it is more convenient to start column indexing with 1 in the human player interface than with zero. For each column, we check if there is an unoccupied field. If there is one, we will make the column a possible move.
def possible_moves(self):
        return [column+1
                for column in range(7)
                if any([
                    self.board[column+row*7] == 0
                    for row in range(6)
                ])
                ]
Making a move is similar to the possible moves function. We check the column of the move, and find the first empty cell starting from the bottom. Once we find it, we occupy it. You can also read the implementation of the dual of the make_move function: unmake_move. In the unmake_move function, we check the column from top to down, and we remove the move at the first non-empty cell. Notice we rely on the internal representation of easyAi so that it does not undo moves that it had not made. Otherwise, this function would remove a token of the other player without checking whose token got removed.
def make_move(self, move):
        column = int(move) - 1
        for row in range(5, -1, -1):
            index = column + row*7
            if self.board[index] == 0:
                self.board[index] = self.nplayer
                return
    def unmake_move(self, move):
# optional method (speeds up the AI)
        column = int(move) - 1
        for row in range(6):
            index = column + row*7
            if self.board[index] != 0:
                self.board[index] = 0
                return
As we already have the tuples that we have to check, we can mostly reuse the lose function from the tic-tac-toe example.
def lose(self):
        return any([all([(self.board[c] == self.nopponent)
                         for c in line])
                    for line in self.tuples])
    def is_over(self):
        return (self.possible_moves() == []) or self.lose()
Our last task is the show method that prints the board. We will reuse the tic-tac-toe implementation, and just change the variables.
def show(self):
        print(' '+' '.join([
            ' '.join([['.', 'O', 'X'][self.board[7*row+column]]
                     for column in range(7)]
                     )
            for row in range(6)])
        )

Now that all functions are complete, you can try out the example. Feel free to play a round or two against the opponent. You can see that the opponent is not perfect, but it plays reasonably well. If you have a strong computer, you can increase the parameter of the Negamax algorithm. I encourage you to come up with a better heuristic.

Chapter 3: Regression

Activity 5: Predicting Population

You are working at the government office of Metropolis, trying to forecast the need for elementary school capacity. Your task is to figure out a 2025 and 2030 prediction for the number of children starting elementary school. Past data are as follows:

Figure 3.21 Data of Elementary School

Plot tendencies on a two-dimensional chart. Use linear regression.

Our features are the years ranging from 2001 to 2018. For simplicity, we can indicate 2001 as year 1, and 2018 as year 18.

x = np.array(range(1, 19))

y = np.array([

147026,

144272,

140020,

143801,

146233,

144539,

141273,

135389,

142500,

139452,

139722,

135300,

137289,

136511,

132884,

125683,

127255,

124275

])

Use np.polyfit to determine the coefficients of the regression line.

[a, b] = np.polyfit(x, y, 1)

[-1142.0557275541753, 148817.5294117646]

Plot the results using matplotlib.pyplot to determine future tendencies.

import matplotlib.pyplot as plot

plot.scatter( x, y )

plot.plot( [0, 30], [b, 30*a+b] )

plot.show()

Activity 6: Stock Price Prediction with Quadratic and Cubic Linear Polynomial Regression with Multiple Variables

This section will discuss how to perform linear, polynomial, and support vector regression with scikit-learn. We will also learn to predict the best fit model for a given task. We will be assuming that you are a software engineer at a financial institution and your employer wants to know whether linear regression, or support vector regression is a better fit for predicting stock prices. You will have to load all data of the S&P 500 from a data source. Then build a regressor using linear regression, cubic polynomial linear regression, and a support vector regression with a polynomial kernel of degree 3. Then separate training and test data. Plot the test labels and the prediction results and compare them with the y=x line. And finally, compare how well the three models score.

Let's load the S&P 500 index data using Quandl, then prepare the data for prediction. You can read the process in the Predicting the Future section of the topic Linear Regression with Multiple Variables.

import quandl

import numpy as np

from sklearn import preprocessing

from sklearn import model_selection

from sklearn import linear_model

from sklearn.preprocessing import PolynomialFeatures

from matplotlib import pyplot as plot

from sklearn import svm

data_frame = quandl.get("YALE/SPCOMP")

data_frame[['Long Interest Rate', 'Real Price',

'Real Dividend', 'Cyclically Adjusted PE Ratio']]

data_frame.fillna(-100, inplace=True)

# We shift the price data to be predicted 20 years forward

data_frame['Real Price Label'] = data_frame['RealPrice'].shift(-240)

# Then exclude the label column from the features

features = np.array(data_frame.drop('Real Price Label', 1))

# We scale before dropping the last 240 rows from the features

scaled_features = preprocessing.scale(features)

# Save the last 240 rows before dropping them

scaled_features_latest240 = scaled_features[-240:]

# Exclude the last 240 rows from the data used for # # modelbuilding

scaled_features = scaled_features[:-240]

# Now we can drop the last 240 rows from the data frame

data_frame.dropna(inplace=True)

# Then build the labels from the remaining data

label = np.array(data_frame['Real Price Label'])

# The rest of the model building stays

(features_train,

features_test,

label_train,

label_test

) = model_selection.train_test_split(

scaled_features,

label,

test_size=0.1

)

Let's first use a polynomial of degree 1 for the evaluation of the model and for the prediction. We are still recreating the main example from the second topic.

model = linear_model.LinearRegression()

model.fit(features_train, label_train)

model.score(features_test, label_test)

The output is as follows:
0.8978136465083912
The output always depends on the test data, so the values may differ after each run.
label_predicted = model.predict(features_test)
plot.plot(
label_test, label_predicted, 'o',
[0, 3000], [0, 3000]
)

Fig 3.22: Graph showing the output

The closer the dots are to the y=x line, the less error the model works with.

It is now time to perform a linear multiple regression with quadratic polynomials. The only change is in the Linear Regression model

poly_regressor = PolynomialFeatures(degree=3)

poly_scaled_features = poly_regressor.fit_transform(scaled_features)

(poly_features_train,

poly_features_test,

poly_label_train,

poly_label_test) = model_selection.train_test_split(

poly_scaled_features,

label,

test_size=0.1)

model = linear_model.LinearRegression()

model.fit(poly_features_train, poly_label_train)

print('Polynomial model score: ', model.score(

poly_features_test, poly_label_test))

print(' ')

poly_label_predicted = model.predict(poly_features_test)

plot.plot(

poly_label_test, poly_label_predicted, 'o',

[0, 3000], [0, 3000]

)

The model is performing surprisingly well on test data. Therefore, we can already suspect our polynomials are overfitting for scenarios used in training and testing.

We will now perform a Support Vector regression with a polynomial kernel of degree 3.

model = svm.SVR(kernel='poly')

model.fit(features_train, label_train)

label_predicted = model.predict(features_test)

plot.plot(

label_test, label_predicted, 'o',

[0,3000], [0,3000]

)

Fig 3.23: Graph showing the output

model.score(features_test, label_test)

The output will be 0.06388628722032952.

We will now perform a Support Vector regression with a polynomial kernel of degree 3.

Chapter 4: Classification

Activity 7: Preparing Credit Data for Classification

This section will discuss how to prepare data for a classifier. We will be using german.data from https://archive.ics.uci.edu/ml/machine-learning-databases/statlog/german/, as an example and prepare the data for training and testing a classifier. Make sure all your labels are numeric, and the values are prepared for classification. Use 80% of the data points as training data.

Save german.data from https://archive.ics.uci.edu/ml/machine-learning-databases/statlog/german/, and open it in a text editor like Sublime Text or Atom. Add the following first row to it:
CheckingAccountStatus DurationMonths CreditHistory CreditPurpose CreditAmount SavingsAccount EmploymentSince DisposableIncomePercent PersonalStatusSex OtherDebtors PresentResidenceMonths Property Age OtherInstallmentPlans Housing NumberOfExistingCreditsInBank Job LiabilityNumberOfPeople Phone ForeignWorker CreditScore
Import the data file using pandas and replace NA values with an outlier value:
import pandas
data_frame = pandas.read_csv('german.data', sep=' ')
data_frame.replace('NA', -1000000, inplace=True)
Perform label encoding. We need to transform all labels in the data frame to integers. We could create all labels in a one dimensional array. However, this would be highly ineffective, because each label occurs in exactly one column. It makes a lot more sense to group our labels per column:
labels = {
'CheckingAccountStatus': ['A11', 'A12', 'A13', 'A14'],
'CreditHistory': ['A30', 'A31', 'A32', 'A33', 'A34'],
'CreditPurpose': ['A40', 'A41', 'A42', 'A43', 'A44', 'A45', 'A46', 'A47', 'A48', 'A49', 'A410'],
'SavingsAccount': ['A61', 'A62', 'A63', 'A64', 'A65'],
'EmploymentSince': ['A71', 'A72', 'A73', 'A74', 'A75'],
'PersonalStatusSex': ['A91', 'A92', 'A93', 'A94', 'A95'],
'OtherDebtors': ['A101', 'A102', 'A103'],
'Property': ['A121', 'A122', 'A123', 'A124'],
'OtherInstallmentPlans': ['A141', 'A142', 'A143'],
'Housing': ['A151', 'A152', 'A153'],
'Job': ['A171', 'A172', 'A173', 'A174'],
'Phone': ['A191', 'A192'],
'ForeignWorker': ['A201', 'A202']
}
Let's create a label encoder for each column and encode the values:
from sklearn import preprocessing
label_encoders = {}
data_frame_encoded = pandas.DataFrame()
for column in data_frame:
    if column in labels:
        label_encoders[column] = preprocessing.LabelEncoder()
        label_encoders[column].fit(labels[column])
        data_frame_encoded[column] = label_encoders[
            column].transform(data_frame[column])
    else:
        data_frame_encoded[column] = data_frame[column]
Let's verify that we did everything correctly:

data_frame_encoded.head()

CheckingAccountStatus DurationMonths CreditHistory CreditPurpose

0 0 6 4 4

1 1 48 2 4

2 3 12 4 7

3 0 42 2 3

4 0 24 3 0

CreditAmount SavingsAccount EmploymentSince DisposableIncomePercent

0 1169 4 4 4

1 5951 0 2 2

2 2096 0 3 2

3 7882 0 3 2

4 4870 0 2 3

PersonalStatusSex OtherDebtors ... Property Age

0 2 0 ... 0 67

1 1 0 ... 0 22

2 2 0 ... 0 49

3 2 2 ... 1 45

4 2 0 ... 3 53

OtherInstallmentPlans Housing NumberOfExistingCreditsInBank Job

0 2 1 2 2

1 2 1 1 2

2 2 1 1 1

3 2 2 1 2

4 2 2 2 2

LiabilityNumberOfPeople Phone ForeignWorker CreditScore

0 1 1 0 1

1 1 0 0 2

2 2 0 0 1

3 2 0 0 1

4 2 0 0 2

[5 rows x 21 columns]

label_encoders

{'CheckingAccountStatus': LabelEncoder(),

'CreditHistory': LabelEncoder(),

'CreditPurpose': LabelEncoder(),

'EmploymentSince': LabelEncoder(),

'ForeignWorker': LabelEncoder(),

'Housing': LabelEncoder(),

'Job': LabelEncoder(),

'OtherDebtors': LabelEncoder(),

'OtherInstallmentPlans': LabelEncoder(),

'PersonalStatusSex': LabelEncoder(),

'Phone': LabelEncoder(),

'Property': LabelEncoder(),

'SavingsAccount': LabelEncoder()}

All the 21 columns are available, and the label encoders have been saved in an object too. Our data are now pre-processed.

You don't need to save these label encoders if you don't wish to decode the encoded values. We just saved them for the sake of completeness.

It is time to separate features from labels. We can apply the same method as the one we saw in the theory section:
import numpy as np
features = np.array(
data_frame_encoded.drop(['CreditScore'], 1)
)
label = np.array(data_frame_encoded['CreditScore'])
Our features are not yet scaled. This is a problem, because the credit amount distances can be significantly higher than the differences in age for instance.
We must perform scaling of the training and testing data together, therefore, the latest step when we can still perform scaling is before we split training data from testing data.
Let's use a Min-Max scaler from scikit's Preprocessing library:
scaled_features = preprocessing.MinMaxScaler(
feature_range=(0,1)).fit_transform(features)
The final step is cross-validation. We will shuffle our data, and use 80% of all data for training, 20% for testing.
from sklearn import model_selection
features_train, features_test, label_train,
label_test = model_selection.train_test_split(
    scaled_features,
    label,
    test_size = 0.2
)

Activity 8: Increase the accuracy of credit scoring

This section will learn how the parametrization of the k-nearest neighbor classifier affects the end result. The accuracy of credit scoring is currently quite low: 66.5%. Find a way to increase it by a few percentage points. And to ensure that it happens correctly, you will need to do the previous exercises.

There are many ways to accomplish this exercise. In this solution, I will show you one way to increase the credit score by changing the parametrization.

You must have completed Exercise 13, to be able to complete this activity.

Increase the K-value of the k-nearest neighbor classifier from the default 5 to 10, 15, 25, and 50. Evaluate the results:
You must have completed Exercise 13, to be able to complete this activity
classifier = neighbors.KNeighborsClassifier(n_neighbors=10)
classifier.fit(
features_train,label_train
)
classifier.score(features_test, label_test)
After running these lines for all four n_neighbors values, I got the following results:
K=10: accuracy is 71.5%
K=15: accuracy is 70.5%
K=25: accuracy is 72%
K=50: accuracy is 74%
Higher K values do not necessarily mean better score. In this example though, K=50 yielded a better result than K=5.

Activity 9: Support Vector Machine Optimization in scikit-learn

This section will discuss how to use the different parameters of a Support Vector Machine classifier. We will be using comparing and contrasting the different support vector regression classifier parameters you learned and find a set of parameters resulting in the highest classification data on the training and testing data loaded and prepared in previous activity. And to ensure that it happens correctly, you will need to have completed the previous activities and exercises.

We will try out a few combinations. You may choose different parameters, that

Linear kernel
classifier = svm.SVC(kernel="linear")
classifier.fit(features_train, label_train)
classifier.score(features_test, label_test)
Polynomial kernel of degree 4, C=2, gamma=0.05
classifier = svm.SVC(kernel="poly", C=2, degree=4, gamma=0.05)
classifier.fit(features_train, label_train)
classifier.score(features_test, label_test)
The output is as follows: 0.705.
Polynomial kernel of degree 4, C=2, gamma=0.25
classifier = svm.SVC(kernel="poly", C=2, degree=4, gamma=0.25)
classifier.fit(features_train, label_train)
classifier.score(features_test, label_test)
The output is as follows: 0.76.
Polynomial kernel of degree 4, C=2, gamma=0.5
classifier = svm.SVC(kernel="poly", C=2, degree=4, gamma=0.5)
classifier.fit(features_train, label_train)
classifier.score(features_test, label_test)
The output is as follows: 0.72.
Sigmoid kernel
classifier = svm.SVC(kernel="sigmoid")
classifier.fit(features_train, label_train)
classifier.score(features_test, label_test)
The output is as follows: 0.71.
Default kernel with a gamma of 0.15
classifier = svm.SVC(kernel="rbf", gamma=0.15)
classifier.fit(features_train, label_train)
classifier.score(features_test, label_test)
The output is as follows: 0.76.

Chapter 5: Using Trees for Predictive Analysis

Activity 10: Car Data Classification

This section will discuss how to build a reliable decision tree model capable of aiding your company in finding cars clients are likely to buy. We will be assuming that you are employed by a car rental agency focusing on building a lasting relationship with its clients. Your task is to build a decision tree model classifying cars into one of four categories: unacceptable, acceptable, good, very good.

The data set can be accessed here: https://archive.ics.uci.edu/ml/datasets/Car+Evaluation. Click the Data Folder link to download the data set. Click the Data Set Description link to access the description of the attributes.

Evaluate the utility of your decision tree model.

Download the car data file from here: https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data. Add a header line to the front of the CSV file to reference it in Python more easily:
Buying,Maintenance,Doors,Persons,LuggageBoot,Safety,Class
We simply call the label Class. We named the six features after their descriptions in https://archive.ics.uci.edu/ml/machine-learning-databases/car/car.names.
Load the data set into Python
import pandas
data_frame = pandas.read_csv('car.data')
Let's check if the data got loaded correctly:
data_frame.head()
Buying Maintenance Doors Persons LuggageBoot Safety Class
0 vhigh     vhigh     2     2     small    low unacc
1 vhigh     vhigh     2     2     small    med unacc
2 vhigh     vhigh     2     2     small high unacc
3 vhigh     vhigh     2     2         med    low unacc
4 vhigh     vhigh     2     2         med    med unacc
As classification works with numeric data, we have to perform label encoding as seen in previous chapter.
labels = {
    'Buying': ['vhigh', 'high', 'med', 'low'],
    'Maintenance': ['vhigh', 'high', 'med', 'low'],
    'Doors': ['2', '3', '4', '5more'],
    'Persons': ['2', '4', 'more'],
    'LuggageBoot': ['small', 'med', 'big'],
    'Safety': ['low', 'med', 'high'],
    'Class': ['unacc', 'acc', 'good', 'vgood']
}
from sklearn import preprocessing
label_encoders = {}
data_frame_encoded = pandas.DataFrame()
for column in data_frame:
    if column in labels:
        label_encoders[column] = preprocessing.LabelEncoder()
        label_encoders[column].fit(labels[column])
        data_frame_encoded[column] = label_encoders[column].transform(data_frame[column])
    else:
data_frame_encoded[column] = data_frame[column]
Let's separate features from labels:
import numpy as np
features = np.array(data_frame_encoded.drop(['Class'], 1))
label = np.array( data_frame_encoded['Class'] )
It is time to separate training and testing data with the cross-validation (in newer versions model-selection) featue of scikit-learn. We will use 10% test data:
from sklearn import model_selection
features_train, features_test, label_train, label_test = model_selection.train_test_split(
    features,
    label,
    test_size=0.1
)
Note that the train_test_split method will be available in model_selection module, not in the cross_validation module starting in scikit-learn 0.20. In previous versions, model_selection already contains the train_test_split method.
We have everything to build the decision tree classifier:
from sklearn.tree import DecisionTreeClassifier
decision_tree = DecisionTreeClassifier()
decision_tree.fit(features_train, label_train)
The output of the fit method is as follows:
DecisionTreeClassifier(
    class_weight=None,
    criterion='gini',
    max_depth=None,
    max_features=None,
    max_leaf_nodes=None,
    min_impurity_decrease=0.0,
    min_impurity_split=None,
    min_samples_leaf=1,
    min_samples_split=2,
    min_weight_fraction_leaf=0.0,
    presort=False,
    random_state=None,
    splitter='best'
)
You can see the parametrization of the decision tree classifier. There are quite a few options we could set to tweak the performance of the classifier model.
Let's score our model based on the test data:
decision_tree.score( features_test, label_test )
The output is as follows:
0.9884393063583815
This is the point where your knowledge up until chapter 4 would take you on model evaluation. We will now go a bit further and create a deeper evaluation of the model based on the classification_report feature we learned in this topic:
from sklearn.metrics import classification_report
print(
    classification_report(
        label_test,
        decision_tree.predict(features_test)
    )
)
The output is as follows:
             precision    recall f1-score support
         0     0.97     0.97     0.97        36
         1     1.00     1.00     1.00         5
         2     1.00     0.99     1.00     127
         3     0.83     1.00     0.91         5
avg / total     0.99     0.99     0.99     173
The model has been proven to be quite accurate. In case of such a high accuracy score, suspect the possibility of overfitting.

Activity 11: Random Forest Classification for your Car Rental Company

This section will optimize your classifier to satisfy your clients better when selecting future cars for your car fleet. We will be performing random forest and extreme random forest classification on your car dealership data set you worked on in Activity 1 of this chapter. Suggest further improvements to the model to improve the performance of the classifier.
We can reuse Steps 1 – 5 of Activity 1. The end of Step 5 looks as follows:
from sklearn import model_selection
features_train, features_test, label_train, label_test = model_selection.train_test_split(
    features,
    label,
    test_size=0.1
)
If you are using IPython, your variables may already be accessible in your console.
Let's create a Random Forest and an Extremely Randomized Trees classifier and train the models.
from sklearn.ensemble import RandomForestClassifier,ExtraTreesClassifier
random_forest_classifier = RandomForestClassifier(n_estimators=100, max_depth=6)
random_forest_classifier.fit(features_train, label_train)
extra_trees_classifier =ExtraTreesClassifier(
n_estimators=100, max_depth=6
)
extra_trees_classifier.fit(features_train, label_train)
Let's estimate how well the two models perform on the test data:
from sklearn.metrics import classification_report
print(
    classification_report(
        label_test,
        random_forest_classifier.predict(features_test)
    )
)
The output for model 1 is as follows:
                 precision    recall f1-score support
         0     0.78     0.78     0.78        36
         1     0.00     0.00     0.00         5
         2     0.94     0.98     0.96     127
         3     0.75     0.60     0.67         5
avg / total     0.87     0.90     0.89     173
The output for model 1 is as follows:
print(
    classification_report(
        label_test,
        extra_trees_classifier.predict(features_test)
    )
)
             precision    recall f1-score support
          0     0.72     0.72     0.72        36
          1     0.00     0.00     0.00         5
          2     0.93     1.00     0.96     127
          3     0.00     0.00     0.00         5
avg / total     0.83     0.88     0.86     173
We can also calculate the accuracy scores:
random_forest_classifier.score(features_test, label_test)
The output is as follows:
0.9017341040462428
The output for extraTreesClassifier is as follows:
extra_trees_classifier.score(features_test, label_test)
The output is as follows:
0.884393063583815
We can see that the random forest classifier is performing slightly better than the extra trees classifier.
As a first optimization technique, let's see which features are more important and which features are less important. Due to randomization, removing the least important features may reduce the random noise in the model.
random_forest_classifier.feature_importances_
The output is as follows:
array([0.12656512, 0.09934031, 0.02073233, 0.35550329, 0.05411809, 0.34374086])
The output for extra_trees_classifier is as follows:
extra_trees_classifier.feature_importances_
The output is as follows:
array([0.08699494, 0.07557066, 0.01221275, 0.38035005, 0.05879822, 0.38607338])
Both classifiers treats the third and the fifth attributes quite unimportant. We may not be sure about the fifth attribute, as the importance score is more than 5% in both models. However, we are quite certain that the third attribute is the least significant attribute in the decision. Let's see the feature names once again.
data_frame_encoded.head()
The output is as follows:
Buying Maintenance Doors Persons LuggageBoot Safety Class
0     3            3     0        0            2     1
1     3            3     0        0            2     2
2     3            3     0        0            2     0
3     3            3     0        0            1     1
4     3            3     0        0            1     2
The least important feature is Doors. It is quite evident in hindsight: the number of doors doesn't have as big of an influence in the car's rating than the safety rating for instance.
Remove the third feature from the model and retrain the classifier.
features2 = np.array(data_frame_encoded.drop(['Class', 'Doors'], 1))
label2 = np.array(data_frame_encoded['Class'])
features_train2,
features_test2,
label_train2,
label_test2 = model_selection.train_test_split(
    features2,
    label2,
    test_size=0.1
)
random_forest_classifier2 = RandomForestClassifier(
    n_estimators=100, max_depth=6
)
random_forest_classifier2.fit(features_train2, label_train2)
extra_trees_classifier2 = ExtraTreesClassifier(
    n_estimators=100, max_depth=6
)
extra_trees_classifier2.fit(features_train2, label_train2)
Let's compare how well the new models fare compared to the original ones:
print(
    classification_report(
        label_test2,
        random_forest_classifier2.predict(features_test2)
    )
)
The output is as follows:
            precision    recall f1-score support
         0     0.89     0.85     0.87        40
         1     0.00     0.00     0.00         3
         2     0.95     0.98     0.96     125
         3     1.00     1.00     1.00         5
avg / total     0.92     0.93     0.93     173
Second Model:
print(
    classification_report(
        label_test2,
        extra_trees_classifier2.predict(features_test2)
    )
)
The output is as follows:
            precision    recall f1-score support
         0     0.78     0.78     0.78        40
         1     0.00     0.00     0.00         3
         2     0.93     0.98     0.95     125
         3     1.00     0.40     0.57         5
avg / total     0.88     0.90     0.88     173
Although we did improve a few percentage points, note that a direct comparison is not possible, because of following reasons. First, the train-test split selects different data for training and testing. A few badly selected data points may easily cause a few percentage point increase or decrease in the scores. Second, the way how we train the classifiers also has random elements. This randomization may also shift the performance of the classifiers a bit. Always use best judgement when interpreting results and measure your results multiple times on different train-test splits if needed.
Let's tweak the parametrization of the classifiers a bit more. The following set of parameters increase the F1 Score of the Random Forest Classifier to 97%:
random_forest_classifier2 = RandomForestClassifier(
    n_estimators=150,
    max_ depth=8,
    criterion='entropy',
    max_features=5
)
random_forest_classifier2.fit(features_train2, label_train2)
print(
    classification_report(
        label_test2,
        random_forest_classifier2.predict(features_test2)
    )
)
The output is as follows:
           precision    recall f1-score support
          0     0.95     0.95     0.95        40
          1     0.50     1.00     0.67         3
          2     1.00     0.97     0.98     125
          3     0.83     1.00     0.91         5
avg / total     0.97     0.97     0.97     173
Using the same parameters on the Extra Trees Classifier, we also get surprisingly good results:
extra_trees_classifier2 = ExtraTreesClassifier(
    n_estimators=150,
    max_depth=8,
    criterion='entropy',
    max_features=5
)
extra_trees_classifier2.fit(features_train2, label_train2)
print(
    classification_report(
        label_test2,
        extra_trees_classifier2.predict(features_test2)
    )
)
The output is as follows:
            precision    recall f1-score support
          0     0.92     0.88     0.90        40
          1     0.40     0.67     0.50         3
          2     0.98     0.97     0.97     125
          3     0.83     1.00     0.91         5
avg / total     0.95     0.94     0.94     173

Chapter 6: Clustering

Activity 12: k-means Clustering of Sales Data

This section will detect product sales that perform similarly in nature to recognize trends in product sales.

We will be using the Sales Transactions Weekly Dataset from this URL:

https://archive.ics.uci.edu/ml/datasets/Sales_Transactions_Dataset_Weekly Perform clustering on the dataset using the k-means Algorithm. Make sure you prepare your data for clustering based on what you have learned in the previous chapters.

Use the default settings for the k-means algorithm.

Load the dataset using pandas.
import pandas
pandas.read_csv('Sales_Transactions_Dataset_Weekly.csv')
If you examine the data in the CSV file, you can realize that the first column contains product id strings. These values just add noise to the clustering process. Also notice that for weeks 0 to 51, there is a W-prefixed label and a Normalized label. Using the normalized label makes more sense, so we can drop the regular weekly labels from the data set.
import numpy as np
drop_columns = ['Product_Code']
for w in range(0, 52):
drop_columns.append('W' + str(w))
features = data_frame.drop(dropColumns, 1)
Our data points are normalized except for the min and max
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
scaled_features = scaler.fit_transform(features)
Create a k-means clustering model and fit the data points into 8 clusters.
from sklearn.cluster import KMeans
k_means_model = KMeans()
k_means_model.fit(scaled_features)
The labels belonging to each data point can be retrieved using the labels_ property. These labels determine the clustering of the rows of the original data frame.
k_means_model.labels_
Retrieve the center points and the labels from the clustering algorithm:
k_means_model.cluster_centers_
The output will be as follows:
array([5, 5, 4, 5, 5, 3, 4, 5, 5, 5, 5, 5, 4, 5, 0, 0, 0, 0, 0, 4, 4, 4,
       4, 0, 0, 5, 0, 0, 5, 0, 4, 4, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 5, 0, 0, 5, 0, 0, 0, 0, 0, 4, 0, 0, 5, 0, 0, 5, 0,
       ...
       1, 7, 3, 2, 6, 7, 6, 2, 2, 6, 2, 7, 2, 7, 2, 6, 1, 3, 2, 2, 6, 6,
       7, 7, 7, 1, 1, 2, 1, 2, 7, 7, 6, 2, 7, 6, 6, 6, 1, 6, 1, 6, 7, 7,
       1, 1, 3, 5, 3, 3, 3, 5, 7, 2, 2, 2, 3, 2, 2, 7, 7, 3, 3, 3, 3, 2,
       2, 6, 3, 3, 5, 3, 2, 2, 6, 7, 5, 2, 2, 2, 6, 2, 7, 6, 1])

How are these labels beneficial?

Suppose that in the original data frame, the product names are given. You can easily recognize that similar types of products sell similarly. There are also products that fluctuate a lot, and products that are seasonal in nature. For instance, if some products promoted fat loss and getting into shape, they tend to sell during the first half of the year, before the beach season.

Activity 13: Shape Recognition with the Mean Shift algorithm

This section will learn how images can be clustered. We will be assuming that you are working for a company detecting human emotions from photos. Your task is to extract pixels making up a face in an avatar photo.

Create a clustering algorithm with Mean Shift to cluster pixels of images. Examine the results of the Mean Shift algorithm and check if any of the clusters contains a face when used on avatar images.

Then apply the k-means algorithm with a fixed default number of clusters: 8. Compare your results with the Mean Shift clustering algorithm.

Select an image you would like to cluster and load the image.
We chose this image from the Author's Youtube channel:
Fig 7.13: An image with the Author's picture
The image size has been significantly reduced so that our algorithm would terminate more quickly.
image = Image.open('destructuring.jpg')
pixels = image.load()
Transform the pixels into a data frame to perform clustering
import pandas
data_frame = pandas.DataFrame(
    [[x,y,pixels[x,y][0], pixels[x,y][1], pixels[x,y][2]]
        for x in range(image.size[0])
        for y in range(image.size[1])
    ],
    columns=['x', 'y', 'r', 'g', 'b']
)
Perform Mean Shift clustering on the image using scikit-learn. Note that this time we will skip normalization of the features, because proximity of the pixels and proximity of color components are represented in close to equal weight. The largest difference in pixels distance is 750, while the largest difference in a color component is 256.
from sklearn.cluster import MeanShift
mean_shift_model = MeanShift()
mean_shift_model.fit(data_frame)
for i in range(len(mean_shift_model.cluster_centers_)):
    image = Image.open('destructuring.jpg')
    pixels = image.load()
    for j in range(len(data_frame)):
        if (mean_shift_model.labels_[j] != i ):
           pixels[ int(data_frame['x'][j]),
       int(data_frame['y'][j]) ] = (255, 255, 255)
    image.save( 'cluster' + str(i) + '.jpg' )
The algorithm found the following two clusters:

Fig 7.14: Images after performing k-means Clustering
The Mean Shift algorithm treated my skin and the yellow JavaScript and Destructuring text close enough to each other to form the same cluster.
Let's use the k-means algorithm to formulate eight clusters on the same data.
k_means_model = KMeans(n_clusters=8)
k_means_model.fit(data_frame)
for i in range(len(k_means_model.cluster_centers_)):
    image = Image.open('destructuring.jpg')
    pixels = image.load()
    for j in range(len(data_frame)):
        if (k_means_model.labels_[j] != i):
            pixels[int(data_frame['x'][j]), int(data_frame['y'][j])] = (255, 255, 255)
    image.save('kmeanscluster' + str(i) + '.jpg')
The 8 clusters are the following:
The output for the first is as follows:

Fig 7.15: Images after performing k-means Clustering

The output for the second is as follows:

Fig 7.16: Images after performing k-means Clustering

The output for the third is as follows:

Fig 7.17: Images after performing k-means Clustering

The output for the fourth is as follows:

Fig 7.18: Images after performing k-means Clustering

The output for the fifth is as follows:

Fig 7.19: Images after performing k-means Clustering

The output for the sixth is as follows:

Fig 7.20: Images after performing k-means Clustering

The output for the seventh is as follows:

Fig 7.21: Images after performing k-means Clustering

The output for the eighth is as follows:

Fig 7.22: Images after performing k-means Clustering

As you can see, the fifth cluster recognized my face quite well. The clustering algorithm indeed located data points that are close and contain similar colors.

Chapter 7: Deep Learning with Neural Networks

Activity 14: Written digit detection

This section will discuss how to provide more security for the cryptocurrency traders via the detection of hand-written digits. We will be using assuming that you are a software developer at a new Cryptocurrency trader platform. The latest security measure you are implementing requires the recognition of hand-written digits. Use the MNIST library to train a neural network to recognize digits. You can read more about this dataset on https://www.tensorflow.org/tutorials/.
Improve the accuracy of the model as much as possible. And to ensure that it happens correctly, you will need to complete the previous topic.
Load the dataset and format the input
import tensorflow.keras.datasets.mnist as mnist
(features_train, label_train),
(features_test, label_test) = mnist.load_data()
features_train = features_train / 255.0
features_test = features_test / 255.0
def flatten(matrix):
    return [elem for row in matrix for elem in row]
features_train_vector = [
    flatten(image) for image in features_train
]
features_test_vector = [
    flatten(image) for image in features_test
]
import numpy as np
label_train_vector = np.zeros((label_train.size, 10))
for i, label in enumerate(label_train_vector):
    label[label_train[i]] = 1
label_test_vector = np.zeros((label_test.size, 10))
for i, label in enumerate(label_test_vector):
    label[label_test[i]] = 1
Set up the Tensorflow graph. Instead of the sigmoid function, we will now use the relu function.
import tensorflow as tf
f = tf.nn.softmax
x = tf.placeholder(tf.float32, [None, 28 * 28 ])
W = tf.Variable( tf.random_normal([784, 10]))
b = tf.Variable( tf.random_normal([10]))
y = f(tf.add(tf.matmul( x, W ), b ))
Train the model.
import random
y_true = tf.placeholder(tf.float32, [None, 10])
cross_entropy = tf.nn.softmax_cross_entropy_with_logits_v2(
    logits=y,
    labels=y_true
)
cost = tf.reduce_mean(cross_entropy)
optimizer = tf.train.GradientDescentOptimizer(
    learning_rate = 0.5
).minimize(cost)
session = tf.Session()
session.run(tf.global_variables_initializer())
iterations = 600
batch_size = 200
sample_size = len(features_train_vector)
for _ in range(iterations):
    indices = random.sample(range(sample_size), batchSize)
    batch_features = [
        features_train_vector[i] for i in indices
    ]
    batch_labels = [
        label_train_vector[i] for i in indices
    ]
    min = i * batch_size
    max = (i+1) * batch_size
    dictionary = {
        x: batch_features,
        y_true: batch_labels
    }
    session.run(optimizer, feed_dict=dictionary)
Test the model
label_predicted = session.run(classify( x ), feed_dict={
    x: features_test_vector
})
label_predicted = [
    np.argmax(label) for label in label_predicted
]
confusion_matrix(label_test, label_predicted)
The output is as follows:
array([[ 0, 0, 223, 80, 29, 275, 372, 0, 0, 1],
   [ 0, 915, 4, 10, 1, 13, 192, 0, 0, 0],
   [ 0, 39, 789, 75, 63, 30, 35, 0, 1, 0],
   [ 0, 6, 82, 750, 13, 128, 29, 0, 0, 2],
   [ 0, 43, 16, 16, 793, 63, 49, 0, 2, 0],
   [ 0, 22, 34, 121, 40, 593, 76, 5, 0, 1],
   [ 0, 29, 34, 6, 44, 56, 788, 0, 0, 1],
   [ 1, 54, 44, 123, 715, 66, 24, 1, 0, 0],
   [ 0, 99, 167, 143, 80, 419, 61, 0, 4, 1],
   [ 0, 30, 13, 29, 637, 238, 58, 3, 1, 0]], dtype=int64)
Calculate the accuracy score:
accuracy_score(label_test, label_predicted)
The output is as follows:
0.4633
By re-running the code segment responsible for training the data set, we can improve the accuracy:
for _ in range(iterations):
    indices = random.sample(range(sample_size), batch_size)
    batch_features = [
        features_train_vector[i] for i in indices
    ]
    batch_labels = [
        label_train_vector[i] for i in indices
    ]
    min = i * batch_size
    max = (i+1) * batch_size
    dictionary = {
        x: batch_features,
        y_true: batch_labels
    }
    session.run(optimizer, feed_dict=dictionary)
Second run: 0.5107
Third run: 0.5276
Fourth run: 0.5683
Fifth run: 0.6002
Sixth run: 0.6803
Seventh run: 0.6989
Eighth run: 0.7074
Ninth run: 0.713
Tenth run: 0.7163
Twentieth run: 0.7308
Thirtieth run: 0.8188
Fortieth run: 0.8256
Fiftieth run: 0.8273
At the end of the fiftieth run, the improved confusion matrix looks as follows:
array([
[946, 0,    6,    3,    0,    1, 15,    2,    7,    0],
[ 0,1097,    3,    7,    1,    0,    4,    0, 23,    0],
[11, 3, 918, 11, 18,    0, 13,    8, 50,    0],
[3,    0, 23, 925,    2, 10,    4,    9, 34,    0],
[2,    2,    6,    1, 929,    0, 14,    2, 26,    0],
[16, 4,    7, 62,    8, 673, 22,    3, 97,    0],
[8,    2,    4,    3,    8,    8, 912,    2, 11,    0],
[5,    9, 33,    6,    9,    1,    0, 949, 16,    0],
[3,    4,    5, 12,    7,    4, 12,    3, 924,    0],
[8,    5,    7, 40, 470, 11,    5, 212, 251,    0]
],
     dtype=int64)

Not a bad result. More than 8 out of 10 digits are accurately recognized.

Activity 15 : Written Digit Detection with Deep Learning

This section will discuss how deep learning improves the performance of your model. We will be assuming that your boss is not satisfied with the results you presented in previous activity and asks you to consider adding two hidden layers to your original model and determine whether new layers improve the accuracy of the model. And to ensure that it happens correctly, you will need to have knowledge of Deep Learning.

Execute the code of previous Activity and measure the accuracy of the model.
Change the neural network by adding new layers. We will combine the relu and softmax activator functions:
x = tf.placeholder(tf.float32, [None, 28 * 28 ])
f1 = tf.nn.relu
W1 = tf.Variable(tf.random_normal([784, 200]))
b1 = tf.Variable(tf.random_normal([200]))
layer1_out = f1(tf.add(tf.matmul(x, W1), b1))
f2 = tf.nn.softmax
W2 = tf.Variable(tf.random_normal([200, 100]))
b2 = tf.Variable(tf.random_normal([100]))
layer2_out = f2(tf.add(tf.matmul(layer1_out, W2), b2))
f3 = tf.nn.softmax
W3 = tf.Variable(tf.random_normal([100, 10]))
b3 = tf.Variable( tf.random_normal([10]))
y = f3(tf.add(tf.matmul(layer2_out, W3), b3))
Retrain the model
y_true = tf.placeholder(tf.float32, [None, 10])
cross_entropy = tf.nn.softmax_cross_entropy_with_logits_v2(
    logits=y,
    labels=y_true
)
cost = tf.reduce_mean(cross_entropy)
optimizer = tf.train.GradientDescentOptimizer(
learning_rate=0.5).minimize(cost)
session = tf.Session()
session.run(tf.global_variables_initializer())
iterations = 600
batch_size = 200
sample_size = len(features_train_vector)
for _ in range(iterations):
    indices = random.sample(range(sample_size), batchSize)
    batch_features = [
        features_train_vector[i] for i in indices
    ]
    batch_labels = [
        label_train_vector[i] for i in indices
    ]
    min = i * batch_size
    max = (i+1) * batch_size
    dictionary = {
        x: batch_features,
        y_true: batch_labels
    }
    session.run(optimizer, feed_dict=dictionary)
Evaluate the model
label_predicted = session.run(y, feed_dict={
    x: features_test_vector
})
label_predicted = [
    np.argmax(label) for label in label_predicted
]
confusion_matrix(label_test, label_predicted)
The output is as follows:
array([[ 801, 11,    0, 14,    0,    0, 56,    0, 61, 37],
     [ 2, 1069,    0, 22,    0,    0, 18,    0,    9, 15],
     [ 276, 138,    0, 225,    0,    2, 233,    0, 105, 53],
     [ 32, 32,    0, 794,    0,    0, 57,    0, 28, 67],
     [ 52, 31,    0, 24,    0,    3, 301,    0, 90, 481],
     [ 82, 50,    0, 228,    0,    3, 165,    0, 179, 185],
     [ 71, 23,    0, 14,    0,    0, 712,    0, 67, 71],
     [ 43, 85,    0, 32,    0,    3, 31,    0, 432, 402],
     [ 48, 59,    0, 192,    0,    2, 45,    0, 425, 203],
     [ 45, 15,    0, 34,    0,    2, 39,    0, 162, 712]],
     dtype=int64)
Calculating the accuracy score.
accuracy_score(label_test, label_predicted)

The output is 0.4516.

The accuracy did not improve.

Let's see if further runs improve the accuracy of the model.

Second run: 0.5216

Third run: 0.5418

Fourth run: 0.5567

Fifth run: 0.564

Sixth run: 0.572

Seventh run: 0.5723

Eighth run: 0.6001

Ninth run: 0.6076

Tenth run: 0.6834

Twentieth run: 0.7439

Thirtieth run: 0.7496

Fortieth run: 0.7518

Fiftieth run: 0.7536

Afterwards, we got the following results: 0.755, 0.7605, 0.7598, 0.7653

The final confusion matrix:

array([[ 954, 0, 2, 1, 0, 6, 8, 0, 5, 4],

[ 0, 1092, 5, 3, 0, 0, 6, 0, 27, 2],

[ 8, 3, 941, 16, 0, 2, 13, 0, 35, 14],

[ 1, 1, 15, 953, 0, 14, 2, 0, 13, 11],

[ 4, 3, 8, 0, 0, 1, 52, 0, 28, 886],

[ 8, 1, 5, 36, 0, 777, 16, 0, 31, 18],

[ 8, 1, 6, 1, 0, 6, 924, 0, 9, 3],

[ 3, 10, 126, 80, 0, 4, 0, 0, 35, 770],

[ 4, 0, 6, 10, 0, 6, 4, 0, 926, 18],

[ 4, 5, 1, 8, 0, 2, 2, 0, 18, 969]],

dtype=int64)

This deep neural network behaves even more chaotically than the single layer one. It took 600 iterations of 200 samples to get from an accuracy of 0.572 to 0.5723. Not long after this iteration, we jumped from 0.6076 to 0.6834 in that number of iterations.

Table of Contents for Appendix

Create new playlist

Sign In

Sign Up