Appendix

1. Introduction to Artificial Intelligence

Activity 1.01: Generating All Possible Sequences of Steps in a Tic-Tac-Toe Game

Solution:

The following steps will help you to complete this activity:

  1. Open a new Jupyter Notebook file.
  2. Reuse the functions from Steps 2–9 of the previous exercise, Exercise 1.02, Creating an AI with Random Behavior for the Tic-Tac-Toe Game.
  3. Create a function that maps the all_moves_from_board function to each element of a list of boards. This way, we will have all of the nodes of the decision tree at each depth:

    def all_moves_from_board_list(board_list, sign):

        move_list = []

        for board in board_list:

            move_list.extend(all_moves_from_board(board, sign))

        return move_list

    In the preceding code snippet, we have defined the all_moves_from_board_list function, which enumerates all the possible moves from every board in board_list and collects them in a list called move_list.

  4. Create a variable called board that contains the empty board, EMPTY_SIGN * 9, and call the all_moves_from_board function with the board and AI_SIGN. Save its output in a variable called all_moves and print its content:

    board = EMPTY_SIGN * 9
    all_moves = all_moves_from_board(board, AI_SIGN)
    all_moves

    The expected output is this:

    ['X........',
     '.X.......',
     '..X......',
     '...X.....',
     '....X....',
     '.....X...',
     '......X..',
     '.......X.',
     '........X']

  5. Create a filter_wins function that takes the ended games out of the list of moves and appends them to the lists of board states won by the AI player and by the opponent player:

    def filter_wins(move_list, ai_wins, opponent_wins):

        for board in move_list:

            won_by = game_won_by(board)

            if won_by == AI_SIGN:

                ai_wins.append(board)

                move_list.remove(board)

            elif won_by == OPPONENT_SIGN:

                opponent_wins.append(board)

                move_list.remove(board)

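    In the preceding code snippet, we have defined a filter_wins function, which moves each finished game out of move_list and into the list of wins of the player who won it. Note that it removes items from move_list while iterating over it, so the board that follows a removed one can be skipped in that pass; the counts shown in step 7 reflect this behavior.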

  6. Define the count_possibilities function, which prints and returns the number of decision tree leaves that ended in a draw, that were won by the first player, and that were won by the second player, as shown in the following code snippet:

    def count_possibilities():
        board = EMPTY_SIGN * 9
        move_list = [board]
        ai_wins = []
        opponent_wins = []
        for i in range(9):
            print('step ' + str(i) + '. Moves: '
                  + str(len(move_list)))
            # The AI moves in even iterations, the opponent in odd ones.
            sign = AI_SIGN if i % 2 == 0 else OPPONENT_SIGN
            move_list = all_moves_from_board_list(move_list, sign)
            filter_wins(move_list, ai_wins, opponent_wins)
        print('First player wins: ' + str(len(ai_wins)))
        print('Second player wins: ' + str(len(opponent_wins)))
        print('Draw', str(len(move_list)))
        print('Total', str(len(ai_wins)
              + len(opponent_wins) + len(move_list)))
        return (len(ai_wins), len(opponent_wins),
                len(move_list),
                len(ai_wins) + len(opponent_wins) + len(move_list))

    Each game has up to 9 steps. In the 0th, 2nd, 4th, 6th, and 8th iterations, the AI player moves. In all the other iterations, the opponent moves. We create all possible moves in all steps and take out the completed games from the move list.

  7. Execute the count_possibilities function to experience the combinatorial explosion and save the results in four variables called first_player, second_player, draw, and total:

    first_player, second_player, draw, total = count_possibilities()

    The expected output is this:

    step 0. Moves: 1
    step 1. Moves: 9
    step 2. Moves: 72
    step 3. Moves: 504
    step 4. Moves: 3024
    step 5. Moves: 13680
    step 6. Moves: 49402
    step 7. Moves: 111109
    step 8. Moves: 156775
    First player wins: 106279
    Second player wins: 68644
    Draw 91150
    Total 266073

As you can see, the tree of the board states consists of a total of 266073 leaves. The count_possibilities function essentially implements a breadth-first search (BFS) algorithm to traverse all the possible states of the game. Notice that we count some states multiple times: placing an X in the top-right corner in Step 1 and an X in the top-left corner in Step 3 leads to the same board as starting with the top-left corner and then placing an X in the top-right corner, provided the opponent's move in between is the same. If we implemented the detection of duplicate states, we would have to check fewer nodes. However, at this stage, due to the limited depth of the game, we will omit this step.
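To get a feel for how much duplicate detection would save, the following is a minimal sketch that counts, for the first few depths, both the number of move sequences and the number of distinct boards. The all_moves_from_board function here is a simplified stand-in for the one reused from Exercise 1.02, and count_states_per_depth is a hypothetical helper name. Wins are not filtered, which does not matter for the first four moves because nobody can win that early:

```python
EMPTY_SIGN = '.'
AI_SIGN = 'X'
OPPONENT_SIGN = 'O'


def all_moves_from_board(board, sign):
    """Return every board reachable by placing `sign` on one empty cell."""
    return [board[:i] + sign + board[i + 1:]
            for i, cell in enumerate(board) if cell == EMPTY_SIGN]


def count_states_per_depth(max_depth):
    """Return a list of (move sequences, distinct boards) for each depth."""
    sequences = [EMPTY_SIGN * 9]   # one entry per path, duplicates kept
    unique = {EMPTY_SIGN * 9}      # a set collapses identical boards
    counts = []
    for depth in range(max_depth):
        sign = AI_SIGN if depth % 2 == 0 else OPPONENT_SIGN
        sequences = [move for board in sequences
                     for move in all_moves_from_board(board, sign)]
        unique = {move for board in unique
                  for move in all_moves_from_board(board, sign)}
        counts.append((len(sequences), len(unique)))
    return counts


for depth, (with_duplicates, distinct) in enumerate(
        count_states_per_depth(4), start=1):
    print(depth, with_duplicates, distinct)
```

After three moves, the 504 sequences collapse into 252 distinct boards, and the gap widens with every further step.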

The data structure examined by count_possibilities is, in fact, a decision tree. In a decision tree, we explore the utility of each move by investigating all possible future steps up to a certain depth. In our example, we could calculate the utility of the initial moves by observing the number of wins and losses after fixing the first few moves.
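As an illustration of this idea, here is a minimal sketch that fixes the first move and counts the leaves of the resulting subtree by outcome. It is self-contained, so game_won_by is re-implemented here under the assumption that it returns the winning sign or EMPTY_SIGN; count_leaves and first_move_utilities are hypothetical helper names. Because this sketch stops every branch as soon as a line is completed, its totals need not match the output of count_possibilities:

```python
EMPTY_SIGN = '.'
AI_SIGN = 'X'
OPPONENT_SIGN = 'O'

# Index triples of the three rows, three columns, and two diagonals.
WINNING_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
                 (0, 3, 6), (1, 4, 7), (2, 5, 8),
                 (0, 4, 8), (2, 4, 6)]


def game_won_by(board):
    """Return the sign that completed a line, or EMPTY_SIGN if nobody has."""
    for a, b, c in WINNING_LINES:
        if board[a] != EMPTY_SIGN and board[a] == board[b] == board[c]:
            return board[a]
    return EMPTY_SIGN


def count_leaves(board, sign):
    """Return (ai_wins, opponent_wins, draws) over all games that continue
    from `board`, where `sign` is the player to move next."""
    winner = game_won_by(board)
    if winner == AI_SIGN:
        return (1, 0, 0)
    if winner == OPPONENT_SIGN:
        return (0, 1, 0)
    if EMPTY_SIGN not in board:
        return (0, 0, 1)
    next_sign = OPPONENT_SIGN if sign == AI_SIGN else AI_SIGN
    totals = [0, 0, 0]
    for i, cell in enumerate(board):
        if cell == EMPTY_SIGN:
            result = count_leaves(board[:i] + sign + board[i + 1:], next_sign)
            totals = [t + r for t, r in zip(totals, result)]
    return tuple(totals)


def first_move_utilities():
    """Map each opening cell (0 to 8) to the leaf counts of its subtree."""
    empty = EMPTY_SIGN * 9
    return {i: count_leaves(empty[:i] + AI_SIGN + empty[i + 1:], OPPONENT_SIGN)
            for i in range(9)}


utilities = first_move_utilities()
for cell, (wins, losses, draws) in utilities.items():
    print(cell, wins, losses, draws)
```

Comparing the win and loss counts of the nine opening cells gives a rough utility for each first move. By symmetry, the four corners share one set of counts and the four edges share another.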

Note

The root of the tree is the initial state. An internal state of the tree is a state in which a game has not ended and moves are still possible. A leaf of the tree contains a state where a game has ended.

To access the source code for this specific section, please refer to https://packt.live/3doxPog.

You can also run this example online at https://packt.live/3dpnuIz.

You must execute the entire Notebook in order to get the desired result.

Activity 1.02: Teaching the Agent to Realize Situations When It Defends Against Losses

Solution:

The following steps will help you to complete this activity:

  1. Open a new Jupyter Notebook file.
  2. Reuse all the code from Steps 2–6 of the previous exercise, Exercise 1.03, Teaching the Agent to Win.
  3. Create a function called player_can_win that takes all the moves from the board using the all_moves_from_board function and iterates over them using the next_move variable.

    In each iteration, it checks whether the game can be won by the player:

    def player_can_win(board, sign):

        next_moves = all_moves_from_board(board, sign)

        for next_move in next_moves:

            if game_won_by(next_move) == sign:

                return True

        return False

  4. Extend the AI move so that it prefers making safe moves. A move is safe if the opponent cannot win the game in the next step:

    def ai_move(board):
        new_boards = all_moves_from_board(board, AI_SIGN)
        # Take an immediate win if one exists.
        for new_board in new_boards:
            if game_won_by(new_board) == AI_SIGN:
                return new_board
        # Otherwise, collect the moves after which the opponent
        # cannot win in the next step.
        safe_moves = []
        for new_board in new_boards:
            if not player_can_win(new_board, OPPONENT_SIGN):
                safe_moves.append(new_board)
        # choice is random.choice, assumed to be imported in the reused code.
        return choice(safe_moves) if len(safe_moves) > 0 else new_boards[0]

    In the preceding code snippet, we have defined the ai_move function, which tells the AI how to move by looking at the list of all the possibilities and choosing one where the opponent cannot win in the next move. If you test our new application, you will find that the AI now blocks the opponent's immediate wins.

  5. Now, place this logic in the state space generator and check how well the computer player is doing by generating all the possible games:

    def all_moves_from_board(board, sign):

        move_list = []

        for i, v in enumerate(board):

            if v == EMPTY_SIGN:

                new_board = board[:i] + sign + board[i+1:]

                move_list.append(new_board)

                if game_won_by(new_board) == AI_SIGN:

                    return [new_board]

        if sign == AI_SIGN:

            safe_moves = []

            for move in move_list:

                if not player_can_win(move, OPPONENT_SIGN):

                    safe_moves.append(move)

            return safe_moves if len(safe_moves) > 0 else move_list[0:1]

        else:

            return move_list

    In the preceding code snippet, we have defined a function that generates all possible moves. As soon as we find a move that wins the game for the AI, we return only that move. We do not care whether the AI has multiple options to win the game in one move; we just return the first possibility. Otherwise, when it is the AI's turn, we return only the safe moves, that is, the moves after which the opponent cannot win in the next step. If the AI cannot stop the opponent from winning, we return only the first possible move. For the opponent, we return all possible moves.

    Let's see what this means in terms of counting all of the possibilities at each step.

  6. Count the options that are possible:

    first_player, second_player, draw, total = count_possibilities()

    The expected output is this:

    step 0. Moves: 1
    step 1. Moves: 9
    step 2. Moves: 72
    step 3. Moves: 504
    step 4. Moves: 3024
    step 5. Moves: 5197
    step 6. Moves: 18606
    step 7. Moves: 19592
    step 8. Moves: 30936
    First player wins: 20843
    Second player wins: 962
    Draw 20243
    Total 42048

We are doing better than before. We not only got rid of almost 2/3 of possible games again, but, most of the time, the AI player either wins or settles for a draw.
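The safe-move rule from steps 3 to 5 can be tried in isolation on a single position. The following is a minimal, self-contained sketch; game_won_by and all_moves_from_board are simplified stand-ins for the functions reused from the earlier exercises, and safe_moves is a hypothetical helper name that does not appear in the activity:

```python
EMPTY_SIGN = '.'
AI_SIGN = 'X'
OPPONENT_SIGN = 'O'

WINNING_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
                 (0, 3, 6), (1, 4, 7), (2, 5, 8),
                 (0, 4, 8), (2, 4, 6)]


def game_won_by(board):
    """Return the sign that completed a line, or EMPTY_SIGN if nobody has."""
    for a, b, c in WINNING_LINES:
        if board[a] != EMPTY_SIGN and board[a] == board[b] == board[c]:
            return board[a]
    return EMPTY_SIGN


def all_moves_from_board(board, sign):
    """Return every board reachable by placing `sign` on one empty cell."""
    return [board[:i] + sign + board[i + 1:]
            for i, cell in enumerate(board) if cell == EMPTY_SIGN]


def player_can_win(board, sign):
    """Return True if `sign` can complete a line with a single move."""
    return any(game_won_by(move) == sign
               for move in all_moves_from_board(board, sign))


def safe_moves(board):
    """Return the AI moves after which the opponent has no immediate win."""
    return [move for move in all_moves_from_board(board, AI_SIGN)
            if not player_can_win(move, OPPONENT_SIGN)]


# O threatens to complete the top row, so only the blocking move is safe.
print(safe_moves('OO..X....'))   # ['OOX.X....']

# O threatens the top row and the left column at once: no move is safe.
print(safe_moves('OO.OX...X'))   # []
```

The second position shows why the activity needs a fallback: when the opponent has two threats at once, the list of safe moves is empty and some move still has to be returned.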

Note

To access the source code for this specific section, please refer to https://packt.live/2B0G9xf.

You can also run this example online at https://packt.live/2V7qLpO.

You must execute the entire Notebook in order to get the desired result.

Activity 1.03: Fixing the First and Second Moves of the AI to Make It Invincible

Solution:

The following steps will help you to complete this activity:

  1. Open a new Jupyter Notebook file.
  2. Reuse the code from Steps 2–4 of the previous activity, Activity 1.02, Teaching the Agent to Realize Situations When It Defends Against Losses.
  3. Now, count the number of empty fields on the board and make a hardcoded move in case there are 9 or 7 empty fields. You can experiment with different hardcoded moves. We found that occupying any corner, and then occupying the opposite corner, leads to no loss. If the opponent occupies the opposite corner, making a move in the middle results in no losses:

    def all_moves_from_board(board, sign):

        if sign == AI_SIGN:

            empty_field_count = board.count(EMPTY_SIGN)

            if empty_field_count == 9:

                return [sign + EMPTY_SIGN * 8]

            elif empty_field_count == 7:

                return [board[:8] + sign if board[8] ==

                        EMPTY_SIGN else board[:4] + sign + board[5:]]

        move_list = []

        for i, v in enumerate(board):

            if v == EMPTY_SIGN:

                new_board = board[:i] + sign + board[i+1:]

                move_list.append(new_board)

                if game_won_by(new_board) == AI_SIGN:

                    return [new_board]

        if sign == AI_SIGN:

            safe_moves = []

            for move in move_list:

                if not player_can_win(move, OPPONENT_SIGN):

                    safe_moves.append(move)

            return safe_moves if len(safe_moves) > 0 else move_list[0:1]

        else:

            return move_list

  4. Now, verify the state space:

    first_player, second_player, draw, total = count_possibilities()

    The expected output is this:

    step 0. Moves: 1

    step 1. Moves: 1

    step 2. Moves: 8

    step 3. Moves: 8

    step 4. Moves: 48

    step 5. Moves: 38

    step 6. Moves: 108

    step 7. Moves: 76

    step 8. Moves: 90

    First player wins: 128

    Second player wins: 0

    Draw 60

    Total 188

After fixing the first two steps, we only need to deal with 8 possibilities instead of 504. We also guided the AI into a state where the hardcoded rules were sufficient for it to never lose a game. Fixing the steps is important not because we want to hand the AI hardcoded steps to start with, but because it is a tool for evaluating and comparing each step. As you can see, the AI no longer loses any of the generated games; it only wins or draws.

The best that a player can hope to get against this AI is a draw.
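The hardcoded opening from step 3 can also be read on its own. Here is a minimal sketch that isolates it in a hypothetical helper called opening_move (the activity keeps this logic inline in all_moves_from_board); it returns a one-element move list during the opening and None otherwise:

```python
EMPTY_SIGN = '.'
AI_SIGN = 'X'


def opening_move(board, sign=AI_SIGN):
    """Return the hardcoded reply as a one-element list, or None once
    the position is past the two hardcoded moves."""
    empty_field_count = board.count(EMPTY_SIGN)
    if empty_field_count == 9:
        # First move: always take the top-left corner.
        return [sign + EMPTY_SIGN * 8]
    if empty_field_count == 7:
        if board[8] == EMPTY_SIGN:
            # Second move: take the opposite corner if it is free.
            return [board[:8] + sign]
        # The opponent took the opposite corner, so take the middle.
        return [board[:4] + sign + board[5:]]
    return None


print(opening_move('.........'))   # ['X........']
print(opening_move('X...O....'))   # ['X...O...X']
print(opening_move('X.......O'))   # ['X...X...O']
print(opening_move('XO..O...X'))   # None
```

Keeping the opening in a separate function makes it easy to swap in other hardcoded moves and rerun count_possibilities to compare the resulting state spaces.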

Note

To access the source code for this specific section, please refer to https://packt.live/2YnUcpA.

You can also run this example online at https://packt.live/318TBtq.

You must execute the entire Notebook in order to get the desired result.

Activity 1.04: Connect Four

Solution:

  1. Open a new Jupyter Notebook file.

    Let's set up the TwoPlayersGame framework by writing the __init__ method.

  2. Define the board as a one-dimensional list, like in the tic-tac-toe example. We could use a two-dimensional list, too, but modeling would not get much easier or harder. Beyond initializing the game as we did for tic-tac-toe, we will work a bit further ahead: we will generate all of the possible winning combinations in the game and save them for future use, as shown in the following code snippet:

    from easyAI import TwoPlayersGame, Human_Player

    class ConnectFour(TwoPlayersGame):

        def __init__(self, players):

            self.players = players

            self.board = [0 for i in range(42)]

            self.nplayer = 1

            def generate_winning_tuples():

                tuples = []

                # horizontal

                tuples += [list(range(row*7+column,

                           row*7+column+4, 1))

                           for row in range(6)

                           for column in range(4)]

                # vertical

                tuples += [list(range(row*7+column,

                           row*7+column+28, 7))

                           for row in range(3)

                           for column in range(7)]

                # diagonal forward

                tuples += [list(range(row*7+column,

                           row*7+column+32, 8))

                           for row in range(3)

                           for column in range(4)]

                # diagonal backward

                tuples += [list(range(row*7+column,

                           row*7+column+24, 6))

                           for row in range(3)

                           for column in range(3, 7, 1)]

                return tuples

            self.tuples = generate_winning_tuples()

  3. Next, handle the possible_moves function, which is a simple enumeration. Notice that we are using column indices from 1 to 7 in the move names because, in the human player interface, it is more convenient to start column indexing with 1 than with 0. For each column, we check whether there is an unoccupied field. If there is one, we will make the column a possible move:

        def possible_moves(self):

            return [column+1

                    for column in range(7)

                    if any([self.board[column+row*7] == 0

                            for row in range(6)])

                    ]

  4. Making a move is like the possible_moves function. We check the column of the move and find the first empty cell starting from the bottom. Once we find it, we occupy it. You can also read the implementation of the inverse of the make_move function: unmake_move. In the unmake_move function, we check the column from top to bottom, and we remove the move at the first non-empty cell. Notice that we rely on the internal representation of easyAI so that it does not undo moves that it hasn't made. Otherwise, this function would remove a token of the other player without checking whose token was removed:

        def make_move(self, move):

            column = int(move) - 1

            for row in range(5, -1, -1):

                index = column + row*7

                if self.board[index] == 0:

                    self.board[index] = self.nplayer

                    return

        # optional method (speeds up the AI)

        def unmake_move(self, move):

            column = int(move) - 1

            for row in range(6):

                index = column + row*7

                if self.board[index] != 0:

                    self.board[index] = 0

                    return

  5. Since we already have the tuples that we must check, we can mostly reuse the lose function from the tic-tac-toe example:

        def lose(self):

            return any([all([(self.board[c] == self.nopponent)

                             for c in line])

                        for line in self.tuples])

        def is_over(self):

            return (self.possible_moves() == []) or self.lose()

  6. Our final task is to implement the show method, which prints the board. We will reuse the tic-tac-toe implementation and just adapt the show and scoring methods to the larger board:

        def show(self):
            # Print one board row per line, top row first.
            print('\n' + '\n'.join([
                ' '.join([['.', 'O', 'X']
                          [self.board[7*row+column]]
                          for column in range(7)])
                for row in range(6)]))

        def scoring(self):

            return -100 if self.lose() else 0

    if __name__ == "__main__":

        from easyAI import AI_Player, Negamax

        ai_algo = Negamax(6)

        ConnectFour([Human_Player(),

                     AI_Player(ai_algo)]).play()

  7. Now that all the functions are complete, you can try out the example. Feel free to play a round or two against your opponent.

    The expected output is this:

    Figure 1.30: Expected output for the Connect Four game

By completing this activity, you have seen that the opponent is not perfect, but that it plays reasonably well. If you have a strong computer, you can increase the search depth passed to the Negamax algorithm (6 in the preceding code). We encourage you to come up with a better heuristic.
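Before playing, it can be reassuring to check the board logic without easyAI. The following is a minimal sketch that copies the winning-tuple generation from step 2 onto a plain list and adds two hypothetical helpers, drop_token and has_four, which mirror the make_move and lose methods but are not part of the activity's class:

```python
ROWS, COLUMNS = 6, 7


def generate_winning_tuples():
    """Return every group of four cell indices that forms a winning line."""
    tuples = []
    # horizontal
    tuples += [list(range(row*7+column, row*7+column+4, 1))
               for row in range(6) for column in range(4)]
    # vertical
    tuples += [list(range(row*7+column, row*7+column+28, 7))
               for row in range(3) for column in range(7)]
    # diagonal forward
    tuples += [list(range(row*7+column, row*7+column+32, 8))
               for row in range(3) for column in range(4)]
    # diagonal backward
    tuples += [list(range(row*7+column, row*7+column+24, 6))
               for row in range(3) for column in range(3, 7, 1)]
    return tuples


def drop_token(board, column, player):
    """Put the player's token in the lowest empty cell of a 0-based column.
    Return the cell index, or None if the column is full."""
    for row in range(ROWS - 1, -1, -1):
        index = column + row * COLUMNS
        if board[index] == 0:
            board[index] = player
            return index
    return None


def has_four(board, player, tuples):
    """Return True if the player occupies all four cells of any line."""
    return any(all(board[c] == player for c in line) for line in tuples)


tuples = generate_winning_tuples()
board = [0] * (ROWS * COLUMNS)
for column in range(4):
    drop_token(board, column, 1)   # four tokens side by side, bottom row

print(len(tuples))                 # 69
print(has_four(board, 1, tuples))  # True
print(has_four(board, 2, tuples))  # False
```

The 69 lines break down into 24 horizontal, 21 vertical, and 12 in each diagonal direction, which is a quick sanity check on the ranges used in the list comprehensions.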

Note

To access the source code for this specific section, please refer to https://packt.live/3esk2hI.

You can also run this example online at https://packt.live/3dnkfS5.

You must execute the entire Notebook in order to get the desired result.

2. An Introduction to Regression

Activity 2.01: Boston House Price Prediction with Polynomial Regression of Degrees 1, 2, and 3 on Multiple Variables

Solution:

  1. Open a Jupyter Notebook.
  2. Import the required packages and load the Boston House Prices data from the CSV file into a DataFrame:

    import numpy as np

    import pandas as pd

    from sklearn import preprocessing

    from sklearn import model_selection

    from sklearn import linear_model

    from sklearn.preprocessing import PolynomialFeatures

    # Adjacent string literals are concatenated inside the parentheses.
    file_url = ('https://raw.githubusercontent.com/'
                'PacktWorkshops/'
                'The-Applied-Artificial-Intelligence-Workshop/'
                'master/Datasets/boston_house_price.csv')

    df = pd.read_csv(file_url)

    The output of df is as follows:

    Figure 2.28: Output displaying the dataset

    Earlier in this chapter, you learned that most of the required packages to perform linear regression come from sklearn. We need to import the preprocessing module to scale the data, the linear_model module to train linear regression, the PolynomialFeatures class to transform the inputs for the polynomial regression, and the model_selection module to evaluate the performance of each model.

  3. Prepare the dataset for prediction by converting the label and features into NumPy arrays and scaling the features:

    # Drop the label column; the columns keyword avoids the positional axis argument.
    features = np.array(df.drop(columns='MEDV'))

    label = np.array(df['MEDV'])

    scaled_features = preprocessing.scale(features)

    The output for features is as follows:

    Figure 2.29: Labels and features converted to NumPy arrays

    As you can see, our features have been converted into a NumPy array.

    The output for the label is as follows:

    Figure 2.30: Output showing the expected labels

    As you can see, our labels have been converted into a NumPy array.

    The output for scaled_features is as follows:

    array([[-0.41978194, 0.28482986, -1.2879095 , ...,

            -0.66660821, -1.45900038, -1.0755623 ],

           [-0.41733926, -0.48772236, -0.59338101, ...,

            -0.98732948, -0.30309415, -0.49243937],

           [-0.41734159, -0.48772236, -0.59338101, ...,

            -0.98732948, -0.30309415, -1.2087274 ],

           ...,

           [-0.41344658, -0.48772236, 0.11573841, ...,

            -0.80321172, 1.17646583, -0.98304761],

           [-0.40776407, -0.48772236, 0.11573841, ...,

            -0.80321172, 1.17646583, -0.86530163],

           [-0.41500016, -0.48772236, 0.11573841, ...,

            -0.80321172, 1.17646583, -0.66905833]])

    As you can see, our features have been properly scaled.

    As we don't have any missing values and we are not trying to predict a future value as we did in Exercise 2.03, Preparing the Quandl Data for Prediction, we can directly convert the label ('MEDV') and features into NumPy arrays. Then, we can scale the arrays of features using the preprocessing.scale() function.

  4. Create three different sets of features by transforming the scaled features into a suitable format for each of the polynomial regressions:

    poly_1_scaled_features = (PolynomialFeatures(degree=1)
                              .fit_transform(scaled_features))
    poly_2_scaled_features = (PolynomialFeatures(degree=2)
                              .fit_transform(scaled_features))
    poly_3_scaled_features = (PolynomialFeatures(degree=3)
                              .fit_transform(scaled_features))

    The output for poly_1_scaled_features is as follows:

    array([[ 1. , -0.41978194, 0.28482986, ..., -0.66660821,

            -1.45900038, -1.0755623 ],

           [ 1. , -0.41733926, -0.48772236, ..., -0.98732948,

            -0.30309415, -0.49243937],

           [ 1. , -0.41734159, -0.48772236, ..., -0.98732948,

            -0.30309415, -1.2087274 ],

           ...,

           [ 1. , -0.41344658, -0.48772236, ..., -0.80321172,

             1.17646583, -0.98304761],

           [ 1. , -0.40776407, -0.48772236, ..., -0.80321172,

             1.17646583, -0.86530163],

           [ 1. , -0.41500016, -0.48772236, ..., -0.80321172,

             1.17646583, -0.66905833]])

    Our scaled_features variable has been properly transformed for the polynomial regression of degree 1.

    The output for poly_2_scaled_features is as follows:

    Figure 2.31: Output showing poly_2_scaled_features

    Our scaled_features variable has been properly transformed for the polynomial regression of degree 2.

    The output for poly_3_scaled_features is as follows:

    array([[ 1. , -0.41978194, 0.28482986, ..., -2.28953024,

            -1.68782164, -1.24424733],

           [ 1. , -0.41733926, -0.48772236, ..., -0.04523847,

            -0.07349928, -0.11941484],

           [ 1. , -0.41734159, -0.48772236, ..., -0.11104103,

            -0.4428272 , -1.76597723],

           ...,

           [ 1. , -0.41344658, -0.48772236, ..., -1.36060852,

             1.13691611, -0.9500001 ],

           [ 1. , -0.40776407, -0.48772236, ..., -1.19763962,

             0.88087515, -0.64789192],

           [ 1. , -0.41500016, -0.48772236, ..., -0.9260248 ,

             0.52663205, -0.29949664]])

    Our scaled_features variable has been properly transformed for the polynomial regression of degree 3.

    We had to transform the scaled features in three different ways as each degree of polynomial regression required a different input transformation.

  5. Split the data into a training set and a testing set with random_state=8:

    (poly_1_features_train, poly_1_features_test,
     poly_label_train, poly_label_test) = model_selection.train_test_split(
        poly_1_scaled_features, label, test_size=0.1, random_state=8)

    (poly_2_features_train, poly_2_features_test,
     poly_label_train, poly_label_test) = model_selection.train_test_split(
        poly_2_scaled_features, label, test_size=0.1, random_state=8)

    (poly_3_features_train, poly_3_features_test,
     poly_label_train, poly_label_test) = model_selection.train_test_split(
        poly_3_scaled_features, label, test_size=0.1, random_state=8)

    As we have three different sets of scaled transformed features but the same set of labels, we had to perform three different splits. By using the same set of labels and random_state in each splitting, we ensure that we obtain the same poly_label_train and poly_label_test for every split.

  6. Perform a polynomial regression of degree 1 and evaluate whether the model is overfitting:

    model_1 = linear_model.LinearRegression()

    model_1.fit(poly_1_features_train, poly_label_train)

    model_1_score_train = model_1.score(poly_1_features_train,

                                        poly_label_train)

    model_1_score_test = model_1.score(poly_1_features_test,

                                       poly_label_test)

    The output for model_1_score_train is as follows:

    0.7406006443486721

    The output for model_1_score_test is as follows:

    0.6772229017901507

    To estimate whether a model is overfitting or not, we need to compare the scores of the model applied to the training set and testing set. If the score for the training set is much higher than the score for the testing set, we are overfitting. This is the case here, where the polynomial regression of degree 1 achieved a score of 0.74 for the training set compared to 0.68 for the testing set.

  7. Perform a polynomial regression of degree 2 and evaluate whether the model is overfitting:

    model_2 = linear_model.LinearRegression()

    model_2.fit(poly_2_features_train, poly_label_train)

    model_2_score_train = model_2.score(poly_2_features_train,

                                        poly_label_train)

    model_2_score_test = model_2.score(poly_2_features_test,

                                       poly_label_test)

    The output for model_2_score_train is as follows:

    0.9251199698832675

    The output for model_2_score_test is as follows:

    0.8253870684280571

    As with the polynomial regression of degree 1, our polynomial regression of degree 2 is overfitting, and the gap between the training score (0.93) and the testing score (0.83) is even larger. It has, however, managed to achieve better results on both sets.

  8. Perform a polynomial regression of degree 3 and evaluate whether the model is overfitting:

    model_3 = linear_model.LinearRegression()

    model_3.fit(poly_3_features_train, poly_label_train)

    model_3_score_train = model_3.score(poly_3_features_train,

                                        poly_label_train)

    model_3_score_test = model_3.score(poly_3_features_test,

                                       poly_label_test)

    The output for model_3_score_train is as follows:

    0.9910498071894897

    The output for model_3_score_test is as follows:

    -8430.781888645262

    These results are very interesting because the polynomial regression of degree 3 managed to achieve a near-perfect score with 0.99 (1 is the maximum). This is a warning sign that our model is overfitting too much. We have the confirmation of this warning when the model is applied to the testing set and achieves a very low negative score of -8430. As a reminder, a score of 0 can be achieved by using the mean of the data as a prediction. This means that our third model managed to make worse predictions than just using the mean.

  9. Compare the predictions of the 3 models against the label on the testing set:

    model_1_prediction = model_1.predict(poly_1_features_test)

    model_2_prediction = model_2.predict(poly_2_features_test)

    model_3_prediction = model_3.predict(poly_3_features_test)

    df_prediction = pd.DataFrame(poly_label_test)

    df_prediction.rename(columns = {0:'label'}, inplace = True)

    df_prediction['model_1_prediction'] = pd.DataFrame(model_1_prediction)
    df_prediction['model_2_prediction'] = pd.DataFrame(model_2_prediction)
    df_prediction['model_3_prediction'] = pd.DataFrame(model_3_prediction)

    The output of df_prediction is as follows:

    Figure 2.32: Output showing the expected predicted values

After applying the predict function for each model on their respective testing set, in order to get the predicted values, we convert them into a single df_prediction DataFrame with the label values. Increasing the number of degrees in polynomial regressions does not necessarily mean that the model will perform better compared to one with a lower degree. In fact, increasing the degree will lead to more overfitting on the training data.
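The negative testing score of the degree 3 model in step 8 is easier to interpret once the score is written out. The score method of LinearRegression returns the coefficient of determination, R². The following is a minimal pure-Python sketch of that formula on made-up numbers; r_squared is a hypothetical helper written for this illustration, not a function of sklearn:

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    # Squared errors of the predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Squared errors of always predicting the mean of the labels.
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


labels = [1.0, 2.0, 3.0]

print(r_squared(labels, labels))              # 1.0, perfect predictions
print(r_squared(labels, [2.0, 2.0, 2.0]))     # 0.0, always predicting the mean
print(r_squared(labels, [10.0, 10.0, 10.0]))  # -96.0, far worse than the mean
```

The score has no lower bound: the further the predictions are from the labels compared with the mean, the more negative it becomes, which is how a value such as -8430 can arise on the testing set.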

Note

To access the source code for this specific section, please refer to https://packt.live/3eD8gAY.

You can also run this example online at https://packt.live/3etadjp.

You must execute the entire Notebook in order to get the desired result.

In this activity, we learned how to perform polynomial regressions of degrees 1 to 3 with multiple variables on the Boston House Price dataset and saw how increasing the degrees led to overfitted models.
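One way to see why the degree 3 model overfits so strongly is to count the columns that PolynomialFeatures produces. With the bias column included, n input features and degree d yield C(n + d, d) output columns. The sketch below assumes that the dataset has 13 features and 506 rows, so that a test_size of 0.1 leaves roughly 455 training rows; both figures are assumptions about the CSV file and are not printed in this activity. polynomial_feature_count is a hypothetical helper name:

```python
from math import comb


def polynomial_feature_count(n_features, degree):
    """Number of monomials of total degree <= degree in n_features
    variables, including the constant (bias) term."""
    return comb(n_features + degree, degree)


n_features = 13   # assumption: number of feature columns in the dataset
n_train = 455     # assumption: about 90% of 506 rows

for degree in (1, 2, 3):
    columns = polynomial_feature_count(n_features, degree)
    print(degree, columns, columns > n_train)
```

Under these assumptions, degree 3 produces 560 columns, which is more than the number of training rows. A linear model with more coefficients than training samples can fit the training set almost exactly, which is consistent with the training score of 0.99 and the poor testing score.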

3. An Introduction to Classification

Activity 3.01: Increasing the Accuracy of Credit Scoring

Solution:

  1. Open a new Jupyter Notebook file and execute all the steps from the previous exercise, Exercise 3.04, K-Nearest Neighbors Classification in Scikit-Learn.
  2. Import neighbors from sklearn:

    from sklearn import neighbors

  3. Create a function called fit_knn that takes the following parameters: k, p, features_train, label_train, features_test, and label_test. This function will fit KNeighborsClassifier with the training set and return the accuracy scores for the training and testing sets, as shown in the following code snippet:

    def fit_knn(k, p, features_train, label_train,
                features_test, label_test):
        classifier = neighbors.KNeighborsClassifier(n_neighbors=k, p=p)
        classifier.fit(features_train, label_train)
        # Return (training accuracy, testing accuracy) as a tuple.
        return (classifier.score(features_train, label_train),
                classifier.score(features_test, label_test))

  4. Call the fit_knn() function with k=5 and p=2, save the results in 2 variables, and print them. These variables are acc_train_1 and acc_test_1:

    acc_train_1, acc_test_1 = fit_knn(5, 2, features_train,

                                      label_train,

                                      features_test, label_test)

    acc_train_1, acc_test_1

    The expected output is this:

    (0.78625, 0.75)

    With k=5 and p=2, KNN achieved a good accuracy score close to 0.78 on the training set. But the scores for the training and testing sets are quite different, which means the model is overfitting.

  5. Call the fit_knn() function with k=10 and p=2, save the results in 2 variables, and print them. These variables are acc_train_2 and acc_test_2:

    acc_train_2, acc_test_2 = fit_knn(10, 2, features_train,

                                      label_train,

                                      features_test, label_test)

    acc_train_2, acc_test_2

    The expected output is this:

    (0.775, 0.785)

    Increasing the number of neighbors to 10 has decreased the accuracy score of the training set, but now it is very close to the testing set.

  6. Call the fit_knn() function with k=15 and p=2, save the results in 2 variables, and print them. These variables are acc_train_3 and acc_test_3:

    acc_train_3, acc_test_3 = fit_knn(15, 2, features_train,

                                      label_train,

                                      features_test, label_test)

    acc_train_3, acc_test_3

    The expected output is this:

    (0.76625, 0.79)

    With k=15 and p=2, the difference between the training and testing scores has increased.

  7. Call the fit_knn() function with k=25 and p=2, save the results in 2 variables, and print them. These variables are acc_train_4 and acc_test_4:

    acc_train_4, acc_test_4 = fit_knn(25, 2, features_train,

                                      label_train,

                                      features_test, label_test)

    acc_train_4, acc_test_4

    The expected output is this:

    (0.7375, 0.77)

    Increasing the number of neighbors to 25 has a significant impact on the training set: its accuracy drops to about 0.74, and the training and testing scores still differ noticeably.

  8. Call the fit_knn() function with k=50 and p=2, save the results in 2 variables, and print them. These variables are acc_train_5 and acc_test_5:

    acc_train_5, acc_test_5 = fit_knn(50, 2, features_train,

                                      label_train,

                                      features_test, label_test)

    acc_train_5, acc_test_5

    The expected output is this:

    (0.70625, 0.775)

    Bringing the number of neighbors to 50 improved neither the model's performance nor the gap between the training and testing scores.

  9. Call the fit_knn() function with k=5 and p=1, save the results in 2 variables, and print them. These variables are acc_train_6 and acc_test_6:

    acc_train_6, acc_test_6 = fit_knn(5, 1, features_train,

                                      label_train,

                                      features_test, label_test)

    acc_train_6, acc_test_6

    The expected output is this:

    (0.8, 0.735)

    Changing to the Manhattan distance has helped increase the accuracy of the training set, but the model is still overfitting.

  10. Call the fit_knn() function with k=10 and p=1, save the results in 2 variables, and print them. These variables are acc_train_7 and acc_test_7:

    acc_train_7, acc_test_7 = fit_knn(10, 1, features_train,

                                      label_train,

                                      features_test, label_test)

    acc_train_7, acc_test_7

    The expected output is this:

    (0.77, 0.785)

    With k=10, the accuracy scores for the training and testing sets are quite close to each other: around 0.78.

  11. Call the fit_knn() function with k=15 and p=1, save the results in 2 variables, and print them. These variables are acc_train_8 and acc_test_8:

    acc_train_8, acc_test_8 = fit_knn(15, 1, features_train,

                                      label_train,

                                      features_test, label_test)

    acc_train_8, acc_test_8

    The expected output is this:

    (0.7575, 0.775)

    With k=15, both scores dropped slightly compared with k=10, but the training and testing scores remain close to each other.

  12. Call the fit_knn() function with k=25 and p=1, save the results in 2 variables, and print them. These variables are acc_train_9 and acc_test_9:

    acc_train_9, acc_test_9 = fit_knn(25, 1, features_train,

                                      label_train,

                                      features_test, label_test)

    acc_train_9, acc_test_9

    The expected output is this:

    (0.745, 0.8)

    With k=25, the gap between the training and testing accuracy scores widens again (0.745 versus 0.8).

  13. Call the fit_knn() function with k=50 and p=1, save the results in 2 variables, and print them. These variables are acc_train_10 and acc_test_10:

    acc_train_10, acc_test_10 = fit_knn(50, 1, features_train,

                                        label_train,

                                        features_test, label_test)

    acc_train_10, acc_test_10

    The expected output is this:

    (0.70875, 0.78)

    With k=50, the model's performance on the training set dropped significantly and the gap with the testing score is the largest of all the combinations we tried.

In this activity, we tried multiple combinations of hyperparameters for n_neighbors and p. The best one we found was for n_neighbors=10 and p=2. With these hyperparameters, the model is not overfitting much and it achieved an accuracy score of around 78% for both the training and testing sets.
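
The comparison made throughout this activity can be tabulated in a few lines of plain Python. The following sketch copies the (training, testing) accuracy pairs reported in Steps 4 to 13 and ranks each combination by the absolute gap between the two scores. The results dictionary and the gap() helper are illustrative names that are not part of the activity's code, and ranking by gap alone is only one possible selection criterion:

```python
# (k, p) -> (training accuracy, testing accuracy), copied from Steps 4 to 13
results = {
    (5, 2): (0.78625, 0.75),
    (10, 2): (0.775, 0.785),
    (15, 2): (0.76625, 0.79),
    (25, 2): (0.7375, 0.77),
    (50, 2): (0.70625, 0.775),
    (5, 1): (0.8, 0.735),
    (10, 1): (0.77, 0.785),
    (15, 1): (0.7575, 0.775),
    (25, 1): (0.745, 0.8),
    (50, 1): (0.70875, 0.78),
}


def gap(scores):
    """Absolute difference between training and testing accuracy."""
    acc_train, acc_test = scores
    return abs(acc_train - acc_test)


# Rank the combinations from the smallest gap to the largest
ranked = sorted(results, key=lambda params: gap(results[params]))
best_k, best_p = ranked[0]

for k, p in ranked:
    acc_train, acc_test = results[(k, p)]
    print(f"k={k:<2} p={p} train={acc_train:.5f} "
          f"test={acc_test:.5f} gap={gap(results[(k, p)]):.5f}")
```

Under this criterion, the first row printed is k=10 with p=2, which matches the combination selected in this activity.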

Note

To access the source code for this specific section, please refer to https://packt.live/2V5TOtG.

You can also run this example online at https://packt.live/2Bx0yd8.

You must execute the entire Notebook in order to get the desired result.
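
The p hyperparameter used in this activity is the power parameter of the Minkowski distance: p=1 gives the Manhattan distance and p=2 gives the Euclidean distance. The following minimal sketch, written in plain Python with a hypothetical minkowski_distance() helper and two made-up points, shows how the same pair of points is measured differently under each setting:

```python
def minkowski_distance(a, b, p):
    """Minkowski distance of order p between two equal-length points."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


# Two made-up points, chosen so that the arithmetic is easy to follow
point_a = (0.0, 0.0)
point_b = (3.0, 4.0)

manhattan = minkowski_distance(point_a, point_b, p=1)  # |3| + |4|
euclidean = minkowski_distance(point_a, point_b, p=2)  # sqrt(3**2 + 4**2)

print(manhattan, euclidean)
```

Because the two distances rank neighbors differently, changing p can change which training points count as the k nearest ones, and therefore the predicted class.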

Activity 3.02: Support Vector Machine Optimization in scikit-learn

Solution:

  1. Open a new Jupyter Notebook file and execute all the steps mentioned in the previous exercise, Exercise 3.04, K-Nearest Neighbor Classification in scikit-learn.
  2. Import svm from sklearn:

    from sklearn import svm

  3. Create a function called fit_svm that takes the following parameters: features_train, label_train, features_test, label_test, kernel="linear", C=1, degree=3, and gamma='scale'. This function will fit an SVC with the training set and return the accuracy scores for both the training and testing sets:

    def fit_svm(features_train, label_train,

                features_test, label_test,

                kernel="linear", C=1,

                degree=3, gamma='scale'):

        classifier = svm.SVC(kernel=kernel, C=C,

                             degree=degree, gamma=gamma)

        classifier.fit(features_train, label_train)

        return (classifier.score(features_train, label_train),

                classifier.score(features_test, label_test))

  4. Call the fit_svm() function with the default hyperparameter values, save the results in 2 variables, and print them. These variables are acc_train_1 and acc_test_1:

    acc_train_1, acc_test_1 = fit_svm(features_train,

                                      label_train,

                                      features_test,

                                      label_test)

    acc_train_1, acc_test_1

    The expected output is this:

    (0.71625, 0.75)

    With the default hyperparameter values (linear model), the performance of the model is quite different between the training and the testing set.

  5. Call the fit_svm() function with kernel="poly", C=1, degree=4, and gamma=0.05, save the results in 2 variables, and print them. These variables are acc_train_2 and acc_test_2:

    acc_train_2, acc_test_2 = fit_svm(features_train, label_train,

                                      features_test, label_test,

                                      kernel="poly", C=1,

                                      degree=4, gamma=0.05)

    acc_train_2, acc_test_2

    The expected output is this:

    (0.68875, 0.745)

    With a fourth-degree polynomial, the model is not performing well on the training set.

  6. Call the fit_svm() function with kernel="poly", C=2, degree=4, and gamma=0.05, save the results in 2 variables, and print them. These variables are acc_train_3 and acc_test_3:

    acc_train_3, acc_test_3 = fit_svm(features_train,

                                      label_train, features_test,

                                      label_test, kernel="poly",

                                      C=2, degree=4, gamma=0.05)

    acc_train_3, acc_test_3

    The expected output is this:

    (0.68875, 0.745)

    Increasing the regularization parameter, C, didn't impact the model's performance at all.

  7. Call the fit_svm() function with kernel="poly", C=1, degree=4, and gamma=0.25, save the results in 2 variables, and print them. These variables are acc_train_4 and acc_test_4:

    acc_train_4, acc_test_4 = fit_svm(features_train,

                                      label_train, features_test,

                                      label_test, kernel="poly",

                                      C=1, degree=4, gamma=0.25)

    acc_train_4, acc_test_4

    The expected output is this:

    (0.84625, 0.775)

    Increasing the value of gamma to 0.25 has significantly improved the model's performance on the training set. However, the accuracy on the testing set is much lower, so the model is overfitting.

  8. Call the fit_svm() function with kernel="poly", C=1, degree=4, and gamma=0.5, save the results in 2 variables, and print them. These variables are acc_train_5 and acc_test_5:

    acc_train_5, acc_test_5 = fit_svm(features_train,

                                      label_train, features_test,

                                      label_test, kernel="poly",

                                      C=1, degree=4, gamma=0.5)

    acc_train_5, acc_test_5

    The expected output is this:

    (0.9575, 0.73)

    Increasing the value of gamma to 0.5 has drastically improved the model's performance on the training set, but it is definitely overfitting as the accuracy score on the testing set is much lower.

  9. Call the fit_svm() function with kernel="poly", C=1, degree=4, and gamma=0.16, save the results in 2 variables, and print them. These variables are acc_train_6 and acc_test_6:

    acc_train_6, acc_test_6 = fit_svm(features_train, label_train,

                                      features_test, label_test,

                                      kernel="poly", C=1,

                                      degree=4, gamma=0.16)

    acc_train_6, acc_test_6

    The expected output is this:

    (0.76375, 0.785)

    With gamma=0.16, the model achieved accuracy scores comparable to those of the best KNN model. Both the training and testing sets have a score of around 0.77.

  10. Call the fit_svm() function with kernel="sigmoid", save the results in 2 variables, and print them. These variables are acc_train_7 and acc_test_7:

    acc_train_7, acc_test_7 = fit_svm(features_train, label_train,

                                      features_test, label_test,

                                      kernel="sigmoid")

    acc_train_7, acc_test_7

    The expected output is this:

    (0.635, 0.66)

    The sigmoid kernel achieved a low accuracy score.

  11. Call the fit_svm() function with kernel="rbf" and gamma=0.15, save the results in 2 variables, and print them. These variables are acc_train_8 and acc_test_8:

    acc_train_8, acc_test_8 = fit_svm(features_train,

                                      label_train, features_test,

                                      label_test, kernel="rbf",

                                      gamma=0.15)

    acc_train_8, acc_test_8

    The expected output is this:

    (0.7175, 0.765)

    The rbf kernel achieved a good testing score with gamma=0.15, although the training and testing scores still differ by about 0.05.

  12. Call the fit_svm() function with kernel="rbf" and gamma=0.25, save the results in 2 variables, and print them. These variables are acc_train_9 and acc_test_9:

    acc_train_9, acc_test_9 = fit_svm(features_train,

                                      label_train, features_test,

                                      label_test, kernel="rbf",

                                      gamma=0.25)

    acc_train_9, acc_test_9

    The expected output is this:

    (0.74, 0.765)

    The model's training accuracy got better with gamma=0.25, but a gap with the testing accuracy remains.

  13. Call the fit_svm() function with kernel="rbf" and gamma=0.35, save the results in 2 variables, and print them. These variables are acc_train_10 and acc_test_10:

    acc_train_10, acc_test_10 = fit_svm(features_train, label_train,

                                        features_test, label_test,

                                        kernel="rbf", gamma=0.35)

    acc_train_10, acc_test_10

    The expected output is this:

    (0.78125, 0.775)

With the rbf kernel and gamma=0.35, we got very similar results for the training and testing sets and the model's performance is higher than the best KNN we trained in the previous activity. This is our best model for the German credit dataset.
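
To see why gamma had such a strong effect in the preceding steps, it helps to look at the kernel functions themselves. The following sketch implements, in plain Python, the polynomial and RBF kernel formulas as they are documented for scikit-learn's SVC: (gamma * <x, y> + coef0) ** degree and exp(-gamma * ||x - y||**2). The helper names and the two sample points are made up for illustration, and coef0 is assumed to keep its default value of 0:

```python
import math


def dot(x, y):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def poly_kernel(x, y, gamma, degree, coef0=0.0):
    """Polynomial kernel: (gamma * <x, y> + coef0) ** degree."""
    return (gamma * dot(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma):
    """RBF kernel: exp(-gamma * ||x - y||**2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Two made-up points: <x, y> = 2 and ||x - y||**2 = 5
x = (1.0, 2.0)
y = (2.0, 0.0)

for gamma in (0.05, 0.25, 0.5):
    print(gamma,
          poly_kernel(x, y, gamma, degree=4),
          rbf_kernel(x, y, gamma))
```

With a fourth-degree polynomial, gamma is raised to the fourth power, so small changes in gamma change the kernel values considerably. With the RBF kernel, a larger gamma makes the similarity between two points decay faster with distance, which lets the model follow the training data more closely.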

Note

To access the source code for this specific section, please refer to https://packt.live/3fPZlMQ.

You can also run this example online at https://packt.live/3hVlEm3.

You must execute the entire Notebook in order to get the desired result.

In this activity, we tried different values for the main hyperparameters of the SVM classifier: kernel, gamma, C, and degree. We saw how they affected the model's performance and its tendency to overfit. With trial and error, we finally found the best hyperparameter combination and achieved an accuracy score close to 0.78. This process is called hyperparameter tuning and is an important step for any data science project.
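
The fit_svm() function uses gamma='scale' as its default. According to the scikit-learn documentation, this setting computes gamma as 1 / (n_features * X.var()), where X.var() is the variance of all the feature values taken together. The following sketch reproduces that arithmetic in plain Python on a tiny made-up feature matrix; scale_gamma() and toy_features are hypothetical names used only for this illustration:

```python
def scale_gamma(rows):
    """gamma='scale' heuristic: 1 / (n_features * variance of all values)."""
    n_features = len(rows[0])
    values = [value for row in rows for value in row]
    mean = sum(values) / len(values)
    # Population variance over every value in the matrix
    variance = sum((value - mean) ** 2 for value in values) / len(values)
    return 1.0 / (n_features * variance)


# Made-up matrix with 4 rows and 2 features (not the German credit data)
toy_features = [
    [0.0, 1.0],
    [1.0, 0.0],
    [0.0, 0.0],
    [1.0, 1.0],
]

print(scale_gamma(toy_features))
```

Here the variance of the eight values is 0.25, so the heuristic gives 1 / (2 * 0.25) = 2.0. Computing this value for your own training set gives a reasonable starting point before trying manual values such as the ones used in this activity.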

4. An Introduction to Decision Trees

Activity 4.01: Car Data Classification

Solution:

  1. Open a new Jupyter Notebook file.
  2. Import the pandas package as pd:

    import pandas as pd

  3. Create a new variable called file_url that will contain the URL to the raw dataset:

    file_url = ('https://raw.githubusercontent.com/'

                'PacktWorkshops/'

                'The-Applied-Artificial-Intelligence-Workshop/'

                'master/Datasets/car.csv')

  4. Load the data using the pd.read_csv() method:

    df = pd.read_csv(file_url)

  5. Print the first five rows of df:

    df.head()

    The output will be as follows:

    Figure 4.13: The first five rows of the dataset

  6. Import the preprocessing module from sklearn:

    from sklearn import preprocessing

  7. Create a function called encode() that takes a DataFrame and column name as parameters. This function will instantiate LabelEncoder(), fit it with the unique value of the column, and transform its data. It will return the transformed column:

    def encode(data_frame, column):

        label_encoder = preprocessing.LabelEncoder()

        label_encoder.fit(data_frame[column].unique())

        return label_encoder.transform(data_frame[column])

  8. Create a for loop that will iterate through each column of df and will encode them with the encode() function:

    for column in df.columns:

        df[column] = encode(df, column)

  9. Now, print the first five rows of df:

    df.head()

    The output will be as follows:

    Figure 4.14: The updated first five rows of the dataset

  10. Extract the class column using .pop() from pandas and save it in a variable called label:

    label = df.pop('class')

  11. Import model_selection from sklearn:

    from sklearn import model_selection

  12. Split the dataset into training and testing sets with test_size=0.1 and random_state=88:

    (features_train, features_test,

     label_train, label_test) = model_selection.train_test_split(

        df, label, test_size=0.1, random_state=88)

  13. Import DecisionTreeClassifier from sklearn:

    from sklearn.tree import DecisionTreeClassifier

  14. Instantiate DecisionTreeClassifier() and save it in a variable called decision_tree:

    decision_tree = DecisionTreeClassifier()

  15. Fit the decision tree with the training set:

    decision_tree.fit(features_train, label_train)

    The output will be as follows:

    Figure 4.15: Decision tree fit with the training set

  16. Print the score of the decision tree on the testing set:

    decision_tree.score(features_test, label_test)

    The output will be as follows:

    0.953757225433526

    The decision tree is achieving an accuracy score of 0.95 for our first try. This is remarkable.

  17. Import classification_report from sklearn.metrics:

    from sklearn.metrics import classification_report

  18. Print the classification report of the test labels and predictions:

    print(classification_report(label_test,

          decision_tree.predict(features_test)))

    The output will be as follows:

    Figure 4.16: Output showing the expected classification report

From this classification report, we can see that our model is performing quite well for the precision scores for all four classes. Regarding the recall score, we can see that it didn't perform as well for the last class.

Note

To access the source code for this specific section, please refer to https://packt.live/3hQDLtr.

You can also run this example online at https://packt.live/2NkEEML.

You must execute the entire Notebook in order to get the desired result.

By completing this activity, you have prepared the car dataset and trained a decision tree model. You have learned how to get its accuracy score and a classification report so that you can analyze its precision and recall scores.
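
The encode() function from Step 7 relies on LabelEncoder, which maps each distinct value of a column to an integer based on the sorted order of the values. The following plain-Python sketch mimics that behavior so that you can see the mapping explicitly; label_encode() and the toy safety column are made up for illustration and are not taken from the car dataset:

```python
def label_encode(values):
    """Map each distinct value to an integer, using sorted order."""
    classes = sorted(set(values))
    index_of = {value: index for index, value in enumerate(classes)}
    return [index_of[value] for value in values], classes


# Toy column with made-up values (not real rows of the dataset)
safety = ["low", "med", "high", "low", "high"]
codes, classes = label_encode(safety)

print(classes)  # the position of each value is its integer code
print(codes)
```

Note that the codes follow alphabetical order (high, low, med), not the natural order of the categories. Decision trees can still split on such codes, but the numbers themselves should not be read as a ranking.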

Activity 4.02: Random Forest Classification for Your Car Rental Company

Solution:

  1. Open a Jupyter Notebook.
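  2. Reuse the code mentioned in Steps 1 - 4 of Activity 4.01, Car Data Classification. The following steps rely on the features_train, features_test, label_train, and label_test variables created there.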
  3. Import RandomForestClassifier from sklearn.ensemble:

    from sklearn.ensemble import RandomForestClassifier

  4. Instantiate a random forest classifier with n_estimators=100, max_depth=6, and random_state=168. Save it to a variable called random_forest_classifier:

    random_forest_classifier = RandomForestClassifier(

        n_estimators=100, max_depth=6, random_state=168)

  5. Fit the random forest classifier with the training set:

    random_forest_classifier.fit(features_train, label_train)

    The output will be as follows:

    Figure 4.17: Logs of the RandomForest classifier with its hyperparameter values

    These are the logs of the RandomForest classifier with its hyperparameter values.

  6. Make predictions on the testing set using the random forest classifier and save them in a variable called rf_preds_test. Print its content:

    rf_preds_test = random_forest_classifier.predict(features_test)

    rf_preds_test

    The output will be as follows:

    Figure 4.18: Output showing the predictions on the testing set

  7. Import classification_report from sklearn.metrics:

    from sklearn.metrics import classification_report

  8. Print the classification report with the labels and predictions from the test set:

    print(classification_report(label_test, rf_preds_test))

    The output will be as follows:

    Figure 4.19: Output showing the classification report with the labels and predictions from the test set

    The F1 score in the preceding report shows us that the random forest is performing well on class 2 but not as well on classes 0 and 3. The model is unable to predict accurately for class 1, but there were only 9 observations in the testing set. The accuracy score is 0.84, while the F1 score is 0.82.

  9. Import confusion_matrix from sklearn.metrics:

    from sklearn.metrics import confusion_matrix

  10. Display the confusion matrix on the true and predicted labels of the testing set:

    confusion_matrix(label_test, rf_preds_test)

    The output will be as follows:

    array([[ 32,   0,  10,   0],

           [  8,   0,   0,   1],

           [  5,   0, 109,   0],

           [  3,   0,   0,   5]])

    From this confusion matrix, we can see that the RandomForest model is having difficulties accurately predicting the first class. It incorrectly predicted 16 cases (8 + 5 + 3) for this class.

  11. Retrieve the feature importance scores of the fitted model using .feature_importances_ and save the results in a variable called rf_varimp. Print its contents:

    rf_varimp = random_forest_classifier.feature_importances_

    rf_varimp

    The output will be as follows:

    array([0.12676384, 0.10366314, 0.02119621, 0.35266673,

           0.05915769, 0.33655239])

    The preceding output shows us that the most important features are the fourth and sixth ones, which correspond to persons and safety, respectively.

  12. Import ExtraTreesClassifier from sklearn.ensemble:

    from sklearn.ensemble import ExtraTreesClassifier

  13. Instantiate ExtraTreesClassifier with n_estimators=100, max_depth=6, and random_state=168. Save it to a variable called extra_trees_classifier:

    extra_trees_classifier = ExtraTreesClassifier(

        n_estimators=100, max_depth=6, random_state=168)

  14. Fit the extratrees classifier with the training set:

    extra_trees_classifier.fit(features_train, label_train)

    The output will be as follows:

    Figure 4.20: Output with the extratrees classifier with the training set

    These are the logs of the extratrees classifier with its hyperparameter values.

  15. Make predictions on the testing set using the extratrees classifier and save them in a variable called et_preds_test. Print its content:

    et_preds_test = extra_trees_classifier.predict(features_test)

    et_preds_test

    The output will be as follows:

    Figure 4.21: Predictions on the testing set using extratrees

  16. Print the classification report with the labels and predictions from the test set:

    print(classification_report(label_test,

          extra_trees_classifier.predict(features_test)))

    The output will be as follows:

    Figure 4.22: Classification report with the labels and predictions from the test set

    The F1 score shown in the preceding report shows us that the extratrees classifier is performing well on class 2 but not as well on class 0. The model is unable to predict accurately for classes 1 and 3, but there were only 9 and 8 observations in the testing set, respectively. The accuracy score is 0.82, while the F1 score is 0.78. So, our RandomForest classifier performed better than extratrees.

  17. Display the confusion matrix of the true and predicted labels of the testing set:

    confusion_matrix(label_test, et_preds_test)

    The output will be as follows:

    array([[ 28,   0,  14,   0],

           [  9,   0,   0,   0],

           [  2,   0, 112,   0],

           [  7,   0,   0,   1]])

    From this confusion matrix, we can see that the extratrees model is having difficulties accurately predicting the first and third classes.

  18. Retrieve the feature importance scores of the fitted model using .feature_importances_ and save the results in a variable called et_varimp. Print its content:

    et_varimp = extra_trees_classifier.feature_importances_

    et_varimp

    The output will be as follows:

    array([0.08844544, 0.0702334 , 0.01440408, 0.37662014, 0.05965896,

           0.39063797])

The preceding output shows us that the most important features are the sixth and fourth ones, which correspond to safety and persons, respectively. It is interesting to see that RandomForest has the same two most important features but in a different order.
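
Reading the ranking directly from the two arrays is error-prone, so the comparison can be sketched as follows. The importance values are copied from the outputs of Steps 11 and 18, while rank_features() is a hypothetical helper that returns the 1-based positions of the features from most to least important:

```python
# Importance scores copied from the outputs of Steps 11 and 18
rf_varimp = [0.12676384, 0.10366314, 0.02119621,
             0.35266673, 0.05915769, 0.33655239]
et_varimp = [0.08844544, 0.0702334, 0.01440408,
             0.37662014, 0.05965896, 0.39063797]


def rank_features(importances):
    """Return 1-based feature positions, most important first."""
    order = sorted(range(len(importances)),
                   key=lambda index: importances[index],
                   reverse=True)
    return [index + 1 for index in order]


rf_ranking = rank_features(rf_varimp)
et_ranking = rank_features(et_varimp)

print(rf_ranking)
print(et_ranking)
```

Both rankings agree on every position except the first two, where the fourth and sixth features swap places. In a Notebook, you could pair the scores with df.columns instead of positions to display the feature names.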

Note

To access the source code for this specific section, please refer to https://packt.live/2YoUY5t.

You can also run this example online at https://packt.live/3eswBcW.

You must execute the entire Notebook in order to get the desired result.
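
The accuracy, precision, and recall figures discussed in this activity can be recomputed by hand from the two confusion matrices, where each row is a true class and each column is a predicted class. The following plain-Python sketch uses the matrices from Steps 10 and 17; the accuracy() and precision_recall() helpers are illustrative and are not part of scikit-learn:

```python
# Confusion matrices copied from Steps 10 and 17
# (rows are true classes, columns are predicted classes)
rf_matrix = [
    [32, 0, 10, 0],
    [8, 0, 0, 1],
    [5, 0, 109, 0],
    [3, 0, 0, 5],
]
et_matrix = [
    [28, 0, 14, 0],
    [9, 0, 0, 0],
    [2, 0, 112, 0],
    [7, 0, 0, 1],
]


def accuracy(matrix):
    """Share of observations on the diagonal (correct predictions)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def precision_recall(matrix, label):
    """Precision and recall of one class; 0.0 when undefined."""
    true_positives = matrix[label][label]
    predicted = sum(row[label] for row in matrix)  # column total
    actual = sum(matrix[label])                    # row total
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


print(round(accuracy(rf_matrix), 2), round(accuracy(et_matrix), 2))
for label in range(4):
    print(label, precision_recall(rf_matrix, label))
```

The first column of rf_matrix contains 8 + 5 + 3 = 16 observations that were wrongly assigned to class 0, which is why the precision of that class is only 32 / 48. The second column contains only zeros, so class 1 was never predicted and both its precision and recall are reported as 0.0 in this sketch.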

5. Artificial Intelligence: Clustering

Activity 5.01: Clustering Sales Data Using K-Means

Solution:

  1. Open a new Jupyter Notebook file.
  2. Load the dataset as a DataFrame and inspect the data:

    import pandas as pd

    file_url = ('https://raw.githubusercontent.com/'

                'PacktWorkshops/'

                'The-Applied-Artificial-Intelligence-Workshop/'

                'master/Datasets/'

                'Sales_Transactions_Dataset_Weekly.csv')

    df = pd.read_csv(file_url)

    df

    The output of df is as follows:

    Figure 5.18: Output showing the contents of the dataset

    If you look at the output, you will notice that our dataset contains 811 rows, with each row representing a product. It also contains 107 columns: the first column is the product code, followed by 52 columns starting with W that represent the sale quantity for each week, then the MIN and MAX columns, and finally, the normalized version of the 52 weekly columns, starting with Normalized. The normalized columns will be a better choice to work with than the absolute sales columns, W, as they will help our k-means algorithm to find the center of each cluster faster. Since we are going to work on the normalized columns, we can remove every W column plus the Product_Code column. We can also remove the MIN and MAX columns as they do not bring any value to our clustering. Also notice that the weeks run from 0 to 51 and not 1 to 52.

  3. Next, create a new DataFrame without the unnecessary columns, as shown in the following code snippet (the first 55 columns of the dataset). You should use the inplace parameter to help you:

    df2 = df.drop(df.iloc[:, 0:55], inplace = False, axis = 1)

    The output of df2 is as follows:

    Figure 5.19: Modified DataFrame

    In the preceding code snippet, we used the drop function of the pandas DataFrame in order to remove the first 55 columns. We also set the inplace parameter to False in order to not remove the column of our original df DataFrame. As a result, we should only have the normalized columns from 0 to 51 in df2 and df should still be unchanged.

  4. Create a k-means clustering model with 8 clusters and with random state = 8:

    from sklearn.cluster import KMeans

    k_means_model = KMeans(n_clusters=8, random_state=8)

    k_means_model.fit(df2)

    We build a k-means model with the default value for every parameter except for n_clusters=8 with random_state=8 in order to obtain 8 clusters and reproducible results.

  5. Retrieve the labels from the clustering algorithm:

    labels = k_means_model.labels_

    labels

    The output of labels will be as follows:

    Figure 5.20: Output array of labels

    It is very hard to make sense out of this output, but each index of labels represents the cluster that the product has been assigned, based on similar weekly sales trends. We can now use these cluster labels to group products together.

  6. Now, from the first DataFrame, df, keep only the W columns and add the labels as a new column, as shown in the following code snippet:

    df.drop(df.iloc[:, 53:], inplace = True, axis = 1)

    df.drop('Product_Code', inplace = True, axis = 1)

    df['label'] = labels

    df

    In the preceding code snippet, we removed all the unneeded columns and added labels as a new column in the DataFrame.

    The output of df will be as follows:

    Figure 5.21: Updated DataFrame with the new labels as a new column

    Now that we have the label, we can perform aggregation on the label column in order to calculate the yearly average sales of each cluster.

  7. Perform the aggregation (use the groupby function from pandas) in order to obtain the yearly average sale of each cluster, as shown in the following code snippet:

    df_agg = df.groupby('label').sum()

    df_final = df[['label','W0']].groupby('label').count()

    df_final=df_final.rename(columns = {'W0':'count_product'})

    df_final['total_sales'] = df_agg.sum(axis = 1)

    df_final['yearly_average_sales'] = (

        df_final['total_sales'] / df_final['count_product'])

    df_final.sort_values(by='yearly_average_sales',

                         ascending=False, inplace = True)

    df_final

    In the preceding code snippet, we first used the groupby function with the sum() method of the DataFrame to calculate the sum of every product's sales for each W column and cluster, and stored the results in df_agg. We then used the groupby function with the count() method on a single column (an arbitrary choice) of df to obtain the total number of products per cluster (note that we also had to rename the W0 column after the aggregation). The next step was to sum all the sales columns of df_agg in order to obtain the total sales for each cluster. Finally, we calculated the yearly_average_sales for each cluster by dividing total_sales by count_product. We also included a final step to sort the clusters by yearly_average_sales in descending order.

    The output of df_final will be as follows:

    Figure 5.22: Expected output on the sales transaction dataset

Now, with this output, we see that our k-means model has managed to put similarly performing products together. We can easily see that the 115 products in cluster 3 are the best-selling products, whereas the 123 products of cluster 1 are performing very badly. This is very valuable for any business, as it helps them automatically identify and group together a number of similarly performing products without having any bias in the product name or description.
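
To make the grouping less of a black box, the following sketch shows the two steps that k-means alternates between: assigning each point to its nearest center, and moving each center to the mean of its points. It is a simplified, plain-Python illustration with fixed starting centers and six made-up products; scikit-learn's KMeans chooses its starting centers differently and is far more efficient, so treat the helper names and data here as assumptions:

```python
def closest(point, centers):
    """Index of the center nearest to point (squared Euclidean distance)."""
    distances = [sum((p - c) ** 2 for p, c in zip(point, center))
                 for center in centers]
    return distances.index(min(distances))


def k_means(points, centers, max_iterations=100):
    """Alternate assignment and update steps until labels stop changing."""
    centers = list(centers)  # work on a copy of the starting centers
    labels = None
    for _ in range(max_iterations):
        new_labels = [closest(point, centers) for point in points]
        if new_labels == labels:
            break
        labels = new_labels
        for index in range(len(centers)):
            members = [p for p, label in zip(points, labels)
                       if label == index]
            if members:  # keep the old center if a cluster is empty
                centers[index] = tuple(sum(axis) / len(members)
                                       for axis in zip(*members))
    return labels, centers


# Made-up normalized sales of six products over two weeks
points = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.15),
          (0.9, 0.8), (0.8, 0.9), (0.85, 0.85)]
labels, centers = k_means(points, centers=[(0.0, 0.0), (1.0, 1.0)])

print(labels)
print(centers)
```

The three low-selling products end up in one cluster and the three high-selling products in the other, which is the same kind of grouping that the model produced on the 52 normalized columns.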

Note

To access the source code for this specific section, please refer to https://packt.live/3fVpSbT.

You can also run this example online at https://packt.live/3hW24Gk.

You must execute the entire Notebook in order to get the desired result.

By completing this activity, you have learned how to perform k-means clustering on multiple columns for many products. You have also learned how useful clustering can be for a business, even without label data.
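
The aggregation performed in Step 7 boils down to three operations per cluster: counting the products, summing all of their weekly sales, and dividing one by the other. The following plain-Python sketch reproduces that arithmetic on five made-up products with three weeks of sales each, so the numbers are purely illustrative; only the column names are borrowed from df_final:

```python
# Made-up data: (cluster label, weekly sales [W0, W1, W2]) for each product
products = [
    (0, [10, 12, 11]),
    (0, [9, 11, 10]),
    (1, [1, 0, 2]),
    (1, [0, 1, 1]),
    (1, [2, 1, 1]),
]

summary = {}
for label, weekly_sales in products:
    entry = summary.setdefault(label,
                               {"count_product": 0, "total_sales": 0})
    entry["count_product"] += 1
    entry["total_sales"] += sum(weekly_sales)

# Average sales per product over the whole period covered by the data
for entry in summary.values():
    entry["yearly_average_sales"] = (entry["total_sales"]
                                     / entry["count_product"])

# Highest average first, mirroring the sort_values call in Step 7
ranking = sorted(summary,
                 key=lambda label: summary[label]["yearly_average_sales"],
                 reverse=True)

for label in ranking:
    print(label, summary[label])
```

In this toy example, cluster 0 averages 31.5 units per product and cluster 1 averages 3.0, so cluster 0 is listed first, just as the best-selling cluster appears at the top of df_final.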

Activity 5.02: Clustering Red Wine Data Using the Mean Shift Algorithm and Agglomerative Hierarchical Clustering

Solution:

  1. Open a new Jupyter Notebook file.
  2. Load the dataset as a DataFrame with sep = ";" and inspect the data:

    import pandas as pd

    import numpy as np

    from sklearn import preprocessing

    from sklearn.cluster import MeanShift

    from sklearn.cluster import AgglomerativeClustering

    from scipy.cluster.hierarchy import dendrogram

    import scipy.cluster.hierarchy as sch

    from sklearn import metrics

    file_url = ('https://raw.githubusercontent.com/'

                'PacktWorkshops/'

                'The-Applied-Artificial-Intelligence-Workshop/'

                'master/Datasets/winequality-red.csv')

    df = pd.read_csv(file_url,sep=';')

    df

    The output of df is as follows:

    Figure 5.23: df showing the dataset as the output

    Note

    The output from the preceding screenshot is truncated.

    Our dataset contains 1599 rows, with each row representing a red wine. It also contains 12 columns, with the last column being the quality of the wine. We can see that the remaining 11 columns will be our features, and we need to scale them in order to help the accuracy and speed of our models.

  3. Create features, label, and scaled_features variables from the initial DataFrame, df:

    features = df.drop('quality', axis=1)

    label = df['quality']

    scaled_features = preprocessing.scale(features)

    In the preceding code snippet, we separated the label (quality) from the features. Then we used the preprocessing.scale function from sklearn to scale our features, so that no single feature dominates the distance calculations used by the clustering models.

  4. Next, create a mean shift clustering model, then retrieve the model's predicted labels and the number of clusters created:

    mean_shift_model = MeanShift()

    mean_shift_model.fit(scaled_features)

    n_cluster_mean_shift = len(mean_shift_model.cluster_centers_)

    label_mean_shift = mean_shift_model.labels_

    n_cluster_mean_shift

    The output of n_cluster_mean_shift will be as follows:

    10

    Our mean shift model has created 10 clusters, which is already more than the number of groups that we have in our quality label. This will probably affect our extrinsic scores and might be an early indicator that wines sharing similar physicochemical properties don't belong in the same quality group.

    The output of label_mean_shift will be as follows:

    Figure 5.24: Output array of label_mean_shift

    This is a very interesting output because it clearly shows that most wines in our dataset are very similar; there are a lot more wines in cluster 0 than in the other clusters.

  5. Now create an agglomerative hierarchical clustering model after creating a dendrogram and selecting the optimal number of clusters for it:

    dendrogram = sch.dendrogram(sch.linkage(scaled_features,

                                method='ward'))

    # Ward linkage always uses Euclidean distances, which is the default

    agglomerative_model = AgglomerativeClustering(n_clusters=7,

                                                  linkage='ward')

    agglomerative_model.fit(scaled_features)

    label_agglomerative = agglomerative_model.labels_

    The output of dendrogram will be as follows:

    Figure 5.25: Output showing the dendrogram for the clusters

    From this output, we can see that seven clusters seem to be the optimal number for our model. We get this number by searching for the largest difference on the y axis between the lowest branch and the highest branch. In our case, for seven clusters, the lowest branch has a value of 29 and the highest branch has a value of 41.

    The output of label_agglomerative will be as follows:

    Figure 5.26: Array showing label_agglomerative

    We can see that we have a predominant cluster, 1, but not as much as was the case in the mean shift model.

  6. Now, compute the following extrinsic approach scores for both models:

    a. Begin with the adjusted Rand index:

    ARI_mean=metrics.adjusted_rand_score(label, label_mean_shift)

    ARI_agg=metrics.adjusted_rand_score(label, label_agglomerative)

    ARI_mean

    The output of ARI_mean will be as follows:

    0.0006771608724007207

    Next, enter ARI_agg to get the expected values:

    ARI_agg

    The output of ARI_agg will be as follows:

    0.05358047852603172

    Our agglomerative model has a much higher adjusted_rand_score than the mean shift model, but both scores are very close to 0, which means that neither model is performing very well with regard to the true labels.

    b. Next, calculate the adjusted mutual information:

    AMI_mean = metrics.adjusted_mutual_info_score(label,

                                                  label_mean_shift)

    AMI_agg = metrics.adjusted_mutual_info_score(label,

                                                 label_agglomerative)

    AMI_mean

    The output of AMI_mean will be as follows:

    0.004837187596124968

    Next, enter AMI_agg to get the expected values:

    AMI_agg

    The output of AMI_agg will be as follows:

    0.05993098663692826

    Our agglomerative model has a much higher adjusted_mutual_info_score than the mean shift model, but both scores are very close to 0, which means that neither model is performing very well with regard to the true labels.

    c. Calculate the V-Measure:

    V_mean = metrics.v_measure_score(label,

                                     label_mean_shift, beta=1)

    V_agg = metrics.v_measure_score(label,

                                    label_agglomerative, beta=1)

    V_mean

    The output of V_mean will be as follows:

    0.021907254751144124

    Next, enter V_agg to get the expected values:

    V_agg

    The output of V_agg will be as follows:

    0.07549735446050691

    Our agglomerative model has a higher V-Measure than the mean shift model, but both scores are very close to 0, which means that neither model is performing very well with regard to the true labels.

    d. Next, find the Fowlkes-Mallows score:

    FM_mean = metrics.fowlkes_mallows_score(label,

                                            label_mean_shift)

    FM_agg= metrics.fowlkes_mallows_score(label,

                                           label_agglomerative)

    FM_mean

    The output of FM_mean will be as follows:

    0.5721233634622408

    Next, enter FM_agg to get the expected values:

    FM_agg

    The output of FM_agg will be as follows:

    0.3300681478007641

    This time, our mean shift model has a higher Fowlkes-Mallows score than the agglomerative model, but both scores are still well below the maximum of 1, which means that neither model is performing very well with regard to the true labels.

    In conclusion, with the extrinsic approach evaluation, neither of our models was able to find clusters containing wines of a similar quality based on their physicochemical properties. We will confirm this by using the intrinsic approach evaluation to ensure that our models' clusters are well defined and are properly grouping similar wines together.

  7. Now, compute the following intrinsic approach scores for both models:

    a. Begin with the Silhouette Coefficient:

    Sil_mean = metrics.silhouette_score(scaled_features,

                                        label_mean_shift)

    Sil_agg = metrics.silhouette_score(scaled_features,

                                       label_agglomerative)

    Sil_mean

    The output of Sil_mean will be as follows:

    0.32769323700400077

    Next, enter Sil_agg to get the expected values:

    Sil_agg

    The output of Sil_agg will be as follows:

    0.1591882574407987

    Our mean shift model has a higher Silhouette Coefficient than the agglomerative model, but both scores are very close to 0, which means that both models have overlapping clusters.

    b. Next, find the Calinski-Harabasz index:

    CH_mean = metrics.calinski_harabasz_score(scaled_features,

                                              label_mean_shift)

    CH_agg = metrics.calinski_harabasz_score(scaled_features,

                                             label_agglomerative)

    CH_mean

    The output of CH_mean will be as follows:

    44.62091774102674

    Next, enter CH_agg to get the expected values:

    CH_agg

    The output of CH_agg will be as follows:

    223.5171774491095

    Our agglomerative model has a much higher Calinski-Harabasz index than the mean shift model, which means that the agglomerative model has much more dense and well-defined clusters than the mean shift model.

    c. Finally, find the Davies-Bouldin index:

    DB_mean = metrics.davies_bouldin_score(scaled_features,

                                           label_mean_shift)

    DB_agg = metrics.davies_bouldin_score(scaled_features,

                                          label_agglomerative)

    DB_mean

    The output of DB_mean will be as follows:

    0.8106334674570222

    Next, enter DB_agg to get the expected values:

    DB_agg

    The output of DB_agg will be as follows:

    1.4975443816135114

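    Our agglomerative model has a higher Davies-Bouldin index than the mean shift model. For this index, lower values indicate better-separated clusters (0 is the minimum), so the mean shift model scores better here. Both scores are fairly low, which means that both models are performing reasonably well with regard to the definition of their clusters.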

    Note

    To access the source code for this specific section, please refer to https://packt.live/2YXMl0U.

    You can also run this example online at https://packt.live/2Bs7sAp.

    You must execute the entire Notebook in order to get the desired result.

In conclusion, with the intrinsic approach evaluation, both our models were well defined and confirm our intuition on the red wine dataset, that is, similar physicochemical properties are not associated with similar quality. We were also able to see that in most of our scores, the agglomerative hierarchical model performs better than the mean shift model.
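To build some intuition for the Fowlkes-Mallows scores from step 6, the following minimal sketch computes the score by counting pairs of samples. The fowlkes_mallows helper and the toy labels are made up for illustration and are not part of the activity; in practice, you would keep using metrics.fowlkes_mallows_score.

```python
from itertools import combinations
from math import sqrt


def fowlkes_mallows(labels_true, labels_pred):
    """Pair-counting score: TP / sqrt((TP + FP) * (TP + FN))."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true and same_pred:
            tp += 1  # pair grouped together in both labelings
        elif same_pred:
            fp += 1  # grouped together only by the clustering
        elif same_true:
            fn += 1  # grouped together only by the true labels
    if tp == 0:
        return 0.0
    return tp / sqrt((tp + fp) * (tp + fn))


# Toy data (hypothetical): three true groups of four samples each
labels_true = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
one_big_cluster = [0] * 12  # every sample assigned to the same cluster

fm_big = fowlkes_mallows(labels_true, one_big_cluster)
fm_perfect = fowlkes_mallows(labels_true, labels_true)
print(round(fm_big, 3), fm_perfect)  # 0.522 1.0
```

With these toy labels, placing every sample in a single cluster already scores about 0.52, because every pair that shares a true label also shares a cluster. A clustering with one dominant cluster, such as the mean shift output, may benefit from a similar effect, which is one possible reason why its Fowlkes-Mallows score is higher even though its other extrinsic scores are lower.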

6. Neural Networks and Deep Learning

Activity 6.01: Finding the Best Accuracy Score for the Digits Dataset

Solution:

  1. Open a new Jupyter Notebook file.
  2. Import tensorflow.keras.datasets.mnist as mnist:

    import tensorflow.keras.datasets.mnist as mnist

  3. Load the mnist dataset using mnist.load_data() and save the results into (features_train, label_train), (features_test, label_test):

    (features_train, label_train), \

    (features_test, label_test) = mnist.load_data()

  4. Print the content of label_train:

    label_train

    The expected output is this:

    array([5, 0, 4, ..., 5, 6, 8], dtype=uint8)

    The label column contains numeric values that correspond to the 10 handwritten digits: 0 to 9.

  5. Print the shape of the training set:

    features_train.shape

    The expected output is this:

    (60000, 28, 28)

    The training set is composed of 60,000 observations of shape 28 by 28. We will need to flatten the input for our neural network.

  6. Print the shape of the testing set:

    features_test.shape

    The expected output is this:

    (10000, 28, 28)

    The testing set is composed of 10,000 observations of shape 28 by 28.

  7. Scale features_train and features_test to the range [0, 1] by dividing them by 255:

    features_train = features_train / 255.0

    features_test = features_test / 255.0

  8. Import numpy as np, tensorflow as tf, and layers from tensorflow.keras:

    import numpy as np

    import tensorflow as tf

    from tensorflow.keras import layers

  9. Set 8 as the seed for NumPy and TensorFlow using np.random.seed() and tf.random.set_seed():

    np.random.seed(8)

    tf.random.set_seed(8)

  10. Instantiate a tf.keras.Sequential() class and save it into a variable called model:

    model = tf.keras.Sequential()

  11. Instantiate layers.Flatten() with input_shape=(28,28) and save it into a variable called input_layer:

    input_layer = layers.Flatten(input_shape=(28,28))

  12. Instantiate a layers.Dense() class with 128 neurons and activation='relu', then save it into a variable called layer1:

    layer1 = layers.Dense(128, activation='relu')

  13. Instantiate a second layers.Dense() class with 10 neurons (one per digit) and activation='softmax', then save it into a variable called final_layer:

    final_layer = layers.Dense(10, activation='softmax')

  14. Add the three layers you just defined to the model using .add() and add a layers.Dropout(0.25) layer in between each of them (except for the flatten layer):

    model.add(input_layer)

    model.add(layer1)

    model.add(layers.Dropout(0.25))

    model.add(final_layer)

  15. Instantiate a tf.keras.optimizers.Adam() class with 0.001 as learning rate and save it into a variable called optimizer:

    optimizer = tf.keras.optimizers.Adam(0.001)

  16. Compile the neural network using .compile() with loss='sparse_categorical_crossentropy', optimizer=optimizer, metrics=['accuracy']:

    model.compile(loss='sparse_categorical_crossentropy',

                  optimizer=optimizer,

                  metrics=['accuracy'])

  17. Print a summary of the model using .summary():

    model.summary()

    The expected output is this:

    Figure 6.29: Summary of the model

    This output summarizes the architecture of our neural network. We can see that it is composed of four layers: one flatten layer, two dense layers, and one dropout layer.

  18. Instantiate the tf.keras.callbacks.EarlyStopping() class with monitor='val_loss' and patience=5, and save it into a variable called callback:

    callback = tf.keras.callbacks.EarlyStopping(monitor='val_loss',

                                                patience=5)

  19. Fit the neural network with the training set and specify epochs=10, validation_split=0.2, callbacks=[callback], and verbose=2:

    model.fit(features_train, label_train, epochs=10,

              validation_split = 0.2,

              callbacks=[callback], verbose=2)

    The expected output is this:

    Figure 6.30: Fitting the neural network with the training set

We achieved an accuracy score of 0.9825 for the training set and 0.9779 for the validation set for recognizing hand-written digits after just 10 epochs. These are amazing results. In this section, you learned how to build and train a neural network from scratch using TensorFlow to classify digits.
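The EarlyStopping callback from step 18 halts training once val_loss has failed to improve for patience consecutive epochs. The following is a simplified, pure-Python sketch of that rule. The epochs_run helper and the validation losses are invented for illustration, and the real callback has further options (such as min_delta) that are ignored here.

```python
def epochs_run(val_losses, patience):
    """Return how many epochs complete before early stopping triggers."""
    best = float("inf")
    wait = 0  # number of epochs since the last improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch  # training stops at the end of this epoch
    return len(val_losses)  # patience was never exhausted


# Hypothetical validation losses for 10 epochs
val_losses = [0.30, 0.25, 0.24, 0.26, 0.27, 0.25, 0.28, 0.29, 0.22, 0.21]

stopped_p5 = epochs_run(val_losses, patience=5)
stopped_p2 = epochs_run(val_losses, patience=2)
print(stopped_p5, stopped_p2)  # 8 5
```

With epochs=10 and patience=5, as used in this activity, the callback can only trigger if val_loss stalls during the first half of training.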

Note

To access the source code for this specific section, please refer to https://packt.live/37UWf7E.

You can also run this example online at https://packt.live/317R2b3.

You must execute the entire Notebook in order to get the desired result.
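The final layer of this model needs 10 neurons because softmax turns one raw score per digit into a probability distribution over the 10 classes, and sparse_categorical_crossentropy then penalizes a low probability for the true integer label. The following minimal sketch mimics these two formulas in pure Python; the helper functions and the raw scores are made up for illustration and do not reproduce TensorFlow's implementation.

```python
from math import exp, log


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(logits)
    exps = [exp(z - highest) for z in logits]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def sparse_categorical_crossentropy(label, probabilities):
    """Loss for one sample whose true class is the integer `label`."""
    return -log(probabilities[label])


# Hypothetical raw scores, one per digit from 0 to 9
logits = [0.1, 0.2, 0.0, 0.3, 0.1, 3.0, 0.2, 0.1, 0.4, 0.0]

probs = softmax(logits)
predicted_digit = probs.index(max(probs))
loss_if_true_is_5 = sparse_categorical_crossentropy(5, probs)
loss_if_true_is_3 = sparse_categorical_crossentropy(3, probs)
print(predicted_digit)  # 5
```

The loss is small when the true label is the digit with the highest probability (5 in this example) and larger when the true label received a low probability.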

Activity 6.02: Evaluating a Fashion Image Recognition Model Using CNNs

Solution:

  1. Open a new Jupyter Notebook.
  2. Import tensorflow.keras.datasets.fashion_mnist as fashion_mnist:

    import tensorflow.keras.datasets.fashion_mnist as fashion_mnist

  3. Load the Fashion MNIST dataset using fashion_mnist.load_data() and save the results into (features_train, label_train), (features_test, label_test):

    (features_train, label_train), \

    (features_test, label_test) = fashion_mnist.load_data()

  4. Print the shape of the training set:

    features_train.shape

    The expected output is this:

    (60000, 28, 28)

    The training set is composed of 60,000 images of size 28*28.

  5. Print the shape of the testing set:

    features_test.shape

    The expected output is this:

    (10000, 28, 28)

    The testing set is composed of 10,000 images of size 28*28.

  6. Reshape the training and testing sets with the dimensions (number_rows, 28, 28, 1), as shown in the following code snippet:

    features_train = features_train.reshape(60000, 28, 28, 1)

    features_test = features_test.reshape(10000, 28, 28, 1)

  7. Scale features_train and features_test to the range [0, 1] by dividing them by 255:

    features_train = features_train / 255.0

    features_test = features_test / 255.0

  8. Import numpy as np, tensorflow as tf, and layers from tensorflow.keras:

    import numpy as np

    import tensorflow as tf

    from tensorflow.keras import layers

  9. Set 8 as the seed for NumPy and TensorFlow using np.random.seed() and tf.random.set_seed():

    np.random.seed(8)

    tf.random.set_seed(8)

  10. Instantiate a tf.keras.Sequential() class and save it into a variable called model:

    model = tf.keras.Sequential()

  11. Instantiate layers.Conv2D() with 64 kernels of shape (3,3), activation='relu', and input_shape=(28, 28, 1), and save it into a variable called conv_layer1:

    conv_layer1 = layers.Conv2D(64, (3,3),

                  activation='relu', input_shape=(28, 28, 1))

  12. Instantiate layers.Conv2D() with 64 kernels of shape (3,3), activation='relu' and save it into a variable called conv_layer2:

    conv_layer2 = layers.Conv2D(64, (3,3), activation='relu')

  13. Instantiate layers.Dense() with 128 neurons and activation='relu', then save it into a variable called fc_layer1:

    fc_layer1 = layers.Dense(128, activation='relu')

  14. Instantiate layers.Dense() with 10 neurons and activation='softmax', then save it into a variable called fc_layer2:

    fc_layer2 = layers.Dense(10, activation='softmax')

  15. Add the four layers you just defined to the model using .add(), add a MaxPooling2D() layer of size (2,2) after each of the convolutional layers, and add a layers.Flatten() layer before the fully connected layers:

    model.add(conv_layer1)

    model.add(layers.MaxPooling2D(2, 2))

    model.add(conv_layer2)

    model.add(layers.MaxPooling2D(2, 2))

    model.add(layers.Flatten())

    model.add(fc_layer1)

    model.add(fc_layer2)

  16. Instantiate a tf.keras.optimizers.Adam() class with 0.001 as the learning rate and save it into a variable called optimizer:

    optimizer = tf.keras.optimizers.Adam(0.001)

  17. Compile the neural network using .compile() with loss='sparse_categorical_crossentropy', optimizer=optimizer, metrics=['accuracy']:

    model.compile(loss='sparse_categorical_crossentropy',

                  optimizer=optimizer, metrics=['accuracy'])

  18. Print a summary of the model using .summary():

    model.summary()

    The expected output is this:

    Figure 6.31: Summary of the model

    The summary shows us that there are more than 240,000 parameters to be optimized with this model.

  19. Fit the neural network with the training set and specify epochs=5, validation_split=0.2, and verbose=2:

    model.fit(features_train, label_train,

              epochs=5, validation_split = 0.2, verbose=2)

    The expected output is this:

    Figure 6.32: Fitting the neural network with the training set

    After training for 5 epochs, we achieved an accuracy score of 0.925 for the training set and 0.9042 for the validation set. Our model is overfitting a bit.

  20. Evaluate the performance of the model on the testing set:

    model.evaluate(features_test, label_test)

    The expected output is this:

    10000/10000 [==============================] - 1s 108us/sample - loss: 0.2746 - accuracy: 0.8976

    [0.27461639745235444, 0.8976]

We achieved an accuracy score of 0.8976 on the testing set for predicting images of clothing from the Fashion MNIST dataset. You can try on your own to improve this score and reduce the overfitting.
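The parameter count reported by model.summary() in step 18 can be reproduced by hand. The following sketch assumes the Keras defaults for the layers used in this activity (no padding and a stride of 1 for Conv2D, and bias terms enabled); the conv2d_params and dense_params helpers are hypothetical names introduced only for this illustration.

```python
def conv2d_params(kernel, in_channels, filters):
    """Weights of every kernel plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters


def dense_params(inputs, units):
    """One weight per input-unit pair plus one bias per unit."""
    return inputs * units + units


size = 28  # height and width of the input image
total = 0

total += conv2d_params(3, 1, 64)   # conv_layer1
size = size - 3 + 1                # 26: a 3x3 kernel without padding
size //= 2                         # 13: 2x2 max pooling

total += conv2d_params(3, 64, 64)  # conv_layer2
size = size - 3 + 1                # 11
size //= 2                         # 5

flattened = size * size * 64       # 1600 values after layers.Flatten()
total += dense_params(flattened, 128)  # fc_layer1
total += dense_params(128, 10)         # fc_layer2

print(total)  # 243786
```

Most of the parameters (204,928) belong to the first fully connected layer, so reducing the size of the flattened feature map or of that layer is one possible way to shrink the model.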

Note

To access the source code for this specific section, please refer to https://packt.live/2Nzt6pn.

You can also run this example online at https://packt.live/2NlM5nd.

You must execute the entire Notebook in order to get the desired result.

In this activity, we designed and trained a CNN architecture for recognizing images of clothing from the Fashion MNIST dataset.

Activity 6.03: Evaluating a Yahoo Stock Model with an RNN

Solution:

  1. Open a Jupyter Notebook.
  2. Import pandas as pd and numpy as np:

    import pandas as pd

    import numpy as np

  3. Create a variable called file_url containing a link to the raw dataset:

    file_url = ('https://raw.githubusercontent.com/'

                'PacktWorkshops/'

                'The-Applied-Artificial-Intelligence-Workshop/'

                'master/Datasets/yahoo_spx.csv')

  4. Load the dataset using pd.read_csv() into a new variable called df:

    df = pd.read_csv(file_url)

  5. Extract the values of the second column using .iloc and .values and save the results in a variable called stock_data:

    stock_data = df.iloc[:, 1:2].values

  6. Import MinMaxScaler from sklearn.preprocessing:

    from sklearn.preprocessing import MinMaxScaler

  7. Instantiate MinMaxScaler() and save it to a variable called sc:

    sc = MinMaxScaler()

  8. Scale the data to the range [0, 1] with .fit_transform() and save the results in a variable called stock_data_scaled:

    stock_data_scaled = sc.fit_transform(stock_data)

  9. Create two empty arrays called X_data and y_data:

    X_data = []

    y_data = []

  10. Create a variable called window that will contain the value 30:

    window = 30

  11. Create a for loop starting from the window value and iterate through the length of the dataset. For each iteration, append to X_data the previous window rows of stock_data_scaled and append to y_data the current value of stock_data_scaled:

    for i in range(window, len(df)):

        X_data.append(stock_data_scaled[i - window:i, 0])

        y_data.append(stock_data_scaled[i, 0])

    y_data will contain the opening stock price for each day and X_data will contain the last 30 days' stock prices.

  12. Convert X_data and y_data into NumPy arrays:

    X_data = np.array(X_data)

    y_data = np.array(y_data)

  13. Reshape X_data as (number of rows, number of columns, 1):

    X_data = np.reshape(X_data, (X_data.shape[0],

                        X_data.shape[1], 1))

  14. Use the first 1,000 rows as the training data and save them into two variables called features_train and label_train:

    features_train = X_data[:1000]

    label_train = y_data[:1000]

  15. Use the rows after row 1,000 as the testing data and save them into two variables called features_test and label_test:

    features_test = X_data[1000:]

    label_test = y_data[1000:]

  16. Import numpy as np, tensorflow as tf, and layers from tensorflow.keras:

    import numpy as np

    import tensorflow as tf

    from tensorflow.keras import layers

  17. Set 8 as the seed for NumPy and TensorFlow using np.random.seed() and tf.random.set_seed():

    np.random.seed(8)

    tf.random.set_seed(8)

  18. Instantiate a tf.keras.Sequential() class and save it into a variable called model:

    model = tf.keras.Sequential()

  19. Instantiate layers.LSTM() with 50 units, return_sequences=True, and input_shape=(features_train.shape[1], 1), then save it into a variable called lstm_layer1:

    lstm_layer1 = layers.LSTM(units=50,return_sequences=True,

                              input_shape=(features_train.shape[1], 1))

  20. Instantiate layers.LSTM() with 50 units and return_sequences=True, then save it into a variable called lstm_layer2:

    lstm_layer2 = layers.LSTM(units=50,return_sequences=True)

  21. Instantiate layers.LSTM() with 50 units and return_sequences=True, then save it into a variable called lstm_layer3:

    lstm_layer3 = layers.LSTM(units=50,return_sequences=True)

  22. Instantiate layers.LSTM() with 50 units and save it into a variable called lstm_layer4:

    lstm_layer4 = layers.LSTM(units=50)

  23. Instantiate layers.Dense() with 1 neuron and save it into a variable called fc_layer:

    fc_layer = layers.Dense(1)

  24. Add the five layers you just defined to the model using .add() and add a Dropout(0.2) layer after each of the LSTM layers:

    model.add(lstm_layer1)

    model.add(layers.Dropout(0.2))

    model.add(lstm_layer2)

    model.add(layers.Dropout(0.2))

    model.add(lstm_layer3)

    model.add(layers.Dropout(0.2))

    model.add(lstm_layer4)

    model.add(layers.Dropout(0.2))

    model.add(fc_layer)

  25. Instantiate a tf.keras.optimizers.Adam() class with 0.001 as the learning rate and save it into a variable called optimizer:

    optimizer = tf.keras.optimizers.Adam(0.001)

  26. Compile the neural network using .compile() with loss='mean_squared_error', optimizer=optimizer, metrics=['mse']:

    model.compile(loss='mean_squared_error',

                  optimizer=optimizer, metrics=['mse'])

  27. Print a summary of the model using .summary():

    model.summary()

    The expected output is this:

    Figure 6.33: Summary of the model

    The summary shows us that there are 71,051 parameters to be optimized with this model.

  28. Fit the neural network with the training set and specify epochs=10, validation_split=0.2, verbose=2:

    model.fit(features_train, label_train, epochs=10,

              validation_split = 0.2, verbose=2)

    The expected output is this:

    Figure 6.34: Fitting the neural network with the training set

    After training for 10 epochs, we achieved a mean squared error score of 0.0025 for the training set and 0.0033 for the validation set. Our model is overfitting a little bit.

  29. Finally, evaluate the performance of the model on the testing set:

    model.evaluate(features_test, label_test)

    The expected output is this:

    1000/1000 [==============================] - 0s 279us/sample - loss: 0.0016 - mse: 0.0016

    [0.00158528157370165, 0.0015852816]

We achieved a mean squared error score of 0.0016 on the testing set, which means we can quite accurately predict the stock price of Yahoo using the last 30 days' stock price data as features.
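The windowing from step 11 and the chronological split from steps 14 and 15 are easier to follow on a tiny series. The following sketch uses plain Python lists, a window of 3 instead of 30, and made-up prices; the make_windows helper is a hypothetical name introduced only for this illustration.

```python
def make_windows(series, window):
    """Pair each value with the `window` values that precede it."""
    X, y = [], []
    for i in range(window, len(series)):
        X.append(series[i - window:i])  # the previous `window` values
        y.append(series[i])             # the value to predict
    return X, y


prices = [10, 11, 12, 13, 14, 15, 16]  # hypothetical prices
X, y = make_windows(prices, window=3)
print(X[0], y[0])  # [10, 11, 12] 13

# Chronological split: the earliest samples train, the later ones test
split = 3
X_train, y_train = X[:split], y[:split]
X_test, y_test = X[split:], y[split:]
print(X_test, y_test)  # [[13, 14, 15]] [16]
```

Slicing with [:split] and [split:] keeps the two sets disjoint, so the model is evaluated only on windows that come after the ones it was trained on.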

Note

To access the source code for this specific section, please refer to https://packt.live/3804U8P.

You can also run this example online at https://packt.live/3hWtU5l.

You must execute the entire Notebook in order to get the desired result.

In this activity, we designed and trained an RNN model to predict the Yahoo stock price from the previous 30 days of data.
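The 71,051 parameters reported by model.summary() in step 27 can be checked by hand. The following sketch assumes the default Keras LSTM layout, in which each of the four gates has input weights, recurrent weights, and one bias vector; the lstm_params and dense_params helpers are hypothetical names introduced only for this illustration.

```python
def lstm_params(input_dim, units):
    """Four gates, each with input weights, recurrent weights and a bias."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(inputs, units):
    """One weight per input-unit pair plus one bias per unit."""
    return inputs * units + units


layer_params = [
    lstm_params(1, 50),   # lstm_layer1: one feature per time step
    lstm_params(50, 50),  # lstm_layer2: fed by the 50 outputs of layer 1
    lstm_params(50, 50),  # lstm_layer3
    lstm_params(50, 50),  # lstm_layer4
    dense_params(50, 1),  # fc_layer
]
total = sum(layer_params)

print(layer_params)  # [10400, 20200, 20200, 20200, 51]
print(total)         # 71051
```

The Dropout layers do not appear in the list because they have no trainable parameters.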
