How to do it...

Let's go ahead and implement this strategy in code (the code file is available as Adversarial_attack.ipynb on GitHub):

  1. Read the image of a cat:
import cv2
import matplotlib.pyplot as plt
%matplotlib inline
img = cv2.imread('/content/cat.JPG')
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img = cv2.resize(img, (299,299))
plt.imshow(img)
plt.axis('off')

The plot of the image looks as follows:

  2. Preprocess the image so that it can be passed to an Inception network:
import numpy as np
original_image = cv2.resize(img, (299,299)).astype(float)
# Scale the pixel values to the [-1, 1] range expected by InceptionV3
original_image /= 255.
original_image -= 0.5
original_image *= 2.
original_image = np.expand_dims(original_image, axis=0)
  3. Import the pre-trained model:
from keras.preprocessing import image
from keras.applications import inception_v3
model = inception_v3.InceptionV3()
  4. Predict the class of the object present in the image:
predictions = model.predict(original_image)
predicted_classes = inception_v3.decode_predictions(predictions, top=1)
imagenet_id, name, confidence = predicted_classes[0][0]
print("This is a {} with {:.4}% confidence".format(name, confidence * 100))

The preceding code results in the following:

" This is a Persian_cat with 95.45% confidence"
  5. Define the input and output:
model = inception_v3.InceptionV3()
model_input_layer = model.layers[0].input
model_output_layer = model.layers[-1].output

model_input_layer is the input tensor of the model, and model_output_layer is the output tensor holding the probabilities of the various classes for the input image (the last layer, with softmax activation).
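
As a quick sanity check (an addition on our part, not part of the original notebook), you can print the shapes of these tensors; the values shown assume the default InceptionV3 input size of 299 x 299 x 3 and 1,000 ImageNet classes:

print(model_input_layer.shape) # expected to be (?, 299, 299, 3)
print(model_output_layer.shape) # expected to be (?, 1000)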

  6. Set the limits of change for the original image:
max_change_above = np.copy(original_image) + 0.01
max_change_below = np.copy(original_image) - 0.01
hacked_image = np.copy(original_image)

In the preceding code, we specify the upper and lower limits by which the original image is allowed to change (a maximum perturbation of ±0.01 per pixel in the preprocessed scale).
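
To get a sense of how small this budget is, note that the preprocessed values span the range [-1, 1], so a change of 0.01 corresponds to roughly 1.3 intensity levels out of 255 in the original pixel scale. A minimal sketch of that arithmetic follows:

# The preprocessing maps [0, 255] to [-1, 1], a range of width 2,
# so a perturbation of 0.01 in the preprocessed space corresponds to:
pixel_budget = 0.01 / 2. * 255.
print(pixel_budget) # ~1.28 intensity levels per channel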

  7. Initialize the cost function so that the object type to fake is an African elephant (the 386th index in the prediction vector):
learning_rate = 0.1
object_type_to_fake = 386
cost_function = model_output_layer[0, object_type_to_fake]

The output of model_output_layer is the probability of each class for the image of interest. In this instance, the cost function is simply the predicted probability at the index of the class we are trying to fake our object into, the African elephant class.
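
If you want to double-check that index 386 corresponds to the African elephant label, one way to do so (an extra check, not part of the original recipe) is to decode a one-hot vector at that index:

one_hot = np.zeros((1, 1000))
one_hot[0, object_type_to_fake] = 1.
print(inception_v3.decode_predictions(one_hot, top=1))
# should print the African_elephant label for index 386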

  8. Initialize the gradient function of the cost with respect to the input:
from keras import backend as K
gradient_function = K.gradients(cost_function, model_input_layer)[0]

This code calculates the gradient of cost_function with respect to the change in model_input_layer (which is the input image).

  9. Map the cost and gradient functions with respect to the input:
grab_cost_and_gradients_from_model = K.function([model_input_layer, K.learning_phase()], [cost_function, gradient_function])
cost = 0.0

In the preceding code, we define a Keras function that, given an input image (and the learning phase flag), returns the value of cost_function (the probability of the image belonging to the African elephant class) and its gradient with respect to the input image.
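
Before starting the attack, it can be instructive (an extra check, not part of the original recipe) to call this function once on the unmodified image; the 0 passed as the second input is the learning-phase flag, indicating inference mode:

initial_cost, initial_gradients = grab_cost_and_gradients_from_model([original_image, 0])
print(initial_cost) # should be close to 0 for a cat image
print(initial_gradients.shape) # matches the input shape: (1, 299, 299, 3)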

  10. Keep updating the input image with respect to the gradients until the probability of the resulting image being an African elephant is at least 80%:
prob_elephant = []
while cost < 0.80:
    cost, gradients = grab_cost_and_gradients_from_model([hacked_image, 0])
    hacked_image += gradients * learning_rate
    hacked_image = np.clip(hacked_image, max_change_below, max_change_above)
    prob_elephant.append(cost)
    print("Model's predicted likelihood that the image is an African elephant: {:.8}%".format(cost * 100))

In the preceding code, we obtain the cost and gradients corresponding to the current input image (hacked_image), update the image by adding the gradients multiplied by the learning rate, and then clip the result so that it never moves beyond the maximum allowed change from the original image. We also record the probability after each update so that it can be plotted later.

Keep looping through these steps until the probability of the input image being classified as an African elephant is at least 0.8.

The variation in the probability of the Persian cat image being detected as an African elephant over increasing epochs can be plotted as follows:

epochs = range(1, len(prob_elephant) + 1)
plt.plot(epochs, prob_elephant, 'b')
plt.title('Probability of African elephant class')
plt.xlabel('Epochs')
plt.ylabel('Probability')
plt.grid(False)

The variation of probability of the modified image belonging to the African elephant class is as follows:

  11. Predict the class of the updated image:
model.predict(hacked_image)[0][386]

The output of the predict method, which provides the probability of the modified image belonging to the African elephant class, is 0.804.
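
You can also decode the top prediction for the hacked image to confirm that the predicted label has flipped (an extra check, not part of the original recipe); the exact confidence will vary from run to run:

hacked_predictions = model.predict(hacked_image)
print(inception_v3.decode_predictions(hacked_predictions, top=1))
# expected to show African_elephant as the top-1 class, with a probability of about 0.8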

  12. De-process the updated input image (as it was pre-processed to scale it) so that it can be visualized:
# Reverse the earlier preprocessing: [-1, 1] -> [0, 255]
hacked_image = hacked_image / 2.
hacked_image = hacked_image + 0.5
hacked_image = hacked_image * 255
hacked_image = np.clip(hacked_image, 0, 255).astype('uint8')

plt.subplot(131)
plt.imshow(img)
plt.title('Original image')
plt.axis('off')
plt.subplot(132)
plt.imshow(hacked_image[0,:,:,:])
plt.title('Hacked image')
plt.axis('off')
plt.subplot(133)
plt.imshow(img - hacked_image[0,:,:,:])
plt.title('Difference')
plt.axis('off')

The original image, the modified (hacked) image, and the difference between the two are plotted as follows:

Note that the hacked image is visually indistinguishable from the original image.

It is interesting to note that, with hardly any change in pixel values from the original image, we have fooled the neural network (the Inception v3 model) into predicting a different class. This is a great example of the security flaws that can arise when the algorithm used to make a prediction is exposed to users who can craft images that fool the system.
