Data analysis and pre-processing

In this section, we will analyze and preprocess the input images to get them into an acceptable format for our learning algorithm, which in this case is a convolutional neural network.

So let's start off by importing the required packages for this implementation:

import numpy as np
np.random.seed(2018)
import os
import glob
import cv2
import datetime
import pandas as pd
import time
import warnings
warnings.filterwarnings("ignore")
from sklearn.cross_validation import KFold
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Flatten
from keras.layers.convolutional import Convolution2D, MaxPooling2D, ZeroPadding2D
from keras.optimizers import SGD
from keras.callbacks import EarlyStopping
from keras.utils import np_utils
from sklearn.metrics import log_loss
from keras import __version__ as keras_version

In order to use the images provided in the dataset, we need to resize them all to the same size. OpenCV is a good choice for doing this; from the OpenCV website:

OpenCV (Open Source Computer Vision Library) is released under a BSD license and hence it’s free for both academic and commercial use. It has C++, C, Python and Java interfaces and supports Windows, Linux, Mac OS, iOS and Android. OpenCV was designed for computational efficiency and with a strong focus on real-time applications. Written in optimized C/C++, the library can take advantage of multi-core processing. Enabled with OpenCL, it can take advantage of the hardware acceleration of the underlying heterogeneous compute platform.
You can install OpenCV using the Python package manager by issuing pip install opencv-python.

Next, let's define a helper function that reads an image file and resizes it to 32 by 32 pixels:
# Parameters
# ----------
# img_path : path
#     path of the image to be resized
def resize_image(img_path):
    # Reading the image file
    img = cv2.imread(img_path)
    # Resize the image to be 32 by 32
    img_resized = cv2.resize(img, (32, 32), interpolation=cv2.INTER_LINEAR)
    return img_resized
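
As a quick sanity check, you could try this function on a single image. The following is a minimal sketch, assuming the training images live under ../input/train/<fish type>/ as used in the loading function that comes next:

# A minimal usage sketch; the folder layout is an assumption based on the
# loading code below (../input/train/ALB/*.jpg)
sample_files = glob.glob(os.path.join('..', 'input', 'train', 'ALB', '*.jpg'))
sample_img = resize_image(sample_files[0])
print(sample_img.shape)  # should print (32, 32, 3): height, width, and BGR channels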

Now we need to load all the training samples of our dataset and resize each image using the previous function. So we are going to implement a function that loads the training samples from the different folders that we have for each fish type:

# Loading the training samples and their corresponding labels
def load_training_samples():
    # Variables to hold the training input and output variables
    train_input_variables = []
    train_input_variables_id = []
    train_label = []
    # Scanning all images in each folder of a fish type
    print('Start Reading Train Images')
    folders = ['ALB', 'BET', 'DOL', 'LAG', 'NoF', 'OTHER', 'SHARK', 'YFT']
    for fld in folders:
        folder_index = folders.index(fld)
        print('Load folder {} (Index: {})'.format(fld, folder_index))
        imgs_path = os.path.join('..', 'input', 'train', fld, '*.jpg')
        files = glob.glob(imgs_path)
        for file in files:
            file_base = os.path.basename(file)
            # Resize the image
            resized_img = resize_image(file)
            # Appending the processed image to the input/output variables of the classifier
            train_input_variables.append(resized_img)
            train_input_variables_id.append(file_base)
            train_label.append(folder_index)
    return train_input_variables, train_input_variables_id, train_label
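
If you want to verify the loader on its own, a small sketch like the following (again assuming the folder layout above) prints how many samples, IDs, and labels were collected:

# A quick sanity check of the raw loader; counts depend on your copy of the dataset
train_inputs, train_ids, train_labels = load_training_samples()
print('Loaded {} images, {} ids and {} labels'.format(
      len(train_inputs), len(train_ids), len(train_labels)))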

As we discussed, we have a test set that will act as unseen data to test the generalization ability of our model. So we need to do the same with the testing images: load them and resize them in the same way:

def load_testing_samples():
    # Scanning images from the test folder
    imgs_path = os.path.join('..', 'input', 'test_stg1', '*.jpg')
    files = sorted(glob.glob(imgs_path))
    # Variables to hold the testing samples
    testing_samples = []
    testing_samples_id = []
    # Processing the images and appending them to the array that we have
    for file in files:
        file_base = os.path.basename(file)
        # Image resizing
        resized_img = resize_image(file)
        testing_samples.append(resized_img)
        testing_samples_id.append(file_base)
    return testing_samples, testing_samples_id

Now we need to wrap the previous function in another one that uses load_training_samples() to load and resize the training samples. It also adds a few lines of code to convert the training data into NumPy format, reshape that data to fit our classifier, and finally convert it to float:

def load_normalize_training_samples():
    # Calling the load function in order to load and resize the training samples
    training_samples, training_samples_id, training_label = load_training_samples()
    # Converting the loaded and resized data into NumPy format
    training_samples = np.array(training_samples, dtype=np.uint8)
    training_label = np.array(training_label, dtype=np.uint8)
    # Reshaping the training samples
    training_samples = training_samples.transpose((0, 3, 1, 2))
    # Converting the training samples and training labels into float format
    training_samples = training_samples.astype('float32')
    training_samples = training_samples / 255
    training_label = np_utils.to_categorical(training_label, 8)
    return training_samples, training_label, training_samples_id
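
To make the reshaping step concrete, after calling this function the samples array is in channels-first order (samples, channels, height, width) and the labels are one-hot encoded over the eight fish classes. A quick check might look like the following sketch; the exact number of images depends on the dataset:

# A small sanity-check sketch; the number of images depends on the dataset
training_samples, training_label, training_samples_id = load_normalize_training_samples()
print(training_samples.shape)  # (number_of_images, 3, 32, 32), channels-first
print(training_label.shape)    # (number_of_images, 8), one-hot encoded fish types
print(training_samples.min(), training_samples.max())  # pixel values scaled to [0, 1]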

We also need to do the same with the test set:

def load_normalize_testing_samples():
    # Calling the load function in order to load and resize the testing samples
    testing_samples, testing_samples_id = load_testing_samples()
    # Converting the loaded and resized data into NumPy format
    testing_samples = np.array(testing_samples, dtype=np.uint8)
    # Reshaping the testing samples
    testing_samples = testing_samples.transpose((0, 3, 1, 2))
    # Converting the testing samples into float format
    testing_samples = testing_samples.astype('float32')
    testing_samples = testing_samples / 255
    return testing_samples, testing_samples_id
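
The same kind of quick check applies to the test set:

# A minimal sketch; test_stg1 is the test folder used in the loading code above
testing_samples, testing_samples_id = load_normalize_testing_samples()
print(testing_samples.shape)    # (number_of_test_images, 3, 32, 32)
print(len(testing_samples_id))  # one file name per test image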