Image processing

When I started developing this application, I found that the photos vary in size and shape: some are tall, some are wide, some were taken outdoors, and some indoors. Most are pictures of food, but some show other, random things too. Another important aspect is that, while the training images varied in orientation (portrait or landscape) and in the number of pixels, most were roughly square, and many were exactly 500 x 375:

Figure 9: Resized figure (left the original and tall one, right the squared one)

As we have already seen, a CNN cannot work with images of heterogeneous size and shape. There are many robust and efficient image processing techniques to extract only the region of interest (ROI). However, I am not an image processing expert, so I decided to keep this resizing step simple.

A CNN also has a serious limitation: it does not address the orientation of, and the relative spatial relationships between, the components of an image, so these properties carry little weight for a CNN. In short, a CNN is not well suited to images with heterogeneous shape and orientation. This is why people are now talking about Capsule Networks. See the original papers at https://arxiv.org/pdf/1710.09829v1.pdf and https://openreview.net/pdf?id=HJWLfGWRb.

Naively, I made all the images square while trying to preserve their quality. The ROIs are centered in most cases, so capturing only the center-most square of each image is a reasonable approach. We also need to convert each image to grayscale, but let's first make the irregularly shaped images square. Take a look at Figure 9, where the original image is on the left and the square one is on the right.

Now that we have generated a square image, how did we achieve this? I first checked whether the height and the width are the same; if so, the image is returned unchanged. In the other two cases, I cropped the center region. The following method does the trick (feel free to execute the SquaringImage.scala script to see the output):

def makeSquare(img: java.awt.image.BufferedImage): java.awt.image.BufferedImage = {
    val w = img.getWidth
    val h = img.getHeight
    val dim = List(w, h).min // side length of the resulting square
    img match {
        case _ if w == h => img // already square: return the original
        case _ if w > h => Scalr.crop(img, (w - h) / 2, 0, dim, dim) // wide: trim left and right
        case _ if w < h => Scalr.crop(img, 0, (h - w) / 2, dim, dim) // tall: trim top and bottom
    }
}
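The crop offsets can be checked without imgscalr. The following is a minimal sketch, assuming only the JDK: centerSquare is a hypothetical helper (it is not part of the SquaringImage.scala script) that uses BufferedImage.getSubimage in place of Scalr.crop, and the 500 x 375 test size is the common training image size mentioned earlier.

```scala
import java.awt.image.BufferedImage

object CenterCropDemo {
    // Hypothetical helper: same offset arithmetic as makeSquare, JDK only.
    def centerSquare(img: BufferedImage): BufferedImage = {
        val w = img.getWidth
        val h = img.getHeight
        val dim = math.min(w, h)
        // One of the two offsets is always 0; the other centers the square.
        img.getSubimage((w - dim) / 2, (h - dim) / 2, dim, dim)
    }

    def main(args: Array[String]): Unit = {
        val wide = new BufferedImage(500, 375, BufferedImage.TYPE_INT_RGB)
        val square = centerSquare(wide)
        println(s"${square.getWidth} x ${square.getHeight}") // 375 x 375
    }
}
```

For the 500 x 375 case, the horizontal offset is (500 - 375) / 2 = 62, so the crop keeps columns 62 to 436 and all 375 rows. Note that getSubimage shares pixel data with the source image, which may or may not be what you want in a real pipeline.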

Well done! Now that all of our training images are square, the next important preprocessing task is to resize them all. I decided to make all the images 128 x 128 in size. Let's see how the previous (original) image looks after resizing:

Figure 10: Image resizing (256 x 256, 128 x 128, 64 x 64 and 32 x 32 respectively)

The following method did the trick (but feel free to execute the ImageResize.scala script to see a demo):

def resizeImg(img: java.awt.image.BufferedImage, width: Int, height: Int) = {
    Scalr.resize(img, Scalr.Method.BALANCED, width, height) 
}

By the way, for the image resizing and squaring, I used some built-in packages for image reading and some third-party packages for processing:

import org.imgscalr._
import java.io.File
import javax.imageio.ImageIO

To use the preceding packages, add the following dependencies in a Maven-friendly pom.xml file:

<dependency>
    <groupId>org.imgscalr</groupId>
    <artifactId>imgscalr-lib</artifactId>
    <version>4.2</version>
</dependency>
<dependency>
    <groupId>org.datavec</groupId>
    <artifactId>datavec-data-image</artifactId>
    <version>0.9.1</version>
</dependency>
<dependency>
    <groupId>com.sksamuel.scrimage</groupId>
    <artifactId>scrimage-core_2.10</artifactId>
    <version>2.1.0</version>
</dependency>

Although DL4j-based CNNs can handle color images, it's better to simplify the computation by using grayscale images. Color images carry more information, but grayscale makes the overall representation simpler and more space efficient.

Let's take our previous step as an example. We resized each image to 128 x 128 pixels, which in grayscale is represented by 16,384 features, rather than 16,384 x 3 for a color image with three RGB channels (execute GrayscaleConverter.scala to see a demo). Let's see how the converted image looks:

Figure 11: Left: the original image; right: the grayscale one obtained by RGB averaging
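As a quick check of the feature counts, here is a small sketch assuming the 128 x 128 target size chosen earlier. The features helper is hypothetical and only illustrates the arithmetic:

```scala
object FeatureCount {
    // Number of input features for a side x side image with the given channel count
    def features(side: Int, channels: Int): Int = side * side * channels

    def main(args: Array[String]): Unit = {
        println(features(128, 1)) // grayscale: 16384
        println(features(128, 3)) // RGB: 49152
    }
}
```

Dropping the color channels therefore cuts the input size to one third.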

The preceding conversion is done using two methods called pixels2Gray() and makeGray():

import java.awt.Color

// Average the three channels to get a single gray level
def pixels2Gray(R: Int, G: Int, B: Int): Int = (R + G + B) / 3

def makeGray(testImage: java.awt.image.BufferedImage): java.awt.image.BufferedImage = {
    val w = testImage.getWidth
    val h = testImage.getHeight
    for {
        w1 <- 0 until w
        h1 <- 0 until h
    } {
        val col = testImage.getRGB(w1, h1)
        val R = (col & 0xff0000) / 65536 // red channel
        val G = (col & 0xff00) / 256 // green channel
        val B = col & 0xff // blue channel
        val graycol = pixels2Gray(R, G, B)
        testImage.setRGB(w1, h1, new Color(graycol, graycol, graycol).getRGB)
    }
    testImage // the input image is modified in place and returned
}
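The mask-and-divide arithmetic in makeGray() unpacks the red, green, and blue bytes of the packed pixel value returned by getRGB. The following sketch shows the same unpacking with bit shifts, which some readers may find easier to follow. The names channels and gray are hypothetical and not part of GrayscaleConverter.scala:

```scala
object ChannelDemo {
    // Unpack a packed (A)RGB pixel into its red, green, and blue components
    def channels(col: Int): (Int, Int, Int) =
        ((col >> 16) & 0xff, (col >> 8) & 0xff, col & 0xff)

    // Gray level by plain RGB averaging, as in pixels2Gray
    def gray(col: Int): Int = {
        val (r, g, b) = channels(col)
        (r + g + b) / 3
    }

    def main(args: Array[String]): Unit = {
        val col = 0xff102030 // opaque pixel with R = 16, G = 32, B = 48
        println(channels(col)) // (16,32,48)
        println(gray(col)) // 32
    }
}
```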

So what happens under the hood? We chain the preceding three steps: make each image square, then resize it to 128 x 128, and finally convert the resized image into a grayscale one:

val demoImage = ImageIO.read(new File(x))
    .makeSquare
    .resizeImg(resizeImgDim, resizeImgDim) // (128, 128)
    .image2gray

So, in summary, we now have all the images in gray after squaring and resizing. The following image gives some sense of the conversion step:

Figure 12: Resized figure (left the original and tall one, right the squared one)

The preceding chaining comes with some additional effort: the three steps have to be available as methods on a BufferedImage. We now put them together in an implicit class, so that we can finally prepare all of our images:

import scala.Vector
import org.imgscalr._

object imageUtils {
    implicit class imageProcessingPipeline(img: java.awt.image.BufferedImage) {
        // image 2 vector processing
        def pixels2gray(R: Int, G: Int, B: Int): Int = (R + G + B) / 3
        def pixels2color(R: Int, G: Int, B: Int): Vector[Int] = Vector(R, G, B)

        // apply f to the RGB channels of every pixel, column by column
        private def image2vec[A](f: (Int, Int, Int) => A): Vector[A] = {
            val w = img.getWidth
            val h = img.getHeight
            for {
                w1 <- (0 until w).toVector
                h1 <- (0 until h).toVector
            } yield {
                val col = img.getRGB(w1, h1)
                val R = (col & 0xff0000) / 65536
                val G = (col & 0xff00) / 256
                val B = (col & 0xff)
                f(R, G, B)
            }
        }

        def image2gray: Vector[Int] = image2vec(pixels2gray)
        def image2color: Vector[Int] = image2vec(pixels2color).flatten

        // make image square
        def makeSquare = {
            val w = img.getWidth
            val h = img.getHeight
            val dim = List(w, h).min
            img match {
                case _ if w == h => img
                case _ if w > h => Scalr.crop(img, (w - h) / 2, 0, dim, dim)
                case _ if w < h => Scalr.crop(img, 0, (h - w) / 2, dim, dim)
            }
        }

        // resize pixels
        def resizeImg(width: Int, height: Int) = {
            Scalr.resize(img, Scalr.Method.BALANCED, width, height)
        }
    }
}
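One detail worth knowing about image2vec is the order of the resulting vector: the width index is the outer loop, so the pixels come out column by column, and the pixel at (x, y) ends up at index x * height + y. The following JDK-only sketch illustrates this. toGrayVector is a hypothetical stand-in for image2gray, written with bit shifts instead of division:

```scala
import java.awt.image.BufferedImage

object VecOrderDemo {
    // Same traversal as image2vec: x (width) is the outer loop, y (height) the inner one
    def toGrayVector(img: BufferedImage): Vector[Int] =
        for {
            x <- (0 until img.getWidth).toVector
            y <- (0 until img.getHeight).toVector
        } yield {
            val col = img.getRGB(x, y)
            (((col >> 16) & 0xff) + ((col >> 8) & 0xff) + (col & 0xff)) / 3
        }

    def main(args: Array[String]): Unit = {
        val img = new BufferedImage(2, 3, BufferedImage.TYPE_INT_RGB) // 2 wide, 3 tall, all black
        img.setRGB(1, 2, 0xffffff) // white pixel at x = 1, y = 2
        val v = toGrayVector(img)
        println(v.length) // 6
        println(v(1 * 3 + 2)) // 255
    }
}
```

This ordering matters only if you later reshape the vector back into a two-dimensional grid; as long as every image goes through the same pipeline, the features stay consistent across images.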
