CHAPTER 8
Java Programming for AI Applications

“Life is like riding a bicycle. To keep your balance you must keep moving.”

—Albert Einstein

8.1 What Is Artificial Intelligence?

Artificial intelligence (AI) is another hot buzzword at the moment. Just listen to the news—we have AI in this, AI in that, AI everywhere! So, what exactly is artificial intelligence? How is AI going to affect our lives? Is AI going to put all of us out of work one day?

Artificial intelligence is the area of computer science concerned with creating machines that can do intelligent things, such as learning, planning, problem-solving, prediction, face and speech recognition, and so on. The beginning of artificial intelligence dates back to the 1950s, when Alan Turing, an English computer scientist, proposed the imitation game, a test of whether a computer could think and behave indistinguishably from a human. This is the famous Turing test. No computer has passed the Turing test so far.

Artificial intelligence as a research discipline was established in a workshop in 1956. The term artificial intelligence was coined by John McCarthy, a legendary computer scientist at Stanford University, who was also one of the most influential founders and leaders of AI research.

Artificial intelligence can be divided into narrow AI, general AI, and super AI.

  • Narrow AI, or weak AI, is intelligence that performs a single task. Examples of narrow AI include weather forecasting, purchase suggestions, sales prediction, computer vision, natural language processing, speech recognition, chess playing, and Google Translate. Narrow AI is what has been achieved so far; it is where we are, and general AI is where we are going.
  • General AI, or strong AI, is intelligence that can deal with more complex, more general tasks. General AI would possess humanlike cognitive abilities and would be able to understand its environment. General AI would be able to observe, think, analyze, learn, invent, and have feelings, like humans. According to Ray Kurzweil, a well-known futurist and Google's director of engineering, AI will pass the Turing test by 2029, and by 2045 the technological singularity will occur. The singularity is the point at which artificial intelligence starts to overtake human intelligence; Figure 8.1 illustrates the concept.

Figure 8.1: The technological singularity and the timeline of artificial intelligence compared with human intelligence

  • Super AI, or superintelligence, is AI after the singularity point. There are different views on what will happen with super AI. Some people have expressed worries and fears; for example, Elon Musk, SpaceX founder and CEO of Tesla Motors, has famously called AI the biggest existential threat, and Stephen Hawking, the English theoretical physicist, warned that AI could end humanity one day. Many others, such as Bill Gates (founder of Microsoft) and Mark Zuckerberg (founder of Facebook), believe that AI will benefit the human race: just like earlier technological revolutions, it will destroy some jobs but create more new ones. So, ready or not, like it or not, AI is coming. We will need to be prepared, to make sure that we benefit from AI and, even more importantly, that the doomsday scenario depicted in Hollywood movies such as The Terminator (with Arnold Schwarzenegger) never happens.

8.1.1 History of AI

The history of artificial intelligence research can be roughly divided into three stages, focusing on different techniques: neural networks (1950s–1970s), machine learning (1980s–2010s), and deep learning (the present day), as illustrated in Figure 8.2 (https://www.sas.com/en_gb/insights/analytics/what-is-artificial-intelligence.html).


Figure 8.2: The history of artificial intelligence research according to Statistical Analysis System (SAS)

  • Neural Networks Neural networks were developed based on human biological neural networks. Neural networks typically consist of three distinct layers: one input layer, one hidden layer, and one output layer. Once a neural network has been trained with a large amount of given data, it can be used to predict the output for unseen data. Neural networks attracted a lot of attention from the 1950s through the 1970s and stimulated much enthusiasm and optimism. But since the 1980s, following many disappointments and much criticism, funding and interest in artificial intelligence research were significantly reduced. This period is also called the AI winter.
  • Machine Learning Machine learning (ML) is a set of mathematical algorithms for automatic data analysis. ML started to flourish during the 1980s through the 2010s and includes popular algorithms such as support vector machines (SVMs), K-means clustering, linear regression, and naive Bayes, to name a few.
  • Deep Learning Deep learning uses neural networks with multiple hidden layers. This approach has been possible only since the 2010s, with the increase in available computing power, particularly graphics processing units (GPUs), and improved algorithms. Today, with ever-increasing massive labeled data sets and ever-increasing GPU computing power, deep learning has shown huge potential in many application areas.

Figure 8.3 shows the timeline as well as the differences between artificial intelligence, machine learning, and deep learning, from the Nvidia site (https://blogs.nvidia.com/blog/2016/07/29/whats-difference-artificial-intelligence-machine-learning-deep-learning-ai/).


Figure 8.3: The timeline and differences between artificial intelligence, machine learning, and deep learning from Nvidia

8.1.2 Cloud AI vs. Edge AI

Many AI applications require large training data sets and enormous computing power. For these purposes, it is beneficial to run AI on the cloud. Many technology giants, such as Google, IBM, Microsoft, Amazon, Alibaba, and Baidu, provide cloud-based AI services. There are many advantages for customers using cloud-based AI services: you don't need to purchase expensive hardware, you pay only for what you use, and there is no need to worry about software installation, configuration, troubleshooting, or upgrading. The disadvantages of cloud-based AI are latency, bandwidth requirements, and security. Because you need to send the data to the cloud and get the results back, there is latency in cloud-based AI. Sending a large amount of data to the cloud also requires a lot of bandwidth. Finally, if the cloud service is hacked, your data or information might be lost or stolen. For many IoT or other real-time applications, latency, bandwidth, and security can be problems. This is where Edge AI can be useful.

Edge AI means running AI software on edge devices, such as microcontrollers, smartphones, or other devices; it is also called on-device AI. The advantages of Edge AI are that it operates in real time, can work offline, and keeps data on the device, which improves security and privacy. Applications such as voice recognition, face recognition, object detection, and driverless cars can all use Edge AI. Cloud AI and Edge AI will be largely complementary, and you can choose whichever is best for your application.

The following are some examples of Edge AI devices and applications.

8.2 Neural Networks

A neural network (NN), also called an artificial neural network (ANN), is a mathematical algorithm for problem-solving. The concept of artificial neural networks was first developed in 1943, by Warren S. McCulloch (neuroscientist) and Walter Pitts (logician) in the United States, inspired by the biological neural networks that constitute the human brain. The biological neural networks are made of a large number of interconnected neurons. The human brain typically has about 100 billion neurons. Each neuron consists of three main parts: the dendrites, a cell body (soma), and an axon. Dendrites are the tree-like structures for receiving input signals from surrounding neurons, the cell body is for processing the input signals, and the axon is for connecting to another neuron's dendrites; the contact is made through a synapse. Synapses allow a neuron to pass an electrical or chemical signal to another neuron, and the strength of a synaptic connection varies. A neuron will sum all the inputs and then fire an output signal via an axon to the next neuron. This signal can be either excitatory or inhibitory, which means increasing or decreasing the firing, depending on certain conditions. Figure 8.4 shows the typical structure of a neuron and different types of neurons (https://en.wikipedia.org/wiki/Neuron).


Figure 8.4: The structure of a typical neuron (top left), a multipolar neuron (top right), SMI32-stained pyramidal neurons in the cerebral cortex (bottom left), and Golgi-stained neurons in human hippocampal tissue (bottom right)

Figure 8.5 shows an interesting neural network tutorial that explains how to use mathematical functions to create an artificial neuron based on real neurons (https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/content/neural_networks.html). More details are available in the next section.


Figure 8.5: Natural and artificial neurons

8.2.1 The Perceptron

Like biological neural networks, artificial neural networks are made of interconnected individual neurons, called perceptrons. The perceptron is the most fundamental element of a neural network. The perceptron algorithm was defined by Frank Rosenblatt, an American psychologist, at the Cornell Aeronautical Laboratory in the United States in 1957. Figure 8.6 shows the structure of a perceptron, which has inputs (dendrites), a body, and an output (axon). The weight for each input reflects the synaptic connection strength. The perceptron adds up all the inputs according to their weights and a bias and then feeds the result into an activation function, which decides the output of the perceptron. A perceptron is a typical feedforward network, where the connections between the nodes do not form a cycle; there is no feedback between layers.


Figure 8.6: The structure of a perceptron, which has inputs (dendrites), a body, and output (axon)

If $x_1, x_2, \ldots, x_n$ are the inputs of the perceptron, $w_1, w_2, \ldots, w_n$ are the corresponding weights for the inputs, $n$ is the total number of inputs, and $b$ is the corresponding bias, then the weighted sum of the inputs can be calculated as follows:

$$z = \sum_{i=1}^{n} w_i x_i + b$$

Here, $w_i x_i$ means $w_i$ multiplied by $x_i$, $i$ indexes the $i$th input, and $\sum_{i=1}^{n}$ means to calculate the sum from term 1 to $n$. This weighted sum is then fed into an activation function $f$ to generate the output $y$ of the perceptron:

$$y = f(z)$$

There are several popular choices for the activation function. The simplest and most commonly used activation function, $f(z)$, is a step function, which gives an output of 0 or 1:

$$f(z) = \begin{cases} 1 & \text{if } z \ge 0 \\ 0 & \text{otherwise} \end{cases}$$

Another commonly used activation function is the sigmoid function, which is a smoothed version of the step function. A sigmoid function gives a continuous output between 0 and 1:

$$f(z) = \sigma(z) = \frac{1}{1 + e^{-z}}$$

Similar to the sigmoid function, another common choice for $f(z)$ is the hyperbolic tangent, or tanh, function:

$$f(z) = \tanh(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}$$
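As a quick illustration, here is a minimal Java sketch of these three activation functions; the class and method names are illustrative only, not from the book's examples:

public class Activations {
    static double step(double z)    { return (z >= 0) ? 1 : 0; }
    static double sigmoid(double z) { return 1.0 / (1.0 + Math.exp(-z)); }
    static double tanh(double z)    { return Math.tanh(z); } // (e^z - e^-z)/(e^z + e^-z)

    public static void main(String[] args) {
        // Print each function over a small range of inputs
        for (double z = -2.0; z <= 2.0; z += 1.0) {
            System.out.printf("z=%5.1f step=%3.0f sigmoid=%.3f tanh=%6.3f%n",
                    z, step(z), sigmoid(z), tanh(z));
        }
    }
}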

Next you need to train the perceptron. To do that, you need a set of data samples with given inputs and desired outputs. The training is done by continuously adjusting the weights $w_i$ and bias $b$ until, for a given input $x_i$, you get the desired output $d$. This must be repeated over many iterations; each full pass through the training samples is called an epoch. The following pseudocode shows the logic:

  • Generate random weights and bias
  • For each iteration (epoch)
    • For each training sample
      • //Calculate the output
      • $y = f\left(\sum_{i=1}^{n} w_i x_i + b\right)$
      • //Calculate the error
      • $e = d - y$
      • //Calculate the adjustment (gradient), using a learning rate $\eta$
      • $\Delta w_i = \eta \, e \, x_i$
      • //Update the weights
      • $w_i = w_i + \Delta w_i$
      • //Update the bias
      • $b = b + \eta \, e$
    • Until samples finished
    • If the error is small enough or the total number of epochs is reached
      • stop
    • Else
      • continue
  • End of iteration

This is called the backpropagation method, and the key is to calculate the gradient for adjusting the weights and bias. Once trained, the perceptron should be able to produce output for unseen data. Perceptrons have been successfully used for logical operations such as AND, OR, and NOT. However, a single perceptron, or a single layer of perceptrons, can solve only linearly separable problems; it cannot, for example, implement XOR. Multiple-layer perceptrons were therefore developed. A minimal training sketch follows; Section 8.6 contains the book's full Java example programs for a single perceptron and multiple-layer perceptrons.
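The following is a minimal sketch of this training loop, assuming a step activation function, a learning rate of 0.1, and the logical AND function as the training target; the class and variable names are illustrative, not taken from the book's examples.

import java.util.Random;

public class PerceptronSketch {
    public static void main(String[] args) {
        double[][] inputs = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};
        double[] desired = {0, 0, 0, 1};                     // logical AND targets
        Random rand = new Random();
        double[] w = {rand.nextDouble(), rand.nextDouble()}; // random initial weights
        double b = rand.nextDouble();                        // random initial bias
        double eta = 0.1;                                    // learning rate

        for (int epoch = 0; epoch < 100; epoch++) {          // iterations (epochs)
            for (int s = 0; s < inputs.length; s++) {        // each training sample
                double z = w[0] * inputs[s][0] + w[1] * inputs[s][1] + b;
                double y = (z >= 0) ? 1 : 0;                 // step activation
                double e = desired[s] - y;                   // error
                w[0] += eta * e * inputs[s][0];              // update the weights
                w[1] += eta * e * inputs[s][1];
                b += eta * e;                                // update the bias
            }
        }
        // Verify the trained perceptron on all four inputs
        for (double[] in : inputs) {
            double z = w[0] * in[0] + w[1] * in[1] + b;
            System.out.println((int) in[0] + " AND " + (int) in[1]
                    + " -> " + ((z >= 0) ? 1 : 0));
        }
    }
}

Because AND is linearly separable, this loop converges quickly; the same code would never converge for XOR, which is why multilayer networks are needed.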

See the following for more information about perceptrons:

https://appliedgo.net/perceptron/

http://neuralnetworksanddeeplearning.com/chap1.html

https://github.com/mnielsen/neural-networks-and-deep-learning

https://towardsdatascience.com/what-the-hell-is-perceptron-626217814f53

https://natureofcode.com/book/chapter-10-neural-networks/

8.2.2 MultiLayered Perceptron/Backpropagation/Feedforward

Conventional neural networks are made of multilayered perceptrons (MLPs), which typically have three layers: input layer, hidden layer, and output layer. Each layer can have a number of perceptrons. Figure 8.7 shows an example of a neural network that has four perceptrons in the input layer (that is, four inputs), three perceptrons in the hidden layer, and two perceptrons in the output layer.


Figure 8.7: Traditional neural network with one input layer, one hidden layer, and one output layer

If $x_1, x_2, \ldots, x_n$ are the inputs of the input layer, $w^{h}_{ij}$ are the corresponding weights connecting input $i$ to hidden neuron $j$, and $b^{h}_{j}$ are the corresponding biases for the hidden layer, then the outputs $h_j$ of the hidden layer can be calculated as follows:

$$h_j = f\left(\sum_{i=1}^{n} w^{h}_{ij} x_i + b^{h}_{j}\right)$$

Then, if $w^{o}_{jk}$ are the corresponding weights connecting hidden neuron $j$ to output neuron $k$ and $b^{o}_{k}$ are the corresponding biases for the output layer, the outputs $y_k$ of the output layer can be calculated as follows:

$$y_k = f\left(\sum_{j=1}^{m} w^{o}_{jk} h_j + b^{o}_{k}\right)$$

where $m$ is the number of hidden-layer neurons. By using the sigmoid activation function, you can calculate the final output of the neural network:

$$f(z) = \sigma(z) = \frac{1}{1 + e^{-z}}$$

Again, you can train the network using the backpropagation method described earlier. You need a set of data samples with given inputs and desired outputs. The training is done by continuously adjusting the weights and biases until, for a given input, you get the desired outputs $d_k$. This must be repeated over many iterations, called epochs. The following pseudocode shows the logic:

  • Generate random weights and biases
  • For each iteration (epoch)
    • For each training sample
      • //Calculate the hidden layer outputs
      • $h_j = \sigma\left(\sum_{i} w^{h}_{ij} x_i + b^{h}_{j}\right)$
      • //Calculate the output layer outputs
      • $y_k = \sigma\left(\sum_{j} w^{o}_{jk} h_j + b^{o}_{k}\right)$
      • //Calculate the output layer errors
      • $\delta^{o}_{k} = y_k (1 - y_k)(d_k - y_k)$
      • //Calculate the hidden layer errors
      • $\delta^{h}_{j} = h_j (1 - h_j) \sum_{k} w^{o}_{jk} \delta^{o}_{k}$
      • //Update the output layer weights, using a learning rate $\eta$
      • $w^{o}_{jk} = w^{o}_{jk} + \eta \, \delta^{o}_{k} \, h_j$
      • //Update the hidden layer weights
      • $w^{h}_{ij} = w^{h}_{ij} + \eta \, \delta^{h}_{j} \, x_i$
      • //Update the hidden layer biases
      • $b^{h}_{j} = b^{h}_{j} + \eta \, \delta^{h}_{j}$
      • //Update the output layer biases
      • $b^{o}_{k} = b^{o}_{k} + \eta \, \delta^{o}_{k}$
    • Until samples finished
    • If the error is small enough or the total number of epochs is reached
      • stop
    • Else
      • continue
  • End of iteration

Figure 8.7 shows a traditional neural network with one input layer, one hidden layer, and one output layer. Each individual neuron has the inputs and outputs illustrated in Figure 8.6 earlier.
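To make the loop concrete, here is a minimal backpropagation sketch in Java, assuming sigmoid activations, a 2-3-1 network, the XOR function as the training target, and a learning rate of 0.5; all of these are illustrative choices, and Example 8.3 in Section 8.6 is the book's full version. With this configuration the network usually learns XOR within a few thousand epochs, though convergence depends on the random initial weights.

import java.util.Random;

public class BackpropSketch {
    static final int N_IN = 2, N_HID = 3;
    static double[][] wh = new double[N_IN][N_HID]; // input -> hidden weights
    static double[] bh = new double[N_HID];         // hidden biases
    static double[] wo = new double[N_HID];         // hidden -> output weights
    static double bo;                               // output bias

    static double sigmoid(double z) { return 1.0 / (1.0 + Math.exp(-z)); }

    // Forward pass: fills h with the hidden outputs and returns the network output
    static double forward(double[] x, double[] h) {
        for (int j = 0; j < N_HID; j++) {
            double z = bh[j];
            for (int i = 0; i < N_IN; i++) z += wh[i][j] * x[i];
            h[j] = sigmoid(z);
        }
        double z = bo;
        for (int j = 0; j < N_HID; j++) z += wo[j] * h[j];
        return sigmoid(z);
    }

    public static void main(String[] args) {
        double[][] x = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};
        double[] d = {0, 1, 1, 0};                  // XOR targets
        Random r = new Random(1);
        for (int i = 0; i < N_IN; i++)
            for (int j = 0; j < N_HID; j++) wh[i][j] = r.nextDouble() - 0.5;
        for (int j = 0; j < N_HID; j++) { bh[j] = r.nextDouble() - 0.5; wo[j] = r.nextDouble() - 0.5; }
        bo = r.nextDouble() - 0.5;
        double eta = 0.5;                           // learning rate

        for (int epoch = 0; epoch < 10000; epoch++) {
            for (int s = 0; s < x.length; s++) {
                double[] h = new double[N_HID];
                double y = forward(x[s], h);        // forward pass
                // Output layer error, then hidden layer errors (sigmoid derivative)
                double deltaO = y * (1 - y) * (d[s] - y);
                for (int j = 0; j < N_HID; j++) {
                    double deltaH = h[j] * (1 - h[j]) * wo[j] * deltaO;
                    wo[j] += eta * deltaO * h[j];   // update output layer weights
                    for (int i = 0; i < N_IN; i++)  // update hidden layer weights
                        wh[i][j] += eta * deltaH * x[s][i];
                    bh[j] += eta * deltaH;          // update hidden layer bias
                }
                bo += eta * deltaO;                 // update output layer bias
            }
        }
        for (int s = 0; s < x.length; s++)
            System.out.printf("%.0f XOR %.0f -> %.3f%n", x[s][0], x[s][1],
                    forward(x[s], new double[N_HID]));
    }
}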

See the following resources for more information on neural networks:

https://www.nnwj.de/

https://www.cse.unsw.edu.au/~cs9417ml/MLP2/

https://kunuk.wordpress.com/2010/10/11/neural-network-backpropagation-with-java/

https://www.doc.ic.ac.uk/~nd/surprise_96/journal/vol4/cs11/report.html

http://diffsharp.github.io/DiffSharp/0.6.3/examples-neuralnetworks.html

http://www.theprojectspot.com/tutorial-post/introduction-to-artificial-neural-networks-part-1/7

http://www.theprojectspot.com/tutorial-post/introduction-to-artificial-neural-networks-part-2-learning/8

https://machinelearningmastery.com/neural-networks-crash-course/

8.3 Machine Learning

Machine learning (ML) is a category of mathematical algorithms that allow software to become more accurate at predicting outcomes for a given set of data. The term was coined in 1959 by Arthur Samuel, an American pioneer in computer gaming and artificial intelligence, while at IBM. ML can be divided into supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning.

  • Supervised Learning In supervised learning, the algorithms are trained using labeled data. The learning algorithms calculate the output with a given input, compare the calculated output with desired output, and then adjust the algorithms accordingly. A good example is to use a support vector machine to classify the type of iris according to its sepal length, sepal width, petal length, and petal width. Other examples include speech recognition, handwriting recognition, pattern recognition, spam detection, and optical character recognition.
  • Unsupervised Learning In unsupervised learning, the algorithms are fed unlabeled data. The learning algorithms study the structure of the data and divide it into groups with the closest features. K-means clustering is a popular example of an unsupervised learning algorithm. Examples of unsupervised learning applications include grouping customers according to their purchasing behavior, associating certain customers with certain types of products, and so on.
  • Semi-supervised Learning In semi-supervised learning, both labeled and unlabeled data is used. This approach is particularly suitable when the cost for labeling is too high to allow a fully labeled training process or when not all the data can be labeled. Examples of semi-supervised learning include speech analysis and web content analysis.
  • Reinforcement Learning In reinforcement learning, the learning algorithms learn to find, through trial and error, which action can yield the greatest reward. This is normally done in the absence of existing training data. Reinforcement learning is often used in robotics, gaming, and navigation.

The following is a list of commonly used machine learning algorithms:

  • Linear regression
  • Logistic regression
  • Linear discriminant analysis
  • Classification and regression trees
  • Naive Bayes
  • K-means clustering
  • Learning vector quantization
  • Support vector machines
  • Bagging and random forest
  • Boosting and AdaBoost

Figure 8.8 shows the machine learning information web page from SAS (https://www.sas.com/en_gb/insights/analytics/machine-learning.html).


Figure 8.8: The machine learning web page from SAS

See the following resources for more information on machine learning:

https://www.toptal.com/machine-learning/machine-learning-theory-an-introductory-primer

https://www.kaggle.com/kanncaa1/machine-learning-tutorial-for-beginners

https://www.digitalocean.com/community/tutorials/an-introduction-to-machine-learning

8.4 Deep Learning

Conventional neural networks have only three layers: one input layer, one hidden layer, and one output layer. There is only one hidden layer because neural networks are trained using a method known as gradient descent, an iterative algorithm for finding the minimum of a function. It starts with an initial value and then takes steps proportional to the negative of the gradient of the function at the current point until it reaches the minimum, where the gradient is close to zero. As the number of hidden layers increases, the gradients propagated back to the early layers become vanishingly small, and training becomes slow and difficult. This is called the vanishing gradient problem.
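As a concrete illustration of gradient descent (not from the book), here is a minimal Java sketch that finds the minimum of f(x) = (x - 3)^2; the learning rate and starting point are arbitrary choices:

public class GradientDescentDemo {
    public static void main(String[] args) {
        double x = 0.0;             // initial value
        double learningRate = 0.1;  // step size
        for (int i = 0; i < 1000; i++) {
            double gradient = 2 * (x - 3);        // f'(x) for f(x) = (x - 3)^2
            if (Math.abs(gradient) < 1e-9) break; // stop when the gradient is near zero
            x -= learningRate * gradient;         // step in the negative gradient direction
        }
        System.out.println("Minimum found at x = " + x); // approximately 3.0
    }
}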

In 2009, a free database—ImageNet—with more than 14 million labeled images was launched by AI professor Fei-Fei Li at Stanford University. The aim was to use big data to improve machine learning. In 2010, the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) was started. In this annual challenge, contestants were encouraged to train their algorithms using ImageNet and submit their predictions. The breakthrough came in 2012, when AlexNet, a convolutional neural network (CNN) developed by Alex Krizhevsky, Ilya Sutskever, and Geoff Hinton (University of Toronto), halved the existing prediction error rate to 15.3 percent, more than 10.8 percentage points ahead of the runner-up. AlexNet had several key features. First, AlexNet had eight layers, of which the first five were convolutional layers and the last three were fully connected layers, as illustrated in Figure 8.9 (http://www.mdpi.com/2072-4292/9/8/848).


Figure 8.9: The AlexNet architecture

Convolutional layers apply a convolution operation to the input, which reduces the number of parameters of the problem, allowing deep networks with fewer parameters. In fully connected layers, every neuron in one layer is connected to every neuron in the next layer. Even with this eight-layer architecture, there are 60 million parameters. Second, AlexNet used graphics processing units (GPUs) to train the model. GPUs are essentially parallel floating-point calculators, which are much faster than conventional central processing units (CPUs) for this workload. Using GPUs meant the team could train larger models, which led to lower error rates. Finally, they used the non-saturating rectified linear unit (ReLU) activation function, which trains much faster than saturating activation functions such as tanh and sigmoid. AlexNet has made a significant impact on deep learning, particularly machine vision; by 2018 the AlexNet paper had been cited more than 25,000 times. Figure 8.10 shows some examples of the activation functions in neural networks (https://towardsdatascience.com/activation-functions-neural-networks-1cbd9f8d91d6).


Figure 8.10: The activation functions in neural networks
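For comparison with the step, sigmoid, and tanh functions shown earlier, ReLU is essentially a one-liner; this sketch is illustrative only:

public class ReluDemo {
    // ReLU: f(z) = max(0, z). For z > 0 the gradient is constant (1), so it
    // does not saturate the way sigmoid and tanh do for large inputs.
    static double relu(double z) { return Math.max(0, z); }

    public static void main(String[] args) {
        for (double z = -2.0; z <= 2.0; z += 1.0) {
            System.out.printf("z=%5.1f relu=%.1f%n", z, relu(z));
        }
    }
}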

Another impressive winner of the ILSVRC challenge was GoogLeNet in 2014, which achieved an amazing error rate of 6.67 percent; this is close to human-level performance on this dataset. GoogLeNet is a convolutional neural network (CNN) 22 layers deep, and it reduced the number of parameters from AlexNet's 60 million to 4 million. Figure 8.11 shows a colorful explanation of how a convolutional neural network works (https://indoml.com/2018/03/07/student-notes-convolutional-neural-networks-cnn-introduction/). CNNs are effective for prediction problems involving image data as input.


Figure 8.11: A colorful explanation of how convolutional neural networks work

Finally, last but not least, is the winner of the ILSVRC challenge in 2015: the Residual Neural Network (ResNet), developed by Kaiming He et al. at Microsoft. It achieved an error rate of 3.57 percent, which beats human-level performance on this dataset. ResNet introduced a novel architecture with “skip connections” and heavy batch normalization, which allowed the team to train a neural network with 152 layers while still having lower complexity.

Another type of deep learning network is the recurrent neural network (RNN), which was designed to work with sequence prediction problems. Examples of sequence prediction problems include one-to-many, many-to-one, and many-to-many. A one-to-many problem is when an observation as input is mapped to multiple outputs. A many-to-one problem is when a sequence of multiple inputs is mapped to a single output prediction. A many-to-many problem is when a sequence of multiple inputs is mapped to multiple outputs. You can use RNN for text data, speech data, and time-series data.

Figure 8.12 shows the deep learning information web site from SAS (https://www.sas.com/en_gb/insights/analytics/deep-learning.html). Figure 8.13 shows the Keras tutorial web site on deep learning in Python (https://www.datacamp.com/community/tutorials/deep-learning-python).


Figure 8.12: The Deep Learning web site from SAS


Figure 8.13: The Keras tutorial web site on deep learning in Python

See the following resources for more information about deep learning:

https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf

https://machinelearningmastery.com/crash-course-convolutional-neural-networks/

https://machinelearningmastery.com/crash-course-recurrent-neural-networks-deep-learning/

8.5 Java AI Libraries

The following is a list of Java AI libraries.

Expert Systems

Neural Networks

Natural Language Processing

Machine Learning

Computer Vision

Other Resources

8.6 Java Examples for Neural Networks

Now let's look at some examples of Java applications for neural networks.

8.6.1 Java Perceptron Example

Example 8.1 shows a simple Java perceptron (single neuron) application. It has two Java files, Neuron1.java and Neuron1Demo.java. The Neuron1.java file is the single neuron class, which defines the input and output. The output is simply the weighted sum of all inputs. Neuron1Demo.java is the example program that uses the Neuron1 class to create a perceptron object. Figure 8.14 shows the compilation, execution, and output of the Neuron1Demo.java program.


Figure 8.14: The compilation, execution, and output of Neuron1Demo.java

Example 8.1A is the code for Neuron1.java.

Example 8.1B is the code for Neuron1Demo.java.

Example 8.2 shows another Java perceptron implementation, this time with training. It also consists of two Java files, Neuron2.java and Neuron2Demo.java. The Neuron2.java file is the single neuron class, which defines the input and output. The output is simply the weighted sum of all inputs. It also has a Train() method that can train the perceptron to make it behave as a logical AND function. The Neuron2Demo.java file is the example program that uses the Neuron2 class to create a perceptron object. Figure 8.15 shows the compilation, execution, and output of the Neuron2Demo.java program.


Figure 8.15: The compilation, execution, and output of the Neuron2Demo.java program

Example 8.2A lists the code for Neuron2.java.

Example 8.2B lists the code for Neuron2Demo.java.

8.6.2 Java Neural Network Backpropagation Example

Example 8.3 shows a Java backpropagation neural network example. It is adapted from the following code example:

https://supundharmarathne.wordpress.com/2012/11/23/a-simple-backpropagation-example-of-neural-network/

This program creates a simple neural network with one input layer, one hidden layer, and one output layer. The input layer has four neurons, the hidden layer has three neurons, and the output layer has two neurons. Figure 8.16 shows the compilation, execution, and output of the BackpropagationDemo1.java program after 10 iterations. Figure 8.17 shows the compilation, execution, and output of BackpropagationDemo1.java after 1,000 iterations. Please note that your program's outputs might differ from Figures 8.16 and 8.17, because the initial parameter values are generated randomly.


Figure 8.16: The compilation, execution, and output of the BackpropagationDemo1.java program after 10 iterations


Figure 8.17: The compilation, execution, and output of BackpropagationDemo1.java after 1,000 iterations

8.7 Java Examples for Machine Learning

Several Java-based library packages are available for machine learning. One commonly used library is Waikato Environment for Knowledge Analysis (Weka), which is a collection of machine learning algorithms for data mining tasks developed at the University of Waikato in New Zealand. It contains tools for data preparation, classification, regression, clustering, association rules mining, and visualization.

To use Weka, first you will need to download the Weka library.

http://www.cs.waikato.ac.nz/ml/weka/snapshots/weka_snapshots.html

Download the file stable-3-8.zip, and unzip it to a folder. Find the JAR file weka.jar and the data file iris.arff.

Use IntelliJ IDEA to create a new Java project, and add the weka.jar and iris.arff files to the IntelliJ IDEA project. Create an empty Java class named WekaTest.java, and copy the source code in Example 8.4 to the WekaTest.java file. This is a simple Weka classification demo, modified from the example code at the following link:

https://www.programcreek.com/2013/01/a-simple-machine-learning-example-in-java/

This example reads the Iris classification data from the file, splits it into a training set and a testing set, runs the J48 decision tree classifier, and prints out the results.
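The core of such a program looks roughly like the following sketch, assuming the Weka 3.8 API and an 80/20 train/test split; the class name and split ratio are illustrative choices, and Example 8.4 is the book's authoritative version.

import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class WekaSketch {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("iris.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);   // the class is the last attribute
        data.randomize(new Random(1));                  // shuffle before splitting
        int trainSize = (int) Math.round(data.numInstances() * 0.8);
        Instances train = new Instances(data, 0, trainSize);
        Instances test = new Instances(data, trainSize, data.numInstances() - trainSize);

        J48 tree = new J48();                           // the J48 decision tree classifier
        tree.buildClassifier(train);

        Evaluation eval = new Evaluation(train);
        eval.evaluateModel(tree, test);
        System.out.println(eval.toSummaryString());     // accuracy and error statistics
    }
}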

Figure 8.18 shows the content of the iris.arff file and the IntelliJ IDEA project WekaTest and its output.


Figure 8.18: The content of the iris.arff file (top) and the IntelliJ IDEA project WekaTest and its output (bottom)

See the following resources for more information about Weka:

https://www.cs.waikato.ac.nz/ml/weka/

http://www.cs.umb.edu/~ding/history/480_697_spring_2013/homework/WekaJavaAPITutorial.pdf

http://www.cs.ru.nl/P.Lucas/teaching/DM/weka.pdf

Another popular machine learning library that supports the Java programming language is the Library for Support Vector Machines (LIBSVM), illustrated in Figure 8.19 (https://www.csie.ntu.edu.tw/~cjlin/libsvm/). LIBSVM supports support vector classification, regression, and distribution estimation (one-class SVM), and it also supports multiclass classification. On the web site, there is also a simple Java applet demonstrating SVM classification and regression.


Figure 8.19: The LIBSVM Library for support vector machines
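As a rough illustration of the LIBSVM Java API, the following sketch trains a classifier on a tiny invented two-point data set and predicts the class of a new point; it assumes the libsvm.jar from the LIBSVM distribution is on the classpath, and the parameter values are arbitrary illustrative choices.

import libsvm.*;

public class LibSvmSketch {
    public static void main(String[] args) {
        // Two 2-D training points, one per class (a toy problem)
        double[][] features = {{0.0, 0.0}, {1.0, 1.0}};
        double[] labels = {-1, 1};

        svm_problem prob = new svm_problem();
        prob.l = features.length;                 // number of training samples
        prob.y = labels;
        prob.x = new svm_node[prob.l][];
        for (int i = 0; i < prob.l; i++) {
            prob.x[i] = new svm_node[features[i].length];
            for (int j = 0; j < features[i].length; j++) {
                svm_node node = new svm_node();
                node.index = j + 1;               // LIBSVM feature indices are 1-based
                node.value = features[i][j];
                prob.x[i][j] = node;
            }
        }

        svm_parameter param = new svm_parameter();
        param.svm_type = svm_parameter.C_SVC;     // C-support vector classification
        param.kernel_type = svm_parameter.RBF;    // radial basis function kernel
        param.gamma = 0.5;
        param.C = 1.0;
        param.cache_size = 100;
        param.eps = 1e-3;

        svm_model model = svm.svm_train(prob, param);

        // Predict the class of a new point near (1, 1); expect class 1
        svm_node[] query = {new svm_node(), new svm_node()};
        query[0].index = 1; query[0].value = 0.9;
        query[1].index = 2; query[1].value = 0.9;
        System.out.println("Predicted class: " + svm.svm_predict(model, query));
    }
}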

8.8 Java Examples for Deep Learning

Deep learning is another hot research topic at the moment. The best way to do deep learning with Java is to use the Deeplearning4J library; Figure 8.20 shows the download page (https://deeplearning4j.org/docs/latest/deeplearning4j-quickstart). You can also download the entire Deeplearning4J library as a zipped file from its GitHub web site.


Figure 8.20: The DL4J library web site

https://github.com/deeplearning4j/dl4j-examples

Then unzip it to a folder. Inside there should be a subfolder named dl4j-examples, where you can find many deep learning example applications.

To run the DL4J examples, again you will use IntelliJ IDEA for its simplicity and friendliness. From IntelliJ IDEA, open a project, select the dl4j-examples subfolder in the Deeplearning4J folder, and click OK, as shown in Figure 8.21. Once the project is open, it will look like Figure 8.22. There are many different deep learning example programs. From here you can run the existing examples, modify examples, and create your own programs.


Figure 8.21: Open a project in IntelliJ IDEA, and select the dl4j-examples subfolder in the Deeplearning4J folder.


Figure 8.22: The dl4j-examples project in IntelliJ IDEA

Figure 8.23 shows the XorExample.java program and its running output. XorExample.java uses a simple multiple-layer, feedforward neural network to implement an XOR function; it has two input neurons, one hidden layer with four hidden neurons, and two output neurons.


Figure 8.23: The XorExample.java program and its running output

Figure 8.24 shows the MLPClassifierLinear.java program and its running output. MLPClassifierLinear.java uses a multiple-layer perceptron neural network as a linear classifier to separate two groups of data.


Figure 8.24: The MLPClassifierLinear.java program (top) and its running output (bottom). The bottom left shows the training set data results, and the bottom right shows the test set data results.

Figure 8.25 shows the interesting ImageDrawer.java program and its running output. ImageDrawer.java uses deep learning neural networks to redraw the image it is given (the Mona Lisa), pixel by pixel. It first produces a very rough representation of the target image and then continues to fine-tune it until the drawn image appears exactly the same as the target, which normally takes a couple of hours.


Figure 8.25: The ImageDrawer.java program (top) and its running output (bottom). The bottom shows the target image and the redrawn image after about five minutes, 30 minutes, six hours, and seven hours.

The following Google paper explains how to use a recurrent neural network for image generation, that is, drawing the image:

https://arxiv.org/pdf/1502.04623.pdf

This is an interesting, free, short course on deep learning, as well as on using Deeplearning4J:

http://www.whatisdeeplearning.com/course/

You can also use Deeplearning4J to create and train a neural network on an Android device.

https://deeplearning4j.org/docs/latest/deeplearning4j-android

8.9 TensorFlow for Java

TensorFlow is an open-source software library developed by Google for the purpose of machine learning. It is one of the most popular machine learning libraries, particularly for deep learning. TensorFlow can work on a range of different operating systems, such as Ubuntu Linux, Windows, macOS, and even Raspbian!

The default TensorFlow programming language is Python. But TensorFlow also provides APIs for Java programs, as shown in Figure 8.26 (https://www.tensorflow.org/install/install_java).


Figure 8.26: The TensorFlow for Java web site

To use TensorFlow for Java, you will need to download two files from the web site.

  1. Download the TensorFlow Java Archive (JAR) file libtensorflow-1.12.0.jar from https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow-1.12.0.jar.
  2. Download and extract the Java Native Interface (JNI) file for your operating system and processor support. In this example, I downloaded the JNI file for Windows CPU and from the downloaded zipped file extracted a file named tensorflow_jni.dll.

Put both the libtensorflow-1.12.0.jar and tensorflow_jni.dll files into your Java program folder (in this example, H:\Chapter 8), and create a file named HelloTensorFlow.java. You can get the contents of the file from the TensorFlow for Java web site, as shown next:

//Example code from https://www.tensorflow.org/install/lang_java
 
import org.tensorflow.Graph;
import org.tensorflow.Session;
import org.tensorflow.Tensor;
import org.tensorflow.TensorFlow;
 
public class HelloTensorFlow {
  public static void main(String[] args) throws Exception {
    try (Graph g = new Graph()) {
      final String value = "Hello from " + TensorFlow.version();
 
      // Construct the computation graph with a single operation, a constant
      // named "MyConst" with a value "value".
      try (Tensor t = Tensor.create(value.getBytes("UTF-8"))) {
        // The Java API doesn't yet include convenience functions for adding operations.
        g.opBuilder("Const", "MyConst").setAttr("dtype", t.dataType()).setAttr("value", t).build();
      }
 
      // Execute the "MyConst" operation in a Session.
      try (Session s = new Session(g);
          // Generally, there may be multiple output tensors,
          // all of them must be closed to prevent resource leaks.
          Tensor output = s.runner().fetch("MyConst").run().get(0)) {
        System.out.println(new String(output.bytesValue(), "UTF-8"));
      }
    }
  }
}

Example 8.5 is a simplified version of the previous program, which just prints Hello from xxx, where xxx is the version of TensorFlow.

To compile and execute this program, you will need to run the following commands:

javac -cp libtensorflow-1.12.0.jar HelloTensorFlow.java
java -cp libtensorflow-1.12.0.jar;. -Djava.library.path=. HelloTensorFlow   

The -cp libtensorflow-1.12.0.jar option includes the libtensorflow-1.12.0.jar file in the classpath, and the -Djava.library.path=. option specifies where to find the tensorflow_jni.dll file; in this example, it is ".", which means the current folder. Figure 8.27 shows the compilation and execution of the HelloTensorFlow.java program. The version of TensorFlow is 1.12.0.


Figure 8.27: The compilation and execution of the HelloTensorFlow.java program

There are many tutorials and TensorFlow Java example programs available. Figure 8.28 (https://sites.google.com/view/tensorflow-example-java-api) shows the Google TensorFlow Java API example site, which uses the YOLO model (https://pjreddie.com/darknet/yolo/) for object detection, for example, to detect cats in a picture.


Figure 8.28: The Google TensorFlow Java API example site

The following GitHub site has a simple, illustrative tutorial showing how to get started with TensorFlow with Java. It offers a Hello TensorFlow example (the same as in the previous example) and a LabelImage example, which uses the tensorflow_inception_graph.pb TensorFlow model file for image classification.

https://github.com/loretoparisi/tensorflow-java

The following is another simple guide to getting started with TensorFlow with Java and JavaScript, that is, within a web browser:

https://dzone.com/articles/getting-started-with-tensorflow-using-java-javascr

The following GitHub site has several interesting TensorFlow Java examples, including hello-world, image-classifier, sentiment-analysis, audio-classifier, audio-recommender, and audio-search-engine programs.

https://github.com/chen0040/java-tensorflow-samples

The following is the TensorFlow Java API documentation web site:

https://www.tensorflow.org/api_docs/java/reference/org/tensorflow/package-summary

More TensorFlow Java examples are available from these sites:

https://github.com/tensorflow/models/tree/master/samples/languages/java

https://github.com/szaza/tensorflow-example-java

8.10 AI Resources

This section provides a list of interesting AI resources, including books and tutorials.

8.11 Summary

This chapter first introduced the concept of artificial intelligence and then illustrated some Java examples of AI applications. You looked at the types of AI: narrow AI, general AI, and super AI. You also surveyed the stages of the development of AI. You saw that neural networks started in the 1950s, machine learning started in the 1980s, and deep learning started in the 2010s. For neural networks, the chapter introduced the concept of a perceptron—a single neuron—along with multilayer perceptron, backpropagation networks, and feedforward networks. The chapter also introduced the different types of machine learning: supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning. It introduced a brief history of deep learning and the popular types of deep learning networks, such as AlexNet, GoogLeNet, and ResNet. Finally, you looked at a series of Java examples for neural networks, for machine learning, and for deep learning, and you looked at the TensorFlow machine learning library for Java.

8.12 Chapter Review Questions

Q8.1. What is artificial intelligence?
Q8.2. What are the three types of artificial intelligence?
Q8.3. What are the three stages of development for artificial intelligence?
Q8.4. What are neural networks?
Q8.5. What is a perceptron?
Q8.6. What are multilayer perceptrons (MLPs)?
Q8.7. What are feedforward networks?
Q8.8. What is machine learning? What are the different types of machine learning?
Q8.9. What is deep learning?
Q8.10. Do some research on convolutional neural networks (CNNs). What are the key features of CNNs?
Q8.11. Do some research on recurrent neural networks (RNNs). What are the key features of RNNs?