Chapter 1
IN THIS CHAPTER
Defining the dream of AI, and comparing AI to machine learning
Understanding the engineering portion of AI and machine learning
Considering how statistics and big data work together in machine learning
Defining the role of algorithms in machine learning
Determining how training works with algorithms in machine learning
“A breakthrough in machine learning would be worth ten Microsofts.”
— BILL GATES
Artificial Intelligence (AI) is a huge topic today, and it’s getting bigger all the time thanks to the success of technologies such as Siri (www.apple.com/ios/siri). Talking to your smartphone is both fun and helpful to find out things like the location of the best sushi restaurant in town or to discover how to get to the concert hall. As you talk to your smartphone, it learns more about the way you talk and makes fewer mistakes in understanding your requests. The capability of your smartphone to learn and interpret your particular way of speaking is an example of an AI, and part of the technology used to make it happen is machine learning. You likely make use of machine learning and AI all over the place today without really thinking about it. For example, the capability to speak to devices and have them actually do what you intend is an example of machine learning at work. Likewise, recommender systems, such as those found on Amazon, help you make purchases based on criteria such as previous product purchases or products that complement a current choice. The use of both AI and machine learning will only increase with time.
In this chapter, you delve into AI and discover what it means from several perspectives, including how it affects you as a consumer and as a scientist or engineer. You also discover that AI doesn’t equal machine learning, even though the media often confuse the two; the technologies are related but distinct.
You also discover the fuel that powers both AI and machine learning: big data. Algorithms, step-by-step procedures implemented in computer code, turn big data into information and eventually insight. Through this process, you see how AI and machine learning help computers excel at tasks that humans once performed exclusively.
For many years, people understood AI based on Hollywood. Robots enhanced human abilities in TV shows like The Jetsons and Knight Rider, and in movies like Star Wars and Star Trek. Recent developments, such as powerful computers that fit in your pocket and cheap storage for collecting massive amounts of data, have moved reality closer to the on-screen fiction.
This section separates hype from reality, and explores a few actual applications in machine learning and AI.
As any technology becomes bigger, so does the hype, and AI certainly has a lot of hype surrounding it. For one thing, some people have decided to engage in fear mongering rather than science. Killer robots, such as those found in the film The Terminator, really aren’t going to be the next big thing. Your first real experience with an android AI is more likely to be in the form of a health-care assistant (www.good.is/articles/robots-elder-care-pepper-exoskeletons-japan) or possibly as a coworker (www.computerworld.com/article/2990849/robotics/meet-the-virtual-woman-who-may-take-your-job.html). The reality is that you already interact with AI and machine learning in far more mundane ways. Part of the reason you need to read this chapter is to get past the hype and discover what AI can do for you today.
Machine learning and AI both have strong engineering components. That is, you can quantify both technologies precisely based on theory (substantiated and tested explanations) rather than simply hypothesis (a suggested explanation for a phenomenon). In addition, both have strong science components, through which people test concepts and create new ideas of how expressing the thought process might be possible. Finally, machine learning also has an artistic component, and this is where a talented scientist can excel. In some cases, AI and machine learning both seemingly defy logic, and only the true artist can make them work as expected.
Androids (a specialized kind of robot that looks and acts like a human, such as Data in Star Trek) and some types of humanoid robots (a kind of robot that has human characteristics but is easily distinguished from a human, such as C-3PO in Star Wars) have become the poster children for AI. They present computers in a form that people can anthropomorphize (that is, make human). In fact, it’s entirely possible that one day you won’t be able to distinguish between human and artificial life with ease. Science fiction authors, such as Philip K. Dick, have long predicted such an occurrence, and it seems all too possible today. The story Do Androids Dream of Electric Sheep? discusses the whole concept of more real than real. The idea appears as part of the plot in the movie Blade Runner (www.warnerbros.com/blade-runner). The sections that follow help you understand how close technology currently gets to the ideals presented by science fiction authors and the movies.
There is a reason, other than anthropomorphization, that humans see the ultimate AI as one that is contained within some type of android. Ever since the ancient Greeks, humans have discussed the possibility of placing a mind inside a mechanical body. One such myth is that of a mechanical man called Talos (www.ancient-wisdom.com/greekautomata.htm). The fact that the ancient Greeks had complex mechanical devices, only one of which still exists (read about the Antikythera mechanism at www.ancient-wisdom.com/antikythera.htm), makes it quite likely that their dreams were built on more than just fantasy. Throughout the centuries, people have discussed mechanical persons capable of thought (such as Rabbi Judah Loew's Golem, www.nytimes.com/2009/05/11/world/europe/11golem.html).
AI is built on the hypothesis that mechanizing thought is possible. During the first millennium, Greek, Indian, and Chinese philosophers all worked on ways to perform this task. As early as the seventeenth century, Gottfried Leibniz, Thomas Hobbes, and René Descartes discussed the potential for rationalizing all thought as simply math symbols. Of course, the complexity of the problem eluded them, and still eludes us today. The point is that the vision for AI has been around for an incredibly long time, but the implementation of AI is relatively new.
The true birth of AI as we know it today began with Alan Turing’s publication of “Computing Machinery and Intelligence” in 1950. In this paper, Turing explored the idea of how to determine whether machines can think. Of course, this paper led to the Imitation Game involving three players. Player A is a computer and Player B is a human. Each must convince Player C (a human who can’t see either Player A or Player B) that they are human. If Player C can’t determine who is human and who isn’t on a consistent basis, the computer wins.
A continuing problem with AI is too much optimism. The problem that scientists are trying to solve with AI is incredibly complex. However, the early optimism of the 1950s and 1960s led scientists to believe that the world would produce intelligent machines in as little as 20 years. After all, machines were doing all sorts of amazing things, such as playing complex games. AI currently has its greatest success in areas such as logistics, data mining, and medical diagnosis.
Machine learning relies on algorithms to analyze huge data sets. Currently, machine learning can’t provide the sort of AI that the movies present. Even the best algorithms can’t think, feel, present any form of self-awareness, or exercise free will. What machine learning can do is perform predictive analytics far faster than any human can. As a result, machine learning can help humans work more efficiently. The current state of AI, then, is one of performing analysis, but humans must still consider the implications of that analysis — making the required moral and ethical decisions. The “Considering the relationship between AI and machine learning” section later in this chapter delves more deeply into precisely how machine learning contributes to AI as a whole. The essence of the matter is that machine learning provides just the learning part of AI, and that part is nowhere near ready to create an AI of the sort you see in films.
At present, AI is based on machine learning, and machine learning is essentially different from statistics. Yes, machine learning has a statistical basis, but it makes some different assumptions than statistics do because the goals are different. Table 1-1 lists some features to consider when comparing AI and machine learning to statistics.
TABLE 1-1 Comparing Machine Learning to Statistics
| Technique | Machine Learning | Statistics |
| --- | --- | --- |
| Data handling | Works with big data in the form of networks and graphs; raw data from sensors or web text is split into training and test data. | Models are used to create predictive power on small samples. |
| Data input | The data is sampled, randomized, and transformed to maximize accuracy scoring in the prediction of out-of-sample (or completely new) examples. | Parameters interpret real-world phenomena and place a stress on magnitude. |
| Result | Probability is taken into account for comparing what could be the best guess or decision. | The output captures the variability and uncertainty of parameters. |
| Assumptions | The scientist learns from the data. | The scientist assumes a certain output and tries to prove it. |
| Distribution | The distribution is unknown or ignored before learning from data. | The scientist assumes a well-defined distribution. |
| Fitting | The scientist creates a best-fit, but generalizable, model. | The result is fit to the present data distribution. |
Huge data sets require huge amounts of memory. Unfortunately, the requirements don’t end there. When you have huge amounts of data and memory, you must also have processors with multiple cores and high speeds. One of the problems that scientists are striving to solve is how to use existing hardware more efficiently. In some cases, waiting for days to obtain a result to a machine learning problem simply isn’t possible. The scientists who want to know the answer need it quickly, even if the result isn’t quite right. With this in mind, investments in better hardware also require investments in better science. This book considers some of the following issues as part of making your machine learning experience better:
As with many other technologies, AI and machine learning both have their fantasy or fad uses. For example, some people are using machine learning to create Picasso-like art from photos. You can read all about it at www.washingtonpost.com/news/innovations/wp/2015/08/31/this-algorithm-can-create-a-new-van-gogh-or-picasso-in-just-an-hour. As the article points out, the computer can copy only an existing style at this stage — not create an entirely new style of its own. The following sections discuss AI and machine learning fantasies of various sorts.
AI is entering an era of innovation that you used to read about only in science fiction. It can be hard to determine whether a particular AI use is real or simply the dream child of a determined scientist. For example, The Six Million Dollar Man (https://en.wikipedia.org/wiki/The_Six_Million_Dollar_Man) is a television series that looked fanciful at one time. When it was introduced, no one actually thought that we’d have real-world bionics at some point. However, Hugh Herr has other ideas — bionic legs really are possible now (www.smithsonianmag.com/innovation/future-robotic-legs-180953040). Of course, they aren’t available for everyone yet; the technology is only now becoming useful. Muddying the waters is another television series, The Six Billion Dollar Man (www.cinemablend.com/new/Mark-Wahlberg-Six-Billion-Dollar-Man-Just-Made-Big-Change-91947.html). The fact is that AI and machine learning will both present opportunities to create some amazing technologies and that we’re already at the stage of creating those technologies, but you still need to take what you hear with a huge grain of salt.
You find AI and machine learning used in a great many applications today. The only problem is that the technology works so well that you don’t know that it even exists. In fact, you might be surprised to find that many devices in your home already make use of both technologies. Both technologies definitely appear in your car and most especially in the workplace. In fact, the uses for both AI and machine learning number in the millions — all safely out of sight even when they’re quite dramatic in nature.
Here are just a few of the ways in which you might see AI used:
This list doesn’t even begin to scratch the surface. You can find AI used in many other ways. However, it’s also useful to view uses of machine learning outside the normal realm that many consider the domain of AI. Here are a few uses for machine learning that you might not associate with an AI:
Even though the movies make it sound like AI is going to make a huge splash, and you do sometimes see some incredible uses for AI in real life, the fact of the matter is that most uses for AI are mundane, even boring. For example, a recent article details how Verizon uses AI to analyze security breach data (www.computerworld.com/article/3001832/data-analytics/how-verizon-analyzes-security-breach-data-with-r.html). The act of performing this analysis is dull when compared to other sorts of AI activities, but the benefits are that Verizon saves money performing the analysis, and the results are better as well.
In addition, Python developers have a huge array of libraries available to make machine learning easy. In fact, Kaggle (www.kaggle.com/competitions) provides competitions to allow developers to hone their machine learning skills in creating practical applications. The results of these competitions often appear later as part of products that people actually use. Additionally, the developer community is particularly busy creating new libraries to make complex data science and machine learning applications easier to program (see www.kdnuggets.com/2015/06/top-20-python-machine-learning-open-source-projects.html for the top 20 Python libraries in use today).
Machine learning is only part of what a system requires to become an AI. The machine learning portion of the picture enables an AI to perform these tasks:
The use of algorithms to manipulate data is the centerpiece of machine learning. To prove successful, a machine learning session must use an appropriate algorithm to achieve a desired result. In addition, the data must lend itself to analysis using the desired algorithm, or it requires a careful preparation by scientists.
AI encompasses many other disciplines to simulate the thought process successfully. In addition to machine learning, AI normally includes
As scientists continue to work with a technology and turn hypotheses into theories, the technology becomes related more to engineering (where theories are implemented) than science (where theories are created). As the rules governing a technology become clearer, groups of experts work together to define these rules in written form. The result is specifications (a group of rules that everyone agrees upon).
Eventually, implementations of the specifications become standards that a governing body, such as the IEEE (Institute of Electrical and Electronics Engineers) or a combination of the ISO/IEC (International Organization for Standardization/International Electrotechnical Commission), manages. AI and machine learning have both been around long enough to create specifications, but you currently won’t find any standards for either technology.
The basis for machine learning is math. Algorithms determine how to interpret big data in specific ways. The math basics for machine learning appear in Book 8, Chapter 2. You discover that algorithms process input data in specific ways and create predictable outputs based on the data patterns. What isn’t predictable is the data itself. The reason you need AI and machine learning is to decipher the data in a manner that lets you see the patterns in it and make sense of them.
You see the specifications detailed in Book 8, Chapter 4 in the form of algorithms used to perform specific tasks. When you get to Book 9, you begin to see the reason that everyone agrees to specific sets of rules governing the use of algorithms to perform tasks. The point is to use an algorithm that will best suit the data you have in hand to achieve the specific goals you’ve created. Professionals implement algorithms using languages that work best for the task. Machine learning relies on Python and R, and to some extent MATLAB, Java, Julia, and C++. (See the discussion at www.quora.com/What-is-the-best-language-to-use-while-learning-machine-learning-for-the-first-time for details.)
The reason that AI and machine learning are both sciences and not engineering disciplines is that both require some level of art to achieve good results. The artistic element of machine learning takes many forms. For example, you must consider how the data is used. Some data acts as a baseline that trains an algorithm to achieve specific results. The remaining data provides the output used to understand the underlying patterns. No specific rules governing the balancing of data exist; the scientists working with the data must discover whether a specific balance produces optimal output.
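The balance described above can be sketched in a few lines. This is a minimal, illustrative example of holding out part of a data set: the data, labels, and the 80/20 ratio are all assumptions chosen for the sketch, not a prescription from the text.

```python
# A minimal sketch of the training/holdout balance described above.
# No fixed rule dictates the split; 80/20 is a common starting point
# that the scientist then adjusts until the model generalizes well.
import numpy as np

rng = np.random.default_rng(42)

X = np.arange(100).reshape(50, 2)   # 50 examples, 2 features (toy data)
y = np.arange(50) % 2               # toy labels

indices = rng.permutation(len(X))   # shuffle before splitting
split = int(0.8 * len(X))           # 80% reserved as the training baseline
train_idx, test_idx = indices[:split], indices[split:]

X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]

print(len(X_train), len(X_test))  # 40 10
```

Libraries such as scikit-learn wrap this same idea in convenience functions, but the underlying art — choosing a balance that produces optimal output — remains with the scientist.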
You can also tune the algorithms in certain ways or refine how the algorithm works. Again, the idea is to create output that truly exposes the desired patterns so that you can make sense of the data. For example, when viewing a picture, a robot may have to determine which elements of the picture it can interact with and which elements it can’t. The answer to that question is important if the robot must avoid some elements to keep on track or to achieve specific goals.
When working in a machine learning environment, you also have the problem of input data to consider. For example, the microphone found in one smartphone won’t produce precisely the same input data that a microphone in another smartphone will. The characteristics of the microphones differ, yet the result of interpreting the vocal commands provided by the user must remain the same. Likewise, environmental noise changes the input quality of the vocal command, and the smartphone can experience certain forms of electromagnetic interference. Clearly, the variables that a designer faces when creating a machine learning environment are both large and complex.
The art behind the engineering is an essential part of machine learning. The experience that a scientist gains in working through data problems is essential because it provides the means for the scientist to add values that make the algorithm work better. A finely tuned algorithm can make the difference between a robot successfully threading a path through obstacles and hitting every one of them.
Computers manage data through applications that perform tasks using algorithms of various sorts. A simple definition of an algorithm is a systematic set of operations to perform on a given data set — essentially a procedure. The four basic data operations are create, read, update, and delete (CRUD). This set of operations may not seem complex, but performing these essential tasks is the basis of everything you do with a computer. As the data set becomes larger, the computer can use the algorithms found in an application to perform more work. The use of immense data sets, known as big data, enables a computer to perform work based on pattern recognition in a nondeterministic manner. In short, to create a computer setup that can learn, you need a data set large enough for the algorithms to manage in a manner that allows for pattern recognition, and this pattern recognition needs to use a simple subset to make predictions (statistical analysis) of the data set as a whole.
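The four CRUD operations above can be illustrated with a toy example. This sketch stands in a Python dict for a real data store; the record names and values are purely illustrative.

```python
# A toy illustration of the four basic data operations (CRUD) on a
# simple in-memory data set (a dict standing in for a database).
records = {}

def create(key, value):
    records[key] = value

def read(key):
    return records.get(key)

def update(key, value):
    if key in records:
        records[key] = value

def delete(key):
    records.pop(key, None)

create("item1", {"name": "sensor reading", "value": 42})
update("item1", {"name": "sensor reading", "value": 43})
print(read("item1"))   # {'name': 'sensor reading', 'value': 43}
delete("item1")
print(read("item1"))   # None
```

Every application, no matter how sophisticated, ultimately builds on these four operations; machine learning adds the pattern-recognition layer on top.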
Big data exists in many places today. Obvious sources are online databases, such as those created by vendors to track consumer purchases. However, you find many non-obvious data sources, too, and often these non-obvious sources provide the greatest resources for doing something interesting. Finding appropriate sources of big data lets you create machine learning scenarios in which a machine can learn in a specified manner and produce a desired result.
Statistics, one of the methods of machine learning that you consider in this book, is a method of describing problems using math. By combining big data with statistics, you can create a machine learning environment in which the machine considers the probability of any given event. However, saying that statistics is the only machine learning method is incorrect. This chapter also introduces you to the other forms of machine learning currently in place.
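The statistical building block mentioned above — a machine considering the probability of an event — can be shown at its simplest: estimating a probability from observed data. The observations here are invented for illustration.

```python
# A minimal statistical sketch: estimating the probability of an event
# from observed data, the kind of building block that machine learning
# layers more sophisticated methods on top of.
from collections import Counter

observations = ["sunny", "rain", "sunny", "sunny", "rain",
                "sunny", "cloudy", "sunny", "rain", "sunny"]

counts = Counter(observations)
total = len(observations)
p_sunny = counts["sunny"] / total   # relative frequency as probability
print(p_sunny)  # 0.6
```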
Algorithms determine how a machine interprets big data. The algorithm used to perform machine learning affects the outcome of the learning process and, therefore, the results you get. This chapter helps you understand the five main techniques for using algorithms in machine learning.
Before an algorithm can do much in the way of machine learning, you must train it. The training process modifies how the algorithm views big data. The final section of this chapter helps you understand that training actually means using a subset of the data to create the patterns that the algorithm needs in order to recognize specific cases from the more general cases that you provide as part of the training.
Big data is substantially different from being just a large database. Yes, big data implies lots of data, but it also includes the idea of complexity and depth. A big data source describes something in enough detail that you can begin working with that data to solve problems for which general programming proves inadequate. For example, consider Google’s self-driving cars. The car must consider not only the mechanics of the car’s hardware and position within space but also the effects of human decisions, road conditions, environmental conditions, and other vehicles on the road. The data source contains many variables — all of which affect the vehicle in some way. Traditional programming might be able to crunch all the numbers, but not in real time. You don’t want the car to crash into a wall and have the computer finally decide five minutes later that the car is going to crash into a wall. The processing must prove timely so that the car can avoid the wall.
The acquisition of big data can also prove daunting. The sheer bulk of the data set isn’t the only problem to consider — also essential is to consider how the data set is stored and transferred so that the system can process it. In most cases, developers try to store the data set in memory to allow fast processing. Using a hard drive to store the data would prove too costly, time-wise.
Finally, big data is so large that humans can’t reasonably visualize it without help. Part of what defines big data as big is the fact that a human can learn something from it, but the sheer magnitude of the data set makes recognizing the patterns impossible (or at least incredibly time consuming). Machine learning helps humans make sense of and use big data.
Before you can use big data for a machine learning application, you need a source for big data. Of course, the first thing that most developers think about is the huge, corporate-owned database, which could contain interesting information, but it’s just one source. The fact of the matter is that your corporate databases might not even contain particularly useful data for a specific need. The following sections describe locations you can use to obtain additional big data.
To create viable sources of big data for specific needs, you might find that you actually need to create a new data source. Developers built existing data sources around the needs of the client-server architecture in many cases, and these sources may not work well for machine learning scenarios because they lack the required depth (being optimized to save space on hard drives does have disadvantages). In addition, as you become more adept in using machine learning, you find that you ask questions that standard corporate databases can’t answer. With this in mind, the following sections describe some interesting new sources for big data.
Governments, universities, nonprofit organizations, and other entities often maintain publicly available databases that you can use alone or combined with other databases to create big data for machine learning. For example, you can combine several geographic information systems (GIS) to help create the big data required to make decisions such as where to put new stores or factories. The machine learning algorithm can take all sorts of information into account — everything from the amount of taxes you have to pay to the elevation of the land (which can contribute to making your store easier to see).
The best part about using public data is that it’s usually free, even for commercial use (or you pay a nominal fee for it). In addition, many of the organizations that created them maintain these sources in nearly perfect condition because the organization has a mandate, uses the data to attract income, or uses the data internally. When obtaining public source data, you need to consider a number of issues to ensure that you actually get something useful. Here are some of the criteria you should think about when making a decision:
You can obtain data from private organizations such as Amazon and Google, both of which maintain immense databases that contain all sorts of useful information. In this case, you should expect to pay for access to the data, especially when used in a commercial setting. You may not be allowed to download the data to your personal servers, so that restriction may affect how you use the data in a machine learning environment. For example, some algorithms work slower with data that they must access in small pieces.
The biggest advantage of using data from a private source is that you can expect better consistency. The data is likely cleaner than from a public source. In addition, you usually have access to a larger database with a greater variety of data types. Of course, it all depends on where you get the data.
Your existing data may not work well for machine learning scenarios, but that doesn’t keep you from creating a new data source using the old data as a starting point. For example, you might find that you have a customer database that contains all the customer orders, but the data isn’t useful for machine learning because it lacks the tags required to group the data into specific types. One of the new job types that you can expect to see is the data specialist who massages data to make it better suited for machine learning, including the addition of specific information types such as tags.
Your organization has data hidden in all sorts of places. The problem is in recognizing the data as data. For example, you may have sensors on an assembly line that track how products move through the assembly process and ensure that the assembly line remains efficient. Those same sensors can potentially feed information into a machine learning scenario because they could provide inputs on how product movement affects customer satisfaction or the price you pay for postage. The idea is to discover how to create mashups that present existing data as a new kind of data that lets you do more to make your organization work well.
Some of these applications already exist, and you’re completely unaware of them. The video at www.research.microsoft.com/apps/video/default.aspx?id=256288 makes the presence of these kinds of applications more apparent. By the time you complete the video, you begin to understand that many uses of machine learning are already in place and users already take them for granted (or have no idea that the application is even present).
As you progress through Book 8, you discover the need to teach whichever algorithm you’re using (don’t worry about specific algorithms; you see a number of them in Book 9) how to recognize various kinds of data and then to do something interesting with it. This training process ensures that the algorithm reacts correctly to the data it receives after the training is over. Of course, you also need to test the algorithm to determine whether the training is a success. In many cases, Book 8 helps you discover ways to break a data source into training and testing data components in order to achieve the desired result. Then, after training and testing, the algorithm can work with new data in real time to perform the tasks that you verified it can perform.
In some cases, you might not have enough data at the outset for both training (the essential initial test) and testing. When this happens, you might need to create a test setup to generate more data, rely on data generated in real time, or create the test data source artificially. You can also use similar data from existing sources, such as a public or private database. The point is that you need both training and testing data that will produce a known result before you unleash your algorithm into the real world of working with uncertain data.
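One of the options above — creating the test data source artificially — can be sketched briefly. This illustrative example samples two synthetic "classes" from Gaussian clusters; the sample sizes, feature counts, and cluster centers are assumptions made for the sketch (libraries such as scikit-learn also offer ready-made generators for the same purpose).

```python
# A sketch of creating an artificial data source: two synthetic
# classes drawn from Gaussian clusters with different means.
import numpy as np

rng = np.random.default_rng(0)

# 100 examples per class, 4 features each, centered at different means.
class_a = rng.normal(loc=0.0, scale=1.0, size=(100, 4))
class_b = rng.normal(loc=3.0, scale=1.0, size=(100, 4))

X = np.vstack([class_a, class_b])        # combined feature matrix
y = np.array([0] * 100 + [1] * 100)      # known labels for verification

print(X.shape, y.shape)  # (200, 4) (200,)
```

Because you control how the data is generated, the correct answers are known in advance, which is exactly the property you need for verifying an algorithm before unleashing it on uncertain real-world data.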
Some sites online would have you believe that statistics and machine learning are two completely different technologies. For example, when you read Statistics vs. Machine Learning, fight! (http://brenocon.com/blog/2008/12/statistics-vs-machine-learning-fight/), you get the idea that the two technologies are not only different, but downright hostile toward each other. The fact is that statistics and machine learning have a lot in common and that statistics represents one of the five tribes (schools of thought) that make machine learning feasible. The five tribes are
The ultimate goal of machine learning is to combine the technologies and strategies embraced by the five tribes to create a single algorithm (the master algorithm) that can learn anything. Of course, achieving that goal is a long way off. Even so, scientists such as Pedro Domingos (homes.cs.washington.edu/~pedrod/) are currently working toward that goal.
Book 9 follows the Bayesian tribe strategy, for the most part, in that you solve most problems using some form of statistical analysis. You do see strategies embraced by other tribes described, but the main reason you begin with statistics is that the technology is already well established and understood. In fact, many elements of statistics qualify more as engineering (in which theories are implemented) than science (in which theories are created). The next section of the chapter delves deeper into the five tribes by viewing the kinds of algorithms each tribe uses. Understanding the role of algorithms in machine learning is essential to defining how machine learning works.
Everything in machine learning revolves around algorithms. An algorithm is a procedure or formula used to solve a problem. The problem domain affects the kind of algorithm needed, but the basic premise is always the same — to solve some sort of problem, such as driving a car or playing dominoes. In the first case, the problems are complex and many, but the ultimate problem is one of getting a passenger from one place to another without crashing the car. Likewise, the goal of playing dominoes is to win. The following sections discuss algorithms in more detail.
An algorithm is a kind of container. It provides a box for storing a method to solve a particular kind of problem. Algorithms process data through a series of well-defined states. The states need not be deterministic, but the states are defined nonetheless. The goal is to create an output that solves a problem. In some cases, the algorithm receives inputs that help define the output, but the focus is always on the output.
Algorithms must express the transitions between states using a well-defined and formal language that the computer can understand. In processing the data and solving the problem, the algorithm defines, refines, and executes a function. The function is always specific to the kind of problem being addressed by the algorithm.
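The book doesn't supply code at this point, but a classic procedure such as Euclid's greatest-common-divisor algorithm shows these ideas in just a few lines of Python: well-defined state transitions, inputs that help shape the output, and a constant focus on the output itself.

```python
def gcd(a, b):
    """Euclid's algorithm: a procedure with well-defined states.

    Each loop iteration is one state transition; the inputs help
    define the output, but the focus stays on the output.
    """
    while b != 0:           # state transition: (a, b) -> (b, a % b)
        a, b = b, a % b
    return a                # the output that solves the problem

print(gcd(48, 18))          # prints 6
```

The function is specific to the kind of problem it addresses (finding a common divisor), just as the section describes.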
As described in the previous section, each of the five tribes has a different technique and strategy for solving problems that result in unique algorithms. Combining these algorithms should lead eventually to the master algorithm that will be able to solve any given problem. The following sections provide an overview of the five main algorithmic techniques.
The term inverse deduction commonly appears as induction. In symbolic reasoning, deduction expands the realm of human knowledge, while induction raises the level of human knowledge. Induction commonly opens new fields of exploration, while deduction explores those fields. However, the most important consideration is that induction is the science portion of this type of reasoning, while deduction is the engineering. The two strategies work hand in hand to solve problems by first opening a field of potential exploration to solve the problem and then exploring that field to determine whether it does, in fact, solve it.
As an example of this strategy, deduction would say that if a tree is green and that green trees are alive, the tree must be alive. When thinking about induction, you would say that the tree is green and that the tree is also alive; therefore, green trees are alive. Induction provides the answer to what knowledge is missing given a known input and output.
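The green-tree example can be sketched in Python. This is a deliberately tiny illustration with made-up rules and observations, not a real symbolic-reasoning engine: deduction applies a known rule to a known fact, while induction recovers the missing rule from known inputs and outputs.

```python
# Toy contrast between deduction and induction; the rule and the
# observations are hypothetical.

# Deduction: a known rule plus a known fact yields a conclusion.
rules = {"green": "alive"}                 # rule: green trees are alive
observed_color = "green"                   # fact: this tree is green
conclusion = rules[observed_color]         # deduced: the tree is alive

# Induction (inverse deduction): known inputs and outputs yield
# the missing rule that connects them.
observations = [("green", "alive"), ("green", "alive")]
induced_rules = {}
for color, state in observations:
    induced_rules[color] = state           # induced: green -> alive

print(conclusion, induced_rules["green"])  # prints: alive alive
```

Notice that induction runs the arrow backward: instead of applying a rule, it produces one.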
The connectionists are perhaps the most famous of the five tribes. This tribe strives to reproduce the brain’s functions using silicon instead of neurons. Essentially, each of the neurons (created as an algorithm that models the real-world counterpart) solves a small piece of the problem, and the use of many neurons in parallel solves the problem as a whole.
The use of backpropagation, or backward propagation of errors, seeks to determine the conditions under which errors are removed from networks modeled on human neurons by changing the weights (how much a particular input figures into the result) and biases (which features are selected) of the network. The goal is to continue changing the weights and biases until the actual output matches the target output. At this point, the artificial neuron fires and passes its solution along to the next neuron in line. The solution created by just one neuron is only part of the whole solution. Each neuron passes information to the next neuron in line until the group of neurons creates a final output.
The evolutionaries rely on the principles of evolution to solve problems. In other words, this strategy is based on the survival of the fittest (removing any solutions that don’t match the desired output). A fitness function determines the viability of each function in solving a problem.
Using a tree structure, the solution method looks for the best solution based on function output. The winner of each level of evolution gets to build the next-level functions. The idea is that the next level will get closer to solving the problem but may not solve it completely, which means that another level is needed. This particular tribe relies heavily on recursion and languages that strongly support recursion to solve problems. An interesting output of this strategy has been algorithms that evolve: One generation of algorithms actually builds the next generation.
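The fitness-function idea can be sketched very simply. The following toy search evolves a single number toward a made-up target of 42; it skips the tree structures and recursion that this tribe actually favors, but it shows the core loop: score candidates, keep the fittest, and let the winners build the next generation.

```python
import random

# Survival-of-the-fittest toy; the target and population sizes are
# arbitrary choices for illustration.
random.seed(0)
target = 42

def fitness(candidate):
    return -abs(candidate - target)          # closer to target = fitter

population = [random.randint(0, 100) for _ in range(10)]
for generation in range(200):
    population.sort(key=fitness, reverse=True)
    survivors = population[:5]               # unfit solutions removed
    children = [s + random.choice([-1, 1])   # winners build the
                for s in survivors]          # next level, mutated
    population = survivors + children

best = max(population, key=fitness)
print(best)                                  # reaches 42
```

Because the fittest survivors are carried forward unchanged, the best candidate never gets worse from one generation to the next.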
The Bayesians use various statistical methods to solve problems. Given that statistical methods can create more than one apparently correct solution, the choice of a function becomes one of determining which function has the highest probability of succeeding. For example, when using these techniques, you can accept a set of symptoms as input and decide the probability that a particular disease will result from the symptoms as output. Given that multiple diseases share the same symptoms, the probability is important because in some circumstances a lower-probability output is actually the correct output.
Ultimately, this tribe supports the idea of never quite trusting any hypothesis (a result that someone has given you) completely without seeing the evidence used to make it (the input the other person used to make the hypothesis). Analyzing the evidence proves or disproves the hypothesis that it supports. Consequently, it isn’t possible to determine which disease someone has until you test all the symptoms.
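The disease-and-symptom example maps directly onto Bayes' theorem. The numbers below are entirely made up for illustration: even when a symptom strongly suggests a disease, a rare disease can still be the less likely explanation.

```python
# Bayes' theorem with hypothetical figures.
p_disease = 0.01               # prior: 1% of patients have the disease
p_symptom_given_disease = 0.9  # 90% of sick patients show the symptom
p_symptom_given_healthy = 0.1  # 10% of healthy patients show it too

# Total probability of seeing the symptom at all
p_symptom = (p_symptom_given_disease * p_disease
             + p_symptom_given_healthy * (1 - p_disease))

# Bayes' rule: P(disease | symptom)
p_disease_given_symptom = (p_symptom_given_disease * p_disease
                           / p_symptom)

print(round(p_disease_given_symptom, 3))  # prints 0.083
```

Despite the symptom appearing in 90 percent of sick patients, the evidence (the 1 percent prior) drags the final probability down to about 8 percent, which is exactly why this tribe insists on seeing the evidence behind any hypothesis.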
The analogyzers use kernel machines to recognize patterns in data. By recognizing the pattern of one set of inputs and comparing it to the pattern of a known output, you can create a problem solution. The goal is to use similarity to determine the best solution to a problem. It’s the kind of reasoning that determines that using a particular solution worked in a given circumstance at some previous time; therefore using that solution for a similar set of circumstances should also work. One of the most recognizable outputs from this tribe is recommender systems. For example, when you get on Amazon and buy a product, the recommender system comes up with other, related products that you might also want to buy.
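A bare-bones version of similarity-based recommendation can be written in a few lines. The shoppers, products, and ratings below are invented, and real recommender systems are far more sophisticated than this cosine-similarity sketch, but the reasoning is the same: find the most similar known case and reuse its solution.

```python
import math

# Hypothetical user ratings (1-5) for a handful of products.
ratings = {
    "Ann":  {"book": 5, "lamp": 1, "mug": 4},
    "Bob":  {"book": 4, "lamp": 1, "mug": 5, "pen": 4},
    "Cara": {"book": 1, "lamp": 5, "mug": 1, "pen": 2},
}

def cosine(u, v):
    """Cosine similarity between two rating dictionaries."""
    shared = set(u) & set(v)                       # items both rated
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)

# Recommend to Ann: take her most similar user, then pick that
# user's best-rated item that Ann hasn't seen yet.
sims = {name: cosine(ratings["Ann"], r)
        for name, r in ratings.items() if name != "Ann"}
nearest = max(sims, key=sims.get)
unseen = [i for i in ratings[nearest] if i not in ratings["Ann"]]
recommendation = max(unseen, key=lambda i: ratings[nearest][i])
print(nearest, recommendation)   # prints: Bob pen
```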
Many people are somewhat used to the idea that applications start with a function, accept data as input, and then provide a result. For example, a programmer might create a function called Add() that accepts two values as input, such as 1 and 2. The result of Add() is 3. The output of this process is a value. In the past, writing a program meant understanding the function used to manipulate data to create a given result with certain inputs.
Machine learning turns this process around. In this case, you know that you have inputs, such as 1 and 2. You also know that the desired result is 3. However, you don’t know what function to apply to create the desired result. Training provides a learner algorithm with all sorts of examples of the desired inputs and results expected from those inputs. The learner then uses this input to create a function. In other words, training is the process whereby the learner algorithm maps a flexible function to the data. The output is typically the probability of a certain class or a numeric value.
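The reversal can be shown side by side. In the toy sketch below, the "learner" merely searches a tiny, hand-picked hypothesis space for the operation that fits the training examples; real learners fit flexible functions rather than choosing from a list, but the inversion of the traditional process is the same: inputs and outputs are known, and the function is what gets discovered.

```python
import operator

# Traditional programming: the function is known in advance.
def Add(a, b):
    return a + b

# Machine learning (toy version): the function is unknown. The
# learner is given inputs and desired results, and it finds a
# function that maps one to the other. Examples are made up.
training = [((1, 2), 3), ((2, 2), 4), ((3, 5), 8)]
hypotheses = {"add": operator.add, "sub": operator.sub,
              "mul": operator.mul}

def fits(fn):
    return all(fn(a, b) == out for (a, b), out in training)

learned_name, learned_fn = next((name, fn)
                                for name, fn in hypotheses.items()
                                if fits(fn))
print(learned_name, learned_fn(4, 5))   # prints: add 9
```

The learned function then generalizes to inputs it never saw during training, such as (4, 5), which is the subject of the next section.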
The secret to machine learning is generalization. The goal is to generalize the output function so that it works on data beyond the training set. For example, consider a spam filter. Your dictionary contains 100,000 words (actually a small dictionary). A limited training data set of 4,000 or 5,000 word combinations must create a generalized function that can then find spam in the 2^100,000 combinations that the function will see when working with actual data.
When viewed from this perspective, training might seem impossible and learning even worse. However, to create this generalized function, the learner algorithm relies on just three components:

Representation: The form the learner uses to express candidate functions, such as sets of rules, decision trees, or networks of neurons.

Evaluation: The means of scoring candidate functions so that the learner can tell good candidates from bad ones.

Optimization: The search process that produces ever-better candidates until the best available function emerges.
Much of Book 8 and Book 9 focuses on representation. For example, in Book 9, Chapter 2 you discover how to work with the k-Nearest Neighbor (KNN) algorithm. However, the training process is more involved than simply choosing a representation. All three steps come into play when performing the training process. Fortunately, you can start by focusing on representation and allow the various libraries discussed in Book 9 to do the rest of the work for you.
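As a preview of the representation idea, here is a minimal k-Nearest Neighbor classifier written from scratch with made-up two-dimensional points. Book 9 relies on library implementations rather than hand-rolled code like this; the sketch only shows what the representation looks like: the training examples themselves, plus a distance measure and a vote.

```python
from collections import Counter

# Hypothetical labeled points: two clusters in the plane.
training = [((1.0, 1.0), "red"), ((1.2, 0.8), "red"),
            ((4.0, 4.0), "blue"), ((4.2, 3.9), "blue"),
            ((3.8, 4.1), "blue")]

def predict(point, k=3):
    """Classify a point by majority vote among its k nearest neighbors."""
    def sq_dist(p):
        return (p[0] - point[0]) ** 2 + (p[1] - point[1]) ** 2
    nearest = sorted(training, key=lambda t: sq_dist(t[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(predict((4.1, 4.0)))   # prints: blue
```

In KNN, the representation is simply the stored training data, the evaluation is the distance measure, and the optimization is the search for the nearest neighbors, so all three training components appear even in this tiny example.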