Chapter 6. Understanding Big Data Analysis with Machine Learning

In this chapter, we are going to learn about different machine learning techniques that can be used with R and Hadoop to perform Big Data analytics. We will cover the following topics:

  • Introduction to machine learning
  • Types of machine-learning algorithms
  • Supervised machine-learning algorithms
  • Unsupervised machine-learning algorithms
  • Recommendation algorithms

Introduction to machine learning

Machine learning is a branch of artificial intelligence that lets applications learn from data instead of being explicitly programmed for every case. Machine learning concepts are used to enable applications to make decisions based on the available datasets. A combination of machine learning and data mining can be used to develop spam mail detectors, self-driving cars, speech recognition, face recognition, and detection of fraudulent activity in online transactions.

Many popular organizations use machine-learning algorithms so that their services or products can understand the needs of their users and respond according to their behavior. Google uses them in its web search engine, for spam classification in Google Mail, and for news labeling in Google News; Amazon uses them in its recommender systems. Many open source tools are available for developing these types of applications, such as R, Python, Apache Mahout, and Weka.

Types of machine-learning algorithms

There are three different types of machine-learning algorithms for intelligent system development:

  • Supervised machine-learning algorithms
  • Unsupervised machine-learning algorithms
  • Recommender systems

In this chapter, we are going to discuss well-known business problems with classification, regression, and clustering, as well as how to perform these machine-learning techniques over Hadoop to overcome memory issues.
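To make the difference between supervised and unsupervised learning concrete, here is a minimal sketch in plain Python. It is only an illustration under simple assumptions: the toy data, the function names predict and two_means, and the choice of a 1-nearest-neighbour classifier and a one-dimensional two-cluster k-means are all hypothetical and are not taken from this chapter.

```python
# Supervised learning: the training data carries labels, and we predict
# the label of a new value. Here: a 1-nearest-neighbour classifier.
labelled = [(1.0, "low"), (1.5, "low"), (8.0, "high"), (9.0, "high")]

def predict(x, training):
    """Return the label of the training value closest to x."""
    nearest = min(training, key=lambda pair: abs(pair[0] - x))
    return nearest[1]

# Unsupervised learning: no labels are given; we only group similar values.
# Here: k-means with k = 2 on one-dimensional data.
# Assumption: values contains at least two distinct numbers.
def two_means(values, iterations=10):
    """Split values into two groups around two iteratively updated centres."""
    c1, c2 = min(values), max(values)
    for _ in range(iterations):
        # Assign each value to the nearer centre.
        g1 = [v for v in values if abs(v - c1) <= abs(v - c2)]
        g2 = [v for v in values if abs(v - c1) > abs(v - c2)]
        # Move each centre to the mean of its group.
        c1 = sum(g1) / len(g1)
        c2 = sum(g2) / len(g2)
    return sorted(g1), sorted(g2)

label_small = predict(2.0, labelled)
label_large = predict(7.0, labelled)
groups = two_means([1.0, 1.5, 8.0, 9.0])
print(label_small, label_large, groups)
```

The classifier needs the "low" and "high" labels to learn from, whereas the clustering function discovers the two groups from the values alone.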

If you load a dataset that does not fit into your machine's memory and try to run a predictive analysis on it, R will throw a memory-related error, such as Error: cannot allocate vector of size 990.1 MB. The solution is either to upgrade the machine's configuration (for example, by adding memory) or to parallelize the work across commodity hardware.
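The idea behind parallelizing can be sketched without Hadoop. The following Python example is a simplified illustration, not the approach used later in this chapter: it fits a simple linear regression by summarizing each chunk of data separately (the map step) and then adding the summaries together (the reduce step), so the full dataset never has to sit in memory at once. The function names map_chunk, combine, fit_line, and chunks, as well as the toy data, are hypothetical.

```python
from functools import reduce

def map_chunk(chunk):
    """Map step: reduce one chunk of (x, y) pairs to five partial sums."""
    n = len(chunk)
    sx = sum(x for x, _ in chunk)
    sy = sum(y for _, y in chunk)
    sxx = sum(x * x for x, _ in chunk)
    sxy = sum(x * y for x, y in chunk)
    return (n, sx, sy, sxx, sxy)

def combine(a, b):
    """Reduce step: add two tuples of partial sums element by element."""
    return tuple(p + q for p, q in zip(a, b))

def fit_line(stats):
    """Least-squares intercept and slope computed from the combined sums."""
    n, sx, sy, sxx, sxy = stats
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope

def chunks(rows, size):
    """Yield consecutive slices of rows, each with at most size elements."""
    for i in range(0, len(rows), size):
        yield rows[i:i + size]

# Toy data generated from y = 2x + 1; in practice each chunk would be
# read from disk or processed on a different machine.
data = [(x, 2 * x + 1) for x in range(10)]
partials = [map_chunk(c) for c in chunks(data, 3)]
total = reduce(combine, partials)
intercept, slope = fit_line(total)
print(intercept, slope)  # prints 1.0 2.0
```

Because each chunk is summarized independently, the map step can run on separate machines, and only the small tuples of sums need to be brought together.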
