11
Opportunities and Challenges in Machine Learning With IoT

Sarvesh Tanwar, Jatin Garg*, Medini Gupta and Ajay Rana

Amity Institute of Information Technology, Amity University Uttar Pradesh, Noida, India

Abstract

Machine learning (ML) is swiftly being adopted across a wide range of applications. It has risen to popularity in recent years, owing in part to the emergence of big data, and with big data, ML techniques are more promising than ever before. Big data helps ML algorithms discover finer-grained trends and make more precise and reliable predictions than previously possible; however, it also raises significant obstacles for ML, such as model scalability and distributed computation. In this chapter, we discuss coupling ML with IoT, its applications, and its challenges. We present a framework, MLBiD (Machine Learning Based on Big Data), and use it to examine ML's potential opportunities and outstanding challenges. The architecture is centered on ML, which is split into three phases: preprocessing, learning, and assessment. The framework also includes four other components: big data, user, domain, and system. The phases of ML, together with the components of MLBiD, point the way to identifying related opportunities and challenges, and open the door to research in a variety of previously unexplored or underexplored areas.

Keywords: Machine learning, IoT, supervised learning, unsupervised learning, big data, data processing, MLBiD

11.1 Introduction

Machine learning (ML) techniques have had tremendous societal impact in areas such as computer vision, speech recognition, natural language understanding, IoT, neuroscience, and health. The emergence of the era of big data has sparked a surge of interest in ML [1]. Big data holds unprecedented promise, and raises unprecedented questions, for ML algorithms seeking new insights into a variety of business applications and human behaviors. On the one hand, big data provides ML algorithms with unparalleled amounts of data from which to derive underlying patterns and build predictive models; on the other hand, conventional ML algorithms face critical challenges, such as scalability, in fully unlocking the value of big data. An ML problem is the problem of improving performance at certain tasks, as measured by given performance metrics, through experience. Users apply ML algorithms to deduce the structure underlying large given datasets and to make predictions from them. ML flourishes on strong computational environments, efficient learning techniques (algorithms), and rich and/or large data. As a consequence, ML has great potential and is an essential part of big data analytics [2]. The emphasis of this chapter is on ML techniques in relation to big data and present-day computing environments, and we discuss the benefits and drawbacks of ML on big data. The introduction of big data has created new possibilities for ML. Big data, for instance, enables pattern learning at various levels of granularity and diversity, from multiple perspectives, in an inherently parallel manner; it also allows causality inference based on sequence chains. On the other hand, big data poses significant challenges to ML, including data streaming, model scalability, high data dimensionality, distributed computing, adaptability, and usability [3]. We use MLBiD to examine ML's potential opportunities and outstanding challenges. The architecture is centered on ML, which is divided into three phases: a) preprocessing, b) learning, and c) evaluation. In addition, the framework includes four other components: big data, user, domain, and system, all of which influence and are influenced by ML. The different elements of MLBiD, together with the phases of ML, point the way to recognizing potential opportunities and open challenges, as well as potential study in a variety of unexplored or underexplored research areas.

11.2 Literature Review

11.2.1 A Proposed Architecture for ML on Big Data

Figure 11.1 depicts the paradigm for MLBiD, which stands for ML on big data. The ML component is at the center of MLBiD, and it interacts with four other components: big data, user, domain, and system. Interactions take place in both directions. Big data serves as input to ML, which in turn produces output that becomes part of big data. Users interact with ML by providing usability feedback, domain knowledge, and personal preferences, and improve their decision-making by leveraging the learning outcomes. The domain serves both as a source of knowledge for ML and as the setting in which learned models are executed. Finally, the system architecture affects how learning algorithms can operate and how efficiently they run, while meeting the needs of ML can, in turn, drive a redesign of the system architecture.


Figure 11.1 The paradigm for ML on Big data (MLBiD).

11.2.2 Machine Learning

Data preprocessing, learning, and evaluation are the common phases of ML (see Figure 11.1). Data preprocessing assists in transforming raw data into the "correct shape" for the subsequent learning steps. Raw data may be unstructured, noisy, incomplete, and inconsistent. Through data cleaning, extraction, transformation, and fusion, the preprocessing phase converts such data into a form that can serve as input to learning. Using the preprocessed input data, the learning step selects suitable learning techniques and tunes model parameters to produce the desired outputs. Some learning methods, especially representation learning, can themselves perform part of the data preprocessing. Trained models are then evaluated to see how well they perform. Selecting appropriate data sets, estimating error, running statistical tests, and measuring performance are all part of evaluating a classifier [4]. The results of the evaluation lead to changes in the parameters of the chosen learning algorithms and/or the selection of new, more effective algorithms.

ML can be categorized into three main paradigms based on the nature of the input available to the learning system: supervised, unsupervised, and reinforcement learning. In supervised learning, the system is given samples of inputs and their corresponding outputs, and the aim is to learn a function that maps inputs to outputs. The aim of unsupervised learning is to discover patterns in the input without any specific response or expected output being provided. A reinforcement learning system, like an unsupervised one, does not receive input-output pairs; like supervised learning, however, it does receive feedback on past actions [5]. In comparison to supervised learning, reinforcement learning provides feedback in the form of rewards or punishments associated with actions, rather than desired outputs or explicit corrections of sub-optimal actions. A hybrid of supervised and unsupervised learning, known as semi-supervised learning, gives the system a limited number of input-output pairs together with a large number of unannotated inputs. Semi-supervised learning has the same purpose as supervised learning, but it learns from both unannotated and annotated data.

ML can also be classified into two types, task learning and representation learning, depending on whether the objective is to learn particular tasks from input samples or to learn the features themselves. The aim of representation learning is to extract the useful knowledge needed to learn new data representations that ease the construction of classifiers or other predictors. A good representation disentangles the underlying sources of variation; in the case of probabilistic models, it is frequently one that captures the posterior distribution of underlying explanatory factors for the observed input [6].

11.2.3 Types of Machine Learning

11.2.3.1 Supervised Learning

A branch of ML algorithms known as supervised learning infers a function from labeled training data. The training data are composed of training examples, each of which is a pair (x, y), with x representing an input vector and y the output value. The algorithm generates a function that can be used to map unseen inputs in the future. Regression algorithms (continuous output) and classification algorithms (discrete output) are the two major types of supervised learning algorithms. There are multiple algorithms in each group, which are discussed further below [7].

11.2.3.1.1 Regression

Regression algorithms try to find the best-fit function for the available training data. Linear regression and polynomial regression are the two major algorithms discussed below; a minimal sketch of both fits follows the list.

  • Linear Regression: The linear regression algorithm is the most commonly used regression algorithm in ML. For the available training data, it tries to find the best-fit line/hyperplane.

The algorithm's aim is to find the optimal coefficient vector $\theta_{opt} = [\theta_0, \theta_1, \ldots, \theta_N]$ so that the predictive function $h_\theta(x) = \theta_0 + \theta_1 x_1 + \cdots + \theta_N x_N$ has a linear form.

  • Polynomial Regression: The polynomial regression algorithm is another commonly used regression algorithm. Its main motive is to find the best-fit polynomial for the available training data. The aim of the algorithm, like its linear counterpart, is to compute the coefficient vector $\theta_{opt} = [\theta_0, \theta_1, \ldots, \theta_{kN}]$ so that the predictive function is a polynomial of order k.
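As a minimal illustration of both fits, the sketch below solves each least-squares problem with NumPy; the synthetic data, the polynomial order k = 3, and all variable names are our own assumptions, not part of the original text.

```python
# Sketch: linear vs. polynomial regression via least squares.
# The synthetic data and the order k are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.1, size=x.shape)  # noisy targets

# Linear regression: design matrix [1, x], solve for theta = [theta_0, theta_1].
X_lin = np.column_stack([np.ones_like(x), x])
theta_lin, *_ = np.linalg.lstsq(X_lin, y, rcond=None)

# Polynomial regression of order k: design matrix [1, x, x^2, ..., x^k].
k = 3
X_poly = np.column_stack([x**p for p in range(k + 1)])
theta_poly, *_ = np.linalg.lstsq(X_poly, y, rcond=None)

print("linear coefficients:", theta_lin)
print("polynomial coefficients:", theta_poly)
```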
11.2.3.1.2 Classification

Unlike regression algorithms, which attempt to find the best-fit function for the training data, classification algorithms attempt to assign each input to the most applicable class. In such cases, the predictive function's output is discrete, and its possible values belong to one of the various classes present in the training data set. The following sections cover four essential classification algorithms: logistic regression, artificial neural networks, support vector machines, and decision trees; a combined sketch follows the list.

  • Logistic Regression: Logistic regression is a common classification algorithm in the literature. Despite its name, it is applied to classification rather than regression (i.e., its output is discrete). It is usually used as a binary classifier, with the output belonging to one of only two classes.
    The hypothesis function, also known as the predictive function $h_{\theta_{opt}}(x)$, computes the likelihood that the output equals 1 given a specific input; that is, $h_{\theta_{opt}}(x) = P(y = 1 \mid x; \theta)$. The output is set to 1 if this likelihood is greater than 0.5; otherwise, the output is set to 0.
  • Support Vector Machines (SVM): SVM is another supervised classification algorithm. It tries to compute the hyperplane that separates the labeled data with the greatest margin from the closest points. It can be viewed as a more efficient and more restrictive variant of the logistic regression classifier: the sigmoid function used in logistic regression is replaced by the hinge loss function. The hypothesis function provided by the SVM algorithm is a discriminant function that returns either 1 or 0; note that, unlike in logistic regression, it is not interpreted as the probability of the output being 1 or 0.
  • Artificial Neural Networks (ANNs): The ANN is a common supervised classification algorithm. It is frequently used when there is a large amount of labeled training data with many features and a nonlinear hypothesis function is required. The ANN attempts to imitate the way our brain operates, motivated by the hypothesis that the brain uses a single "learning algorithm" for all of its functions. In the ANN model, features are fed into the network much as neurons take electrical inputs through dendrites and channel them toward an output along the axon. One "hidden" layer is often used as an intermediate layer; this layer helps extract higher-level knowledge from the collection of features available in the training data. The sigmoid function, the same one used in logistic regression, is applied at each layer of the network.
  • Decision Trees: Decision trees are another common supervised classification algorithm. Because statistical measures are used to choose the branching nodes, these algorithms are also referred to as statistical classifiers. Decision trees classify instances by sorting them down the tree from the root node to a specific leaf node. Each node specifies a test of a particular feature, while each branch represents a possible value of that feature. Quinlan proposed the C4.5 algorithm, which is centered on the notion of information entropy. In effect, C4.5 selects the feature that splits its set of samples into smaller subsets that are richer in one of the two classes. The split criterion is the normalized information gain (difference in entropy); the feature with the highest information gain is selected.
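The sketch below trains these four classifiers on a standard toy dataset with scikit-learn; the dataset, the scaling step, and all hyperparameters are illustrative assumptions rather than choices made in the text.

```python
# Sketch: the four classifiers discussed above on a toy binary dataset.
# Dataset choice, scaling, and hyperparameters are illustrative assumptions.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)        # binary labels, 30 features
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(kernel="linear"),                  # hinge loss, max margin
    "ANN (one hidden layer)": MLPClassifier(hidden_layer_sizes=(20,),
                                            max_iter=2000, random_state=0),
    "decision tree": DecisionTreeClassifier(criterion="entropy"),
}
for name, clf in models.items():
    pipe = make_pipeline(StandardScaler(), clf)   # scale features, then fit
    pipe.fit(X_tr, y_tr)
    print(f"{name}: test accuracy = {pipe.score(X_te, y_te):.3f}")
```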
11.2.3.1.3 Deep Learning

Deep learning is commonly introduced as a subset of supervised ML algorithms, and in essence it can be thought of as a large-scale neural network. However, deep learning cannot be classified as a conventional supervised neural network, because it also performs automatic unsupervised feature extraction, also referred to as feature learning. In general, deep learning uses a graph with multiple processing layers to model abstractions contained in the data. Units in these processing layers apply linear and non-linear transformations to the data in order to extract as much useful information as possible. Deep learning algorithms are closely related to ANNs; in fact, an ANN can be regarded as a shallow instance of a deep neural network. Deep learning algorithms, however, are more adaptable, since they can be applied to both labeled and unlabeled data, and they can be scaled up to very large networks. According to Andrew Ng, co-founder of Coursera and Chief Scientist at Baidu Research, deep learning is simply the application of ANNs at a large scale, so that they can be fed more data and perform better [8].
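To make the idea of stacked processing layers concrete, here is a minimal NumPy forward pass through two hidden layers; the layer sizes, random weights, and sigmoid activation are illustrative assumptions, and no training step is shown.

```python
# Sketch: stacked linear + non-linear transformations, the core of a deep net.
# Layer sizes and random weights are illustrative; there is no training loop.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=4)               # one input example with 4 features
sizes = [4, 8, 8, 1]                 # input -> two hidden layers -> output

a = x
for n_in, n_out in zip(sizes[:-1], sizes[1:]):
    W = rng.normal(size=(n_out, n_in))   # weights of this processing layer
    b = np.zeros(n_out)
    a = sigmoid(W @ a + b)               # linear transform + non-linearity
print("network output:", a)
```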

11.2.3.2 Unsupervised Learning

Unsupervised learning, in contrast to supervised learning, is a branch of ML that infers a function or pattern from unlabeled training data. The training data consist only of the inputs $x_1, x_2, \ldots, x_M$, with no known outputs. As a result, unsupervised learning algorithms seek to make sense of the training data by identifying patterns and relationships. Clustering, dimensionality reduction, and anomaly detection are the prime categories of unsupervised learning algorithms. Within each category there are many algorithms; the following sections outline some of the most common ones.

11.2.3.2.1 Clustering

Grouping, or clustering, a set of data points is one of the simplest ways to make it informative. Grouping the data into a finite number of clusters, rather than leaving it as a large number of scattered points, gives it structure and makes it clearer and more understandable. This is particularly significant in applications such as market segmentation and social network analysis. We use the word "clustering" instead of "classification" because the data points are not labeled as belonging to specific groups, and we do not know whether a given grouping is right or not.

  • K-Means Algorithm: K-means is the most commonly applied unsupervised clustering algorithm for automatically grouping data into coherent clusters. The algorithm attempts to partition the data into K clusters by locating each cluster centroid (also known as the cluster mean) and grouping with it the data points located nearest to it. A cost function, $J = \frac{1}{M}\sum_{j=1}^{M} \lVert x^{(j)} - \mu_{c^{(j)}} \rVert^2$, must be minimized to obtain a correct assignment of data points to the K clusters. This cost function depends on three quantities: $c^{(j)}$, the index of the cluster to which example $x^{(j)}$ is currently assigned; $\mu_k$, the centroid of cluster $k$; and $\mu_{c^{(j)}}$, the centroid of the cluster to which example $x^{(j)}$ is currently assigned. A minimal sketch of the algorithm follows.
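Below is a minimal NumPy version of the alternating assignment/update loop; the synthetic data, K = 3, and the fixed iteration count are assumptions made for illustration.

```python
# Sketch: K-means as alternating assignment and centroid updates, minimizing
# J = (1/M) * sum_j ||x^(j) - mu_{c^(j)}||^2.
# Synthetic data, K, and the iteration count are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 2)) for c in (0, 2, 4)])
K = 3
mu = X[rng.choice(len(X), size=K, replace=False)]   # initial centroids

for _ in range(20):
    # Assignment step: c^(j) = index of the nearest centroid.
    dists = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2)
    c = dists.argmin(axis=1)
    # Update step: mu_k = mean of the points currently assigned to cluster k.
    mu = np.array([X[c == k].mean(axis=0) for k in range(K)])

J = np.mean(np.linalg.norm(X - mu[c], axis=1) ** 2)  # the cost function
print("final cost J =", J)
```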
11.2.3.2.2 Dimensionality Reduction

Another important subject in ML is dimensionality reduction. The following is a summary of the motivation for dimensionality reduction:

  1. Eliminate redundant data;
  2. Lower storage and computational requirements;
  3. Simplify data visualization by focusing on a few main features.
11.2.3.2.3 Principal Component Analysis

Principal component analysis (PCA) is one of the most commonly used dimensionality reduction algorithms in unsupervised learning. Its goal is to identify the subset of features, or directions, that most accurately represents the data. For example, given two features x1 and x2, PCA will select a single line that effectively explains both features at the same time. The aim of PCA is to minimize the average projection error (the orthogonal distance from each point to the projection line), whereas the goal of linear regression is to minimize the average error (vertical distance) to the line. Data preprocessing, covariance matrix computation, eigenvalue decomposition, and projection onto the selected components are the four stages in performing PCA [9].
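The four stages map directly onto a few lines of NumPy, as sketched below; the correlated two-feature data and the choice of keeping one component are illustrative assumptions.

```python
# Sketch of the PCA stages: preprocessing (centering), covariance computation,
# eigendecomposition, and projection. Data and component count are assumptions.
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = 0.8 * x1 + rng.normal(scale=0.2, size=200)   # correlated second feature
X = np.column_stack([x1, x2])

X_centered = X - X.mean(axis=0)                   # 1) preprocessing
cov = np.cov(X_centered, rowvar=False)            # 2) covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)            # 3) eigendecomposition
top = eigvecs[:, np.argsort(eigvals)[::-1][:1]]   # principal direction
Z = X_centered @ top                              # 4) projection to 1-D
print("variance explained by the first component:", eigvals.max() / eigvals.sum())
```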

11.2.3.2.4 Anomaly Detection

The anomaly detection algorithm is another significant unsupervised learning algorithm. As its name suggests, it attempts to decide whether a given new example $x^{(new)}$ is anomalous or not. To do so, the probability of an example not being anomalous is computed using a probability function (model) $p(x)$. A threshold value, denoted by $\varepsilon$, is the dividing value between labeling the example as normal or anomalous. To make $p(x)$ tractable, the given features are assumed to be independent, so the probability function $p(x)$ becomes the product of the per-feature probabilities $p(x_i)$ for all $i$. A further assumption is that the features are normally distributed, i.e., each $p(x_i)$ follows a Gaussian distribution. As a result, comparing $p(x^{(new)})$ with $\varepsilon$ determines whether $x^{(new)}$ is anomalous: if $p(x^{(new)}) < \varepsilon$, the example is anomalous; otherwise, it is normal.
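A minimal sketch of this density-based detector follows; the training data, the two test points, and the value of $\varepsilon$ are illustrative assumptions (in practice $\varepsilon$ would be tuned on a labeled validation set).

```python
# Sketch: anomaly detection with independent Gaussian features,
# p(x) = prod_i p(x_i). The data and the threshold epsilon are assumptions.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
X_train = rng.normal(loc=[0.0, 5.0], scale=[1.0, 0.5], size=(1000, 2))

mu = X_train.mean(axis=0)        # per-feature Gaussian parameters
sigma = X_train.std(axis=0)
epsilon = 1e-4                   # threshold, assumed here; normally tuned

def p(x):
    """Probability of an example under the independent-Gaussian model."""
    return np.prod(norm.pdf(x, loc=mu, scale=sigma))

for x in (np.array([0.1, 5.2]), np.array([4.0, 2.0])):
    print(x, "anomalous" if p(x) < epsilon else "normal")
```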

11.3 Why Should We Care About Learning Representations?

In the ML world, representation learning has developed into an area of its own, with frequent workshops at leading conferences such as NIPS and ICML, and even a new conference devoted to it, ICLR; the field is often referred to under the banner of deep learning or feature learning. Depth is an important aspect of the work, but there are several other priors that are interesting and can be handled naturally when the problem is formulated as learning a representation. An impressive series of research advances in academia and industry has followed and nourished the dramatic growth in research activity on representation learning.

ML can also be divided into batch and online learning depending on when the training data become available (e.g., all at once or one example at a time). Batch learning builds models by training on the entire training dataset at once, while online learning updates models as new data arrive. A batch learning algorithm assumes that the data are independent and identically distributed, i.e., drawn from the same probability distribution, an assumption rarely met by real data. In online learning, by contrast, usually no statistical assumptions are made about the data. Although a batch learning algorithm is expected to generalize, no such requirement exists for online learning, because the algorithm is only expected to correctly predict the labels of the examples it receives as input. Online learning is used where it is computationally infeasible to train over the whole dataset and/or where data are generated over a long period and the learning system must adapt to new trends in the data [10].

Batch learning (also known as statistical learning) and online learning are two distinct frameworks for supervised machine learning. In both frameworks, a learning problem is described by an instance space X and a label set Y, and the task is to assign labels from Y to instances in X. In batch learning, we assume that a probability distribution exists over the product space $X \times Y$ and that we have access to a training set drawn i.i.d. from this distribution. A batch learning algorithm uses the training set to produce an output hypothesis, which is a function that maps instances in X to labels in Y. We expect a batch learning algorithm's output hypothesis to correctly predict the labels of previously unseen examples sampled from the distribution [10].

In the online learning setting, we usually make no statistical assumptions about the origin of the data. An online learning algorithm receives a sequence of examples and processes each one separately. In each online-learning round, the algorithm receives an instance and predicts its label using an internal hypothesis that it stores in memory. The algorithm then receives the correct label for the instance and uses the new instance-label pair to update and improve its internal hypothesis. Since the algorithm is only expected to correctly predict the labels of the examples it receives as input, there is no notion of statistical generalization.
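The round structure just described (predict, receive the true label, update) can be sketched with scikit-learn's incremental SGDClassifier, as below; the simulated stream and the linear ground-truth rule are our own assumptions.

```python
# Sketch of the online protocol: predict, receive the true label, update.
# The simulated stream and the labeling rule are illustrative assumptions.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
model = SGDClassifier()                    # linear hypothesis, updated per example
classes = np.array([0, 1])
mistakes = 0

for t in range(1000):                      # one example arrives per round
    x = rng.normal(size=(1, 2))
    y = np.array([int(x[0, 0] + x[0, 1] > 0)])    # correct label, revealed after
    if t > 0:
        mistakes += int(model.predict(x)[0] != y[0])  # predict with current hypothesis
    model.partial_fit(x, y, classes=classes)          # update on the new pair
print(f"online mistakes: {mistakes} / 999")
```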

11.4 Big Data

Big data is a trending topic in the technology world. Everyone is talking about big data, and it is expected to have a huge impact on technology, enterprise, industry, government, and culture. In terms of technology, the processing, exploitation, preservation, and transportation of big data are all part of the overall operation. Without question, the stages of data acquisition, storage, and transportation are important preludes to the ultimate objective, analytics-based manipulation of the data, which is at the heart of big data analysis [11].

From the perspective of data mining, the four V's (Volume, Velocity, Veracity, and Variety) have come to describe "big data". For a problem to be classified as a big data problem, it is presumed that at least one of them must be present. Volume denotes the size of the data, which may be too massive for current algorithms and/or systems to accommodate. Velocity refers to data streaming in at a rate higher than conventional algorithms and systems can handle; sensors, for example, are constantly reading and transmitting data, and we are entering the realm of the quantified self, which yields data that was previously unavailable. Veracity indicates that, despite the abundance of data, data accuracy remains a major concern; we cannot assume that bigger data means better quality. In reality, as data grows in scale, consistency problems emerge, which must be solved either during the data preprocessing stage or by the learning algorithm. Variety is the most appealing of the V's from a data mining perspective, since it involves data of various types and modalities describing a single item. No single V is a brand-new concept; researchers in the associated fields of data mining and ML have worked on these problems for decades. However, the growth of Internet-based businesses has challenged many conventional process-oriented businesses, forcing them to turn into knowledge-based businesses that are powered by data rather than processes [11].

11.5 Data Processing Opportunities and Challenges

Implementing the data transformations and preprocessing pipelines that turn raw data into representations suitable for successful ML accounts for a significant portion of the overall effort in deploying an ML framework. Data preprocessing aims to solve problems including data duplication, inconsistency, noise, heterogeneity, transformation, labeling [for (semi-)supervised ML], feature representation/selection, and data imbalance. Because of the demand for human labor and the wide range of choices to select from, data preparation and preprocessing are normally expensive [6]. Furthermore, some traditional assumptions about data do not actually hold for big data, making some preprocessing approaches ineffective. On the other hand, big data offers the opportunity to reduce this dependence on manual effort by learning directly from large, complex, and streaming data sources.

11.5.1 Data Redundancy

Duplication occurs when multiple data samples represent the same object. Data replication or inconsistency can have a significant impact on ML. Traditional methods such as pairwise similarity comparison are no longer feasible for big data, despite the variety of duplicate-detection methodologies produced over the previous 20 years. Furthermore, the conventional presumption that duplicated pairs are far rarer than non-redundant pairs is no longer true. For time series in particular, Dynamic Time Warping has proven much better and faster than existing Euclidean distance algorithms in this regard [12].
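As a point of reference for the claim above, here is the classic dynamic-programming DTW distance in a few lines of NumPy; the two example series (different lengths, shifted phase) are illustrative assumptions.

```python
# Sketch: O(n*m) dynamic-programming DTW distance between two 1-D series.
# The example series are illustrative assumptions.
import numpy as np

def dtw_distance(a, b):
    """Dynamic Time Warping distance between 1-D sequences a and b."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

s1 = np.sin(np.linspace(0, 2 * np.pi, 60))
s2 = np.sin(np.linspace(0, 2 * np.pi, 80) - 0.5)  # shifted, different length
print("DTW distance:", dtw_distance(s1, s2))      # Euclidean could not compare these
```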

11.5.2 Data Noise

Data sparsity, missing and incorrect values, and irregularity can all introduce errors into ML. When dealing with big data, conventional solutions to noisy-data problems run into roadblocks. Manual procedures, for example, are no longer viable because they do not scale, and replacing missing values with a mean sacrifices the richness and fine granularity of big data. In certain cases, noisy data may contain interesting patterns, so removing them is not always the best option. Missing values can instead be estimated using reliable predictive analytics over big data, which can also be used to replace inaccurate readings caused by malfunctioning sensors or broken communication channels. To counter the significant bias that collective inference techniques can introduce into predictions, a maximum entropy constraint can be placed on the inference step, requiring the predictions to follow the same distribution as the observed labels [13]. Although data sparsity can persist and may even be exacerbated by big data, the sheer volume of big data provides significant opportunities for predictive analysis, since adequate frequency can be accrued for various sub-samples. Outlier detection has also been scaled up (e.g., ONION) to allow analysts to efficiently explore anomalies in large datasets [14].
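One concrete form of "predictive analytics for missing values" is model-based imputation; the sketch below uses scikit-learn's IterativeImputer, which regresses each feature on the others. The synthetic sensor data and the 10% missingness mask are our own assumptions.

```python
# Sketch: model-based imputation of missing readings, as a stand-in for the
# predictive-analytics idea above. Data and masking are assumptions.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
X = np.column_stack([x1, 2 * x1 + rng.normal(scale=0.1, size=500)])

X_missing = X.copy()
mask = rng.random(X.shape) < 0.1            # simulate 10% dropped readings
X_missing[mask] = np.nan

imputer = IterativeImputer(random_state=0)  # regress each feature on the others
X_filled = imputer.fit_transform(X_missing)
print("mean absolute imputation error:", np.abs(X_filled[mask] - X[mask]).mean())
```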

11.5.3 Heterogeneity of Data

Big data promises to incorporate multi-view data from a variety of repositories, in a variety of formats, and from a variety of population samples, and is thus highly heterogeneous. The value of these heterogeneous, multi-modal data (e.g., unstructured text, audio, and video formats) for a learning task can differ. As a result, combining all of the features and treating them as equally relevant is unlikely to produce optimal learning outcomes [15]. Big data makes it possible to learn from different views in parallel and then assemble the results by assessing the relevance of each feature view to the task. Such an approach should be robust to data outliers and able to address convergence and optimization issues.

11.5.4 Discretization of Data

Decision trees and Naive Bayes are examples of ML algorithms that can only deal with discrete attributes. Discretization transforms quantitative data into qualitative data by dividing a continuous domain into non-overlapping intervals. The aim of attribute discretization is to find a simple representation of the data, in the form of categories, that is adequate for the learning task while preserving as much knowledge as possible from the original continuous attribute. However, some well-known discretization methods fail when dealing with massive amounts of data. Traditional discretization approaches have therefore been parallelized on big data platforms to handle big data problems, with a distributed variant of the entropy minimization discretizer, based on the Minimum Description Length Principle, improving both efficiency and accuracy [16].
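The sketch below shows the basic operation, turning a continuous attribute into non-overlapping intervals, using simple equal-frequency binning in pandas; this is a plain stand-in for illustration, not the distributed MDLP discretizer cited above, and the data and bin labels are assumptions.

```python
# Sketch: equal-frequency discretization of a continuous attribute.
# A simple stand-in for the MDLP discretizer; data and labels are assumptions.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
age = pd.Series(rng.normal(loc=40, scale=12, size=1000).clip(18, 90))

# Split the continuous domain into 4 non-overlapping, equally populated bins.
bins = pd.qcut(age, q=4, labels=["young", "adult", "middle-aged", "senior"])
print(bins.value_counts())
print(pd.qcut(age, q=4).cat.categories)   # the learned cut points
```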

11.5.5 Data Labeling

Traditional data annotation techniques require a lot of time and effort. To deal with big data and its related problems, researchers have introduced several approaches. For example, online crowd-sourced repositories can be a good source of free annotated training data with a wide range of class sizes and intra-class diversity. Probabilistic program induction can also be used to achieve human-level concept learning. Furthermore, ML techniques such as semi-supervised learning, active learning, and transfer learning can themselves be used to label data. The number of questions posed to the crowd can be reduced by using active learning as the optimization technique for labeling tasks in crowdsourced databases, enabling crowd-sourced applications to scale. Another problem is that a dataset cannot cover all user-specific contexts, so the resulting output is often inferior to that of user-centric training.
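As one example of letting an ML algorithm do the labeling, the sketch below propagates 30 hand-given labels to 270 unlabeled points with scikit-learn's LabelPropagation (unlabeled points are marked -1); the dataset and the 10% labeling rate are illustrative assumptions.

```python
# Sketch: semi-supervised labeling via label propagation in scikit-learn.
# The dataset and the 10% labeled fraction are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.semi_supervised import LabelPropagation

X, y_true = make_moons(n_samples=300, noise=0.1, random_state=0)
rng = np.random.default_rng(0)

y = np.full_like(y_true, -1)                 # start with everything unlabeled
labeled = rng.choice(len(y), size=30, replace=False)
y[labeled] = y_true[labeled]                 # only 10% annotated by hand

model = LabelPropagation().fit(X, y)
inferred = model.transduction_               # labels inferred for all points
print("accuracy on the unlabeled points:",
      (inferred[y == -1] == y_true[y == -1]).mean())
```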

11.5.6 Imbalanced Data

The problem of imbalanced data has traditionally been tackled with stratified random sampling approaches. However, iterating over sub-sample generation and error-metric measurement is required, and the whole procedure can take a huge amount of time. Furthermore, conventional sampling methods cannot efficiently perform value-based sampling over a user-specified data subset.
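For reference, the sketch below shows the basic stratified-sampling step, plus class re-weighting, in scikit-learn; the 95/5 class imbalance and the model choice are illustrative assumptions.

```python
# Sketch: stratified splitting plus class re-weighting for imbalanced labels.
# The 95/5 imbalance and the model choice are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)

# Stratification preserves the rare class's 5% share in both splits.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
print("minority share, train/test:", y_tr.mean(), y_te.mean())

# Re-weighting makes errors on the rare class cost proportionally more.
model = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X_tr, y_tr)
print("test accuracy:", model.score(X_te, y_te))
```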

11.6 Learning Opportunities and Challenges

Big data does not necessarily pose problems that are "entirely different". Nevertheless, there are critical issues to which more time and effort should be directed.

First, although we have long attempted to manage (increasingly) vast quantities of data, we have implicitly assumed that the key computation could keep all the data available. The current data volume has grown to such an extent that it is very hard to store the data, let alone scan it multiple times. Many critical learning objectives or acceptance criteria, moreover, are non-linear, non-convex, non-decomposable, and non-smooth over the samples. Is it possible to learn by scanning the data only once, with a small storage requirement that is independent of the data size? This is referred to as "one-pass learning", and it is critical because in many big data application domains the data not only is large but also accumulates over time, making it difficult to estimate the dataset's final size. Some recent approaches in this direction have been developed. However, even if we have big data, is all of the data equally important? Most likely not, according to the evidence [14]. The question then becomes: can we extract useful data subsets from the initial large dataset?
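Decomposable statistics are the easy case of one-pass computation; as a small taste of the flavor, the sketch below uses Welford's single-pass update to compute a stream's mean and variance with constant storage (the simulated stream is an assumption). The hard open problems concern the non-decomposable objectives mentioned above.

```python
# Sketch: Welford's one-pass mean/variance -- one scan, O(1) storage.
# Decomposable statistics like these are the easy case of one-pass learning.
import numpy as np

rng = np.random.default_rng(0)

count, mean, m2 = 0, 0.0, 0.0
for x in rng.normal(loc=3.0, scale=2.0, size=100_000):  # simulated stream
    count += 1
    delta = x - mean
    mean += delta / count           # update the running mean
    m2 += delta * (x - mean)        # update the running squared deviations

print("one-pass mean:", mean, "one-pass variance:", m2 / count)
```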

Second, another advantage of big data for ML is that as the sample size available for learning increases, the risk of overfitting decreases. As we are already aware, controlling overfitting is one of the most critical issues in both the design of ML models and the development of ML techniques in practice. Because of the possibility of overfitting, simple models with fewer parameters to tune were naturally preferred. With big data, however, the constraints on parameter tuning change: we can now attempt to train a model with numerous parameters, because we have ample data and powerful computing resources to support such training. The massive success of deep learning in recent years serves as a great example. However, many deep learning studies rely heavily on engineering methodology that is not easy for anyone except the authors to replicate and study, so it is crucial to analyze the challenges of deep learning [15].

Finally, one should be aware that big data also contains an excessive number of "interests", and that we might be able to extract "whatever we want" from such data; in other words, we can almost always find evidence in support of a claim being made. So how do we evaluate and judge the "findings"? Turning to statistical hypothesis testing is one effective option. Statistical tests can be beneficial in at least two ways: first, to ensure that what we have done is exactly what we intended; second, to ensure that the results we have obtained are not the product of minor data anomalies, especially those caused by insufficiently rigorous data exploration. Although statistical tests have been investigated for centuries and applied in ML for decades, designing and deploying appropriate statistical tests is not easy, and statistical tests have been misused in the past. Furthermore, statistical tests suited to big data analytics remain an important yet under-explored field of study, not only for computational efficiency but also for the question of drawing valid conclusions from only a portion of the data. Deriving interpretable models is another way to verify the validity of study findings. Even though many ML techniques are black boxes, there has been work on making them more understandable, such as rule extraction.

In addition to the in-depth discussion of the challenges and opportunities that big data poses to ML throughout this section, some major points deserve emphasis. To overcome these challenging issues, ML on big data necessarily requires a different way of thinking and novel techniques. Big data is a crucial enabler of deep learning, which has advanced state-of-the-art performance across a range of applications. Using deep learning, at least 1,000 different categories can be distinguished, roughly two orders of magnitude more than a traditional neural network can handle. Furthermore, big data permits multi-granular learning, and it allows causality inference based on sequence chains, enabling efficient decision support [16].

The necessity of ML on big data offers significant opportunities for system and ML co-design. ML has the potential to change the way systems are built. Since many ML programs admit error tolerance and use optimization-centric, iterative-convergent algorithmic approaches, an integrative system design based on ML program structure can address issues such as dynamic scheduling and bounded-error network synchronization. Hardware accelerators, including recent supercomputers, are being developed specifically for ML applications.

11.7 Enabling Machine Learning With IoT

The Internet of Things (IoT) is a paradigm that enables interconnected devices to transmit data through sensors and software over a network without human interference. IoT has gained widespread popularity in the last few years, touching every aspect of our day-to-day life, not limited to televisions, lamps, refrigerators, and mobile phones; it reaches far beyond, from smart agriculture, smart cities, autonomous vehicles, and e-learning to smart healthcare. The large number of connected smart devices brings the challenge of managing an enormous amount of data, along with storage, privacy, and security concerns, for which conventional methods do not provide effective and reliable solutions [17]. It is important to identify which data to keep, what must be removed, and how the data should be stored. ML algorithms build their behavioral models by applying statistical techniques to large datasets, eliminating the requirement of writing explicit instructions for every action of a machine; depending on the input data, the models make future predictions.

IoT needs powerful, reliable, and intelligent techniques for massive-scale deployment, and ML is a promising and efficient candidate. ML is requisite for IoT data to bring intelligence into the system. Deep learning can be applied to IoT devices for complicated sensing tasks, facilitating applications that involve real-time interaction between smart objects and human beings. Unsupervised learning can play a prominent role in managing security attacks such as zero-day attacks, where the IoT network has no prior knowledge of what to look for [18]. The predictive capabilities of ML have remarkable uses in industry. Algorithms can obtain data from the various smart sensors attached to devices and recognize when a pattern changes or something goes wrong; a simple sketch of such a detector follows below. They can also predict when an IoT-connected device requires maintenance, which cuts unnecessary costs. In retail, sensors can be placed in a shopping complex and information collected from the internet to predict the choice and quality of products customers will buy; based on this, the business owner can stock the right products, and customers get what they want.
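A minimal version of such pattern-change detection is a rolling z-score on a sensor stream, sketched below; the simulated temperature readings, the injected fault at t = 400, the window size, and the alert threshold are all our own assumptions.

```python
# Sketch: flag sensor readings whose rolling z-score is extreme.
# The simulated stream, fault, window, and threshold are assumptions.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
temps = rng.normal(loc=70.0, scale=0.5, size=500)  # simulated machine sensor
temps[400:] += 4.0                                 # sudden fault after t = 400

s = pd.Series(temps)
rolling = s.rolling(window=50)
z = (s - rolling.mean().shift(1)) / rolling.std().shift(1)  # vs. the recent past
alerts = np.flatnonzero(z.abs() > 4)
print("first maintenance alert at t =", alerts[0] if len(alerts) else None)
```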

In smart healthcare, patients use various wearable devices to monitor their health on a regular basis. Doctors can analyze their previous medical history and can identify and control a disease in its initial stage [19].

Combining ML with IoT is a challenging task, as IoT gives rise to a variety of datasets that include structured, semi-structured, and unstructured data. In ML, supervised algorithms are used for labeled data, while unsupervised algorithms such as clustering are used for unlabeled data; choosing an appropriate ML algorithm is therefore not easy [20]. IoT spans versatile hardware, from cloud servers down to low-powered devices. Applications such as autonomous vehicles and drones require a high level of security with minimal energy consumption, so special hardware processors and accelerators are in demand for efficient integration with ML techniques. The predictive accuracy of ML algorithms also drops as spatial data increase.

11.8 Conclusion

This chapter provides a summary of the benefits and drawbacks of ML with big data. Big data provides new possibilities for developing revolutionary and novel ML technologies that solve many associated technological problems and generate real-world impact, while also posing multiple challenges for conventional ML with regard to scalability, adaptability, and usability. This summary can be used to guide future research in the field [20].

The majority of current work on ML for big data has concentrated on volume, velocity, and variety, but little has been done to address the remaining two dimensions of big data: veracity and value. One promising way to address data veracity is to build algorithms that can assess the trustworthiness or integrity of data or data sources, allowing unreliable data to be filtered out during data preprocessing. Another is to develop new ML models that can learn from inconsistent or even contradictory data. To fully realize the advantage of big data in decision support, users must be able to comprehend ML findings and the rationale for each system decision. As a result, understandable ML will be a hot topic in the forthcoming years. Furthermore, fundamental research questions, such as how to efficiently collect vast volumes of annotated data through crowdsourcing, must be answered to support human-in-the-loop ML on big data.

Integrating ML with IoT has a great future, but most current ML techniques are still not applicable for effectively managing the data generated by IoT devices. IoT devices are resource-constrained, with limited computational speed, so running heavyweight ML algorithms directly on them is often not a good choice. Many issues must first be tackled before IoT and ML can be combined effectively [21].

Other open research objectives include the following: (1) how to preserve data privacy when conducting ML; (2) how to make ML more declarative, so that non-experts can specify tasks and communicate with it easily; (3) how to incorporate general domain knowledge into ML; and (4) how to build a modern architecture, based on big data and ML, that seamlessly supports decision-making from real-time analysis of large sets of heterogeneous data of varying veracity. In conclusion, ML is required to resolve the challenges presented by big data and to discover hidden trends, information, and insight in big data, in order to translate its potential into real value for business decision-making and scientific exploration. The convergence of ML with big data points to a promising future on a modern frontier.

References

1. Jordan, M.I. and Mitchell, T.M., Machine learning: trends, perspectives, and prospects. Science, 349, 255–260, 2015.

2. Tsai, C.-W., Lai, C.-F., Chao, H.-C., Vasilakos, A.V., Big data analytics: a survey. J. Big Data, 2, 1–32, 2015.

3. Najafabadi, M.M., Villanustre, F., Khoshgoftaar, T.M., Seliya, N., Wald, R., Muharemagic, E., Deep learning applications and challenges in big data analytics. J. Big Data, 2, 1–21, 2015.

4. Japkowicz, N. and Shah, M., Evaluating Learning Algorithms: a Classification Perspective, Cambridge University Press, New York, NY, USA, 2011.

5. Russell, S. and Norvig, P., Artificial Intelligence: A Modern Approach, 3rd ed., Prentice Hall, Upper Saddle River, New Jersey, USA, 2010.

6. Bengio, Y., Courville, A., Vincent, P., Representation learning: a review and new perspectives. IEEE Trans. Pattern Anal. Mach. Intell., 35, 1798–1828, 2013.

7. Dekel, O., From Online to Batch Learning with Cutoff-Averaging. NIPS 2008, pp. 377–384, 2008.

8. Amershi, S., Cakmak, M., Knox, W.B., Kulesza, T., Power to the people: the role of humans in Interactive machine learning. AI Mag., 35, 105–120, 2014.

9. Mirchevska, V., Luštrek, M., Gams, M., Combining domain knowledge and machine learning for robust fall detection. Expert Syst., 31, 163–175, 2014.

10. Yu, T., Incorporating Prior Domain Knowledge into Inductive Machine Learning Computing Sciences, University of Technology, Sydney, Sydney, Australia, 2007.

11. Chen, Q., Zobel, J., Verspoor, K., Evaluation of a machine learning duplicate detection method for bioinformatics databases. Proc. ACM Ninth Int. Workshop Data Text. Min. Biomed. Inform., pp. 4–12, 2015.

12. Rakthanmanon, T., Campana, B., Mueen, A., Batista, G., Westover, B., Zhu, Q. et al., Addressing big data time series: mining trillions of time series subsequences under dynamic time warping. ACM Trans. Knowl. Discovery Data, 7, 10, 2013.

13. Pfeiffer III, J.J., Neville, J., Bennett, P.N., Overcoming relational learning biases to accurately predict preferences in large scale networks, in: Proceedings of the 24th International Conference on World Wide Web, pp. 853–863, 2015.

14. Cao, L., Wei, M., Yang, D., Rundensteiner, E.A., Online outlier exploration over large datasets, in: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 89–98, 2015.

15. Gandomi, A. and Haider, M., Beyond the hype: Big data concepts, methods, and analytics. Int. J. Inf. Manage., 35, 137–144, 2015.

16. Hussain, F., Hossain, E., Hussain, R., Hassan, S.A., Machine learning in IoT security: current solutions and future challenges. IEEE Commun. Surv. Tutorials, 22, 3, 1686–1721, 2020.

17. Jindal, M., Bhushan, B., Gupta, J., Machine learning methods for IoT and their future applications. 2019 International Conference on Computing, Communication, and Intelligent Systems (ICCCIS), https://ieeexplore.ieee.org/document/8974551

18. Roopak, M., Chambers, J., Tian, G.Y., Deep Learning Models for Cyber Security in IoT Networks. 2019 IEEE 9th Annual Computing and Communication Workshop and Conference (CCWC).

19. Paul, D., Chakraborty, T., Paul, D., Datta, S.K., IoT and Machine Learning Based Prediction of Smart Building Indoor Temperature. 2018 4th International Conference on Computer and Information Sciences (ICCOINS), pp. 1–6, 2018.

20. Mishra, R., Singh, R., Srivastava, S., AI and IoT Based Monitoring System for Increasing the Yield in Crop Production. 2020 International Conference on Electrical and Electronics Engineering (ICE3), pp. 301–305, 2020.

21. Kechar, B., Dahane, A., Benyamina, A., Benameur, R., An IoT Based Smart Farming System Using Machine Learning. 2020 International Symposium on Networks, Computers and Communications (ISNCC), pp. 1–6, 2020.

*Corresponding author: [email protected]