Contents

Preface

Acknowledgments

Permissions

Authors

Chapter 1 Introduction

1.1 Overview

1.2 Supporting Technologies

1.3 Stream Data Analytics

1.4 Applications of Stream Data Analytics for Insider Threat Detection

1.5 Experimental BDMA and BDSP Systems

1.6 Next Steps in BDMA and BDSP

1.7 Organization of This Book

1.8 Next Steps

Part I Supporting Technologies for BDMA and BDSP

Introduction to Part I

Chapter 2 Data Security and Privacy

2.1 Overview

2.2 Security Policies

2.2.1 Access Control Policies

2.2.1.1 Authorization-Based Access Control Policies

2.2.1.2 Role-Based Access Control

2.2.1.3 Usage Control

2.2.1.4 Attribute-Based Access Control

2.2.2 Administration Policies

2.2.3 Identification and Authentication

2.2.4 Auditing: A Database System

2.2.5 Views for Security

2.3 Policy Enforcement and Related Issues

2.3.1 SQL Extensions for Security

2.3.2 Query Modification

2.3.3 Discretionary Security and Database Functions

2.4 Data Privacy

2.5 Summary and Directions

References

Chapter 3 Data Mining Techniques

3.1 Introduction

3.2 Overview of Data Mining Tasks and Techniques

3.3 Artificial Neural Networks

3.4 Support Vector Machines

3.5 Markov Model

3.6 Association Rule Mining (ARM)

3.7 Multiclass Problem

3.8 Image Mining

3.8.1 Overview

3.8.2 Feature Selection

3.8.3 Automatic Image Annotation

3.8.4 Image Classification

3.9 Summary

References

Chapter 4 Data Mining for Security Applications

4.1 Overview

4.2 Data Mining for Cyber Security

4.2.1 Cyber Security Threats

4.2.1.1 Cyber Terrorism, Insider Threats, and External Attacks

4.2.1.2 Malicious Intrusions

4.2.1.3 Credit Card Fraud and Identity Theft

4.2.1.4 Attacks on Critical Infrastructures

4.2.2 Data Mining for Cyber Security

4.3 Data Mining Tools

4.4 Summary and Directions

References

Chapter 5 Cloud Computing and Semantic Web Technologies

5.1 Introduction

5.2 Cloud Computing

5.2.1 Overview

5.2.2 Preliminaries

5.2.2.1 Cloud Deployment Models

5.2.2.2 Service Models

5.2.3 Virtualization

5.2.4 Cloud Storage and Data Management

5.2.5 Cloud Computing Tools

5.2.5.1 Apache Hadoop

5.2.5.2 MapReduce

5.2.5.3 CouchDB

5.2.5.4 HBase

5.2.5.5 MongoDB

5.2.5.6 Hive

5.2.5.7 Apache Cassandra

5.3 Semantic Web

5.3.1 XML

5.3.2 RDF

5.3.3 SPARQL

5.3.4 OWL

5.3.5 Description Logics

5.3.6 Inferencing

5.3.7 SWRL

5.4 Semantic Web and Security

5.4.1 XML Security

5.4.2 RDF Security

5.4.3 Security and Ontologies

5.4.4 Secure Query and Rules Processing

5.5 Cloud Computing Frameworks Based on Semantic Web Technologies

5.5.1 RDF Integration

5.5.2 Provenance Integration

5.6 Summary and Directions

References

Chapter 6 Data Mining and Insider Threat Detection

6.1 Introduction

6.2 Insider Threat Detection

6.3 The Challenges, Related Work, and Our Approach

6.4 Data Mining for Insider Threat Detection

6.4.1 Our Solution Architecture

6.4.2 Feature Extraction and Compact Representation

6.4.2.1 Vector Representation of the Content

6.4.2.2 Subspace Clustering

6.4.3 RDF Repository Architecture

6.4.4 Data Storage

6.4.4.1 File Organization

6.4.5 Answering Queries Using Hadoop MapReduce

6.4.6 Data Mining Applications

6.5 Comprehensive Framework

6.6 Summary and Directions

References

Chapter 7 Big Data Management and Analytics Technologies

7.1 Introduction

7.2 Infrastructure Tools to Host BDMA Systems

7.3 BDMA Systems and Tools

7.3.1 Apache Hive

7.3.2 Google BigQuery

7.3.3 NoSQL Database

7.3.4 Google BigTable

7.3.5 Apache HBase

7.3.6 MongoDB

7.3.7 Apache Cassandra

7.3.8 Apache CouchDB

7.3.9 Oracle NoSQL Database

7.3.10 Weka

7.3.11 Apache Mahout

7.4 Cloud Platforms

7.4.1 Amazon Web Services’ DynamoDB

7.4.2 Microsoft Azure’s Cosmos DB

7.4.3 IBM’s Cloud-Based Big Data Solutions

7.4.4 Google’s Cloud-Based Big Data Solutions

7.5 Summary and Directions

References

Conclusion to Part I

Part II Stream Data Analytics

Introduction to Part II

Chapter 8 Challenges for Stream Data Classification

8.1 Introduction

8.2 Challenges

8.3 Infinite Length and Concept Drift

8.4 Concept Evolution

8.5 Limited Labeled Data

8.6 Experiments

8.7 Our Contributions

8.8 Summary and Directions

References

Chapter 9 Survey of Stream Data Classification

9.1 Introduction

9.2 Approach to Data Stream Classification

9.3 Single-Model Classification

9.4 Ensemble Classification and Baseline Approach

9.5 Novel Class Detection

9.5.1 Novelty Detection

9.5.2 Outlier Detection

9.5.3 Baseline Approach

9.6 Data Stream Classification with Limited Labeled Data

9.6.1 Semisupervised Clustering

9.6.2 Baseline Approach

9.7 Summary and Directions

References

Chapter 10 A Multi-Partition, Multi-Chunk Ensemble for Classifying Concept-Drifting Data Streams

10.1 Introduction

10.2 Ensemble Development

10.2.1 Multiple Partitions of Multiple Chunks

10.2.1.1 An Ensemble Built on MPC

10.2.1.2 MPC Ensemble Updating Algorithm

10.2.2 Error Reduction Using MPC Training

10.2.2.1 Time Complexity of MPC

10.3 Experiments

10.3.1 Datasets and Experimental Setup

10.3.1.1 Real (Botnet) Dataset

10.3.1.2 Baseline Methods

10.3.2 Performance Study

10.4 Summary and Directions

References

Chapter 11 Classification and Novel Class Detection in Concept-Drifting Data Streams

11.1 Introduction

11.2 ECSMiner

11.2.1 Overview

11.2.2 High-Level Algorithm

11.2.3 Nearest Neighborhood Rule

11.2.4 Novel Class and Its Properties

11.2.5 Base Learners

11.2.6 Creating Decision Boundary during Training

11.3 Classification with Novel Class Detection

11.3.1 High-Level Algorithm

11.3.2 Classification

11.3.3 Novel Class Detection

11.3.4 Analysis and Discussion

11.3.4.1 Justification of the Novel Class Detection Algorithm

11.3.4.2 Deviation between Approximate and Exact q-NSC Computation

11.3.4.3 Time and Space Complexity

11.4 Experiments

11.4.1 Datasets

11.4.1.1 Synthetic Data with Only Concept Drift (SynC)

11.4.1.2 Synthetic Data with Concept Drift and Novel Class (SynCN)

11.4.1.3 Real Data: KDDCup 99 Network Intrusion Detection (KDD)

11.4.1.4 Real Data: Forest Covers Dataset from UCI Repository (Forest)

11.4.2 Experimental Set-Up

11.4.3 Baseline Approach

11.4.4 Performance Study

11.4.4.1 Evaluation Approach

11.4.4.2 Results

11.5 Summary and Directions

References

Chapter 12 Data Stream Classification with Limited Labeled Training Data

12.1 Introduction

12.2 Description of ReaSC

12.3 Training with Limited Labeled Data

12.3.1 Problem Description

12.3.2 Unsupervised K-Means Clustering

12.3.3 K-Means Clustering with Cluster-Impurity Minimization

12.3.4 Optimizing the Objective Function with Expectation Maximization (E-M)

12.3.5 Storing the Classification Model

12.4 Ensemble Classification

12.4.1 Classification Overview

12.4.2 Ensemble Refinement

12.4.3 Ensemble Update

12.4.4 Time Complexity

12.5 Experiments

12.5.1 Dataset

12.5.2 Experimental Setup

12.5.3 Comparison with Baseline Methods

12.5.4 Running Times, Scalability, and Memory Requirement

12.5.5 Sensitivity to Parameters

12.6 Summary and Directions

References

Chapter 13 Directions in Data Stream Classification

13.1 Introduction

13.2 Discussion of the Approaches

13.2.1 MPC Ensemble Approach

13.2.2 Classification and Novel Class Detection in Data Streams (ECSMiner)

13.2.3 Classification with Scarcely Labeled Data (ReaSC)

13.3 Extensions

13.4 Summary and Directions

References

Conclusion to Part II

Part III Stream Data Analytics for Insider Threat Detection

Introduction to Part III

Chapter 14 Insider Threat Detection as a Stream Mining Problem

14.1 Introduction

14.2 Sequence Stream Data

14.3 Big Data Issues

14.4 Contributions

14.5 Summary and Directions

References

Chapter 15 Survey of Insider Threat and Stream Mining

15.1 Introduction

15.2 Insider Threat Detection

15.3 Stream Mining

15.4 Big Data Techniques for Scalability

15.5 Summary and Directions

References

Chapter 16 Ensemble-Based Insider Threat Detection

16.1 Introduction

16.2 Ensemble Learning

16.3 Ensemble for Unsupervised Learning

16.4 Ensemble for Supervised Learning

16.5 Summary and Directions

References

Chapter 17 Details of Learning Classes

17.1 Introduction

17.2 Supervised Learning

17.3 Unsupervised Learning

17.3.1 GBAD-MDL

17.3.2 GBAD-P

17.3.3 GBAD-MPS

17.4 Summary and Directions

References

Chapter 18 Experiments and Results for Nonsequence Data

18.1 Introduction

18.2 Dataset

18.3 Experimental Setup

18.3.1 Supervised Learning

18.3.2 Unsupervised Learning

18.4 Results

18.4.1 Supervised Learning

18.4.2 Unsupervised Learning

18.5 Summary and Directions

References

Chapter 19 Insider Threat Detection for Sequence Data

19.1 Introduction

19.2 Classifying Sequence Data

19.3 Unsupervised Stream-Based Sequence Learning (USSL)

19.3.1 Construct the LZW Dictionary by Selecting the Patterns in the Data Stream

19.3.2 Constructing the Quantized Dictionary

19.4 Anomaly Detection

19.5 Complexity Analysis

19.6 Summary and Directions

References

Chapter 20 Experiments and Results for Sequence Data

20.1 Introduction

20.2 Dataset

20.3 Concept Drift in the Training Set

20.4 Results

20.4.1 Choice of Ensemble Size

20.5 Summary and Directions

References

Chapter 21 Scalability Using Big Data Technologies

21.1 Introduction

21.2 Hadoop MapReduce Platform

21.3 Scalable LZW and QD Construction Using MapReduce Job

21.3.1 2MRJ Approach

21.3.2 1MRJ Approach

21.4 Experimental Setup and Results

21.4.1 Hadoop Cluster

21.4.2 Big Dataset for Insider Threat Detection

21.4.3 Results for Big Data Set Related to Insider Threat Detection

21.4.3.1 On OD Dataset

21.4.3.2 On DBD Dataset

21.5 Summary and Directions

References

Chapter 22 Stream Mining and Big Data for Insider Threat Detection

22.1 Introduction

22.2 Discussion

22.3 Future Work

22.3.1 Incorporate User Feedback

22.3.2 Collusion Attack

22.3.3 Additional Experiments

22.3.4 Anomaly Detection in Social Network and Author Attribution

22.3.5 Stream Mining as a Big Data Mining Problem

22.4 Summary and Directions

References

Conclusion to Part III

Part IV Experimental BDMA and BDSP Systems

Introduction to Part IV

Chapter 23 Cloud Query Processing System for Big Data Management

23.1 Introduction

23.2 Our Approach

23.3 Related Work

23.4 Architecture

23.5 MapReduce Framework

23.5.1 Overview

23.5.2 Input Files Selection

23.5.3 Cost Estimation for Query Processing

23.5.4 Query Plan Generation

23.5.5 Breaking Ties by Summary Statistics

23.5.6 MapReduce Join Execution

23.6 Results

23.6.1 Experimental Setup

23.6.2 Evaluation

23.7 Security Extensions

23.7.1 Access Control Model

23.7.2 Access Token Assignment

23.7.3 Conflicts

23.8 Summary and Directions

References

Chapter 24 Big Data Analytics for Multipurpose Social Media Applications

24.1 Introduction

24.2 Our Premise

24.3 Modules of InXite

24.3.1 Overview

24.3.2 Information Engine

24.3.2.1 Entity Extraction

24.3.2.2 Information Integration

24.3.3 Person of Interest Analysis

24.3.3.1 InXite Person of Interest Profile Generation and Analysis

24.3.3.2 InXite POI Threat Analysis

24.3.3.3 InXite Psychosocial Analysis

24.3.3.4 Other Features

24.3.4 InXite Threat Detection and Prediction

24.3.5 Application of SNOD

24.3.5.1 SNOD++

24.3.5.2 Benefits of SNOD++

24.3.6 Expert Systems Support

24.3.7 Cloud-Design of InXite to Handle Big Data

24.3.8 Implementation

24.4 Other Applications

24.5 Related Work

24.6 Summary and Directions

References

Chapter 25 Big Data Management and Cloud for Assured Information Sharing

25.1 Introduction

25.2 Design Philosophy

25.3 System Design

25.3.1 Design of CAISS

25.3.2 Design of CAISS++

25.3.2.1 Limitations of CAISS

25.3.3 Formal Policy Analysis

25.3.4 Implementation Approach

25.4 Related Work

25.4.1 Our Related Research

25.4.2 Overall Related Research

25.4.3 Commercial Developments

25.5 Extensions for Big Data-Based Social Media Applications

25.6 Summary and Directions

References

Chapter 26 Big Data Management for Secure Information Integration

26.1 Introduction

26.2 Integrating Blackbook with Amazon S3

26.3 Experiments

26.4 Summary and Directions

References

Chapter 27 Big Data Analytics for Malware Detection

27.1 Introduction

27.2 Malware Detection

27.2.1 Malware Detection as a Data Stream Classification Problem

27.2.2 Cloud Computing for Malware Detection

27.2.3 Our Contributions

27.3 Related Work

27.4 Design and Implementation of the System

27.4.1 Ensemble Construction and Updating

27.4.2 Error Reduction Analysis

27.4.3 Empirical Error Reduction and Time Complexity

27.4.4 Hadoop/MapReduce Framework

27.5 Malicious Code Detection

27.5.1 Overview

27.5.2 Nondistributed Feature Extraction and Selection

27.5.3 Distributed Feature Extraction and Selection

27.6 Experiments

27.6.1 Datasets

27.6.2 Baseline Methods

27.7 Discussion

27.8 Summary and Directions

References

Chapter 28 A Semantic Web-Based Inference Controller for Provenance Big Data

28.1 Introduction

28.2 Architecture for the Inference Controller

28.3 Semantic Web Technologies and Provenance

28.3.1 Semantic Web-Based Models

28.3.2 Graphical Models and Rewriting

28.4 Inference Control through Query Modification

28.4.1 Our Approach

28.4.2 Domains and Provenance

28.4.3 Inference Controller with Two Users

28.4.4 SPARQL Query Modification

28.5 Implementing the Inference Controller

28.5.1 Our Approach

28.5.2 Implementation of a Medical Domain

28.5.3 Generating and Populating the Knowledge Base

28.5.4 Background Generator Module

28.6 Big Data Management and Inference Control

28.7 Summary and Directions

References

Conclusion to Part IV

Part V Next Steps for BDMA and BDSP

Introduction to Part V

Chapter 29 Confidentiality, Privacy, and Trust for Big Data Systems

29.1 Introduction

29.2 Trust, Privacy, and Confidentiality

29.2.1 Current Successes and Potential Failures

29.2.2 Motivation for a Framework

29.3 CPT Framework

29.3.1 The Role of the Server

29.3.2 CPT Process

29.3.3 Advanced CPT

29.3.4 Trust, Privacy, and Confidentiality Inference Engines

29.4 Our Approach to Confidentiality Management

29.5 Privacy for Social Media Systems

29.6 Trust for Social Networks

29.7 Integrated System

29.8 CPT within the Context of Big Data and Social Networks

29.9 Summary and Directions

References

Chapter 30 Unified Framework for Secure Big Data Management and Analytics

30.1 Overview

30.2 Integrity Management and Data Provenance for Big Data Systems

30.2.1 Need for Integrity

30.2.2 Aspects of Integrity

30.2.3 Inferencing, Data Quality, and Data Provenance

30.2.4 Integrity Management, Cloud Services and Big Data

30.2.5 Integrity for Big Data

30.3 Design of Our Framework

30.4 The Global Big Data Security and Privacy Controller

30.5 Summary and Directions

References

Chapter 31 Big Data, Security, and the Internet of Things

31.1 Introduction

31.2 Use Cases

31.3 Layered Framework for Secure IoT

31.4 Protecting the Data

31.5 Scalable Analytics for IoT Security Applications

31.6 Summary and Directions

References

Chapter 32 Big Data Analytics for Malware Detection in Smartphones

32.1 Introduction

32.2 Our Approach

32.2.1 Challenges

32.2.2 Behavioral Feature Extraction and Analysis

32.2.2.1 Graph-Based Behavior Analysis

32.2.2.2 Sequence-Based Behavior Analysis

32.2.2.3 Evolving Data Stream Classification

32.2.3 Reverse Engineering Methods

32.2.4 Risk-Based Framework

32.2.5 Application to Smartphones

32.2.5.1 Data Gathering

32.2.5.2 Malware Detection

32.2.5.3 Data Reverse Engineering of Smartphone Applications

32.3 Our Experimental Activities

32.3.1 Covert Channel Attack in Mobile Apps

32.3.2 Detecting Location Spoofing in Mobile Apps

32.3.3 Large Scale, Automated Detection of SSL/TLS Man-in-the-Middle Vulnerabilities in Android Apps

32.4 Infrastructure Development

32.4.1 Virtual Laboratory Development

32.4.1.1 Laboratory Setup

32.4.1.2 Programming Projects to Support the Virtual Lab

32.4.1.3 An Intelligent Fuzzer for the Automatic Android GUI Application Testing

32.4.1.4 Problem Statement

32.4.1.5 Understanding the Interface

32.4.1.6 Generating Input Events

32.4.1.7 Mitigating Data Leakage in Mobile Apps Using a Transactional Approach

32.4.1.8 Technical Challenges

32.4.1.9 Experimental System

32.4.1.10 Policy Engine

32.4.2 Curriculum Development

32.4.2.1 Extensions to Existing Courses

32.4.2.2 New Capstone Course on Secure Mobile Computing

32.5 Summary and Directions

References

Chapter 33 Toward a Case Study in Healthcare for Big Data Analytics and Security

33.1 Introduction

33.2 Motivation

33.2.1 The Problem

33.2.2 Air Quality Data

33.2.3 Need for Such a Case Study

33.3 Methodologies

33.4 The Framework Design

33.4.1 Storing and Retrieving Multiple Types of Scientific Data

33.4.1.1 The Problem and Challenges

33.4.1.2 Current Systems and Their Limitations

33.4.1.3 The Future System

33.4.2 Privacy and Security Aware Data Management for Scientific Data

33.4.2.1 The Problem and Challenges

33.4.2.2 Current Systems and Their Limitations

33.4.2.3 The Future System

33.4.3 Offline Scalable Statistical Analytics

33.4.3.1 The Problem and Challenges

33.4.3.2 Current Systems and Their Limitations

33.4.3.3 The Future System

33.4.3.4 Mixed Continuous and Discrete Domains

33.4.4 Real-Time Stream Analytics

33.4.4.1 The Problem and Challenges

33.4.5 Current Systems and Their Limitations

33.4.5.1 The Future System

33.5 Summary and Directions

References

Chapter 34 Toward an Experimental Infrastructure and Education Program for BDMA and BDSP

34.1 Introduction

34.2 Current Research and Infrastructure Activities in BDMA and BDSP

34.2.1 Big Data Analytics for Insider Threat Detection

34.2.2 Secure Data Provenance

34.2.3 Secure Cloud Computing

34.2.4 Binary Code Analysis

34.2.5 Cyber-Physical Systems Security

34.2.6 Trusted Execution Environment

34.2.7 Infrastructure Development

34.3 Education and Infrastructure Program in BDMA

34.3.1 Curriculum Development

34.3.2 Experimental Program

34.3.2.1 Geospatial Data Processing on GDELT

34.3.2.2 Coding for Political Event Data

34.3.2.3 Timely Health Indicator

34.4 Security and Privacy for Big Data

34.4.1 Our Approach

34.4.2 Curriculum Development

34.4.2.1 Extensions to Existing Courses

34.4.2.2 New Capstone Course on BDSP

34.4.3 Experimental Program

34.4.3.1 Laboratory Setup

34.4.3.2 Programming Projects to Support the Lab

34.5 Summary and Directions

References

Chapter 35 Directions for BDSP and BDMA

35.1 Introduction

35.2 Issues in BDSP

35.2.1 Introduction

35.2.2 Big Data Management and Analytics

35.2.3 Security and Privacy

35.2.4 Big Data Analytics for Security Applications

35.2.5 Community Building

35.3 Summary of Workshop Presentations

35.3.1 Keynote Presentations

35.3.1.1 Toward Privacy Aware Big Data Analytics

35.3.1.2 Formal Methods for Preserving Privacy While Loading Big Data

35.3.1.3 Authenticity of Digital Images in Social Media

35.3.1.4 Business Intelligence Meets Big Data: An Overview of Security and Privacy

35.3.1.5 Toward Risk-Aware Policy-Based Framework for BDSP

35.3.1.6 Big Data Analytics: Privacy Protection Using Semantic Web Technologies

35.3.1.7 Securing Big Data in the Cloud: Toward a More Focused and Data-Driven Approach

35.3.1.8 Privacy in a World of Mobile Devices

35.3.1.9 Access Control and Privacy Policy Challenges in Big Data

35.3.1.10 Timely Health Indicators Using Remote Sensing and Innovation for the Validity of the Environment

35.3.1.11 Additional Presentations

35.3.1.12 Final Thoughts on the Presentations

35.4 Summary of the Workshop Discussions

35.4.1 Introduction

35.4.2 Philosophy for BDSP

35.4.3 Examples of Privacy-Enhancing Techniques

35.4.4 Multiobjective Optimization Framework for Data Privacy

35.4.5 Research Challenges and Multidisciplinary Approaches

35.4.6 BDMA for Cyber Security

35.5 Summary and Directions

References

Conclusion to Part V

Chapter 36 Summary and Directions

36.1 About This Chapter

36.2 Summary of This Book

36.3 Directions for BDMA and BDSP

36.4 Where Do We Go from Here?

Appendix A: Data Management Systems: Developments and Trends

Appendix B: Database Management Systems

Index
