Contents
1.4 Applications of Stream Data Analytics for Insider Threat Detection
1.5 Experimental BDMA and BDSP Systems
1.6 Next Steps in BDMA and BDSP
Part I Supporting Technologies for BDMA and BDSP
Chapter 2 Data Security and Privacy
2.2.1.1 Authorization-Based Access Control Policies
2.2.1.2 Role-Based Access Control
2.2.1.4 Attribute-Based Access Control
2.2.3 Identification and Authentication
2.2.4 Auditing: A Database System
2.3 Policy Enforcement and Related Issues
2.3.1 SQL Extensions for Security
2.3.3 Discretionary Security and Database Functions
Chapter 3 Data Mining Techniques
3.2 Overview of Data Mining Tasks and Techniques
3.6 Association Rule Mining (ARM)
3.8.3 Automatic Image Annotation
Chapter 4 Data Mining for Security Applications
4.2 Data Mining for Cyber Security
4.2.1.1 Cyber Terrorism, Insider Threats, and External Attacks
4.2.1.3 Credit Card Fraud and Identity Theft
4.2.1.4 Attacks on Critical Infrastructures
4.2.2 Data Mining for Cyber Security
Chapter 5 Cloud Computing and Semantic Web Technologies
5.2.2.1 Cloud Deployment Models
5.2.4 Cloud Storage and Data Management
5.4.4 Secure Query and Rules Processing
5.5 Cloud Computing Frameworks Based on Semantic Web Technologies
Chapter 6 Data Mining and Insider Threat Detection
6.3 The Challenges, Related Work, and Our Approach
6.4 Data Mining for Insider Threat Detection
6.4.1 Our Solution Architecture
6.4.2 Feature Extraction and Compact Representation
6.4.2.1 Vector Representation of the Content
6.4.3 RDF Repository Architecture
6.4.5 Answering Queries Using Hadoop MapReduce
Chapter 7 Big Data Management and Analytics Technologies
7.2 Infrastructure Tools to Host BDMA Systems
7.4.1 Amazon Web Services’ DynamoDB
7.4.2 Microsoft Azure’s Cosmos DB
7.4.3 IBM’s Cloud-Based Big Data Solutions
7.4.4 Google’s Cloud-Based Big Data Solutions
Chapter 8 Challenges for Stream Data Classification
8.3 Infinite Length and Concept Drift
Chapter 9 Survey of Stream Data Classification
9.2 Approach to Data Stream Classification
9.3 Single-Model Classification
9.4 Ensemble Classification and Baseline Approach
9.6 Data Stream Classification with Limited Labeled Data
9.6.1 Semisupervised Clustering
Chapter 10 A Multi-Partition, Multi-Chunk Ensemble for Classifying Concept-Drifting Data Streams
10.2.1 Multiple Partitions of Multiple Chunks
10.2.1.1 An Ensemble Built on MPC
10.2.1.2 MPC Ensemble Updating Algorithm
10.2.2 Error Reduction Using MPC Training
10.2.2.1 Time Complexity of MPC
10.3.1 Datasets and Experimental Setup
Chapter 11 Classification and Novel Class Detection in Concept-Drifting Data Streams
11.2.3 Nearest Neighborhood Rule
11.2.4 Novel Class and Its Properties
11.2.6 Creating Decision Boundary during Training
11.3 Classification with Novel Class Detection
11.3.4.1 Justification of the Novel Class Detection Algorithm
11.3.4.2 Deviation between Approximate and Exact q-NSC Computation
11.3.4.3 Time and Space Complexity
11.4.1.1 Synthetic Data with only Concept Drift (SynC)
11.4.1.2 Synthetic Data with Concept Drift and Novel Class (SynCN)
11.4.1.3 Real Data: KDDCup 99 Network Intrusion Detection (KDD)
11.4.1.4 Real Data: Forest Covers Dataset from UCI Repository (Forest)
Chapter 12 Data Stream Classification with Limited Labeled Training Data
12.3 Training with Limited Labeled Data
12.3.2 Unsupervised K-Means Clustering
12.3.3 K-Means Clustering with Cluster-Impurity Minimization
12.3.4 Optimizing the Objective Function with Expectation Maximization (E-M)
12.3.5 Storing the Classification Model
12.5.3 Comparison with Baseline Methods
12.5.4 Running Times, Scalability, and Memory Requirement
12.5.5 Sensitivity to Parameters
Chapter 13 Directions in Data Stream Classification
13.2 Discussion of the Approaches
13.2.2 Classification and Novel Class Detection in Data Streams (ECSMiner)
13.2.3 Classification with Scarcely Labeled Data (ReaSC)
Part III Stream Data Analytics for Insider Threat Detection
Chapter 14 Insider Threat Detection as a Stream Mining Problem
Chapter 15 Survey of Insider Threat and Stream Mining
15.4 Big Data Techniques for Scalability
Chapter 16 Ensemble-Based Insider Threat Detection
16.3 Ensemble for Unsupervised Learning
16.4 Ensemble for Supervised Learning
Chapter 17 Details of Learning Classes
Chapter 18 Experiments and Results for Nonsequence Data
Chapter 19 Insider Threat Detection for Sequence Data
19.3 Unsupervised Stream-Based Sequence Learning (USSL)
19.3.1 Construct the LZW Dictionary by Selecting the Patterns in the Data Stream
19.3.2 Constructing the Quantized Dictionary
Chapter 20 Experiments and Results for Sequence Data
20.3 Concept Drift in the Training Set
Chapter 21 Scalability Using Big Data Technologies
21.3 Scalable LZW and QD Construction Using MapReduce Job
21.4 Experimental Setup and Results
21.4.2 Big Dataset for Insider Threat Detection
21.4.3 Results for Big Data Set Related to Insider Threat Detection
Chapter 22 Stream Mining and Big Data for Insider Threat Detection
22.3.1 Incorporate User Feedback
22.3.4 Anomaly Detection in Social Network and Author Attribution
22.3.5 Stream Mining as a Big Data Mining Problem
Part IV Experimental BDMA and BDSP Systems
Chapter 23 Cloud Query Processing System for Big Data Management
23.5.3 Cost Estimation for Query Processing
23.5.5 Breaking Ties by Summary Statistics
23.5.6 MapReduce Join Execution
Chapter 24 Big Data Analytics for Multipurpose Social Media Applications
24.3.2.2 Information Integration
24.3.3 Person of Interest Analysis
24.3.3.1 InXite Person of Interest Profile Generation and Analysis
24.3.3.2 InXite POI Threat Analysis
24.3.3.3 InXite Psychosocial Analysis
24.3.4 InXite Threat Detection and Prediction
24.3.7 Cloud-Design of InXite to Handle Big Data
Chapter 25 Big Data Management and Cloud for Assured Information Sharing
25.4.2 Overall Related Research
25.5 Extensions for Big Data-Based Social Media Applications
Chapter 26 Big Data Management for Secure Information Integration
26.2 Integrating Blackbook with Amazon S3
Chapter 27 Big Data Analytics for Malware Detection
27.2.1 Malware Detection as a Data Stream Classification Problem
27.2.2 Cloud Computing for Malware Detection
27.4 Design and Implementation of the System
27.4.1 Ensemble Construction and Updating
27.4.2 Error Reduction Analysis
27.4.3 Empirical Error Reduction and Time Complexity
27.4.4 Hadoop/MapReduce Framework
27.5.2 Nondistributed Feature Extraction and Selection
27.5.3 Distributed Feature Extraction and Selection
Chapter 28 A Semantic Web-Based Inference Controller for Provenance Big Data
28.2 Architecture for the Inference Controller
28.3 Semantic Web Technologies and Provenance
28.3.1 Semantic Web-Based Models
28.3.2 Graphical Models and Rewriting
28.4 Inference Control through Query Modification
28.4.3 Inference Controller with Two Users
28.4.4 SPARQL Query Modification
28.5 Implementing the Inference Controller
28.5.2 Implementation of a Medical Domain
28.5.3 Generating and Populating the Knowledge Base
28.5.4 Background Generator Module
28.6 Big Data Management and Inference Control
Part V Next Steps for BDMA and BDSP
Chapter 29 Confidentiality, Privacy, and Trust for Big Data Systems
29.2 Trust, Privacy, and Confidentiality
29.2.1 Current Successes and Potential Failures
29.2.2 Motivation for a Framework
29.3.4 Trust, Privacy, and Confidentiality Inference Engines
29.4 Our Approach to Confidentiality Management
29.5 Privacy for Social Media Systems
29.8 CPT within the Context of Big Data and Social Networks
Chapter 30 Unified Framework for Secure Big Data Management and Analytics
30.2 Integrity Management and Data Provenance for Big Data Systems
30.2.3 Inferencing, Data Quality, and Data Provenance
30.2.4 Integrity Management, Cloud Services and Big Data
30.4 The Global Big Data Security and Privacy Controller
Chapter 31 Big Data, Security, and the Internet of Things
31.3 Layered Framework for Secure IoT
31.5 Scalable Analytics for IoT Security Applications
Chapter 32 Big Data Analytics for Malware Detection in Smartphones
32.2.2 Behavioral Feature Extraction and Analysis
32.2.2.1 Graph-Based Behavior Analysis
32.2.2.2 Sequence-Based Behavior Analysis
32.2.2.3 Evolving Data Stream Classification
32.2.3 Reverse Engineering Methods
32.2.5 Application to Smartphones
32.2.5.3 Data Reverse Engineering of Smartphone Applications
32.3 Our Experimental Activities
32.3.1 Covert Channel Attack in Mobile Apps
32.3.2 Detecting Location Spoofing in Mobile Apps
32.3.3 Large Scale, Automated Detection of SSL/TLS Man-in-the-Middle Vulnerabilities in Android Apps
32.4 Infrastructure Development
32.4.1 Virtual Laboratory Development
32.4.1.2 Programming Projects to Support the Virtual Lab
32.4.1.3 An Intelligent Fuzzer for the Automatic Android GUI Application Testing
32.4.1.5 Understanding the Interface
32.4.1.6 Generating Input Events
32.4.1.7 Mitigating Data Leakage in Mobile Apps Using a Transactional Approach
32.4.2.1 Extensions to Existing Courses
32.4.2.2 New Capstone Course on Secure Mobile Computing
Chapter 33 Toward a Case Study in Healthcare for Big Data Analytics and Security
33.2.3 Need for Such a Case Study
33.4.1 Storing and Retrieving Multiple Types of Scientific Data
33.4.1.1 The Problem and Challenges
33.4.1.2 Current Systems and Their Limitations
33.4.2 Privacy and Security Aware Data Management for Scientific Data
33.4.2.1 The Problem and Challenges
33.4.2.2 Current Systems and Their Limitations
33.4.3 Offline Scalable Statistical Analytics
33.4.3.1 The Problem and Challenges
33.4.3.2 Current Systems and Their Limitations
33.4.3.4 Mixed Continuous and Discrete Domains
33.4.4 Real-Time Stream Analytics
33.4.4.1 The Problem and Challenges
33.4.5 Current Systems and Their Limitations
Chapter 34 Toward an Experimental Infrastructure and Education Program for BDMA and BDSP
34.2 Current Research and Infrastructure Activities in BDMA and BDSP
34.2.1 Big Data Analytics for Insider Threat Detection
34.2.5 Cyber-Physical Systems Security
34.2.6 Trusted Execution Environment
34.2.7 Infrastructure Development
34.3 Education and Infrastructure Program in BDMA
34.3.2.1 Geospatial Data Processing on GDELT
34.3.2.2 Coding for Political Event Data
34.3.2.3 Timely Health Indicator
34.4 Security and Privacy for Big Data
34.4.2.1 Extensions to Existing Courses
34.4.2.2 New Capstone Course on BDSP
34.4.3.2 Programming Projects to Support the Lab
Chapter 35 Directions for BDSP and BDMA
35.2.2 Big Data Management and Analytics
35.2.4 Big Data Analytics for Security Applications
35.3 Summary of Workshop Presentations
35.3.1.1 Toward Privacy Aware Big Data Analytics
35.3.1.2 Formal Methods for Preserving Privacy While Loading Big Data
35.3.1.3 Authenticity of Digital Images in Social Media
35.3.1.4 Business Intelligence Meets Big Data: An Overview of Security and Privacy
35.3.1.5 Toward Risk-Aware Policy-Based Framework for BDSP
35.3.1.6 Big Data Analytics: Privacy Protection Using Semantic Web Technologies
35.3.1.7 Securing Big Data in the Cloud: Toward a More Focused and Data-Driven Approach
35.3.1.8 Privacy in a World of Mobile Devices
35.3.1.9 Access Control and Privacy Policy Challenges in Big Data
35.3.1.11 Additional Presentations
35.3.1.12 Final Thoughts on the Presentations
35.4 Summary of the Workshop Discussions
35.4.3 Examples of Privacy-Enhancing Techniques
35.4.4 Multiobjective Optimization Framework for Data Privacy
35.4.5 Research Challenges and Multidisciplinary Approaches
Chapter 36 Summary and Directions
36.3 Directions for BDMA and BDSP
Appendix A: Data Management Systems: Developments and Trends