Chapter 12

Web of Things Data Storage

Hongming Cai; Athanasios V. Vasilakos    School of Software, Shanghai Jiao Tong University, Shanghai, China
Department of Computer Science, Electrical and Space Engineering, Luleå University of Technology, Luleå, Sweden

Abstract

With the widespread adoption of Web of Things (WoT) technology, massive data are generated by huge numbers of distributed sensors and diverse applications. WoT-related applications have emerged as an important area for both engineers and researchers. As a consequence, how to acquire, integrate, store, process and use these data has become an urgent and important problem for enterprises seeking to achieve their business goals. Based on an analysis of data processing functions, a framework is provided to identify the representation, management, and disposing areas of WoT data. Several associated functional modules are defined and described in terms of their key characteristics and capabilities. Then, current research on WoT applications is organized and compared to show the state-of-the-art achievements in the literature from a data processing perspective. Next, several WoT storage techniques are discussed that enable WoT applications to move onto cloud platforms. Lastly, based on an analysis of application requirements, some future technical trends are proposed.

Keywords

Web of things; Data storage; Cloud computing; Semantic disposing; Linked data

Chapter Points
• For the purpose of building a clear insight into different WoT applications and techniques, a multi-layer WoT data storage framework is given which describes related techniques from the perspective of the data disposing process.
• Data isolation and multi-tenant data storage are discussed to provide a critical and accurate view of current WoT data management on cloud platforms.
• Some open issues are discussed, covering complex data models, semantic data management and real-time data disposing, in order to outline future trends for WoT data storage techniques.

12.1 Introduction

With the widespread adoption of Web of Things (WoT) technology, massive data have been generated by huge numbers of distributed sensors and diverse applications. WoT applications have emerged as an important area for both engineers and researchers. As a consequence, how to acquire, integrate, store, dispose and use these data has become an urgent and important problem for enterprises implementing business applications such as intelligent transportation, smart homes, intelligent manufacturing and smart healthcare systems.

The features of WoT data can be summarized as follows:

• Highly heterogeneous data: WoT data are acquired from different sorts of distributed sensors and applications. The data types range from structured data such as relational tables, through semi-structured data such as eXtensible Markup Language (XML) or Resource Description Framework (RDF) documents, to unstructured data such as images and videos.

• Massive dynamic data: WoT applications are typically connected to huge numbers of sensors or devices. Communications between different objects constantly generate large volumes of real-time, high-speed, uninterrupted data streams, which change rapidly.

• Weakly semantic data: WoT data are event-driven, low-level data with little semantic meaning. Little business value can be found unless these raw data are integrated and processed.

Because WoT data are typically distributed, unstructured, event-based and time-related, interoperability between the massive data generated by heterogeneous WoT objects brings new challenges, especially in cloud environments. Different requirements arise for processing these massive data, covering different levels of data representation, data storage, data analysis and data utility. Traditional data storage focuses on resource measurement, management and provisioning in web-based environments. Therefore, Service Level Agreement (SLA) factors such as performance, scalability, availability, manageability and price are the main concerns of the owners of information infrastructure. Aiming to trace the latest progress in WoT-based data storage systems, the complete process of WoT data applications and various relevant topics are discussed thoroughly.

First, based on an analysis of data processing functions, a framework is provided to identify the representation, storage, management, and processing areas of WoT data. Several associated functional modules are defined and described in terms of their key characteristics and capabilities.

Then, current research on WoT data storage is classified and compared. This chapter provides a timely survey of current WoT data storage methods, especially on cloud platforms, and describes the state-of-the-art techniques from the perspective of the data disposing process.

Next, some WoT data storage techniques are given that enable WoT applications to move onto cloud platforms. Key techniques related to WoT data storage, aimed at achieving higher availability and flexible resource provisioning, are discussed to provide an overview and essential information for current cloud-based WoT applications.

As WoT technologies evolve, a substantial number of related applications have been deployed in many industries. Based on an analysis of this research, some future technical trends are also described and discussed.

12.2 The Framework of WoT Data Storage

A common WoT framework consists of a perception layer, a network layer and an application layer. Based on the process of WoT data disposing, a framework of a cloud-based data storage system for WoT applications is given. The framework consists of several modules covering data storage, data representation, data management, inner and external data processing, and an optimization module based on the cloud platform, as Fig. 12.1 shows.

Figure 12.1 A framework of WoT-based storage systems in Cloud Computing

Descriptions of modules are given as follows:

• Data Storage Module: Considering that WoT data can be in structured, semi-structured, and unstructured formats, effective data storage should combine different kinds of storage types into one body so as to support intelligent, complex WoT applications.

• Data Representation Module: How to define and describe heterogeneous data from distributed and mobile devices is a fundamental problem for the data disposing process. Therefore, both simple models, such as events, messages, RDF and other data formats, and complex models, such as contextual information and semantic relations, are required to represent WoT data.

• Data Management Module: Because data from sensors are always raw or low-level data, different data management approaches are implemented based on data indexes, metadata, semantic relations and linked data, so as to retrieve and access data from distributed data sources with high efficiency.

• Inner Data Operation Module: For the purpose of disposing data on the distributed platform, massive data processing mechanisms are constructed for parallel and distributed data processing, so that querying and reasoning operations can be carried out in a more flexible way inside the platform.

• External Data Service Module: For application purposes, data should be composed into functional services for business users, or made to interoperate with other applications or services. High-level information then needs to be extracted, classified, abstracted and encapsulated for end-user use.

• Cloud-based Data Optimization Module: Cloud platforms bring high efficiency to current WoT applications. Optimization methods are required when processing WoT data on cloud platforms to provide high performance, such as reduced I/O, scalability and availability.

On the whole, the framework of WoT data storage is critical because it is composed of general middleware and functional modules for implementing real large-scale WoT applications. Considering that cloud platforms currently offer end users high efficiency and flexibility, much attention should be paid to enabling effective and intelligent data processing on cloud platforms.

12.3 Methods and Challenges of WoT Data Storage

In this section, following the above framework, related research is organized into data storage, data representation, data management, data operations for inner platform support and data services for external application interoperation.

12.3.1 Data Storage Types

After being obtained from different data sources, WoT data can be persisted for further disposing. There are several data storage types. The relational database management system (RDBMS) is the basic and traditional storage type, which uses the Structured Query Language (SQL) as its query language. Beyond the RDBMS, many storage types have been extended or developed, such as Not only SQL (NoSQL) databases, databases based on the Hadoop Distributed File System (HDFS), in-memory databases, BigTable databases, and graph databases. The features of these different storage types are discussed as follows.

12.3.1.1 Relational Database Management System

The relational database management system, proposed in 1970 [1], has been a popular data storage type for a long time. This model shields users from the details of how data are organized on machines, and only provides a high-level query language to operate on the data. However, with the development of Web 2.0 and cloud computing, the RDBMS shows its shortcomings. With a static schema [2], non-linear query execution time and unstable query plans, the RDBMS is poor in scalability. For faster and more efficient operations on big data, the authors of [3] provided the Cache Augmented Database Management System (CADBMS), which improves the speed of queries that read and write a certain part of the data by caching. CADBMS is very useful for social network applications and other systems with a high read-write ratio.

Traditional database queries follow a simple policy: the defined constraints must be satisfied by each tuple in the query result. This policy is computationally efficient, as the database system can evaluate the query conditions on each tuple individually. However, many practical real-world problems require a collection of result tuples to satisfy constraints collectively, rather than individually. In [4], a new query model named package queries is presented to extend traditional database queries to handle such complex constraints. The authors design PaQL, a SQL-based query language that supports the declarative specification of package queries.
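The contrast between per-tuple and collective constraints can be illustrated without any particular engine. The following minimal Python sketch is a hypothetical brute-force evaluator, not the PaQL implementation from [4]: it keeps only those packages of tuples whose collective totals satisfy the constraints.

```python
# Hypothetical brute-force "package" evaluation: constraints are checked over each
# candidate set of tuples collectively, not over individual tuples.
from itertools import combinations

# Illustrative tuples: (name, calories, protein)
meals = [("oatmeal", 300, 10), ("chicken", 450, 40), ("salad", 150, 5), ("rice", 350, 7)]

def packages(tuples, size, max_total_calories, min_total_protein):
    """Yield every package whose *collective* totals satisfy the constraints."""
    for combo in combinations(tuples, size):
        total_calories = sum(t[1] for t in combo)
        total_protein = sum(t[2] for t in combo)
        if total_calories <= max_total_calories and total_protein >= min_total_protein:
            yield combo

for package in packages(meals, size=2, max_total_calories=800, min_total_protein=45):
    print([name for name, _, _ in package])
```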

12.3.1.2 NoSQL Database

The NoSQL database is also called a non-relational database. Data in a NoSQL database have no explicit types or schemas; they are placed in different buckets and related data are linked to each other. In fact, NoSQL is a general designation, and such databases are usually divided into three main categories: key-value stores, document-based stores and column-oriented stores. Data are stored as key-value pairs in key-value databases like Amazon's SimpleDB, which supports both structured and unstructured data storage. Document-based databases such as MongoDB and Apache CouchDB store data as collections of documents, usually JSON-based; fields of any length can be added and any type of data can easily be stored. As for column-oriented databases, closely related data are linked as an extensible column, which differs from the strictly structured tables of the RDBMS.
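As a concrete illustration, the hedged Python sketch below stores heterogeneous WoT readings as schema-free JSON documents in MongoDB through the pymongo driver; the local MongoDB instance, database, collection and field names are assumptions made only for this example.

```python
# Minimal sketch: schema-free document storage for heterogeneous WoT readings.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")   # assumed local MongoDB instance
readings = client["wot"]["readings"]                # database "wot", collection "readings"

# Documents in the same collection may carry completely different fields.
readings.insert_one({"sensor": "t-101", "type": "temperature", "value": 21.5, "unit": "C"})
readings.insert_one({"sensor": "cam-7", "type": "image", "uri": "s3://bucket/frame.jpg",
                     "tags": ["entrance", "motion"]})

# Query by any field without a predefined schema.
for doc in readings.find({"type": "temperature", "value": {"$gt": 20}}):
    print(doc["sensor"], doc["value"])
```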

In [5], the authors note that NoSQL database systems nowadays need to make trade-offs among consistency, availability and partition tolerance to optimize for their applications. While a hybrid database system can use various kinds of database software and take advantage of their features for individual applications and workloads, how to make the different database systems work together to achieve the highest performance is still a challenging problem. They also provide an extensible database interface for integrating NoSQL databases and adding database operations.

NoSQL databases are mostly non-relational, distributed, open-source and horizontally scalable. The main characteristics of these databases are that they are schema-free, join-free, non-relational and eventually consistent, with easy replication support and simple APIs.

12.3.1.3 HDFS-Based Database

Hadoop is now one of the most popular MapReduce-based data storage solutions. However, the programming model of Hadoop is very low level, which prevents developers from reusing programs and makes those programs hard to maintain. HDFS [6] was then introduced. This distributed file system can run on commodity devices with low cost and high fault tolerance. Its high throughput makes applications with big data sets more available and efficient.
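To give a feel for how low level the MapReduce programming model is, the plain-Python sketch below spells out the map, shuffle and reduce phases by hand for some made-up sensor records; it is illustrative only and involves no Hadoop at all.

```python
# Illustrative MapReduce phases written by hand: map each record to (key, value) pairs,
# shuffle the pairs by key, then reduce each group of values.
from collections import defaultdict

records = [("sensor-1", 20.5), ("sensor-2", 18.0), ("sensor-1", 22.0), ("sensor-2", 19.5)]

def map_fn(record):                       # emit (key, value) pairs for one input record
    sensor_id, temperature = record
    yield sensor_id, temperature

def reduce_fn(key, values):               # aggregate all values sharing the same key
    return key, sum(values) / len(values)

shuffled = defaultdict(list)              # the "shuffle" phase groups values by key
for record in records:
    for key, value in map_fn(record):
        shuffled[key].append(value)

print([reduce_fn(key, values) for key, values in shuffled.items()])  # mean per sensor
```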

Hive [7] is another open-source big data storage solution built on top of Hadoop. What makes Hive different is that it provides HiveQL, a SQL-like declarative language. Hive compiles HiveQL into MapReduce jobs executed using Hadoop. The language includes a type system and allows user-defined scripts and custom types. Hive also provides schemas and statistics functions, which make it more useful for query optimization, query compilation and data exploration.

Still, [8] reports experiments on throughput versus the number of files, and shows that Hadoop performs worse as the number grows. The bottlenecks are the size of the files used, the number of data nodes available and the number of reducers used.

12.3.1.4 In-Memory Database

In-memory database management systems (IMDBMS) are designed for workloads such as On-Line Transaction Processing (OLTP) and On-Line Analytical Processing (OLAP). MonetDB and Vectorwise are traditional OLAP engines. More modern engines have since appeared, including Microsoft Hekaton, H-Store/VoltDB, Shore-MT, etc. Recently, following the trend of executing OLTP and OLAP within the same system on the same database state, SAP HANA and HyPer have been developed [9].

In-memory databases offer great performance for data with high update rates, so they can be used in many everyday services. One usage scenario of the IMDBMS is the Location-Based Service (LBS), which plays an important role in different areas of WoT applications. In [10], the authors combined a series of techniques to implement in-memory storage for LBS with high insertion efficiency.
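A hedged sketch of this kind of high-rate, location-flavoured workload is shown below, using Redis as a stand-in in-memory store; the cited work builds its own engine, and the key layout and identifiers here are assumptions made for illustration.

```python
# Minimal sketch: absorb high-rate position inserts in an in-memory store (Redis) and
# query the most recent entries by time range. Key names are illustrative.
import time
import redis

r = redis.Redis(host="localhost", port=6379)        # assumed local Redis instance

def insert_position(vehicle_id: str, lat: float, lon: float) -> None:
    # Score each entry by timestamp so recent positions can be range-queried cheaply.
    r.zadd("positions:" + vehicle_id, {f"{lat},{lon},{time.time()}": time.time()})

def last_minute(vehicle_id: str):
    now = time.time()
    return r.zrangebyscore("positions:" + vehicle_id, now - 60, now)

insert_position("bus-42", 31.2304, 121.4737)
print(last_minute("bus-42"))
```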

Another scenario is managing vector spatial data, which is similar to the former one. The authors of [11] concentrated on reducing I/O cost and improving algorithm efficiency by designing and realizing a spatial data access system on the basis of an in-memory database.

Thanks to the current growth of main memory capacity, in-memory databases can process large datasets entirely in memory. However, main memory operations are still not as fast as the CPU. Therefore, the bottleneck of main-memory techniques lies in moving data from memory to the CPU caches. Mainstream research has thus turned to near-memory computation capabilities that make good use of hardware advantages. A near-data-processing accelerator named JAFAR [12] is presented for pushing “select” queries down to memory instead of pulling data into the caches. By this means, select operations in column-based data systems can be sped up by up to nine times.

12.3.1.5 BigTable

BigTable is a distributed storage system designed by Google and, as its name suggests, it is intended to deal with data at large scale. Different from another popular system, HDFS, BigTable only supports structured data. Thanks to its distributed features, developers and researchers can easily obtain a cloud storage solution for large-scale data tasks without needing to build clusters themselves. However, the public cloud service also becomes a concern for users: how to ensure the integrity of data in the cloud becomes a big issue for BigTable.

BigTable serves many projects at Google [13]. The data sizes of these projects can reach several petabytes spread across different data centres and servers. BigTable has consistently satisfied their demands for large data scale and low latency.

Aiming to enhance BigTable by providing an integrity solution, the authors of [14] present iBigTable. iBigTable consists of a series of security protocols based on a designed data structure and BigTable. These protocols efficiently assure the integrity of the data returned by BigTable. Moreover, iBigTable inherits the great features of BigTable and has good compatibility, which allows existing BigTable applications to migrate to iBigTable with little code change.

BigTable provides a flexible, high-performance solution for various products. It is implemented by three significant components: many tablet servers, a master server and a client-side library. Tablet servers manage a set of tablets, including handling read and write operations on loaded tablets and splitting very large tablets into smaller ones. These servers are added or removed dynamically from a cluster to accommodate changes in workload. The master server assigns each tablet to a tablet server in the cluster, detects changes in tablet servers, balances the load of tablet servers, garbage-collects files in the Google File System and handles schema changes such as creating a table or a column family.
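The client-library view of such a system can be sketched as follows against HBase, the open-source BigTable-style store listed in Table 12.1, via the happybase Thrift client; the table name, column family and the presence of a local HBase Thrift server are assumptions for this example.

```python
# Minimal sketch: writing and reading a row of a BigTable-style table through a client
# library (HBase via happybase). Rows are keyed strings; values live in column families.
import happybase

connection = happybase.Connection("localhost")        # assumed HBase Thrift server
table = connection.table("sensor_events")             # assumed pre-created table

# Row keys are often composed so that related readings sort next to each other.
table.put(b"sensor-1#2024-01-01T00:00:00",
          {b"d:temperature": b"21.5", b"d:unit": b"C"})

# Point read by row key; scans over a contiguous key range are equally natural.
print(table.row(b"sensor-1#2024-01-01T00:00:00"))
```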

12.3.1.6 Graph Database

Graph databases utilize the features of graphs to provide scalable data storage. Queries are based on the nodes, properties and edges that represent or store data. Recently, more focus has been put on graphs because of their usability in modelling complicated structures.

In [15], experiments on the graph database Neo4j and the relational database MySQL are carried out. The results show that the graph database has a great advantage over the relational database for structural queries and full-text searches.
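A minimal sketch of such a structural query is shown below, using the official Neo4j Python driver and Cypher; the connection details, labels and relationship types are illustrative assumptions and are not taken from [15].

```python
# Minimal sketch: create a tiny property graph and run a structural Cypher query.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    # Link a sensor node to the room it is located in.
    session.run("MERGE (r:Room {name: $room}) "
                "MERGE (s:Sensor {id: $sensor}) "
                "MERGE (s)-[:LOCATED_IN]->(r)", room="lab-1", sensor="t-101")
    # Structural query: which sensors are located in the given room?
    result = session.run("MATCH (s:Sensor)-[:LOCATED_IN]->(:Room {name: $room}) "
                         "RETURN s.id AS sensor_id", room="lab-1")
    print([record["sensor_id"] for record in result])

driver.close()
```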

Graph databases have no schema, which makes them very suitable for storing XML documents and biological or chemical data. Compared with storing graphs, retrieving data efficiently from a large graph database via indices is more difficult, and therefore more desirable.

Aiming to realize graph mining, a novel graph indexing solution called gIndex [16] is proposed. Distinguished from existing path-based methods, gIndex uses frequent substructures as the basic indexing units. Frequent substructures are highly stable under updates and capture the intrinsic features of the data, which makes them ideal for graph indices. However, the index can grow large in a large data warehouse, so two techniques are proposed to reduce its size: a size-increasing support constraint and discriminative fragments. Besides its elegant solution to graph indexing, gIndex also illustrates that data mining, especially frequent pattern mining, can greatly help indexing and query processing.

Querying graph databases is a big issue. Navigation is the most important part of a query and is heavily used in graph databases. At present, reachability patterns with regular constraints are widely adopted for querying; XPath-like languages are an example [17]. XPath is widely used in XML navigation for its ability to express queries of interest, the easy query evaluation of its fragments and its close connection to yardstick database query languages.

The authors of [18] carry out an investigation of graph-based databases and are inspired to use graphs to represent genome data. Researchers may build a database based on the adapted graph model to store genome data, which makes genome data storage and retrieval efficient and stable.

12.3.1.7 Comparison Between Different Data Storage Types

Based on the analysis of the above references, a comparison is given in Table 12.1.

Table 12.1

Comparison Between Data Storage Types

Category | RDBMS | NoSQL database | NoSQL database | HDFS-based database | NoSQL database | In-memory database | BigTable | Graph database
Product | MySQL | MongoDB | FB Cassandra | HBase | Amazon SimpleDB | SAP HANA | Google BigTable | Neo4j
Data Model | Relational database | Document oriented | Column database | Column database | Document oriented | Multi-column database | Column database | Graph database
Interface | TCP/IP | TCP/IP | TCP/IP | HTTP/REST | TCP/IP | TCP/IP | TCP/IP | HTTP/REST
Data Storage | Disk | Disk | Disk | HDFS | S3 (Simple Storage Service) | Memory and disk | GFS | Disk
Query Method | SQL | Map/Reduce | Map/Reduce | Map/Reduce | String-based query language | SQL and MDX | Map/Reduce | Cypher query language
Replication | Asynchronous | Asynchronous | Asynchronous | Asynchronous | Asynchronous | Synchronous | Asynchronous/Synchronous | Asynchronous
Concurrency Control | Locks | Locks | Multi-Version Concurrency Control | Locks | None | – | Locks | Locks
Transactions | Local | No | Local | Local | No | No | Local | Local
Written In | C, C++ | C++ | Java | Java | Erlang | C, C++ | – | Java
Characteristics | Static Schema, Consistency | High Availability, Partition Tolerance, Persistence | Consistency, Partition Tolerance, Persistence | High Availability, Partition Tolerance, Persistence | Consistency, Partition Tolerance, Persistence | High Availability, Scalability | Consistency, High Availability, Partition Tolerance, Persistence | High Availability, Scalability


From the table, we can see that WoT data storage is similar to that of other application data. As a traditional storage type, the RDBMS has many complex restrictions which ensure reliability and consistency, but also limit its scalability. However, the SQL query language still plays an important role in both SQL and NoSQL databases. A NoSQL database has no tabular relations like a traditional RDBMS, but it has its own mechanisms for data storage and retrieval. Among the NoSQL databases, HDFS-based databases and BigTable perform well in distributed storage systems, and they will be significant components in the big data era. In-memory databases improve performance on frequently updated data, and will show their power in geographical information systems and location-based applications. Graph databases are useful for graph storage and retrieval, which makes a significant contribution to social networking and semantic web applications [19].

In general, to adapt to the high heterogeneity of WoT data from distributed data sources, a popular approach is to combine different storage types, such as an RDBMS integrated with HDFS, so as to construct scalable data storage for WoT applications.

12.3.2 Data Representation

Data representation models are fundamental for WoT applications. Based on the data disposing level, we divide these data models into three types: simple data models, integrated data models and semantic data models. Simple models are connected with sensor devices and cover messages, events, pictures, videos and other data. An integrated model is composed of several simple models to construct an integrated view. A semantic model combines simple models and model relationships with related contextual data.

12.3.2.1 Simple Data Model

The authors of [20] identified the most significant factors in the WoT, namely physical entities, resources and services, which can be summarized as physical entities and the relationships between them. To describe these key concepts more accurately, the authors built an interlinked metadata model using microformats such as RDF and microdata to break the limitations of the HTML format and enhance the surface representation metadata.
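A hedged sketch of such an interlinked description, written with rdflib rather than the authors' tooling, might look as follows; the namespace and property names are invented purely for illustration.

```python
# Minimal sketch: an interlinked RDF description of a physical entity, its resource and
# the service it exposes. The wot: vocabulary here is invented for illustration.
from rdflib import Graph, Namespace, Literal, RDF

WOT = Namespace("http://example.org/wot#")
g = Graph()
g.bind("wot", WOT)

g.add((WOT["thermostat-1"], RDF.type, WOT.PhysicalEntity))
g.add((WOT["thermostat-1"], WOT.hasResource, WOT["temperature-sensor"]))
g.add((WOT["temperature-sensor"], WOT.exposesService, WOT["read-temperature"]))
g.add((WOT["read-temperature"], WOT.endpoint, Literal("http://example.org/things/1/temp")))

print(g.serialize(format="turtle"))
```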

The authors of [21] explored a well-defined, extensible model for WoT information representation and organization. Based on the three proposed mainstream data types (object-cored organizing data, event-based explaining data and knowledge-based using data), the authors presented a model framework. It uses two of these data types as different layers, the object layer and the event layer, to improve over using a single type only. The object layer, using object-cored organizing data, represents all objects and the relations between them. The event layer contains event-based explaining data extracted from the raw detailed data processed by the object layer. The event layer regulates events and the relations between them based on an event semantic link network model with a given set of reasoning rules.

The authors of [22] focused on the extraction of event information from heterogeneous and massive raw data. They proposed an approach that can effectively extract events and the internal links between them from large datasets based on existing event types in a particular domain. The concepts of event, event type, link type and event schema are introduced, and a three-layered model consisting of a data-collecting layer, an event-extracting layer and a presenting layer is proposed to compress the redundant data.

Aiming to describe dynamic entities in WoT applications, the authors of [23] proposed a specification model for entity services. The model extends OWL-S with a service status ontology to describe the information involved in the services. The extension issues entity status in real time and releases the information to requesters as dynamic services. With this method, the model constructs and executes transactions intelligently.

12.3.2.2 Integrated Data Model

The authors of [24] proposed an approach for creating ontological models that describe connected objects, in order to support the WoT and ultimately achieve unified communication between objects. Moreover, a framework is presented to allow the seamless integration of semantic models and objects into web applications.

Thing Broker [25] integrates WoT objects that have different characteristics, are based on different protocols, and provide different interfaces and constraints, while remaining simple and flexible enough to meet the requirements of different applications. Thing Broker provides a uniform Twitter-like RESTful interface to different WoT objects. By using one abstraction containing configurable attributes to represent each WoT object, Thing Broker manages to cover all kinds of WoT objects, from physical entities to high-level services.
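The spirit of such a uniform RESTful abstraction can be sketched with a few Flask routes; this is an illustrative toy with invented routes and payloads, not Thing Broker's actual API.

```python
# Toy sketch: every thing is exposed through the same uniform resource representation.
from flask import Flask, jsonify, request

app = Flask(__name__)
things = {"lamp-1": {"type": "lamp", "state": "off"}}    # in-memory thing registry

@app.get("/things/<thing_id>")
def read_thing(thing_id):
    return jsonify(things[thing_id])                     # uniform representation

@app.put("/things/<thing_id>")
def update_thing(thing_id):
    things[thing_id].update(request.get_json())          # push new state to the thing
    return jsonify(things[thing_id])

if __name__ == "__main__":
    app.run(port=8080)
```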

The authors of [26] delivered a formal model that provides an ontological representation of the relations between geographic events and observations. The model exploits SEGO, a rule-based mechanism, to infer information about events from in-situ observations, and it illustrates how ontological vocabularies can be exploited by a reasoning and querying approach to retrieve event data and sensing information.

12.3.2.3 Semantic Data Model

The authors of [27] proposed an ontology-based WoT data model, called the continuum model, to reflect entities evolving in space and time. This model is important for studying history and predicting future trends; it can track spatial entities as they evolve through time and space, which plays an important role in capturing the semantics of the modelled phenomena. The model combines spatial functions and temporal capabilities well.

The authors of [28] proposed a general methodology for developing consumable semantic data models for smart cities. It transforms large city data from different sources into a unified and integrated semantic data model (RDF/OWL) by using different engineering approaches; this enables semantic interoperability at the concept level and supports application developers in designing advanced city services and applications.

12.3.2.4 Comparison Between Different Representative Data Models

Based on the above references and their levels of complexity, a comparison is given in Table 12.2.

Table 12.2

Comparison Between Representative Data Models

Article | Aim | Basic Data Structure | Main Methods | Representation Type
[21] An extensible and active semantic model of information organizing for the Internet of Things | Intelligent reasoning | Object and event | Object-cored organizing data, event-based explaining data, and knowledge-based using data | Simple data model
[22] Constructing the web of events from raw data in the web of things | To integrate heterogeneous and massive raw data | Event | Extracting events and their internal links from large-scale data | Simple data model
[23] An OWL-S based specification model of dynamic entity services for Internet of Things | To construct and execute transactions intelligently | OWL | Extending OWL-S with a Service Status ontology to describe the information involved in the services | Simple data model
[24] Semantic surface representation of physical entity in the Web of things | To enhance the metadata elements of surface representations | RDF | Describing physical entities, resources and services by means of an interlinked metadata model | Integrated data model
[25] Thing Broker: A Twitter for Things | To integrate WoT objects with different characteristics for further disposing | Protocols, interfaces | Providing a uniform Twitter-like RESTful interface to different IoT objects | Integrated data model
[26] A formal model to infer geographic events from sensor observations | To infer information about geographic events from sensor observations | Ontology | Exploiting a rule-based mechanism called SEGO to infer information about events | Integrated data model
[27] Continuum: A spatiotemporal data model to represent and qualify filiation relationships | To represent entities evolving in space and time | Ontology-based spatiotemporal data model | Tracking the evolution of spatial entities or objects through time, and combining the spatial functions provided by GeoSPARQL | Semantic data model
[28] A Smart City Data Model based on Semantics Best Practice and Principles | To enable semantic interoperability at the concept level | RDF, OWL | Transferring large city data sources of different nature into RDF/OWL | Semantic data model


In short, data representation is the basis for further WoT data disposing. Simple models such as events and RDF, combined with REST APIs, provide a common format for WoT applications. To support intelligent interaction for the WoT at a contextual level, data representation should focus on integrated models built by integrating multiple simple models such as sensors, events, RDF and other formats. Considering not only the data content but also data relationships, the semantic model based on ontologies and linked open data is a promising new model for web-based applications, especially those involving social networks.

12.3.3 Data Management

The WoT enables billions of smart things to be accessible through the RESTful architecture and protocols such as HTTP and the Constrained Application Protocol (CoAP). Meanwhile, seamless integration and wide-scale interoperability are the critical challenges of WoT data management. WoT data management can be divided into two kinds: metadata-based data indexing methods and semantic-based model annotation methods.

12.3.3.1 Metadata-Based Data Indexing

Metadata is a special kind of data defined for data management. It allows data to be easily organized and understood by users without involving them in every detail of the access solution.

In [29], an efficient distributed metadata management scheme was proposed for data management on cloud platforms. It uses metadata distribution based on the parent directory path, a hierarchical directory structure and a cooperative double-layer cache mechanism to access the distributed data with reduced latency.

Mobile Metadata [30] was proposed to build a mobile code agent supporting client-based image retrieval in web-based computing environments. It treats both the data model and the mapping functions as a mashup and moves them to the client side. The model provides a clear object model and client-based view construction, quick query response times and better exploitation of network resources, and it is flexible to extend.

By embedding metadata to represent smart things, a system was developed to control and monitor the state of the WoT application environment [31]. The system produces a machine-readable state description of the application environment. A web request is generated when a smart device reaches a described state, and an application then carries out the related operations to reconfigure the user's smart environment automatically. Therefore, intelligent applications are implemented by means of metadata, even when huge quantities of requests are involved.

12.3.3.2 Semantic-Based Model Annotation

The authors of [32] presented a framework for semantic and location-based services exploiting enriched maps. In particular, the framework mainly contains an approach for semantically annotating crowd-sourced cartographic data, and an innovative ontology-based function that adds semantic-based searching and disposing capabilities to current navigation systems.

To extract and link related concepts from raw sensor data and represent them as a topical ontology, a clustering approach that extends k-means was used on the basis of rules extracted from external sources. The authors of [33] introduced a knowledge acquisition technique for real-world data processing aimed at the creation and evolution of topical ontologies. These concepts are then annotated to make them understandable to users for the purposes of data analysis and reasoning, and a related system is proposed as software support.

For handling unstructured models, the authors of [34] provided semi-automatic semantic annotation of visualization scenes based on a three-layer ontology. The three-layer ontology, comprising a general ontology, a domain ontology and a scene ontology, is constructed to form a comprehensive knowledge representation for semantic annotation. It is effective for large-scale model management.

In short, metadata and simple indexes are good for structured data management. However, since WoT data are often unstructured or semi-structured in a dynamic and contextual environment, semantic-based techniques have been widely studied, designed and applied in the past few years to overcome these challenges. With the rapid increase in the amount of data and their correlations, automatic metadata generation, ontology generation and evolution, and efficient, low-cost, dynamically updated semantic annotation for WoT data indexing and model annotation have attracted great attention.

12.3.4 Data Operations for Inner Platform Support

In WoT applications, the data produced and consumed mostly consist of sensory data and generated data arriving in streams. Based on the data disposing process, related research can be divided into data collection, data pre-disposing, information fusion and distributed data disposing.

12.3.4.1 Data Collection

A data collection protocol named EDAL [35] is modelled after the open vehicle routing problem, which is proved to be NP-hard. EDAL is energy-efficient, delay-aware and lifetime-balancing, and is used to collect data in the wireless sensor network (WSN) domain.

The authors of [36] constructed a universal mobile data collection framework for WoT services. They addressed four basic requirements: task specification, task management, status sensing and data management. An architecture for general-purpose mobile data collection is also proposed, which separates the whole system into two parts: a back-end operating on the server side and a front-end running on mobile devices.

Aiming to carry out data stream analytics, the OpenWoT approach [37] was proposed. It designs an event and clustering analytics server to collect sensor data from mobile devices and serves as an interface for data stream analysis. In detail, it uses intelligent servers and edge servers for real-time data collection, annotation and processing, and adds some extensions for WoT data streams.

A sensor data stream delivery system with different delivery cycles was proposed for WoT environments [38]. When connected to servers and delivering sensor data streams, the system provides dynamic computational and communication performance according to the different requirements of clients.

The authors of [39] presented a hybrid system based on RFID and WSN called HRW, which integrates traditional RFID systems and WSN systems for efficient data collection. HRW has a set of smart nodes which have both RFID and WSN functions and take the place of RFID readers to gather data. Moreover, an enhanced data transmission algorithm, which avoids data redundancy and unnecessary transmission overhead, and security mechanisms, which prevent data manipulation and selective forwarding, are proposed.

12.3.4.2 Data Pre-disposing

Considering that redundant information is generated among nodes close to each other, a spatial correlation model [40] was proposed to minimize total consumption in the process of data preservation. The problem of data preservation with data correlation can be transformed into a minimum cost flow problem, so a solution that is more efficient and closer to optimal than the greedy algorithm is proposed.

To address the low efficiency and data redundancy of communications during integration, a data-cleaning algorithm called the cross-redundant algorithm was proposed to achieve higher performance. The authors of [41] proposed a five-layer system architecture for the integration of WSNs and RFID, and chose Bluetooth and ZigBee as the communication protocols.

The authors of [42] proposed a classification method for data streams based on supervised classification techniques such as the Support Vector Machine (SVM), and reduced the volume of data by simple aggregation and density approximation. These classification and labelling steps are the foundation of knowledge discovery in data streams.
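A minimal sketch of the supervised step, using scikit-learn's SVM on hand-made per-window features, is given below; the features and labels are invented for illustration and do not reproduce the method of [42].

```python
# Minimal sketch: classify stream windows with an SVM trained on per-window features.
import numpy as np
from sklearn.svm import SVC

# Each training row is a small feature vector computed over one window of the stream.
X_train = np.array([[20.1, 0.2], [35.4, 3.1], [19.8, 0.1], [36.0, 2.8]])
y_train = np.array(["normal", "anomalous", "normal", "anomalous"])

clf = SVC(kernel="rbf").fit(X_train, y_train)

for window in [[20.5, 0.3], [34.9, 3.0]]:                # incoming stream windows
    print(window, "->", clf.predict([window])[0])
```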

A framework for processing raw RFID data was proposed to reduce data uncertainty [43]. This framework is composed of two parts: a model for tracking global objects and a model for cleaning local RFID data. The former is implemented with a Markov-based model and the latter with a particle-filter-based approach.

For communication and data collection among devices with heterogeneous network interfaces, a middleware consisting of a Multiple Protocol Transport Network (MPTN) gateway and a coordinated model was proposed [44]. Messaging and data alignment among multiple networks are implemented for concurrent data stream collection.

The authors of [45] proposed a privacy-preserving data collection technique that can be used in healthcare applications with sensors and RFID. It assures data secrecy via a data privacy protection mechanism and has been tested to withstand various attacks. Moreover, it can be adapted to networks of different scales.

12.3.4.3 Information Fusion

In [46], the authors applied OLAP techniques to sensor data to integrate data from different sources and to gather correlated information for analysis and decision-making. An on-the-fly generation solution is proposed, with metadata based on the W3C Semantic Sensor Network ontology and the W3C RDF Data Cube vocabulary, for generating multidimensional data cubes.

An automatic segmentation methodology [47] was proposed for real-time high-level activity prediction. The end of the predicted activity can be marked automatically, and the training dataset can be divided into segments according to the previous tagging.

In [48], the authors presented an online sensor data segmentation methodology for real-time activity recognition. A two-layer strategy composed of sensor correlation and time correlation manipulation is introduced to facilitate dynamic segmentation.

A data stream clustering algorithm [49] was proposed that makes use of a sliding window and micro-cluster merging in order to batch farm products of similar quality in an agricultural WoT platform.
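A toy illustration of the micro-clustering idea is given below; it is a simplification for intuition only, not the algorithm of [49], and a full sliding-window version would also expire old clusters as data age.

```python
# Toy micro-clustering: absorb each new reading into a nearby micro-cluster
# (centroid + count) or open a new one.
MERGE_RADIUS = 1.0
clusters = []                                   # each micro-cluster: [centroid, count]

def absorb(value: float) -> None:
    for cluster in clusters:
        if abs(cluster[0] - value) <= MERGE_RADIUS:
            # Update the centroid incrementally and grow the cluster.
            cluster[0] = (cluster[0] * cluster[1] + value) / (cluster[1] + 1)
            cluster[1] += 1
            return
    clusters.append([value, 1])                 # no nearby cluster: start a new one

for reading in [5.1, 5.3, 9.8, 5.2, 10.1, 9.9]:
    absorb(reading)
print(clusters)                                 # roughly one cluster near 5.2, one near 9.9
```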

The authors of [50] proposed an approach to implement operations such as queries over heterogeneous sensor data represented in RDF. The approach can process multiple data sources at the same time on the basis of an ontology and can also integrate heterogeneous sensor data. It constructs SPARQL query statements automatically and queries sensor data semantically according to users' requirements.
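The flavour of such semantic querying can be sketched with rdflib and SPARQL; the vocabulary and readings below are invented for illustration, whereas the cited approach generates such queries automatically from user requirements.

```python
# Minimal sketch: lift two heterogeneous readings into RDF, then query them with SPARQL.
from rdflib import Graph, Namespace, Literal, RDF

EX = Namespace("http://example.org/sensors#")
g = Graph()
g.add((EX["t-101"], RDF.type, EX.TemperatureSensor))
g.add((EX["t-101"], EX.hasValue, Literal(21.5)))
g.add((EX["h-200"], RDF.type, EX.HumiditySensor))
g.add((EX["h-200"], EX.hasValue, Literal(40.0)))

query = """
PREFIX ex: <http://example.org/sensors#>
SELECT ?sensor ?value
WHERE { ?sensor a ex:TemperatureSensor ; ex:hasValue ?value . }
"""
for row in g.query(query):
    print(row.sensor, row.value)
```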

12.3.4.4 Distributed Data Processing

In the WoT data environment, data change in their types, states and analysis purposes. Rather than centralized master-server implementations, a parallel and distributed data processing framework is needed to enable the execution of the MapReduce pattern in dynamic information infrastructures.

MapReduce is not perfect for every large-scale analytical task, and its high communication cost and redundant processing pose a big challenge for WoT applications. An approach that uses the MapReduce framework for large-scale graph data processing was given in [51]. The approach relies on density-based partitioning to build balanced partitions of a graph database over a set of machines. The experiments show that its performance and scalability are satisfactory for large-scale data processing. In addition, a technical framework for improving MapReduce was given in [52].

In [53], a parallel distributed processing system was proposed for data analysis. The system manages the dependency relations between data items, as well as between data and analytic programs. The system aims to make these dependencies explicit and uses Hadoop Streaming for distributed parallel processing. Certain parts of a program are executed repeatedly, possibly with different data each time; the specification filters these executions and checks their dependencies separately at each execution.

In [54], a storage system with high security and scalability based on a revised secret sharing scheme was proposed. The system is composed of two scalable, flexible and reliable layers: a data layer and a system layer. Using a secret sharing scheme avoids the complicated key management required by traditional cryptographic algorithms. Moreover, multiple storage servers for WoT data work together to achieve a large storage capacity, while individual servers can still join or leave flexibly at the system layer.

In [55], the authors proposed vRead, a programmable framework that connects HDFS I/O flows directly to application data. vRead allows VMs to ‘read’ data nodes in disk images, which improves I/O flow without the overhead of virtualization.

12.3.4.5 Comparison Between Data Operations Inside the Platform

Data operations play a fundamental role in inner platform support. Based on the disposing steps, a comparison is given in Table 12.3.

Table 12.3

Comparison Between Data Disposing Methods

Research | Data Resources | Main Methods | Topic
EDAL [35] | Data in the WSN (Wireless Sensor Network) domain | (1) Modelled like open vehicle routing problems; (2) a centralized meta-heuristic; (3) a distributed heuristic | Data Collection
Mobile data collection framework [36] | Mobile sensor data | Four basic requirements plus additional issues | Data Collection
A sensor data stream delivery system [38] | Sensory data stream | A sensor delivery system with flexible delivery cycles for different clients | Data Collection (Data Stream Delivery)
Data collection for large-scale mobile monitoring applications [39] | Sensory data with tags | Methods to improve data transmission efficiency, protect data privacy and avoid malicious selective forwarding during data transmission | Data Collection
Data alignment for multiple temporal data streams [44] | Data streams in heterogeneous networks | (1) An MPTN gateway for messaging and data alignment among multiple networks; (2) a coordinated model to collect concurrent data streams and convert time | Data Collection (Data Alignment)
Constructing the web of events from raw data in the web of things [22] | Heterogeneous and massive raw data | (1) Conceptions (event, event type, link type); (2) a three-layered model | Data Pre-disposing (Information Extraction)
Data preservation in data-intensive sensor networks with spatial correlation [40] | Sensory data | Considering spatial correlation to reduce redundant information | Data Pre-disposing
Data cleaning for RFID and WSN integration [41] | RFID data | (1) A five-layer system architecture developed to integrate WSNs and RFID; (2) Bluetooth and ZigBee selected as communication protocols; (3) ICRDC used for redundant data elimination | Data Pre-disposing (Data Cleaning)
A novel learning method to classify data streams in the Internet of Things [42] | High volumes of multi-dimensional unlabelled data streams | (1) Data stream classification methods based on SVM; (2) dimension reduction methods based on SAX density | Information Fusion (Data Classification and Labelling)
Automatic sensor data stream segmentation for real-time activity prediction in smart spaces [47] | Sensor data stream and time window | Automatic segmentation methods based on the peak value of JWD | Information Fusion (Activity Prediction)
A new clustering algorithm for sensor data streams in an agricultural IoT [49] | Various types of sensor data streams in an agricultural IoT | A new data stream clustering algorithm based on a sliding window and micro-cluster merging | Information Fusion (Data Stream Clustering)
Dynamic sensor event segmentation for real-time activity recognition in a smart home context [48] | Sensory data | Online sensor data segmentation with a two-layer strategy: sensor correlation and time correlation | Information Fusion (Data Segmentation)
Parallel, distributed, and differential processing system [53] | Sensory data | Managing dependency relations between data items, and between data and analytic programs | Distributed Data Disposing


(1) Data collection is always the first step in WoT applications. Considering that WoT data are often large-scale, dynamic and sampled at high rates, some research focuses on how to organize data collection tasks, and a large part of the research focuses on data transmission [35,38,49]. The authors of [36] built a general mobile data collection framework and discussed four basic requirements and some open issues in current mobile data collection frameworks. The authors of [37] made use of the OpenWoT middleware and designed an intelligent server for real-time acquisition. Moreover, some research pays attention to open issues in data collection, such as privacy [45].

(2) Data pre-disposing is carried out to prepare for further data operations. According to their different purposes, data pre-disposing methods can be divided into data preservation [40], data cleaning [41,43], data alignment [44] and so on. The authors of [42] proposed a classification method for data streams based on supervised classification approaches such as the Support Vector Machine (SVM) and reduced the volume of data.

(3) After collecting data from sensors, how to dispose the semi-structured, streaming data to extract information is another problem. The authors of [22] proposed an approach to extract event information from heterogeneous and massive raw data. The authors of [47,48] proposed data segmentation methods in order to recognize and predict human activities, and the authors of [49] designed a new data stream clustering algorithm for an agricultural WoT platform. Some of these works make use of sliding windows to implement real-time processing.

(4) Despite its evident merits such as scalability, fault tolerance and flexibility, MapReduce [51] has limitations in interactive or real-time processing when handling distributed WoT data disposing. It is not perfect for every large-scale analytical task [54], and its high communication cost and redundant processing pose a big challenge for IoT applications. Therefore, some optimization work on WoT data is still needed for large-scale processing.

To sum up, research on WoT data operations is mainly concentrated on the following aspects.

Firstly, researchers will pay more attention to disposing of the different characteristics of WoT data, such as the elimination of redundant data, the alignment and merging of heterogeneous data, and online analysis of dynamic data. Secondly, researchers will take integration with existing technologies into consideration. Since data in WoT applications are typically large-scale and sampled at high rates, WoT data disposing will be combined with distributed computing and stream computing technologies such as Hadoop and Storm [56]. Thirdly, researchers will focus on more usage scenarios, such as activity recognition and complex cooperation.

12.3.5 Data Service for External Application Interoperation

Data services are used to provide functional support for WoT applications. The purposes of data service construction can be divided into three functional aspects: data interoperability, data-centric service composition and data analysis.

12.3.5.1 Data Interoperability

Although the development of data interoperability keeps innovating, some challenges remain. Researchers proposed a hub-centric framework [57] for interoperability and validated it in a large-scale WoT environment.

A novel Semantic WoT framework based on the Constrained Application Protocol (CoAP) was proposed [58]. The framework supports the retrieval of annotated resources and logical ranking on the basis of semantic matchmaking services with a non-standard interface. To detect high-level events and specify them using machine-readable metadata, the framework also includes data mining approaches to deal with the raw data gathered.

The authors of [59] proposed SAMPLES to classify network traffic generated by mobile applications. SAMPLES is composed of an offline part and an online part. The offline part is a training system in charge of rule generation, and the online part is an engine for application identification and traffic classification. For each input flow, a subset of conjunctive rules is applied to the flow on the basis of pre-filtering conditions. Conjunctive rules are determined by the lexical context and a unique application identifier in the HTTP header.

12.3.5.2 Data-Centric Service Composition

Mashup tools [60] are used for the development of WoT applications; they can connect the dataflow between applications and devices in a graphical way. RESTful interfaces are generated from the WoT data models, which represent sets of sensors and actuators. Generic components extend existing mashup concepts and apply ideas similar to polymorphic functions in many programming languages.

By composing web services with data streams from WoT devices [61], WoT devices are connected with web services in an efficient and extensible way. Thus real-time communication, device integration and data stream mashups are elaborated.

DiscoWoT [62] provides a semantic discovery service that supports the discovery of the functionality of smart things by humans and machines. Based on multiple discovery strategies, DiscoWoT exposes all strategies through a RESTful interface that can be created or updated by users at runtime.

12.3.5.3 Data Analysis

To make the WoT smarter [63], data mining was introduced into applications. A system architecture for a WoT and big data mining system was proposed, in which many WoT devices are integrated to perceive the world and generate data continuously. The system focuses on the integration of devices and data mining technologies, where data mining functions are provided as services.

Condor [64] was proposed to handle the data-parallel execution of analysis algorithms in WoT systems. The analytic processes are naturally data-parallel but their executions are not. Therefore, how to execute these processes simultaneously within a fixed time becomes an important challenge. The architecture of the framework allows any algorithms to be executed synchronously, treating them as black boxes.

In [65], an overview of related issues and challenges in big data provenance research was presented, such as accessing big data, the minimal computational overhead requirement and so on.

To sum up, RESTful services are the main form of support for external applications. Application integration across heterogeneous and distributed environments is implemented by means of RESTful services. Thus, a flexible application construction and execution environment is provided for application interaction. However, combining REST APIs with the inner distributed disposing environment for massive data analysis is not an easy task.

12.4 WoT Data Storage in Cloud Platform

In WoT applications, massive data from sensors consume a large amount of storage space. Meanwhile, since different roles and tenants require different service and security levels, data should be isolated according to various performance and safety requirements. How to share and isolate these data in a cloud platform is the main challenge in WoT data storage.

12.4.1 The Integration of Web of Things with Cloud Computing

Developments in cloud computing and the WoT provide a promising way to support the increasing number of WoT applications. CloudWoT [66] was proposed to integrate cloud computing and the WoT and bridge the gap between them, which brings new opportunities in both the technology and business areas.

The concept of Database-as-a-Service (DBaaS) [67] was introduced to move the operational burden from database users to service operators, which means that configuration, performance tuning, backup and so on are the responsibility not of the database users but of the service operators. Early DBaaS offerings such as Microsoft SQL Azure and Amazon RDS try to provide such services but do not pay much attention to multi-tenancy, the flexible scalability challenge or database privacy.

A new multi-layer vehicular data cloud [68] was presented with the support of cloud computing and WoT techniques. Two new cloud services, a smart parking service and a vehicular data mining service, were also presented to analyze vehicle warranty data in the WoT environment. Two models based on Naïve Bayes and logistic regression, integrating all available sensors and devices in vehicles and on roads, were proposed.

Links as a Service (LaaS) [69] was proposed as an innovative abstraction in the cloud. It provides isolation of network links to decrease interference in the cloud network. A unique link set is assigned to each tenant and these links form a virtual fat-tree. With these links, each tenant behaves as if it were the only application in the shared cloud, obtaining the same bandwidth and delay. Finally, the forwarding mechanism fits each tenant.

A transactional DBaaS named Relational Cloud [70] was introduced to address the challenges of DBaaS. Relational Cloud has three significant technical characteristics: workload awareness, graph-based data partitioning and adjustable security. Firstly, the cloud hosts multiple tenants, and the system implements an approach that identifies workloads that can be co-located on a server to gain good performance and high consolidation. Secondly, by exploiting a graph-based data partitioning algorithm, the system achieves near-linear, flexible scale-out no matter whether the transactions are simple or complicated. Thirdly, the system provides an adjustable security scheme that allows certain queries to access encrypted data in secure situations. The concept of workload awareness underlies the system design: by monitoring data accesses and query patterns, the system gathers useful information for various optimization and security functions, which eliminates configuration effort for users and operators.

As the adoption of the cloud-based WoT is hindered by severe privacy concerns, the authors of [71] presented a comprehensive privacy solution to encourage users to apply it widely in different areas. In this approach, potentially sensitive data are protected before being uploaded to the cloud, the privacy functionality is packaged as a service, whether information is private is decided by users instead of developers, and users can configure privacy easily through transparent interfaces.

12.4.2 Performance Isolation for Multi-tenant Data Storage in Cloud Platform

Aiming to assure isolation among tenants, the authors of [72] proposed an approach based on a fitness function and applied some optimizations to obtain accurate weights that reflect different requirements.

An abstraction for performance isolation named SQLVM [73] was presented by researchers at Microsoft. It is implemented by reserving key resources, including CPU, I/O and memory, for tenants on the database server. The main issue is that resource allocation in a relational database system is static, whereas the abstraction needs to allocate resources to tenants dynamically. Meanwhile, the overheads need to be low and the scale may grow very large, so overhead and scalability are also great challenges. SQLVM can effectively isolate the performance of a tenant from the others while these tenants are co-located on the same database server. Multiple scripted scenarios and a framework for data collection and visualization are used to demonstrate the SQLVM abstraction for performance isolation.

The authors of [74] focused on performance isolation when executing multi-tenant SaaS applications. They proposed a middleware architecture that uses a tenant-aware scheduler and profiler, based on tenant-specific SLAs, to enhance performance isolation. The prototype they implemented shows satisfactory preliminary results.

In [75], the authors presented a resource allocation method for multi-tenant cloud environments based on understanding the subtle interference between network, compute and storage resources. The experiments provide insights that help cloud administrators decide how best to map virtual cores to physical cores, considering the effect of advanced virtual network technologies on remote block I/O performance.

To compare different performance isolation strategies, a standard metric is needed to quantitatively measure the isolation capability of a cloud. Such a metric should treat the cloud environment as a black box and rely only on external benchmarks. In [76], the authors proposed three different metrics and applied them to a simulated case mimicking multiple tenants sharing one SaaS application instance.
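
As a rough illustration of the black-box idea, one can measure how much the throughput of a well-behaved (abiding) tenant degrades while a co-located disruptive tenant increases its load. The sketch below defines one such score under our own assumptions; it is not one of the exact metric definitions of [76].

```python
def isolation_metric(abiding_throughput):
    """abiding_throughput: list of (disruptive_load, throughput_of_abiding_tenant)
    pairs measured from outside the system. Returns a value in [0, 1]:
    1 means the abiding tenant is unaffected (perfect isolation), 0 means its
    throughput collapses in proportion to the disruptive load (no isolation).
    Illustrative only, not the exact definitions of [76]."""
    l0, t0 = abiding_throughput[0]                  # baseline load and throughput
    degradations = []
    for load, tput in abiding_throughput[1:]:
        load_increase = (load - l0) / l0
        tput_drop = max(0.0, (t0 - tput) / t0)
        degradations.append(min(1.0, tput_drop / load_increase))
    return 1.0 - sum(degradations) / len(degradations)

# throughput of the abiding tenant while the disruptive tenant doubles, then triples, its load
print(isolation_metric([(100, 500), (200, 490), (300, 480)]))   # close to 1: good isolation
```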

The authors of [77] presented Pisces, a system in which every tenant in a data centre is isolated and receives a fair share of a shared key-value storage service. Previous resource allocation strategies rely on per-VM allocations and fixed rate limits aimed at aggregate workloads. Pisces instead assigns shared resources and service in proportion to each tenant's weight. The approach also works when tenants are co-located and when requests across partitions are skewed, time-varying or bottlenecked by different server resources.
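
The weighted-sharing principle can be illustrated with a small max-min allocation sketch: capacity is split in proportion to tenant weights, and any share a tenant does not need is redistributed to the others. This is our own simplified illustration, not Pisces' actual mechanisms, which combine partition placement, weight allocation, replica selection and fair queuing.

```python
def weighted_fair_share(capacity, weights, demands):
    """Split total capacity among tenants in proportion to their weights,
    redistributing any share a tenant does not need (illustrative only)."""
    share = {}
    active = dict(demands)
    remaining = capacity
    while active:
        total_w = sum(weights[t] for t in active)
        granted_any = False
        for t in list(active):
            fair = remaining * weights[t] / total_w
            if active[t] <= fair:            # tenant needs less than its fair share
                share[t] = active[t]
                remaining -= active[t]
                del active[t]
                granted_any = True
        if not granted_any:                  # everyone is bottlenecked: split what is left
            for t in active:
                share[t] = remaining * weights[t] / total_w
            break
    return share

# 10,000 req/s shared by three tenants with weights 1:1:2
print(weighted_fair_share(10000, {"A": 1, "B": 1, "C": 2},
                          {"A": 1500, "B": 6000, "C": 6000}))
```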

In [78], the authors implemented an adaptive middleware that enables providers to efficiently enforce different and competing performance requirements in multi-tenant Software-as-a-Service (SaaS) applications. It can manage a combination of performance constraints in terms of latency, throughput and deadlines at a fine-grained level, and it responds rapidly to changing circumstances while preserving the resource usage efficiency of application-level multi-tenancy.

In short, sharing and isolating data in a cloud platform remains a major challenge for WoT data storage, given the characteristics of different applications. There is still a strong tension between user authority and performance flexibility, and performance isolation has to be implemented at different levels with different data types in mind. Therefore, how to design a data management model that reconciles secure sharing with performance isolation is the central difficulty in current studies of data management in cloud computing.

12.5 Tendency for WoT Data Storage Technology

Currently, we are stepping into a new stage of Web 3.0, which attracts wider cooperation and crowd-sourcing in both information creation and information consumption. Therefore, data storage techniques for WoT applications are also moving towards a new stage. Future technical tendencies and some open issues are given from the aspects of complex data representation, data storage and management, and real-time disposing mechanisms, namely smart contextual models for data representation, big linked data for semantic data storage and management, and data stream mining for real-time data analysis and application.

12.5.1 Smart Contextual Models for Complex Data Representation

WoT data as a service faces the issues of interoperability and re-usability for massive heterogeneous sensor data and data services. Therefore, how to model a smart device as an intelligent and self-organizing contextual model in the cloud platform is an important open issue.

The authors of [79] designed a development platform named Semantic Web of Things (SWoT). The platform provides semantic-based WoT application templates so that developers can construct interoperable SWoT applications, and high-level abstractions bind sensor measurements into the templates, which helps to reuse background or domain knowledge. As a result, a unified platform for implementing interoperable semantic-based WoT applications becomes readily available.

Considering that complex data associations arise from different sources and complex data structures, extracting relevant information in a multilingual context from massive amounts of unstructured, semi-structured and structured data is a challenging task. Various theories have been developed and applied to ease access to multicultural and multilingual resources. With the development of intelligent WoT applications, enhanced intelligence and contextualization models will enrich WoT with more expressive semantic associations and support reasoning about social interactions between smart things. This will help smart things form a convenient and powerful environment for intelligent WoT applications.

12.5.2 Big Linked Data for Semantic Data Storage and Management

Linked Data denotes relationships or connections between data from different data sources such as databases and the Web. For effective data management, semantic annotation based on linked data offers a new approach in massive, complexly associated and contextual application scenes, and such associated and contextual data play a critical role in intelligent applications.

The authors of [80] described and annotated WoT data streams by means of linked data. A novel semantic model containing observation and measurement data is built to create expressive descriptions of sensor streams, and the model is shown to be efficient, reducing the size of the representations of the stream data.
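
To make the annotation idea concrete, the sketch below builds a linked-data description of a single sensor observation with the rdflib library, using the W3C SOSA/SSN vocabulary as an example. The stream namespace and resource names are hypothetical, and the actual model of [80] uses its own vocabulary and structure.

```python
from rdflib import Graph, Literal, Namespace, RDF
from rdflib.namespace import XSD

SOSA = Namespace("http://www.w3.org/ns/sosa/")
EX = Namespace("http://example.org/stream/")          # hypothetical stream namespace

g = Graph()
g.bind("sosa", SOSA)

# one observation from a temperature sensor, annotated as linked data
obs = EX["observation/42"]
g.add((obs, RDF.type, SOSA.Observation))
g.add((obs, SOSA.madeBySensor, EX["sensor/temp-01"]))
g.add((obs, SOSA.observedProperty, EX["property/air-temperature"]))
g.add((obs, SOSA.hasSimpleResult, Literal(21.4, datatype=XSD.decimal)))
g.add((obs, SOSA.resultTime, Literal("2017-05-04T10:15:00Z", datatype=XSD.dateTime)))

print(g.serialize(format="turtle"))
```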

In short, driven by semantic technologies such as linked data and ontologies, semantic data processing approaches can be expected to improve greatly in the near future, and more natural, meaningful ways of handling high-level information will become common in different WoT areas. Combined with natural language processing, semantic technology will be used to create more intelligent applications.

12.5.3 Data Stream Disposing for Real-Time Data Operation

Unstructured data such as video cannot be stored in a structured database system for analysis purposes, and data mining on data streams from different data sources with non-persistent associations is a new but important issue. There are several directions for processing data streams with dynamic methods, for example retrieving features from a continuous data stream so as to build data associations, or processing a whole fragment of a data stream through function transformation. A minimal illustration of the first direction is sketched below.
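
The following sketch extracts simple features from a continuous stream over a fixed-size sliding window; the window size, feature set and class name are our own illustrative choices rather than any method from the surveyed work.

```python
from collections import deque
from statistics import mean, pstdev

class WindowFeatures:
    """Extract simple features (mean, deviation, trend) from a continuous
    sensor stream over a fixed-size sliding window (illustrative sketch)."""
    def __init__(self, size=10):
        self.window = deque(maxlen=size)

    def push(self, value):
        self.window.append(value)
        if len(self.window) < self.window.maxlen:
            return None                       # not enough history yet
        w = list(self.window)
        return {"mean": mean(w),
                "stdev": pstdev(w),
                "trend": w[-1] - w[0]}        # crude drift indicator

extractor = WindowFeatures(size=5)
for reading in [20.1, 20.3, 20.2, 24.8, 25.1, 25.3]:
    features = extractor.push(reading)
    if features:
        print(features)
```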

The authors of [81] argue that future WoT infrastructure research should focus on real-time interaction. They design a WoT micro-benchmark combining cloud computing, service decomposition and multi-threaded programming, and evaluate it on a real WoT system.

The Streaming Linked Data (SLD) framework [82] provides a pluggable system for the analysis of RDF streams. Through a set of visualization widgets, data streams can be collected and analysed using semantic techniques, and the Streaming Linked Data Format can be used flexibly in distributed environments.

Data stream mining involves uncertain reasoning over partitioned data and reuses intermediate results for efficiency. When unstructured and semi-structured data are also involved in the process, many research and technical problems remain open.

12.6 Conclusion

As WoT technologies evolve and play an important role in many applications, this chapter surveys recent literature to give an overview of WoT data storage research.

To provide a clear insight into different WoT systems and techniques, a WoT data storage framework with a multi-layer structure is given first. Related techniques are then described and discussed from the view of the data disposing process, covering data representation, storage, management, inner data operations and external data services.

The cloud platform is a popular information infrastructure for current WoT applications. Data isolation and multi-tenant data storage are discussed to provide a critical and accurate picture of current WoT data management in cloud platforms, which is significant for WoT applications aiming at higher availability and flexible resource provision in the cloud.

Aiming to indicate future tendencies for WoT data storage techniques, some open issues are given from the considerations of complex data models, semantic data management and real-time data disposing.

In short, data storage techniques can offer competitive advantages to intelligent WoT applications. However, much effort is still needed to respond to the highly heterogeneous, massive dynamic and weakly semantic features of WoT data.

References

[1] Edgar F. Codd, A relational model of data for large shared data banks, Commun ACM 1970;13(6):377–387.

[2] Rabi Prasad Padhy, Manas Ranjan Patra, Suresh Chandra Satapathy, Rdbms to nosql: reviewing some next-generation non-relational databases, Int J Adv Eng Sci Technol 2011;11(1):15–30.

[3] Shahram Ghandeharizadeh, Jason Yap, Cache augmented database management systems, In: Proceedings of the ACM SIGMOD workshop on databases and social networks. ACM; 2013:31–36.

[4] Matteo Brucato, Juan Felipe Beltran, Azza Abouzied, Alexandra Meliou, Scalable package queries in relational database systems, Proc VLDB Endow 2016;9(7):576–587.

[5] Han-Sheng Huang, Shih-Hao Hung, Chih-Wei Yeh, Load balancing for hybrid nosql database management systems, In: Proceedings of the 2015 conference on research in adaptive and convergent systems. ACM; 2015:80–85.

[6] Dinh-Mao Bui, Shujaat Hussain, Eui-Nam Huh, Sungyoung Lee, Adaptive replication management in hdfs based on supervised learning, IEEE Trans Knowl Data Eng 2016;28(6):1369–1382.

[7] Ashish Thusoo, Joydeep Sen Sarma, Namit Jain, Zheng Shao, Prasad Chakka, Ning Zhang, Suresh Antony, Hao Liu, Raghotham Murthy, Hive-a petabyte scale data warehouse using hadoop, In: 2010 IEEE 26th international conference on data engineering (ICDE). IEEE; 2010:996–1005.

[8] Dipayan Dev, Ripon Patgiri, Performance evaluation of hdfs in big data management, In: 2014 international conference on high performance computing and applications (ICHPCA). IEEE; 2014:1–7.

[9] Viktor Leis, Alfons Kemper, Thomas Neumann, Exploiting hardware transactional memory in main-memory databases, In: 2014 IEEE 30th international conference on data engineering (ICDE). IEEE; 2014:580–591.

[10] Suprio Ray, Rolando Blanco, Anil K. Goel, Supporting location-based services in a main-memory database, In: 2014 IEEE 15th international conference on mobile data management (MDM), vol. 1. IEEE; 2014:3–12.

[11] Xiao Yao, Qiang Qiu, Mengfei Zhang, Cuiting Chen, Jinyun Fang, Research on vector spatial data access based on main memory database, In: 2015 IEEE international geoscience and remote sensing symposium (IGARSS). IEEE; 2015:4704–4707.

[12] Oreoluwatomiwa O. Babarinsa, Stratos Idreos, Jafar: near-data processing for databases, In: Proceedings of the 2015 ACM SIGMOD international conference on management of data. ACM; 2015:2069–2070.

[13] Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber, Bigtable: a distributed storage system for structured data, ACM Trans Comput Syst (TOCS) 2008;26(2):4.

[14] Wei Wei, Ting Yu, Rui Xue, ibigtable: practical data integrity for bigtable in public cloud, In: Proceedings of the third ACM conference on data and application security and privacy. ACM; 2013:341–352.

[15] Chad Vicknair, Michael Macias, Zhendong Zhao, Xiaofei Nan, Yixin Chen, Dawn Wilkins, A comparison of a graph database and a relational database: a data provenance perspective, In: Proceedings of the 48th annual southeast regional conference. ACM; 2010:42.

[16] Xifeng Yan, Philip S. Yu, Jiawei Han, Graph indexing: a frequent structure-based approach, In: Proceedings of the 2004 ACM SIGMOD international conference on management of data. ACM; 2004:335–346.

[17] Leonid Libkin, Wim Martens, Domagoj Vrgoč, Querying graph databases with xpath, In: Proceedings of the 16th international conference on database theory. ACM; 2013:129–140.

[18] Mark Graves, Ellen R. Bergeman, Charles B. Lawrence, Graph database systems, IEEE Eng Med Biol Mag 1995;14(6):737–745.

[19] Pablo Barceló Baeza, Querying graph databases, In: Proceedings of the 32nd symposium on principles of database systems. ACM; 2013:175–188.

[20] Haining Yu, Binxing Fang, Xiangzhan Yu, Juan Chen, Semantic surface representation of physical entity in the web of things, In: 2012 IEEE 2nd international conference on cloud computing and intelligent systems (CCIS), vol. 3. IEEE; 2012:1032–1036.

[21] Yunchuan Sun, Antonio J. Jara, An extensible and active semantic model of information organizing for the internet of things, Pers Ubiquitous Comput 2014;18(8):1821–1833.

[22] Yunchuan Sun, Hongli Yan, Cheng Lu, Rongfang Bie, Zhangbing Zhou, Constructing the web of events from raw data in the web of things, Mob Inf Syst 2014;10(1):105–125.

[23] Chao Qu, Fagui Liu, Ming Tao, Dacheng Deng, An owl-s based specification model of dynamic entity services for internet of things, J Ambient Intell Humaniz Comput 2016;7(1):73–82.

[24] Benoit Christophe, Semantic profiles to model the “web of things”, In: 2011 seventh international conference on semantics knowledge and grid (SKG). IEEE; 2011:51–58.

[25] Ricardo Aparecido Perez de Almeida, Michael Blackstock, Rodger Lea, Roberto Calderon, Antonio Francisco do Prado, Helio Crestana Guardia, Thing broker: a twitter for things, In: Proceedings of the 2013 ACM conference on pervasive and ubiquitous computing adjunct publication. ACM; 2013:1545–1554.

[26] Anusuriya Devaraju, Werner Kuhn, Chris S. Renschler, A formal model to infer geographic events from sensor observations, Int J Geogr Inf Sci 2015;29(1):1–27.

[27] Benjamin Harbelot, Helbert Arenas, Christophe Cruz, Continuum: a spatiotemporal data model to represent and qualify filiation relationships, In: Proceedings of the 4th ACM SIGSPATIAL international workshop on GeoStreaming. ACM; 2013:76–85.

[28] Sergio Consoli, Misael Mongiovic, Andrea G. Nuzzolese, Silvio Peroni, Valentina Presutti, Diego Reforgiato Recupero, Daria Spampinato, A smart city data model based on semantics best practice and principles, In: Proceedings of the 24th international conference on world wide web companion. 2015:1395–1400 International World Wide Web Conferences Steering Committee.

[29] Yixue Wang, HaiTao Lv, Efficient metadata management in cloud computing, In: 2011 IEEE 3rd international conference on communication software and networks (ICCSN). 2011:514–519.

[30] Daniel Beatty, Noé Lopez-Benitez, Mobile metadata for web-based image query services, In: 2010 eleventh international conference on mobile data management (MDM). IEEE; 2010:53–58.

[31] Simon Mayer, Gianin Basler, Semantic metadata to support device interaction in smart environments, In: Proceedings of the 2013 ACM conference on pervasive and ubiquitous computing adjunct publication. ACM; 2013:1505–1514.

[32] Floriano Scioscia, Mario Binetti, Michele Ruta, Saverio Ieva, Eugenio Di Sciascio, A framework and a tool for semantic annotation of pois in openstreetmap, Proc, Soc Behav Sci 2014;111:1092–1101.

[33] Frieder Ganz, Payam Barnaghi, Francois Carrez, Automated semantic knowledge acquisition from sensor data, IEEE Syst J 2016;10(3):1214–1225.

[34] Hongming Cai, Mengwei Shi, Boyi Xu, Mingjiu Yu, Semantic annotation for web3d scene based on three-layer ontology, Integr Comput-Aided Eng 2015;22(1):87–101.

[35] Yanjun Yao, Qing Cao, Athanasios V. Vasilakos, Edal: an energy-efficient, delay-aware, and lifetime-balancing data collection protocol for heterogeneous wireless sensor networks, IEEE/ACM Trans Netw 2015;23(3):810–823.

[36] Paul Y. Cao, Gang Li, Guoxing Chen, Biao Chen, Mobile data collection frameworks: a survey, In: Proceedings of the 2015 workshop on mobile big data. ACM; 2015:25–30.

[37] Hugo Hromic, Danh Le Phuoc, Martin Serrano, Aleksandar Antonic, Ivana P. Zarko, Conor Hayes, Stefan Decker, Real time analysis of sensor data for the internet of things by means of clustering and event processing, In: 2015 IEEE international conference on communications (ICC). IEEE; 2015:685–691.

[38] Tomoki Yoshihisa, Yoshimasa Ishi, Kodai Mako, Tomoya Kawakami, Yuuichi Teranishi, A sensor data stream delivery system with different delivery cycles for iot environments, In: 2015 10th international conference on P2P, parallel, grid, cloud and internet computing (3PGCIC). IEEE; 2015:748–753.

[39] Haiying Shen, Ze Li, Lei Yu, Chenxi Qiu, Efficient data collection for large-scale mobile monitoring applications, IEEE Trans Parallel Distrib Syst 2014;25(6):1424–1436.

[40] Nathaniel Crary, Bin Tang, Setu Taase, Data preservation in data-intensive sensor networks with spatial correlation, In: Proceedings of the 2015 workshop on mobile big data. ACM; 2015:7–12.

[41] Li Wang, Li Da Xu, Zhuming Bi, Yingcheng Xu, Data cleaning for rfid and wsn integration, IEEE Trans Ind Inform 2014;10(1):408–418.

[42] Muhammad Asad Khan, Ajmal Khan, Muhammad Nasir Khan, Sohel Anwar, A novel learning method to classify data streams in the internet of things, In: 2014 national software engineering conference (NSEC). IEEE; 2014:61–66.

[43] Jiangang Ma, Quan Z. Sheng, Dong Xie, Jen Min Chuah, Yongrui Qin, Efficiently managing uncertain data in rfid sensor networks, World Wide Web 2015;18(4):819–844.

[44] Chi-Sheng Shih, Chan-Ming Yang, Yen-Chien Cheng, Data alignment for multiple temporal data streams without synchronized clocks on iot fusion gateway, In: 2015 IEEE international conference on data science and data intensive systems. IEEE; 2015:667–674.

[45] Farin Rahman, Doug Williams, Sheikh Iqbal Ahamed, Ji-Jiang Yang, Qing Wang, Pridac: privacy preserving data collection in sensor enabled rfid based healthcare services, In: 2014 IEEE 15th international symposium on high-assurance systems engineering (HASE). IEEE; 2014:236–242.

[46] Muntazir Mehdi, Ratnesh Sahay, Wassim Derguech, Edward Curry, On-the-fly generation of multidimensional data cubes for web of things, In: Proceedings of the 17th international database engineering & applications symposium. ACM; 2013:28–37.

[47] Hyunjeong Cho, Jihoon An, Intaek Hong, Younghee Lee, Automatic sensor data stream segmentation for real-time activity prediction in smart spaces, In: Proceedings of the 2015 workshop on IoT challenges in mobile and industrial systems. ACM; 2015:13–18.

[48] Jie Wan, Michael J. O'Grady, Gregory M.P. O'Hare, Dynamic sensor event segmentation for real-time activity recognition in a smart home context, Pers Ubiquitous Comput 2015;19(2):287–301.

[49] Mingze Wu, Yitong Wang, Zhicheng Liao, A new clustering algorithm for sensor data streams in an agricultural iot, In: 2013 IEEE 10th international conference on high performance computing and communications & 2013 IEEE international conference on embedded and ubiquitous computing. IEEE; 2013:2373–2378.

[50] Xiaoming Zhang, Yunping Zhao, Xiang Wang, Dongyu Pan, An approach to provide visual data service for heterogeneous sensor data based on ssn ontology, In: 2015 international conference on identification, information, and knowledge in the internet of things (IIKI). IEEE; 2015:254–257.

[51] Sabeur Aridhi, Laurent d'Orazio, Mondher Maddouri, Engelbert Mephu Nguifo, Density-based data partitioning strategy to approximate large-scale subgraph mining, Inf Sci 2015;48:213–223.

[52] Christos Doulkeridis, Kjetil Nørvåg, A survey of large-scale analytical query processing in mapreduce, VLDB J 2014;23(3):355–380.

[53] Takamichi Toda, Sozo Inoue, Lin Li, Parallel, distributed, and differential processing system for human activity sensing flows, In: Proceedings of the 2013 ACM conference on pervasive and ubiquitous computing adjunct publication. ACM; 2013:689–700.

[54] Hai Jiang, Feng Shen, Su Chen, Kuan-Ching Li, Young-Sik Jeong, A secure and scalable storage system for aggregate data in iot, Future Gener Comput Syst 2015;49:133–141.

[55] Cong Xu, Brendan Saltaformaggio, Sahan Gamage, Ramana Rao Kompella, Dongyan Xu, vread: efficient data access for hadoop in virtualized clouds, In: Proceedings of the 16th annual middleware conference. ACM; 2015:125–136.

[56] Ankit Toshniwal, Siddarth Taneja, Amit Shukla, Karthik Ramasamy, Jignesh M. Patel, Sanjeev Kulkarni, Jason Jackson, Krishna Gade, Maosong Fu, Jake Donham, Nikunj Bhagat, Sailesh Mittal, Dmitriy Ryaboy, Storm@twitter, In: Proceedings of the 2014 ACM SIGMOD international conference on management of data, SIGMOD '14. ACM; 2014:147–156.

[57] Michael Blackstock, Rodger Lea, Toward interoperability in a web of things, In: Proceedings of the 2013 ACM conference on pervasive and ubiquitous computing adjunct publication. ACM; 2013:1565–1574.

[58] Michele Ruta, Floriano Scioscia, Allan Pinto, Eugenio Di Sciascio, Fabiana Gramegna, Saverio Ieva, Giuseppe Loseto, Resource annotation, dissemination and discovery in the semantic web of things: a coap-based framework, In: Green computing and communications (GreenCom), 2013 IEEE and internet of things (iThings/CPSCom), IEEE international conference on and IEEE cyber, physical and social computing. IEEE; 2013:527–534.

[59] Hongyi Yao, Gyan Ranjan, Alok Tongaonkar, Yong Liao, Zhuoqing Morley Mao, Samples: self adaptive mining of persistent lexical snippets for classifying mobile application traffic, In: Proceedings of the 21st annual international conference on mobile computing and networking. ACM; 2015:439–451.

[60] Christian Prehofer, Dominik Schinner, Generic operations on restful resources in mashup tools, In: Proceedings of the 6th international workshop on the web of things. ACM; 2015:3.

[61] Robert Kleinfeld, Stephan Steglich, Lukasz Radziwonowicz, Charalampos Doukas, glue.things: a mashup platform for wiring the internet of things with the internet of services, In: Proceedings of the 5th international workshop on web of things. ACM; 2014:16–21.

[62] Simon Mayer, Dominique Guinard, An extensible discovery service for smart things, In: Proceedings of the second international workshop on web of things. ACM; 2011:7.

[63] Feng Chen, Pan Deng, Jiafu Wan, Daqiang Zhang, Athanasios V. Vasilakos, Xiaohui Rong, Data mining for the internet of things: literature review and challenges, Int J Distrib Sens Netw 2015;2015(12).

[64] Arijit Mukherjee, Swarnava Dey, Himadri Sekhar Paul, Batsayan Das, Utilising condor for data parallel analytics in an iot context—an experience report, In: 2013 IEEE 9th international conference on wireless and mobile computing, networking and communications (WiMob). IEEE; 2013:325–331.

[65] Alfredo Cuzzocrea, Provenance research issues and challenges in the big data era, In: 2015 IEEE 39th annual computer software and applications conference (COMPSAC), vol. 3. IEEE; 2015:684–686.

[66] Alessio Botta, Walter de Donato, Valerio Persico, Antonio Pescapé, Integration of cloud computing and internet of things: a survey, Future Gener Comput Syst 2016;56:684–700.

[67] Wolfgang Lehner, Kai-Uwe Sattler, Database as a service (dbaas), In: 2010 IEEE 26th international conference on data engineering (ICDE 2010). IEEE; 2010:1216–1217.

[68] Wu He, Gongjun Yan, Li Da Xu, Developing vehicular data cloud services in the iot environment, IEEE Trans Ind Inform 2014;10(2):1587–1595.

[69] Eitan Zahavi, Alexander Shpiner, Ori Rottenstreich, Avinoam Kolodny, Isaac Keslassy, Links as a service (laas): guaranteed tenant isolation in the shared cloud, In: Proceedings of the 2016 symposium on architectures for networking and communications systems. ACM; 2016:87–98.

[70] Carlo Curino, Evan P.C. Jones, Raluca Ada Popa, Nirmesh Malviya, Eugene Wu, Sam Madden, Hari Balakrishnan, Nickolai Zeldovich, Relational cloud: a database-as-a-service for the cloud, In: 5th biennial conference on innovative data systems research, CIDR 2011. 2011:235–240.

[71] Martin Henze, Lars Hermerschmidt, Daniel Kerpen, Roger Häußling, Bernhard Rumpe, Klaus Wehrle, A comprehensive approach to privacy in the cloud-based internet of things, Future Gener Comput Syst 2016;56:701–718.

[72] Rouven Krebs, Philipp Schneider, Nikolas Herbst, Optimization method for request admission control to guarantee performance isolation, In: Proceedings of the 2nd international workshop on hot topics in cloud service scalability. ACM; 2014:4.

[73] Vivek Narasayya, Sudipto Das, Manoj Syamala, Surajit Chaudhuri, Feng Li, Hyunjung Park, A demonstration of sqlvm: performance isolation in multi-tenant relational database-as-a-service, In: Proceedings of the 2013 ACM SIGMOD international conference on management of data. ACM; 2013:1077–1080.

[74] Stefan Walraven, Tanguy Monheim, Eddy Truyen, Wouter Joosen, Towards performance isolation in multi-tenant saas applications, In: Proceedings of the 7th workshop on middleware for next generation internet computing. ACM; 2012:6.

[75] Paul Ruth, Anirban Mandal, Claris Castillo, Robert Fowler, Jeff Tilson, Ilya Baldin, Yufeng Xin, Achieving performance isolation on multi-tenant networked clouds using advanced block storage mechanisms, In: Proceedings of the 6th workshop on scientific cloud computing. ACM; 2015:29–32.

[76] Rouven Krebs, Christof Momm, Samuel Kounev, Metrics and techniques for quantifying performance isolation in cloud environments, Sci Comput Program 2014;90:116–134.

[77] David Shue, Michael J. Freedman, Anees Shaikh, Performance isolation and fairness for multi-tenant cloud storage, In: Presented as part of the 10th USENIX symposium on operating systems design and implementation (OSDI 12). USENIX; 2012:349–362.

[78] Stefan Walraven, Wouter De Borger, Bart Vanbrabant, Bert Lagaisse, Dimitri Van Landuyt, Wouter Joosen, Adaptive performance isolation middleware for multi-tenant saas, In: 2015 IEEE/ACM 8th international conference on utility and cloud computing (UCC). IEEE; 2015:112–121.

[79] Christian Bizer, The emerging web of linked data, IEEE Intell Syst 2009;24(5):87–92.

[80] Payam Barnaghi, Wei Wang, Lijun Dong, Chonggang Wang, A linked-data model for semantic sensor streams, In: Green computing and communications (GreenCom), 2013 IEEE and internet of things (iThings/CPSCom), IEEE international conference on and IEEE cyber, physical and social computing. IEEE; 2013:468–475.

[81] Márcio Miguel Gomes, Rodrigo da Rosa Righi, Cristiano André da Costa, Future directions for providing better iot infrastructure, In: Proceedings of the 2014 ACM international joint conference on pervasive and ubiquitous computing: adjunct publication. ACM; 2014:51–54.

[82] Marco Balduini, Emanuele Della Valle, Daniele Dell'Aglio, Mikalai Tsytsarau, Themis Palpanas, Cristian Confalonieri, Social listening of city scale events using the streaming linked data framework, In: The semantic web–ISWC 2013. Springer; 2013:1–16.
