Structured data sources

Structured data sources are those with a high degree of organization. These kinds of data sources follow a specific data model, and the engine that stores the data is programmed to enforce this model.

An R data frame is a typical example of structured data: it is organized into columns and rows, and every column holds a single type of data across all records. A well-known data model behind structured data is the so-called relational model of data. Following this model, each table represents an entity within the considered universe of analysis. Each column of the table stores one attribute of that entity, and each row stores one observation of it. Finally, each entity can be related to the others through key attributes.
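As a minimal sketch of the "one type per column" idea, the following base R snippet builds a small data frame and inspects the class of each column. The table name, column names, and values are hypothetical and chosen only for illustration.

```r
# Hypothetical order records; all names and values are illustrative assumptions.
orders <- data.frame(
  order_id   = c(1L, 2L, 3L),
  customer   = c("Acme", "Bolt", "Acme"),
  quantity   = c(10L, 4L, 7L),
  unit_price = c(2.5, 12.0, 2.5),
  stringsAsFactors = FALSE
)

# Every column has exactly one class, shared by all of its rows.
column_types <- sapply(orders, class)
print(column_types)
```

Each row is one observation, and each column is one attribute with a single type.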

As an example, consider the relational database of a small factory. It contains one table recording all customer orders, one table recording all shipments, and one table recording the warehouse's movements.

Within this database, we will have:

  • The warehouse table linked to the shipment table through the product_code attribute
  • The shipment table linked to the customer table through the shipment_code attribute

A relevant advantage of this model is that it makes it easy to query individual tables and to merge them through their key attributes. As a result, the cost of analyzing structured data is far lower than the cost of analyzing unstructured data.
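The links between the factory tables can be sketched in base R with merge(), which joins two data frames on a shared key column. Only the product_code and shipment_code keys come from the text. The remaining columns (stock_change, customer_name) and all values are assumptions made for the sake of the example.

```r
# Hypothetical tables for the small factory; contents are illustrative only.
warehouse <- data.frame(
  product_code = c("P1", "P2", "P3"),
  stock_change = c(-10L, -4L, 25L),
  stringsAsFactors = FALSE
)
shipment <- data.frame(
  shipment_code = c("S1", "S2"),
  product_code  = c("P1", "P2"),
  stringsAsFactors = FALSE
)
customer <- data.frame(
  customer_name = c("Acme", "Bolt"),
  shipment_code = c("S1", "S2"),
  stringsAsFactors = FALSE
)

# Link warehouse and shipment through product_code (inner join by default).
shipped_products <- merge(shipment, warehouse, by = "product_code")

# Link the result to customers through shipment_code.
full_view <- merge(customer, shipped_products, by = "shipment_code")

# A simple query on the merged table: everything related to one customer.
acme_rows <- full_view[full_view$customer_name == "Acme", ]
print(acme_rows)
```

Because merge() performs an inner join by default, the product with no matching shipment (P3 here) drops out of the merged result. Passing all = TRUE would keep unmatched rows instead.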
