Data transformation is a set of techniques used to convert data from one format or structure to another. The following are some examples of transformation activities:
- Data deduplication involves the identification of duplicates and their removal.
- Key restructuring involves transforming any keys with built-in meanings into generic keys.
- Data cleansing involves deleting out-of-date, inaccurate, and incomplete information from the source data, without changing its meaning, to enhance the accuracy of that data.
- Data validation is the process of formulating rules or algorithms that check different types of data against known issues.
- Format revisioning involves converting from one format to another.
- Data derivation consists of creating a set of rules to generate more information from the data source.
- Data aggregation involves searching, extracting, summarizing, and preserving important information in different types of reporting systems.
- Data integration involves converting different data types and merging them into a common structure or schema.
- Data filtering involves identifying information relevant to any particular user.
- Data joining involves establishing a relationship between two or more tables.
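A few of the activities listed above can be sketched in plain Python. This is only a minimal illustration under stated assumptions: the `orders` and `customers` records, their field names, and the choice of `order_id` as the deduplication key are all hypothetical and do not come from any particular dataset.

```python
from collections import defaultdict

# Hypothetical sample data: order records, one of which is an exact duplicate.
orders = [
    {"order_id": 1, "customer_id": "C1", "quantity": 2, "unit_price": 10.0},
    {"order_id": 2, "customer_id": "C2", "quantity": 1, "unit_price": 25.0},
    {"order_id": 2, "customer_id": "C2", "quantity": 1, "unit_price": 25.0},
    {"order_id": 3, "customer_id": "C1", "quantity": 5, "unit_price": 4.0},
]
# Hypothetical lookup table keyed by customer_id.
customers = {"C1": "Alice", "C2": "Bob"}

# Deduplication: keep the first record seen for each order_id.
seen = set()
unique_orders = []
for order in orders:
    if order["order_id"] not in seen:
        seen.add(order["order_id"])
        unique_orders.append(order)

# Derivation: generate a new field from existing ones.
derived = [{**o, "total": o["quantity"] * o["unit_price"]} for o in unique_orders]

# Filtering: keep only the records relevant to one customer.
c1_orders = [o for o in derived if o["customer_id"] == "C1"]

# Aggregation: summarize the order totals per customer.
totals = defaultdict(float)
for o in derived:
    totals[o["customer_id"]] += o["total"]

# Joining: relate each order to its customer through the shared key.
joined = [{**o, "customer_name": customers[o["customer_id"]]} for o in derived]
```

In practice, a data analysis library would usually handle these steps on tabular data; the plain-Python version is used here only to make each operation explicit.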
The main reason for transforming data is to obtain a representation that is compatible with other data. In addition, following a common data structure and format makes interoperability within a system achievable.
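As a rough sketch of what a common structure and format can look like, the following example maps records from two sources onto one shared schema. The sources, their field names, the date formats, and the target schema (`user_id`, `name`, `signup_date`) are all assumptions made for illustration.

```python
from datetime import datetime

# Hypothetical records from two sources with different field names,
# key types, and date formats.
source_a = [{"id": "1", "name": "Alice", "signup": "2020-01-15"}]
source_b = [{"user_id": 2, "full_name": "Bob", "signup_date": "15/02/2020"}]

def from_source_a(rec):
    """Convert a source A record to the assumed common schema."""
    return {
        "user_id": int(rec["id"]),  # string key converted to an integer
        "name": rec["name"],
        "signup_date": datetime.strptime(rec["signup"], "%Y-%m-%d").date(),
    }

def from_source_b(rec):
    """Convert a source B record to the assumed common schema."""
    return {
        "user_id": rec["user_id"],
        "name": rec["full_name"],  # field renamed to match the schema
        "signup_date": datetime.strptime(rec["signup_date"], "%d/%m/%Y").date(),
    }

# Once both sources share a schema, they can be combined and processed together.
combined = [from_source_a(r) for r in source_a] + [from_source_b(r) for r in source_b]
```

Because every record in `combined` has the same keys and value types, any later step can treat the two sources uniformly.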
Having said that, let's start looking at data transformation techniques with data integration in the next section.