There's more...

DataVec allows us to do much more within the transformation stage. Here are some of the other important transformation features that are available within TransformProcess:

  • addConstantColumn(): Adds a new column to the dataset in which every value is the same specified constant. This method accepts three attributes: the new column name, the new column type, and the value.
  • appendStringColumnTransform(): Appends a string to the specified column. This method accepts two attributes: the column to append to and the string value to append.
  • conditionalCopyValueTransform(): Replaces the value in a column with the value from another column if a condition is satisfied. This method accepts three attributes: the column whose values are to be replaced, the column to copy the values from, and the condition to be used.
  • conditionalReplaceValueTransform(): Replaces the value in a column with the specified value if a condition is satisfied. This method accepts three attributes: the column whose values are to be replaced, the value to be used as a replacement, and the condition to be used.
  • conditionalReplaceValueTransformWithDefault(): Replaces the value in a column with the specified value if a condition is satisfied. Otherwise, it fills the column with another value. This method accepts four attributes: the column whose values are to be replaced, the value to be used if the condition is satisfied, the value to be used if the condition is not satisfied, and the condition to be used.
    We can use the conditions that are built into DataVec in the transformation process or the data cleaning process. For example, NaNColumnCondition matches NaN values and NullWritableColumnCondition matches null values, so either can be used to replace such values.
  • stringToTimeTransform(): Converts a string column into a time column. This targets date columns that are saved as a string/object in the dataset. This method accepts three attributes: the name of the column to be used, the time format to be followed, and the time zone.
  • reorderColumns(): Reorders the columns using the newly defined order. We can provide the column names in the specified order as attributes to this method.
  • filter(): Defines a filter process based on the specified condition. If the condition is satisfied, the example or sequence is removed; otherwise, it is kept. This method accepts only a single attribute, which is the condition/filter to be applied. The filter() method is very useful for the data cleaning process. If we want to remove examples that contain NaN values in a specified column, we can create a filter, as follows:

Filter filter = new ConditionFilter(new NaNColumnCondition("columnName"));

If we want to remove examples that contain null values in a specified column, we can create a filter, as follows:

Filter filter = new ConditionFilter(new NullWritableColumnCondition("columnName"));

  • stringRemoveWhitespaceTransform(): This method removes whitespace characters from the value of a column. This method accepts only a single attribute, which is the column from which whitespace is to be trimmed.
  • integerMathOp(): This method is used to perform a mathematical operation on an integer column with a scalar value. Similar methods are available for types such as double and long. This method accepts three attributes: the integer column to apply the mathematical operation on, the mathematical operation itself, and the scalar value to be used for the mathematical operation.
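
The difference between replacing values and filtering examples is easy to mix up, so here is a minimal plain-Java sketch of the two behaviors described in the list above. It deliberately does not use DataVec: the class ColumnCleaningSketch and its methods replaceWithDefault() and filterOut() are hypothetical names made up for this illustration, and they work on a single list of doubles rather than on a real schema.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Plain-Java illustration only: none of these names are part of DataVec.
class ColumnCleaningSketch {

    // Mirrors the idea behind conditionalReplaceValueTransformWithDefault():
    // values that satisfy the condition become yesValue, all others become noValue.
    static List<Double> replaceWithDefault(List<Double> column, double yesValue,
                                           double noValue, Predicate<Double> condition) {
        List<Double> out = new ArrayList<>();
        for (Double v : column) {
            out.add(condition.test(v) ? yesValue : noValue);
        }
        return out;
    }

    // Mirrors the idea behind filter(): an entry that SATISFIES the condition
    // is removed, and every other entry is kept.
    static List<Double> filterOut(List<Double> column, Predicate<Double> condition) {
        List<Double> out = new ArrayList<>();
        for (Double v : column) {
            if (!condition.test(v)) {
                out.add(v);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Double> column = Arrays.asList(1.0, Double.NaN, 3.0);
        Predicate<Double> isNaN = v -> v.isNaN();

        System.out.println(filterOut(column, isNaN));                    // prints [1.0, 3.0]
        System.out.println(replaceWithDefault(column, 0.0, 1.0, isNaN)); // prints [1.0, 0.0, 1.0]
    }
}
```

Note that the filter drops the entry for which the condition is true, which is the opposite of how a "keep if true" predicate filter usually behaves.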
TransformProcess is not just meant for data handling; it can also help to ease memory bottlenecks to some extent.

Refer to the DL4J API documentation to find more powerful DataVec features for your data analysis tasks. There are other interesting operations supported in TransformProcess, such as reduce() and convertToString(). If you're a data analyst, then you should know that many of the data normalization strategies can be applied during this stage. You can refer to the DL4J API documentation for more information on the available normalization strategies at https://deeplearning4j.org/docs/latest/datavec-normalization
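
To make the idea of normalization concrete, the following plain-Java sketch applies min-max scaling, one common normalization strategy, to a single numeric column. It is only an illustration of the arithmetic and does not use the DL4J normalizers: the class MinMaxSketch and the method minMaxScale() are hypothetical names, and mapping a constant column to 0.0 is an assumption made here to avoid division by zero.

```java
import java.util.Arrays;

// Plain-Java illustration only: not part of DL4J or DataVec.
class MinMaxSketch {

    // Rescales every value to the range [0, 1] using (v - min) / (max - min).
    static double[] minMaxScale(double[] values) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        double range = max - min;
        double[] scaled = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            // Assumption: a constant column (range == 0) is mapped to 0.0.
            scaled[i] = (range == 0.0) ? 0.0 : (values[i] - min) / range;
        }
        return scaled;
    }

    public static void main(String[] args) {
        double[] column = {10.0, 20.0, 30.0};
        System.out.println(Arrays.toString(minMaxScale(column))); // prints [0.0, 0.5, 1.0]
    }
}
```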
