Executing a transform process

Once the transform process has been defined, we can execute it in a controlled pipeline. Execution can happen locally as a batch process, or the effort can be distributed across a Spark cluster. Previously, we looked at TransformProcessRecordReader, which automatically applies the transformation in the background while records are read. That approach becomes impractical when the dataset is huge; for larger datasets, the work can be distributed to a Spark cluster, while regular local execution remains suitable for smaller ones. In this recipe, we will discuss how to execute a transform process both locally and remotely.
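As a rough sketch of local execution, the snippet below builds a trivial schema and transform process, then runs it in-memory with LocalTransformExecutor from the datavec-local module. The schema columns, sample records, and the removeColumns step are illustrative assumptions, not taken from the recipe itself:

```java
import java.util.Arrays;
import java.util.List;

import org.datavec.api.transform.TransformProcess;
import org.datavec.api.transform.schema.Schema;
import org.datavec.api.writable.IntWritable;
import org.datavec.api.writable.Text;
import org.datavec.api.writable.Writable;
import org.datavec.local.transforms.LocalTransformExecutor;

public class LocalTransformExample {
    public static void main(String[] args) {
        // Hypothetical input schema: a name column and an age column
        Schema schema = new Schema.Builder()
                .addColumnString("name")
                .addColumnInteger("age")
                .build();

        // Example transform: drop the name column
        TransformProcess tp = new TransformProcess.Builder(schema)
                .removeColumns("name")
                .build();

        // Two sample records, already parsed into Writable values
        List<List<Writable>> input = Arrays.asList(
                Arrays.<Writable>asList(new Text("alice"), new IntWritable(25)),
                Arrays.<Writable>asList(new Text("bob"), new IntWritable(30)));

        // Execute the transform process locally, in batch, on this JVM
        List<List<Writable>> output = LocalTransformExecutor.execute(input, tp);
        System.out.println(output);
    }
}
```

For remote execution on a Spark cluster, the analogous call is SparkTransformExecutor.execute(...) from the datavec-spark module, which accepts a JavaRDD of records instead of an in-memory list but the same TransformProcess.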
