Parallel processing

If a process is taking too long, more memory can help, as already discussed. If that fails, a more powerful processor is the next thing to consider. If the budget does not allow for that, or if it does not help, a parallel approach can be considered.

Some operators can be run in parallel, and RapidMiner Studio allows this where the processor has more than one core. To take advantage of this, download the parallel processing extension from the Rapid-I Marketplace. Once it is installed, a new configuration checkbox appears on some operators, allowing them to be executed in parallel. Affected operators include the main process operator, looping operators, the subprocess operator, the branch and select operators, and the process evaluation operators. Typically, an operator that contains a loop or an implied subprocess can be made to run in parallel. Operators such as those in the Modeling and Data Transformation groups do not have this option.

For example, the X-Validation operator allows the training and testing of each partition to be run in parallel. This is possible because cross-validation is inherently parallel: the individual partitions are self-contained and do not depend on each other.
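The independence of the partitions can be illustrated outside RapidMiner with a minimal Python sketch. This is not how X-Validation is implemented; the helper names (make_folds, evaluate_fold, cross_validate) and the toy "model" (the mean of the training values) are assumptions made purely for illustration. A thread pool is used to keep the sketch simple; in CPython, a process pool would be needed to gain real speed on CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor


def make_folds(n_examples, k):
    """Return k lists of test indices that together partition range(n_examples)."""
    return [list(range(i, n_examples, k)) for i in range(k)]


def evaluate_fold(values, test_idx):
    """Train on everything outside the fold and test on the fold.

    The hypothetical 'model' is just the training mean, and the
    score is the mean absolute error on the held-out examples.
    """
    held_out = set(test_idx)
    train = [v for i, v in enumerate(values) if i not in held_out]
    prediction = sum(train) / len(train)
    return sum(abs(values[i] - prediction) for i in test_idx) / len(test_idx)


def cross_validate(values, k, workers=1):
    """Score every fold, sequentially or on a pool of worker threads."""
    folds = make_folds(len(values), k)
    if workers == 1:
        return [evaluate_fold(values, fold) for fold in folds]
    # Each fold needs only its own indices and read-only data,
    # so the folds can be handed to separate workers in any order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda fold: evaluate_fold(values, fold), folds))


data = [float(x % 7) for x in range(50)]
sequential = cross_validate(data, k=5)
parallel = cross_validate(data, k=5, workers=4)
```

Because no fold reads anything another fold writes, the parallel run produces the same scores as the sequential one.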

Processes that cannot be carried out in parallel include those that perform calculations requiring all the data. For example, normalizing an attribute within an example set requires all the data to be processed first to determine statistics, such as the mean and standard deviation, which are then applied to the individual examples.
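A short Python sketch shows why splitting the data naively gives the wrong answer here. It assumes a z-score normalization (subtract the mean, divide by the standard deviation) and a hypothetical helper z_normalize; RapidMiner's Normalize operator offers other methods as well, and this is not its implementation.

```python
from statistics import mean, pstdev


def z_normalize(values, mu, sigma):
    """Apply a z-score transformation using the supplied statistics."""
    return [(v - mu) / sigma for v in values]


values = [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]

# Correct: the statistics are computed over the whole attribute first.
global_result = z_normalize(values, mean(values), pstdev(values))

# Naive split: each chunk is normalized with its own statistics,
# as would happen if two workers never saw each other's data.
chunks = [values[:3], values[3:]]
chunked_result = []
for chunk in chunks:
    chunked_result.extend(z_normalize(chunk, mean(chunk), pstdev(chunk)))
```

In the chunked result, the values 1.0 and 10.0 are mapped to the same normalized value because each is the smallest in its own chunk, so the relationship between the chunks is lost.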

In the context of exploring data, some activities could be carried out in parallel. For example, if multiple files are to be read in and processed so that the processing of one file depends only on the contents of that file, it would be possible to take advantage of parallel execution. The simplest possible process would be two Read CSV operators reading two files. If these are placed in the main process and the parallelize main process option is set to true, RapidMiner will execute the file reading across the available CPUs.
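The same idea can be sketched in Python as an analogy to two Read CSV operators running side by side. The file names, column names, and the read_csv helper are made up for the example, and the files are written to a temporary directory first so that the sketch is self-contained.

```python
import csv
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def read_csv(path):
    """Read one CSV file into a list of row dictionaries."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))


with tempfile.TemporaryDirectory() as tmp:
    # Create two small, independent input files (hypothetical contents).
    samples = [
        ("a.csv", [("1", "x"), ("2", "y")]),
        ("b.csv", [("3", "z")]),
    ]
    paths = []
    for name, rows in samples:
        path = Path(tmp) / name
        with open(path, "w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(["id", "label"])
            writer.writerows(rows)
        paths.append(path)

    # Each read depends only on its own file, so the two reads
    # can be dispatched to separate workers.
    with ThreadPoolExecutor(max_workers=2) as pool:
        tables = list(pool.map(read_csv, paths))
```

The results come back in the same order as the input paths, so later steps can still tell which table came from which file.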

Tip

A word of caution about parallel processing: even if a process can be run in parallel, there is still a risk that one instance will interfere with another. Macros might be shared between instances, or the data itself might be shared. Either way, this can cause processes to fail or to produce wrong results, so be careful.
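The kind of interference described in this tip can be reproduced in a small Python sketch. A plain dictionary stands in for RapidMiner macros, which is an assumption for illustration only, and the names run_instances and current_file are hypothetical. A barrier forces the unlucky interleaving so that the outcome is the same on every run.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_instances(names, shared):
    """Run one worker per name; each stores its name in a 'macro' and reads it back."""
    barrier = threading.Barrier(len(names), timeout=5)
    shared_macros = {}

    def instance(name):
        # Either every instance uses the same dictionary, or each gets its own.
        macros = shared_macros if shared else {}
        macros["current_file"] = name
        # Wait until every instance has written its macro before any reads it.
        barrier.wait()
        return macros["current_file"]

    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        return list(pool.map(instance, names))


names = ["a.csv", "b.csv"]
isolated = run_instances(names, shared=False)
interfering = run_instances(names, shared=True)
```

With private macros, each instance reads back its own file name. With the shared dictionary, both instances read whichever name was written last, so one of them silently works with the wrong value.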
