This recipe is about downloading files concurrently from the network. As with most recipes in this chapter, we will use the GPars framework to provide the concurrency features that parallel downloading requires.
This recipe reuses the build infrastructure created in the Processing collections concurrently recipe.
The download logic is completely encapsulated in a Groovy class. Add a new FileDownloader class to the src/main/groovy/org/groovy/cookbook folder:

```groovy
package org.groovy.cookbook

import static groovyx.gpars.GParsPool.*

class FileDownloader {

  static final int POOL_SIZE = 25

  // One pool per downloader: created once, reused by every download call
  private final pool

  FileDownloader() {
    pool = createPool(POOL_SIZE)
  }

  // Streams the content of remoteUrl into the local file at localUrl
  private void downloadFile(String remoteUrl, String localUrl) {
    new File(localUrl).withOutputStream { out ->
      new URL(remoteUrl).withInputStream { from ->
        out << from
      }
    }
  }

  // Downloads every entry (URL -> local path) of the map in parallel
  private void parallelDownload(Map fromTo) {
    withExistingPool(pool) {
      fromTo.eachParallel { from, to ->
        downloadFile(from, to)
      }
    }
  }

  // maxConcurrent > 0 limits the number of simultaneous downloads
  void download(Map fromTo, int maxConcurrent) {
    if (maxConcurrent > 0) {
      use(MapPartition) {
        List maps = fromTo.partition(maxConcurrent)
        maps.each { downloadMap ->
          parallelDownload(downloadMap)
        }
      }
    } else {
      parallelDownload(fromTo)
    }
  }
}

// Category that splits a Map into maps of at most `size` entries
class MapPartition {
  static List partition(Map delegate, int size) {
    def rslt = delegate.inject( [ [:] ] ) { ret, elem ->
      (ret.last() << elem).size() >= size ? ret << [:] : ret
    }
    // Drop the trailing empty map, if any
    rslt.last() ? rslt : rslt[0..-2]
  }
}
```
To verify the class, add a FileDownloaderTest2 unit test to the src/test/groovy/org/groovy/cookbook folder:

```groovy
package org.groovy.cookbook

import org.junit.*

class FileDownloaderTest2 {

  static final DOWNLOAD_BASE_DIR = '/tmp'
  // Google Code has since been shut down: point TEST_SERVICE at a reachable host
  static final TEST_SERVICE = 'https://androidnetworktester.googlecode.com'
  static final TEST_URL = "${TEST_SERVICE}/files/1mb.txt?cache="

  def downloader = new FileDownloader()
  Map files

  @Before
  void before() {
    // Five URLs that differ only in the cache query value, mapped to five local files
    files = [:]
    (1..5).each {
      files.put(
        "${TEST_URL}1.${it}".toString(),
        "${DOWNLOAD_BASE_DIR}/${it}MyFile.txt".toString()
      )
    }
  }

  @Test
  void testSerialDownload() {
    long start = System.currentTimeMillis()
    files.each { k, v ->
      // Overwrite (rather than append to) the target file
      new File(v).bytes = new URL(k).bytes
    }
    long timeSpent = System.currentTimeMillis() - start
    println "TIME NOPAR: ${timeSpent}"
  }

  @Test
  void testParallelDownload() {
    long start = System.currentTimeMillis()
    downloader.download(files, 0)   // no limit other than the pool size
    long timeSpent = System.currentTimeMillis() - start
    println "TIMEPAR: ${timeSpent}"
  }

  @Test
  void testParallelDownloadWithMaxConcurrent() {
    long start = System.currentTimeMillis()
    downloader.download(files, 3)   // at most 3 downloads at a time
    long timeSpent = System.currentTimeMillis() - start
    println "TIMEPAR MAX 3: ${timeSpent}"
  }
}
```
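Run the tests from the project root with Gradle. The -i flag enables info-level logging, so that the output printed by the tests is displayed:

```
gradle -i clean test
```

The output should contain lines similar to the following. The values are elapsed times in milliseconds and will vary with network conditions:

```
TIME NOPAR: 635
TIMEPAR: 391
TIMEPAR MAX 3: 586
```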
The FileDownloader class uses the Parallel Arrays implementation offered by GPars. This implementation provides parallel variants of the common Groovy iteration methods, such as eachParallel, collectParallel, and findAllParallel as counterparts of each, collect, and findAll. Whenever you come across a collection that is slow to process, consider using the parallel collection methods. Enabling a collection for parallel processing has a cost (mostly the cost of initializing a thread pool), but when the work done per element is slow, as with a network download, the speedup frequently outweighs that overhead. GPars offers two thread pool options:

* GParsPool, which is based on the Fork/Join framework
* GParsExecutorsPool, which uses the Java 5 executors

In the majority of cases, the first option is more efficient, but it is always worth trying both thread pools to verify which one performs better for a specific case.
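Comparing the two options comes down to running the same workload on both kinds of pool and timing it. The sketch below is a JDK-only analogy, not GPars code: it assumes a java.util.concurrent.ForkJoinPool as a stand-in for GParsPool and a fixed thread pool as a stand-in for GParsExecutorsPool. The helper names makeTasks and timeOn, the task count, and the 100 ms sleep that simulates a download are all hypothetical.

```groovy
import java.util.concurrent.Callable
import java.util.concurrent.ExecutorService
import java.util.concurrent.Executors
import java.util.concurrent.ForkJoinPool

// Builds `count` simulated I/O-bound tasks (hypothetical workload):
// each one sleeps instead of downloading, then returns n * n
List<Callable<Integer>> makeTasks(int count, long sleepMs) {
    (1..count).collect { int n ->
        Callable<Integer> task = { -> Thread.sleep(sleepMs); n * n } as Callable
        task
    }
}

// Runs all tasks on the given pool; returns the results (in task order) and elapsed ms
Map timeOn(ExecutorService pool, List<Callable<Integer>> tasks) {
    long start = System.currentTimeMillis()
    List results = pool.invokeAll(tasks)*.get()
    [results: results, millis: System.currentTimeMillis() - start]
}

def tasks = makeTasks(8, 100)
ExecutorService forkJoin = new ForkJoinPool(4)           // stand-in for GParsPool
ExecutorService fixed = Executors.newFixedThreadPool(4)  // stand-in for GParsExecutorsPool
Map fj = null
Map ex = null
try {
    fj = timeOn(forkJoin, tasks)
    ex = timeOn(fixed, tasks)
} finally {
    forkJoin.shutdown()
    fixed.shutdown()
}
println "ForkJoinPool: ${fj.millis} ms, fixed thread pool: ${ex.millis} ms"
```

With a real workload, replace the sleeping tasks with the actual per-element work and compare the two timings over several runs before picking a pool.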
The FileDownloader class relies on GParsPool. The pool is created once, in the class constructor, and then reused through withExistingPool by every call to parallelDownload, because creating a pool is an expensive operation and is the main source of overhead in the parallel framework.
The entry point of the class is the download method, which takes a Map and the maximum number of parallel downloads to run (a value of 0 or less means no limit other than the pool size). The actual parallel downloading is carried out by the private parallelDownload method, which accepts a Map containing the URL to download a file from as the key and the destination file as the value. The method calls eachParallel to execute the download operation concurrently on each entry of the Map.
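The same structure can be sketched without GPars, which may help clarify what parallelDownload does. This is a hypothetical, JDK-only counterpart under stated assumptions: the class name ExecutorDownloader, the use of a JDK fixed thread pool (closer to the GParsExecutorsPool option than to GParsPool), and the local file: URLs used in place of real remote URLs are all illustrative. As in the recipe, the pool is created once in the constructor and each Map entry (URL as key, destination path as value) becomes one concurrent task.

```groovy
import java.nio.file.Files
import java.util.concurrent.ExecutorService
import java.util.concurrent.Executors
import java.util.concurrent.Future

// Hypothetical JDK-only counterpart of FileDownloader.parallelDownload
class ExecutorDownloader {
    private final ExecutorService pool

    ExecutorDownloader(int poolSize) {
        pool = Executors.newFixedThreadPool(poolSize)   // created once, reused by every call
    }

    // Same stream-copy logic as FileDownloader.downloadFile
    private void downloadFile(String remoteUrl, String localPath) {
        new File(localPath).withOutputStream { out ->
            new URL(remoteUrl).withInputStream { from -> out << from }
        }
    }

    // Submits one task per entry (URL -> destination path) and waits for all of them
    void parallelDownload(Map<String, String> fromTo) {
        List<Future> futures = fromTo.collect { String from, String to ->
            pool.submit({ -> downloadFile(from, to) } as Runnable)
        }
        futures*.get()   // blocks until every copy is done; rethrows a failure, if any
    }

    void shutdown() { pool.shutdown() }
}

// Usage with local file: URLs, so the sketch runs without network access
File srcDir = Files.createTempDirectory('src').toFile()
File dstDir = Files.createTempDirectory('dst').toFile()
Map<String, String> fromTo = [:]
(1..5).each { n ->
    File src = new File(srcDir, "file${n}.txt")
    src.text = "payload ${n}"
    fromTo[src.toURI().toURL().toString()] = new File(dstDir, "${n}MyFile.txt").path
}

def downloader = new ExecutorDownloader(3)
try {
    downloader.parallelDownload(fromTo)
} finally {
    downloader.shutdown()
}
```

Joining all the futures makes parallelDownload return only after every download has finished, which is the blocking behavior the recipe's download method relies on when it processes one batch after another.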
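One interesting feature of this class is its use of the MapPartition category. Categories in Groovy are an elegant way to add methods to a class that is not under your control: inside a use(MapPartition) { ... } block, every Map gains a partition method. The partition method splits a Map into smaller maps of at most maxConcurrent entries each; download then passes those maps to parallelDownload one after the other, so that no more than maxConcurrent downloads run at the same time.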
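The partitioning step can be tried in isolation. The script below copies the recipe's MapPartition category, applies it through use, and compares it with a hypothetical alternative, partitionWithCollate, built on Groovy's List.collate. The sample map and the alternative helper are illustrative and are not part of the recipe.

```groovy
// Copy of the recipe's MapPartition category
class MapPartition {
    static List partition(Map delegate, int size) {
        def rslt = delegate.inject([[:]]) { ret, elem ->
            // Add the entry to the last map; start a new map once it is full
            (ret.last() << elem).size() >= size ? ret << [:] : ret
        }
        rslt.last() ? rslt : rslt[0..-2]   // drop a trailing empty map
    }
}

// Hypothetical alternative: turn entries into [key, value] pairs, chunk them, rebuild maps
List<Map> partitionWithCollate(Map map, int size) {
    List pairs = map.collect { k, v -> [k, v] }
    pairs.collate(size).collect { List chunk -> chunk.collectEntries { it } }
}

def fromTo = [a: 1, b: 2, c: 3, d: 4, e: 5]

// partition is only available on Map inside the use block
def chunks = use(MapPartition) { fromTo.partition(3) }
println chunks   // [[a:1, b:2, c:3], [d:4, e:5]]
```

Because each chunk is downloaded in full before the next one starts, a single slow download holds up its whole batch. With five files and a limit of three, the files go out in two rounds (three, then two), which may explain why the MAX 3 timing in the sample output is closer to the serial one than to the unlimited parallel run.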