Bypassing GC

What if we do away with GC and manage memory manually, as in lower-level programming languages such as C? Java does provide a way to do that through an internal API called sun.misc.Unsafe. "Unsafe" essentially means that you can allocate large regions of memory and read or write them without any safety checks. Using unsafe rows and off-heap memory with manual memory management was the first feature of Project Tungsten.
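As a rough illustration of what manual, off-heap memory management looks like in Java, the following sketch allocates a block outside the garbage-collected heap, writes a few long values into it, reads them back, and frees the block by hand. The class and method names (OffHeapLongs, sumOffHeap) are made up for this example, and this is not how Spark itself is implemented. It obtains the Unsafe instance through reflection on the private theUnsafe field, which is a widely used workaround rather than a supported API; newer JDK releases may print warnings or restrict it.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class OffHeapLongs {
    // Unsafe has no public constructor; the usual workaround is to read
    // its private static singleton field via reflection.
    private static final Unsafe UNSAFE = loadUnsafe();

    private static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not accessible", e);
        }
    }

    /** Copies the values into an off-heap block, sums them from there, then frees the block. */
    public static long sumOffHeap(long[] values) {
        long bytes = (long) values.length * Long.BYTES;
        long base = UNSAFE.allocateMemory(bytes); // raw address, invisible to the GC
        try {
            for (int i = 0; i < values.length; i++) {
                // No bounds checks: a wrong offset corrupts memory or crashes the JVM.
                UNSAFE.putLong(base + (long) i * Long.BYTES, values[i]);
            }
            long sum = 0;
            for (int i = 0; i < values.length; i++) {
                sum += UNSAFE.getLong(base + (long) i * Long.BYTES);
            }
            return sum;
        } finally {
            UNSAFE.freeMemory(base); // manual free; forgetting this leaks memory
        }
    }

    public static void main(String[] args) {
        System.out.println(sumOffHeap(new long[] {1, 2, 3, 4})); // prints 10
    }
}
```

The try/finally block stands in for the discipline that C programmers apply by hand: every allocateMemory call needs a matching freeMemory call, because the garbage collector never sees this memory.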

Manual memory management by leveraging application semantics can be very risky if you do not know what you are doing, but it is a blessing with Spark. We used our knowledge of the data schema (DataFrames) to lay out the memory directly ourselves. This not only gets rid of GC overhead but also lets you minimize the memory footprint.
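To make the idea of a schema-driven layout concrete, here is a minimal sketch that packs rows of a hypothetical schema (id: long, age: int, score: double) into one contiguous off-heap buffer at fixed offsets. The class name FixedWidthRows and the schema are assumptions made for illustration, and the layout is not Spark's actual row format. For safety, the sketch uses java.nio.ByteBuffer.allocateDirect, which is off-heap but bounds-checked, instead of sun.misc.Unsafe.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class FixedWidthRows {
    // Hypothetical schema: (id: long, age: int, score: double).
    // Because the schema is known up front, every field sits at a fixed offset.
    static final int ID_OFFSET = 0;
    static final int AGE_OFFSET = ID_OFFSET + Long.BYTES;        // 8
    static final int SCORE_OFFSET = AGE_OFFSET + Integer.BYTES;  // 12
    static final int ROW_SIZE = SCORE_OFFSET + Double.BYTES;     // 20 bytes per row

    private final ByteBuffer buf;
    private final int capacity;

    public FixedWidthRows(int capacity) {
        this.capacity = capacity;
        // One off-heap allocation holds all rows; there are no per-row objects for the GC to track.
        this.buf = ByteBuffer.allocateDirect(capacity * ROW_SIZE)
                             .order(ByteOrder.nativeOrder());
    }

    public void put(int row, long id, int age, double score) {
        int base = row * ROW_SIZE;
        buf.putLong(base + ID_OFFSET, id);
        buf.putInt(base + AGE_OFFSET, age);
        buf.putDouble(base + SCORE_OFFSET, score);
    }

    public long id(int row)      { return buf.getLong(row * ROW_SIZE + ID_OFFSET); }
    public int age(int row)      { return buf.getInt(row * ROW_SIZE + AGE_OFFSET); }
    public double score(int row) { return buf.getDouble(row * ROW_SIZE + SCORE_OFFSET); }

    public int sizeInBytes()     { return capacity * ROW_SIZE; }

    public static void main(String[] args) {
        FixedWidthRows rows = new FixedWidthRows(2);
        rows.put(0, 42L, 30, 0.5);
        rows.put(1, 43L, 31, 1.5);
        System.out.println(rows.id(1) + " " + rows.age(1) + " " + rows.score(1));
        System.out.println(rows.sizeInBytes() + " bytes for 2 rows");
    }
}
```

Each row here occupies exactly 20 bytes. The same three fields stored as an ordinary Java object would also carry an object header and alignment padding, and a collection of such objects would add a reference per row, which is where the footprint savings come from.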

The second point is keeping data in the CPU cache rather than in main memory. CPU cache is far faster: an access that hits the L1 cache costs only a few cycles, whereas fetching the same data from main memory can cost on the order of a hundred cycles or more. Cache-aware computation is the second feature of Project Tungsten.
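The effect of cache locality can be observed with a small experiment. The sketch below sums the same flat array twice: once in the order the elements are laid out in memory, and once with a large stride between consecutive accesses. The class and method names are hypothetical, and the timings it prints depend heavily on the machine and on JIT warm-up, so treat them as indicative only and not as a rigorous benchmark.

```java
import java.util.Arrays;

public class CacheLocality {
    /** Visits elements in memory order, so consecutive reads tend to share cache lines. */
    static long sumRowMajor(int[] m, int n) {
        long s = 0;
        for (int r = 0; r < n; r++) {
            for (int c = 0; c < n; c++) {
                s += m[r * n + c];
            }
        }
        return s;
    }

    /** Visits the same elements with a stride of n, so reads jump across cache lines. */
    static long sumColumnMajor(int[] m, int n) {
        long s = 0;
        for (int c = 0; c < n; c++) {
            for (int r = 0; r < n; r++) {
                s += m[r * n + c];
            }
        }
        return s;
    }

    public static void main(String[] args) {
        int n = 2048;                 // n * n ints is about 16 MB, larger than typical CPU caches
        int[] m = new int[n * n];
        Arrays.fill(m, 1);

        long t0 = System.nanoTime();
        long a = sumRowMajor(m, n);
        long t1 = System.nanoTime();
        long b = sumColumnMajor(m, n);
        long t2 = System.nanoTime();

        System.out.println("row-major:    sum=" + a + " in " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("column-major: sum=" + b + " in " + (t2 - t1) / 1_000_000 + " ms");
    }
}
```

Both methods do the same arithmetic and return the same sum; any difference in running time comes from how well the access pattern uses the cache.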
