ntiles

ntile is a popular window function, commonly used to divide the ordered rows of each window partition into n parts of roughly equal size.

For example, to partition statesPopulationDF by State (using the window specification shown previously), order each partition by population, and then divide it into two portions, we can apply ntile over windowSpec:

scala> import org.apache.spark.sql.functions._
scala> statesPopulationDF.select(col("State"), col("Year"),
         ntile(2).over(windowSpec), rank().over(windowSpec)).sort("State", "Year").show(10)
+-------+----+-----------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------+
|  State|Year|ntile(2) OVER (PARTITION BY State ORDER BY Population DESC NULLS LAST ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)|RANK() OVER (PARTITION BY State ORDER BY Population DESC NULLS LAST ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)|
+-------+----+-----------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------+
|Alabama|2010|                                                                                                                      2|                                                                                                                    6|
|Alabama|2011|                                                                                                                      2|                                                                                                                    7|
|Alabama|2012|                                                                                                                      2|                                                                                                                    5|
|Alabama|2013|                                                                                                                      1|                                                                                                                    4|
|Alabama|2014|                                                                                                                      1|                                                                                                                    3|
|Alabama|2015|                                                                                                                      1|                                                                                                                    2|
|Alabama|2016|                                                                                                                      1|                                                                                                                    1|
| Alaska|2010|                                                                                                                      2|                                                                                                                    7|
| Alaska|2011|                                                                                                                      2|                                                                                                                    6|
| Alaska|2012|                                                                                                                      2|                                                                                                                    5|
+-------+----+-----------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------+
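
The column headers in the output spell out the window that was applied. As a minimal sketch, a windowSpec along the following lines would produce those headers. This is a reconstruction inferred from the output, so the definition given earlier may differ in detail; the column names State and Population are taken from the headers.

```scala
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.col

// Assumed reconstruction of windowSpec, inferred from the column headers:
// PARTITION BY State ORDER BY Population DESC
// ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
val windowSpec = Window
  .partitionBy("State")                                       // one window per state
  .orderBy(col("Population").desc)                            // largest population first
  .rowsBetween(Window.unboundedPreceding, Window.currentRow)  // frame shown in the headers
```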

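As shown above, we have used the window specification and ntile() together to divide the rows of each State into two portions of nearly equal size. Alabama has seven rows, so the first portion holds four of them (ranks 1 to 4) and the second holds three (ranks 5 to 7).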
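
The portions can only be exactly equal when the number of rows in a partition is a multiple of n. The sketch below illustrates the arithmetic in plain Scala; it is not Spark's implementation, and ntileBuckets is a hypothetical helper name. It hands any leftover rows to the lowest-numbered buckets, which is consistent with the Alabama rows in the output.

```scala
// Hypothetical helper (not part of Spark): returns the 1-based bucket
// that an ntile(n)-style split assigns to each of `rowCount` ordered rows.
def ntileBuckets(rowCount: Int, n: Int): Seq[Int] = {
  require(n > 0, "n must be positive")
  val base = rowCount / n   // minimum number of rows per bucket
  val extra = rowCount % n  // the first `extra` buckets receive one more row
  (1 to n).flatMap { bucket =>
    val size = if (bucket <= extra) base + 1 else base
    Seq.fill(size)(bucket)
  }
}

// Seven ordered rows split into two buckets: four rows, then three.
println(ntileBuckets(7, 2).mkString(", "))
```

With seven rows and n = 2, the first four rows fall into bucket 1 and the remaining three into bucket 2.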

A popular use of this function is to compute deciles, which are often used in data science models.
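
As a rough sketch of the decile use case, ntile(10) over a window ordered by the value of interest assigns each row a decile from 1 to 10. The dataset, the column names customer and score, and the names scoresDF, byScore, and decilesDF are all hypothetical and chosen only for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, ntile}

// Local session for illustration; in spark-shell, `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("ntile-deciles").getOrCreate()
import spark.implicits._

// Hypothetical data: 20 customers with scores 1.0 to 20.0.
val scoresDF = (1 to 20).map(i => (s"customer$i", i.toDouble)).toDF("customer", "score")

// No partitionBy: the whole dataset is treated as one group, highest score first.
val byScore = Window.orderBy(col("score").desc)

// Decile 1 holds the top 10% of scores, decile 10 the bottom 10%.
val decilesDF = scoresDF.withColumn("decile", ntile(10).over(byScore))
decilesDF.show(20)
```

Because this window has no partitionBy, Spark moves all rows to a single partition to evaluate it, so on large datasets it may be preferable to compute deciles within a partitioning column where that suits the analysis.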