sum

Computes the sum of the values in a column. Optionally, sumDistinct can be used to add up only the distinct values.

The sum API has the following overloads. Which one you use depends on the specific use case:

def sum(columnName: String): Column
Aggregate function: returns the sum of all values in the given column.
def sum(e: Column): Column
Aggregate function: returns the sum of all values in the expression.
def sumDistinct(columnName: String): Column
Aggregate function: returns the sum of distinct values in the given column.
def sumDistinct(e: Column): Column
Aggregate function: returns the sum of distinct values in the expression.

Let's look at an example of invoking sum on the DataFrame to print the total Population:

scala> import org.apache.spark.sql.functions._
scala> statesPopulationDF.select(sum("Population")).show
+---------------+
|sum(Population)|
+---------------+
|     2188689780|
+---------------+
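
To see how sum and sumDistinct differ, here is a minimal sketch on a small, made-up DataFrame. The local SparkSession, the toy rows, and the names df, total, and distinctTotal are assumptions for illustration only and are not part of the statesPopulationDF example. Note that in newer Spark releases (3.2 and later) sumDistinct is deprecated in favor of sum_distinct.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, sum, sumDistinct}

// Hypothetical local session, for illustration only
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("sum-sketch")
  .getOrCreate()
import spark.implicits._

// Toy data (not the real states dataset): two rows share the value 10
val df = Seq(("A", 10L), ("B", 10L), ("C", 25L)).toDF("State", "Population")

// sum adds every row; sumDistinct adds each distinct value only once
val row = df.select(sum("Population"), sumDistinct(col("Population"))).first()
val total = row.getLong(0)         // 10 + 10 + 25
val distinctTotal = row.getLong(1) // 10 + 25
```

Because the value 10 appears twice, the two aggregates disagree here. On a column with no repeated values they would return the same result.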