- Start the Spark shell or the Databricks Cloud Scala notebook:
$ spark-shell
- Create the Person case class:
scala> case class Person(first_name: String, last_name: String, age: Int)
- Load the person directory as a Dataset:
scala> val p = spark.read.textFile("s3a://sparkcookbook/person")
- Check the first item to kick the tires:
scala> p.first
- Split each line into an array of strings, based on a comma as the delimiter:
scala> val pmap = p.map( line => line.split(","))
- Convert the Dataset of Array[String] into a Dataset of Person objects (the lambda parameter is renamed to avoid shadowing the Dataset val p):
scala> val personDS = pmap.map(a => Person(a(0), a(1), a(2).toInt))
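The two map steps above can be sketched in plain Scala over a local collection, with no Spark required. The sample lines here are hypothetical; the actual contents of the s3a://sparkcookbook/person file may differ:

```scala
case class Person(first_name: String, last_name: String, age: Int)

// Hypothetical sample records standing in for the file's lines.
val lines = List("John,Doe,30", "Jane,Roe,25")

// Step 1: split each line on the comma delimiter.
val arrays = lines.map(line => line.split(","))

// Step 2: build a Person from each Array[String], parsing age as Int.
val persons = arrays.map(a => Person(a(0), a(1), a(2).toInt))
```

The same per-line logic runs distributed when applied through Dataset.map in the shell.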
- Register personDS as a view:
scala> personDS.createOrReplaceTempView("person")
- Run a SQL query against it:
scala> val people = spark.sql("select * from person")
- Get the output values from people:
scala> people.show
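Conceptually, the SQL query over the registered view behaves like operations on a collection of Person records. A plain-Scala analogue with hypothetical in-memory data (not the actual Dataset API) illustrates the correspondence:

```scala
case class Person(first_name: String, last_name: String, age: Int)

// Hypothetical records standing in for the "person" view.
val person = List(Person("John", "Doe", 30), Person("Jane", "Roe", 25))

// "select * from person" returns every record unchanged.
val all = person

// A query with a predicate, such as "select * from person where age > 28",
// corresponds to a filter over the records.
val over28 = person.filter(_.age > 28)
```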