How to do it...

  1. Start the Spark shell or the Databricks Cloud Scala notebook:
        $ spark-shell
  2. Create the Person case class:
        scala> case class Person(first_name: String, last_name: String, age: Int)
  3. Load the person directory as a Dataset:
        scala> val p = spark.read.textFile("s3a://sparkcookbook/person")
  4. Check the first item to kick the tires:
        scala> p.first
  5. Split each line into an array of strings, using the comma as the delimiter:
        scala> val pmap = p.map(line => line.split(","))
  6. Convert the Dataset of Array[String] into a Dataset of Person case objects:
        scala> val personDS = pmap.map(p => Person(p(0), p(1), p(2).toInt))
  7. Register personDS as a view:
        scala> personDS.createOrReplaceTempView("person")
  8. Run a SQL query against the view:
        scala> val people = spark.sql("select * from person")
  9. Display the output values from people:
        scala> people.show
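The parsing logic at the heart of steps 5 and 6 can be sketched without a Spark cluster: the same split-and-map transformation applied to a plain Scala collection. The sample lines here are hypothetical stand-ins for the records in `s3a://sparkcookbook/person`, and `PersonDemo`/`parse` are illustrative names, not part of the recipe.

```scala
// Same case class as in the recipe
case class Person(first_name: String, last_name: String, age: Int)

object PersonDemo {
  // Step 5: split each line on commas; step 6: build Person case objects
  def parse(lines: Seq[String]): Seq[Person] =
    lines
      .map(_.split(","))
      .map(a => Person(a(0), a(1), a(2).toInt))

  def main(args: Array[String]): Unit = {
    // Hypothetical sample data, not the actual contents of the S3 bucket
    val sample = Seq("John,Doe,42", "Jane,Roe,35")
    parse(sample).foreach(println)
  }
}
```

In Spark, `p.map(line => line.split(","))` runs the identical lambda distributed across partitions; the Dataset API keeps the strong typing of `Person` so that a malformed `age` field fails at `toInt` rather than silently passing through.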