How to do it...

  1. Start the Spark shell:
        $ spark-shell 
  2. Load the data from Parquet; since Parquet is the default data source, you do not have to specify it:
        scala> val people = spark.read.load("hdfs://localhost:9000/user/hduser/people.parquet")
  3. Load the data from Parquet by specifying the format explicitly:
        scala> val people = spark.read.format("parquet").load("hdfs://localhost:9000/user/hduser/people.parquet")
  4. For built-in data sources, you do not have to specify the fully qualified format name; a short name such as "parquet", "json", or "jdbc" is enough:
        scala> val people = spark.read.format("parquet").load("hdfs://localhost:9000/user/hduser/people.parquet")
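The `DataFrameReader` also has type-specific shorthand methods, so `spark.read.parquet(path)` is equivalent to `spark.read.format("parquet").load(path)`. A minimal self-contained sketch of the round trip, assuming a local SparkSession; the sample rows and the temporary path are hypothetical stand-ins for the recipe's people data on HDFS:

```scala
import org.apache.spark.sql.SparkSession

object ShorthandReaders {
  // Writes a tiny DataFrame as Parquet, reads it back via the shorthand
  // reader, and returns the row count.
  def roundTrip(): Long = {
    val spark = SparkSession.builder()
      .appName("shorthand-readers")
      .master("local[*]")   // local mode, standing in for the recipe's cluster
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data in a temporary directory, not the HDFS path above
    val path = java.nio.file.Files.createTempDirectory("people").toString + "/people.parquet"
    Seq(("Barack", 55), ("Michelle", 52)).toDF("name", "age").write.parquet(path)

    // spark.read.parquet(path) == spark.read.format("parquet").load(path)
    val count = spark.read.parquet(path).count()
    spark.stop()
    count
  }
}
```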
When writing data, there are four save modes: append, overwrite, errorIfExists, and ignore. The append mode adds data to an existing data source, overwrite replaces it, errorIfExists (the default) throws an exception if the data already exists, and ignore does nothing when the data already exists.
  5. Save people as JSON in append mode:
        scala> people.write.format("json").mode("append").save("hdfs://localhost:9000/user/hduser/people.json")
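The save modes can be exercised end to end. A minimal sketch, assuming a local SparkSession; the temporary directory and sample rows are hypothetical substitutes for the recipe's HDFS path and people data:

```scala
import org.apache.spark.sql.SparkSession

object SaveModes {
  // Returns the row counts after an overwrite write and after an append write.
  def demo(): (Long, Long) = {
    val spark = SparkSession.builder()
      .appName("save-modes")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data in a temporary directory
    val path = java.nio.file.Files.createTempDirectory("people").toString + "/people.json"
    val people = Seq(("Barack", 55), ("Michelle", 52)).toDF("name", "age")

    people.write.format("json").mode("overwrite").save(path) // replaces any existing data
    val afterOverwrite = spark.read.json(path).count()

    people.write.format("json").mode("append").save(path)    // adds rows to the existing data
    val afterAppend = spark.read.json(path).count()

    people.write.format("json").mode("ignore").save(path)    // path exists, so this is a no-op
    val afterIgnore = spark.read.json(path).count()
    assert(afterIgnore == afterAppend)

    spark.stop()
    (afterOverwrite, afterAppend)
  }
}
```

With two input rows, overwrite leaves two rows, a subsequent append doubles them to four, and ignore leaves the data untouched; errorIfExists (omitted here) would instead throw once the path exists.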