Parquet Hive interoperability

If a Hive table, say, the person table, already holds data, you can save that data directly in the Parquet format by performing the following steps:

  1. Create a table named person_parquet with the same schema as person, but stored in the Parquet format (Hive 0.13 onward):
        hive> create table person_parquet like person stored as parquet;
  2. Populate the person_parquet table with the data from the person table:
        hive> insert overwrite table person_parquet select * from person;
  3. Data written by other sources, such as Impala, sometimes stores strings in binary form. To have Spark SQL read such columns as strings, set the following property on the session configuration:
        scala> spark.conf.set("spark.sql.parquet.binaryAsString", "true")
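
To check the result from Spark, the following is a minimal sketch rather than a definitive implementation. It assumes a Spark 2.x or later build with Hive support on the classpath and a metastore that already contains the person_parquet table created above. The object name ParquetHiveInterop and the helper readAsStrings are hypothetical names introduced only for this illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical object name, used only for this sketch.
object ParquetHiveInterop {

  // Hypothetical helper: sets the property from step 3 on the session,
  // then returns the named table as a DataFrame.
  def readAsStrings(spark: SparkSession, table: String): DataFrame = {
    spark.conf.set("spark.sql.parquet.binaryAsString", "true")
    spark.table(table)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: Hive support is available, so the metastore table is visible.
    val spark = SparkSession.builder()
      .appName("ParquetHiveInterop")
      .enableHiveSupport()
      .getOrCreate()

    val people = readAsStrings(spark, "person_parquet")
    people.printSchema() // inspect the column types Spark resolved
    people.show(5)       // preview a few rows

    spark.stop()
  }
}
```

In spark-shell, where a session named spark already exists, the two statements in the body of readAsStrings can be typed directly at the scala> prompt.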