If a Hive table, say, the person table, already holds some data, you can save it directly in the Parquet format by performing the following steps:
- Create a table named person_parquet with the same schema as person, but stored in the Parquet format (Hive 0.13 onward):
hive> create table person_parquet like person stored as parquet;
- Insert data into the person_parquet table by selecting it from the person table:
hive> insert overwrite table person_parquet select * from person;
- Sometimes, Parquet data written by other systems, such as Impala, stores strings in binary form. To have Spark read such columns as strings, set the following Spark SQL property, as shown in the following code:
scala> spark.conf.set("spark.sql.parquet.binaryAsString", "true")
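
The same conversion can also be driven from Spark instead of the Hive shell. The following is a minimal sketch, not the recipe's own method: it assumes a Spark 2.x or later build with Hive support, a metastore in which the person table already exists, and a hypothetical application name PersonToParquet. It uses the DataFrame writer rather than the HiveQL statements above, so the resulting table is a Spark-managed Parquet table:

```scala
import org.apache.spark.sql.SparkSession

object PersonToParquet {
  def main(args: Array[String]): Unit = {
    // Hive support lets Spark see tables registered in the Hive metastore.
    // Assumption: the person table already exists there.
    val spark = SparkSession.builder()
      .appName("PersonToParquet") // hypothetical name, used only for illustration
      .enableHiveSupport()
      .getOrCreate()

    // Read the binary columns of Parquet files as strings
    // (useful for data written by systems such as Impala).
    spark.conf.set("spark.sql.parquet.binaryAsString", "true")

    // Load the existing table and write it back as a Parquet-backed table.
    // "overwrite" replaces person_parquet if it already exists.
    spark.table("person")
      .write
      .mode("overwrite")
      .format("parquet")
      .saveAsTable("person_parquet")

    // Inspect the schema of the new table to confirm the column types.
    spark.table("person_parquet").printSchema()

    spark.stop()
  }
}
```

In the Spark shell, the spark session is already available, so only the calls on spark inside main are needed.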