Inferring schema using case classes

In an ideal world, data is stored in schema-aware formats such as Parquet and JSON. This is far from the reality, though; much of the time, data arrives in a raw format. The next two recipes will cover how to attach a schema to raw data. Case classes are special classes in Scala that provide you with the boilerplate implementation of the constructor, getters (accessors), equals, and hashCode, and that implement Serializable. Case classes work really well to encapsulate data as objects. Readers familiar with Java can relate them to plain old Java objects (POJOs) or Java beans.
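As a minimal sketch, the following snippet (using a hypothetical Person class, not tied to any particular dataset) shows what the compiler generates for free from a one-line definition:

```scala
// Hypothetical example: a one-line case class definition.
// The compiler generates the constructor, field accessors,
// equals, hashCode, toString, and copy, and the class
// implements Serializable automatically.
case class Person(firstName: String, lastName: String, age: Int)

val p1 = Person("John", "Doe", 30)
val p2 = Person("John", "Doe", 30)

println(p1.firstName)       // John (generated accessor)
println(p1 == p2)           // true (generated structural equals)
println(p1.copy(age = 31))  // Person(John,Doe,31) (generated copy)
```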

The beauty of case classes is that all of that grunt work, which is required in Java, can be done with a case class in a single line of code. Spark uses the reflection feature of the Java programming language on case classes to infer the schema.

Scala is a Java Virtual Machine (JVM)-based language, which means that Scala code compiles to bytecode. This is the reason Spark, which is written in Scala, can seamlessly leverage Java features such as reflection.
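The following is a minimal sketch of reflection-based schema inference, assuming Spark 2.x or later, a local master, and a hypothetical comma-separated people.txt file containing firstName,lastName,age records (the file name, object name, and field layout are illustrative assumptions, not part of the recipe):

```scala
import org.apache.spark.sql.SparkSession

// Case classes used for schema inference should be defined at the
// top level so that Spark can create encoders for them.
case class Person(firstName: String, lastName: String, age: Int)

object InferSchema extends App {
  val spark = SparkSession.builder()
    .appName("InferSchema")
    .master("local[*]") // assumed: local run, for illustration only
    .getOrCreate()

  import spark.implicits._

  // Read raw text (hypothetical people.txt), parse each line into a
  // Person, and let Spark infer the DataFrame schema via reflection
  // on the case class.
  val people = spark.sparkContext
    .textFile("people.txt")
    .map(_.split(","))
    .map(a => Person(a(0), a(1), a(2).trim.toInt))
    .toDF()

  people.printSchema()
  // root
  //  |-- firstName: string (nullable = true)
  //  |-- lastName: string (nullable = true)
  //  |-- age: integer (nullable = false)

  spark.stop()
}
```

Note that no schema is declared anywhere: the field names and types printed by printSchema come entirely from reflecting on the Person case class.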