In an ideal world, data is stored in schema-aware formats, such as Parquet and JSON. This is far from the reality, though. A lot of the time, data comes in raw format. The next two recipes will cover how to attach a schema to raw data. Case classes are special classes in Scala that provide you with the boilerplate implementation of the constructor, getters (accessors), equals, and hashCode, and that implement Serializable. Case classes work really well to encapsulate data as objects. Readers familiar with Java can relate them to plain old Java objects (POJOs) or Java beans.
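As a quick illustration of that boilerplate, here is a minimal sketch (the `Person` class is an example of our own, not from the recipes) showing what a one-line case class gives you for free:

```scala
// A single line yields a constructor, accessors, equals, hashCode,
// toString, copy, and Serializable -- no hand-written boilerplate.
case class Person(name: String, age: Int)

object CaseClassDemo {
  def main(args: Array[String]): Unit = {
    val p1 = Person("Alice", 30)
    val p2 = Person("Alice", 30)

    println(p1.name)           // generated accessor: Alice
    println(p1 == p2)          // structural equality, not reference equality: true
    println(p1.copy(age = 31)) // copy with one field changed: Person(Alice,31)
  }
}
```

The equivalent Java bean would need an explicit constructor, getters, and hand-written `equals` and `hashCode` methods.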
The beauty of case classes is that all the grunt work required in Java can be done with a case class in a single line of code. Spark uses the reflection feature of the Java programming language on case classes to infer the schema.
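To make the reflection idea concrete, the sketch below (illustration only, not Spark's actual implementation; the `Employee` class is a hypothetical example) uses plain Java reflection to read the field names and types of a case class, which is the same kind of information Spark extracts to build a schema:

```scala
// Illustration: Java reflection can recover field names and types
// from a case class, which is the raw material for a schema.
case class Employee(name: String, salary: Double)

object SchemaSketch {
  def main(args: Array[String]): Unit = {
    val byName = classOf[Employee].getDeclaredFields
      .map(f => f.getName -> f.getType.getSimpleName)
      .toMap
    // name maps to String, salary maps to the primitive double
    byName.foreach { case (n, t) => println(s"$n: $t") }
  }
}
```

In Spark itself, the same inference happens implicitly when you convert a collection of case class instances to a DataFrame, for example with `Seq(Employee("Bob", 1000.0)).toDF()` after `import spark.implicits._`.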