File-based sources can be read with the following APIs:
- readTextFile(path)/TextInputFormat: Reads files line by line and returns each line as a String.
- readTextFileWithValue(path)/TextValueInputFormat: Reads files line by line and returns each line as a StringValue. StringValues are mutable strings.
- readCsvFile(path)/CsvInputFormat: Parses files of comma-delimited fields (another delimiter character can be used instead). Returns a DataSet of tuples, case class objects, or POJOs. Supports the basic Java types and their Value counterparts as field types.
- readFileOfPrimitives(path, delimiter)/PrimitiveInputFormat: Parses files of primitive values, such as String or Integer, that are separated by the given delimiter (a newline or another character sequence).
- readHadoopFile(FileInputFormat, Key, Value, path)/FileInputFormat: Creates a JobConf, reads the file at the specified path using the specified FileInputFormat, Key class, and Value class, and returns the records as Tuple2<Key, Value>.
- readSequenceFile(Key, Value, path)/SequenceFileInputFormat: Creates a JobConf, reads the file at the specified path using SequenceFileInputFormat with the specified Key class and Value class, and returns the records as Tuple2<Key, Value>.
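
To make the difference between line-wise, delimiter-separated, and field-delimited reading concrete, the following is a minimal plain-Java sketch of the three parsing behaviours described in the list. It is not the library's implementation and does not call any of the methods above; the class FileSourceSketch and all of its method names are hypothetical, and the trimming of whitespace around primitive tokens is an assumption made for this sketch only.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Illustration only: every name here is hypothetical and not part of the API listed above.
final class FileSourceSketch {

    // Line-wise reading: one String per line of the input.
    static List<String> splitLines(String content) {
        if (content.isEmpty()) {
            return new ArrayList<>();
        }
        // "\\R" matches any line break; trailing empty strings are dropped by split.
        return new ArrayList<>(Arrays.asList(content.split("\\R")));
    }

    // Delimiter-separated primitives: split on a literal delimiter, then parse each token.
    // Trimming and skipping empty tokens is a choice of this sketch, not a documented behaviour.
    static List<Integer> parseIntegers(String content, String delimiter) {
        List<Integer> values = new ArrayList<>();
        for (String token : content.split(Pattern.quote(delimiter))) {
            String trimmed = token.trim();
            if (!trimmed.isEmpty()) {
                values.add(Integer.parseInt(trimmed));
            }
        }
        return values;
    }

    // Delimited fields: each line is one record, each field is kept as a raw String.
    static List<String[]> parseDelimitedFields(String content, char fieldDelimiter) {
        List<String[]> records = new ArrayList<>();
        String literal = Pattern.quote(String.valueOf(fieldDelimiter));
        for (String line : splitLines(content)) {
            // Limit -1 keeps trailing empty fields, so "a,b," yields three fields.
            records.add(line.split(literal, -1));
        }
        return records;
    }

    // Helper that loads a whole file as UTF-8 text, rethrowing I/O failures unchecked.
    static String readWholeFile(Path path) {
        try {
            return new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(splitLines("first line\nsecond line"));
        System.out.println(parseIntegers("1,2,3", ","));
        for (String[] record : parseDelimitedFields("1|alice\n2|bob", '|')) {
            System.out.println(Arrays.toString(record));
        }
    }
}
```

The sketch works on in-memory strings so that the parsing rules are easy to inspect; readWholeFile shows one way to obtain such a string from a local path. Real input formats additionally handle typed fields, splits across large files, and distributed file systems, none of which is modelled here.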