File-based

File-based data sources can be read using the following APIs:

  • readTextFile(path)/TextInputFormat: Reads files line by line and returns each line as a String.
  • readTextFileWithValue(path)/TextValueInputFormat: Reads files line by line and returns each line as a StringValue. StringValues are mutable strings.
  • readCsvFile(path)/CsvInputFormat: Parses files of comma-delimited (or other character-delimited) fields. Returns a DataSet of tuples, case class objects, or POJOs. Supports the basic Java types and their Value counterparts as field types.
  • readFileOfPrimitives(path, delimiter)/PrimitiveInputFormat: Parses files of primitive data types, such as String or Integer, that are delimited by newlines or by another character sequence given as the delimiter.
  • readHadoopFile(FileInputFormat, Key, Value, path)/FileInputFormat: Creates a JobConf, reads files from the specified path using the specified FileInputFormat, Key class, and Value class, and returns the records as Tuple2<Key, Value>.
  • readSequenceFile(Key, Value, path)/SequenceFileInputFormat: Creates a JobConf, reads files from the specified path using SequenceFileInputFormat with the specified Key and Value classes, and returns the records as Tuple2<Key, Value>.
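To make the parsing behaviour of the first few formats concrete without needing a Flink runtime, the following plain-Java sketch approximates what line-wise text reading, two-field CSV parsing, and delimiter-separated primitive parsing produce for a small local file. It is only an illustration under stated assumptions: the class FileSourceSketch, the Row record, and the methods readTextLines, readCsvRows, readIntegers, and writeTemp are hypothetical and are not part of Flink. Flink additionally splits input files and parses them in parallel, which this sketch does not attempt.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical plain-Java approximation of three Flink file formats.
// NOT Flink code: no input splitting, no parallelism, no Flink types.
public class FileSourceSketch {

    // Hypothetical row type standing in for a Tuple2<String, Integer>.
    public record Row(String name, int count) {}

    // Approximates readTextFile(path): one String per line of the file.
    public static List<String> readTextLines(Path path) {
        try {
            return Files.readAllLines(path, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Approximates readCsvFile(path) with two typed fields (String, int)
    // and a configurable single-character field delimiter.
    public static List<Row> readCsvRows(Path path, char fieldDelimiter) {
        String delimiterRegex = Pattern.quote(String.valueOf(fieldDelimiter));
        List<Row> rows = new ArrayList<>();
        for (String line : readTextLines(path)) {
            if (line.isBlank()) continue; // skip empty lines
            String[] fields = line.split(delimiterRegex, -1);
            if (fields.length != 2) {
                throw new IllegalArgumentException("Expected 2 fields: " + line);
            }
            rows.add(new Row(fields[0].trim(), Integer.parseInt(fields[1].trim())));
        }
        return rows;
    }

    // Approximates readFileOfPrimitives(path, delimiter) with Integer as the
    // target type: split the whole file on the delimiter, parse each token.
    public static List<Integer> readIntegers(Path path, String delimiter) {
        try {
            String content = Files.readString(path, StandardCharsets.UTF_8);
            List<Integer> values = new ArrayList<>();
            for (String token : content.split(Pattern.quote(delimiter))) {
                String trimmed = token.trim();
                if (!trimmed.isEmpty()) values.add(Integer.parseInt(trimmed));
            }
            return values;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Demo helper: writes the given content to a fresh temporary file.
    public static Path writeTemp(String content) {
        try {
            Path file = Files.createTempFile("file-source-sketch", ".txt");
            Files.writeString(file, content, StandardCharsets.UTF_8);
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readTextLines(writeTemp("to be\nor not\n")));
        System.out.println(readCsvRows(writeTemp("alice,3\nbob,5\n"), ','));
        System.out.println(readIntegers(writeTemp("1|2|3"), "|"));
    }
}
```

In an actual Flink job you would instead call the corresponding methods on an ExecutionEnvironment, for example env.readTextFile(path) or env.readCsvFile(path).types(String.class, Integer.class), which yield a distributed DataSet rather than an in-memory List.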