Introduction to Apache Flink

Apache Flink is an open-source framework for distributed stream processing with the following features:

  • It provides results that are accurate, even in the case of out-of-order or late-arriving data
  • It is stateful and fault tolerant, and can seamlessly recover from failures while maintaining an exactly-once application state
  • It performs at a large scale, running on thousands of nodes with very good throughput and latency characteristics

The following is a screenshot from the official documentation that shows how Apache Flink can be used:

Another way of viewing the Apache Flink framework is shown in the following screenshot:

All Flink programs are executed lazily: when the program's main method runs, the data loading and transformations do not happen immediately. Instead, each operation is created and added to the program's plan. The operations actually run only when execution is explicitly triggered by an execute() call on the execution environment. Whether the program runs locally or on a cluster depends on the type of execution environment. This lazy evaluation lets you construct sophisticated programs that Flink executes as one holistically planned unit.
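The idea of recording operations into a plan and deferring work until an explicit trigger can be sketched in plain Java. This is a toy model of lazy evaluation, not Flink's actual API; the class and method names here are invented for illustration:

```java
// Toy model of Flink-style lazy evaluation: transformations are recorded
// in a plan and only run when execute() is called. NOT Flink's API.
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class LazyPlan {
    // Each recorded operation transforms a list of integers.
    private final List<Function<List<Integer>, List<Integer>>> plan = new ArrayList<>();
    private final List<Integer> source;

    public LazyPlan(List<Integer> source) {
        this.source = source;
    }

    // Calling map() does NOT run anything; it just appends to the plan.
    public LazyPlan map(Function<Integer, Integer> f) {
        plan.add(in -> {
            List<Integer> out = new ArrayList<>();
            for (int v : in) out.add(f.apply(v));
            return out;
        });
        return this;
    }

    // Only execute() walks the accumulated plan and produces results.
    public List<Integer> execute() {
        List<Integer> data = source;
        for (Function<List<Integer>, List<Integer>> op : plan) data = op.apply(data);
        return data;
    }

    public static void main(String[] args) {
        List<Integer> result = new LazyPlan(List.of(1, 2, 3))
                .map(v -> v * 10)   // recorded, not yet run
                .map(v -> v + 1)    // recorded, not yet run
                .execute();         // the whole plan runs here
        System.out.println(result); // prints [11, 21, 31]
    }
}
```

Because nothing runs until execute(), an engine like Flink is free to inspect the whole plan first and optimize it as a single unit.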

Flink programs look like regular programs that transform collections of data. Each program consists of the same basic parts:

  1. Obtain an execution environment
  2. Load the initial data
  3. Specify transformations, aggregations, and joins on this data
  4. Specify where to put the results of your computations
  5. Trigger the program execution
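The five parts above can be sketched as a minimal word count using Flink's DataSet API. This is an illustrative sketch only: it assumes the flink-java dependency is on the classpath, and the class name and sample input are invented for the example:

```java
// Sketch of the five basic parts of a Flink program (DataSet API),
// assuming the flink-java dependency is available.
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

public class WordCountSketch {
    public static void main(String[] args) throws Exception {
        // 1. Obtain an execution environment (local or cluster is
        //    decided by where the program runs).
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // 2. Load the initial data.
        DataSet<String> text = env.fromElements("to be or not to be");

        // 3. Specify transformations and aggregations on this data.
        DataSet<Tuple2<String, Integer>> counts = text
            .flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
                @Override
                public void flatMap(String line, Collector<Tuple2<String, Integer>> out) {
                    for (String word : line.toLowerCase().split("\\W+")) {
                        out.collect(new Tuple2<>(word, 1));
                    }
                }
            })
            .groupBy(0)  // group by the word field
            .sum(1);     // sum the per-word counts

        // 4. Specify where to put the results; 5. trigger execution.
        //    In the DataSet API, print() itself triggers execution; with a
        //    file sink such as writeAsCsv(...) you would instead call
        //    env.execute("Word Count") explicitly.
        counts.print();
    }
}
```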