Introduction to Apache Flink

Apache Flink is an open-source framework for distributed stream processing with the following features:

  • It provides results that are accurate, even in the case of out-of-order or late-arriving data
  • It is stateful and fault tolerant, and can seamlessly recover from failures while maintaining an exactly-once application state
  • It performs at a large scale, running on thousands of nodes with very good throughput and latency characteristics

The following is a screenshot from the official documentation that shows how Apache Flink can be used:

Another way of viewing the Apache Flink framework is shown in the following screenshot:

All Flink programs are executed lazily: when the program's main method runs, the data loading and transformations do not happen immediately. Instead, each operation is created and added to the program's plan. The operations actually run only when execution is explicitly triggered by an execute() call on the execution environment. Whether the program runs locally or on a cluster depends on the type of execution environment. This lazy evaluation lets you construct sophisticated programs that Flink executes as one holistically planned unit.
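The idea of recording operations into a plan and deferring work until an explicit trigger can be sketched in plain Java. This is a toy model of lazy evaluation, not Flink's actual API; the class and method names here are invented for illustration:

```java
// Toy model of Flink-style lazy evaluation: transformations are recorded
// in a plan and only run when execute() is called. NOT Flink's API.
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class LazyPlan {
    // Each recorded operation transforms a list of integers.
    private final List<Function<List<Integer>, List<Integer>>> plan = new ArrayList<>();
    private final List<Integer> source;

    public LazyPlan(List<Integer> source) {
        this.source = source;
    }

    // Calling map() does NOT run anything; it just appends to the plan.
    public LazyPlan map(Function<Integer, Integer> f) {
        plan.add(in -> {
            List<Integer> out = new ArrayList<>();
            for (int v : in) out.add(f.apply(v));
            return out;
        });
        return this;
    }

    // Only execute() walks the accumulated plan and produces results.
    public List<Integer> execute() {
        List<Integer> data = source;
        for (Function<List<Integer>, List<Integer>> op : plan) data = op.apply(data);
        return data;
    }

    public static void main(String[] args) {
        List<Integer> result = new LazyPlan(List.of(1, 2, 3))
                .map(v -> v * 10)   // recorded, not yet run
                .map(v -> v + 1)    // recorded, not yet run
                .execute();         // the whole plan runs here
        System.out.println(result); // prints [11, 21, 31]
    }
}
```

Because nothing runs until execute(), an engine like Flink is free to inspect the whole plan first and optimize it as a single unit.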

Flink programs look like regular programs that transform collections of data. Each program consists of the same basic parts:

  1. Obtain an execution environment
  2. Load the initial data
  3. Specify transformations, aggregations, and joins on this data
  4. Specify where to put the results of your computations
  5. Trigger the program execution
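The five parts above can be sketched as a minimal word count using Flink's DataSet API. This is an illustrative sketch only: it assumes the flink-java dependency is on the classpath, and the class name and sample input are invented for the example:

```java
// Sketch of the five basic parts of a Flink program (DataSet API),
// assuming the flink-java dependency is available.
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

public class WordCountSketch {
    public static void main(String[] args) throws Exception {
        // 1. Obtain an execution environment (local or cluster is
        //    decided by where the program runs).
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // 2. Load the initial data.
        DataSet<String> text = env.fromElements("to be or not to be");

        // 3. Specify transformations and aggregations on this data.
        DataSet<Tuple2<String, Integer>> counts = text
            .flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
                @Override
                public void flatMap(String line, Collector<Tuple2<String, Integer>> out) {
                    for (String word : line.toLowerCase().split("\\W+")) {
                        out.collect(new Tuple2<>(word, 1));
                    }
                }
            })
            .groupBy(0)  // group by the word field
            .sum(1);     // sum the per-word counts

        // 4. Specify where to put the results; 5. trigger execution.
        //    In the DataSet API, print() itself triggers execution; with a
        //    file sink such as writeAsCsv(...) you would instead call
        //    env.execute("Word Count") explicitly.
        counts.print();
    }
}
```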