Event time and watermarks

The Flink Streaming API draws its inspiration from the Google Dataflow model. This API supports different concepts of time. The following are the three common places where you can capture time in a streaming environment:

  • Event timeEvent time is the time when the event occurred on its producing device. For example, in an IoT project, it can be the time at which the sensor captures a reading. Generally, these event times need to embed in the record before they enter Flink. During time processing, these timestamps are extracted and considered for windowing. Event time processing can be used for out-of-order events.
  • Processing timeProcessing time is the machine time for executing the stream of data processing. Processing time windowing considers only the timestamps where an event gets processed. Processing time is the simplest way of stream processing, as it does not require any synchronization between processing machines and producing machines. In distributed asynchronous environment processing, time does not provide determinism as it is dependent on the speed at which records flow in the system.
  • Ingestion timeIngestion time is the time at which a particular event enters Flink. All time-based operations refer to this timestamp. Ingestion time is a more expensive operation than processing, but gives predictable results. Ingestion time programs cannot handle any out-of-order events as it assigns a timestamp only after the event has entered the Flink system.

The following example shows how to set event time and watermarks. In the cases of ingestion time and processing time, just assign the time characteristics and watermark generation is taken care of automatically. The following is a code snippet for this:

In Java:

final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime);
//or
env.setStreamTimeCharacteristic(TimeCharacteristic.IngestionTime);

In Scala:

val env = StreamExecutionEnvironment.getExecutionEnvironment
env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime)
//or
env.setStreamTimeCharacteristic(TimeCharacteristic.IngestionTime)

In the case of event time stream programs, specify the way to assign watermarks and timestamps. There are two ways of assigning watermarks and timestamps:

  • Directly from a data source attribute
  • Using a timestamp assigner

To work with event time streams, assign the time characteristic as follows:

In Java:

final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime;

In Scala:

val env = StreamExecutionEnvironment.getExecutionEnvironment
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)

It is always best to store event time while storing the record in source. Flink also supports some predefined timestamp extractors and watermark generators.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset