Maintaining the state in between batches

In microbatch-based streaming systems such as Spark, state sometimes needs to be maintained or updated between batches. The current Spark implementation offers several ways to do this, for example windowing and updateStateByKey. All of them essentially perform a join operation across batches to maintain the state, which can get very expensive if the window length is long. Another option is to keep the state in a database. In-memory databases such as MemSQL can serve this purpose, but they bring the overhead of maintaining another database system.
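
To make the join-per-batch cost concrete, here is a minimal pure-Python sketch of updateStateByKey-style semantics. It is not Spark code: the helper update_state_by_key, the running_count function, and the sample batches are hypothetical names chosen for illustration. It only mirrors the general contract, in which an update function receives the new values for a key together with the previous state for that key.

```python
from collections import defaultdict

def update_state_by_key(state, batch, update_fn):
    """Merge one microbatch of (key, value) pairs into the running state.

    Hypothetical helper that imitates the shape of updateStateByKey;
    it is an illustration, not Spark's implementation.
    """
    # Group the new batch's values by key.
    grouped = defaultdict(list)
    for key, value in batch:
        grouped[key].append(value)

    new_state = {}
    # Visit every key seen in the old state OR the new batch. This is the
    # join-like step: its cost grows with the size of the accumulated
    # state, not just with the size of the incoming batch.
    for key in state.keys() | grouped.keys():
        updated = update_fn(grouped.get(key, []), state.get(key))
        if updated is not None:  # returning None drops the key
            new_state[key] = updated
    return new_state

def running_count(new_values, previous):
    # previous is None the first time a key is seen.
    return (previous or 0) + sum(new_values)

# Three assumed microbatches of (word, 1) pairs.
batches = [
    [("spark", 1), ("flink", 1), ("spark", 1)],
    [("spark", 1)],
    [("kafka", 1), ("flink", 1)],
]

state = {}
for batch in batches:
    state = update_state_by_key(state, batch, running_count)

print(sorted(state.items()))
```

Note that every key already in the state is revisited on each batch, even when the batch contains no new values for it. That is why this style of state maintenance becomes expensive as the state or window grows.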

Structured Streaming has rewritten state management to maintain this running intermediate state in memory, backed by write-ahead logs (WAL) in the file system for fault tolerance.
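
The general idea of in-memory state backed by a write-ahead log can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not how Structured Streaming is actually implemented: the class WalBackedState, the JSON-lines log format, and the file name state.wal are all hypothetical. Each batch's changes are appended to the log before they are applied in memory, so a restarted process can rebuild the state by replaying the log.

```python
import json
import os
import tempfile

class WalBackedState:
    """Running word counts kept in memory, recoverable from a log file.

    Hypothetical class for illustration; it ignores torn writes and
    log compaction, which a real system would have to handle.
    """

    def __init__(self, wal_path):
        self.wal_path = wal_path
        self.counts = {}
        self.last_batch_id = -1
        self._replay()

    def _replay(self):
        # Rebuild the in-memory state from previously logged batches.
        if not os.path.exists(self.wal_path):
            return
        with open(self.wal_path, "r", encoding="utf-8") as wal:
            for line in wal:
                record = json.loads(line)
                self._apply(record["batch_id"], record["deltas"])

    def _apply(self, batch_id, deltas):
        for key, delta in deltas.items():
            self.counts[key] = self.counts.get(key, 0) + delta
        self.last_batch_id = batch_id

    def process_batch(self, batch_id, words):
        if batch_id <= self.last_batch_id:
            return  # batch already logged and applied; skip the replay
        deltas = {}
        for word in words:
            deltas[word] = deltas.get(word, 0) + 1
        # Write ahead: persist the change before touching memory.
        with open(self.wal_path, "a", encoding="utf-8") as wal:
            wal.write(json.dumps({"batch_id": batch_id, "deltas": deltas}) + "\n")
            wal.flush()
            os.fsync(wal.fileno())
        self._apply(batch_id, deltas)

# Usage: process two batches, then simulate a restart.
wal_path = os.path.join(tempfile.mkdtemp(), "state.wal")
state = WalBackedState(wal_path)
state.process_batch(0, ["spark", "flink", "spark"])
state.process_batch(1, ["spark"])

recovered = WalBackedState(wal_path)  # fresh object, same log file
print(recovered.counts, recovered.last_batch_id)
```

Unlike the join-based approach, each batch here touches only the keys it contains. The log is read only on recovery.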
