Maintaining the state in between batches

In micro-batch-based streaming systems, such as Spark, state sometimes needs to be maintained or updated between batches. The current Spark Streaming (DStream) implementation offers several ways to do this, for example, windowed operations and updateStateByKey. Essentially, all of these approaches maintain the state by combining each new batch with earlier data through a join-like operation; updateStateByKey, for instance, co-groups the incoming batch with the state produced by the previous batch. This can get very expensive when the window length is long. Another option is to keep the state in an external database. In-memory databases, such as MemSQL, are available, but they add the overhead of operating and maintaining another database system.
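
To make the cost of this approach concrete, the following is a minimal pure-Python simulation of the semantics of updateStateByKey. It does not use Spark; update_state_by_key and running_count are hypothetical names chosen for illustration. For each batch, the previous state is co-grouped with the new records by key, and the update function runs for every key held in the state, even keys that received no new data, so the per-batch work grows with the size of the state rather than with the size of the batch.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# Hypothetical update-function type: (new values for a key, previous state) -> new state.
# Returning None drops the key from the state, mirroring updateStateByKey's contract.
UpdateFn = Callable[[List[int], Optional[int]], Optional[int]]


def update_state_by_key(
    state: Dict[str, int],
    batch: Iterable[Tuple[str, int]],
    update_fn: UpdateFn,
) -> Dict[str, int]:
    """Simulate one micro-batch of updateStateByKey (pure Python, no Spark)."""
    # Group the new batch's values by key.
    new_values: Dict[str, List[int]] = defaultdict(list)
    for key, value in batch:
        new_values[key].append(value)

    # Co-group with the previous state: every key on EITHER side is visited,
    # including keys that received no new data in this batch.
    next_state: Dict[str, int] = {}
    for key in set(state) | set(new_values):
        updated = update_fn(new_values.get(key, []), state.get(key))
        if updated is not None:
            next_state[key] = updated
    return next_state


def running_count(new_values: List[int], previous: Optional[int]) -> Optional[int]:
    """Example update function: a running total per key."""
    return (previous or 0) + sum(new_values)


if __name__ == "__main__":
    state: Dict[str, int] = {}
    batches = [[("a", 1), ("b", 1)], [("a", 1)], [("c", 1)]]
    for batch in batches:
        state = update_state_by_key(state, batch, running_count)
        print(sorted(state.items()))
```

In the second batch, key b receives no records but is still passed to the update function; this per-key work on every batch is what makes the approach expensive as the state grows.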

Structured Streaming reworks state management: it keeps this running intermediate state in memory and backs it with write-ahead logs (WAL) in the file system for fault tolerance.
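
The general idea of keeping state in memory while logging it for recovery can be sketched as follows. This is a generic write-ahead-log illustration under simple assumptions (a single process, one JSON record per committed batch in a local file), not Spark's actual state store implementation; the class WalBackedState and its methods commit_batch and get are hypothetical. Each batch's updates are appended to the log and flushed to disk before they are applied to the in-memory map, so a restarted process can rebuild the state by replaying the log.

```python
import json
import os
import tempfile
from typing import Dict


class WalBackedState:
    """Hypothetical in-memory key/value state backed by an append-only log.

    A generic write-ahead-log sketch, not Spark's StateStore implementation.
    """

    def __init__(self, log_path: str) -> None:
        self.log_path = log_path
        self.state: Dict[str, int] = {}
        self.last_batch = -1
        self._recover()

    def _recover(self) -> None:
        # Rebuild the in-memory map by replaying every committed batch in order.
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, "r", encoding="utf-8") as log:
            for line in log:
                record = json.loads(line)
                self.state.update(record["updates"])
                self.last_batch = record["batch"]

    def commit_batch(self, batch_id: int, updates: Dict[str, int]) -> None:
        # Batch already committed (e.g. replayed after a restart): skip it.
        if batch_id <= self.last_batch:
            return
        # Write ahead: make the batch's updates durable BEFORE applying them in memory.
        record = {"batch": batch_id, "updates": updates}
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps(record) + "\n")
            log.flush()
            os.fsync(log.fileno())
        self.state.update(updates)
        self.last_batch = batch_id

    def get(self, key: str, default: int = 0) -> int:
        return self.state.get(key, default)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "state.wal")
        store = WalBackedState(path)
        store.commit_batch(0, {"a": 1, "b": 1})
        store.commit_batch(1, {"a": store.get("a") + 1})
        # Simulate a crash/restart: a fresh instance recovers from the log alone.
        recovered = WalBackedState(path)
        print(recovered.state, recovered.last_batch)
```

Reads are served from memory, and the file system is touched only once per committed batch, which is the trade-off that makes in-memory state with a durable log cheaper than re-joining batches or round-tripping to an external database.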
