Data processing: Apache Flink 1.14 combines batch and stream processing


Apache Flink has been released in version 1.14. Among other things, the dataflow processing framework is dedicated to unifying batch and stream processing through new checkpointing options. It also says goodbye to the legacy SQL engine, which is succeeded by the Blink SQL engine; the Flink team introduced Blink more than two years ago as a faster and better-equipped alternative. In total, the new version reflects the work of more than 200 developers, who have closed over 1,000 issues.

Apache Flink offers both sequential processing of data as batches and real-time stream processing (event stream processing). As the Apache Flink documentation describes it, these two approaches correspond to bounded streams with a defined start and end point, and unbounded streams with a start point but no end point.
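The distinction can be illustrated with a minimal plain-Python sketch (a conceptual illustration, not the Flink API):

```python
import itertools

def bounded_stream():
    """A bounded stream: defined start and end point (e.g. reading a file)."""
    return iter([1, 2, 3, 4, 5])

def unbounded_stream():
    """An unbounded stream: a start point but no end point (e.g. a socket)."""
    n = 0
    while True:
        yield n
        n += 1

# Batch processing can consume the bounded stream completely...
batch_result = sum(bounded_stream())

# ...while stream processing only ever sees a finite window of the
# unbounded stream; it can never "wait for the end".
window = list(itertools.islice(unbounded_stream(), 5))
```

Flink's point is that both cases can be handled by the same runtime, with the bounded case treated as a stream that happens to end.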

Version 1.14 brings additional ways to combine the two data processing modes. The new version allows defining checkpoints in applications whose tasks are partly still running and partly finished; previously, this was only possible for fully running applications. In addition, bounded streams now receive a final checkpoint when they reach their end. Before this change, when Flink processed bounded data streams as streams instead of batches, checkpointing stopped at the end of processing as soon as some tasks had finished.

Apache Flink 1.14 combines the processing of bounded and unbounded data streams.

(Image: Apache)

To activate checkpoints after task completion as well as the final checkpoint, developers must manually add execution.checkpointing.checkpoints-after-tasks-finish.enabled: true to the configuration. The new behavior is expected to become the default in a future release.
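In a flink-conf.yaml, this could look as follows (a minimal fragment; the checkpoint interval shown alongside it is an illustrative assumption, not taken from the release notes):

```yaml
# flink-conf.yaml
# Illustrative checkpointing interval (assumed value for the example).
execution.checkpointing.interval: 30 s

# New in 1.14: allows checkpoints after some tasks have finished and
# adds a final checkpoint at the end of a bounded stream. Off by default.
execution.checkpointing.checkpoints-after-tasks-finish.enabled: true
```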

Apache Flink 1.14 also adds function chaining for the Python DataStream API, which should provide better performance. A new loopback mode to simplify debugging of Python functions is likewise available in PyFlink, since Python functions normally run in a separate process alongside the Flink JVM. With loopback mode, which is enabled by default in local deployments, user-defined Python functions can be executed in the client's Python process instead.
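The idea behind function chaining can be sketched in plain Python (a conceptual illustration, not the PyFlink API): instead of handing each element through two separate operators, the user-defined functions are fused into a single call, which saves per-element serialization and invocation overhead.

```python
def to_fahrenheit(c):
    """First per-element user function."""
    return c * 9 / 5 + 32

def round_int(x):
    """Second per-element user function."""
    return round(x)

def chain(*funcs):
    """Fuse several per-element functions into a single operator."""
    def chained(value):
        for f in funcs:
            value = f(value)
        return value
    return chained

# Without chaining: two operator invocations per element.
# With chaining: one fused invocation per element.
fused = chain(to_fahrenheit, round_int)
results = [fused(c) for c in [0, 20.0, 100]]  # → [32, 68, 212]
```

PyFlink applies this kind of fusion automatically where consecutive Python operators allow it; the sketch only shows why fewer operator boundaries mean less overhead.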

In addition, version 1.14 removes not only the obsolete SQL engine but also the support for Apache Mesos, as it received too little interest from Flink users. Flink can still be run on the cluster manager, but only with the help of other projects such as the Marathon container orchestration platform.

A blog post in the form of release notes provides further information on Apache Flink 1.14.

