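Beam is an open-source programming library for data processing. Google contributed it to Apache, and it recently became an Apache Top-Level Project (TLP). It provides a high-level, unified programming model that lets us implement batch and streaming data processing by building a Pipeline, and a Pipeline, once built, can run on different underlying execution engines. Its main goal is to unify the programming paradigms for batch and stream processing, providing a simple, flexible, feature-rich, and highly expressive SDK for processing unbounded, out-of-order, web-scale datasets. Apache Beam aims for data processing programs developed with Beam to run on any distributed execution engine.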
<dependency>
  <groupId>org.apache.beam</groupId>
  <artifactId>beam-sdks-java-core</artifactId>
  <version>0.5.0</version>
</dependency>
<dependency>
  <groupId>org.apache.beam</groupId>
  <artifactId>beam-runners-direct-java</artifactId>
  <version>0.5.0</version>
  <scope>runtime</scope>
</dependency>
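As a rough illustration of the Pipeline model described above, here is a minimal word-count sketch written against the two dependencies listed. It is not taken from the Beam documentation: the class name MinimalWordCount, the helper splitWords, and the two DoFn classes are hypothetical names chosen for this example. It also assumes that, with no runner specified in the options, the SDK falls back to the direct runner found on the runtime classpath, which is why the code never references the runner class directly.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.KV;

public class MinimalWordCount {

  // Hypothetical plain-Java helper (not part of Beam):
  // lowercases a line and splits it into alphanumeric words.
  static List<String> splitWords(String line) {
    List<String> words = new ArrayList<>();
    for (String w : line.toLowerCase().split("[^a-z0-9]+")) {
      if (!w.isEmpty()) {
        words.add(w);
      }
    }
    return words;
  }

  // Emits one output element per word in each input line.
  static class ExtractWordsFn extends DoFn<String, String> {
    @ProcessElement
    public void processElement(ProcessContext c) {
      for (String w : splitWords(c.element())) {
        c.output(w);
      }
    }
  }

  // Formats each (word, count) pair, prints it, and passes it on.
  static class FormatAndPrintFn extends DoFn<KV<String, Long>, String> {
    @ProcessElement
    public void processElement(ProcessContext c) {
      String line = c.element().getKey() + ": " + c.element().getValue();
      System.out.println(line);
      c.output(line);
    }
  }

  public static void main(String[] args) {
    // No runner is set here; the sketch assumes the direct runner
    // on the runtime classpath is picked up as the default.
    PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
    Pipeline p = Pipeline.create(options);

    p.apply(Create.of("to be or not to be", "that is the question"))
        .apply(ParDo.of(new ExtractWordsFn()))
        .apply(Count.<String>perElement())
        .apply(ParDo.of(new FormatAndPrintFn()));

    // Block until the pipeline has finished.
    p.run().waitUntilFinish();
  }
}
```

The same Pipeline definition is meant to be portable: in principle, switching to another execution engine means adding that runner's artifact and selecting it through the pipeline options, without touching the transforms. The order of the printed lines is not guaranteed.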
The Apache Beam community is pleased to announce the availability of the
0.5.0 release.

Apache Beam is a unified programming model for both batch and streaming
data processing, enabling efficient execution across diverse distributed
execution engines and providing extensibility points for connecting to
different technologies and user communities.

This release adds support for stateful pipelines via the new State API, and
timers via the new Timer API. Additionally, the release adds new IO
connectors for Elasticsearch and MQ Telemetry Transport (MQTT), along with
a usual batch of bug fixes and improvements. For all major changes in this
release, please refer to the release notes [2].

The 0.5.0 release is now the recommended version; we encourage everyone to
upgrade from any earlier releases.

We thank all users and contributors who have helped make this release
possible. If you haven't already, we'd like to invite you to join us, as we
work towards our first release with API stability.

- Davor Bonaci, on behalf of the Apache Beam community.

[1] https://beam.apache.org/get-started/downloads/
[2] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12319527&version=12338859