The First Stable Release of Apache Spark 3.0 Is Out: Finally Ready for Production!

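Apache Spark 3.0.0 was officially released on June 18, 2020. It brought a large number of new features, many of which speed up data processing. Unfortunately, that version was not a stable release. But just yesterday, Apache Spark 3.0.1 was quietly released (there seems to have been no email announcement)! The good news is that this version is a stable release, and the project officially recommends that all 3.0 users upgrade to it. Apache Spark 3.0 added many…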

w397090770, 4 years ago (2020-09-10), 1292 views, 0 comments, 0 likes

Spark on YARN: Where Did All the Memory You Configured Go?

Efficient processing of big data, especially with Spark, really comes down to how much memory one can afford, or how efficiently one can use the limited memory available. Efficient memory utilization, however, is not something one can take for granted with the default configuration shipped with Spark and YARN. Rather, it takes very careful provisioning and tuning to get as much as possible out of the bare metal. In this post I'll…
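The excerpt is about where executor memory ends up on YARN. The sketch below shows that arithmetic under several assumptions.

- **Overhead:** the default `spark.executor.memoryOverhead` is max(10% of `spark.executor.memory`, 384 MiB).
- **Heap layout:** 300 MiB of the heap is reserved, `spark.memory.fraction` = 0.6 and `spark.memory.storageFraction` = 0.5 (the Spark 2.x/3.0 defaults).
- **Container rounding:** the scheduler is assumed to round each container request up to a multiple of `yarn.scheduler.minimum-allocation-mb`.

The object and method names are hypothetical. Real pool sizes come out slightly smaller, because Spark sizes them from the JVM's `Runtime.maxMemory` rather than the raw `-Xmx` value.

```scala
// A sketch of default Spark-on-YARN executor memory arithmetic (assumed Spark 2.x/3.0 defaults).
object ExecutorMemoryBreakdown {
  val ReservedMb: Long = 300L        // heap reserved by Spark before any pool is carved out
  val MinOverheadMb: Long = 384L     // floor for the default spark.executor.memoryOverhead
  val OverheadFactor: Double = 0.10  // default overhead factor on YARN

  /** All values in MiB. */
  final case class Breakdown(
      containerMb: Long, // size of the YARN container actually granted
      overheadMb: Long,  // off-heap overhead requested on top of the heap
      unifiedMb: Long,   // shared execution + storage pool (spark.memory.fraction)
      storageMb: Long,   // storage share protected from eviction (spark.memory.storageFraction)
      userMb: Long)      // heap left over for user data structures

  def breakdown(
      executorMemoryMb: Long,
      minAllocationMb: Long = 1024L,  // yarn.scheduler.minimum-allocation-mb (Hadoop default)
      memoryFraction: Double = 0.6,   // spark.memory.fraction default
      storageFraction: Double = 0.5   // spark.memory.storageFraction default
  ): Breakdown = {
    val overhead = math.max((executorMemoryMb * OverheadFactor).toLong, MinOverheadMb)
    val requested = executorMemoryMb + overhead
    // Assumption: the scheduler rounds each request up to a multiple of the minimum allocation.
    val container = ((requested + minAllocationMb - 1) / minAllocationMb) * minAllocationMb
    val usable = executorMemoryMb - ReservedMb
    val unified = (usable * memoryFraction).toLong
    val storage = (unified * storageFraction).toLong
    Breakdown(container, overhead, unified, storage, usable - unified)
  }

  def main(args: Array[String]): Unit =
    println(breakdown(executorMemoryMb = 4096L)) // prints Breakdown(5120,409,2277,1138,1519)
}
```

With `spark.executor.memory=4g`, these assumptions give a 5120 MiB container. Only about 2277 MiB of it forms the execution/storage pool.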

w397090770, 4 years ago (2020-09-09), 982 views, 0 comments, 0 likes

An Introduction to Apache Spark SQL Configuration Parameters

We can apply some settings when initializing the SparkSession:

[code lang="scala"]
import org.apache.spark.sql.SparkSession

val spark: SparkSession = SparkSession.builder
  .master("local[*]")
  .appName("My Spark Application")
  .config("spark.sql.warehouse.dir", "c:/Temp") // (1) sets spark.sql.warehouse.dir for the Spark SQL session
  .getOrCreate()
[/code]

You can also use the SQL SET…
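The excerpt breaks off where it turns to the SQL `SET` command. Below is a minimal sketch of changing a runtime SQL setting from SQL and reading it back through `spark.conf`. It assumes Spark 3.0 with `spark-sql` on the classpath. `spark.sql.shuffle.partitions` is only an example key, and the object and method names are hypothetical.

Static settings such as `spark.sql.warehouse.dir` cannot be changed at runtime this way. They belong in the builder, as in the example above.

```scala
import org.apache.spark.sql.SparkSession

object SqlConfSketch {
  /** Sets spark.sql.shuffle.partitions with a SQL SET statement and reads it back via spark.conf. */
  def setViaSql(spark: SparkSession, partitions: Int): String = {
    spark.sql(s"SET spark.sql.shuffle.partitions=$partitions")
    spark.conf.get("spark.sql.shuffle.partitions")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("sql-conf-sketch")
      .getOrCreate()
    try {
      println(setViaSql(spark, 8))
      // The RuntimeConfig API changes the same session-level setting.
      spark.conf.set("spark.sql.shuffle.partitions", "16")
      // `SET key` with no value returns the current value as a key/value row.
      spark.sql("SET spark.sql.shuffle.partitions").show(truncate = false)
    } finally spark.stop()
  }
}
```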

w397090770, 4 years ago (2020-09-09), 3371 views, 0 comments, 2 likes

Enabling Spark SQL DDL and DML in Delta Lake

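Delta Lake 0.7.0 was released shortly after Apache Spark 3.0. Its most important feature is support for working with Delta tables in SQL, covering both DDL and DML operations. This article explains in detail how to operate on Delta Lake tables with SQL; for the full Delta Lake 0.7.0 release notes, see here.

Creating tables in the Hive Metastore with SQL

Delta Lake 0.7.0 supports defining Delta tables in the Hive Metastore, and this…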
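Below is a minimal sketch of the SQL DDL/DML workflow the excerpt describes. It assumes Spark 3.0 with the `io.delta:delta-core_2.12:0.7.0` artifact on the classpath.

- **Session settings:** the two configs (`spark.sql.extensions` and `spark.sql.catalog.spark_catalog`) are the ones Delta Lake 0.7.0 needs for SQL support.
- **Illustrative choices:** the `events` table, its schema, and the temporary `LOCATION` are made up for the example, and the object and method names are hypothetical.
- **Catalog:** it uses Spark's default session catalog rather than a Hive Metastore. Adding `enableHiveSupport()` plus the `spark-hive` module should register the table in the metastore instead.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

object DeltaSqlSketch {
  /** Local session with the settings Delta Lake 0.7.0 uses to enable SQL DDL/DML. */
  def session(): SparkSession =
    SparkSession.builder()
      .master("local[*]")
      .appName("delta-sql-sketch")
      .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
      .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
      .getOrCreate()

  /** Runs DDL and DML against a hypothetical `events` table and returns its final rows. */
  def run(spark: SparkSession): Array[(Long, String)] = {
    // A fresh, not-yet-existing path so reruns never collide with old table data.
    val location = Files.createTempDirectory("delta-sketch").resolve("events").toUri.toString
    spark.sql("DROP TABLE IF EXISTS events")
    spark.sql(s"CREATE TABLE events (id BIGINT, status STRING) USING delta LOCATION '$location'")
    spark.sql("INSERT INTO events VALUES (1, 'new'), (2, 'new'), (3, 'new')")
    spark.sql("UPDATE events SET status = 'done' WHERE id = 2")
    spark.sql("DELETE FROM events WHERE id = 3")
    spark.sql("SELECT id, status FROM events ORDER BY id")
      .collect()
      .map(row => (row.getLong(0), row.getString(1)))
  }

  def main(args: Array[String]): Unit = {
    val spark = session()
    try run(spark).foreach(println)
    finally spark.stop()
  }
}
```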

w397090770, 4 years ago (2020-09-06), 1180 views, 0 comments, 0 likes


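Learning Spark, 2nd Edition Is Now Available as a Free Download

The First Delta Lake Paper Has Been Published

Presto on Spark: Supporting Both Ad Hoc Queries and Batch Processing

Custom Optimization Rules in Apache Spark: Custom Optimizer Rule

Custom Optimization Rules in Apache Spark: Custom Strategy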