CarbonData 时间序列 DataMap

时间序列 DataMap 介绍 (Alpha feature in 1.3.0)

Timeseries DataMap 是基于 'pre-aggregate' DataMap 实现的预聚合表。区别在于时间序列 DataMap 内置了对时间层次(time hierarchy)和级别的理解:年、月、日、时、分,以便它支持将查询中时间维度进行自动 roll-up。

数据加载、查询、压缩命令以及行为和预聚合 DataMap 一致,请参考 预聚合 DataMap 了解更多的信息。

要使用这个 datamap,用户可以在主表上创建多个时间序列 datamap,其中主表上有一个 event_time 的列,一个 datamap 对应一个时间粒度。然后 Carbondata 可以为主表上的查询进行自动汇总(automatic roll-up)。

例如,以下语句在名为 timeseries 的主表上有效地创建了多个预聚合表

CREATE DATAMAP agg_year
ON TABLE sales
USING "timeseries"
DMPROPERTIES (
  'event_time'='order_time',
  'year_granularity'='1',
) AS
SELECT order_time, country, sex, sum(quantity), max(quantity), count(user_id), sum(price),
 avg(price) FROM sales GROUP BY order_time, country, sex

CREATE DATAMAP agg_month
ON TABLE sales
USING "timeseries"
DMPROPERTIES (
  'event_time'='order_time',
  'month_granularity'='1',
) AS
SELECT order_time, country, sex, sum(quantity), max(quantity), count(user_id), sum(price),
 avg(price) FROM sales GROUP BY order_time, country, sex

CREATE DATAMAP agg_day
ON TABLE sales
USING "timeseries"
DMPROPERTIES (
  'event_time'='order_time',
  'day_granularity'='1',
) AS
SELECT order_time, country, sex, sum(quantity), max(quantity), count(user_id), sum(price),
 avg(price) FROM sales GROUP BY order_time, country, sex

CREATE DATAMAP agg_sales_hour
ON TABLE sales
USING "timeseries"
DMPROPERTIES (
  'event_time'='order_time',
  'hour_granularity'='1',
) AS
SELECT order_time, country, sex, sum(quantity), max(quantity), count(user_id), sum(price),
 avg(price) FROM sales GROUP BY order_time, country, sex

CREATE DATAMAP agg_minute
ON TABLE sales
USING "timeseries"
DMPROPERTIES (
  'event_time'='order_time',
  'minute_granularity'='1',
) AS
SELECT order_time, country, sex, sum(quantity), max(quantity), count(user_id), sum(price),
 avg(price) FROM sales GROUP BY order_time, country, sex

CREATE DATAMAP agg_minute
ON TABLE sales
USING "timeseries"
DMPROPERTIES (
  'event_time'='order_time',
  'minute_granularity'='1',
) AS
SELECT order_time, country, sex, sum(quantity), max(quantity), count(user_id), sum(price),
 avg(price) FROM sales GROUP BY order_time, country, sex

为了查询时间序列数据,Carbondata 内置支持时间相关的 UDF,从而能够自动滚动到所需的聚合级别

timeseries(timeseries column name, 'aggregation level')
SELECT timeseries(order_time, 'hour'), sum(quantity) FROM sales GROUP BY timeseries(order_time,
'hour')

除非查询需要,否则不必为每个粒度创建预聚合表。 Carbondata 可以汇总数据并获取它。

比如: 对于主表 sales , 如果以下时间序列 datamaps 是日级和小时级的预先聚合表

  CREATE DATAMAP agg_day
  ON TABLE sales
  USING "timeseries"
  DMPROPERTIES (
    'event_time'='order_time',
    'day_granularity'='1',
  ) AS
  SELECT order_time, country, sex, sum(quantity), max(quantity), count(user_id), sum(price),
   avg(price) FROM sales GROUP BY order_time, country, sex

  CREATE DATAMAP agg_sales_hour
  ON TABLE sales
  USING "timeseries"
  DMPROPERTIES (
    'event_time'='order_time',
    'hour_granularity'='1',
  ) AS
  SELECT order_time, country, sex, sum(quantity), max(quantity), count(user_id), sum(price),
   avg(price) FROM sales GROUP BY order_time, country, sex

下面这样的查询将被汇总并命中时间序列 datamaps

Select timeseries(order_time, 'month'), sum(quantity) from sales group by timeseries(order_time,
  'month')

Select timeseries(order_time, 'year'), sum(quantity) from sales group by timeseries(order_time,
  'year')

注意 (限制):

  • 仅支持层次级别为 1。 未来的 CarbonData 将支持其他层级。
  • 需要一个接着一个地创建所需级别的时间序列 datamap
  • 每个级别时间序列 datamap 需要分别进行删除
  • 压缩时间序列 datamp

    请参见预聚合 datamap 的压缩章节。同样适用于时间序列 datamap。

    时间序列 datamap 上的数据管理

    请参见预聚合 datamap 的数据管理章节。同样适用于时间序列 datamap。