Hive函数:SUM,AVG,MIN,MAX

摘要:
这个问题最常见的原因是:max_ rsrp数据类型是string而不是double,这会导致错误。在再处理过程中,还再次演示了大数据领域中sum、avg、max和min的功能和用法,因此参考文章可用。

转自:http://lxw1234.com/archives/2015/04/176.htm,Hive分析窗口函数(一) SUM,AVG,MIN,MAX

之前看到大数据田地有关于max()over(partition by)的用法,今天恰好工作中用到了它,但是使用中遇到了一个问题:在max(rsrp)over(partition by buildingid,height) as max_rsrp返回的结果不是分组中的最大值。最中找到了问题的原因:max_rsrp数据类型为string而不是double类型,导致的一个bug问题。

再处理的过程中也再次把大数据田地的中关于sum,avg,max,min的函数用法做了demo,因此有了该参考后的文章。

数据准备:

echo ''>data_file.txt
vim data_file.txt 
cookie1,2015-04-10,1
cookie1,2015-04-11,5
cookie1,2015-04-12,7
cookie1,2015-04-13,3
cookie1,2015-04-14,2
cookie1,2015-04-15,4
cookie1,2015-04-16,4
cookie2,2015-04-10,6
cookie2,2015-04-11,5
cookie2,2015-04-12,7
cookie2,2015-04-13,4
cookie2,2015-04-14,3
cookie2,2015-04-15,5
cookie2,2015-04-16,5
hadoop fs -rm -r /user/jrf/test_data
hadoop fs -mkdir /user/jrf/test_data
hadoop fs -copyFromLocal data_file.txt /user/jrf/test_data/
drop table if exists test_data;
create EXTERNAL TABLE test_data ( cookieid string, createtime string, --day pv INT ) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' stored as textfile location '/user/jrf/test_data/';
select * from test_data; +---------------------+-----------------------+---------------+--+ | test_data.cookieid | test_data.createtime | test_data.pv | +---------------------+-----------------------+---------------+--+ | cookie1 | 2015-04-10 | 1 | | cookie1 | 2015-04-11 | 5 | | cookie1 | 2015-04-12 | 7 | | cookie1 | 2015-04-13 | 3 | | cookie1 | 2015-04-14 | 2 | | cookie1 | 2015-04-15 | 4 | | cookie1 | 2015-04-16 | 4 | | cookie2 | 2015-04-10 | 6 | | cookie2 | 2015-04-11 | 5 | | cookie2 | 2015-04-12 | 7 | | cookie2 | 2015-04-13 | 4 | | cookie2 | 2015-04-14 | 3 | | cookie2 | 2015-04-15 | 5 | | cookie2 | 2015-04-16 | 5 | +---------------------+-----------------------+---------------+--+

SUM — 注意,结果和ORDER BY相关,默认为升序

SELECT cookieid,createtime,pv,
SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime) AS pv1, -- 默认为从起点到当前行
SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2, --从起点到当前行,结果同pv1 
SUM(pv) OVER(PARTITION BY cookieid) AS pv3,--分组内所有行
SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pv4,--当前行+往前3行
SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pv5,--当前行+往前3行+往后1行
SUM(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pv6 --当前行+往后所有行  
FROM test_data order by cookieid,createtime;
+-----------+-------------+-----+------+------+------+------+------+------+--+
| cookieid  | createtime  | pv  | pv1  | pv2  | pv3  | pv4  | pv5  | pv6  |
+-----------+-------------+-----+------+------+------+------+------+------+--+
| cookie1   | 2015-04-10  | 1   | 1    | 1    | 26   | 1    | 6    | 26   |
| cookie1   | 2015-04-11  | 5   | 6    | 6    | 26   | 6    | 13   | 25   |
| cookie1   | 2015-04-12  | 7   | 13   | 13   | 26   | 13   | 16   | 20   |
| cookie1   | 2015-04-13  | 3   | 16   | 16   | 26   | 16   | 18   | 13   |
| cookie1   | 2015-04-14  | 2   | 18   | 18   | 26   | 17   | 21   | 10   |
| cookie1   | 2015-04-15  | 4   | 22   | 22   | 26   | 16   | 20   | 8    |
| cookie1   | 2015-04-16  | 4   | 26   | 26   | 26   | 13   | 13   | 4    |
| cookie2   | 2015-04-10  | 6   | 6    | 6    | 35   | 6    | 11   | 35   |
| cookie2   | 2015-04-11  | 5   | 11   | 11   | 35   | 11   | 18   | 29   |
| cookie2   | 2015-04-12  | 7   | 18   | 18   | 35   | 18   | 22   | 24   |
| cookie2   | 2015-04-13  | 4   | 22   | 22   | 35   | 22   | 25   | 17   |
| cookie2   | 2015-04-14  | 3   | 25   | 25   | 35   | 19   | 24   | 13   |
| cookie2   | 2015-04-15  | 5   | 30   | 30   | 35   | 19   | 24   | 10   |
| cookie2   | 2015-04-16  | 5   | 35   | 35   | 35   | 17   | 17   | 5    |
+-----------+-------------+-----+------+------+------+------+------+------+--+
pv1: 分组内从起点到当前行的pv累积,如,11号的pv1=10号的pv+11号的pv, 12号=10号+11号+12号
pv2: 同pv1
pv3: 分组内(cookie1)所有的pv累加
pv4: 分组内当前行+往前3行,如,11号=10号+11号, 12号=10号+11号+12号, 13号=10号+11号+12号+13号, 14号=11号+12号+13号+14号
pv5: 分组内当前行+往前3行+往后1行,如,14号=11号+12号+13号+14号+15号=5+7+3+2+4=21
pv6: 分组内当前行+往后所有行,如,13号=13号+14号+15号+16号=3+2+4+4=13,14号=14号+15号+16号=2+4+4=10

如果不指定ROWS BETWEEN,默认为从起点到当前行;
如果不指定ORDER BY,则将分组内所有值累加;
关键是理解ROWS BETWEEN含义,也叫做WINDOW子句:
PRECEDING:往前
FOLLOWING:往后
CURRENT ROW:当前行
UNBOUNDED:起点,UNBOUNDED PRECEDING 表示从前面的起点, UNBOUNDED FOLLOWING:表示到后面的终点

–其他AVG,MIN,MAX,和SUM用法一样。

--AVG
SELECT cookieid,createtime,pv,
AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime) AS pv1, -- 默认为从起点到当前行
AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2, --从起点到当前行,结果同pv1 
AVG(pv) OVER(PARTITION BY cookieid) AS pv3,--分组内所有行
AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pv4,--当前行+往前3行
AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pv5,--当前行+往前3行+往后1行
AVG(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pv6 --当前行+往后所有行  
FROM test_data  order by cookieid,createtime;
+-----------+-------------+-----+---------------------+---------------------+---------------------+--------------------+--------------------+---------------------+--+
| cookieid  | createtime  | pv  |         pv1         |         pv2         |         pv3         |        pv4         |        pv5         |         pv6         |
+-----------+-------------+-----+---------------------+---------------------+---------------------+--------------------+--------------------+---------------------+--+
| cookie1   | 2015-04-10  | 1   | 1.0                 | 1.0                 | 3.7142857142857144  | 1.0                | 3.0                | 3.7142857142857144  |
| cookie1   | 2015-04-11  | 5   | 3.0                 | 3.0                 | 3.7142857142857144  | 3.0                | 4.333333333333333  | 4.166666666666667   |
| cookie1   | 2015-04-12  | 7   | 4.333333333333333   | 4.333333333333333   | 3.7142857142857144  | 4.333333333333333  | 4.0                | 4.0                 |
| cookie1   | 2015-04-13  | 3   | 4.0                 | 4.0                 | 3.7142857142857144  | 4.0                | 3.6                | 3.25                |
| cookie1   | 2015-04-14  | 2   | 3.6                 | 3.6                 | 3.7142857142857144  | 4.25               | 4.2                | 3.3333333333333335  |
| cookie1   | 2015-04-15  | 4   | 3.6666666666666665  | 3.6666666666666665  | 3.7142857142857144  | 4.0                | 4.0                | 4.0                 |
| cookie1   | 2015-04-16  | 4   | 3.7142857142857144  | 3.7142857142857144  | 3.7142857142857144  | 3.25               | 3.25               | 4.0                 |
| cookie2   | 2015-04-10  | 6   | 6.0                 | 6.0                 | 5.0                 | 6.0                | 5.5                | 5.0                 |
| cookie2   | 2015-04-11  | 5   | 5.5                 | 5.5                 | 5.0                 | 5.5                | 6.0                | 4.833333333333333   |
| cookie2   | 2015-04-12  | 7   | 6.0                 | 6.0                 | 5.0                 | 6.0                | 5.5                | 4.8                 |
| cookie2   | 2015-04-13  | 4   | 5.5                 | 5.5                 | 5.0                 | 5.5                | 5.0                | 4.25                |
| cookie2   | 2015-04-14  | 3   | 5.0                 | 5.0                 | 5.0                 | 4.75               | 4.8                | 4.333333333333333   |
| cookie2   | 2015-04-15  | 5   | 5.0                 | 5.0                 | 5.0                 | 4.75               | 4.8                | 5.0                 |
| cookie2   | 2015-04-16  | 5   | 5.0                 | 5.0                 | 5.0                 | 4.25               | 4.25               | 5.0                 |
+-----------+-------------+-----+---------------------+---------------------+---------------------+--------------------+--------------------+---------------------+--+
--MIN
SELECT cookieid,createtime,pv,
MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime) AS pv1, -- 默认为从起点到当前行
MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2,--从起点到当前行,结果同pv1 
MIN(pv) OVER(PARTITION BY cookieid) AS pv3,--分组内所有行
MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pv4,--当前行+往前3行
MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pv5,--当前行+往前3行+往后1行
MIN(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pv6 --当前行+往后所有行  
FROM test_data order by cookieid,createtime;
+-----------+-------------+-----+------+------+------+------+------+------+--+
| cookieid  | createtime  | pv  | pv1  | pv2  | pv3  | pv4  | pv5  | pv6  |
+-----------+-------------+-----+------+------+------+------+------+------+--+
| cookie1   | 2015-04-10  | 1   | 1    | 1    | 1    | 1    | 1    | 1    |
| cookie1   | 2015-04-11  | 5   | 1    | 1    | 1    | 1    | 1    | 2    |
| cookie1   | 2015-04-12  | 7   | 1    | 1    | 1    | 1    | 1    | 2    |
| cookie1   | 2015-04-13  | 3   | 1    | 1    | 1    | 1    | 1    | 2    |
| cookie1   | 2015-04-14  | 2   | 1    | 1    | 1    | 2    | 2    | 2    |
| cookie1   | 2015-04-15  | 4   | 1    | 1    | 1    | 2    | 2    | 4    |
| cookie1   | 2015-04-16  | 4   | 1    | 1    | 1    | 2    | 2    | 4    |
| cookie2   | 2015-04-10  | 6   | 6    | 6    | 3    | 6    | 5    | 3    |
| cookie2   | 2015-04-11  | 5   | 5    | 5    | 3    | 5    | 5    | 3    |
| cookie2   | 2015-04-12  | 7   | 5    | 5    | 3    | 5    | 4    | 3    |
| cookie2   | 2015-04-13  | 4   | 4    | 4    | 3    | 4    | 3    | 3    |
| cookie2   | 2015-04-14  | 3   | 3    | 3    | 3    | 3    | 3    | 3    |
| cookie2   | 2015-04-15  | 5   | 3    | 3    | 3    | 3    | 3    | 5    |
| cookie2   | 2015-04-16  | 5   | 3    | 3    | 3    | 3    | 3    | 5    |
+-----------+-------------+-----+------+------+------+------+------+------+--+
--MAX
SELECT cookieid,createtime,pv,
MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime) AS pv1, -- 默认为从起点到当前行
MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS pv2, --从起点到当前行,结果同pv1 
MAX(pv) OVER(PARTITION BY cookieid) AS pv3, --分组内所有行
MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS pv4, --当前行+往前3行
MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND 1 FOLLOWING) AS pv5, --当前行+往前3行+往后1行
MAX(pv) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS pv6 --当前行+往后所有行  
FROM test_data order by cookieid,createtime;
+-----------+-------------+-----+------+------+------+------+------+------+--+
| cookieid  | createtime  | pv  | pv1  | pv2  | pv3  | pv4  | pv5  | pv6  |
+-----------+-------------+-----+------+------+------+------+------+------+--+
| cookie1   | 2015-04-10  | 1   | 1    | 1    | 7    | 1    | 5    | 7    |
| cookie1   | 2015-04-11  | 5   | 5    | 5    | 7    | 5    | 7    | 7    |
| cookie1   | 2015-04-12  | 7   | 7    | 7    | 7    | 7    | 7    | 7    |
| cookie1   | 2015-04-13  | 3   | 7    | 7    | 7    | 7    | 7    | 4    |
| cookie1   | 2015-04-14  | 2   | 7    | 7    | 7    | 7    | 7    | 4    |
| cookie1   | 2015-04-15  | 4   | 7    | 7    | 7    | 7    | 7    | 4    |
| cookie1   | 2015-04-16  | 4   | 7    | 7    | 7    | 4    | 4    | 4    |
| cookie2   | 2015-04-10  | 6   | 6    | 6    | 7    | 6    | 6    | 7    |
| cookie2   | 2015-04-11  | 5   | 6    | 6    | 7    | 6    | 7    | 7    |
| cookie2   | 2015-04-12  | 7   | 7    | 7    | 7    | 7    | 7    | 7    |
| cookie2   | 2015-04-13  | 4   | 7    | 7    | 7    | 7    | 7    | 5    |
| cookie2   | 2015-04-14  | 3   | 7    | 7    | 7    | 7    | 7    | 5    |
| cookie2   | 2015-04-15  | 5   | 7    | 7    | 7    | 7    | 7    | 5    |
| cookie2   | 2015-04-16  | 5   | 7    | 7    | 7    | 5    | 5    | 5    |
+-----------+-------------+-----+------+------+------+------+------+------+--+


SELECT cookieid,
createtime,
pv,
min(pv) OVER(PARTITION BY cookieid) AS min_pv,
max(pv) OVER(PARTITION BY cookieid) AS max_pv
FROM test_data;
+-----------+-------------+-----+---------+---------+--+
| cookieid  | createtime  | pv  | min_pv  | max_pv  |
+-----------+-------------+-----+---------+---------+--+
| cookie1   | 2015-04-10  | 1   | 1       | 7       |
| cookie1   | 2015-04-16  | 4   | 1       | 7       |
| cookie1   | 2015-04-15  | 4   | 1       | 7       |
| cookie1   | 2015-04-14  | 2   | 1       | 7       |
| cookie1   | 2015-04-13  | 3   | 1       | 7       |
| cookie1   | 2015-04-12  | 7   | 1       | 7       |
| cookie1   | 2015-04-11  | 5   | 1       | 7       |
| cookie2   | 2015-04-16  | 5   | 3       | 7       |
| cookie2   | 2015-04-15  | 5   | 3       | 7       |
| cookie2   | 2015-04-14  | 3   | 3       | 7       |
| cookie2   | 2015-04-13  | 4   | 3       | 7       |
| cookie2   | 2015-04-12  | 7   | 3       | 7       |
| cookie2   | 2015-04-11  | 5   | 3       | 7       |
| cookie2   | 2015-04-10  | 6   | 3       | 7       |
+-----------+-------------+-----+---------+---------+--+

免责声明:文章转载自《Hive函数:SUM,AVG,MIN,MAX》仅用于学习参考。如对内容有疑问,请及时联系本站处理。

上篇Keepalived+Nginx提供前端负载均衡+主从双机热备+自动切换Window Services的调试和非托管dll的引用及其他一些注意问题下篇

宿迁高防,2C2G15M,22元/月;香港BGP,2C5G5M,25元/月 雨云优惠码:MjYwNzM=

相关文章

机器学习--用朴素贝叶斯分类法辨别男女声音

和前面介绍到的kNN,决策树一样,贝叶斯分类法也是机器学习中常用的分类方法。贝叶斯分类法主要以概率论中贝叶斯定理为分类依据,具有很广泛的应用。本文通过一个完整的例子,来介绍如何用朴素贝叶斯分类法实现分类。主要内容有下:     1、条件概率与贝叶斯定理介绍     2、数据集选择及处理     3、朴素贝叶斯分类器实现     4、测试分类效果     5...

吴裕雄 python 机器学习——数据预处理字典学习模型

from sklearn.decomposition import DictionaryLearning #数据预处理字典学习DictionaryLearning模型 def test_DictionaryLearning(): X=[[1,2,3,4,5], [6,7,8,9,10], [10,9,8,7,6,],...

Pytest自动化测试

Allure除了具有Pytest基本状态外,其他几乎所有功能也都支持。 1、严重性 如果你想对测试用例进行严重等级划分,可以使用 @allure.severity 装饰器,它可以应用于函数,方法或整个类。 它以 allure.severity_level 枚举值作为参数,分别为:BLOCKER(中断),CRITICAL(严重),NORMAL(常规),...

解决 Visual Studio For Mac 还原包失败问题

体验了一把改名部最新的杰作,总体感觉挺好,也能看出微软在跨平台这方面所做出的努力。 可能是预览版的缘故,还是遇到一个比较大的问题,创建netcore项目后,依赖包还原失败,错误信息如下: 可以先试着手动还原一下(说不定就成功了呢,反正我没成功) dotnet restore 如果不成功,那就按照我的做法一步一步来。 1.查看 dotnetcore sdk...

scala之 spark连接SQL和HIVE/IDEA操作HDFS

一、连接SQL 方法一、 package com.njbdqn.linkSql import java.util.Properties import org.apache.spark.sql.SparkSession import org.apache.spark.sql._ object LinkSql { def main(args: Arr...

hive函数之~字符串函数

1、字符串长度函数:length 语法: length(string A)返回值: int说明:返回字符串A的长度 hive> selectlength('abcedfg') fromtableName; 7 2、字符串反转函数:reverse 语法: reverse(string A)返回值: string说明:返回字符串A的反转结果 h...