07-06 04:13

Differences between Hive and MySQL

Summary

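1. What partitioning and bucketing do for Hive tables
Partitioned tables store each partition in its own directory, so queries that filter on the partition column avoid a full table scan.
Bucketed tables split the data into separate files, which makes joins faster and enables bucket sampling.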
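A minimal HiveQL sketch of both ideas. The table `emp_bucketed`, its columns, the partition column `hire_year` and the bucket count of 4 are hypothetical, chosen only for illustration.

```sql
-- Hypothetical table: one HDFS directory per hire_year value (partitioning),
-- and 4 files per partition, with rows assigned by hashing emp_no (bucketing).
create table emp_bucketed (
  emp_no   int,
  emp_name string
)
partitioned by (hire_year int)
clustered by (emp_no) into 4 buckets;

-- Filtering on the partition column lets Hive read only the
-- hire_year=2020 directory instead of scanning the whole table.
select * from emp_bucketed where hire_year = 2020;
```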

Join example (see also the join tip in item 5): filter out NULL join keys, and put the smaller table, emp_partition, on the left:
select * from emp_partition --emp_partition is the smaller table, so it goes on the left
inner join salaries
on
salaries.emp_no is not null and
emp_partition.emp_no is not null and
emp_partition.emp_no = salaries.emp_no;

2. The three commonly used complex data types in Hive and how to access them
array: column_name[index], with indices starting at 0
map: column_name["key_name"]
struct: column_name.field_name
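A short sketch of declaring and reading all three types. The table `emp_demo`, the map key `hire_date` and the struct fields `age` and `city` are assumptions for illustration; only the column names mirror the `emp` table used later in this post.

```sql
-- Hypothetical table with one column of each complex type
create table emp_demo (
  emp_name   array<string>,
  emp_date   map<string, string>,
  other_info struct<age:int, city:string>
)
row format delimited
  fields terminated by '\t'
  collection items terminated by ','
  map keys terminated by ':';

select emp_name[0],            -- array: first element (index starts at 0)
       emp_date["hire_date"],  -- map: value stored under key "hire_date"
       other_info.city         -- struct: field access with a dot
from emp_demo;
```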

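3. Hive's support for some subqueries is incomplete
if / case when
When a subquery references the parent query, the reference may appear only in the subquery's WHERE clause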
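A sketch of a correlated subquery in the supported form, reusing the `emp_partition` and `salaries` tables from the join example; the aliases `e` and `s` are illustrative. The parent column `e.emp_no` is referenced only inside the subquery's WHERE clause.

```sql
-- Employees that have at least one salary record.
-- The reference to the parent query (e.emp_no) sits in the
-- subquery's WHERE clause, which is where Hive allows it.
select e.*
from emp_partition e
where exists (
  select s.emp_no
  from salaries s
  where s.emp_no = e.emp_no
);
```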

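4. Several ways to write sampling queries
Random: rand()
Block sampling: tablesample(n percent), tablesample(nM), tablesample(n rows)
Bucket sampling: tablesample(bucket x out of y): y must be a multiple or a factor of the number of buckets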
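One query per method, as a sketch. `emp` is the table used elsewhere in this post; `emp_bucketed` is a hypothetical table assumed to be bucketed into 4 buckets on `emp_no`, and the sample sizes are arbitrary.

```sql
-- Random: shuffle rows with rand() and keep 10
select * from emp order by rand() limit 10;

-- Block sampling: works at HDFS block/split granularity,
-- so the amount returned is approximate
select * from emp tablesample(10 percent) t;
select * from emp tablesample(1M) t;         -- about 1 MB of input
select * from emp tablesample(10 rows) t;    -- 10 rows from each input split

-- Bucket sampling on a table with 4 buckets: y = 2 is a factor of 4,
-- so 4 / 2 = 2 buckets are read, namely buckets 1 and 3 (x, x + y)
select * from emp_bucketed tablesample(bucket 1 out of 2 on emp_no) t;
```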

5. Common ways to tune Hive queries

distinct
group by
join
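One widely used tip that combines the distinct and group by items, offered as an interpretation rather than the author's stated method: in many Hive versions a global `count(distinct ...)` is computed by a single reducer, so it is often rewritten to deduplicate with `group by` first. The `salaries` table is reused from the join example.

```sql
-- Often slow on large tables: all rows funnel into one reducer
select count(distinct emp_no) from salaries;

-- Common rewrite: group by spreads deduplication across reducers,
-- then a cheap outer count. The NULL filter keeps the result equal
-- to count(distinct), which ignores NULLs.
select count(*) from (
  select emp_no
  from salaries
  where emp_no is not null
  group by emp_no
) t;
```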
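Extension: explode and lateral view (virtual tables)
explode turns the elements of a complex-type column into rows
lateral view lets the exploded values be shown together with the other columns
lateral view explode(column_to_explode) virtual_table_name as exploded_column_name;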
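A self-contained sketch of why lateral view is needed, using an inline one-row subquery instead of a real table (Hive 0.13+ allows select without from). The names `emp_no`, `emp_name`, `n` and `single_name` are illustrative.

```sql
-- Not allowed: a UDTF such as explode must be the only expression
-- in the select list, e.g.  select emp_no, explode(emp_name) from emp;

-- With lateral view, each exploded element is joined back to its source row,
-- so this returns two rows: (1, wang) and (1, li)
select t.emp_no, n.single_name
from (select 1 as emp_no, array('wang', 'li') as emp_name) t
lateral view explode(t.emp_name) n as single_name;
```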
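select explode(emp_name) from emp;    --array: one row per element
select explode(emp_date) from emp;    --map: one row per entry, with key and value columns
select explode(other_info) from emp;  --fails if other_info is a struct: explode accepts only an array or a map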
--Find employees where any one of their names contains a lowercase w
--Method 1: only checks the first two names
select * from emp where emp_name[0] like '%w%' or emp_name[1] like '%w%' limit 10;
--Method 2: works for any number of names
select distinct userid from(
select * from emp
lateral view explode(emp_name) subtable as a
) as sub
where a like '%w%';
select * from emp
lateral view explode(emp_name) nameTable as namefield;
select * from emp
lateral view explode(emp_date) dateTable as field_key,field_value;
--A struct has a fixed number of fields (set when the table is defined), so it needs no explode; just use each field as an ordinary column
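A sketch of the struct case, assuming `other_info` is declared as `struct<age:int, city:string>`; the field names `age` and `city` are hypothetical.

```sql
-- No explode needed: each struct field is selected like a normal column
select emp_name, other_info.age, other_info.city
from emp;
```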

Source: https://www.cnblogs.com/zhang-dan/p/15141923.html