Hive JSON Tables: Processing JSON-Formatted Data

Abstract:
This article shows how to query JSON data in Hive with the openx JsonSerDe: register the SerDe jar, create external textfile tables over JSON files (including map/array columns and code columns such as source_zone_code, dest_zone_code, src_dist_code, src_area_code, src_hq_code, dest_dist_code, dest_city_code, dest_area_code and dest_hq_code), and then convert them into Snappy-compressed Parquet tables.
add jar /home/jasonapp/json-serde-1.3.7-jar-with-dependencies.jar;

create external table if not exists dm_goddog.student(
student map<string,string> comment "student information",
class map<string,string> comment "course information",
teacher map<string,string> comment "teacher information"
)
comment "student course information"
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
stored as textfile;
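
Map columns are read with bracket syntax. A hypothetical query (the keys 'name' and 'title' are assumptions about the JSON payload, e.g. {"student":{"name":"Tom"},"class":{"title":"Math"},"teacher":{"name":"Lee"}}):

select student['name']  as student_name,
       class['title']   as class_title,
       teacher['name']  as teacher_name
from dm_goddog.student
limit 10;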

PS: If a query fails with a JSON parse error, delete the malformed JSON lines from the offending data file.
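
Alternatively, json-serde 1.3.x can be told to tolerate bad lines instead of failing the query (malformed rows come back as all-NULL); a sketch using its ignore.malformed.json SerDe property:

ALTER TABLE dm_goddog.student SET SERDEPROPERTIES ('ignore.malformed.json'='true');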

1. Download the json-serde package from:

http://www.congiu.net/hive-json-serde/1.3.7/cdh5/json-serde-1.3.7-jar-with-dependencies.jar

2. Create the test tables

add jar /home/jasonapp/json-serde-1.3.7-jar-with-dependencies.jar;
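
Note that add jar only registers the SerDe for the current session. To make it available to every session, one common option (the paths below are illustrative and vary by distribution) is to put the jar on Hive's auxiliary classpath:

# copy into Hive's auxlib directory, or reference it via hive.aux.jars.path
cp /home/jasonapp/json-serde-1.3.7-jar-with-dependencies.jar $HIVE_HOME/auxlib/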

******* Waybill base information table ******* dwd_ids_test.dwd_order_base_info_dtl_di
create external table if not exists dwd_ids_test.dwd_order_base_info_dtl_di(
  order_id   string,
  order_no   string,
  express_flow_code   string,
  source_zone_code   string,
  dest_zone_code   string,
  src_dist_code   string,
  src_city_code   string,
  src_county   string,
  src_division_code   string,
  src_area_code   string,
  src_hq_code   string,
  src_type_code   string,
  dest_dist_code   string,
  dest_city_code   string,
  dest_county   string,
  dest_division_code   string,
  dest_area_code   string,
  dest_hq_code   string,
  dest_type_code   string,
  src_lgt   string,
  src_lat   string,
  dest_lgt   string,
  dest_lat   string,
  meterage_weight_qty   string,
  real_weight_qty   string,
  quantity   double,
  consignee_emp_code   string,
  consigned_tm   string,
  deliver_emp_code   string,
  signer_name   string,
  signin_tm   string,
  cargo_type_code   string,
  limit_type_code   string,
  distance_type_code   string,
  transport_type_code   string,
  express_type_code   string,
  ackbill_type_code   string,
  volume   string,
  bill_long   string,
  bill_width   string,
  bill_high   string,
  version_no   string,
  suff_flag   string,
  unit_weight   string,
  cons_value   string,
  cons_value_currency_code   string,
  product_code   string,
  airport_code   string,
  order_type   string,
  input_tm   string,
  inputer_emp_code   string,
  input_tm_gmt   string,
  modified_tm   string,
  order_remark   string,
  contacts_id   string,
  consignor_comp_name   string,
  consignor_addr   string,
  consignor_phone   string,
  consignor_cont_name   string,
  consignor_mobile   string,
  consignee_comp_name   string,
  consignee_addr   string,
  consignee_phone   string,
  consignee_cont_name   string,
  consignee_mobile   string,
  consignor_addr_native   string,
  consignee_addr_native   string,
  addressee_team_code   string,
  parent_order_no   string,
  order_tag   string,
  all_fee_rmb   double,
  cod_fee_rmb   double,
  freight_rmb   double,
  meterage_weight_qty_kg   double,
  real_weight_qty_kg   double,
  addresseeaoicode   string,
  addresseeaoitype   string,
  service_prod_code   array<string>,
  freight_payment_type_code   string,
  freight_monthly_acct_code   string,
  inc_day   string
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
stored as textfile;


desc formatted dwd_ids_test.dwd_order_base_info_dtl_di;

dfs -ls hdfs://test-cluster/user/hive/warehouse/dwd_ids_test.db/dwd_order_base_info_dtl_di;
dfs -rm -f hdfs://test-cluster/user/hive/warehouse/dwd_ids_test.db/dwd_order_base_info_dtl_di/*;

hdfs dfs -put /home/jasonapp/dwd_order_base_info_dtl_di*.json hdfs://test-cluster/user/hive/warehouse/dwd_ids_test.db/dwd_order_base_info_dtl_di;
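
Each line of the uploaded files must be a single JSON object whose keys match the column names (missing keys come back as NULL; extra keys are ignored by the SerDe). A hypothetical, abridged record:

{"order_id":"1001","order_no":"SF0001","source_zone_code":"755","dest_zone_code":"010","quantity":1.0,"service_prod_code":["S1","S2"],"inc_day":"20200101"}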

select * from dwd_ids_test.dwd_order_base_info_dtl_di limit 1;

select * from dwd_ids_test.dwd_order_base_info_dtl_di where inc_day='20200103' limit 1;

set hive.exec.dynamic.partition=true;            -- allow dynamic partition inserts
set hive.exec.dynamic.partition.mode=nonstrict;  -- allow all partition columns to be dynamic
set hive.fetch.task.conversion=more;             -- let simple SELECTs run as fetch tasks without launching a job
set hive.exec.parallel=true;                     -- run independent query stages in parallel
set mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.SnappyCodec;  -- Snappy for job output
set mapreduce.output.fileoutputformat.compress.type=BLOCK;  -- block-level compression (Parquet itself is governed by parquet.compression below)

DROP TABLE IF EXISTS dwd_ids_test.dwd_order_base_info_dtl_di_parquet;
CREATE EXTERNAL TABLE IF NOT EXISTS dwd_ids_test.dwd_order_base_info_dtl_di_parquet(
  order_id   string,
  order_no   string,
  express_flow_code   string,
  source_zone_code   string,
  dest_zone_code   string,
  src_dist_code   string,
  src_city_code   string,
  src_county   string,
  src_division_code   string,
  src_area_code   string,
  src_hq_code   string,
  src_type_code   string,
  dest_dist_code   string,
  dest_city_code   string,
  dest_county   string,
  dest_division_code   string,
  dest_area_code   string,
  dest_hq_code   string,
  dest_type_code   string,
  src_lgt   string,
  src_lat   string,
  dest_lgt   string,
  dest_lat   string,
  meterage_weight_qty   string,
  real_weight_qty   string,
  quantity   double,
  consignee_emp_code   string,
  consigned_tm   string,
  deliver_emp_code   string,
  signer_name   string,
  signin_tm   string,
  cargo_type_code   string,
  limit_type_code   string,
  distance_type_code   string,
  transport_type_code   string,
  express_type_code   string,
  ackbill_type_code   string,
  volume   string,
  bill_long   string,
  bill_width   string,
  bill_high   string,
  version_no   string,
  suff_flag   string,
  unit_weight   string,
  cons_value   string,
  cons_value_currency_code   string,
  product_code   string,
  airport_code   string,
  order_type   string,
  input_tm   string,
  inputer_emp_code   string,
  input_tm_gmt   string,
  modified_tm   string,
  order_remark   string,
  contacts_id   string,
  consignor_comp_name   string,
  consignor_addr   string,
  consignor_phone   string,
  consignor_cont_name   string,
  consignor_mobile   string,
  consignee_comp_name   string,
  consignee_addr   string,
  consignee_phone   string,
  consignee_cont_name   string,
  consignee_mobile   string,
  consignor_addr_native   string,
  consignee_addr_native   string,
  addressee_team_code   string,
  parent_order_no   string,
  order_tag   string,
  all_fee_rmb   double,
  cod_fee_rmb   double,
  freight_rmb   double,
  meterage_weight_qty_kg   double,
  real_weight_qty_kg   double,
  addresseeaoicode   string,
  addresseeaoitype   string,
  service_prod_code   array<string>,
  freight_payment_type_code   string,
  freight_monthly_acct_code   string
)
PARTITIONED BY (
  inc_day string COMMENT 'inc_day used by partition'
)
STORED AS parquet
TBLPROPERTIES('parquet.compression'='SNAPPY');

INSERT OVERWRITE TABLE dwd_ids_test.dwd_order_base_info_dtl_di_parquet PARTITION (inc_day)
select * from dwd_ids_test.dwd_order_base_info_dtl_di;  -- inc_day is the last select column, so it drives the dynamic partition

dfs -ls  hdfs://test-cluster/user/hive/warehouse/dwd_ids_test.db/dwd_order_base_info_dtl_di_parquet;


ALTER TABLE dwd_ids_test.dwd_order_base_info_dtl_di_parquet PARTITION (inc_day='20200101') RENAME TO PARTITION (inc_day='20201125');
ALTER TABLE dwd_ids_test.dwd_order_base_info_dtl_di_parquet PARTITION (inc_day='20200102') RENAME TO PARTITION (inc_day='20201126');
ALTER TABLE dwd_ids_test.dwd_order_base_info_dtl_di_parquet PARTITION (inc_day='20200103') RENAME TO PARTITION (inc_day='20201127');
ALTER TABLE dwd_ids_test.dwd_order_base_info_dtl_di_parquet PARTITION (inc_day='20200104') RENAME TO PARTITION (inc_day='20201128');
ALTER TABLE dwd_ids_test.dwd_order_base_info_dtl_di_parquet PARTITION (inc_day='20200105') RENAME TO PARTITION (inc_day='20201129');
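
A quick check that the renames took effect:

show partitions dwd_ids_test.dwd_order_base_info_dtl_di_parquet;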
-------- Order wide table ------ dwd_ids_test.dwd_pub_order_dtl_di
add jar /home/jasonapp/json-serde-1.3.7-jar-with-dependencies.jar;
drop table if exists dwd_ids_test.dwd_pub_order_dtl_di;
create table if not exists dwd_ids_test.dwd_pub_order_dtl_di(
inner_order_no string,
src_order_no string,
src_sys_type string,
src_sys_name string,
src_subsys_code string,
order_type_code string,
access_code string,
online_chnl_code string,
online_chnl_name string,
pickup_type_code string,
order_status_code string,
pay_type_code string,
order_no array<string>, -- array<string> in the JSON source; could alternatively be declared string
client_code string,
client_name string,
monthly_card_no string,
sender_name string,
sender_tel string,
sender_mobile string,
sender_company string,
sender_city_code string,
sender_province_name string,
sender_city_name string,
sender_area_name string,
sender_address string,
receiver_name string,
receiver_tel string,
receiver_mobile string,
receiver_company string,
receiver_city_code string,
receiver_province_name string,
receiver_city_name string,
receiver_area_name string,
receiver_address string,
cargo_name string,
cargo_quantity double,
declared_value_amt double,
declared_currency_code string,
is_insurance_flag int,
insurance_amt double,
est_weight double,
est_price double,
is_cod_flag int,
cod_card_no string,
cod_amt double,
is_signback_flag int,
is_under_call int,
call_tm string,
appoint_start_tm string,
appoint_finish_tm string,
appoint_pickup_tm string,
latest_pickup_tm string,
appoint_mobile string,
appoint_name string,
appoint_city string,
pickup_emp_code string,
dept_code string,
unitarea_code string,
aoi_code string,
is_cancel_flag int,
cancel_reason string,
cancel_tm string,
order_ip string,
limit_type_code string,
limit_type_name string,
create_tm string,
order_tm string,
modify_tm string,
is_photo_flag int,
member_id string,
load_tm string,
cx_order_entry string,
inc_day string
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
stored as textfile;



desc formatted dwd_ids_test.dwd_pub_order_dtl_di;

dfs -ls hdfs://test-cluster/user/hive/warehouse/dwd_ids_test.db/dwd_pub_order_dtl_di;
dfs -rm -f hdfs://test-cluster/user/hive/warehouse/dwd_ids_test.db/dwd_pub_order_dtl_di/*;

hdfs dfs -put /home/jasonapp/dwd_pub_order_dtl_di*.json hdfs://test-cluster/user/hive/warehouse/dwd_ids_test.db/dwd_pub_order_dtl_di;

select * from dwd_ids_test.dwd_pub_order_dtl_di limit 1;
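
Since order_no is declared array<string>, its elements are read by index or flattened with lateral view explode; a sketch:

-- first waybill number of each order
select inner_order_no, order_no[0] as first_waybill_no
from dwd_ids_test.dwd_pub_order_dtl_di
limit 10;

-- one row per waybill number
select t.inner_order_no, w.waybill_no
from dwd_ids_test.dwd_pub_order_dtl_di t
lateral view explode(t.order_no) w as waybill_no
limit 10;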
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.fetch.task.conversion=more;
set hive.exec.parallel=true;
set mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.SnappyCodec;
set mapreduce.output.fileoutputformat.compress.type=BLOCK;

DROP TABLE IF EXISTS dwd_ids_test.dwd_pub_order_dtl_di_parquet;
CREATE EXTERNAL TABLE IF NOT EXISTS dwd_ids_test.dwd_pub_order_dtl_di_parquet(
inner_order_no string,
src_order_no string,
src_sys_type string,
src_sys_name string,
src_subsys_code string,
order_type_code string,
access_code string,
online_chnl_code string,
online_chnl_name string,
pickup_type_code string,
order_status_code string,
pay_type_code string,
order_no array<string>, -- array<string> in the JSON source; could alternatively be declared string
client_code string,
client_name string,
monthly_card_no string,
sender_name string,
sender_tel string,
sender_mobile string,
sender_company string,
sender_city_code string,
sender_province_name string,
sender_city_name string,
sender_area_name string,
sender_address string,
receiver_name string,
receiver_tel string,
receiver_mobile string,
receiver_company string,
receiver_city_code string,
receiver_province_name string,
receiver_city_name string,
receiver_area_name string,
receiver_address string,
cargo_name string,
cargo_quantity double,
declared_value_amt double,
declared_currency_code string,
is_insurance_flag int,
insurance_amt double,
est_weight double,
est_price double,
is_cod_flag int,
cod_card_no string,
cod_amt double,
is_signback_flag int,
is_under_call int,
call_tm string,
appoint_start_tm string,
appoint_finish_tm string,
appoint_pickup_tm string,
latest_pickup_tm string,
appoint_mobile string,
appoint_name string,
appoint_city string,
pickup_emp_code string,
dept_code string,
unitarea_code string,
aoi_code string,
is_cancel_flag int,
cancel_reason string,
cancel_tm string,
order_ip string,
limit_type_code string,
limit_type_name string,
create_tm string,
order_tm string,
modify_tm string,
is_photo_flag int,
member_id string,
load_tm string,
cx_order_entry string
)
PARTITIONED BY (
  inc_day string COMMENT 'inc_day used by partition'
)
STORED AS parquet
TBLPROPERTIES('parquet.compression'='SNAPPY');

INSERT OVERWRITE TABLE dwd_ids_test.dwd_pub_order_dtl_di_parquet PARTITION (inc_day)
select * from dwd_ids_test.dwd_pub_order_dtl_di;

dfs -ls hdfs://test-cluster/user/hive/warehouse/dwd_ids_test.db/dwd_pub_order_dtl_di_parquet;


ALTER TABLE dwd_ids_test.dwd_pub_order_dtl_di_parquet PARTITION (inc_day='20200101') RENAME TO PARTITION (inc_day='20201125');
ALTER TABLE dwd_ids_test.dwd_pub_order_dtl_di_parquet PARTITION (inc_day='20200102') RENAME TO PARTITION (inc_day='20201126');
ALTER TABLE dwd_ids_test.dwd_pub_order_dtl_di_parquet PARTITION (inc_day='20200103') RENAME TO PARTITION (inc_day='20201127');
ALTER TABLE dwd_ids_test.dwd_pub_order_dtl_di_parquet PARTITION (inc_day='20200104') RENAME TO PARTITION (inc_day='20201128');
ALTER TABLE dwd_ids_test.dwd_pub_order_dtl_di_parquet PARTITION (inc_day='20200105') RENAME TO PARTITION (inc_day='20201129');


select * from dwd_ids_test.dwd_pub_order_dtl_di_parquet limit 11;

select * from dwd_ids_test.dwd_pub_order_dtl_di limit 11;
---------- Date dimension table ------ dim_ids_test.dim_calendar
add jar /home/jasonapp/json-serde-1.3.7-jar-with-dependencies.jar;
drop table if exists dim_ids_test.dim_calendar;
create table if not exists dim_ids_test.dim_calendar(
day_wid string,
day_name string,
day_date_chn string,
day_date string,
day_name_of_week string,
day_of_week string,
day_of_month string,
day_of_year string,
week_wid string,
week_name string,
week_no string,
week_start_date_wid string,
week_start_date string,
week_end_date_wid string,
week_end_date string,
month_wid string,
month_name string,
month_no string,
month_days string,
month_start_date_wid string,
month_start_date string,
month_end_date_wid string,
month_end_date string,
quarter_wid string,
quarter_name string,
quarter_no string,
quarter_start_date_wid string,
quarter_start_date string,
quarter_end_date_wid string,
quarter_end_date string,
year_wid string,
year_name string,
year_start_date_wid string,
year_start_date string,
year_end_date_wid string,
year_end_date string,
is_last_day_of_week string,
is_last_day_of_month string,
is_last_day_of_year string,
is_weekend string,
holiday_name string,
day_ago_date_wid string,
day_ago_date string,
week_ago_date_wid string,
week_ago_date string,
month_ago_date_wid string,
month_ago_date string,
quarter_ago_date_wid string,
quarter_ago_date string,
year_ago_date_wid string,
year_ago_date string,
language string,
w_insert_date string,
w_update_date string,
is_work string,
is_business_peak string,
public_holiday string,
work_day double,
inc_year string
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
stored as textfile;


desc formatted dim_ids_test.dim_calendar;

dfs -put /home/jasonapp/dim_calendar.json hdfs://test-cluster/user/hive/warehouse/dim_ids_test.db/dim_calendar;

select * from dim_ids_test.dim_calendar limit 1;
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.fetch.task.conversion=more;
set hive.exec.parallel=true;
set mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.SnappyCodec;
set mapreduce.output.fileoutputformat.compress.type=BLOCK;

DROP TABLE IF EXISTS dim_ids_test.dim_calendar_parquet;
CREATE EXTERNAL TABLE IF NOT EXISTS dim_ids_test.dim_calendar_parquet(
day_wid string,
day_name string,
day_date_chn string,
day_date string,
day_name_of_week string,
day_of_week string,
day_of_month string,
day_of_year string,
week_wid string,
week_name string,
week_no string,
week_start_date_wid string,
week_start_date string,
week_end_date_wid string,
week_end_date string,
month_wid string,
month_name string,
month_no string,
month_days string,
month_start_date_wid string,
month_start_date string,
month_end_date_wid string,
month_end_date string,
quarter_wid string,
quarter_name string,
quarter_no string,
quarter_start_date_wid string,
quarter_start_date string,
quarter_end_date_wid string,
quarter_end_date string,
year_wid string,
year_name string,
year_start_date_wid string,
year_start_date string,
year_end_date_wid string,
year_end_date string,
is_last_day_of_week string,
is_last_day_of_month string,
is_last_day_of_year string,
is_weekend string,
holiday_name string,
day_ago_date_wid string,
day_ago_date string,
week_ago_date_wid string,
week_ago_date string,
month_ago_date_wid string,
month_ago_date string,
quarter_ago_date_wid string,
quarter_ago_date string,
year_ago_date_wid string,
year_ago_date string,
language string,
w_insert_date string,
w_update_date string,
is_work string,
is_business_peak string,
public_holiday string,
work_day double
)
PARTITIONED BY (
  inc_year string COMMENT 'inc_year used by partition'
)
STORED AS parquet
TBLPROPERTIES('parquet.compression'='SNAPPY');

INSERT OVERWRITE TABLE dim_ids_test.dim_calendar_parquet PARTITION (inc_year)
select * from dim_ids_test.dim_calendar;
---------- City dimension table ------ dim_ids_test.dim_city
add jar /home/jasonapp/json-serde-1.3.7-jar-with-dependencies.jar;
drop table if exists dim_ids_test.dim_city;
create table if not exists dim_ids_test.dim_city(
dist_id string,
dist_code string,
dist_name string,
dist_alias string,
type_code string,
type_name string,
county_code string,
county_name string,
city_code string,
city_name string,
province_code string,
province_name string,
country_code string,
country_name string,
fbq_area_code string,
ywq_area_code string,
phone_area_code string,
zip_area_code string,
city_level string,
city_desc string,
currency_code string,
weight_unit string,
capital_status string,
provincial_status string,
valid_flag string,
created_emp_code string,
created_time string,
modified_emp_code string,
modified_time string,
fbq_area_name string,
ywq_area_name string,
parent_dist_code string,
inc_day string
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
stored as textfile;


desc formatted dim_ids_test.dim_city;

dfs -ls hdfs://test-cluster/user/hive/warehouse/dim_ids_test.db/dim_city;
dfs -rm -f hdfs://test-cluster/user/hive/warehouse/dim_ids_test.db/dim_city/*;

hdfs dfs -put /home/jasonapp/dim_city.*.json hdfs://test-cluster/user/hive/warehouse/dim_ids_test.db/dim_city;

select * from dim_ids_test.dim_city limit 1;

select * from dim_ids_test.dim_city where inc_day='20200103' limit 1;

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.fetch.task.conversion=more;
set hive.exec.parallel=true;
set mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.SnappyCodec;
set mapreduce.output.fileoutputformat.compress.type=BLOCK;

DROP TABLE IF EXISTS dim_ids_test.dim_city_parquet;
CREATE EXTERNAL TABLE IF NOT EXISTS dim_ids_test.dim_city_parquet(
dist_id string,
dist_code string,
dist_name string,
dist_alias string,
type_code string,
type_name string,
county_code string,
county_name string,
city_code string,
city_name string,
province_code string,
province_name string,
country_code string,
country_name string,
fbq_area_code string,
ywq_area_code string,
phone_area_code string,
zip_area_code string,
city_level string,
city_desc string,
currency_code string,
weight_unit string,
capital_status string,
provincial_status string,
valid_flag string,
created_emp_code string,
created_time string,
modified_emp_code string,
modified_time string,
fbq_area_name string,
ywq_area_name string,
parent_dist_code string
)
PARTITIONED BY (
  inc_day string COMMENT 'inc_day used by partition'
)
STORED AS parquet
TBLPROPERTIES('parquet.compression'='SNAPPY');

INSERT OVERWRITE TABLE dim_ids_test.dim_city_parquet PARTITION (inc_day)
select * from dim_ids_test.dim_city;


ALTER TABLE dim_ids_test.dim_city_parquet PARTITION (inc_day='20200101') RENAME TO PARTITION (inc_day='20201125');
ALTER TABLE dim_ids_test.dim_city_parquet PARTITION (inc_day='20200102') RENAME TO PARTITION (inc_day='20201126');
ALTER TABLE dim_ids_test.dim_city_parquet PARTITION (inc_day='20200103') RENAME TO PARTITION (inc_day='20201127');
ALTER TABLE dim_ids_test.dim_city_parquet PARTITION (inc_day='20200104') RENAME TO PARTITION (inc_day='20201128');
ALTER TABLE dim_ids_test.dim_city_parquet PARTITION (inc_day='20200105') RENAME TO PARTITION (inc_day='20201129');
---------- SAP company/branch information dimension table ------ dim_ids_test.dim_company
add jar /home/jasonapp/json-serde-1.3.7-jar-with-dependencies.jar;
drop table if exists dim_ids_test.dim_company;
create table if not exists dim_ids_test.dim_company(
comp_code string,
comp_name string,
city string,
area string,
currency string,
spras string,
ktopl string,
waabw string,
periv string,
kokfi string,
rcomp string,
adrnr string,
fikrs string,
xfmca string,
fmhrdate string,
xfdis string,
xvalv string,
xgjrv string,
xfdmm string,
bukrs_glob string,
fstva string,
opvar string,
xcovr string,
mwskv string,
mwska string,
xnegp string,
fstvare string,
offsacct string,
xcos string,
xsplt string,
dw_insert_tm string,
dw_update_tm string,
dw_datasource_code string
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
stored as textfile;


desc formatted dim_ids_test.dim_company;

dfs -put /home/jasonapp/dim_company.json hdfs://test-cluster/user/hive/warehouse/dim_ids_test.db/dim_company;

select * from dim_ids_test.dim_company limit 1;
drop table if exists dim_ids_test.dim_company_parquet;
create table if not exists dim_ids_test.dim_company_parquet(
comp_code string,
comp_name string,
city string,
area string,
currency string,
spras string,
ktopl string,
waabw string,
periv string,
kokfi string,
rcomp string,
adrnr string,
fikrs string,
xfmca string,
fmhrdate string,
xfdis string,
xvalv string,
xgjrv string,
xfdmm string,
bukrs_glob string,
fstva string,
opvar string,
xcovr string,
mwskv string,
mwska string,
xnegp string,
fstvare string,
offsacct string,
xcos string,
xsplt string,
dw_insert_tm string,
dw_update_tm string,
dw_datasource_code string
)
STORED AS parquet
TBLPROPERTIES('parquet.compression'='SNAPPY');

INSERT OVERWRITE TABLE dim_ids_test.dim_company_parquet
select * from dim_ids_test.dim_company;
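
A sanity check that the Parquet copy matches the JSON source:

select count(*) from dim_ids_test.dim_company;
select count(*) from dim_ids_test.dim_company_parquet;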

Add partitions and copy data into them; on the scheduling platform this runs as a daily incremental partition job:

ALTER TABLE dim_ids_test.dim_city_parquet PARTITION (inc_day='${dt1}') RENAME TO PARTITION (inc_day='${dt2}');
ALTER TABLE dwd_ids_test.dwd_pub_order_dtl_di_parquet PARTITION (inc_day='${dt1}') RENAME TO PARTITION (inc_day='${dt2}');
ALTER TABLE dwd_ids_test.dwd_order_base_info_dtl_di_parquet PARTITION (inc_day='${dt1}') RENAME TO PARTITION (inc_day='${dt2}');
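
Outside the scheduler, the same substitution can be reproduced with hivevar variables, which the ${dt1}/${dt2} placeholders resolve against (the script file name is illustrative):

hive --hivevar dt1=20200101 --hivevar dt2=20201125 -f rename_partitions.sql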



ALTER TABLE dim_ids_test.dim_city_parquet ADD PARTITION (inc_day='20201122') PARTITION (inc_day='20201123') PARTITION (inc_day='20201124');
ALTER TABLE dwd_ids_test.dwd_pub_order_dtl_di_parquet ADD PARTITION (inc_day='20201122') PARTITION (inc_day='20201123') PARTITION (inc_day='20201124');
ALTER TABLE dwd_ids_test.dwd_order_base_info_dtl_di_parquet ADD PARTITION (inc_day='20201122') PARTITION (inc_day='20201123') PARTITION (inc_day='20201124');


dfs -cp hdfs://test-cluster/user/hive/warehouse/dim_ids_test.db/dim_city_parquet/inc_day=20200101/* hdfs://test-cluster/user/hive/warehouse/dim_ids_test.db/dim_city_parquet/inc_day=20201122;
dfs -cp hdfs://test-cluster/user/hive/warehouse/dim_ids_test.db/dim_city_parquet/inc_day=20200102/* hdfs://test-cluster/user/hive/warehouse/dim_ids_test.db/dim_city_parquet/inc_day=20201123;
dfs -cp hdfs://test-cluster/user/hive/warehouse/dim_ids_test.db/dim_city_parquet/inc_day=20200103/* hdfs://test-cluster/user/hive/warehouse/dim_ids_test.db/dim_city_parquet/inc_day=20201124;
dfs -ls  hdfs://test-cluster/user/hive/warehouse/dim_ids_test.db/dim_city_parquet;


dfs -cp hdfs://test-cluster/user/hive/warehouse/dwd_ids_test.db/dwd_pub_order_dtl_di_parquet/inc_day=20200101/* hdfs://test-cluster/user/hive/warehouse/dwd_ids_test.db/dwd_pub_order_dtl_di_parquet/inc_day=20201122;
dfs -cp hdfs://test-cluster/user/hive/warehouse/dwd_ids_test.db/dwd_pub_order_dtl_di_parquet/inc_day=20200102/* hdfs://test-cluster/user/hive/warehouse/dwd_ids_test.db/dwd_pub_order_dtl_di_parquet/inc_day=20201123;
dfs -cp hdfs://test-cluster/user/hive/warehouse/dwd_ids_test.db/dwd_pub_order_dtl_di_parquet/inc_day=20200103/* hdfs://test-cluster/user/hive/warehouse/dwd_ids_test.db/dwd_pub_order_dtl_di_parquet/inc_day=20201124;
dfs -ls  hdfs://test-cluster/user/hive/warehouse/dwd_ids_test.db/dwd_pub_order_dtl_di_parquet/*;


dfs -cp hdfs://test-cluster/user/hive/warehouse/dwd_ids_test.db/dwd_order_base_info_dtl_di_parquet/inc_day=20200101/* hdfs://test-cluster/user/hive/warehouse/dwd_ids_test.db/dwd_order_base_info_dtl_di_parquet/inc_day=20201122;
dfs -cp hdfs://test-cluster/user/hive/warehouse/dwd_ids_test.db/dwd_order_base_info_dtl_di_parquet/inc_day=20200102/* hdfs://test-cluster/user/hive/warehouse/dwd_ids_test.db/dwd_order_base_info_dtl_di_parquet/inc_day=20201123;
dfs -cp hdfs://test-cluster/user/hive/warehouse/dwd_ids_test.db/dwd_order_base_info_dtl_di_parquet/inc_day=20200103/* hdfs://test-cluster/user/hive/warehouse/dwd_ids_test.db/dwd_order_base_info_dtl_di_parquet/inc_day=20201124;
dfs -ls  hdfs://test-cluster/user/hive/warehouse/dwd_ids_test.db/dwd_order_base_info_dtl_di_parquet/*;
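
After the copies, confirm the new partitions actually return data:

select count(*) from dwd_ids_test.dwd_order_base_info_dtl_di_parquet where inc_day='20201122';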


-- partitioned by year (inc_year)
show partitions dim_ids_test.dim_calendar_parquet;
-- not partitioned (show partitions fails on a non-partitioned table)
show partitions dim_ids_test.dim_company_parquet;
-- partitioned by day (inc_day)
show partitions dim_ids_test.dim_city_parquet;

-- partitioned by day (inc_day)
show partitions dwd_ids_test.dwd_pub_order_dtl_di_parquet;
-- partitioned by day (inc_day)
show partitions dwd_ids_test.dwd_order_base_info_dtl_di_parquet;
