文章目录
01:其他维度:组织机构
- 目标:实现组织机构维度的设计及构建
- 路径- step1:需求- step2:设计- step3:实现
- 实施- 需求:实现组织机构维度表的构建,得到每个工程师对应的组织机构信息- 统计不同服务人员的工单数、核销数等- 设计- org_employee:员工信息表【员工id、员工编码、员工名称、用户系统id】
select empid,empcode,empname,userid from org_employee;
- org_empposition:员工岗位信息表【员工id、岗位id】select empid,positionid from org_empposition;
- org_position:岗位信息表【岗位id、岗位编码、岗位名称、部门id】select positionid,posicode,posiname,orgid from org_position;
- org_organization:部门信息表【部门id、部门编码、部门名称】select orgid,orgcode,orgname from org_organization;
- 实现- 建维度表-- 创建组织机构维度表,组织机构人员是经常变动的,所以按照日期分区create external tableifnotexists one_make_dws.dim_emporg( empid string comment'人员id', empcode string comment'人员编码(erp对应的账号id)', empname string comment'人员姓名', userid string comment'用户系统id(登录用户名)', posid string comment'岗位id', posicode string comment'岗位编码', posiname string comment'岗位名称', orgid string comment'部门id', orgcode string comment'部门编码', orgname string comment'部门名称')comment'组织机构维度表'partitioned by(dt string)stored as orclocation '/data/dw/dws/one_make/dim_emporg';
- 抽取数据-- 先根据dwd层的表进行关联,然后分别把数据取出来insert overwrite table one_make_dws.dim_emporg partition(dt='20210101')select emp.empid as empid , emp.empcode as empcode , emp.empname as empname , emp.userid as userid , pos.positionid as posid , pos.posicode as posicode , pos.posiname as posiname , org.orgid as orgid , org.orgcode as orgcode , org.orgname as orgnamefrom one_make_dwd.org_employee empleftjoin one_make_dwd.org_empposition emppos on emp.empid = emppos.empid and emp.dt ='20210101'and emppos.dt ='20210101'leftjoin one_make_dwd.org_position pos on emppos.positionid = pos.positionid and pos.dt ='20210101'leftjoin one_make_dwd.org_organization org on pos.orgid = org.orgid and org.dt ='20210101';
- 小结**- 实现组织机构维度的设计及构建
02:其他维度:仓库、物流
- 目标:实现仓库维度、物流维度的构建
- 路径- step1:仓库维度- step2:物流维度
- 实施- 仓库维度- 建表
-- 仓库维度表create external tableifnotexists one_make_dws.dim_warehouse( code string comment'仓库编码', name string comment'仓库名称', company_id string comment'所属公司', company string comment'公司名称', srv_station_id string comment'所属服务网点ID', srv_station_name string comment'所属服务网点名称')comment'仓库维度表'partitioned by(dt string)stored as orclocation '/data/dw/dws/one_make/dim_warehouse';
- 加载insert overwrite table one_make_dws.dim_warehouse partition(dt='20210101')select warehouse.code as code , warehouse.name as name , warehouse.company as company_id , cmp.compmay as compmay , station.id as srv_station_id , station.name as srv_station_namefrom one_make_dwd.ciss_base_warehouse warehouse-- 关联公司信息表leftjoin(select ygcode as company_id,max(companyname)as compmay from one_make_dwd.ciss_base_baseinfo where dt='20210101'-- 需要对company信息进行分组去重,里面有一些重复数据 groupby ygcode) cmp on warehouse.dt ='20210101'and cmp.company_id = warehouse.company-- 关联服务网点和仓库关系表leftjoin one_make_dwd.ciss_r_serstation_warehouse station_r_warehouse on station_r_warehouse.dt ='20210101'and station_r_warehouse.warehouse_code = warehouse.code-- 关联服务网点表 leftjoin one_make_dwd.ciss_base_servicestation station on station.dt ='20210101'and station.id = station_r_warehouse.service_station_id;
- 物流维度- 建表-- 物流维度表(和服务属性表类似)create external tableifnotexists one_make_dws.dim_logistics( prop_name string comment'字典名称', type_id string comment'属性id', type_name string comment'属性名称')comment'物流维度表'partitioned by(dt string)stored as orclocation '/data/dw/dws/one_make/dim_logistics';
- 加载insert overwrite table one_make_dws.dim_logistics partition(dt ='20210101')select dict_t.dicttypename as prop_name , dict_e.dictid as type_id , dict_e.dictname as type_namefrom one_make_dwd.eos_dict_type dict_tinnerjoin one_make_dwd.eos_dict_entry dict_e on dict_t.dt ='20210101'and dict_e.dt ='20210101'and dict_t.dicttypeid = dict_e.dicttypeid and dict_t.dicttypename in('物流公司','物流类型')orderby dict_t.dicttypename, dict_e.dictid;
- 使用如下写法会好一些insert overwrite table one_make_dws.dim_logistics partition(dt ='20210101')select dict_t.dicttypename as prop_name , dict_e.dictid as type_id , dict_e.dictname as type_namefrom one_make_dwd.eos_dict_type dict_t innerjoin one_make_dwd.eos_dict_entry dict_e on dict_t.dt ='20210101'and dict_e.dt ='20210101'and dict_t.dicttypeid = dict_e.dicttypeid -- 通过状态字符串进行关联and dict_t.dicttypename in('物流公司','物流类型')-- 通过和物流相关的字样进行过滤orderby prop_name, type_id;
- 小结**- 实现仓库维度、物流维度的构建
附录一:常见问题
1.错误:没有开启Cross Join
Exception in thread "main" org.apache.spark.sql.AnalysisException: Detected implicit cartesian product for INNER join between logical plans.Use the CROSS JOIN syntax to allow cartesian products between these relations
- Spark2.x默认不允许执行笛卡尔积,除非显示申明cross join或者开启属性:
spark.sql.crossJoin.enabled true
2.错误:Unable to move source
Error: org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source hdfs://hadoop.bigdata.cn:9000/data/dw/dws/one_make/dim_warehouse/.hive-staging_hive_2020-12-23_04-26-01_363_5663538019799519260-16/-ext-10000/part-00000-63069107-6405-4e31-a55a-6bdeefcd7d9b-c000 to destination hdfs://hadoop.bigdata.cn:9000/data/dw/dws/one_make/dim_warehouse/dt=20210101/part-00000-63069107-6405-4e31-a55a-6bdeefcd7d9b-c000; (state=,code=0)
- 重启SparkSQL的ThriftServer,与MetaStore构建新的会话连接
本文转载自: https://blog.csdn.net/xianyu120/article/details/131081626
版权归原作者 Maynor996 所有, 如有侵权,请联系我们删除。
版权归原作者 Maynor996 所有, 如有侵权,请联系我们删除。