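Big Data Tools: Superset
Overview
Apache Superset is an open-source, modern, lightweight BI tool. It connects to many data sources, offers a rich set of chart types, supports custom dashboards, and has a friendly, easy-to-use interface.
Because Superset connects to common big-data engines such as Trino, Hive, Kylin, and Druid, and supports custom dashboards, it works well as the visualization tool for a data warehouse, sitting on top of the warehouse's ADS layer.
Official site: https://superset.apache.org/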
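Installation notes
- Superset has no official Windows support (hardly a surprise, since few people run servers on Windows).
- Superset is a web application written in Python and requires Python 3.6 or later.
- For a virtual machine, Superset recommends at least 8 GB of RAM and at least 40 GB of disk, which leaves enough room for the operating system and all required dependencies.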
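The requirements above can be checked before installing anything. The following is a minimal sketch rather than an official tool: the function names are made up for illustration, the thresholds come from the list above, and the RAM check relies on `os.sysconf`, which is assumed to be available as on a typical Linux host.

```python
import os
import shutil
import sys

MIN_PYTHON = (3, 6)   # from the notes above
MIN_RAM_GB = 8        # recommended RAM for a VM
MIN_DISK_GB = 40      # recommended disk size for a VM


def check_python(version_info=sys.version_info):
    """Return True if the interpreter is at least Python 3.6."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def check_disk(path="/", min_gb=MIN_DISK_GB):
    """Return True if the filesystem holding `path` has at least min_gb in total."""
    total_gb = shutil.disk_usage(path).total / 1024 ** 3
    return total_gb >= min_gb


def check_ram(min_gb=MIN_RAM_GB):
    """Return True/False on Linux, or None if the RAM size cannot be determined."""
    try:
        total = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (ValueError, OSError, AttributeError):
        return None
    return total / 1024 ** 3 >= min_gb


if __name__ == "__main__":
    print("Python >= 3.6:", check_python())
    print("RAM >= 8 GB:", check_ram())
    print("Disk >= 40 GB:", check_disk())
```

Run it with the interpreter you plan to use for Superset; a `None` for RAM only means the check is not supported on that platform.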
Python environment
Install and update dependencies
# 1. Install the required dependencies
yum -y install zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel readline-devel tk-devel gdbm-devel db4-devel libpcap-devel xz-devel
# 2. Install or update gcc
yum install gcc
# 3. Python 3.7 and later also need libffi-devel
yum install libffi-devel -y
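Download and install Python
Budgets are often tight, so a single development server frequently hosts several Python versions to serve different "customers". For that reason I recommend installing Miniconda to switch between Python versions; the Superset project also strongly recommends installing Superset inside a virtual environment.
Install Conda
Miniconda3-latest-Linux-x86_64.sh
# 1. Run the following command to install Miniconda, then follow the prompts
bash Miniconda3-latest-Linux-x86_64.sh
# 2. Hold Enter to page through the license; when asked whether you accept it, type yes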
Please answer 'yes' or 'no':'
>>> yes
# 3. Confirm the install location (the default is shown in brackets; here it is changed to /opt/module/miniconda3)
[/root/miniconda3] >>> /opt/module/miniconda3
# 4. When asked whether to initialize conda, type yes
Do you wish the installer to initialize Miniconda3
by running conda init? [yes|no]
[no] >>> yes
# 5. Output like the following means the installation succeeded
==> For changes to take effect, close and re-open your current shell. <==
If you'd prefer that conda's base environment not be activated on startup,
set the auto_activate_base parameter to false:
conda config --set auto_activate_base false
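Thank you for installing Miniconda3!
# 6. Disable auto-activation of base: once Miniconda is installed, every new terminal activates its default base environment. The following command turns that off.
[root@paratera128 ~]# conda config --set auto_activate_base false
# 7. Configure conda mirrors hosted in China; add several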
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
conda config --set show_channel_urls yes
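Python environment configuration
Installing Python with conda is very simple. For recent Superset releases, Python 3.7 or 3.8 is the best choice.
# 1. Create an environment with a specific Python version
conda create --name superset python=3.7
# 2. Activate the superset environment; work then happens in conda's Python 3.7 without touching the host's Python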
conda activate superset
# 3. Leave the current environment
conda deactivate
# 4. Delete the virtual environment
conda env remove -n superset
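With several environments on one server, it is easy to run a command in the wrong one. As a small sketch, the helper below reports which conda environment and Python version are active; `describe_environment` is a name invented here, and it relies on `conda activate` setting the `CONDA_DEFAULT_ENV` environment variable.

```python
import os
import sys


def describe_environment(environ=None, version_info=sys.version_info, prefix=sys.prefix):
    """Summarise the active interpreter and conda environment."""
    environ = os.environ if environ is None else environ
    return {
        # Set by `conda activate`; None when no conda environment is active
        "conda_env": environ.get("CONDA_DEFAULT_ENV"),
        "python": f"{version_info[0]}.{version_info[1]}",
        "prefix": prefix,
    }


if __name__ == "__main__":
    info = describe_environment()
    print(info)
    if info["conda_env"] != "superset":
        print("Warning: the 'superset' environment is not active")
```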
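Deploying Superset (Docker)
Install and start
# Fetch Superset with git; the official site provides a turnkey Docker Compose install (with separate development and production configurations)
[root@paratera128 opt]# git clone https://github.com/apache/superset.git
# Enter the project directory
[root@paratera128 opt]# cd superset
# This install method depends heavily on the Docker Compose and Docker Engine versions. My local versions are shown below; with them, the version field in the docker-compose.yml downloaded from the official site has to be changed to 3.6 or lower. For the mapping, search online for "docker and docker-compose version compatibility".
[root@paratera128 ~]# docker --version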
Docker version 18.03.1-ce, build 9ee9f40
(superset)[root@paratera128 ~]# docker-compose --version
docker-compose version 1.26.2, build eefe0d31
# Make the startup scripts executable
[root@paratera128 superset]# chmod 777 docker
[root@paratera128 superset]# cd docker/
[root@paratera128 docker]# ls
docker-bootstrap.sh docker-ci.sh docker-frontend.sh docker-init.sh frontend-mem-nag.sh pythonpath_dev README.md run-server.sh
[root@paratera128 docker]# chmod 777 *
# Pull the images and start the containers (this can be done in one step)
[root@paratera128 superset]# docker-compose -f docker-compose-non-dev.yml pull
[root@paratera128 superset]# docker-compose -f docker-compose-non-dev.yml up -d
superset_cache is up-to-date
superset_db is up-to-date
Starting superset_worker_beat ... done
Starting superset_app ... done
Starting superset_worker ... done
Starting superset_init ... done
# Create the admin user
[root@paratera128 superset]# docker exec -it superset_app flask fab create-admin
Username [admin]: admin
User first name [admin]: admin
User last name [user]: admin
Email [[email protected]]: admin
Password:
Repeat for confirmation:
Loaded your LOCAL configuration at [/app/docker/pythonpath_dev/superset_config.py]
logging was configured successfully
2022-07-26 04:10:42,285:INFO:superset.utils.logging_configurator:logging was configured successfully
2022-07-26 04:10:42,293:INFO:root:Configured event logger of type<class 'superset.utils.log.DBEventLogger'>
/usr/local/lib/python3.8/site-packages/flask_caching/__init__.py:201: UserWarning: Flask-Caching: CACHE_TYPE is set to null, caching is effectively disabled.
warnings.warn(
Recognized Database Authentications.
Error! User already exists admin
# Initialize the database
[root@paratera128 superset]# docker exec -it superset_app superset db upgrade
Loaded your LOCAL configuration at [/app/docker/pythonpath_dev/superset_config.py]
logging was configured successfully
2022-07-26 04:11:58,693:INFO:superset.utils.logging_configurator:logging was configured successfully
2022-07-26 04:11:58,700:INFO:root:Configured event logger of type<class 'superset.utils.log.DBEventLogger'>
/usr/local/lib/python3.8/site-packages/flask_caching/__init__.py:201: UserWarning: Flask-Caching: CACHE_TYPE is set to null, caching is effectively disabled.
warnings.warn(
INFO [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO [alembic.runtime.migration] Will assume transactional DDL.
# Initialize Superset
[root@paratera128 superset]# docker exec -it superset_app superset init
Loaded your LOCAL configuration at [/app/docker/pythonpath_dev/superset_config.py]
logging was configured successfully
2022-07-26 04:12:47,375:INFO:superset.utils.logging_configurator:logging was configured successfully
2022-07-26 04:12:47,382:INFO:root:Configured event logger of type<class 'superset.utils.log.DBEventLogger'>
/usr/local/lib/python3.8/site-packages/flask_caching/__init__.py:201: UserWarning: Flask-Caching: CACHE_TYPE is set to null, caching is effectively disabled.
warnings.warn(
Syncing role definition
2022-07-26 04:12:50,958:INFO:superset.security.manager:Syncing role definition
Syncing Admin perms
2022-07-26 04:12:50,980:INFO:superset.security.manager:Syncing Admin perms
Syncing Alpha perms
2022-07-26 04:12:51,220:INFO:superset.security.manager:Syncing Alpha perms
Syncing Gamma perms
2022-07-26 04:12:51,391:INFO:superset.security.manager:Syncing Gamma perms
Syncing granter perms
2022-07-26 04:12:51,554:INFO:superset.security.manager:Syncing granter perms
Syncing sql_lab perms
2022-07-26 04:12:51,705:INFO:superset.security.manager:Syncing sql_lab perms
Fetching a set of all perms to lookup which ones are missing
2022-07-26 04:12:51,874:INFO:superset.security.manager:Fetching a set of all perms to lookup which ones are missing
Creating missing datasource permissions.
2022-07-26 04:12:52,034:INFO:superset.security.manager:Creating missing datasource permissions.
Creating missing database permissions.
2022-07-26 04:12:52,044:INFO:superset.security.manager:Creating missing database permissions.
Cleaning faulty perms
2022-07-26 04:12:52,056:INFO:superset.security.manager:Cleaning faulty perms
# Load the example data (optional)
[root@paratera128 yum]# docker exec -it superset_app superset load_examples
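After the containers start, it can take a while before the web server answers. The sketch below polls Superset's health endpoint until it returns HTTP 200. The URL is an assumption (a local host with port 8088 published and Superset's `/health` route), and the function names are illustrative only.

```python
import time
import urllib.error
import urllib.request

# Assumption: Superset is published on localhost:8088 and serves /health
HEALTH_URL = "http://localhost:8088/health"


def fetch_status(url, timeout=5):
    """Return the HTTP status code for url, or None if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except (urllib.error.URLError, OSError):
        return None


def wait_until_healthy(url, attempts=30, delay=2.0, fetch=fetch_status, sleep=time.sleep):
    """Poll url until it answers 200; give up after `attempts` tries."""
    for _ in range(attempts):
        if fetch(url) == 200:
            return True
        sleep(delay)
    return False


if __name__ == "__main__":
    ok = wait_until_healthy(HEALTH_URL)
    print("Superset is up" if ok else "Superset did not become healthy in time")
```

The `fetch` and `sleep` parameters exist only so the loop can be exercised without a running server.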
### Docker Compose configuration
# docker-compose version, user, and mounted-volume variables
x-superset-image: &superset-image apache/superset:latest
x-superset-user: &superset-user root
x-superset-depends-on: &superset-depends-on
  - db
  - redis
x-superset-volumes: &superset-volumes
  # /app/pythonpath_docker will be appended to the PYTHONPATH in the final container
  - ./docker:/app/docker
  - ./superset:/app/superset
  - ./superset-frontend:/app/superset-frontend
  - superset_home:/app/superset_home
  - ./tests:/app/tests

version: "3.6"
services:
  # Cache for Superset's Flask-Caching; it stores state from user actions, such as dashboard filter state and chart/table data from Explore
  redis:
    image: redis:latest
    container_name: superset_cache
    restart: unless-stopped
    ports:
      - "127.0.0.1:6379:6379"
    volumes:
      - redis:/data

  # PostgreSQL database (optional)
  db:
    env_file: docker/.env
    image: postgres:14
    container_name: superset_db
    restart: unless-stopped
    ports:
      - "127.0.0.1:5432:5432"
    volumes:
      - db_home:/var/lib/postgresql/data

  # Superset server instance
  superset:
    env_file: docker/.env
    image: *superset-image
    container_name: superset_app
    command: ["/app/docker/docker-bootstrap.sh", "app"]
    restart: unless-stopped
    ports:
      - 8088:8088
    user: *superset-user
    depends_on: *superset-depends-on
    volumes: *superset-volumes
    environment:
      CYPRESS_CONFIG: "${CYPRESS_CONFIG}"

volumes:
  superset_home:
    external: false
  db_home:
    external: false
  redis:
    external: false
Deploying Superset (pip in a virtual environment)
Install and start
# Activate the superset environment
[root@paratera128 ~]# conda activate superset
# Install system dependencies
yum install -y python-setuptools
yum install -y gcc gcc-c++ libffi-devel python-devel python-pip python-wheel openssl-devel cyrus-sasl-devel openldap-devel
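# Install (or upgrade) setuptools and pip
pip install --upgrade setuptools pip -i https://pypi.douban.com/simple/
# Install Superset
pip install apache-superset -i https://pypi.douban.com/simple/
# Install a specific version
pip install apache-superset==1.3.0 -i https://pypi.tuna.tsinghua.edu.cn/simple/
When pip reports that the packages were installed successfully, the installation worked. The WARNING can be ignored: it only says that running pip as root grants overly broad permissions, and it will not appear in a production setup that does not install as root.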
Initialize the admin user
(superset)[root@paratera128 ~]# export FLASK_APP=superset
(superset)[root@paratera128 ~]# flask fab create-admin
Username [admin]: admin
User first name [admin]: admin
User last name [user]: admin
Email [[email protected]]: admin
Password:
Repeat for confirmation:
logging was configured successfully
2022-07-25 18:23:46,139:INFO:superset.utils.logging_configurator:logging was configured successfully
2022-07-25 18:23:46,156:INFO:root:Configured event logger of type<class 'superset.utils.log.DBEventLogger'>
/opt/module/miniconda3/envs/superset/lib/python3.7/site-packages/flask_caching/__init__.py:120: UserWarning: Flask-Caching: CACHE_TYPE is set to null, caching is effectively disabled.
"Flask-Caching: CACHE_TYPE is set to null, "
Recognized Database Authentications.
Admin User admin created.
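Initialize the database
At its core Superset is just a web application with its own bundled database, which has to be initialized.
# Install (or update) dataclasses, then initialize the Superset database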
pip install dataclasses
superset db upgrade
If you see: UserWarning: Flask-Caching: CACHE_TYPE is set to null, caching is effectively disabled.
Open python3.7/site-packages/superset/config.py in an editor,
search for "CACHE_TYPE", and change every occurrence to "simple".
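Editing the installed config.py works, but the change is lost whenever the package is reinstalled. As an alternative sketch (not the method used above), Superset also reads overrides from a `superset_config.py` module found on `PYTHONPATH`, or from the file named by the `SUPERSET_CONFIG_PATH` environment variable. The timeout value here is an arbitrary example.

```python
# superset_config.py
# Place this file in a directory on PYTHONPATH, or point the
# SUPERSET_CONFIG_PATH environment variable at it.

# Flask-Caching settings for Superset's main cache.
# "simple" matches the value used in the text above; newer Flask-Caching
# releases name the same in-memory backend "SimpleCache".
CACHE_CONFIG = {
    "CACHE_TYPE": "simple",
    "CACHE_DEFAULT_TIMEOUT": 300,  # seconds; example value, adjust as needed
}
```

The in-memory backend is per-process, so it suits a single-server test setup; the Docker deployment above uses Redis instead.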
Initialize base data
(superset)[root@paratera128 local]# superset init
logging was configured successfully
2022-07-25 02:24:19,136:INFO:superset.utils.logging_configurator:logging was configured successfully
2022-07-25 02:24:19,148:INFO:root:Configured event logger of type<class 'superset.utils.log.DBEventLogger'>
/opt/module/miniconda3/envs/superset/lib/python3.7/site-packages/flask_caching/__init__.py:120: UserWarning: Flask-Caching: CACHE_TYPE is set to null, caching is effectively disabled.
"Flask-Caching: CACHE_TYPE is set to null, "
Syncing role definition
2022-07-25 02:24:27,821:INFO:superset.security.manager:Syncing role definition
Syncing Admin perms
2022-07-25 02:24:27,920:INFO:superset.security.manager:Syncing Admin perms
Syncing Alpha perms
2022-07-25 02:24:28,026:INFO:superset.security.manager:Syncing Alpha perms
Syncing Gamma perms
2022-07-25 02:24:28,410:INFO:superset.security.manager:Syncing Gamma perms
Syncing granter perms
2022-07-25 02:24:28,741:INFO:superset.security.manager:Syncing granter perms
Syncing sql_lab perms
2022-07-25 02:24:29,045:INFO:superset.security.manager:Syncing sql_lab perms
Fetching a set of all perms to lookup which ones are missing
2022-07-25 02:24:29,687:INFO:superset.security.manager:Fetching a set of all perms to lookup which ones are missing
Creating missing datasource permissions.
2022-07-25 02:24:29,769:INFO:superset.security.manager:Creating missing datasource permissions.
Creating missing database permissions.
2022-07-25 02:24:29,776:INFO:superset.security.manager:Creating missing database permissions.
Cleaning faulty perms
2022-07-25 02:24:29,780:INFO:superset.security.manager:Cleaning faulty perms
Start the service
# Start from the command line with five worker processes, all bound to 192.168.137.128:8080
(superset)[root@paratera128 local]# gunicorn --workers 5 --timeout 120 --bind 192.168.137.128:8080 "superset.app:create_app()" --daemon
[2022-07-25 02:28:47 -0700][104753][INFO] Starting gunicorn 20.0.4
[2022-07-25 02:28:47 -0700][104753][INFO] Listening at: http://192.168.137.128:8080 (104753)
[2022-07-25 02:28:47 -0700][104753][INFO] Using worker: sync
[2022-07-25 02:28:47 -0700][104756][INFO] Booting worker with pid: 104756
[2022-07-25 02:28:47 -0700][104757][INFO] Booting worker with pid: 104757
[2022-07-25 02:28:47 -0700][104758][INFO] Booting worker with pid: 104758
[2022-07-25 02:28:47 -0700][104759][INFO] Booting worker with pid: 104759
[2022-07-25 02:28:47 -0700][104760][INFO] Booting worker with pid: 104760
logging was configured successfully
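The same options can be kept in a gunicorn configuration file, which is itself a Python module. This is a sketch that mirrors the flags used above; the file name `gunicorn.conf.py` is a common convention, not something the original setup requires.

```python
# gunicorn.conf.py
# Mirrors the command-line flags used above.

bind = "192.168.137.128:8080"  # address and port to listen on
workers = 5                    # number of worker processes
timeout = 120                  # seconds before an unresponsive worker is restarted
daemon = True                  # detach and run in the background
```

It would then be started with `gunicorn -c gunicorn.conf.py "superset.app:create_app()"`.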
Troubleshooting
Install the following additional dependencies:
pip install flask -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install wtforms_json -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install flask_appbuilder -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install flask_compress -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install celery -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install flask_migrate -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install flask_talisman -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install flask_caching -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install sqlparse -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install bleach -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install markdown -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install numpy -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install pandas -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install parsedatetime -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install pathlib2 -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install simplejson -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install humanize -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install python-geohash -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install polyline -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install geopy -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install sqlalchemy -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install sqlalchemy-utils -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install cryptography -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install backoff -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install msgpack -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install pyarrow -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install contextlib2 -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install croniter -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install retry -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install selenium -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install isodate -i https://pypi.tuna.tsinghua.edu.cn/simple
# MarkupSafe 2.1.1 causes an error here; replace it with the older 2.0.1
(superset)[root@paratera128 superset]# pip show markupsafe
Name: MarkupSafe
Version: 2.1.1
Summary: Safely add untrusted strings to HTML/XML markup.
Home-page: https://palletsprojects.com/p/markupsafe/
Author: Armin Ronacher
Author-email: [email protected]
License: BSD-3-Clause
Location: /opt/module/miniconda3/envs/superset/lib/python3.7/site-packages
Requires:
Required-by: Jinja2, Mako, WTForms
(superset)[root@paratera128 superset]# python -m pip install markupsafe==2.0.1
Fixing the error: No PIL installation found
(superset)[root@paratera128 local]# pip install pillow -i https://pypi.tuna.tsinghua.edu.cn/simple
(superset)[root@paratera128 local]# superset version
logging was configured successfully
2022-07-25 02:20:07,976:INFO:superset.utils.logging_configurator:logging was configured successfully
2022-07-25 02:20:07,983:INFO:root:Configured event logger of type<class 'superset.utils.log.DBEventLogger'>
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Superset 1.3.0
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
At this point the conda virtual-environment installation of Superset is complete.
Accessing Superset
Username/password: admin/admin
Connecting to databases
MySQL
Trino
Connecting to Trino requires the matching driver: https://superset.apache.org/docs/databases/installing-database-drivers/
pip has to be installed first, and a fairly recent version is needed, so upgrade it after installing.
[root@paratera128 yum]# yum -y install epel-release
[root@paratera128 yum]# yum -y install python-pip
[root@paratera128 yum]# wget https://bootstrap.pypa.io/pip/2.7/get-pip.py
[root@paratera128 yum]# python3 get-pip.py
# Download the driver
[root@paratera128 yum]# pip install sqlalchemy-trino
# If Superset was deployed with Docker, the driver also has to be loaded into the container
[root@paratera128 superset]# touch ./docker/requirements-local.txt
[root@paratera128 superset]# echo "sqlalchemy-trino" >> ./docker/requirements-local.txt
[root@paratera128 superset]# docker-compose -f docker-compose-non-dev.yml build --force-rm
[root@paratera128 superset]# docker-compose -f docker-compose-non-dev.yml up
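Once the driver is installed, Superset asks for a SQLAlchemy URI when adding a database. The helpers below are a sketch of how such URIs are put together, following the `mysql://user:password@host/database` and `trino://user:password@host:port/catalog` shapes given in the Superset documentation. The function names and every host, user, and catalog value are hypothetical.

```python
from urllib.parse import quote_plus


def mysql_uri(user, password, host, database, port=3306):
    """Build a MySQL SQLAlchemy URI; credentials are URL-encoded."""
    return f"mysql://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/{database}"


def trino_uri(user, host, port, catalog, password=None):
    """Build a Trino SQLAlchemy URI; the password part is optional."""
    auth = quote_plus(user)
    if password:
        auth += ":" + quote_plus(password)
    return f"trino://{auth}@{host}:{port}/{catalog}"


if __name__ == "__main__":
    # Hypothetical values for illustration only
    print(mysql_uri("report", "p@ss", "mysql.example.internal", "ads"))
    print(trino_uri("report", "trino.example.internal", 8080, "hive"))
```

URL-encoding the credentials matters when a password contains characters such as `@` or `/`, which would otherwise break the URI.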
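Report design
A basic Table
The screenshots speak for themselves.
Bar chart
Requirement: count new and returning users for each day of a month
Pie chart
Show each frequency band's share of the data
Dashboard
The Chart components created above have all been saved to the same dashboard.
Just drag a Chart in to add it.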
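Building on the API
Reference documentation: https://superset.apache.org/docs/api
For example, to query the collection of four Charts created above, you can use this endpoint.
Without parameters, it returns all columns and all data by default.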
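As a sketch of what such a call might look like from Python, the code below logs in through `/api/v1/security/login` to obtain a JWT and then requests the chart list from `/api/v1/chart/` without any query parameters. The base URL and credentials are assumptions, the helper names are invented for illustration, and the `slice_name` field is taken to be the chart's title in the response.

```python
import json
import urllib.request

# Assumption: a local Superset instance; adjust host and port to your deployment
BASE_URL = "http://localhost:8088"


def build_login_request(base_url, username, password):
    """Build the POST request that exchanges credentials for a JWT."""
    payload = {"username": username, "password": password, "provider": "db", "refresh": True}
    return urllib.request.Request(
        base_url + "/api/v1/security/login",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_chart_list_request(base_url, access_token):
    """Build the GET request for the chart list, with no query parameters."""
    return urllib.request.Request(
        base_url + "/api/v1/chart/",
        headers={"Authorization": "Bearer " + access_token},
        method="GET",
    )


def send_json(request, timeout=10):
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def list_chart_names(base_url, username, password):
    """Log in, fetch the chart list, and return the chart titles in the result."""
    token = send_json(build_login_request(base_url, username, password))["access_token"]
    charts = send_json(build_chart_list_request(base_url, token))
    return [chart["slice_name"] for chart in charts["result"]]


if __name__ == "__main__":
    # Uses the admin/admin account created earlier in this walkthrough
    print(list_chart_names(BASE_URL, "admin", "admin"))
```

Separating request construction from sending keeps the URL and header logic easy to inspect without a running server.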
Copyright belongs to the original author, 八五年的湘哥. If there is any infringement, please contact us for removal.