Deploying a DolphinScheduler Cluster on Kubernetes

I. Download

Prerequisites

  • Helm 3.1.0+
  • Kubernetes 1.12+
  • PV provisioner support in the underlying infrastructure
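
The version floors above can be checked from a shell before going further. This is a minimal sketch and not part of the original walkthrough: `version_ge` is a hypothetical helper, it relies on `sort -V` from GNU coreutils, and the Helm check only runs when a `helm` binary is on the PATH.

```shell
# version_ge A B: succeeds when version A is greater than or equal to B.
# Hypothetical helper; relies on `sort -V` (GNU coreutils) for version ordering.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Compare the local Helm client against the 3.1.0 minimum, if Helm is installed.
if command -v helm >/dev/null 2>&1; then
  # `helm version --short` prints something like v3.9.0+g7ceeda6
  helm_ver="$(helm version --short | sed 's/^v//; s/+.*//')"
  if version_ge "$helm_ver" 3.1.0; then
    echo "Helm $helm_ver is new enough"
  else
    echo "Helm $helm_ver is older than 3.1.0" >&2
  fi
fi
```

The same helper could be pointed at the server version reported by `kubectl version` to check the Kubernetes 1.12 minimum.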

Installing dolphinscheduler

1. Download the installation package

$ wget --no-check-certificate https://dlcdn.apache.org/dolphinscheduler/2.0.5/apache-dolphinscheduler-2.0.5-src.tar.gz

$ tar -zxvf apache-dolphinscheduler-2.0.5-src.tar.gz
$ cd apache-dolphinscheduler-2.0.5-src/docker/kubernetes/dolphinscheduler

After downloading the source, edit the Chart.yaml file in apache-dolphinscheduler-2.0.5-src/docker/kubernetes/dolphinscheduler. Two places need to change: replace both occurrences of repository: https://charts.bitnami.com/bitnami with repository: https://raw.githubusercontent.com/bitnami/charts/archive-full-index/bitnami
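
The two replacements can also be made with `sed` instead of by hand. This is a minimal sketch, assuming it is run from the chart directory; `patch_chart_repo` is a hypothetical helper name, and a `.bak` copy of the original file is kept next to it.

```shell
OLD_REPO='https://charts.bitnami.com/bitnami'
NEW_REPO='https://raw.githubusercontent.com/bitnami/charts/archive-full-index/bitnami'

# patch_chart_repo FILE: rewrite every matching repository line in FILE.
# Hypothetical helper; the original file is saved as FILE.bak.
patch_chart_repo() {
  sed -i.bak "s|repository: ${OLD_REPO}|repository: ${NEW_REPO}|g" "$1"
}

# Only touch Chart.yaml when it exists in the current directory.
if [ -f Chart.yaml ]; then
  patch_chart_repo Chart.yaml
  grep -n 'repository:' Chart.yaml
fi
```

The final `grep` should list two repository lines, one for postgresql and one for zookeeper, both pointing at the new URL.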

Chart.yaml after the change:

[root@k8s-master01 dolphinscheduler]# cat Chart.yaml
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

apiVersion: v2
name: dolphinscheduler
description: Dolphin Scheduler is a distributed and easy-to-expand visual DAG workflow scheduling system, dedicated to solving the complex dependencies in data processing, making the scheduling system out of the box for data processing.
home: https://dolphinscheduler.apache.org
icon: https://dolphinscheduler.apache.org/img/hlogo_colorful.svg
keywords:
- dolphinscheduler
- scheduler
# A chart can be either an 'application' or a 'library' chart.
#
# Application charts are a collection of templates that can be packaged into versioned archives
# to be deployed.
#
# Library charts provide useful utilities or functions for the chart developer. They're included as
# a dependency of application charts to inject those utilities and functions into the rendering
# pipeline. Library charts do not define any templates and therefore cannot be deployed.
type: application

# This is the chart version. This version number should be incremented each time you make changes
# to the chart and its templates, including the app version.
version: 2.0.3

# This is the version number of the application being deployed. This version number should be
# incremented each time you make changes to the application.
appVersion: 2.0.5

dependencies:
- name: postgresql
  version: 10.3.18
  repository: https://raw.githubusercontent.com/bitnami/charts/archive-full-index/bitnami
  condition: postgresql.enabled
- name: zookeeper
  version: 6.5.3
  repository: https://raw.githubusercontent.com/bitnami/charts/archive-full-index/bitnami
  condition: zookeeper.enabled
[root@k8s-master01 dolphinscheduler]# 

2. Use MySQL as the DolphinScheduler database

How can MySQL replace PostgreSQL as the DolphinScheduler database?

  1. Download the MySQL driver mysql-connector-java-8.0.16.jar:
cd /root/softwares/ds/ds-image
wget https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.16/mysql-connector-java-8.0.16.jar
  2. Create a new Dockerfile that adds the MySQL driver:

Write the Dockerfile. This version also installs a Python environment with Miniconda:

FROM dolphinscheduler.docker.scarf.sh/apache/dolphinscheduler:2.0.5
COPY mysql-connector-java-8.0.16.jar /opt/dolphinscheduler/lib

# System packages
RUN sed -i s@/archive.ubuntu.com/@/mirrors.aliyun.com/@g /etc/apt/sources.list
RUN apt-get clean
RUN apt-get update && \
    apt-get install -y curl && \
    apt-get install -y expect && \
    apt-get install -y tar

# RUN apt-get update && apt-get install -yq curl wget jq vim

# python env
ARG CONDA_VER=4.12.0
ARG OS_TYPE=x86_64
ARG PY_VER=3.8
ARG PY_VER_CONDA=py38
ARG PANDAS_VER=1.3

# Use the above args
# ARG CONDA_VER
# ARG OS_TYPE
# ARG PY_VER_CONDA

# Install miniconda to /miniconda
# https://repo.anaconda.com/miniconda/Miniconda3-py38_4.12.0-Linux-x86_64.sh
RUN curl -LO "https://repo.anaconda.com/miniconda/Miniconda3-${PY_VER_CONDA}_${CONDA_VER}-Linux-${OS_TYPE}.sh"
RUN bash Miniconda3-${PY_VER_CONDA}_${CONDA_VER}-Linux-${OS_TYPE}.sh -p /miniconda -b
RUN rm Miniconda3-${PY_VER_CONDA}_${CONDA_VER}-Linux-${OS_TYPE}.sh
ENV PATH=/miniconda/bin:${PATH}
RUN conda update -y conda
RUN conda init

# ARG PY_VER
# ARG PANDAS_VER
# Install packages from conda
RUN conda install -c anaconda -y python=${PY_VER}
RUN conda install -c anaconda -y \
    pandas=${PANDAS_VER}
  3. Build a new image that includes the MySQL driver:
    docker build -t apache/dolphinscheduler:mysql-driver .

Build output:

[root@quant image]# docker build -t apache/dolphinscheduler:mysql-driver .
...

Downloading and Extracting Packages
openssl-1.1.1q       | 3.8 MB    | ########## | 100% 
python-3.8.13        | 22.7 MB   | ########## | 100% 
ca-certificates-2022 | 131 KB    | ########## | 100% 
certifi-2022.6.15    | 156 KB    | ########## | 100% 
Preparing transaction: ...working... done
Verifying transaction: ...working... done
Executing transaction: ...working... done
Retrieving notices: ...working... done
Removing intermediate container 394a97a0668b
 ---> c45736e37149
Step 20/20 : RUN conda install -c anaconda -y     pandas=${PANDAS_VER}
 ---> Running in a28f9dfe2152
Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

## Package Plan ##

  environment location: /miniconda

  added / updated specs:
    - pandas

The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    blas-1.0                   |              mkl           6 KB  anaconda
    bottleneck-1.3.5           |   py38h7deecbd_0         125 KB  anaconda
    intel-openmp-2021.4.0      |    h06a4308_3561         8.8 MB  anaconda
    mkl-2021.4.0               |     h06a4308_640       219.1 MB  anaconda
    mkl-service-2.4.0          |   py38h7f8727e_0          62 KB  anaconda
    mkl_fft-1.3.1              |   py38hd3c417c_0         200 KB  anaconda
    mkl_random-1.2.2           |   py38h51133e4_0         341 KB  anaconda
    numexpr-2.8.3              |   py38h807cd23_0         133 KB  anaconda
    numpy-1.23.1               |   py38h6c91a56_0          10 KB  anaconda
    numpy-base-1.23.1          |   py38ha15fc14_0         7.1 MB  anaconda
    packaging-21.3             |     pyhd3eb1b0_0          35 KB  anaconda
    pandas-1.4.3               |   py38h6a678d5_0        12.6 MB  anaconda
    pyparsing-3.0.4            |     pyhd3eb1b0_0          78 KB  anaconda
    python-dateutil-2.8.2      |     pyhd3eb1b0_0         241 KB  anaconda
    pytz-2022.1                |   py38h06a4308_0         243 KB  anaconda
    six-1.16.0                 |     pyhd3eb1b0_1          19 KB  anaconda
    ------------------------------------------------------------
                                           Total:       249.1 MB

The following NEW packages will be INSTALLED:

  blas               anaconda/linux-64::blas-1.0-mkl
  bottleneck         anaconda/linux-64::bottleneck-1.3.5-py38h7deecbd_0
  intel-openmp       anaconda/linux-64::intel-openmp-2021.4.0-h06a4308_3561
  mkl                anaconda/linux-64::mkl-2021.4.0-h06a4308_640
  mkl-service        anaconda/linux-64::mkl-service-2.4.0-py38h7f8727e_0
  mkl_fft            anaconda/linux-64::mkl_fft-1.3.1-py38hd3c417c_0
  mkl_random         anaconda/linux-64::mkl_random-1.2.2-py38h51133e4_0
  numexpr            anaconda/linux-64::numexpr-2.8.3-py38h807cd23_0
  numpy              anaconda/linux-64::numpy-1.23.1-py38h6c91a56_0
  numpy-base         anaconda/linux-64::numpy-base-1.23.1-py38ha15fc14_0
  packaging          anaconda/noarch::packaging-21.3-pyhd3eb1b0_0
  pandas             anaconda/linux-64::pandas-1.4.3-py38h6a678d5_0
  pyparsing          anaconda/noarch::pyparsing-3.0.4-pyhd3eb1b0_0
  python-dateutil    anaconda/noarch::python-dateutil-2.8.2-pyhd3eb1b0_0
  pytz               anaconda/linux-64::pytz-2022.1-py38h06a4308_0
  six                anaconda/noarch::six-1.16.0-pyhd3eb1b0_1

Downloading and Extracting Packages
numpy-1.23.1         | 10 KB     | ########## | 100% 
pandas-1.4.3         | 12.6 MB   | ########## | 100% 
mkl_random-1.2.2     | 341 KB    | ########## | 100% 
intel-openmp-2021.4. | 8.8 MB    | ########## | 100% 
packaging-21.3       | 35 KB     | ########## | 100% 
python-dateutil-2.8. | 241 KB    | ########## | 100% 
pytz-2022.1          | 243 KB    | ########## | 100% 
pyparsing-3.0.4      | 78 KB     | ########## | 100% 
numpy-base-1.23.1    | 7.1 MB    | ########## | 100% 
bottleneck-1.3.5     | 125 KB    | ########## | 100% 
blas-1.0             | 6 KB      | ########## | 100% 
mkl-service-2.4.0    | 62 KB     | ########## | 100% 
numexpr-2.8.3        | 133 KB    | ########## | 100% 
mkl_fft-1.3.1        | 200 KB    | ########## | 100% 
six-1.16.0           | 19 KB     | ########## | 100% 
mkl-2021.4.0         | 219.1 MB  | ########## | 100% 
Preparing transaction: ...working... done
Verifying transaction: ...working... done
Executing transaction: ...working... done
Retrieving notices: ...working... done
Removing intermediate container a28f9dfe2152
 ---> 5f02810f6511
Successfully built 5f02810f6511
Successfully tagged apache/dolphinscheduler:mysql-driver
[root@quant image]# docker images
REPOSITORY                                                 TAG            IMAGE ID       CREATED          SIZE
apache/dolphinscheduler                                    mysql-driver   5f02810f6511   42 seconds ago   2.5GB
<none>                                                     <none>         58b827c9ff7b   17 minutes ago   434MB
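
Before pushing the image, it may be worth confirming that the driver jar actually landed in /opt/dolphinscheduler/lib. This is a minimal sketch under two assumptions: `has_mysql_driver` is a hypothetical helper, and overriding the entrypoint with `ls` is enough to list the directory in this image. The Docker part is skipped when Docker or the image is not available locally.

```shell
IMAGE='apache/dolphinscheduler:mysql-driver'

# has_mysql_driver: reads a file listing on stdin and succeeds
# if one line names a MySQL connector jar. Hypothetical helper.
has_mysql_driver() {
  grep -Eq '^mysql-connector-java-[0-9.]+\.jar$'
}

# Run the check only when docker and the locally built image are present.
if command -v docker >/dev/null 2>&1 && docker image inspect "$IMAGE" >/dev/null 2>&1; then
  if docker run --rm --entrypoint ls "$IMAGE" /opt/dolphinscheduler/lib | has_mysql_driver; then
    echo "driver jar found in $IMAGE"
  else
    echo "driver jar missing from $IMAGE" >&2
  fi
fi
```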
  4. Push the docker image apache/dolphinscheduler:mysql-driver to a docker registry

  5. In values.yaml, change the repository field under image and update tag to mysql-driver

  6. In values.yaml, set enabled under postgresql to false

  7. In values.yaml, update the externalDatabase settings (in particular host, username and password)

    externalDatabase:
      type: "mysql"
      driver: "com.mysql.jdbc.Driver"
      host: "localhost"
      port: "3306"
      username: "root"
      password: "root"
      database: "dolphinscheduler"
      params: "useUnicode=true&characterEncoding=UTF-8"
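
As an alternative to editing values.yaml in place, the same settings can be kept in a separate override file and passed to Helm with `-f`. This is a minimal sketch and not the method used in this walkthrough: the file name values-mysql.yaml, the registry registry.example.com, and the host mysql.example.com are placeholders to replace with your own values.

```shell
# Placeholders (assumptions): replace with your own registry and MySQL host.
REGISTRY="${REGISTRY:-registry.example.com}"
MYSQL_HOST="${MYSQL_HOST:-mysql.example.com}"
OVERRIDES="${OVERRIDES:-values-mysql.yaml}"

# Write only the keys that differ from the chart defaults.
cat > "$OVERRIDES" <<EOF
image:
  repository: "${REGISTRY}/apache/dolphinscheduler"
  tag: "mysql-driver"
postgresql:
  enabled: false
externalDatabase:
  type: "mysql"
  driver: "com.mysql.jdbc.Driver"
  host: "${MYSQL_HOST}"
  port: "3306"
  username: "root"
  password: "root"
  database: "dolphinscheduler"
  params: "useUnicode=true&characterEncoding=UTF-8"
EOF

echo "wrote $OVERRIDES"

# Pushing the image under the same name would look like this:
#   docker tag apache/dolphinscheduler:mysql-driver "${REGISTRY}/apache/dolphinscheduler:mysql-driver"
#   docker push "${REGISTRY}/apache/dolphinscheduler:mysql-driver"
```

The file would then be supplied at install time, for example `helm install dolphinscheduler . -f values-mysql.yaml`, which leaves the chart's own values.yaml untouched.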

3. Change the Python environment

In values.yaml, set PYTHON_HOME to /usr/bin/python3
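
This change can be scripted as well. The sketch below assumes that PYTHON_HOME appears in values.yaml as a single `PYTHON_HOME: "..."` line; `set_python_home` is a hypothetical helper, and the original file is kept as values.yaml.bak.

```shell
# set_python_home FILE PATH: point every PYTHON_HOME entry in FILE at PATH,
# preserving the line's indentation. Hypothetical helper.
set_python_home() {
  sed -i.bak "s|^\([[:space:]]*PYTHON_HOME:\).*|\1 \"$2\"|" "$1"
}

# Only touch values.yaml when it exists in the current directory.
if [ -f values.yaml ]; then
  set_python_home values.yaml /usr/bin/python3
  grep -n 'PYTHON_HOME' values.yaml
fi
```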

4. Deploy

$ helm repo add bitnami https://charts.bitnami.com/bitnami
$ helm dependency update .
$ helm install dolphinscheduler . --set image.tag=2.0.5
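
One detail to watch in the install command: values passed with `--set` take precedence over values.yaml, so `--set image.tag=2.0.5` selects the stock 2.0.5 tag even when values.yaml was changed to mysql-driver. The sketch below prints the install command with the tag made explicit so it can be reviewed before running. `RELEASE`, `IMAGE_TAG`, and `install_cmd` are hypothetical names, and the default tag assumes the custom image from the MySQL steps is the one wanted.

```shell
RELEASE="${RELEASE:-dolphinscheduler}"
# Assumption: the custom image is wanted; set IMAGE_TAG=2.0.5 for the stock image.
IMAGE_TAG="${IMAGE_TAG:-mysql-driver}"

# install_cmd: print the helm command instead of running it directly.
install_cmd() {
  printf 'helm install %s . --set image.tag=%s\n' "$RELEASE" "$IMAGE_TAG"
}

install_cmd

# Once the release is installed, these show its state:
#   helm status dolphinscheduler
#   kubectl get pods
```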

Related articles:
Official site | Quick trial: deploying DolphinScheduler on Kubernetes
