03-26 05:34 阅读 186

Rook 1.5.1 部署Ceph实操经验分享

一、Rook概述

1.1 Rook简介

Rook 是一个开源的cloud-native storage编排, 提供平台和框架；为各种存储解决方案提供平台、框架和支持，以便与云原生环境本地集成。目前主要专用于Cloud-Native环境的文件、块、对象存储服务。它实现了一个自我管理的、自我扩容的、自我修复的分布式存储服务。

Rook支持自动部署、启动、配置、分配（provisioning）、扩容/缩容、升级、迁移、灾难恢复、监控，以及资源管理。为了实现所有这些功能，Rook依赖底层的容器编排平台，例如 kubernetes、CoreOS 等。。

Rook 目前支持Ceph、NFS、Minio Object Store、Edegefs、Cassandra、CockroachDB 存储的搭建。

项目地址：https://github.com/rook/rook

网站：https://rook.io/

1.2 Rook组件

Rook的主要组件有三个，功能如下：

Rook Operator

Rook与Kubernetes交互的组件
整个Rook集群只有一个

Agent or Driver
已经被淘汰的驱动方式，在安装之前，请确保k8s集群版本是否支持CSI，如果不支持，或者不想用CSI，选择flex.
默认全部节点安装，你可以通过 node affinity 去指定节点

Ceph CSI Driver

Flex Driver

Device discovery
发现新设备是否作为存储，可以在配置文件ROOK_ENABLE_DISCOVERY_DAEMON设置 enable 开启。

1.3 Rook & Ceph框架

Rook 如何集成在kubernetes 如图：
Rook 1.5.1 部署Ceph实操经验分享

使用Rook部署Ceph集群的架构图如下：
Rook 1.5.1 部署Ceph实操经验分享

部署的Ceph系统可以提供下面三种Volume Claim服务：

Block Storage：目前最稳定；
FileSystem：需要部署MDS，有内核要求；
Object：部署RGW；

二、ROOK 部署

2.1 准备工作

2.1.1 版本要求

kubernetes v.11 以上

2.1.2 存储要求

rook部署的ceph 是不支持lvm direct直接作为osd存储设备的，如果想要使用lvm，可以使用pvc的形式实现。方法在后面的ceph安装会提到

为了配置 Ceph 存储集群，至少需要以下本地存储选项之一：

原始设备（无分区或格式化的文件系统）
原始分区（无格式文件系统）可以 lsblk -f查看，如果 FSTYPE不为空说明有文件系统
可通过 block 模式从存储类别获得 PV

2.1.3 系统要求

本次安装环境

kubernetes 1.18
centos7.8
kernel 5.4.65-200.el7.x86_64
calico 3.16

2.1.3.1 需要安装lvm包

sudo yum install -y lvm2

2.1.3.2 内核要求

RBD

一般发行版的内核都编译有，但你最好确定下：

foxchan@~$ lsmod|grep rbd
rbd 114688 0
libceph 368640 1 rbd

可以用以下命令放到开机启动项里

cat > /etc/sysconfig/modules/rbd.modules << EOF
modprobe rbd
EOF

CephFS

如果你想使用cephfs,内核最低要求是4.17。

2.2 部署ROOK

Github上下载Rook最新release

git clone --single-branch --branch v1.5.1 https://github.com/rook/rook.gits

安装公共部分

cd rook/cluster/examples/kubernetes/ceph
kubectl create -f crds.yaml -f common.yaml

安装operator

kubectl apply -f operator.yaml

如果放到生产环境，请提前规划好。operator的配置在ceph安装后不能修改，否则rook会删除集群并重建。

修改内容如下：

# 启用cephfs ROOK_CSI_ENABLE_CEPHFS: "true"# 开启内核驱动替换ceph-fuseCSI_FORCE_CEPHFS_KERNEL_CLIENT: "true"#修改csi镜像为私有仓，加速部署时间ROOK_CSI_CEPH_IMAGE: "harbor.foxchan.com/google_containers/cephcsi/cephcsi:v3.1.2"
ROOK_CSI_REGISTRAR_IMAGE: "harbor.foxchan.com/google_containers/k8scsi/csi-node-driver-registrar:v2.0.1"
ROOK_CSI_RESIZER_IMAGE: "harbor.foxchan.com/google_containers/k8scsi/csi-resizer:v1.0.0"
ROOK_CSI_PROVISIONER_IMAGE: "harbor.foxchan.com/google_containers/k8scsi/csi-provisioner:v2.0.0"
ROOK_CSI_SNAPSHOTTER_IMAGE: "harbor.foxchan.com/google_containers/k8scsi/csi-snapshotter:v3.0.0"
ROOK_CSI_ATTACHER_IMAGE: "harbor.foxchan.com/google_containers/k8scsi/csi-attacher:v3.0.0"# 可以设置NODE_AFFINITY 来指定csi 部署的节点# 我把plugin 和 provisioner分开了，具体调度方式看你集群资源。CSI_PROVISIONER_NODE_AFFINITY: "app.rook.role=csi-provisioner"
CSI_PLUGIN_NODE_AFFINITY: "app.rook.plugin=csi"#修改metrics端口，可以不改，我因为集群网络是host，为了避免端口冲突# Configure CSI CSI Ceph FS grpc and liveness metrics portCSI_CEPHFS_GRPC_METRICS_PORT: "9491"
CSI_CEPHFS_LIVENESS_METRICS_PORT: "9481"# Configure CSI RBD grpc and liveness metrics portCSI_RBD_GRPC_METRICS_PORT: "9490"
CSI_RBD_LIVENESS_METRICS_PORT: "9480"# 修改rook镜像，加速部署时间image: harbor.foxchan.com/google_containers/rook/ceph:v1.5.1# 指定节点做存储
        - name: DISCOVER_AGENT_NODE_AFFINITY
          value: "app.rook=storage"# 开启设备自动发现
        - name: ROOK_ENABLE_DISCOVERY_DAEMON
          value: "true"

2.3 部署ceph集群

cluster.yaml文件里的内容需要修改，一定要适配自己的硬件情况，请详细阅读配置文件里的注释，避免我踩过的坑。

修改内容如下：

此文件的配置，除了增删osd设备外，其他的修改都要重装ceph集群才能生效，所以请提前规划好集群。如果修改后不卸载ceph直接apply，会触发ceph集群重装，导致集群异常挂掉

apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
# 命名空间的名字，同一个命名空间只支持一个集群
  name: rook-ceph
  namespace: rook-ceph
spec:
# ceph版本说明
# v13 is mimic, v14 is nautilus, and v15 is octopus.
  cephVersion:
#修改ceph镜像，加速部署时间
    image: harbor.foxchan.com/google_containers/ceph/ceph:v15.2.5
# 是否允许不支持的ceph版本
    allowUnsupported: false
#指定rook数据在节点的保存路径
  dataDirHostPath: /data/rook
# 升级时如果检查失败是否继续
  skipUpgradeChecks: false
# 从1.5开始，mon的数量必须是奇数
  mon:
    count: 3
# 是否允许在单个节点上部署多个mon pod
    allowMultiplePerNode: false
  mgr:
    modules:
    - name: pg_autoscaler
      enabled: true
# 开启dashboard，禁用ssl，指定端口是7000，你可以默认https配置。我是为了ingress配置省事。
  dashboard:
    enabled: true
    port: 7000
    ssl: false
# 开启prometheusRule
  monitoring:
    enabled: true
# 部署PrometheusRule的命名空间，默认此CR所在命名空间
    rulesNamespace: rook-ceph
# 开启网络为host模式，解决无法使用cephfs pvc的bug
  network:
    provider: host
# 开启crash collector，每个运行了Ceph守护进程的节点上创建crash collector pod
  crashCollector:
    disable: false
# 设置node亲缘性，指定节点安装对应组件
  placement:
    mon:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: ceph-mon
              operator: In
              values:
              - enabled

    osd:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: ceph-osd
              operator: In
              values:
              - enabled

    mgr:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: ceph-mgr
              operator: In
              values:
              - enabled
# 存储的设置，默认都是true，意思是会把集群所有node的设备清空初始化。
  storage: # cluster level storage configuration and selection
    useAllNodes: false     #关闭使用所有Node
    useAllDevices: false   #关闭使用所有设备
    nodes:
    - name: "192.168.1.162"  #指定存储节点主机
      devices:
      - name: "nvme0n1p1"    #指定磁盘为nvme0n1p1
    - name: "192.168.1.163"
      devices:
      - name: "nvme0n1p1"
    - name: "192.168.1.164"
      devices:
      - name: "nvme0n1p1"
    - name: "192.168.1.213"
      devices:
      - name: "nvme0n1p1"

更多 cluster 的 CRD 配置参考：

https://github.com/rook/rook/blob/master/Documentation/ceph-cluster-crd.md

执行安装

kubectl apply -f cluster.yaml# 需要等一段时间，所有pod都已正常启动[foxchan@k8s-master ceph]$ kubectl get pods -n rook-ceph
NAME                                                      READY   STATUS      RESTARTS   AGE
csi-cephfsplugin-b5tlr                                    3/3     Running     0          19h
csi-cephfsplugin-mjssm                                    3/3     Running     0          19h
csi-cephfsplugin-provisioner-5cf5ffdc76-mhdgz             6/6     Running     0          19h
csi-cephfsplugin-provisioner-5cf5ffdc76-rpdl8             6/6     Running     0          19h
csi-cephfsplugin-qmvkc                                    3/3     Running     0          19h
csi-cephfsplugin-tntzd                                    3/3     Running     0          19h
csi-rbdplugin-4p75p                                       3/3     Running     0          19h
csi-rbdplugin-89mzz                                       3/3     Running     0          19h
csi-rbdplugin-cjcwr                                       3/3     Running     0          19h
csi-rbdplugin-ndjcj                                       3/3     Running     0          19h
csi-rbdplugin-provisioner-658dd9fbc5-fwkmc                6/6     Running     0          19h
csi-rbdplugin-provisioner-658dd9fbc5-tlxd8                6/6     Running     0          19h
prometheus-rook-prometheus-0                              2/2     Running     1          3d17h
rook-ceph-mds-myfs-a-5cbcdc6f9c-7mdsv                     1/1     Running     0          19h
rook-ceph-mds-myfs-b-5f4cc54b87-m6m6f                     1/1     Running     0          19h
rook-ceph-mgr-a-f98d4455b-bwhw7                           1/1     Running     0          20h
rook-ceph-mon-a-5d445d4b8d-lmg67                          1/1     Running     1          20h
rook-ceph-mon-b-769c6fd76f-jrlc8                          1/1     Running     0          20h
rook-ceph-mon-c-6bfd8954f5-tbsnd                          1/1     Running     0          20h
rook-ceph-operator-7d8cc65dc-8wtl8                        1/1     Running     0          20h
rook-ceph-osd-0-c558ff759-bzbgw                           1/1     Running     0          20h
rook-ceph-osd-1-5c97d69d78-dkxbb                          1/1     Running     0          20h
rook-ceph-osd-2-7dddc7fd56-p58mw                          1/1     Running     0          20h
rook-ceph-osd-3-65ff985c7d-9gfgj                          1/1     Running     0          20h
rook-ceph-osd-prepare-192.168.1.213-pw5gr                 0/1     Completed   0          19h
rook-ceph-osd-prepare-192.168.1.162-wtkm8                 0/1     Completed   0          19h
rook-ceph-osd-prepare-192.168.1.163-b86r2                 0/1     Completed   0          19h
rook-ceph-osd-prepare-192.168.1.164-tj79t                 0/1     Completed   0          19h
rook-discover-89v49                                       1/1     Running     0          20h
rook-discover-jdzhn                                       1/1     Running     0          20h
rook-discover-sl9bv                                       1/1     Running     0          20h
rook-discover-wg25w                                       1/1     Running     0          20h

2.4 增删osd

2.4.1 添加相关label

kubectl label nodes 192.168.1.165 app.rook=storage
kubectl label nodes 192.168.1.165 ceph-osd=enabled

2.4.2 修改cluster.yaml

    nodes:
    - name: "192.168.1.162"
      devices:
      - name: "nvme0n1p1"
    - name: "192.168.1.163"
      devices:
      - name: "nvme0n1p1"
    - name: "192.168.1.164"
      devices:
      - name: "nvme0n1p1"
    - name: "192.168.17.213"
      devices:
      - name: "nvme0n1p1"
  #添加165的磁盘信息
    - name: "192.168.1.165"
      devices:
      - name: "nvme0n1p1"

2.4.3 apply cluster.yaml

kubectl apply -f cluster.yaml

2.4.4 删除osd

cluster.yaml去掉相关节点，再apply

2.5 安装dashboard

这是我自己的traefik ingress，yaml目录里有很多dashboard暴露方式，自行选择

dashboard已经在前述的步骤中包含了，这里只需要把dashboard service的服务暴露出来。有多种方法，我使用的是ingress的方式来暴露：

apiVersion: traefik.containo.us/v1alpha1
kind: Ingre***oute
metadata:
  name: traefik-ceph-dashboard
  annotations:
    kubernetes.io/ingress.class: traefik-v2.3
spec:
  entryPoints:
    - web
  routes:
  - match: Host(`ceph.foxchan.com`)
    kind: Rule
    services:
    - name: rook-ceph-mgr-dashboard
      namespace: rook-ceph
      port: 7000
    middlewares:
      - name: gs-ipwhitelist

要检索生成的密码，可以运行以下命令：

kubectl -n rook-ceph get secret rook-ceph-dashboard-password -o jsonpath="{['data']['password']}" | base64 --decode && echo

2.6 安装toolbox

执行下面的命令：

kubectl apply -f toolbox.yaml

成功后，可以使用下面的命令来确定toolbox的pod已经启动成功：

kubectl -n rook-ceph get pod -l "app=rook-ceph-tools"

然后可以使用下面的命令登录该pod，执行各种ceph命令：

kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash

比如：

ceph status
ceph osd status
ceph df
rados df

删除toolbox

kubectl -n rook-ceph delete deploy/rook-ceph-tools

2.7 prometheus监控

监控部署很简单，利用Prometheus Operator，独立部署一套prometheus

安装prometheus operator

kubectl apply -f https://raw.githubusercontent.com/coreos/prometheus-operator/v0.40.0/bundle.yaml

安装prometheus

git clone --single-branch --branch v1.5.1 https://github.com/rook/rook.git
cd rook/cluster/examples/kubernetes/ceph/monitoring
kubectl create -f service-monitor.yaml
kubectl create -f prometheus.yaml
kubectl create -f prometheus-service.yaml

默认是nodeport方式暴露

echo "http://$(kubectl -n rook-ceph -o jsonpath={.status.hostIP} get pod prometheus-rook-prometheus-0):30900"

开启Prometheus Alerts

此操作必须在ceph集群安装之前

安装rbac

kubectl create -f cluster/examples/kubernetes/ceph/monitoring/rbac.yaml

确保cluster.yaml 开启

apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
[...]
spec:
[...]
  monitoring:
    enabled: true
    rulesNamespace: "rook-ceph"
[...]

Grafana Dashboards

Grafana 版本大于等于 7.2.0

推荐一下dashboard

2.8 删除ceph集群

删除ceph集群前，请先清理相关pod

删除块存储和文件存储

kubectl delete -n rook-ceph cephblockpool replicapool
kubectl delete storageclass rook-ceph-block
kubectl delete -f csi/cephfs/filesystem.yaml
kubectl delete storageclass csi-cephfs rook-ceph-block

删除operator和相关crd

kubectl delete -f operator.yaml
kubectl delete -f common.yaml
kubectl delete -f crds.yaml

清除主机上的数据

删除Ceph集群后，在之前部署Ceph组件节点的/data/rook/目录，会遗留下Ceph集群的配置信息。

若之后再部署新的Ceph集群，先把之前Ceph集群的这些信息删除，不然启动monitor会失败；

# cat clean-rook-dir.shhosts=(
  192.168.1.213
  192.168.1.162
  192.168.1.163
  192.168.1.164
)

for host in ${hosts[@]} ; do
ssh $host "rm -rf /data/rook/*"
done

清除device

#!/usr/bin/env bashDISK="/dev/nvme0n1p1"# Zap the disk to a fresh, usable state (zap-all is important, b/c MBR has to be clean)# You will have to run this step for all disks.sgdisk --zap-all $DISK# hdd 用以下命令dd if=/dev/zero of="$DISK" bs=1M count=100 oflag=direct,dsync# ssd 用以下命令blkdiscard $DISK# These steps only have to be run once on each node# If rook sets up osds using ceph-volume, teardown leaves some devices mapped that lock the disks.ls /dev/mapper/ceph-* | xargs -I% -- dmsetup remove %# ceph-volume setup can leave ceph-<UUID> directories in /dev (unnecessary clutter)rm -rf /dev/ceph-*

如果因为某些原因导致删除ceph集群卡主，可以先执行以下命令，再删除ceph集群就不会卡主了

kubectl -n rook-ceph patch cephclusters.ceph.rook.io rook-ceph -p '{"metadata":{"finalizers": []}}' --type=merge

2.9 rook升级

2.9.1 小版本升级

Rook v1.5.0 to Rook v1.5.1

git clone --single-branch --branch v1.5.1 https://github.com/rook/rook.gits
cd $YOUR_ROOK_REPO/cluster/examples/kubernetes/ceph/
kubectl apply -f common.yaml -f crds.yaml
kubectl -n rook-ceph set image deploy/rook-ceph-operator rook-ceph-operator=rook/ceph:v1.5.1

2.9.2 跨版本升级

Rook v1.4.x to Rook v1.5.x.

准备

设置环境变量

# Parameterize the environmentexport ROOK_SYSTEM_NAMESPACE="rook-ceph"
export ROOK_NAMESPACE="rook-ceph"

升级之前需要保证集群健康

所有pod 是running

kubectl -n $ROOK_NAMESPACE get pods

通过tool 查看ceph集群状态是否正常

TOOLS_POD=$(kubectl -n $ROOK_NAMESPACE get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}')
kubectl -n $ROOK_NAMESPACE exec -it $TOOLS_POD -- ceph status

  cluster:
    id:     194d139f-17e7-4e9c-889d-2426a844c91b
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c (age 25h)
    mgr: a(active, since 5h)
    mds: myfs:1 {0=myfs-b=up:active} 1 up:standby-replay
    osd: 4 osds: 4 up (since 25h), 4 in (since 25h)

  task status:
    scrub status:
        mds.myfs-a: idle
        mds.myfs-b: idle

  data:
    pools:   4 pools, 97 pgs
    objects: 2.08k objects, 7.6 GiB
    usage:   26 GiB used, 3.3 TiB / 3.3 TiB avail
    pgs:     97 active+clean

io:
client: 1.2 KiB/s rd, 2 op/s rd, 0 op/s wr

升级operator

1、升级common和crd

git clone --single-branch --branch v1.5.1 https://github.com/rook/rook.gits
cd rook/cluster/examples/kubernetes/ceph
kubectl apply -f common.yaml -f crds.yaml

2、升级 Ceph CSI versions

可以修改cm来自己制定镜像版本，如果是默认的配置，无需修改

kubectl -n rook-ceph get configmap rook-ceph-operator-config

ROOK_CSI_CEPH_IMAGE: "harbor.foxchan.com/google_containers/cephcsi/cephcsi:v3.1.1"
ROOK_CSI_REGISTRAR_IMAGE: "harbor.foxchan.com/google_containers/k8scsi/csi-node-driver-registrar:v2.0.1"
ROOK_CSI_PROVISIONER_IMAGE: "harbor.foxchan.com/google_containers/k8scsi/csi-provisioner:v2.0.0"
ROOK_CSI_SNAPSHOTTER_IMAGE: "harbor.foxchan.com/google_containers/k8scsi/csi-snapshotter:v3.0.0"
ROOK_CSI_ATTACHER_IMAGE: "harbor.foxchan.com/google_containers/k8scsi/csi-attacher:v3.0.0"
ROOK_CSI_RESIZER_IMAGE: "harbor.foxchan.com/google_containers/k8scsi/csi-resizer:v1.0.0"

3、升级 Rook Operator

kubectl -n $ROOK_SYSTEM_NAMESPACE set image deploy/rook-ceph-operator rook-ceph-operator=rook/ceph:v1.5.1

4、等待集群升级完毕

watch --exec kubectl -n $ROOK_NAMESPACE get deployments -l rook_cluster=$ROOK_NAMESPACE -o jsonpath='{range .items[*]}{.metadata.name}{" \treq/upd/avl: "}{.spec.replicas}{"/"}{.status.updatedReplicas}{"/"}{.status.readyReplicas}{" \trook-version="}{.metadata.labels.rook-version}{"\n"}{end}'

5、验证集群升级完毕

kubectl -n $ROOK_NAMESPACE get deployment -l rook_cluster=$ROOK_NAMESPACE -o jsonpath='{range .items[*]}{"rook-version="}{.metadata.labels.rook-version}{"\n"}{end}' | sort | uniq

升级ceph 版本

如果集群状态不监控，operator会拒绝升级

1、升级ceph镜像

NEW_CEPH_IMAGE='ceph/ceph:v15.2.5'
CLUSTER_NAME=rook-ceph
kubectl -n rook-ceph patch CephCluster rook-ceph --type=merge -p "{\"spec\": {\"cephVersion\": {\"image\": \"$NEW_CEPH_IMAGE\"}}}"

2、观察pod 升级

watch --exec kubectl -n $ROOK_NAMESPACE get deployments -l rook_cluster=$ROOK_NAMESPACE -o jsonpath='{range .items[*]}{.metadata.name}{" \treq/upd/avl: "}{.spec.replicas}{"/"}{.status.updatedReplicas}{"/"}{.status.readyReplicas}{" \tceph-version="}{.metadata.labels.ceph-version}{"\n"}{end}'

3、查看ceph集群是否正常

kubectl -n $ROOK_NAMESPACE get deployment -l rook_cluster=$ROOK_NAMESPACE -o jsonpath='{range .items[*]}{"ceph-version="}{.metadata.labels.ceph-version}{"\n"}{end}' | sort | uniq

三、部署块存储

3.1 创建pool和StorageClass

# 定义一个块存储池
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool
  namespace: rook-ceph
spec:
  # 每个数据副本必须跨越不同的故障域分布，如果设置为host，则保证每个副本在不同机器上
  failureDomain: host
  # 副本数量
  replicated:
    size: 3
    # Disallow setting pool with replica 1, this could lead to data loss without recovery.
    # Make sure you're *ABSOLUTELY CERTAIN* that is what you want
    requireSafeReplicaSize: true
    # gives a hint (%) to Ceph in terms of expected consumption of the total cluster capacity of a given pool
    # for more info: https://docs.ceph.com/docs/master/rados/operations/placement-groups/#specifying-expected-pool-size
    #targetSizeRatio: .5
---
# 定义一个StorageClass
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
   name: rook-ceph-block
# 该SC的Provisioner标识，rook-ceph前缀即当前命名空间
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
    # clusterID 就是集群所在的命名空间名
    # If you change this namespace, also change the namespace below where the secret namespaces are defined
    clusterID: rook-ceph

    # If you want to use erasure coded pool with RBD, you need to create
    # two pools. one erasure coded and one replicated.
    # You need to specify the replicated pool here in the `pool` parameter, it is
    # used for the metadata of the images.
    # The erasure coded pool must be set as the `dataPool` parameter below.
    #dataPool: ec-data-pool
    # RBD镜像在哪个池中创建
    pool: replicapool

# RBD image format. Defaults to "2".
imageFormat: "2"

# 指定image特性，CSI RBD目前仅仅支持layering
imageFeatures: layering

    # Ceph admin 管理凭证配置,由operator 自动生成
    # in the same namespace as the cluster.
    csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
    csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
    csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
    csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
    csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
    csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
    # 卷的文件系统类型，默认ext4，不建议xfs，因为存在潜在的死锁问题（超融合设置下卷被挂载到相同节点作为OSD时）
    csi.storage.k8s.io/fstype: ext4
# uncomment the following to use rbd-nbd as mounter on supported nodes
# **IMPORTANT**: If you are using rbd-nbd as the mounter, during upgrade you will be hit a ceph-csi
# issue that causes the mount to be disconnected. You will need to follow special upgrade steps
# to restart your application pods. Therefore, this option is not recommended.
#mounter: rbd-nbd
allowVolumeExpansion: true
reclaimPolicy: Delete

3.2 demo示例

推荐pvc 和应用写到一个yaml里面

#创建pvc
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rbd-demo-pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: rook-ceph-block
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: csirbd-demo-pod
  labels:
    test-cephrbd: "true"
spec:
  replicas: 1
  selector:
    matchLabels:
      test-cephrbd: "true"
  template:
    metadata:
      labels:
        test-cephrbd: "true"
    spec:
      containers:
       - name: web-server-rbd
         image: harbor.foxchan.com/sys/nginx:1.19.4-alpine
         volumeMounts:
           - name: mypvc
             mountPath: /usr/share/nginx/html
      volumes:
       - name: mypvc
         persistentVolumeClaim:
           claimName: rbd-demo-pvc
           readOnly: false

四、部署文件系统

4.1 创建CephFS

CephFS的CSI驱动使用Quotas来强制应用PVC声明的大小，仅仅4.17+内核才能支持CephFS quotas。
如果内核不支持，而且你需要配额管理，配置Operator环境变量 CSI_FORCE_CEPHFS_KERNEL_CLIENT: false来启用FUSE客户端。
使用FUSE客户端时，升级Ceph集群时应用Pod会断开mount，需要重启才能再次使用PV。

apiVersion: ceph.rook.io/v1
kind: CephFilesystem
metadata:
  name: myfs
  namespace: rook-ceph
spec:
  # The metadata pool spec. Must use replication.
  metadataPool:
    replicated:
      size: 3
      requireSafeReplicaSize: true
    parameters:
      # Inline compression mode for the data pool
      # Further reference: https://docs.ceph.com/docs/nautilus/rados/configuration/bluestore-config-ref/#inline-compression
      compression_mode: none
        # gives a hint (%) to Ceph in terms of expected consumption of the total cluster capacity of a given pool
      # for more info: https://docs.ceph.com/docs/master/rados/operations/placement-groups/#specifying-expected-pool-size
      #target_size_ratio: ".5"
  # The list of data pool specs. Can use replication or erasure coding.
  dataPools:
    - failureDomain: host
      replicated:
        size: 3
        # Disallow setting pool with replica 1, this could lead to data loss without recovery.
        # Make sure you're *ABSOLUTELY CERTAIN* that is what you want
        requireSafeReplicaSize: true
      parameters:
        # Inline compression mode for the data pool
        # Further reference: https://docs.ceph.com/docs/nautilus/rados/configuration/bluestore-config-ref/#inline-compression
        compression_mode: none
          # gives a hint (%) to Ceph in terms of expected consumption of the total cluster capacity of a given pool
        # for more info: https://docs.ceph.com/docs/master/rados/operations/placement-groups/#specifying-expected-pool-size
        #target_size_ratio: ".5"
  # Whether to preserve filesystem after CephFilesystem CRD deletion
  preserveFilesystemOnDelete: true
  # The metadata service (mds) configuration
  metadataServer:
    # The number of active MDS instances
    activeCount: 1
    # Whether each active MDS instance will have an active standby with a warm metadata cache for faster failover.
    # If false, standbys will be available, but will not have a warm cache.
    activeStandby: true
    # The affinity rules to apply to the mds deployment
    placement:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: app.storage
              operator: In
              values:
              - rook-ceph
    #  topologySpreadConstraints:
    #  tolerations:
    #  - key: mds-node
    #    operator: Exists
    #  podAffinity:
      podAntiAffinity:
         requiredDuringSchedulingIgnoredDuringExecution:
         - labelSelector:
             matchExpressions:
             - key: ceph-mds
               operator: In
               values:
               - enabled
            # topologyKey: kubernetes.io/hostname will place MDS across different hosts
           topologyKey: kubernetes.io/hostname
         preferredDuringSchedulingIgnoredDuringExecution:
         - weight: 100
           podAffinityTerm:
             labelSelector:
               matchExpressions:
               - key: ceph-mds
                 operator: In
                 values:
                  - enabled
              # topologyKey: */zone can be used to spread MDS across different AZ
              # Use <topologyKey: failure-domain.beta.kubernetes.io/zone> in k8s cluster if your cluster is v1.16 or lower
              # Use <topologyKey: topology.kubernetes.io/zone>  in k8s cluster is v1.17 or upper
             topologyKey: topology.kubernetes.io/zone
    # A key/value list of annotations
    annotations:
    #  key: value
    # A key/value list of labels
    labels:
    #  key: value
    resources:
    # The requests and limits set here, allow the filesystem MDS Pod(s) to use half of one CPU core and 1 gigabyte of memory
    #  limits:
    #    cpu: "500m"
    #    memory: "1024Mi"
    #  requests:
    #    cpu: "500m"
    #    memory: "1024Mi"
    # priorityClassName: my-priority-class

4.2 创建StorageClass

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-cephfs
provisioner: rook-ceph.cephfs.csi.ceph.com
parameters:
  # clusterID is the namespace where operator is deployed.
  clusterID: rook-ceph

# CephFS filesystem name into which the volume shall be created
fsName: myfs

  # Ceph pool into which the volume shall be created
  # Required for provisionVolume: "true"
  pool: myfs-data0

  # Root path of an existing CephFS volume
  # Required for provisionVolume: "false"
  # rootPath: /absolute/path

  # The secrets contain Ceph admin credentials. These are generated automatically by the operator
  # in the same namespace as the cluster.
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-cephfs-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph

  # (optional) The driver can use either ceph-fuse (fuse) or ceph kernel client (kernel)
  # If omitted, default volume mounter will be used - this is determined by probing for ceph-fuse
  # or by setting the default mounter explicitly via --volumemounter command-line argument.
  #使用kernel client
  mounter: kernel
reclaimPolicy: Delete
allowVolumeExpansion: true
mountOptions:
  # uncomment the following line for debugging
  #- debug

4.3 创建pvc

在创建cephfs 的pvc 发现一直处于pending状态，社区有人认为是网络组件的差异，目前我的calico无法成功，只能改为host模式，flannel可以。

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cephfs-pvc
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  storageClassName: rook-cephfs

4.4 demo示例

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cephfs-demo-pvc
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  storageClassName: rook-cephfs
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: csicephfs-demo-pod
  labels:
    test-cephfs: "true"
spec:
  replicas: 2
  selector:
    matchLabels:
      test-cephfs: "true"
  template:
    metadata:
      labels:
        test-cephfs: "true"
    spec:
      containers:
      - name: web-server
        image: harbor.foxchan.com/sys/nginx:1.19.4-alpine
        imagePullPolicy: Always
        volumeMounts:
        - name: mypvc
          mountPath: /usr/share/nginx/html
      volumes:
      - name: mypvc
        persistentVolumeClaim:
          claimName: cephfs-demo-pvc
          readOnly: false

五、遇到问题

5.1 lvm direct 不能直接做osd 存储

官方issue

https://github.com/rook/rook/issues/5751

https://github.com/rook/rook/issues/2047

解决方式：

可以手动创建本地pvc，把lvm挂载上在做osd设备。如果手动闲麻烦，可以使用local-path-provisioner

5.2 Cephfs pvc pending

官方issue

https://github.com/rook/rook/issues/6183

https://github.com/rook/rook/issues/4006

解决方式：

更换k8s 网络组件，或者把ceph集群网络开启host