This document describes the structure of our ceph cluster as of May 12, 2021.
Ceph¶
The ceph cluster used for our OpenStack, OpenShift and baremetal environments is made up of 10 OSD servers and 3 monitors.
Each OSD server has 10 HDDs and 1 NVMe drive. The HDDs make up the “size” root, while the NVMe drives make up the “performance” root.
All pools are triple replicated except the RGW storage pools.
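You can verify the replication factor of any pool from a monitor. For example (the rbd pool shown here is just one of the pools listed later on this page):
[root@kzn-mon1 ~]# ceph osd pool get rbd size
size: 3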
How do I log in?¶
- Log in to the monitors over the foreman network (172.16.0.0/19).
This works if you have access to the MOC VPN or a gateway like kzn-ipmi-gw.infra.massopen.cloud.
Host | IP Address |
---|---|
kzn-mon1 | 172.16.19.15 |
kzn-mon2 | 172.16.17.14 |
kzn-mon3 | 172.16.5.14 |
From the monitors you can run the various ceph commands. Check /etc/hosts for the addresses of the OSD servers and RGWs if you need to SSH to them.
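A minimal example of reaching kzn-mon1 from outside the foreman network via the gateway (the jump-host username is a placeholder; use your own account):
$ ssh -J <user>@kzn-ipmi-gw.infra.massopen.cloud root@172.16.19.15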
- Log in to the public RGWs that we use for OpenStack Swift.
Host | IP Address |
---|---|
kzn-rgw1.infra.massopen.cloud | 128.31.24.16 |
kzn-rgw2.infra.massopen.cloud | 128.31.24.17 |
kzn-swift.infra.massopen.cloud | 128.31.24.18 |
The swift endpoint will take you to one of the gateways. From there, check the hosts file and you can get to the monitors.
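For example (which gateway answers may vary, and the exact contents of the hosts file will differ, but the monitor addresses match the table above):
$ ssh root@kzn-swift.infra.massopen.cloud
[root@kzn-rgw1 ~]# grep kzn-mon /etc/hosts
172.16.19.15 kzn-mon1
172.16.17.14 kzn-mon2
172.16.5.14  kzn-mon3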
Server Configuration¶
All our OSD servers are Dell PowerEdge R730xd. Here’s the configuration of an OSD server.
Field | Value |
---|---|
CPU(s) | 40 |
Thread(s) per core | 2 |
Core(s) per socket | 10 |
Socket(s) | 2 |
Model Name | Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz |
RAM | 128 GB DDR3 |
Model | PowerEdge R730xd |
Networking | Dual 10 GbE NICs |
Storage | 10 x 10 TB 7.2k rpm drives for "size" root |
Storage | 1 x 8 TB PCIe NVMe drive for "performance" root |
Storage | 2 x 128 GB SSDs for OS |
We have 3 Dell PowerEdge R330 servers. Each acts as a monitor (mon), manager (mgr), and metadata (mds) server. Here’s the configuration of those servers:
Field | Value |
---|---|
CPU(s) | 4 |
Thread(s) per core | 1 |
Core(s) per socket | 4 |
Socket(s) | 1 |
Model Name | Intel(R) Xeon(R) CPU E3-1220 v5 @ 3.00GHz |
RAM | 64 GB DDR3 |
Model | PowerEdge R330 |
Networking | Dual 10 GbE NICs |
Storage | 2 x 128 GB SSDs for OS |
The Rados Gateways are virtual machines on our RHEV/oVirt cluster.
The test iSCSI server is a VM on the BMI Intel node.
Here’s the output of ceph status, which shows the various ceph services.
[root@kzn-mon1 ~]# ceph -s
  cluster:
    id:     ed141a90-e501-481c-90c2-b9e24a7c9b54
    health: HEALTH_OK

  services:
    mon:         3 daemons, quorum kzn-mon1,kzn-mon2,kzn-mon3 (age 6d)
    mgr:         kzn-mon1(active, since 6d), standbys: kzn-mon2, kzn-mon3
    mds:         cephfs:1 {0=kzn-mon1=up:active} 2 up:standby
    osd:         130 osds: 129 up (since 5d), 129 in (since 8d)
    rgw:         4 daemons active (kzn-rgw-j-01, kzn-rgw-j-02, kzn-rgw1, kzn-rgw2)
    tcmu-runner: 1 daemon active (kzn-vbmi01stack.kzn.moc:m2/ceph-iscsi-1)

  task status:
    scrub status:
        mds.kzn-mon1: idle

  data:
    pools:   22 pools, 2000 pgs
    objects: 50.08M objects, 76 TiB
    usage:   223 TiB used, 910 TiB / 1.1 PiB avail
    pgs:     1995 active+clean
             5    active+clean+scrubbing+deep

  io:
    client:   43 MiB/s rd, 417 MiB/s wr, 781 op/s rd, 3.12k op/s wr
OSD tree¶
Here’s the output of ceph osd tree.
[root@kzn-mon1 ~]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1002 72.99988 root performance
-35 7.29999 host kzn-osd05-nvme
272 ssd 7.29999 osd.272 up 1.00000 1.00000
-37 7.29999 host kzn-osd06-nvme
273 ssd 7.29999 osd.273 up 1.00000 1.00000
-33 7.29999 host kzn-osd07-nvme
274 ssd 7.29999 osd.274 up 1.00000 1.00000
-39 7.29999 host kzn-osd08-nvme
275 ssd 7.29999 osd.275 up 1.00000 1.00000
-41 7.29999 host kzn-osd09-nvme
276 ssd 7.29999 osd.276 up 1.00000 1.00000
-43 7.29999 host kzn-osd10-nvme
277 ssd 7.29999 osd.277 up 1.00000 1.00000
-45 7.29999 host kzn-osd11-nvme
278 ssd 7.29999 osd.278 up 1.00000 1.00000
-47 7.29999 host kzn-osd12-nvme
279 ssd 7.29999 osd.279 up 1.00000 1.00000
-49 7.29999 host kzn-osd13-nvme
280 ssd 7.29999 osd.280 up 1.00000 1.00000
-51 7.29999 host kzn-osd14-nvme
281 ssd 7.29999 osd.281 up 1.00000 1.00000
-1001 1069.16003 root size
-6 106.91600 host kzn-osd05
152 hdd 8.90999 osd.152 up 1.00000 1.00000
153 hdd 8.90999 osd.153 up 1.00000 1.00000
154 hdd 8.90999 osd.154 up 1.00000 1.00000
155 hdd 8.90999 osd.155 up 1.00000 1.00000
156 hdd 8.90999 osd.156 up 1.00000 1.00000
157 hdd 8.90999 osd.157 up 1.00000 1.00000
158 hdd 8.90999 osd.158 up 1.00000 1.00000
159 hdd 8.90999 osd.159 up 1.00000 1.00000
160 hdd 8.90999 osd.160 up 1.00000 1.00000
161 hdd 8.90999 osd.161 up 1.00000 1.00000
162 hdd 8.90999 osd.162 up 1.00000 1.00000
163 hdd 8.90999 osd.163 up 1.00000 1.00000
-7 106.91600 host kzn-osd06
164 hdd 8.90999 osd.164 up 1.00000 1.00000
165 hdd 8.90999 osd.165 up 1.00000 1.00000
166 hdd 8.90999 osd.166 up 1.00000 1.00000
167 hdd 8.90999 osd.167 up 1.00000 1.00000
168 hdd 8.90999 osd.168 up 1.00000 1.00000
169 hdd 8.90999 osd.169 up 1.00000 1.00000
170 hdd 8.90999 osd.170 up 1.00000 1.00000
171 hdd 8.90999 osd.171 up 1.00000 1.00000
172 hdd 8.90999 osd.172 up 1.00000 1.00000
173 hdd 8.90999 osd.173 up 1.00000 1.00000
174 hdd 8.90999 osd.174 up 1.00000 1.00000
175 hdd 8.90999 osd.175 up 1.00000 1.00000
-8 106.91600 host kzn-osd07
176 hdd 8.90999 osd.176 up 1.00000 1.00000
177 hdd 8.90999 osd.177 up 1.00000 1.00000
178 hdd 8.90999 osd.178 up 1.00000 1.00000
179 hdd 8.90999 osd.179 up 1.00000 1.00000
180 hdd 8.90999 osd.180 up 1.00000 1.00000
181 hdd 8.90999 osd.181 up 1.00000 1.00000
182 hdd 8.90999 osd.182 up 1.00000 1.00000
183 hdd 8.90999 osd.183 up 1.00000 1.00000
184 hdd 8.90999 osd.184 up 1.00000 1.00000
185 hdd 8.90999 osd.185 up 1.00000 1.00000
186 hdd 8.90999 osd.186 up 1.00000 1.00000
187 hdd 8.90999 osd.187 up 1.00000 1.00000
-9 106.91600 host kzn-osd08
188 hdd 8.90999 osd.188 up 1.00000 1.00000
189 hdd 8.90999 osd.189 up 1.00000 1.00000
190 hdd 8.90999 osd.190 up 1.00000 1.00000
191 hdd 8.90999 osd.191 up 1.00000 1.00000
192 hdd 8.90999 osd.192 up 1.00000 1.00000
193 hdd 8.90999 osd.193 up 1.00000 1.00000
194 hdd 8.90999 osd.194 up 1.00000 1.00000
195 hdd 8.90999 osd.195 up 1.00000 1.00000
196 hdd 8.90999 osd.196 up 1.00000 1.00000
197 hdd 8.90999 osd.197 up 1.00000 1.00000
198 hdd 8.90999 osd.198 up 1.00000 1.00000
199 hdd 8.90999 osd.199 up 1.00000 1.00000
-21 106.91600 host kzn-osd09
200 hdd 8.90999 osd.200 up 1.00000 1.00000
201 hdd 8.90999 osd.201 up 1.00000 1.00000
202 hdd 8.90999 osd.202 up 1.00000 1.00000
203 hdd 8.90999 osd.203 up 1.00000 1.00000
204 hdd 8.90999 osd.204 up 1.00000 1.00000
205 hdd 8.90999 osd.205 up 1.00000 1.00000
206 hdd 8.90999 osd.206 up 1.00000 1.00000
207 hdd 8.90999 osd.207 up 1.00000 1.00000
208 hdd 8.90999 osd.208 up 1.00000 1.00000
209 hdd 8.90999 osd.209 up 1.00000 1.00000
210 hdd 8.90999 osd.210 up 1.00000 1.00000
211 hdd 8.90999 osd.211 up 1.00000 1.00000
-23 106.91600 host kzn-osd10
212 hdd 8.90999 osd.212 up 1.00000 1.00000
213 hdd 8.90999 osd.213 up 1.00000 1.00000
214 hdd 8.90999 osd.214 up 1.00000 1.00000
215 hdd 8.90999 osd.215 up 1.00000 1.00000
216 hdd 8.90999 osd.216 up 1.00000 1.00000
217 hdd 8.90999 osd.217 up 1.00000 1.00000
218 hdd 8.90999 osd.218 up 1.00000 1.00000
219 hdd 8.90999 osd.219 up 1.00000 1.00000
220 hdd 8.90999 osd.220 up 1.00000 1.00000
221 hdd 8.90999 osd.221 up 1.00000 1.00000
222 hdd 8.90999 osd.222 up 1.00000 1.00000
223 hdd 8.90999 osd.223 up 1.00000 1.00000
-25 106.91600 host kzn-osd11
224 hdd 8.90999 osd.224 up 1.00000 1.00000
225 hdd 8.90999 osd.225 up 1.00000 1.00000
226 hdd 8.90999 osd.226 up 1.00000 1.00000
227 hdd 8.90999 osd.227 up 1.00000 1.00000
228 hdd 8.90999 osd.228 up 1.00000 1.00000
229 hdd 8.90999 osd.229 up 1.00000 1.00000
230 hdd 8.90999 osd.230 up 1.00000 1.00000
231 hdd 8.90999 osd.231 up 1.00000 1.00000
232 hdd 8.90999 osd.232 up 1.00000 1.00000
233 hdd 8.90999 osd.233 up 1.00000 1.00000
234 hdd 8.90999 osd.234 up 1.00000 1.00000
235 hdd 8.90999 osd.235 up 1.00000 1.00000
-27 106.91600 host kzn-osd12
236 hdd 8.90999 osd.236 up 1.00000 1.00000
237 hdd 8.90999 osd.237 up 1.00000 1.00000
238 hdd 8.90999 osd.238 up 1.00000 1.00000
239 hdd 8.90999 osd.239 up 1.00000 1.00000
240 hdd 8.90999 osd.240 up 1.00000 1.00000
241 hdd 8.90999 osd.241 up 1.00000 1.00000
242 hdd 8.90999 osd.242 up 1.00000 1.00000
243 hdd 8.90999 osd.243 up 1.00000 1.00000
244 hdd 8.90999 osd.244 up 1.00000 1.00000
245 hdd 8.90999 osd.245 up 1.00000 1.00000
246 hdd 8.90999 osd.246 up 1.00000 1.00000
247 hdd 8.90999 osd.247 up 1.00000 1.00000
-29 106.91600 host kzn-osd13
248 hdd 8.90999 osd.248 up 1.00000 1.00000
249 hdd 8.90999 osd.249 up 1.00000 1.00000
250 hdd 8.90999 osd.250 up 1.00000 1.00000
251 hdd 8.90999 osd.251 up 1.00000 1.00000
252 hdd 8.90999 osd.252 up 1.00000 1.00000
253 hdd 8.90999 osd.253 down 0 1.00000
254 hdd 8.90999 osd.254 up 1.00000 1.00000
255 hdd 8.90999 osd.255 up 1.00000 1.00000
256 hdd 8.90999 osd.256 up 1.00000 1.00000
257 hdd 8.90999 osd.257 up 1.00000 1.00000
258 hdd 8.90999 osd.258 up 1.00000 1.00000
259 hdd 8.90999 osd.259 up 1.00000 1.00000
-31 106.91600 host kzn-osd14
260 hdd 8.90999 osd.260 up 1.00000 1.00000
261 hdd 8.90999 osd.261 up 1.00000 1.00000
262 hdd 8.90999 osd.262 up 1.00000 1.00000
263 hdd 8.90999 osd.263 up 1.00000 1.00000
264 hdd 8.90999 osd.264 up 1.00000 1.00000
265 hdd 8.90999 osd.265 up 1.00000 1.00000
266 hdd 8.90999 osd.266 up 1.00000 1.00000
267 hdd 8.90999 osd.267 up 1.00000 1.00000
268 hdd 8.90999 osd.268 up 1.00000 1.00000
269 hdd 8.90999 osd.269 up 1.00000 1.00000
270 hdd 8.90999 osd.270 up 1.00000 1.00000
271 hdd 8.90999 osd.271 up 1.00000 1.00000
-1 0 root default
The hosts are from kzn-osd05 to kzn-osd14.
Hosts kzn-osd01 to kzn-osd04 were the Fujitsu servers that used to host the performance root; those machines are now decommissioned/repurposed.
You’ll notice that for every kzn-osdX host we have a corresponding kzn-osdX-nvme host. This was done to trick ceph into letting us put the NVMe drives in a different root even though they are in the same physical host. The following configuration flag must be set to false, otherwise ceph will move the NVMe OSDs back under their physical hosts when the daemons start.
[root@kzn-mon1 ~]# grep "osd crush update on start" /etc/ceph/ceph.conf
osd crush update on start = false
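For reference, a rough sketch of how one of these fake host buckets can be created and an NVMe OSD placed into it (the bucket name, OSD id, and weight below are copied from the tree above; the exact commands run at build time are not recorded here):
[root@kzn-mon1 ~]# ceph osd crush add-bucket kzn-osd05-nvme host
[root@kzn-mon1 ~]# ceph osd crush move kzn-osd05-nvme root=performance
[root@kzn-mon1 ~]# ceph osd crush set osd.272 7.29999 root=performance host=kzn-osd05-nvme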
Pools¶
Here are the current pools we have. This will, of course, change over time.
[root@kzn-mon1 ~]# ceph df
RAW STORAGE:
CLASS SIZE AVAIL USED RAW USED %RAW USED
hdd 1.0 PiB 861 TiB 199 TiB 199 TiB 18.78
ssd 73 TiB 48 TiB 24 TiB 24 TiB 33.39
TOTAL 1.1 PiB 910 TiB 223 TiB 223 TiB 19.72
POOLS:
POOL ID STORED OBJECTS USED %USED MAX AVAIL
.rgw.root 72 1.1 KiB 7 1.1 MiB 0 239 TiB
rbd 92 45 GiB 11.71k 135 GiB 0.02 239 TiB
.rgw 112 541 KiB 2.39k 448 MiB 0 239 TiB
.rgw.gc 113 173 KiB 32 13 MiB 0 239 TiB
.users.uid 114 226 KiB 278 36 MiB 0 239 TiB
.usage 115 65 MiB 32 65 MiB 0 239 TiB
.rgw.buckets.index 116 45 MiB 1.21k 45 MiB 0 239 TiB
.rgw.control 119 0 B 8 0 B 0 239 TiB
.rgw.buckets 133 6.7 TiB 1.94M 10 TiB 1.43 477 TiB
.log 139 28 B 177 192 KiB 0 239 TiB
.users 141 121 B 3 576 KiB 0 239 TiB
.rgw.buckets.extra 143 132 KiB 5 1.1 MiB 0 239 TiB
m2 144 6.6 TiB 1.74M 20 TiB 2.70 239 TiB
ostack2 145 42 TiB 40.56M 129 TiB 15.30 239 TiB
performance2 146 8.1 TiB 2.12M 24 TiB 37.15 14 TiB
cephfs 149 13 TiB 3.72M 39 TiB 5.16 239 TiB
cephfsmeta 150 257 MiB 158 771 MiB 0 239 TiB
openshift4 153 532 MiB 194 1.6 GiB 0 239 TiB
default.rgw.meta 159 0 B 0 0 B 0 239 TiB
default.rgw.control 160 0 B 0 0 B 0 239 TiB
default.rgw.log 161 0 B 0 0 B 0 239 TiB
.users.email 168 0 B 0 0 B 0 32 TiB
We have enabled the pg autoscaler so it automatically manages the number of placement groups for each pool.
[root@kzn-mon1 ~]# ceph osd pool autoscale-status
POOL SIZE TARGET SIZE RATE RAW CAPACITY RATIO TARGET RATIO EFFECTIVE RATIO BIAS PG_NUM NEW PG_NUM AUTOSCALE
.users.email 0 1.28571426868 74520G 0.0000 1.0 16 on
.users.uid 225.8k 3.0 1060T 0.0000 1.0 16 on
.rgw 541.0k 3.0 1060T 0.0000 1.0 32 on
.rgw.buckets 6837G 1.5 1060T 0.0094 1.0 64 on
openshift4 532.3M 3.0 1060T 0.0000 1.0 64 on
.rgw.gc 173.1k 3.0 1060T 0.0000 1.0 16 on
performance2 8271G 3.0 74520G 0.3330 1.0 256 on
default.rgw.meta 0 3.0 1060T 0.0000 1.0 16 on
.log 28 3.0 1060T 0.0000 1.0 16 on
m2 6778G 3.0 1060T 0.0187 1.0 128 on
default.rgw.log 0 3.0 1060T 0.0000 1.0 16 on
.rgw.buckets.extra 132.4k 3.0 1060T 0.0000 1.0 32 on
.rgw.buckets.index 46307k 3.0 1060T 0.0000 1.0 32 on
.rgw.root 1090 3.0 1060T 0.0000 1.0 16 on
rbd 46038M 3.0 1060T 0.0001 1.0 32 on
.users 121 3.0 1060T 0.0000 1.0 16 on
.rgw.control 0 3.0 1060T 0.0000 1.0 16 on
default.rgw.control 0 3.0 1060T 0.0000 1.0 16 on
.usage 66180k 3.0 1060T 0.0000 1.0 32 on
ostack2 42525G 3.0 1060T 0.1175 1.0 1024 on
cephfsmeta 256.9M 3.0 1060T 0.0000 1.0 16 on
cephfs 13303G 3.0 1060T 0.0368 1.0 128 on
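If the autoscaler ever needs to be (re-)enabled, the usual steps are to enable the mgr module and turn the mode on per pool (the pool name below is just an example):
[root@kzn-mon1 ~]# ceph mgr module enable pg_autoscaler
[root@kzn-mon1 ~]# ceph osd pool set ostack2 pg_autoscale_mode on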
Rados Gateways¶
We have 4 RGWs, set up in two pairs.
The first pair, kzn-rgw1 and kzn-rgw2, are used for OpenStack Swift and as such are configured to work with Keystone. These hosts have public IP addresses.
The second pair, kzn-rgw-j-01 and kzn-rgw-j-02, are used by the baremetal OpenShift 4.X cluster with the OCS operator. These hosts do not have public IPs since they are not required.
Both pairs of RGWs have a virtual IP that is managed by corosync, pcs, and pacemaker for high availability. To check the status of the cluster, run pcs status or crm_mon.
[root@kzn-rgw-j-01 ~]# pcs status
Cluster name: openshift_rgws
Stack: corosync
Current DC: kzn-rgw-j-02 (version 1.1.23-1.el7_9.1-9acf116022) - partition with quorum
Last updated: Fri May 14 10:41:42 2021
Last change: Fri May 14 10:41:35 2021 by hacluster via crmd on kzn-rgw-j-01
2 nodes configured
1 resource instance configured
Online: [ kzn-rgw-j-01 kzn-rgw-j-02 ]
Full list of resources:
virtual_ip (ocf::heartbeat:IPaddr2): Started kzn-rgw-j-01
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
You can see that both hosts are online and that the virtual_ip resource is currently assigned to host kzn-rgw-j-01.
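For reference, a hypothetical sketch of how such a virtual_ip resource can be defined with pcs (the address and netmask are placeholders, not the cluster's real VIP):
[root@kzn-rgw-j-01 ~]# pcs resource create virtual_ip ocf:heartbeat:IPaddr2 ip=192.0.2.10 cidr_netmask=24 op monitor interval=30s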