Ceph Distributed Storage: Complete Production Deployment Guide
Ceph is a highly scalable, distributed storage system that provides object, block, and file storage in a unified system. Designed for performance, reliability, and scalability, Ceph is used by organizations worldwide for petabyte-scale deployments. This comprehensive guide covers Ceph architecture, deployment, and production best practices.
What is Ceph?
Ceph is an open-source, software-defined storage platform.
Key Features
- Unified Storage: Object (RGW), Block (RBD), and File (CephFS) storage
- Scalability: Scale from gigabytes to exabytes
- No Single Point of Failure: Fully distributed architecture
- Self-Healing: Automatic data replication and recovery
- Performance: Parallel access to data across cluster
- CRUSH Algorithm: Intelligent data distribution
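The core CRUSH idea, computing placement from a hash instead of looking it up in a central table, can be sketched in a few lines. This is an illustration only (the function and the flat OSD list are invented for this example); real CRUSH walks a weighted hierarchy of hosts, racks, and rooms and handles reweighting and failure domains:

```python
import zlib

def place_object(obj_name, pg_num, osds, replicas=3):
    # Hash the object name to a placement group (PG)...
    pg = zlib.crc32(obj_name.encode()) % pg_num
    # ...then derive that PG's OSD set deterministically.
    start = pg % len(osds)
    return [osds[(start + i) % len(osds)] for i in range(replicas)]

osds = ["osd.0", "osd.1", "osd.2", "osd.3", "osd.4", "osd.5"]
# Any client computes the same placement with no central lookup table.
print(place_object("rbd_data.abc123", 128, osds))
```

Because placement is pure computation, clients talk directly to the right OSDs, which is what removes the metadata bottleneck found in lookup-table designs.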
Ceph vs. Other Storage Solutions
| Feature | Ceph | GlusterFS | MinIO | Traditional SAN |
|---|---|---|---|---|
| Object Storage | ✅ RGW | ❌ No | ✅ Native | ❌ No |
| Block Storage | ✅ RBD | ❌ No | ❌ No | ✅ Yes |
| File Storage | ✅ CephFS | ✅ Yes | ❌ No | ✅ NFS/SMB |
| Scalability | Exabytes | Petabytes | Exabytes | Limited |
| Self-Healing | ✅ Yes | ✅ Yes | ⚠️ Erasure coding | ❌ Manual |
| Cost | Hardware only | Hardware only | Hardware only | High (HW+SW) |
Architecture
Ceph Components
```
┌──────────────────────────────────────────────┐
│                Client Layer                  │
│  RBD (Block)   RGW (Object)   CephFS (File)  │
└──────────────────────┬───────────────────────┘
                       │
        librados ◄───► CRUSH map
   (C, Python, Java…)  (data placement)
                       │
┌──────────────────────▼───────────────────────┐
│            RADOS (Storage Layer)             │
│                                              │
│  MON ◄─► MON ◄─► MON    cluster maps, quorum │
│  OSD  OSD  OSD  …       one daemon per disk  │
│  MGR ◄─► MGR            dashboard, metrics   │
│  MDS ◄─► MDS            CephFS metadata only │
└──────────────────────────────────────────────┘
```
Key Components
- MON (Monitor): Maintains the cluster maps; deploy an odd number (3, 5, 7) so a quorum survives failures
- OSD (Object Storage Daemon): Stores data, handles replication, recovery
- MGR (Manager): Cluster monitoring, dashboard, metrics
- MDS (Metadata Server): CephFS metadata management
- RGW (RADOS Gateway): S3/Swift-compatible object storage API
- RBD (RADOS Block Device): Block device interface
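Why the odd monitor count matters: MONs form a quorum, and the cluster stays up only while a strict majority of monitors agree. A quick calculation (an illustrative helper, not a Ceph API) shows that an even count adds hardware without adding failure tolerance:

```python
def mon_failures_tolerated(mons):
    # A quorum requires a strict majority of monitors.
    majority = mons // 2 + 1
    return mons - majority

for n in (3, 4, 5):
    print(f"{n} MONs tolerate {mon_failures_tolerated(n)} failure(s)")
# 3 and 4 MONs both tolerate exactly one failure, so a fourth
# monitor buys nothing -- hence the 3, 5, 7 recommendation.
```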
Installation and Deployment
System Requirements
Minimum Test Cluster:
- 3 nodes (can combine MON+OSD)
- 4 CPU cores per node
- 8 GB RAM per node (+ 2 GB per OSD)
- 10 GB for OS + dedicated disks for OSDs
- 1 Gbps network
Production Cluster:
- 3+ dedicated MON nodes
- 3+ OSD nodes (more recommended)
- 16+ CPU cores per OSD node
- 64+ GB RAM per OSD node
- Enterprise SSDs/NVMe for OSDs
- 10/25/100 Gbps network (separate public/cluster networks)
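When sizing hardware it helps to work backwards from usable capacity. A rough model (my own helper, not a Ceph tool; the 0.85 headroom mirrors the commonly used nearfull threshold) compares replicated and erasure-coded pools:

```python
def usable_tb(raw_tb, replicas=None, k=None, m=None, headroom=0.85):
    """Usable capacity after protection overhead and nearfull headroom.
    Pass either replicas=N, or k/m for an erasure-code profile."""
    if replicas is not None:
        usable = raw_tb / replicas      # 3x replication stores each byte 3 times
    else:
        usable = raw_tb * k / (k + m)   # EC k+m stores k data + m parity chunks
    return usable * headroom

print(usable_tb(600, replicas=3))   # 170.0 TB usable from 600 TB raw
print(usable_tb(600, k=4, m=2))     # 340.0 TB -- EC doubles usable space here
```

The trade-off, of course, is that erasure coding costs more CPU and degrades small-write performance, which is why replicated pools remain the default for RBD.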
Deployment with cephadm (Recommended)
```bash
# Install cephadm on the admin node
curl --silent --remote-name --location \
  https://github.com/ceph/ceph/raw/quincy/src/cephadm/cephadm
chmod +x cephadm
mkdir -p /etc/ceph

# Add the Ceph repository and install cephadm system-wide
./cephadm add-repo --release quincy
./cephadm install

# Bootstrap the first monitor
cephadm bootstrap \
  --mon-ip 10.0.1.11 \
  --cluster-network 10.0.2.0/24 \
  --initial-dashboard-user admin \
  --initial-dashboard-password 'SecurePassword123!' \
  --ssh-user root

# The bootstrap command prints the dashboard URL and credentials.
# Save the admin keyring and a minimal config
cephadm shell -- ceph config generate-minimal-conf > /etc/ceph/ceph.conf
cephadm shell -- ceph auth get client.admin > /etc/ceph/ceph.client.admin.keyring

# Install ceph-common for CLI access
cephadm install ceph-common

# Verify cluster status
ceph -s
ceph health detail
```
Add Nodes to Cluster
```bash
# Copy the cluster SSH key to the new nodes
ssh-copy-id -f -i /etc/ceph/ceph.pub root@ceph-node2
ssh-copy-id -f -i /etc/ceph/ceph.pub root@ceph-node3

# Add nodes to the cluster
ceph orch host add ceph-node2 10.0.1.12
ceph orch host add ceph-node3 10.0.1.13

# Label nodes for specific roles
ceph orch host label add ceph-node2 mon
ceph orch host label add ceph-node3 mon

# List hosts
ceph orch host ls

# Deploy additional monitors
ceph orch apply mon "ceph-node1,ceph-node2,ceph-node3"

# Deploy managers
ceph orch apply mgr --placement="3 ceph-node1 ceph-node2 ceph-node3"
```
Add OSDs
```bash
# List available devices
ceph orch device ls

# Add all available devices as OSDs
ceph orch apply osd --all-available-devices

# Add a specific device
ceph orch daemon add osd ceph-node2:/dev/sdb

# Add an OSD with a separate DB/WAL device (SSD/NVMe)
ceph orch daemon add osd ceph-node2:data_devices=/dev/sdc,db_devices=/dev/nvme0n1

# Advanced OSD specification
cat > osd-spec.yaml << 'EOF'
service_type: osd
service_id: default_drive_group
placement:
  host_pattern: 'ceph-node*'
data_devices:
  all: true
db_devices:
  paths:
    - /dev/nvme0n1
    - /dev/nvme1n1
wal_devices:
  paths:
    - /dev/nvme0n1
    - /dev/nvme1n1
EOF

ceph orch apply -i osd-spec.yaml

# View OSD status
ceph osd tree
ceph osd stat
```
Pool Configuration
Create Pools
```bash
# Calculate the PG number (recommended: 100-200 PGs per OSD)
# Formula: (target PGs per OSD) × (OSDs) / (replica size) = PG number
# Round to the nearest power of 2
```
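The formula above is easy to get wrong by hand. A small helper (illustrative only; with `pg_autoscale_mode` enabled, Ceph will adjust PG counts for you) applies it and rounds to a power of two:

```python
import math

def pg_count(osds, target_per_osd=100, replica_size=3):
    # (target PGs per OSD x OSDs) / replica size, then round
    # to the nearest power of two.
    raw = target_per_osd * osds / replica_size
    return 2 ** round(math.log2(raw))

print(pg_count(6))    # 200 raw -> 256
print(pg_count(12))   # 400 raw -> 512
```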
```bash
# Create a replicated pool
ceph osd pool create rbd-pool 128 128 replicated

# Create an erasure-coded pool
ceph osd erasure-code-profile set ec-profile \
  k=4 m=2 \
  crush-failure-domain=host
ceph osd pool create ec-pool 128 128 erasure ec-profile

# Set the pool application
ceph osd pool application enable rbd-pool rbd
ceph osd pool application enable ec-pool rgw

# Pool with a custom CRUSH rule
ceph osd crush rule create-replicated ssd-rule default host ssd
ceph osd pool set rbd-pool crush_rule ssd-rule

# Configure pool parameters
ceph osd pool set rbd-pool size 3               # Replica count
ceph osd pool set rbd-pool min_size 2           # Min replicas for I/O
ceph osd pool set rbd-pool pg_autoscale_mode on

# Pool quotas
ceph osd pool set-quota rbd-pool max_bytes $((10 * 1024**4))  # 10 TiB
ceph osd pool set-quota rbd-pool max_objects 1000000

# List pools
ceph osd pool ls detail
```
RBD (Block Storage)
Create and Use RBD Images
```bash
# Create an RBD image
rbd create rbd-pool/disk1 --size 100G

# Create with specific image features
rbd create rbd-pool/disk2 \
  --size 500G \
  --image-feature layering,exclusive-lock,object-map,fast-diff

# List and inspect images
rbd ls rbd-pool
rbd info rbd-pool/disk1

# Resize an image
rbd resize rbd-pool/disk1 --size 200G

# Create a snapshot
rbd snap create rbd-pool/disk1@snap1

# List snapshots
rbd snap ls rbd-pool/disk1

# Clone from a snapshot (the parent snapshot must be protected first)
rbd snap protect rbd-pool/disk1@snap1
rbd clone rbd-pool/disk1@snap1 rbd-pool/disk1-clone

# Map the RBD device (creates /dev/rbd0)
rbd map rbd-pool/disk1

# Format and mount
mkfs.ext4 /dev/rbd0
mount /dev/rbd0 /mnt/ceph-disk

# Unmap
umount /mnt/ceph-disk
rbd unmap /dev/rbd0

# Delete an image
rbd rm rbd-pool/disk1
```
RBD with Kubernetes
```yaml
# storageclass.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-rbd
provisioner: rbd.csi.ceph.com
parameters:
  clusterID: b9127830-b0cc-4e34-aa47-9d1a2e9949a8
  pool: rbd-pool
  imageFeatures: layering,exclusive-lock,object-map,fast-diff
  csi.storage.k8s.io/provisioner-secret-name: csi-rbd-secret
  csi.storage.k8s.io/provisioner-secret-namespace: ceph-csi
  csi.storage.k8s.io/controller-expand-secret-name: csi-rbd-secret
  csi.storage.k8s.io/controller-expand-secret-namespace: ceph-csi
  csi.storage.k8s.io/node-stage-secret-name: csi-rbd-secret
  csi.storage.k8s.io/node-stage-secret-namespace: ceph-csi
  csi.storage.k8s.io/fstype: ext4
reclaimPolicy: Delete
allowVolumeExpansion: true
mountOptions:
  - discard
---
# pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ceph-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: ceph-rbd
```
CephFS (File Storage)
Create CephFS
```bash
# Create pools for CephFS
ceph osd pool create cephfs_data 128
ceph osd pool create cephfs_metadata 64

# Create the filesystem
ceph fs new cephfs cephfs_metadata cephfs_data

# Verify
ceph fs ls
ceph fs status cephfs

# Create MDS daemons
ceph orch apply mds cephfs --placement="3 ceph-node1 ceph-node2 ceph-node3"

# Mount CephFS (kernel client)
mount -t ceph 10.0.1.11:6789:/ /mnt/cephfs \
  -o name=admin,secret=AQBsomething==

# Mount with ceph-fuse (userspace client)
ceph-fuse /mnt/cephfs -n client.admin

# Persistent mount in /etc/fstab:
# 10.0.1.11:6789:/ /mnt/cephfs ceph name=admin,secretfile=/etc/ceph/admin.secret,_netdev,noatime 0 2

# Create subdirectories with quotas
mkdir /mnt/cephfs/project1
setfattr -n ceph.quota.max_bytes -v 100000000000 /mnt/cephfs/project1  # 100 GB
setfattr -n ceph.quota.max_files -v 1000000 /mnt/cephfs/project1
```
CephFS Subvolumes
```bash
# Create a volume
ceph fs volume create myfs

# Create a subvolume group
ceph fs subvolumegroup create myfs group1

# Create a subvolume
ceph fs subvolume create myfs sub1 --group_name group1 --size 10737418240  # 10 GiB
```
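`--size` takes bytes, which makes values like `10737418240` hard to eyeball. A tiny hypothetical converter (not part of Ceph) makes the arithmetic explicit:

```python
def to_bytes(size):
    """Convert '10G'-style sizes to the byte counts Ceph expects (binary units)."""
    units = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}
    suffix = size[-1].upper()
    if suffix in units:
        return int(size[:-1]) * units[suffix]
    return int(size)

print(to_bytes("10G"))   # 10737418240 -- the value used above
```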
```bash
# Get the subvolume path
ceph fs subvolume getpath myfs sub1 --group_name group1

# Create a snapshot
ceph fs subvolume snapshot create myfs sub1 snap1 --group_name group1

# Clone a subvolume from the snapshot
ceph fs subvolume snapshot clone myfs sub1 snap1 sub1-clone --group_name group1

# Delete the subvolume
ceph fs subvolume rm myfs sub1 --group_name group1
```
RGW (Object Storage)
Deploy RGW
```bash
# Deploy RGW daemons
ceph orch apply rgw myrgw \
  --placement="2 ceph-node1 ceph-node2" \
  --port=8080

# Verify
ceph orch ps --daemon-type rgw

# Create an RGW user (the output includes access_key and secret_key)
radosgw-admin user create \
  --uid=johndoe \
  --display-name="John Doe" \
  --email=john@example.com

# Grant admin privileges
radosgw-admin caps add \
  --uid=johndoe \
  --caps="users=*;buckets=*;metadata=*;usage=*;zone=*"

# Create a subuser for the Swift API
radosgw-admin subuser create \
  --uid=johndoe \
  --subuser=johndoe:swift \
  --access=full \
  --secret=secretkey123
```
Use RGW with S3
```python
import boto3

# Configure the S3 client to point at RGW
s3 = boto3.client(
    's3',
    endpoint_url='http://10.0.1.11:8080',
    aws_access_key_id='ACCESS_KEY',
    aws_secret_access_key='SECRET_KEY'
)

# Create a bucket
s3.create_bucket(Bucket='my-bucket')

# Upload an object
s3.upload_file('local-file.txt', 'my-bucket', 'remote-file.txt')

# Download an object
s3.download_file('my-bucket', 'remote-file.txt', 'downloaded-file.txt')

# List objects
response = s3.list_objects_v2(Bucket='my-bucket')
for obj in response.get('Contents', []):
    print(obj['Key'])
```
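`list_objects_v2` returns at most 1,000 keys per call, so bucket summaries need to walk pages. The sketch below runs against a stubbed response dictionary shaped like boto3's (only `Contents`, `Key`, and `Size` are shown); a real caller would loop using `IsTruncated` and `NextContinuationToken`:

```python
def bucket_usage(response):
    # Sum object count and total bytes from one page of results.
    contents = response.get("Contents", [])
    return len(contents), sum(obj["Size"] for obj in contents)

stub = {"Contents": [{"Key": "a.txt", "Size": 1024},
                     {"Key": "b.bin", "Size": 2048}]}
print(bucket_usage(stub))   # (2, 3072)
```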
```python
# Delete an object
s3.delete_object(Bucket='my-bucket', Key='remote-file.txt')
```
Performance Tuning
OSD Tuning
```bash
# BlueStore cache (per OSD)
ceph config set osd bluestore_cache_size_hdd 4294967296   # 4 GiB for HDD
ceph config set osd bluestore_cache_size_ssd 8589934592   # 8 GiB for SSD

# Thread pools
ceph config set osd osd_op_num_threads_per_shard 2
ceph config set osd osd_op_num_shards 8

# Recovery tuning
ceph config set osd osd_recovery_max_active 3
ceph config set osd osd_max_backfills 1

# Scrubbing
ceph config set osd osd_scrub_begin_hour 1
ceph config set osd osd_scrub_end_hour 6
ceph config set osd osd_scrub_during_recovery false
```
Network Optimization
```bash
# Separate public and cluster networks:
#   Public network:  client traffic
#   Cluster network: replication and recovery

# Configure in ceph.conf
cat >> /etc/ceph/ceph.conf << 'EOF'
[global]
public_network = 10.0.1.0/24
cluster_network = 10.0.2.0/24

# Network tuning
ms_bind_port_min = 6800
ms_bind_port_max = 7300
EOF

# Apply to all OSDs
ceph config set osd public_network 10.0.1.0/24
ceph config set osd cluster_network 10.0.2.0/24
```
Client-Side Tuning
```bash
# RBD client cache
rbd config global set global rbd_cache true
rbd config global set global rbd_cache_size 67108864   # 64 MiB

# CephFS client cache
ceph config set client client_cache_size 1073741824   # 1 GiB
```
Monitoring and Maintenance
Ceph Dashboard
```bash
# The dashboard is enabled by default with cephadm.
# Access it at https://ceph-node1:8443

# Enable additional mgr modules
ceph mgr module enable prometheus
ceph mgr module enable diskprediction_local
ceph mgr module enable telemetry

# Dashboard user management
ceph dashboard ac-user-create admin password administrator
ceph dashboard ac-user-set-roles admin administrator
```
Monitoring Commands
```bash
# Cluster status
ceph -s
ceph health detail

# OSD status
ceph osd status
ceph osd tree
ceph osd df

# Pool usage
ceph df
ceph osd pool stats

# PG status
ceph pg stat
ceph pg dump

# Performance stats
ceph osd perf
ceph daemonperf osd.0

# MON status
ceph mon stat
ceph quorum_status -f json-pretty
```
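For scripting, most status commands also accept `-f json`. A minimal health probe (the stub below abbreviates the real output; the field names are assumed to match what `ceph -s -f json` reports on a typical cluster):

```python
import json

def is_healthy(status_json):
    # HEALTH_OK is the only all-clear; WARN and ERR both need attention.
    return json.loads(status_json)["health"]["status"] == "HEALTH_OK"

sample = '{"health": {"status": "HEALTH_WARN"}}'   # abbreviated stub
print(is_healthy(sample))   # False
```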
```bash
# Check slow requests on a specific OSD
ceph daemon osd.0 dump_historic_ops
```
Prometheus Integration
```bash
# The prometheus mgr module exposes metrics on port 9283.
# Example prometheus.yml scrape config:
cat >> prometheus.yml << 'EOF'
scrape_configs:
  - job_name: 'ceph'
    static_configs:
      - targets: ['ceph-node1:9283', 'ceph-node2:9283']
EOF
```
Backup and Disaster Recovery
RBD Snapshots and Backups
```bash
# Create a dated snapshot
rbd snap create rbd-pool/disk1@backup-$(date +%Y%m%d)

# Export a snapshot as a full image
rbd export rbd-pool/disk1@backup-20260210 /backup/disk1-20260210.img

# Incremental backup (diff since the snapshot)
rbd export-diff rbd-pool/disk1@backup-20260210 /backup/disk1-diff-20260210.img

# Import a backup into a new image
rbd import /backup/disk1-20260210.img rbd-pool/disk1-restored
```
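Dated snapshots pile up unless something prunes them. A retention planner in the same `backup-YYYYMMDD` naming scheme (a sketch only: it decides what to delete; actually removing a snapshot would shell out to `rbd snap rm`):

```python
from datetime import date, timedelta

def prune_plan(snapshots, keep_days, today):
    # Keep snapshots newer than the retention window; return the rest.
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in snapshots:
        stamp = name.split("-")[1]          # 'backup-20260210' -> '20260210'
        taken = date(int(stamp[:4]), int(stamp[4:6]), int(stamp[6:]))
        if taken < cutoff:
            expired.append(name)
    return expired

snaps = ["backup-20260201", "backup-20260208", "backup-20260210"]
print(prune_plan(snaps, keep_days=7, today=date(2026, 2, 10)))
# ['backup-20260201'] -- candidates for `rbd snap rm rbd-pool/disk1@...`
```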
```bash
# RBD mirroring for DR: deploy the rbd-mirror daemon, then enable
# snapshot-based mirroring per pool and image
ceph orch apply rbd-mirror --placement=1
rbd mirror pool enable rbd-pool image
rbd mirror image enable rbd-pool/disk1 snapshot
```
CephFS Snapshots
```bash
# Allow snapshots on the filesystem (enabled by default in recent releases)
ceph fs set cephfs allow_new_snaps true

# Create a snapshot by making a directory under .snap
mkdir /mnt/cephfs/.snap/backup-$(date +%Y%m%d)

# List snapshots
ls /mnt/cephfs/.snap/

# Remove a snapshot
rmdir /mnt/cephfs/.snap/backup-20260210
```
Production Checklist
Infrastructure
- Minimum 3 MON nodes (odd number)
- Dedicated 10+ Gbps network
- Separate public/cluster networks
- Enterprise SSDs for OSDs
- NVMe for BlueStore DB/WAL
- UPS and redundant power
Configuration
- Pools configured with proper PG count
- CRUSH rules for failure domains
- Pool quotas configured
- BlueStore tuning applied
- Network tuning configured
- Scrubbing scheduled
Monitoring
- Dashboard accessible
- Prometheus metrics enabled
- Grafana dashboards configured
- Alert rules defined
- Log aggregation setup
Security
- CephX authentication enabled
- User access controls configured
- Network firewall rules
- RGW SSL/TLS enabled
- Regular security updates
Operations
- Backup strategy defined
- DR plan documented
- Runbooks created
- On-call rotation established
- Upgrade procedure tested
Conclusion
Ceph provides enterprise-grade distributed storage with exceptional scalability and flexibility. Its unified storage approach eliminates the need for separate storage systems, while the self-healing capabilities ensure data durability and availability.
Success with Ceph requires proper hardware selection, careful capacity planning, and ongoing operational expertise. Organizations that invest in Ceph gain a powerful, open-source storage platform capable of scaling from terabytes to exabytes while maintaining high performance and reliability.
Master storage technologies including Ceph with our infrastructure training programs. Contact us for customized training designed for your team’s needs.