A few weeks back I wondered why my Ceph-backed KVM virtual machines were running so slowly. I quickly traced the cause to poor I/O performance on their virtual hard drives. Since those disks were stored in Ceph, the problem had to be there.
I tried a ton of tips from various articles on improving Ceph performance. I tweaked settings, increased the journal size and even considered buying some SSDs for the journals.
I am ashamed to admit that I did not think of the most obvious solution myself. A coworker asked me: “Do you think adding more OSDs to the cluster would help?” Then it dawned on me that too few OSDs were indeed the problem. Too many I/O operations were spread over too few hard drives (about 20-30 VMs on 5 HDDs). So I added three more OSDs, and performance improved a lot!
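To show why the number of disks matters so much, here is a rough back-of-the-envelope sketch in Python. The VM and OSD counts come from my setup above; everything else is an assumption, not a measurement: roughly 100 random IOPS per spinning HDD, and a replicated pool of size 3, so every client write lands on three OSDs. It deliberately ignores caching, journal overhead and the read/write mix, so treat the numbers as an illustration of the scaling, not as a benchmark.

```python
# Back-of-the-envelope estimate of per-VM disk I/O on an HDD-only Ceph cluster.
# ASSUMPTIONS (not measured in this post):
#   HDD_IOPS: ~100 random IOPS per spinning HDD (ballpark figure)
#   REPLICAS: replicated pool of size 3, so each client write hits 3 OSDs
# Ignores caching, journal overhead and read/write mix.

HDD_IOPS = 100  # assumed per-disk random IOPS
REPLICAS = 3    # assumed pool size


def vms_per_osd(vms: int, osds: int) -> float:
    """Average number of VMs whose virtual disks share one OSD."""
    return vms / osds


def write_iops_per_vm(vms: int, osds: int,
                      hdd_iops: int = HDD_IOPS,
                      replicas: int = REPLICAS) -> float:
    """Rough random-write IOPS budget per VM when all VMs are busy at once.

    Each client write is stored `replicas` times, so the cluster's
    aggregate write capacity is osds * hdd_iops / replicas.
    """
    return osds * hdd_iops / replicas / vms


if __name__ == "__main__":
    # Compare the original 5-OSD cluster with the 8-OSD cluster.
    for osds in (5, 8):
        for vms in (20, 30):
            print(f"{osds} OSDs, {vms} VMs: "
                  f"{vms_per_osd(vms, osds):.2f} VMs/OSD, "
                  f"~{write_iops_per_vm(vms, osds):.1f} write IOPS per VM")
```

Under these assumptions the per-VM budget scales linearly with the number of OSDs, so going from 5 to 8 disks gives every VM about 8/5 = 1.6 times as much write capacity, no matter how the per-disk figure is tuned.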
So if anyone stumbles upon this article: check whether your cluster is spread over enough hard drives before trying anything else 🙂