A failed experiment with GlusterFS
GlusterFS is a clustered file system for sharing content across different machines. NFS can do the same, but the difference is that failover with NFS is hard.
In GlusterFS, you can create a volume as a replica across two servers, each of which contributes a brick (Gluster’s term for an export directory on a server). Gluster then replicates all data to both servers. GlusterFS can also advertise volumes as NFS shares, but I didn’t use that for the same basic reason: failover.
The native GlusterFS client can fail over automatically when one of the servers dies. When both servers are up, it reads data from them in parallel to get better throughput. When the dead server comes back up, all data changes are synced between the two by a process called healing.
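For reference, a two-brick replica setup like the one described above boils down to three commands: create the volume, start it, and mount it with the native client. The sketch below only builds those command lines and does not run them. The volume name, host names, brick paths and mount point are placeholders I made up for illustration, not the ones from my setup.

```python
import shlex


def replica_volume_commands(volume, bricks, mount_point):
    """Return argv lists that create, start and mount a replicated volume.

    `bricks` is a list of "host:/path" strings; all values are hypothetical.
    """
    if len(bricks) < 2:
        raise ValueError("a replica volume needs at least two bricks")
    # The native client can mount the volume from any server in the pool.
    first_host = bricks[0].split(":", 1)[0]
    return [
        ["gluster", "volume", "create", volume,
         "replica", str(len(bricks)), *bricks],
        ["gluster", "volume", "start", volume],
        ["mount", "-t", "glusterfs", f"{first_host}:/{volume}", mount_point],
    ]


if __name__ == "__main__":
    commands = replica_volume_commands(
        "homes", ["node1:/export/homes", "node2:/export/homes"], "/home"
    )
    for argv in commands:
        print(shlex.join(argv))
```

Building the commands as lists rather than strings keeps them safe to hand to `subprocess.run` later without shell quoting problems.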
GlusterFS as such is a great product, and the developers have put a lot of work into it. But it has lots of issues, and it’s probably a bad idea to use it without Red Hat support. Community support is weak because the community of users is small, and the issues are often not reliably reproducible. Clustering is very hard to do, especially at a high IOPS rate, and GlusterFS gets it right to a great extent.
Some details about the environment where I used GlusterFS 3.4.4:
- Two dedicated servers: i7-3770, 32 GB RAM and RAID10 over 4 disks.
- 1 Gbit line between the two, both of them located in the same DC. The same link carried both the internal network traffic and the Internet traffic.
- Gentoo Linux
- Both servers acting as Gluster clients as well.
- Gluster volume for home directories, which contains lots of websites (WordPress mostly, Joomla, etc.)
- One of the servers is a database server (MySQL, PostgreSQL) in addition to being a web-serving node (PHP, Apache).
Before trying out Gluster, I did read recommendations that Gluster should run on its own servers, with no other services on the same machines. I gave it a shot anyway, and I don’t think that running it alongside other services was the root cause of the problem.
Since one of the servers was a database server as well, I set my load balancer, HAProxy, to forward 60% of the traffic to the node without the database and 40% to the node with the database, so that the databases get enough CPU for themselves. Database response time is important for any website. All WordPress sites were configured with W3 Total Cache, and some of the sites were high-traffic ones.
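To make the 60/40 split concrete, here is a small simulation of weighted balancing between the two nodes. It uses a generic "smooth" weighted round-robin, which is an assumption on my part for illustration and not HAProxy’s actual implementation. The node names are made up.

```python
from collections import Counter


def weighted_round_robin(weights, requests):
    """Yield one server name per request, in proportion to its weight."""
    total = sum(weights.values())
    current = {name: 0 for name in weights}
    for _ in range(requests):
        # Every server earns credit equal to its weight on each request.
        for name, weight in weights.items():
            current[name] += weight
        # The server with the most credit takes the request and pays for it.
        chosen = max(current, key=current.get)
        current[chosen] -= total
        yield chosen


# Hypothetical node names: one node serves web only, the other also runs the DB.
weights = {"web_only": 60, "web_and_db": 40}
counts = Counter(weighted_round_robin(weights, 100))
print(counts)
```

Over 100 requests the node without the database gets 60 of them and the database node gets 40, with the two interleaved rather than sent in long bursts.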
The IO throughput of Gluster was good when the volumes were mounted on each of the nodes. But for some reason, there was frequent split-brain between the two bricks. When a split-brain occurs and Gluster’s self-heal daemon isn’t able to heal it, reading the file throws an input/output error on the client. This used to happen on only one of the nodes, and I don’t know why.
The weirder part is that split-brain occurred on random files and at random times. Sometimes it would be .htaccess or wp-config.php, and I can vouch that neither was modified on any node when the split-brain occurred.
This would cause Apache to throw HTTP Error 500 because the file could not be read by PHP or Apache, ultimately causing trouble for clients. Whenever the traffic hit the second node, people would see random 500s, or some assets would fail to load (again due to IO errors).
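One way to find affected files before visitors do is to walk the mounted volume and try to read every file, collecting the ones that fail with an input/output error. This is a rough sketch, assuming split-brain files surface as `EIO` on read as described above. The function name is mine, and on a large volume the walk itself adds IO load.

```python
import errno
import os


def find_io_error_files(root):
    """Return paths under `root` whose read fails with EIO."""
    failed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # Reading one byte is enough to trigger the error.
                with open(path, "rb") as handle:
                    handle.read(1)
            except OSError as exc:
                if exc.errno == errno.EIO:
                    failed.append(path)
                # Other errors (permissions, vanished files) are ignored here.
    return failed
```

Calling `find_io_error_files("/home")` on the node that shows the errors would list the candidates to inspect by hand.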
Another issue was servers crashing due to excessive CPU load. On a machine with 8 logical CPUs (4 physical cores with Intel Hyper-Threading), the load average shouldn’t cross 8.00; you can stretch that a bit to 10.00. The initial data sync between the two servers pushed the load past 50-60. The server would simply crash and require a hard reboot to come back, and then the same thing would happen again.
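The rule of thumb above (load should stay around the CPU count, with a little slack) is easy to turn into a check. The sketch below assumes a Unix-like system, since `os.getloadavg()` is not available elsewhere, and the 1.25 headroom factor is simply my 10.00 limit divided by 8 CPUs, not a general standard.

```python
import os


def is_overloaded(load_1min, cpus, headroom=1.25):
    """True when the 1-minute load average exceeds cpus * headroom."""
    return load_1min > cpus * headroom


def current_load_status():
    """Return (1-minute load average, logical CPU count, overloaded?)."""
    load_1min = os.getloadavg()[0]
    cpus = os.cpu_count() or 1  # cpu_count() may return None
    return load_1min, cpus, is_overloaded(load_1min, cpus)


if __name__ == "__main__":
    load, cpus, overloaded = current_load_status()
    print(f"load={load:.2f} cpus={cpus} overloaded={overloaded}")
```

With these numbers, a load of 10.00 on 8 CPUs is still within the limit, while the 50-60 I saw during the initial sync is far beyond it.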
In short, dealing with Gluster was a nightmare for me. I tried the Gluster mailing list and someone from Red Hat replied, but the reply didn’t contain a solution. It would be apt to say that Gluster took more hair off my head during this month of experimenting than I lost in the whole of the past year.
I have now moved off Gluster to a traditional NFS mount. One server holds all the data, and it’s simply NFS-mounted on the other node. This seems to perform far better than Gluster: the IO performance I’m getting out of NFS is better, and CPU usage is lower.
But indeed, NFS doesn’t give me the one advantage of Gluster, which is failover. It seems NFS can have failover as well if it is combined with DRBD across two servers. I’ll give it a thought next time, and I definitely think it would perform better than Gluster. Or maybe not. That’s a future experiment.
So, for the TL;DR: don’t use Gluster without Red Hat’s support.