One of the things I do on the side, besides the local science fair (see “Science Fair Rules”), is help out at the school my children attended. They are now married, and I already have one grandson, so it has been a while since they were there.
I help manage their computers and network. The servers have progressed from a single mini-tower with a pair of disks in a RAID-1 configuration to the current incarnation. The latest server configuration (Fig. 1) has a good bit of redundancy in it so data does not disappear due to hardware failures.
At the top of the hierarchy is a pair of Super Microcomputer (SuperMicro) AS-2022 systems running CentOS 6 that host almost a dozen VMs. These cover everything from the gateway, email, and web servers to the library server, inventory server, antivirus server, and a PBX. I get lots of calls if things go down. A few more independent servers handle backup and management services.
The CentOS systems are linked by cluster software that uses DRBD-based storage. DRBD (distributed replicated block device) is commonly used to replicate storage across multiple servers. Gigabit Ethernet connections carry the data exchange. One of these days I will get a bonded Ethernet pair working. For now, a single link meets our data needs.
Our DRBD system sits on top of RAID-1 disks. DRBD provides a level of redundancy, as does RAID, but I need both. DRBD is required for the cluster, while RAID has let me swap out bad disks without taking the system down.
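To see why both layers earn their keep, here is a minimal Python sketch that enumerates every combination of whole-disk failures across two nodes with a two-disk mirror each. The node names and function names are made up for illustration, and the model assumes the DRBD replicas were in sync before anything failed. It ignores controllers, links, and everything else that can break.

```python
from itertools import product

NODES = ("node_a", "node_b")   # hypothetical names for the two servers
DISKS_PER_NODE = 2             # one RAID-1 mirror pair per node


def node_has_storage(disk_ok):
    """A RAID-1 node keeps its storage while at least one mirror disk works."""
    return any(disk_ok)


def summarize():
    """Count disk-failure combinations by how many nodes still have storage."""
    both_up = one_up = data_lost = 0
    # Each disk is either working (True) or failed (False).
    for state in product((True, False), repeat=len(NODES) * DISKS_PER_NODE):
        per_node = [state[i * DISKS_PER_NODE:(i + 1) * DISKS_PER_NODE]
                    for i in range(len(NODES))]
        alive = sum(node_has_storage(disks) for disks in per_node)
        if alive == 2:
            both_up += 1      # RAID absorbed the failures; DRBD still has two replicas
        elif alive == 1:
            one_up += 1       # one node lost its whole mirror; the other replica carries on
        else:
            data_lost += 1    # every disk on both nodes has failed
    return both_up, one_up, data_lost


if __name__ == "__main__":
    print(summarize())
```

Of the 16 combinations, 9 leave both nodes with working storage, 6 leave one node, and only the case where all four disks fail takes out every copy.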
I am trying to simplify the system. I originally had a RAID-LVM-DRBD-LVM-GFS2 layout. I am trying to get down to a RAID-DRBD-GFS2 system, but that will require a major reconfiguration. On the plus side, the VMs don't care.
There is duplication on the power side as well. The AS-2022 has two hot-swappable power supplies, and the system can run with just one. These are cross-linked to a pair of UPSes in the same rack.
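The cross-linking can be pictured with a small Python sketch. The wiring table is an assumption on my part (each server's two supplies plugged into different UPSes), and the server and UPS names are placeholders, not the real ones.

```python
# Hypothetical wiring: each server's two supplies go to different UPSes.
WIRING = {
    "server_1": ("ups_a", "ups_b"),
    "server_2": ("ups_a", "ups_b"),
}


def servers_running(wiring, working_upses):
    """Return the servers that still have at least one powered supply."""
    return sorted(
        server for server, feeds in wiring.items()
        if any(ups in working_upses for ups in feeds)
    )


if __name__ == "__main__":
    print(servers_running(WIRING, {"ups_a", "ups_b"}))  # both UPSes healthy
    print(servers_running(WIRING, {"ups_b"}))           # ups_a has failed
    print(servers_running(WIRING, set()))               # both UPSes down
```

With this wiring, losing either UPS leaves both servers running. Only losing both UPSes at once takes the servers down.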
Lately I added another UPS that handles the management servers and switches, but it is really a backup for the other two, which actually failed to come up after a power outage. It turns out the batteries had failed. At least there was no data loss. I did have to run for a couple of days without any UPS though. I was not a happy camper.
I have also moved to network-based management of the UPSes. I had already been getting SNMP management going for most of the environment. The management server runs Centreon network monitoring software. As you may guess, most of the server software is open source, with the exception of the antivirus software.
Remote management is the name of the game since I don't get over to the school very often. This is another reason for the amount of redundancy in the system. A hardware failure is usually not fatal although the battery failure did require a visit.
I have redundant links into the management network, but the Comcast gateway is still a single point of failure. The main Ethernet switch is another.
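One way to hunt for single points of failure like these is to model the network as a graph, remove one device at a time, and check whether any server is still reachable. The Python sketch below does that on a deliberately simplified, hypothetical topology. The device names and links are my assumptions for illustration, not the school's actual wiring.

```python
from collections import deque

# Hypothetical topology, loosely based on the description above.
LINKS = [
    ("internet", "comcast_gateway"),
    ("comcast_gateway", "main_switch"),
    ("main_switch", "server_1"),
    ("main_switch", "server_2"),
]
SERVERS = {"server_1", "server_2"}


def reachable(links, start, removed):
    """Breadth-first search from start, ignoring links that touch the removed node."""
    neighbors = {}
    for a, b in links:
        if removed in (a, b):
            continue
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def single_points_of_failure(links, start, targets):
    """Nodes whose loss leaves none of the targets reachable from start."""
    nodes = {n for link in links for n in link} - {start}
    return sorted(n for n in nodes
                  if not (reachable(links, start, n) & targets))


if __name__ == "__main__":
    print(single_points_of_failure(LINKS, "internet", SERVERS))
```

On this toy topology the function flags the gateway and the main switch. Losing a single server does not count, because the other one is still reachable. Adding a second switch with its own path to a server would take the main switch off the list but leave the gateway on it.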
Compared to cloud servers, this setup is minor, but it has most of the features found in those systems. Most of the components in the SuperMicro server, like fans and power supplies, are easily replaced, and all the disks are hot-swappable.
Of course, redundancy is not the silver bullet for all things computer. I have had my share of software SNAFUs like not having enough backup space and not trying to do a recovery before backups got overwritten. I still have some issues with automatic system recovery but things could be a lot worse without the redundancy I have built into the system. So hopefully, short of a meteor strike, things will continue to chug along. At least until I have to update all the software.