Mistakes And Backups

I wrote this blog entry once before, but the Internet ate it. More precisely, it was some combination of my network, Opera and everything else in between. I made the usual mistake of writing directly in the data entry tool for our blogs. The whole article was done. I clicked the Save button. And then ...

As you might guess, the article didn't save. It could have been just about anything from our website to the browser. In any case, this iteration goes back to my usual process of writing the article offline. Using the cloud, or any web-based solution, is handy but dependent on many factors. I don't recommend relying on it.

I was going to add some comments on my recent article about Intel's SATA bug in the Sandy Bridge hub controller (see Bad Transistor Causes Billion Dollar Mistake). Instead, I've decided to rant about my recent server upgrades.

I usually run three servers in the lab so I have two levels of hot backup. I know that's a bit of overkill, but if you have had as many systems crash as I have, you learn to love backups. With backups, mistakes related to storage problems, accidental deletions and bad luck usually cost only the time it takes to restore rather than something more severe.

The trick was to migrate to RAID 5 on all the systems. Only my primary server had been using RAID 5, and that was only for half the drives. Now all three servers are running RAID 5 or 6 for all drive arrays.

Most of the time went into making sure all the backups were up to date. Upgrading to a gigabit switch cut the backup time from 90 hours to 9.
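A tenfold drop is about what you would expect if the old switch was a 100 Mbit/s unit, although that is my assumption here and the old switch speed is not stated above. The rough arithmetic can be sketched in Python. The 4 Tbyte data size and the idea that the link runs at full line rate are hypothetical values chosen only to make the numbers concrete.

```python
def backup_hours(data_tbytes, link_mbit_per_s, efficiency=1.0):
    """Estimate hours needed to copy data_tbytes over a network link.

    efficiency is the fraction of the raw line rate actually achieved
    (1.0 means a perfect link, which real backups never quite reach).
    """
    bytes_per_second = link_mbit_per_s * 1e6 / 8 * efficiency
    return data_tbytes * 1e12 / bytes_per_second / 3600


# Hypothetical figures: 4 Tbytes of data, old switch assumed to be 100 Mbit/s.
old_hours = backup_hours(4.0, 100)
new_hours = backup_hours(4.0, 1000)
print(f"old: {old_hours:.1f} h, new: {new_hours:.1f} h, "
      f"speedup: {old_hours / new_hours:.1f}x")
```

Whatever the real data size is, the ratio depends only on the two link speeds, which is why the switch alone can account for a 90-hour job turning into a 9-hour one.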

All the servers are running CentOS 5.5. I started with the tertiary server, which wound up using software RAID. I had to reinstall the operating system because it was a lot easier to let the installer handle the RAID setup. I've done an in-place RAID 1 upgrade in the past, but an in-place RAID 5 upgrade is a bit more challenging.

The new secondary server had a new LSI SAS MegaRAID controller. This server just required copying data and setting up the backup scripts. I did run into a character set problem because some new files on the main server had special characters in their file names. The problem arose because I was running the rsync backups over Samba shares. Getting the right iocharset setting in /etc/fstab, along with a matching setting on the primary server, took care of the problem.
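One way to spot this kind of trouble before a backup run is to scan for file names that might not survive a character set mismatch. The sketch below is a minimal Python example that assumes "special characters" means anything outside plain ASCII. The function names and the /srv/data path are made up for illustration and are not part of my backup scripts.

```python
import os


def is_risky_name(name):
    """Return True if a file name contains any non-ASCII character."""
    return not name.isascii()


def find_risky_names(root):
    """Return the paths under root whose file or directory name is risky."""
    risky = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if is_risky_name(name):
                risky.append(os.path.join(dirpath, name))
    return risky


if __name__ == "__main__":
    # Hypothetical share path; point this at the tree being backed up.
    for path in find_risky_names("/srv/data"):
        print(path)
```

Anything this reports is worth checking on the mounted share to confirm the name looks the same on both ends.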

The primary server was where I ran into the big problem. I was using an Adaptec 4805S SAS RAID controller. The new drives were 1.5 Tbyte Seagate 7200.11 Barracudas. It seems that the controller does not support these, or a number of other Seagate drives I had on hand. Luckily I had a couple of 500 Gbyte and 750 Gbyte Seagate SATA drives that were supported. I came up with an interesting configuration: a RAID 5 virtual drive using 500 Gbytes on each drive, along with a RAID 1 virtual drive using the remaining 250 Gbytes on each of the larger drives. I then combined the two virtual drives into one logical volume using LVM (Logical Volume Manager).
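The usable space in a layout like this is easy to work out. Here is a small Python sketch that assumes two 500 Gbyte and two 750 Gbyte drives. The exact drive count is an assumption made for the example, and the function name is hypothetical.

```python
def mixed_raid_capacity(drive_sizes_gb, raid5_slice_gb=500):
    """Return (raid5_usable, raid1_usable, total) in Gbytes.

    Every drive contributes raid5_slice_gb to the RAID 5 set. Whatever is
    left on the larger drives is mirrored as RAID 1.
    """
    members = [s for s in drive_sizes_gb if s >= raid5_slice_gb]
    if len(members) < 3:
        raise ValueError("RAID 5 needs at least three drives")
    # RAID 5 gives up one member's worth of space to parity.
    raid5_usable = (len(members) - 1) * raid5_slice_gb
    # RAID 1 usable space equals the smallest mirrored slice.
    leftovers = [s - raid5_slice_gb for s in members if s > raid5_slice_gb]
    raid1_usable = min(leftovers) if len(leftovers) >= 2 else 0
    return raid5_usable, raid1_usable, raid5_usable + raid1_usable


# Assumed drive mix: two 500 Gbyte and two 750 Gbyte drives.
print(mixed_raid_capacity([500, 500, 750, 750]))  # (1500, 250, 1750)
```

With that assumed mix, LVM would see a 1500 Gbyte and a 250 Gbyte physical volume, for 1750 Gbytes in the combined logical volume.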

The mixed RAID 5 and RAID 1 configuration actually works well from a hot swap view since both RAID levels tolerate a single drive failure. It is simply a matter of having two rebuilds going on when one of the larger drives is swapped out. As it turns out, this was useful because during this process I found three drives that ended up in the trash.
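To make the swap behavior concrete, here is a toy Python model of the two arrays. The drive labels and the two-plus-two drive mix are assumptions for illustration only, and the model ignores everything a real controller does beyond counting failed members.

```python
# Hypothetical drive labels and sizes in Gbytes.
DRIVES_GB = {"disk0": 500, "disk1": 500, "disk2": 750, "disk3": 750}
RAID5_SLICE_GB = 500


def array_status(failed):
    """Report the state of each array given a set of failed drive labels."""
    raid5_members = set(DRIVES_GB)
    raid1_members = {name for name, size in DRIVES_GB.items()
                     if size > RAID5_SLICE_GB}
    status = {}
    for label, members in (("raid5", raid5_members), ("raid1", raid1_members)):
        lost = len(members & set(failed))
        if lost == 0:
            status[label] = "ok"
        elif lost == 1:
            status[label] = "rebuild"  # single failure: swap and rebuild
        else:
            status[label] = "failed"   # more than one member lost
    return status


print(array_status({"disk2"}))  # {'raid5': 'rebuild', 'raid1': 'rebuild'}
print(array_status({"disk0"}))  # {'raid5': 'rebuild', 'raid1': 'ok'}
```

Losing one of the larger drives touches both arrays, hence the two rebuilds, while losing a smaller drive only affects the RAID 5 set.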

Things are finally back to normal in the lab. Daily backups are running nicely and all the RAID arrays are happy. Hopefully I won't have to repeat this for a couple of years.
