Problem with HDD/Raid Array

For system help, all hardware / software topics NOTE: use Coders Corner for all coders topics.

Moderators: Krom, Grendel

FireFox
DBB Ace
Posts: 440
Joined: Sun Jun 03, 2001 2:01 am

Problem with HDD/Raid Array

Post by FireFox »

Okay, so it's probably just a month now that I've had the RAID 5 setup in my server up and running. All was going fine until I got home today to find my server moaning about something. I investigated and found it to be some of the HDDs (instantly I thought of what Krom said about the more HDDs you have, the more likely you are to run into an HDD failure; guess you were right again lol).

Anyway, it would seem 2 of the 6 HDDs in the array failed. In the BIOS they should be detected as ST2000VX002-1AH166 (Seagate SV35) but they show up as ST_M13FQBL. Somehow I managed to get one drive running again just by booting it to the BIOS on my other PC, but no luck with the other one. Reading up on this ST_M13FQBL, it seems to be a problem with some Seagate drives where it looks like the drive can't access the platter where the correct data is. So I'm now stuck with this drive and wondering: is there anything else I can attempt to fix it that doesn't involve rocket science or voiding the warranty, before I return it to be fixed or replaced under warranty? Second question: it seems to have been a firmware issue on some of the other drives that did the same, so should I be concerned about a repeat, or could this be sorted by updating the firmware of the drives that are running in my server now? (Though I don't know how to do this.)

Thirdly, my RAID array is running at the moment and my data seems intact, as I removed the one persistently failing drive from the array. I'm guessing that if they replace the drive with a new one, I can just insert the new drive in place of the faulty one and the array should rebuild itself, or how does this work? What is the best practice for me at this stage with the data on the array, besides backing it up? Should I avoid adding data until the array is fixed?

Many thanks
All that is necessary for evil to triumph is for good men to do nothing!
Krom
DBB Database Master
Posts: 16114
Joined: Sun Nov 29, 1998 3:01 am
Location: Camping the energy center. BTW, did you know you can have up to 100 characters in this location box?

Re: Problem with HDD/Raid Array

Post by Krom »

Check and see if Seagate has firmware updates for the drives, and then check and see if the update addresses the issue. If they have one available but it doesn't do anything for that particular issue then ignore it. For the drive that dropped with the corrupted firmware, unless there is a specific recovery method listed online and a firmware update that makes the problem go away, send it in for warranty service. HDD manufacturers are generally good and fast about warranty service because a small level of failure is the nature of hard drives.

To update the firmware, the easiest way depends on how Seagate distributes the updates. Most likely it is a small Windows application that flashes the firmware on the drive, after which you have to reboot. In that case the easiest way would be to power off the array, pull the drives, plug them into a Windows machine and update them one at a time. For added comfort you could boot into Hiren's BootCD in its Windows mode and have only the hard drive you want to flash connected to the system.

The purpose of RAID 5 is to provide high uptime, so generally speaking when a drive fails you can replace it and rebuild the array without bringing down the system. However, things don't always go as planned, so keep backups in case it requires a complete wipe and rebuild.
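
For anyone wondering why a RAID 5 rebuild is possible at all, here is a minimal sketch of the idea, not how any particular controller implements it. It assumes one stripe spread over six drives: five data blocks plus one parity block that is the XOR of the other five. The block contents and the `xor_blocks` helper are made up for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a 6-drive RAID 5: five data blocks plus one parity block.
# The byte values are arbitrary sample data.
data = [bytes([value] * 4) for value in (0x11, 0x22, 0x33, 0x44, 0x55)]
parity = xor_blocks(data)
stripe = data + [parity]

# Simulate losing the drive at index 4 and rebuilding its block
# from the five surviving blocks (four data blocks plus parity).
lost = 4
survivors = [block for index, block in enumerate(stripe) if index != lost]
rebuilt = xor_blocks(survivors)

print(rebuilt == stripe[lost])
```

Because there is only one parity block per stripe, this works for exactly one missing drive. With two drives gone there are two unknowns and one equation, which is why a second failure takes a RAID 5 array offline.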

The only other thing to keep in mind is that hard drives have an optimal operating temperature, usually around 30-35 °C, so you should make sure the airflow over the drives and the ventilation in the case are good for every drive. Keeping drives cool is good practice for keeping them running a long time, although some drives are just defective and nothing will keep them running.
FireFox
DBB Ace
Posts: 440
Joined: Sun Jun 03, 2001 2:01 am

Re: Problem with HDD/Raid Array

Post by FireFox »

Okay, I'll go check on their procedure for firmware updates and whether there are any reports of a firmware update for these drives that addresses the issue. Otherwise it was just luck of the draw I guess, rotten luck that is. :x

Currently the array is functioning so the Server is up and running in "critical mode" if I can call it that, thank goodness. So I'm basically set to operate as normal right now and just get them to replace the drive or have them fix it and I'm good to go again after I re-install it in the array.

Good stuff! I consider myself lucky that I got the 1 HDD running, as the array did what it was intended for (redundancy) with the 1 HDD failure. By the looks of it the array went offline when both failed, though I'm not sure now. Somehow I thought I had redundancy for 2 failures in a 6 drive array; I could be wrong, as the system really went all bonkers at one point, dropping 3 drives out of the array due to this 1 drive, I think. But like I said, it is up and running right now, so I will quickly back up all the new data I added and sort this out next week when I can send back the HDD.

Oh and I've got a 120mm cooling fan in front of both banks of drives (3HDD per bank) so they are well ventilated and cooled :wink:
FireFox
DBB Ace
Posts: 440
Joined: Sun Jun 03, 2001 2:01 am

Re: Problem with HDD/Raid Array

Post by FireFox »

Reviving this topic for some help quickly.

The faulty drive was sent back for RMA and I finally got a replacement yesterday. I tried installing it today to rebuild the array. As I understood it, I'm supposed to be able to just plug in the drive, boot up, set it up in place of the faulty drive and all should be good. However, the array is registering it as a single drive and doesn't want to add it in as 1-5, which it is replacing. I've looked in the RAID setup config menus of the onboard RAID but see no way to make it register the new drive as 1-5 instead of as a single drive. The only option I can think of is to delete the whole array and rebuild the entire 6 drive array from scratch :(

So before I started on that endeavour I began making a complete backup of the data on the array, but lo and behold, while doing this I found that some of my data is damaged and I might end up losing some stuff. Luckily, so far it seems to be stuff I seldom use, and what is important is actually itself a backup of the office server, which I can just back up again.

So my first question is: what the heck am I missing to just reintegrate the new drive into the array? Secondly, seeing that there is damaged/corrupted data, won't it be better to salvage the undamaged data and rebuild the array from scratch, as then I'll know that the data I have is all good and I can start afresh?

PS: I've booted the HDD on one of my other PCs and it is registering correctly, so the drive seems good.

THX

[EDIT] Well, this is just stupid. I ended up backing up all my data, with minimal data loss and nothing I can't live without, but when I now try to rebuild the RAID 5 it won't format in Windows like it did the first time. I've mixed up the combinations and narrowed it down to my 2nd HDD, the one that had the same near-death experience as the 5th drive that died but came back after a reboot, so that I was able to save my data (well, most of it). Anyway, this drive keeps me from formatting the 6HDD RAID 5 I wanted to run. The other problem is that the drive actually works in Windows as a stand-alone drive or if I mix it into other RAID combos. This makes me believe I won't be able to RMA this foul piece of hardware; that, and the hassle and cost of doing it aren't worth the effort. So I'm left with some other options.

1) Just go get another 2TB drive to replace this one and rebuild the RAID 5 (10TB nominal, about 9TB as Windows reports it - 1 drive fault tolerant)
2) Set up two 3HDD RAID 5s and mirror them in Win7 (4TB - 2/3 drive fault tolerant: 1 drive per RAID 5, or even 2 drives in one RAID 5 and 1 in the other, and the mirror should still hold)
3) Set up RAID 10 (the motherboard's onboard RAID limits me to only 4 HDDs, thus only 4TB - 1 drive fault tolerant)
4) Set up two 3HDD RAID 0s and mirror them in Win7 [effectively a 6HDD RAID 0+1] (6TB - 1 drive fault tolerant)

I'm leaning towards the 4th option. Option 2 seems the safest, but its cost in disk space is rather high, and I went with RAID 5 in the first place because I was aiming for maximum disk space with fault tolerance. I think 6TB should still do, but 4TB just seems a bit too small: my current data is only 1.5TB, but I accumulated that rather fast, so I don't want to limit myself unnecessarily. Any input?
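
The four layouts above can be compared with a little arithmetic. The sketch below assumes 2TB drives as in the thread. The helper names (`raid5`, `raid0`, `raid10`, `mirror_of`) are made up for illustration, and "guaranteed" means the number of drive failures survived in the worst case, not the lucky case.

```python
DRIVE_TB = 2  # nominal size of each drive in the thread


def raid5(drives, size=DRIVE_TB):
    """Return (usable TB, guaranteed failures survived) for RAID 5."""
    return (drives - 1) * size, 1


def raid0(drives, size=DRIVE_TB):
    """Return (usable TB, guaranteed failures survived) for RAID 0."""
    return drives * size, 0


def raid10(drives, size=DRIVE_TB):
    """Striped mirrored pairs: a second failure may hit the same pair."""
    return (drives // 2) * size, 1


def mirror_of(sub_array):
    """Mirror two identical sub-arrays, each given as (usable TB, guaranteed)."""
    usable, guaranteed = sub_array
    # Data is lost only when BOTH halves are dead. Each half dies after
    # guaranteed + 1 failures, so the worst case needs 2 * (guaranteed + 1)
    # failures; one fewer than that is always survivable.
    return usable, 2 * (guaranteed + 1) - 1


options = {
    "1) 6-drive RAID 5": raid5(6),
    "2) two 3-drive RAID 5s, mirrored": mirror_of(raid5(3)),
    "3) 4-drive RAID 10": raid10(4),
    "4) two 3-drive RAID 0s, mirrored": mirror_of(raid0(3)),
}

for name, (usable_tb, guaranteed) in options.items():
    print(f"{name}: {usable_tb} TB usable, survives any {guaranteed} failure(s)")
```

Under these assumptions option 1 gives 10 TB nominal (roughly 9.09 TiB, which is the figure Windows would show), option 2 gives 4 TB and survives any 3 failures, option 3 gives 4 TB and option 4 gives 6 TB, the last two surviving any single failure. A lucky combination of failures can be survived beyond the guaranteed number in options 2 to 4, but that is not something to plan around.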

PS: All my data is backed up and I will keep this backup for as long as I can. It is currently spread over two of my PCs until I can consolidate it into a single backup, as I'm not looking for another failure scare like this. That is one of the reasons I'm now leaning away from RAID 5: it did save my data, but the rebuild didn't work like it should and I nearly lost everything.
Krom
DBB Database Master
Posts: 16114
Joined: Sun Nov 29, 1998 3:01 am
Location: Camping the energy center. BTW, did you know you can have up to 100 characters in this location box?

Re: Problem with HDD/Raid Array

Post by Krom »

With onboard RAID chips, more often than not it requires a complete rebuild to restore a degraded array; the BIOS RAID modes just aren't robust enough to handle it on the fly. The one drive behaving oddly may not necessarily be a problem with the drive itself; it could be a compatibility problem with the controller, or a defect anywhere in the board/controller/drive/firmware chain. The BIOS RAID modes are usually not tested that extensively and problems could easily slip past the engineers, especially when you start getting past 2 or 3 drives. If you are rebuilding the array anyway, do it in a pure software mode from Windows; it will handle failures much more gracefully.

As for the layout of the two arrays, I have one suggestion: don't automatically mirror them. Certainly RAID modes can provide some protection against hardware failure, but they don't protect against accidental destruction, such as when you delete something you shouldn't have or a virus infects the system and permanently corrupts the partitions. In these cases, having two entirely separate and manually backed up RAID arrays provides the best data security, because even if one of them gets completely and irrecoverably wiped, the other will still be online and safe. Whereas if the OS ties them together completely, anything bad written to one also gets written to the other and can destroy both the primary and the backup together. RAID 1, 5 and 10 are meant to maximize uptime; they are not meant to protect data.
FireFox
DBB Ace
Posts: 440
Joined: Sun Jun 03, 2001 2:01 am

Re: Problem with HDD/Raid Array

Post by FireFox »

Well, if I had known that software RAID is the more robust option I would have set up my system a bit differently, as I've already rebuilt the arrays (2 RAID 0 arrays in the BIOS, which I'm mirroring in Win 7 with Disk Management). I did first try to set this all up in Windows Disk Management, but I could only build the 2 RAID 0 arrays and couldn't figure out how to mirror them until now, after all the data had been restored, which took forever due to the USB 2.0 speed limit on the transfer. I just messed up the mirror and actually think I saw what I missed the first time round when setting it up :x. Frankly, at this point I'm not really looking at rebuilding the entire setup again (I'm re-syncing the mirror right now) as the data transfer just takes too long. Well, that is until I get a better backup option than USB 2.0.

I hear what you are saying about not letting Windows mirror the drives, but I need to confess that I tend not to do backups that often, simply because it takes so long at this point. That isn't really an excuse, and setting up a scheduled backup would defeat the purpose, as you said, because errors can then still creep in.

I think what I should do at this point is run my setup as it is right now (BIOS RAID 0 mirrored in Windows) and get myself an eSATA external hard drive enclosure or hard drive dock where I can slap in a 3TB drive as a dedicated backup, which I will do manually. I don't like the idea of my backup drive being in the same system as the data it protects, because if something really goes south you are up the creek. With an external drive you get even better protection, because when the backup is done you store the drive safely offline.

This way I will have maximum uptime and maximum protection. It is better to spend the extra bit of cash now than to regret it later because you were a bit cheap on the system.
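
Since the thread already turned up damaged files once, a manual backup is worth verifying after the copy. Below is a minimal sketch of one way to do that in Python, using only the standard library; the function names `sha256_of` and `backup_and_verify` are made up for illustration. It copies every file and then compares SHA-256 checksums of source and copy.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_and_verify(source_dir, backup_dir):
    """Copy all files under source_dir into backup_dir.

    Returns the relative paths of files whose copy does not match the source.
    backup_dir is assumed to live outside source_dir.
    """
    source_dir, backup_dir = Path(source_dir), Path(backup_dir)
    mismatches = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        dst = backup_dir / src.relative_to(source_dir)
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also preserves timestamps
        if sha256_of(src) != sha256_of(dst):
            mismatches.append(src.relative_to(source_dir))
    return mismatches
```

You would call it with the array as the source and the external drive as the target, for example `backup_and_verify(r"D:\data", r"E:\backup")` (both paths are placeholders). Note the limits: this only catches errors introduced by the copy itself. A file that was already corrupt on the array is copied faithfully, and right after a copy the second read may be served from the OS cache rather than from the disk.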