AnimeSuki Forums

Old 2008-02-21, 20:25   Link #1
Ledgem
Love Yourself
 
 
Join Date: Mar 2003
Location: Northeast USA
Age: 38
Trouble bringing a RAID 0 array online

Greetings everyone!

Here's the situation. A year or so ago we made a RAID 0 using four 500 GB "GTech" external drives to create a 1.8 TB working array. The RAID was made through Mac OS X's built-in software RAID manager and was formed as a striped array (as opposed to concatenated). The four drives are linked through Firewire 800, and were originally linked to a computer using a Firewire 800 to 400 cable. A few months ago, we lost the power supply to one of the drives during a sewer leak incident. I discovered this when attempting to mount the array, and found that only three of the drives were coming online.
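A quick check on the capacity figure above: four 500 GB drives striped together give 2 TB in decimal units, which comes out to roughly 1.8 when counted in binary units. This is a minimal sketch of that arithmetic, assuming each drive holds exactly 500 x 10^9 bytes (real drives vary slightly) and that the OS reports binary units; the variable names are only illustrative.

```python
# Assumption: each drive is exactly 500 * 10**9 bytes (decimal "500 GB").
DRIVE_BYTES = 500 * 10**9
DRIVE_COUNT = 4

# A striped (RAID 0) set exposes the sum of its members' capacity.
total_bytes = DRIVE_BYTES * DRIVE_COUNT

decimal_tb = total_bytes / 10**12   # what the drive labels advertise
binary_tib = total_bytes / 2**40    # what a binary-unit OS would display

print(f"{decimal_tb:.1f} TB decimal, {binary_tib:.2f} TiB binary")
```

The same sum also shows the downside of striping: all four drives contribute capacity, so losing any one of them takes the whole array offline.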

I just received a replacement power supply, but still only three of the drives will show up. It isn't a drive failure, and I'll explain why. The RAID array is linked to a WD MyBook RAID, which is linked to another WD MyBook RAID, which is linked to another MyBook RAID, which is linked to the computer (that means that the first GTech drive in the link is the fourth daisy-chained device; all connections are Firewire 800).

I traced it back and through trial-and-error initially thought the issue to be that one of the drives had failed - specifically, the third GTech in the link. It seemed odd, however, as the drive would spin up like the others, and the fourth drive would mount. So I went through and changed the link order, making the third drive the first. The drive mounted successfully, and the new third drive would no longer mount. The fourth drive (unchanged from before) mounted successfully.

I'm working with what seems like dozens of Firewire devices here, but from what I recall up to 63 Firewire devices can share a single bus. I'm also ruling out the possibility that it's an issue with too many devices on the line, because in that case the fourth drive should be the one that fails to show up.

When testing, I've attempted to power on all four drives at the same time, and I've also powered them on one-by-one (only proceeding to power on the next drive in line when the previous drive shows up on the system). I have a few more things to try, such as connecting the GTech drives to the computer directly and swapping cables and Firewire ports, but it's the end of the day and I'm heading home. So, to save on time, has anyone ever seen this behavior with a RAID 0 array before? Can you see what's wrong with the setup, or give me some pointers for how to diagnose it and get it up and running quicker?

Thanks in advance.
Old 2008-02-21, 21:02   Link #2
Epyon9283
Geek
 
 
Join Date: Dec 2005
Location: New Jersey
Age: 40
If this was working fine until the power supply blew, I'd strongly suspect a hardware fault of some sort. I'd plug each drive in individually to see if they appear in Disk Utility.
Old 2008-02-21, 21:07   Link #3
Ledgem
Each drive will show up, but out of two different configurations the third drive in the link failed to show. The OS identifies the various drives as Disk5s3, Disk6s3, Disk7s3, and Disk3s3. I verified this by turning on the first drive in the link, saw it show up (in Disk Utility), turned on the second, saw it, turned on the third... nothing for ~1 minute, turned on the fourth, saw it mount. Again, when I swapped the positions of the first and third linked drives, it seemed to be the third drive that failed to show up. Note that the third drive in this case was physically different from the one in the first trial. It's very strange...
Old 2008-02-22, 03:00   Link #4
Jinto
Asuki-tan Kairin ↓
 
 
Join Date: Feb 2004
Location: Fürth (GER)
Age: 43
Weird... though maybe the RAID controller or RAID software somehow memorized not to use the third device in the chain because it is dead. But then again, it's a problem of physical chaining, not logical chaining, so that does not make sense (unless the drives change their logical IDs when physically replugged). Maybe you can reset some flag or trigger in the RAID setup (I have never worked with Macs).
__________________
Folding@Home, Team Animesuki
Old 2008-02-22, 10:00   Link #5
Epyon9283
See if there's anything in the system log when the third drive doesn't appear.
Old 2008-02-22, 20:56   Link #6
Ledgem
I attached each drive to the computer directly, one by one, unlinked. The fourth drive - the one whose power supply failed due to water damage - didn't show up like the others. The only pertinent message I could find in the system.log was this:
Code:
Feb 22 17:44:15 ZVUK kernel[0]: FireWire (OHCI) Apple ID 52 built-in: no valid selfIDs for more than 2 minutes after bus reset.
I presume that means that the enclosure itself is shot? That was connected to the enclosure's Firewire 400 port. I connected it via Firewire 800 to the MyBook chain and it knocked the connected MyBook offline. I unplugged the cable from the GTech and plugged it into the second Firewire 800 slot and nothing happened - no log messages, either (except for a dozen "IOResources: match category DigiIO exists" messages, but I think that's from something else).
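For anyone hunting through a long system.log for messages like the one above, here is a rough sketch of a filter that keeps only kernel lines mentioning FireWire. It assumes the syslog-style layout seen in the quoted line; the function name is made up for illustration, and the second sample line's timestamp and prefix are invented so the filter has something to reject.

```python
import re

# Matches syslog-style kernel lines whose message starts with "FireWire".
# The layout is assumed from the single log line quoted in the post.
FIREWIRE_LINE = re.compile(
    r"^(?P<when>\w{3}\s+\d+ \d\d:\d\d:\d\d) (?P<host>\S+) "
    r"kernel\[\d+\]: (?P<msg>FireWire.*)$"
)

def firewire_events(lines):
    """Return (timestamp, message) pairs for kernel FireWire messages."""
    events = []
    for line in lines:
        match = FIREWIRE_LINE.match(line.strip())
        if match:
            events.append((match.group("when"), match.group("msg")))
    return events

sample = [
    # Taken verbatim from the post.
    "Feb 22 17:44:15 ZVUK kernel[0]: FireWire (OHCI) Apple ID 52 built-in: "
    "no valid selfIDs for more than 2 minutes after bus reset.",
    # Timestamp and prefix invented for illustration only.
    "Feb 22 17:45:02 ZVUK kernel[0]: IOResources: match category DigiIO exists",
]

for when, msg in firewire_events(sample):
    print(when, "|", msg)
```

Feeding it the lines of the real log file instead of `sample` would list every bus-reset complaint with its timestamp, which makes it easier to line them up with when each drive was plugged in.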

I did leave the drive wired and connected with the other GTechs for ~20 minutes, and within that timeframe the drive did show up. However, the system viewed it as unformatted. It's strange behavior, as a regular unformatted disk would still show up right away. This one takes its time. I could have sworn that it showed up right away in previous tests... I'll play with the configuration some more, but it seems rather suspicious.

In the event that the drive really is problematic, what's the best solution - crack the enclosure open and free the drive? I can have the department order a separate external enclosure no problem, but I'd rather ensure that the enclosure is the problem and not the drive. Any ideas, or suggestions for things to check for?
Old 2008-02-23, 02:46   Link #7
Tiberium Wolf
Senior Member
 
 
Join Date: Dec 2004
Location: Portugal
Age: 44
Just take out the HDD and see if it's detected. There aren't that many possibilities: it's either the enclosure or the HDD. When simple hardware detection takes that long, it's a hardware problem.
BTW, I've had a few instances where the PSU went bang and took 1 or 2 components with it.
Old 2008-02-23, 14:42   Link #8
Ledgem
Quote:
Originally Posted by Tiberium Wolf
Just take out the HDD and see if it's detected.
Since this is an external HD, taking out the HD will involve cracking open the enclosure and probably rendering it unusable. I'd rather try some alternate options to ensure that it's the enclosure before doing that, but given how things have been going it seems like I'll get around to that soon enough.
Old 2008-02-23, 19:44   Link #9
Tiberium Wolf
Quote:
Originally Posted by Ledgem
Since this is an external HD, taking out the HD will involve cracking open the enclosure and probably rendering it unusable. I'd rather try some alternate options to ensure that it's the enclosure before doing that, but given how things have been going it seems like I'll get around to that soon enough.
Rendering it unusable!? Don't all enclosures let you take the HDD out and put another one in whenever you want?
Old 2008-02-23, 20:47   Link #10
Ledgem
If you buy the enclosure, sure. But if you buy one of those pre-made drives in an enclosure, very few of them are made to let you take the drive out and put a new one in. For example, I've got a few LaCies and you'd have to take a hammer to them to get the drive out - there are no screws or anything. These GTechs seem to have some cracks that I can pry open. The only external drives I've seen that have screws and include instructions for removing and adding drives were the MyBook Premium and Professional editions. Those external drives are actually two drives in a single enclosure, so it sort of makes sense.
Old 2008-02-23, 23:06   Link #11
Epyon9283
I had (operative word) an external Maxtor drive. It was out of warranty and started making the clicking noise of death. Wanting to check the SMART status on the drive (can't do it through USB), I pried the enclosure open, took the drive out, and stuck it in a PC to check it. The drive tested fine but is still making the noise. Prying the enclosure open broke it real good. I had to toss it. It was clipped together... No screws anywhere. My LaCie looks the same way. I just got a Seagate FreeAgent drive that doesn't look easy to open either.
Old 2008-02-24, 05:30   Link #12
Tiberium Wolf
Geh. Then don't buy those kinds of enclosures where you can't take the HDD out without breaking anything. I never saw any of those for sale. But then again, my last one (my third) was bought 4 years ago. Anyway, it's kind of a waste if you can't swap the HDD for some reason.
Old 2008-02-24, 12:21   Link #13
Ledgem
I agree with that for home use, but my job isn't to piece HDs and enclosures together - it's to work with media applications. The purchasing department would also rather find one item than hunt for two. As a result I have dozens of premade external drives in my office. At home I only have one premade external drive, and that was because I was desperate and the price was too good to pass up.
Old 2008-02-28, 22:33   Link #14
Ledgem
Just to update this and provide the conclusion:

I took the drive home so that I could work on it with all of my tools. The enclosure did actually have screws (one of which was covered by a "warranty void if this seal removed" sticker) and allowed for a drawer containing the drive to slide out. Sure enough, there was plenty of bluish dust and staining on the chipset, indicative of oxidation from the water damage. The drive itself also had some stains on it, presumably from the water. The drive was held by mounting brackets that were screwed in with smaller screws too tightly for me to remove all of them, so I settled for removing the enclosure chipset and wedging my IDE to USB adapter into there. It took longer than expected, but the drive mounted, and my system recognized it as a RAID slice.

The firewire on the chipset seems to work correctly, but it seems that some of the corrosion hit the area where the PATA connector is (there was also a bit of corrosion along the connector that attached to the chipset). That's likely where the problem is. I'll probably bring my adapter into work, form the RAID, dump its contents onto other drives, and then recreate the RAID using only three of the original four drives. I could request an external enclosure for this drive, but enclosures with Firewire 800 are incredibly expensive. It's also clear that this drive was touched by water, so I'm not sure how it'll hold up to rigorous usage (it's a Hitachi).

I will say that I am very impressed with these external drives, though. This is an external drive from Hitachi's "GDrive" line. The enclosure is relatively small and in addition to being a metallic material, it sports a heatsink at the bottom of the enclosure as well.

I'll also mention the Vantec IDE/SATA to USB adapter - it's an incredibly useful tool, and as far as I'm concerned it's essential if you want to be able to field emergencies and are using a laptop.
Old 2008-02-29, 02:34   Link #15
Jinto
Good that you found a solution and that the data isn't lost.
Old 2008-02-29, 16:06   Link #16
Ledgem
Well, I'm hoping that there's no corruption. I have about 1.7 TB of data to transfer off; three of the drives are connected via Firewire but the one with the damaged enclosure is linked via an IDE to USB converter, and I think it's slowing the entire thing down. Estimated 14 hours to transfer 360 GB, when I transferred 450 GB in ~4-5 hours over a Firewire -> 1 Gbps Ethernet -> Firewire connection? Pah...
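To put numbers on that comparison, here is a small sketch that converts both transfers into average throughput. It assumes decimal gigabytes and takes 4.5 hours as the midpoint of the "~4-5 hours" estimate; the helper name is made up for illustration.

```python
def rate_mb_per_s(gigabytes, hours):
    """Average throughput in MB/s, using decimal units (1 GB = 1000 MB)."""
    return gigabytes * 1000 / (hours * 3600)

# Current copy: RAID set with one member on the IDE-to-USB adapter.
usb_mixed = rate_mb_per_s(360, 14)

# Earlier copy over Firewire and gigabit Ethernet (4.5 h assumed as midpoint).
firewire_net = rate_mb_per_s(450, 4.5)

print(f"mixed USB/Firewire array: {usb_mixed:.1f} MB/s")
print(f"Firewire over Ethernet:   {firewire_net:.1f} MB/s")
print(f"slowdown factor:          {firewire_net / usb_mixed:.1f}x")
```

Since every stripe in a RAID 0 read needs a chunk from each member, the slowest link tends to set the pace for the whole array, which would fit the suspicion that the USB adapter is the bottleneck.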

Just for fun, here's the setup I'm working with (showing the successfully loaded RAID set; I'll have to look up what Apple RAID 2.0 is):
Old 2008-04-21, 12:59   Link #17
Ledgem
RAID 1+0 Issue

Greetings everyone - yet another problem, but I didn't want to make a new thread for it.

I've put the four drives that I was dealing with before into a RAID 0+1 array (Apple calls this a "RAID 10"). Since one of the drives is potentially unstable I figured that this would be the safest thing to do. I had a hiccup with it once, when the unstable drive did something and the system reported that one of the two mirrors needed to be rebuilt, but other than that it's been working very well.

Now there's a new problem. It would seem that someone tripped on a surge protector over the weekend because at least two of my systems went down (the primary one didn't) and virtually all of my external HDs seem as if they were dismounted and remounted - except for the RAID 0+1 set. I was able to access data from the RAID 0+1 without a problem.

The system seemed a bit sluggish, so I rebooted it. A message flashed about how the RAID 0+1 wasn't recoverable by this system, but the message disappeared within a second or two. I checked the RAID status in Disk Utility and it reported that the RAID was healthy. I checked the RAID itself and can only see two of my data folders. However, the old data still has to be there because the amount of space free is around the same that there was before I rebooted. How do I recover it? Here are the pertinent messages that were in the console:

Code:
4/21/08 10:32:52 AM mds[37] (/Volumes/RAID Mirror ZX/.Spotlight-V100/Store-V1/Stores/B97ACB4B-2F10-44BE-A3F1-C19FA03C781C)(Error) IndexCI in indexRestoreHeaderFromBuffer:Invalid version (255080341) expected (63) 
4/21/08 10:32:52 AM mds[37] (/Volumes/RAID Mirror ZX/.Spotlight-V100/Store-V1/Stores/B97ACB4B-2F10-44BE-A3F1-C19FA03C781C)(Error) IndexCI in ContentIndexOpenBulk:Unclean shutdown of /Volumes/RAID Mirror ZX/.Spotlight-V100/Store-V1/Stores/B97ACB4B-2F10-44BE-A3F1-C19FA03C781C/0.; needs recovery 
4/21/08 10:32:52 AM mds[37] (/Volumes/RAID Mirror ZX/.Spotlight-V100/Store-V1/Stores/B97ACB4B-2F10-44BE-A3F1-C19FA03C781C)(Error) IndexCI in indexRestoreHeaderFromBuffer:Invalid version (781074352) expected (63)
I really, really need this data back. I'm willing to seek out data recovery software if that's what it takes, but I'm hoping it can be accomplished through the Disk Utility or the terminal.

Comments about whether RAID 0+1 is a bad idea are also appreciated...
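On the RAID 0+1 versus RAID 1+0 question: the two names are often used interchangeably (this post does it too), but they describe different layouts with different tolerance for a second drive failure. The sketch below is a simplified model, not a description of how Apple's software RAID behaves. It assumes four drives with the hypothetical labels A to D, grouped as (A, B) and (C, D), and counts which two-drive failures each layout survives.

```python
from itertools import combinations

DRIVES = ["A", "B", "C", "D"]   # hypothetical labels
GROUPS = [{"A", "B"}, {"C", "D"}]

def raid10_survives(failed):
    # RAID 1+0 (stripe of mirrors): each group is a mirror pair.
    # The array lives as long as no mirror pair has lost both drives.
    return all(not group <= failed for group in GROUPS)

def raid01_survives(failed):
    # RAID 0+1 (mirror of stripes): each group is a stripe set.
    # The array lives as long as at least one stripe set is fully intact.
    return any(not (group & failed) for group in GROUPS)

def count_survivable(survives):
    """Number of two-drive failure combinations the layout survives."""
    return sum(survives(set(pair)) for pair in combinations(DRIVES, 2))

total = len(list(combinations(DRIVES, 2)))
print(f"RAID 1+0 survives {count_survivable(raid10_survives)} of {total} double failures")
print(f"RAID 0+1 survives {count_survivable(raid01_survives)} of {total} double failures")
```

Neither layout protects against what happened here, though: a power interruption that corrupts the filesystem gets written to both halves of the mirror.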

Edit: Here's an output from Disk Utility while doing a "verify disk" and attempting repair:
Code:
Verifying volume “RAID Mirror ZX”
Invalid content in Journal
Checking Journaled HFS Plus volume.
Checking Extents Overflow file.
Checking Catalog file.
Incorrect block count for file Baron Muenchhausen - German
(It should be 2838272 instead of 6210447)
Incorrect block count for file Bolshoi Ballet in Sergei Prokofiev's Romeo and Juliet1
(It should be 2782208 instead of 6061585)
Incorrect block count for file Lecture - Prof Edwin Perkins - Epidermis - Summer 1977
(It should be 61312 instead of 1776974)
Incorrect block count for file Swept Away
(It should be 6478720 instead of 6537137)
Incorrect block count for file The Two Traditions (Th#53D4-MPEG-2 6.2Mbps 2-pass.m2v
(It should be 311974 instead of 566371)
Incorrect block count for file Voices and Vi-ITM-00000001
(It should be 60928 instead of 160830)
Incorrect block count for file Voices and Vi-ITM-00000002
(It should be 42496 instead of 160830)
Invalid node structure
The volume RAID Mirror ZX needs to be repaired.

Error: Filesystem verify or repair failed.
Verify and Repair volume “RAID Mirror ZX”
Checking Journaled HFS Plus volume.
Checking Extents Overflow file.
Checking Catalog file.
Invalid sibling link
Volume check failed.

Error: Filesystem verify or repair failed.
Think Disk Warrior might be better for the job? I may attempt to disconnect one of the two mirrors and see if it makes a difference - I'm not thrilled with the idea of degrading the RAID again, though.
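The "Incorrect block count" pairs in the output above are easier to compare once they are pulled into a table. Here is a rough sketch of a parser for that two-line pattern, assuming the exact wording shown in this Disk Utility output; the function name is made up, and the sample report is a shortened copy of the lines above.

```python
import re

# Two-line pattern as it appears in the Disk Utility output above.
MISMATCH = re.compile(
    r"Incorrect block count for file (?P<name>.+)\n"
    r"\(It should be (?P<expected>\d+) instead of (?P<found>\d+)\)"
)

def block_count_mismatches(report):
    """Return (file name, expected blocks, recorded blocks, difference)."""
    rows = []
    for m in MISMATCH.finditer(report):
        expected = int(m.group("expected"))
        found = int(m.group("found"))
        rows.append((m.group("name"), expected, found, found - expected))
    return rows

# Shortened excerpt of the verify output quoted in the post.
report = """Checking Catalog file.
Incorrect block count for file Swept Away
(It should be 6478720 instead of 6537137)
Incorrect block count for file Voices and Vi-ITM-00000001
(It should be 60928 instead of 160830)
Invalid node structure
"""

for name, expected, found, diff in block_count_mismatches(report):
    print(f"{name}: expected {expected}, recorded {found}, off by {diff}")
```

A list like this is mainly useful for deciding which files to re-check first after a repair, since the affected names are exactly the ones the catalog disagreed about.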

Last edited by Ledgem; 2008-04-21 at 13:22.
Old 2008-04-21, 14:57   Link #18
Ledgem
OK, the issue is resolved, although I'm not really sure how or why.

1) After running Disk Utility the RAID would not mount.
2) I ran Disk Warrior, which immediately found the RAID but identified it as "IndexState." It was able to run a recovery and found all of my old directories, but barely any files. It gives you the option of previewing the changes before committing them to the disk - the new disk would have 800 GB free. Current disk had closer to 100 free. Canceled changes.
3) Rebooted system again. Console had a lot of complaints about the RAID, but it mounted. Console claimed that "fsck" would be forced on the drive upon next mount. RAID status was still healthy; all files seemed to be on the drive.
4) Ran Disk Warrior again. The problems that it found were different from before. The preview didn't look any different than what was actually there, but I decided to let the system do its business and did not commit the changes. Disk Warrior unmounted the drive and was not able to mount it again.
5) Rebooted. No complaints in the Console. Ran Disk Utility and did a repair on the RAID. There were a number of things to be repaired, but the repair was successful. Everything's good!

Great way to start a Monday. I'd imagine that this problem was exacerbated by the fact that I'm working with a RAID 1+0, but it probably would have occurred with a RAID 0 as well. The recovery was probably more due to the system background processes than RAID recovery. I'd still like remarks about the RAID 1+0 - I think it's the only choice for me. After I botched a manual mirror recovery by trying to do it through the GUI (long story), I have it set to automatically rebuild mirrors. Is this a bad idea? I'll look into it further when I have the time, but for now I'm just happy that it seems that I won't need to redo a month's worth of work.

Edit: Seems like many of the files can't be opened. Disk Warrior reports that they have issues in the headers - an unrecoverable problem. Curses...

Some of the files are OK but it seems like very few things were spared. All in all I have to redo about two weeks' worth of work. Data corruption from a power outage can't be prevented at the software level, but I do have to wonder if the automatic rebuilding of the RAID 1 messed something up worse than it would have been.

Last edited by Ledgem; 2008-04-21 at 20:06.

All times are GMT -5.

