|
This topic comprises 3 pages: 1 2 3
|
Author
|
Topic: DSS-200 Degraded Array Question
|
Jim Cassedy
Phenomenal Film Handler
Posts: 1661
From: San Francisco, CA
Registered: Dec 2006
|
posted 07-22-2019 03:42 PM
Today, at one of the screening rooms I work at, I noticed a "degraded array" warning. When I investigated further, the system showed Drive 1 with 16 reallocated sectors and Drive 2 with 15 reallocated sectors.
I was just at this location two days ago and I KNOW this warning was not present. So far, it hasn't in any way affected operations.
I know they need to look at getting some new drives in there, but in the short term, my question is: CAN I DO A RAID RE-BUILD WITH 2 DRIVES DOWN LIKE THIS?
I need to use this server again for a press screening on Thursday, and I don't have the time to get/install new drives B4 then.
Since, so far, it's not affecting operations, I'm tempted to leave it "as is" until I can deal with it next week. (This room isn't used every day)
| IP: Logged
|
|
Mike Renlund
Film Handler
Posts: 71
From: San Francisco
Registered: Feb 2008
|
posted 07-22-2019 04:59 PM
Hi Jim,
Reallocated sectors are a leading indicator of drive failure, but having reallocated sectors doesn't necessarily mean that the drives are bad. I had a site with a major structural failure that shook the building very hard, and we gained a few dozen reallocated sectors...but that was from the mechanical shock. After that the drives were still fine.
The DSS200 will attempt to rebuild on its own, so at least one of those drives may be bad since it is still showing the error.
It is advisable to try commanding a rebuild just in case; doing so won't affect playback. I'd also look at the front panel and see if one or both drives show red.
If it is only one, power down, pull the drive, and power up again. If the unit still works, get a new drive for the system.
If the unit doesn't boot, you know that two drives are acting badly (you should be able to put the one drive back and limp along while you get two new drives, then wipe the system to create a new RAID). Obviously take note of settings in the config script, any serial commands and settings, and copy all the needed content off.
You can also put system logs through the Dolby Log Analyzer to see additional information.
If you need help on any of these steps, contact our team at CinemaSupport@dolby.com. We can look deeply into the logs and easily tell if you have one or two drives that are bad.
Mike Renlund Dolby Laboratories
| IP: Logged
|
|
Leo Enticknap
Film God
Posts: 7474
From: Loma Linda, CA
Registered: Jul 2000
|
posted 07-22-2019 08:35 PM
I look after a few DSS200s and 220s showing between 1 and 100 reallocated sectors, one of them on all the drives, literally for years. Of course I advise replacement of any drive showing reallocated sectors as soon as I see it (whenever I touch a server, I download a log package and put it through the online analyzer, usually as the first thing I do), but sometimes, the end user doesn't regard it as a priority to do that. The server still works, right?
If the DCP you need for Thursday is ingested and you know that it plays OK, then if it were me, I'd leave the server powered up, and deal with the problem after your screening is out of the way.
Again, if it were me, I'd put a log from your server through the Dolby log analyzer. If it shows that those drives have racked up more than about 43,800 hours (which is equivalent to five years of spinning: there are 8,760 hours in a non-leap year, and as a very rough rule of thumb, enterprise grade SATA drives are said to be good for five years before the risk of failure starts to increase significantly), replace the whole set and do a clean install of the software. Take the opportunity to open up and clean out the server case (e.g. with a paintbrush or Datavac), too.
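The hours-to-years arithmetic behind that rule of thumb can be sketched as follows. This is only an illustration: the function names are made up for the example, the five-year limit is the rough rule of thumb from the post rather than a manufacturer figure, and a year is taken as 24 x 365 = 8,760 hours (a leap year has 8,784).

```python
# Rough conversion of drive power-on hours to years of continuous spinning.
# Names and the default limit are illustrative assumptions, not Dolby values.

HOURS_PER_YEAR = 24 * 365  # 8,760; a leap year has 24 * 366 = 8,784


def hours_to_years(power_on_hours: float) -> float:
    """Convert power-on hours to years of 24/7 operation."""
    return power_on_hours / HOURS_PER_YEAR


def past_rule_of_thumb(power_on_hours: float, limit_years: float = 5.0) -> bool:
    """True if the drive has run longer than the rule-of-thumb limit."""
    return hours_to_years(power_on_hours) > limit_years


if __name__ == "__main__":
    for hours in (20_000, 43_800, 50_000):
        years = hours_to_years(hours)
        flag = "replace" if past_rule_of_thumb(hours) else "ok for now"
        print(f"{hours:>6} h = {years:.2f} years -> {flag}")
```

The power-on hours themselves would come from the log package or the drive's SMART data; the sketch only does the arithmetic.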
2TB enterprise grade drives are now so cheap that, IMHO, it is silly to take the risk of a RAID going out, stopping a show, and maybe having to give out 100 refunds.
| IP: Logged
|
|
|
Jim Cassedy
Phenomenal Film Handler
Posts: 1661
From: San Francisco, CA
Registered: Dec 2006
|
posted 07-23-2019 09:21 AM
Thanks for the replies. I actually wasn't supposed to work that screening yesterday and got called in at the very last minute & managed to squeeze it in before another commitment afterwards. I don't know why I didn't think of pulling & analyzing the logs, except that I was in a rush to get to my next location. Duh!
quote: Mike Renlund The DSS200 will attempt to rebuild on its own,
After the screening, I tried re-booting the serverthingy, and it automatically started doing a 'raid re-build'. This morning, I logged in remotely and the re-build had completed, and the "degraded array" error message was gone.
So for now, things seem to be under control, although obviously I need to talk to 'the boss' about getting some new drives in there. I've installed new drives before, so I'm familiar with the procedure, and I think there's also an instructional document on how to do it on the Dolby Customer website, which I have access to in case I need to refresh my memory. The Approved Replacement Drive List is also there IIRC.
quote: Leo Enticknap If the DCP you need for Thursday is ingested and you know that it plays OK, then if it were me, I'd leave the server powered up, and deal with the problem after your screening is out of the way.
Actually, the Press Screening in question was supposed to happen last Friday, but at the last minute was re-scheduled to this Thursday, so the content IS already ingested, and I just got a new KDM, which I was able to (remotely) ingest this morning with no problem.
So for now, I, too, believe that "If it works, don't F*** with it" (at least until after Thursday) is the best path forward.
The room isn't being used again until then, & I don't have time to go down there before then anyway. - - but when I do go in on Thursday, just for fun, I'm going to pull the logs and run them through the Dolby Log Analyzer & see what it sez.
quote: Leo Enticknap it is silly to take the risk of a RAID going out, stopping a show, and maybe having to give out 100 refunds
Agreed, although in our case it would only be ONE refund, albeit a big one!
| IP: Logged
|
|
Marcel Birgelen
Film God
Posts: 3357
From: Maastricht, Limburg, Netherlands
Registered: Feb 2012
|
posted 07-23-2019 10:51 AM
Keep in mind that a degraded array will affect performance considerably, and a RAID array that's rebuilding will affect it even more. Also keep in mind that, depending on disk size and load on the machine, a RAID rebuild can take anywhere from hours to days...
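To put rough numbers on "hours to days", here is a back-of-the-envelope sketch. It assumes the rebuild has to write the whole capacity of one member disk at a steady effective rate; the rates used below are invented for illustration and are not measured DSS200 figures. A busy server would sit toward the slow end.

```python
# Back-of-the-envelope RAID rebuild time: member capacity / effective rate.
# The example rates are hypothetical, not measurements from any real server.


def rebuild_hours(capacity_tb: float, rebuild_mb_per_s: float) -> float:
    """Estimate hours needed to rewrite one member disk of the given size."""
    if rebuild_mb_per_s <= 0:
        raise ValueError("rebuild rate must be positive")
    capacity_mb = capacity_tb * 1_000_000  # decimal terabytes to megabytes
    return capacity_mb / rebuild_mb_per_s / 3600


if __name__ == "__main__":
    for rate in (100, 30, 10):  # MB/s, idle machine down to heavily loaded
        hours = rebuild_hours(2, rate)
        print(f"2 TB at {rate:>3} MB/s: {hours:5.1f} h ({hours / 24:.1f} days)")
```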
Also remember that an array with a failed member that has been re-inserted into the array has a big chance of failing again after a while. Often, when a disk fails during a show, it will have an impact on the show, like a short to medium hiccup during the presentation.
Also, a faulty disk can sometimes slow your array to a crawl, so much that it even impacts normal playback. If you have a disk that's failing, but always answers back just before the RAID controller or RAID software would eject it, it can cause some severe issues with performance. Some Film-Tech members have experienced such behavior before. RAID controllers and RAID software usually aren't smart enough to detect this kind of behavior, and as long as the disk keeps on giving back the requested result, they will hang on to the lagging disk.
I consider disks to be a consumable. You usually should swap them out after 3 to 5 years, depending a lot on the usage pattern on them. But for a normal server, playing one to three shows every day, I'd say replacing them every 3 years is good practice.
Luckily, compared to almost anything else on this equipment, disks are pretty cheap.
| IP: Logged
|
|
Leo Enticknap
Film God
Posts: 7474
From: Loma Linda, CA
Registered: Jul 2000
|
posted 07-23-2019 10:56 AM
On servers with full-sized 3.5" SATA drives that are left running 24/7, our experience is that after around 30,000 hours, the chance of bad (reallocated, as Dolby Show Manager labels them) sectors appearing increases from almost zero to low, but the chance of an outright drive failure remains near zero. After around 40,000 to 45,000 hours, a drive is almost guaranteed to have some bad sectors, and the chance of a drive failure becomes significant.
On IMS type servers with 2.5" drives, those figures become 20,000 and 30,000 hours. They just don't last as long, possibly because they operate at a higher temperature.
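As a rough sketch, those hour figures could be turned into a simple lookup. The band names ("low", "watch", "replace") and the function are invented for this example, and the thresholds are only the field-experience numbers quoted above (using the lower 40,000 figure for 3.5" drives), not manufacturer specifications.

```python
# Map power-on hours to a rough risk band using the figures quoted above.
# Band names are made up for illustration; thresholds are anecdotal.

THRESHOLDS = {
    "3.5": (30_000, 40_000),  # full-sized SATA drives, running 24/7
    "2.5": (20_000, 30_000),  # IMS-type servers with 2.5" drives
}


def risk_band(form_factor: str, power_on_hours: int) -> str:
    """Return 'low', 'watch' or 'replace' for a drive of the given size."""
    first, second = THRESHOLDS[form_factor]
    if power_on_hours < first:
        return "low"      # bad sectors and outright failure both unlikely
    if power_on_hours < second:
        return "watch"    # bad sectors may start to appear
    return "replace"      # bad sectors likely, failure risk significant


if __name__ == "__main__":
    for size, hours in (("3.5", 25_000), ("2.5", 25_000), ("3.5", 45_000)):
        print(f'{size}" drive at {hours} h: {risk_band(size, hours)}')
```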
| IP: Logged
|
|
Marcel Birgelen
Film God
Posts: 3357
From: Maastricht, Limburg, Netherlands
Registered: Feb 2012
|
posted 07-24-2019 10:51 AM
It depends on the use case of the SSD. An SSD can theoretically read an unlimited amount of data, but it cannot write an unlimited amount of data, because its cells survive only a limited number of write cycles. For some applications, SSDs are more reliable than rotating rust because of this.
The write cycle limit of an SSD is pretty predictable and can be closely monitored. The problem is that when you put them into a RAID array, all with the same age, they're going to fail awfully close together. So you need to replace them before you run into the wall.
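A minimal sketch of that kind of monitoring, assuming the drive has a rated write endurance (often quoted as TBW, terabytes written) and that you can read the total written so far from its SMART data. All numbers and names below are hypothetical, and a real drive may die before or long after its rated figure.

```python
# Estimate days left before an SSD reaches its rated write endurance.
# Ratings, totals and write rates here are made-up illustration values.


def days_until_rated_limit(rated_tbw: float, written_tb: float,
                           tb_per_day: float) -> float:
    """Days of writing left at the current rate before the rated TBW."""
    if tb_per_day <= 0:
        raise ValueError("daily write volume must be positive")
    remaining_tb = max(rated_tbw - written_tb, 0.0)
    return remaining_tb / tb_per_day


if __name__ == "__main__":
    # Same-age members of one array see almost the same write load,
    # so their estimates land within days of each other.
    written_so_far = {"ssd0": 450.0, "ssd1": 452.0, "ssd2": 449.0}
    for name, written in written_so_far.items():
        days = days_until_rated_limit(600.0, written, 0.5)
        print(f"{name}: about {days:.0f} days to rated limit")
```

The near-identical results for the three hypothetical drives are the point: in an array they all approach the limit together, so the whole set has to be planned for replacement at once.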
The failure of an SSD is indeed mostly binary, it's usually completely dead. Data recovery from an SSD, if possible at all, is even more expensive than from a platter.
The biggest advantage of an SSD is the far shorter access time to random data and therefore far larger "IOPS". This performance gain is primarily experienced in stuff like databases.
For standard DCI servers, there isn't much to be gained. One advantage, though, is that with SSDs you have less reason to worry about bandwidth problems on your storage medium when we start seeing stuff like 60fps 4K or even 120fps 4K content...
| IP: Logged
|
|
|
All times are Central (GMT -6:00)