Reinstall RAID to degraded

Originally posted by Marcel Birgelen:
As you can see, there are no numbers on the Y axis. The graph is an approximation of the service life of your average hard disk and I think it matches pretty well with my experiences over the years: There is definitely a spike in infant mortality for new disks. Those are the disks that probably suffered from some small hardware defects that play out within a year or so. Disks that make it past a year or so are almost never problematic for the next 3 to 4 years. After that, you clearly see a larger percentage of disks dying due to simple wear.
Still, it doesn't mean a failure is guaranteed to happen in 10 years, but stuff starts to add up over the years. Mechanical parts start to wear out, grease starts to gunk up and seals start to get porous. Your mileage may vary between hard drive models.
I've seen some hard drives in datacenters last far longer than 10 years in some odd boxes, but those drives have operated in optimal conditions: a room with almost constant temperature and optimal humidity, almost no dust, and the disks have probably never been powered down. Still, another disk in that same datacenter, being part of some heavily used RAID array, may not see its third year in service, simply due to the constant heavy-duty I/O on the disk.
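For anyone who wants to play with the shape being described, here is a minimal sketch of a bathtub-style failure curve, modelled as the sum of a falling infant-mortality hazard, a flat random-failure hazard, and a rising wear-out hazard. All the parameters are invented for illustration; they are not measured drive statistics.

```python
# Minimal sketch of the bathtub curve described above: total failure rate
# = falling infant-mortality hazard + flat background hazard + rising
# wear-out hazard. All numbers are invented, not measured drive data.

def weibull_hazard(t: float, shape: float, scale: float) -> float:
    """Standard Weibull hazard rate h(t) = (k/s) * (t/s)**(k-1)."""
    return (shape / scale) * (t / scale) ** (shape - 1)

def bathtub_hazard(t: float) -> float:
    infant = weibull_hazard(t, shape=0.5, scale=8.0)   # high early, fades fast
    background = 0.01                                  # constant random failures
    wearout = weibull_hazard(t, shape=4.0, scale=9.0)  # climbs after ~5 years
    return infant + background + wearout

if __name__ == "__main__":
    for year in (0.1, 0.5, 1, 2, 3, 4, 5, 6, 8, 10):
        print(f"year {year:>4}: ~{bathtub_hazard(year):.3f} failures/drive/yr")
```

Printed over ten years, the rate starts high, bottoms out somewhere around years 2 to 4, and climbs steeply after year 5 or so, which matches the description above.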
Originally posted by Carsten Kurz:
Marcel - that means, if he buys a new drive set, his chances for a drive failing are the same as with keeping the old set. ;-)
So, what's the benefit for Shai if he buys a new drive set, compared to continuing to use his old set? ;-)
Peace of mind, and enhanced reliability. Sure, he could replace just the one that has actually failed, and thereafter keep a close eye on the server and replace the others as they go. However, given that all four drives were presumably installed new at the same time, and one has already failed, what are the odds of two more failing at about the same time? Higher than I would like to risk, and if that happens, the screen is down. In fact, this happened to a customer of mine just last week, though thankfully on their TMS rather than one of the screen servers. The content drive simply disappeared, and when I looked remotely, I saw that the RAID controller was reporting that two of the four drives had gone bad.
Not sure about prices in Israel, but here, 4 x 2TB enterprise drives can be had on Amazon for around $350. That is a very small price to pay to insure against a show stopping halfway through and having to refund 200 customers.
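To put rough numbers on that risk, here is a back-of-envelope sketch: one drive of four is already dead, so what are the odds that at least one of the three survivors fails soon? It assumes independent failures with a hypothetical annual probability p; same-age, same-batch drives are usually correlated, so treat these figures as a floor rather than an estimate.

```python
# Back-of-envelope risk sketch: with one drive of four dead, estimate the
# chance that at least one of the three survivors fails within a window.
# Assumes independence and hypothetical per-drive annual failure rates.

def p_any_fail(n_drives: int, p_annual: float, years: float = 1.0) -> float:
    """P(at least one of n drives fails within the window), independence assumed."""
    p_window = 1 - (1 - p_annual) ** years
    return 1 - (1 - p_window) ** n_drives

for p in (0.02, 0.05, 0.10):  # hypothetical annual failure rates for aged drives
    print(f"p = {p:.0%}/yr -> P(>=1 of 3 fails within a year) = {p_any_fail(3, p):.1%}")
```

Even at a modest 5% per drive per year, the chance that one of the three survivors fails within the year comes out around 14%, and correlation from a shared batch only pushes that up.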
How would peace of mind be justified if the risk of his new drives failing is the same as for his old drives?
(Just playing devil's advocate here.)
What I would do personally is either upgrade the RAID to 4 x 4TB, or just buy another 2TB replacement drive and put it on the shelf for now. So far, this drive has not exhibited any particular issue. It would be different if the issue had shown up during regular operation, but in this case it seems that something just went wrong when he reinitialized the RAID.
Like I indicated before, the graph is a bit overstated, at least in my experience. The chances of an old drive failing are higher than those of a new one. Also, the only way to get "over the hill" is to start climbing it; otherwise you'll be stuck in limbo forever. And new drives come with a warranty, which old ones don't.
And yeah, some enterprise storage vendors proactively mix their batches of disks to work around the so-called bathtub curve...
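A toy Monte Carlo sketch of why that mixing helps, using invented numbers: if a "bad batch" gives every drive in it a high failure rate, an array built from a single batch sees far more double failures than one built from mixed batches, even though the average failure rate per drive is identical.

```python
# Toy Monte Carlo: correlated same-batch failures vs. mixed batches.
# Batch quality and failure rates are invented purely for illustration.
import random

BATCH_RATES = (0.02, 0.02, 0.02, 0.30)  # hypothetical: 1 in 4 batches is a lemon

def p_double_failure(mixed: bool, trials: int = 100_000, drives: int = 4) -> float:
    """Estimate P(>=2 of the array's drives fail within the window)."""
    double = 0
    for _ in range(trials):
        if mixed:
            # each drive drawn from its own batch: batch quality independent
            rates = [random.choice(BATCH_RATES) for _ in range(drives)]
        else:
            # whole array from one batch: a lemon batch hits every drive
            rates = [random.choice(BATCH_RATES)] * drives
        if sum(random.random() < r for r in rates) >= 2:
            double += 1
    return double / trials

print(f"single batch: P(>=2 failures) ~ {p_double_failure(mixed=False):.2%}")
print(f"mixed batches: P(>=2 failures) ~ {p_double_failure(mixed=True):.2%}")
```

The expected number of failed drives is the same either way; mixing only trims the tail where two drives die together, which is exactly the case that takes down a RAID array.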