Pinpoint failing HDD on IMS1000 RAID

  • #16
    With no SMART info available, the best choice is to use one of the modern drive test programs and set the drives up for testing on a separate computer. If the bad drive has failed blocks, the test program can sometimes map them out and leave the drive usable again. These programs will also read the SMART data. A full scan of a 1 TB drive does take 4+ hours, though.

    Comment


    • #17
      You'll need an archive tool like 7z to look in the Doremi logs. The path will be: drmreport.tgz\drmreport.tar\doremi\opt\smart\
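      If you would rather script the lookup than click through 7z, here is a minimal Python sketch. It assumes the log package is an ordinary gzip-compressed tar whose members sit under `doremi/opt/smart/`, as the path above suggests; the function name `list_smart_logs` is made up for illustration. Symlinks are listed with their targets, since they appear with zero size in an archive listing.

```python
import tarfile


def list_smart_logs(package_path, smart_dir="doremi/opt/smart/"):
    """Return sorted (name, size, link_target) tuples for entries under smart_dir.

    Assumption: the package is a plain .tgz with the SMART logs stored under
    doremi/opt/smart/. link_target is "" for regular files.
    """
    found = []
    # "r:*" lets tarfile detect the gzip layer on its own, so the .tgz does
    # not have to be unpacked into a .tar first.
    with tarfile.open(package_path, "r:*") as archive:
        for member in archive.getmembers():
            name = member.name
            if name.startswith("./"):
                name = name[2:]  # some archives store a leading "./"
            if not name.startswith(smart_dir):
                continue
            if member.isfile():
                found.append((name, member.size, ""))
            elif member.issym():
                # A symlink has no data of its own, hence the zero size.
                found.append((name, member.size, member.linkname))
    return sorted(found)
```

      Calling `list_smart_logs("drmreport.tgz")` returns the entries without extracting anything to disk.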

      The SMART logs on the IMS1000 are just the basics, from what I can tell. The 3WARE controller of the DSS was wonderful because you could see how the drives functioned as a team and whether one of the drives had a notably longer access time. If such a thing exists on an IMS1000 (or the later variants), I haven't found it.

      Comment


      • #18
        @Mark
        Great, I will do so, after replacing the drives, to get some more details about them.
        I was wondering if I could do or check something before removing them. Or, to be precise, to work out which one to remove first.

        @Steve
        Yes, that's a valid solution as well, without having to go through a huge list of data.
        The truth is I overlooked that folder before, because my first impression was that the files were either too old (eight years old, some of them) or zero size. That might have to do with sym-linking. I am not sure.
        As for drive administration on the IMS1000, it is limited compared to its predecessors, the SV3/4 and DCP2K/2K4. (I have installed one or two IMS2000s but never had to troubleshoot one, so I have no experience there.)

        Comment


        • #19
          You can only do as Steve suggests and get into the logs. I also prefer the 3ware in the old DSSs, but 3ware no longer exists in that form, having been bought out by Promise. I suppose you could also just send the logs in to Dolby and they could get back to you. I did all GDC for my customers, and the full-size servers allow you to go to the SMART logs instantly, but starting with the SX-3000 I had to send logs in. GDC is always very fast at responding to those requests... less than an hour... because they assume the tech is waiting at the theater to switch out the bad drive. Had Dolby kept the slick DSS GUI, I would have gone that route. But everyone I put in an ISS for hated the non-intuitive GUI.

          Comment


          • #20
            Mark,

            Dolby has adopted Doremi's Log analyzer, so things like critical SMART data are extracted and presented... if anything is out of spec, it is flagged. This is true of many other parameters. The Log analyzer, which has been ported over to handle DSS servers too, covers a lot of ground in a relatively short period of time. That said, it goes for the low-hanging fruit rather than digging too deep. But it will tell you if you are underflowing and that sort of thing without having to plow through the bazillions of logs that a Doremi system seems to take and stuff in odd places. I've made a point of always checking with it first because it can save you a lot of digging, as it already knows about any common problems.

            http://loganalyzer.dolbycustomer.com/

            Comment


            • #21
              Originally posted by Mark Gulbrandsen
              GDC is always very fast at responding to those requests...
              ...which is just as well, because their log packages are encrypted, meaning that I can't dive into them myself, even if I were prepared to have a go at it.

              Comment


              • #22
                Yes, the entire log package on all models has always been encrypted. But considering that I have had just two legitimate problems out of over 300 GDC servers, that has not mattered. One had a bad stick of memory, and the other had a bad Super Micro M.B.

                Comment


                • #23
                  You are unique in your experiences, Mark. Encrypting the logs is BS.

                  Comment


                  • #24
                    Sony did the same beginning with some of their later SRX-R5xx/8xx software releases. It sucks.

                    Comment


                    • #25
                      Admittedly, it is a PITA.
                      Sony has encrypted their log packages too, since the SRX-R51X that is. (Carsten, we were writing at the same time.)

                      The good thing about the Dolby log analyzer is its universal access (unlike Barco nowadays) and its pointers.
                      It's also great that instead of keeping the log file with you, you can bookmark the link to the analysis.
                      It would be great, though, if that "reallocated event count" was there already. Especially on the user interface.
                      I can't avoid the comparison with the "system (RAID - Drive X - Reallocated Sectors)" counter on the DSS.

                      As an update, I had a look at a log from about three weeks before the latest one and found that the reallocated events on the drive in question, all 12 of them, took place in the meantime.
                      So, there is no repurposing for that unit in any form or fashion. No good door stoppers were ever made out of 2.5" drives.
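                      For anyone who wants to repeat that before/after comparison on their own logs, here is a rough Python sketch. The helper name `smart_raw_deltas` is made up, and the snapshots are plain dictionaries typed in by hand; only the counts mentioned in this thread are used (0 events in the older log, 12 in the latest).

```python
def smart_raw_deltas(earlier, later):
    """Return {attribute_name: increase} for raw values that grew between two snapshots.

    Attributes missing from the earlier snapshot are skipped, because there
    is nothing to compare them against.
    """
    deltas = {}
    for name, new_value in later.items():
        old_value = earlier.get(name)
        if old_value is not None and new_value > old_value:
            deltas[name] = new_value - old_value
    return deltas


# Older log: no reallocation events yet on the drive in question.
earlier = {"Reallocated_Event_Count": 0}
# Latest log: 12 events, while the reallocated sector count still reads 0.
later = {"Reallocated_Sector_Ct": 0, "Reallocated_Event_Count": 12}

print(smart_raw_deltas(earlier, later))  # {'Reallocated_Event_Count': 12}
```

                      A counter that climbs between two logs taken a few weeks apart is the thing to look for; a single snapshot cannot show that.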

                      Comment


                      • #26
                        The IMS does show reallocated sector counts on the UI:

                        Screen Shot 2021-09-02 at 9.54.08 AM.png

                        Second line down (this is on an IMS1000 but it is true on the IMS2000 and IMS3000). Check out that temperature! It gets awfully hot in there!

                        Comment


                        • #27
                          Originally posted by Steve Guttag
                          The IMS does show reallocated sector counts on the UI:

                          Second line down (this is on an IMS1000 but it is true on the IMS2000 and IMS3000). Check out that temperature! It gets awfully hot in there!
                          Yes, we are thinking of using sea water for cooling, like they did in Fukushima!
                          But we don't want to flood the Atlantic ocean with heavy-film-industry-waste.

                          I saw that line there, but it is missing the following:

                          196 Reallocated_Event_Count 0x0032 100 100 000 - - - 12

                          I can't fill you in on the difference between "Reallocated Event Count" and "Reallocated Sector Count", or on how one can be 12 and the other null (doesn't an event create a sector?), but it seems to make a difference in practice, as far as frame underflow is concerned. Because of that, it would be great if that value was visible.

                          To demonstrate the difference in the logs, the full info is as follows:

                          Code:
                          === START OF READ SMART DATA SECTION ===
                          ID# ATTRIBUTE_NAME          FLAG   VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
                            1 Raw_Read_Error_Rate     0x000b 100   100   062    -    -       -           0
                            2 Throughput_Performance  0x0005 100   100   040    -    -       -           0
                            3 Spin_Up_Time            0x0007 240   240   033    -    -       -           94489280514
                            4 Start_Stop_Count        0x0012 100   100   000    -    -       -           326
                            5 Reallocated_Sector_Ct   0x0033 100   100   005    -    -       -           0             <-- This is shown
                            7 Seek_Error_Rate         0x000b 100   100   067    -    -       -           0
                            8 Seek_Time_Performance   0x0005 100   100   040    -    -       -           0
                            9 Power_On_Hours          0x0012 001   001   000    -    -       -           64968
                           10 Spin_Retry_Count        0x0013 100   100   060    -    -       -           0
                           12 Power_Cycle_Count       0x0032 100   100   000    -    -       -           318
                          191 G-Sense_Error_Rate      0x000a 100   100   000    -    -       -           0
                          192 Power-Off_Retract_Count 0x0032 099   099   000    -    -       -           249
                          193 Load_Cycle_Count        0x0012 100   100   000    -    -       -           326
                          194 Temperature_Celsius     0x0002 214   214   000    -    -       -           171799347228
                          196 Reallocated_Event_Count 0x0032 100   100   000    -    -       -           12            <-- This is not shown
                          197 Current_Pending_Sector  0x0022 100   100   000    -    -       -           0
                          198 Offline_Uncorrectable   0x0008 100   100   000    -    -       -           0
                          199 UDMA_CRC_Error_Count    0x000a 200   200   000    -    -       -           0
                          223 Load_Retry_Count        0x000a 100   100   000    -    -       -           0
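                          Since that value is not on the UI, a small script can pull it out of the log instead. The following is only a sketch: it assumes the attribute rows are whitespace-separated in the ten-column layout above, and the choice of attributes to watch (5, 196, 197, 198) and the "anything above zero" rule are my own assumptions, not something the IMS or Dolby defines. The function names are made up.

```python
# Attribute IDs treated as reallocation-related here (an assumption for this sketch).
WATCHED_IDS = {5, 196, 197, 198}


def parse_smart_attributes(text):
    """Map attribute ID -> (name, raw_value) for each row of a SMART attribute table."""
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        # A data row starts with a numeric ID and has at least ten columns;
        # the header line and any other text are skipped.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        try:
            raw_value = int(fields[9])
        except ValueError:
            continue  # raw value not a plain integer, ignore the row
        rows[int(fields[0])] = (fields[1], raw_value)
    return rows


def flag_reallocation(rows):
    """Return (id, name, raw_value) for watched attributes with a nonzero raw value."""
    return [
        (attr_id, rows[attr_id][0], rows[attr_id][1])
        for attr_id in sorted(WATCHED_IDS)
        if attr_id in rows and rows[attr_id][1] > 0
    ]


sample = """\
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
5 Reallocated_Sector_Ct 0x0033 100 100 005 - - - 0
196 Reallocated_Event_Count 0x0032 100 100 000 - - - 12
197 Current_Pending_Sector 0x0022 100 100 000 - - - 0
198 Offline_Uncorrectable 0x0008 100 100 000 - - - 0
"""

print(flag_reallocation(parse_smart_attributes(sample)))
# [(196, 'Reallocated_Event_Count', 12)]
```

                          Run against the table above, only attribute 196 is flagged, which is exactly the line the UI leaves out.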

                          Comment
