Film-Tech Cinema Systems
Film-Tech Forum ARCHIVE


  

   
Topic: Fixing recursive error but reboot is needed?
Dustin Grush
Film Handler

Posts: 14
From: Johnstown, PA, USA
Registered: Apr 2018


 - posted 07-30-2019 11:04 PM
Hello All,
Upon starting our Doremi ShowVault tonight, the boot sequence stopped and the bottom of the screen displayed "Fixing recursive error but reboot is needed!" A quick search of the forum turned up nothing for that message, but another post suggested removing any USB or ingest drives and rebooting. I had forgotten to take out the latest Trailmix drive, so I removed it and did a hard reset, and everything started up as normal. The drive had been in since Friday, and the server had booted successfully three times since then (drive-in, one double feature nightly, with shutdown afterwards). I have also been getting a "SMART values for dev/sde updated" warning, but when I check the control panel all drives show OK.

Is there anything I should be checking or planning to replace? This is our fifth year with this system. I have e-mailed our tech but haven't received a reply yet. Just figured it wouldn't hurt to check here also. Thanks in advance.


Dave Macaulay
Film God

Posts: 2321
From: Toronto, Canada
Registered: Apr 2001


 - posted 07-31-2019 12:09 AM
I do not know what causes this but I've seen it several times. A reboot has always resolved it. It is not due to drives being left connected.
I did ask Dolby about it, but I no longer have their reply. As I recall it wasn't super helpful... just to reboot.


Marcel Birgelen
Film God

Posts: 3357
From: Maastricht, Limburg, Netherlands
Registered: Feb 2012


 - posted 07-31-2019 12:35 AM
It's a Linux kernel thing, often related to problematic device drivers. It often comes in pairs, as in, you can often find another error in your logs. I guess something triggers the kernel to do an infinite recursive operation and the kernel detects it and aborts it, which is a good thing, because otherwise you would have a hanging system.

It can actually be triggered by faulty hardware like a faulty external drive. In some cases, the occurrence is an indication of dying hardware.


Carsten Kurz
Film God

Posts: 4340
From: Cologne, NRW, Germany
Registered: Aug 2009


 - posted 07-31-2019 10:00 AM
Obviously a rare error, and not strictly Doremi/Dolby related; it's a generic Linux one. What version of the Doremi/Dolby software are you running on this machine?

I wouldn't be too concerned. Do save a detailed report from the ShowVault (in Diagnostic Tool -> 'Detailed Report') to a USB stick, and upload it to Dolby's Log Analyzer to be on the safe side. It's free and can't do any harm. If you haven't done it before, this is a good time to learn it.

http://loganalyzer.dolbycustomer.com

- Carsten


Dustin Grush
Film Handler

Posts: 14
From: Johnstown, PA, USA
Registered: Apr 2018


 - posted 07-31-2019 06:41 PM
Thanks to all who replied above. I took Carsten's advice and ran the log analyzer when I got in this evening, and the results are below. Everything booted uneventfully. I have yet to try to ingest another CRU drive. I replaced the theatre name with an * to protect the guilty [Big Grin] Any thoughts are greatly appreciated. I figured it would be a good idea to put this here to help the next person:
ingestc Log file Truncated by 'cleanlog.sh'
in drmreport/doremi/log/ingestc.log :
Line 1:[Mon Jul 29 20:34:21 EDT 2019] *** Truncated by 'cleanlog.sh' ***

The system had to truncate the above log files. As a consequence some logged information has been erased.
Drive unhandled error
in drmreport/doremi/log/kern.log :
Line 7766:Jul 31 01:02:12 2019 * kernel: sd 7:0:0:0: [sdh] Unhandled error code

Line 7777:Jul 31 01:02:12 2019 * kernel: sd 7:0:0:0: [sdh] Unhandled error code

An unhandled error occurred on a drive.
CRU Failure (frozen)
in drmreport/doremi/log/kern.log :
Line 5475:Jul 28 19:50:41 2019 * kernel: ata5: exception Emask 0x10 SAct 0x0 SErr 0x4010000 action 0xe frozen

Line 5498:Jul 28 20:00:04 2019 * kernel: ata5: exception Emask 0x10 SAct 0x0 SErr 0x4040000 action 0xe frozen

Line 5546:Jul 28 21:42:15 2019 * kernel: ata5: exception Emask 0x10 SAct 0x0 SErr 0x4010000 action 0xe frozen
(This one is odd. There was no CRU drive in the bay at this time.)

The drive in the CRU is not responding. This could mean a drive issue or a CRU reader issue.
Note: Drive sde is the drive in the ingest CRU, also known as ata5 in the system.
Note: On some units the CRU may be plugged into the ata6 motherboard socket; in that case this error would mean a CD-ROM issue.
(It took a bit longer to shut down last night. The screen said something about USB, but I don't remember exactly what.)
Bad connection between the motherboard and an unknown hdd
in drmreport/doremi/log/kern.log :
Line 7772:Jul 31 01:02:12 2019 * kernel: Buffer I/O error on device sdh1, logical block 2

Line 7773:Jul 31 01:02:12 2019 * kernel: Buffer I/O error on device sdh1, logical block 3

Line 7783:Jul 31 01:02:12 2019 * kernel: Buffer I/O error on device sdh1, logical block 32

Line 7784:Jul 31 01:02:12 2019 * kernel: Buffer I/O error on device sdh1, logical block 33

Line 7786:Jul 31 01:02:12 2019 * kernel: Buffer I/O error on device sdh1, logical block 4

Line 7787:Jul 31 01:02:12 2019 * kernel: Buffer I/O error on device sdh1, logical block 5

Line 7788:Jul 31 01:02:12 2019 * kernel: Buffer I/O error on device sdh1, logical block 6

Line 7789:Jul 31 01:02:12 2019 * kernel: Buffer I/O error on device sdh1, logical block 7

Line 7790:Jul 31 01:02:12 2019 * kernel: Buffer I/O error on device sdh1, logical block 8

Line 7791:Jul 31 01:02:12 2019 * kernel: Buffer I/O error on device sdh1, logical block 9

I/O errors have been detected in kern.log; this log monitors all hardware connected to the motherboard.
An I/O error means one of the drives is faulty and this normally means the drive must be replaced.
Bad connection between the motherboard and an external USB hdd sdh
in drmreport/doremi/log/kern.log :
Line 7771:Jul 31 01:02:12 2019 * kernel: end_request: I/O error, dev sdh, sector 2064

Line 7782:Jul 31 01:02:12 2019 * kernel: end_request: I/O error, dev sdh, sector 2304

I/O errors have been detected in kern.log; this log monitors all hardware connected to the motherboard.
An I/O error means one of the drives is faulty and this normally means the drive must be replaced.
sdh is an external drive connected to the system using USB.
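
[Editor's sketch] The analyzer groups these kern.log entries by hand; the same tally can be approximated with a few lines of Python. This is only a rough illustration: the two regular expressions are written against the message formats quoted above, the sample lines are copied from this post, and the function name `summarize_io_errors` is made up for the example. Stripping trailing digits to fold sdh1 into sdh is an assumption that holds for sdX-style names only.

```python
import re
from collections import Counter

# Patterns written against the two kernel message formats quoted in this post.
BUFFER_ERR = re.compile(r"Buffer I/O error on device (\w+), logical block (\d+)")
REQUEST_ERR = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")

SAMPLE = """\
Jul 31 01:02:12 2019 * kernel: end_request: I/O error, dev sdh, sector 2064
Jul 31 01:02:12 2019 * kernel: Buffer I/O error on device sdh1, logical block 2
Jul 31 01:02:12 2019 * kernel: Buffer I/O error on device sdh1, logical block 3
Jul 31 01:02:12 2019 * kernel: end_request: I/O error, dev sdh, sector 2304
"""


def summarize_io_errors(lines):
    """Count I/O error lines per disk, folding partitions into their disk."""
    counts = Counter()
    for line in lines:
        match = BUFFER_ERR.search(line) or REQUEST_ERR.search(line)
        if match:
            # Assumption: sdX-style names, so "sdh1" belongs to disk "sdh".
            disk = re.sub(r"\d+$", "", match.group(1))
            counts[disk] += 1
    return counts


counts = summarize_io_errors(SAMPLE.splitlines())
for disk, total in counts.items():
    print(f"{disk}: {total} I/O error lines")
```

If every hit lands on one removable device, as it does here with sdh, that points at the external drive rather than the internal RAID members.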
NTP no server suitable for synchronization found
in drmreport/doremi/log/time.log :
Line 3714:[Mon Jul 29 20:34:37 2019][ERROR]: ntpdate: no server suitable for synchronization found

Line 3716:[Mon Jul 29 20:34:44 2019][ERROR]: ntpdate: no server suitable for synchronization found

Edit the configuration file /doremi/etc/ntpservers and specify a valid NTP server under the variable: NTPSERVERS="" Find an NTP server close to your location at www.pool.ntp.org, or if you have a TLMS, set the target to the IP address of the TLMS server. To make sure the server is reachable from the Doremi server, ping it from a terminal window.
It is highly recommended to use NTP. If the server drifts and the RTC goes outside the allowed amount, it is not covered under warranty.
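
[Editor's sketch] A ping only shows that the host answers; it does not show that it serves time. As a hedged alternative, the snippet below sends a minimal SNTP client request and reads back the server's transmit timestamp. It is meant to be run from any PC on the same network, not on the ShowVault itself, and the function names are invented for this example. It assumes UDP port 123 is open between the two machines.

```python
import socket
import struct

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_EPOCH_OFFSET = 2208988800


def build_request() -> bytes:
    """48-byte SNTP request: LI=0, version=3, mode=3 (client), rest zeroed."""
    return b"\x1b" + 47 * b"\x00"


def parse_transmit_time(packet: bytes) -> float:
    """Return the server's transmit timestamp as Unix time (seconds)."""
    if len(packet) < 48:
        raise ValueError("short NTP packet")
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_EPOCH_OFFSET + fraction / 2**32


def query(server: str, timeout: float = 3.0) -> float:
    """Ask one server for the time; raises socket.timeout if it stays silent."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_request(), (server, 123))
        packet, _ = sock.recvfrom(512)
    return parse_transmit_time(packet)


# Example (needs network access):
#   import time
#   print(query("pool.ntp.org") - time.time())  # rough offset in seconds
```

A timeout here would match the "no server suitable for synchronization found" lines above: the configured server is either unreachable or not answering NTP.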
(I have since deleted a few things to get below 85%.)
RAID Partion md0 has been overused
in drmreport/doremi/log/sensors.log :
Line 10682:2019-07-24T20:36:23-04:00,1564014983,STOR0,active,96%,active,active,active

Line 10696:2019-07-25T20:49:17-04:00,1564102157,STOR0,active,96%,active,active,active

Line 10707:2019-07-26T20:06:48-04:00,1564186008,STOR0,active,96%,active,active,active

Line 10719:2019-07-27T20:56:40-04:00,1564275400,STOR0,active,87%,active,active,active

Line 10730:2019-07-28T19:48:49-04:00,1564357729,STOR0,active,87%,active,active,active

Line 10744:2019-07-29T20:34:45-04:00,1564446885,STOR0,active,89%,active,active,active

Line 10759:2019-07-30T20:38:18-04:00,1564533498,STOR0,active,89%,active,active,active
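
[Editor's sketch] For anyone who wants to watch this without waiting for the analyzer, the sensors.log lines above are plain comma-separated values. The snippet below pulls out the percentage and flags readings over a threshold. The column meanings (ISO timestamp, epoch, array name, state, percent used) are inferred from the lines quoted here and are not documented anywhere in this thread; the 85% default is simply the figure mentioned in the post.

```python
SAMPLE = """\
2019-07-26T20:06:48-04:00,1564186008,STOR0,active,96%,active,active,active
2019-07-27T20:56:40-04:00,1564275400,STOR0,active,87%,active,active,active
2019-07-30T20:38:18-04:00,1564533498,STOR0,active,89%,active,active,active
"""


def parse_sensor_line(line):
    """Split one sensors.log line into (timestamp, array, percent used)."""
    fields = line.strip().split(",")
    # Assumed layout: field 0 = ISO timestamp, 2 = array name, 4 = usage in %.
    return fields[0], fields[2], int(fields[4].rstrip("%"))


def over_threshold(lines, threshold=85):
    """Return the readings whose usage is strictly above the threshold."""
    hits = []
    for line in lines:
        if not line.strip():
            continue
        timestamp, array, percent = parse_sensor_line(line)
        if percent > threshold:
            hits.append((timestamp, array, percent))
    return hits


for timestamp, array, percent in over_threshold(SAMPLE.splitlines()):
    print(f"{timestamp} {array} at {percent}%")
```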

Thanks again.


Leo Enticknap
Film God

Posts: 7474
From: Loma Linda, CA
Registered: Jul 2000


 - posted 07-31-2019 07:52 PM
This is a wild guess, but it worked for me once in the past, with a DCP2K4 that was doing weird s*** similar to this (e.g. complaining that a CRU drive was causing it to lock up, when the CRU bay was empty and had been since the previous reboot), and for which no rational explanation could be found.

Replacing the BIOS settings battery on the motherboard fixed it.

I tried this, because my experience of PCs is that they can become unstable, throw random BSODs, etc., if they're running with a totally dead BIOS battery, and I happened to have spare BR2032s and CR2032s with me. As soon as I put a new one in, all the problems went away.

I now replace them as a matter of routine whenever I open up an old school server, unless I know that it has been replaced within the last year.


Dustin Grush
Film Handler

Posts: 14
From: Johnstown, PA, USA
Registered: Apr 2018


 - posted 07-31-2019 08:10 PM
Leo,
Thanks for the suggestion. I'm willing to give it a shot. Do you or anyone else know offhand what type and size of battery it takes? And is there anything special about replacing it? I've never done one before. Does it just pop in and out? Is this the same as the motherboard battery, or should that one be replaced as well?
Thanks in advance.


Leo Enticknap
Film God

Posts: 7474
From: Loma Linda, CA
Registered: Jul 2000


 - posted 07-31-2019 08:49 PM
I mean the motherboard BIOS/CMOS battery, not the one on the Dolphin media block, if your server has one. There is a very specific procedure for replacing the Dolphin battery, and not following it can brick your media block. I would suggest not touching that one.

I can't remember finding any PC or server motherboard with a BIOS battery in it that isn't either a CR2032 or a BR2032. Either will work as a replacement: I would suggest the BR if your server spends long periods of time powered down, and a CR if it is powered up most of the time. Their chemistry is slightly different: BRs are designed for low current drain over a long period, whereas CRs are optimized for occasional bursts of power draw.

Note - you may wish to go into the BIOS settings screen and make a note of anything that isn't default before swapping the battery; though I have to confess that I've replaced Doremi batteries having forgotten to do this, and they booted OK afterwards and worked without any problems.


Dustin Grush
Film Handler

Posts: 14
From: Johnstown, PA, USA
Registered: Apr 2018


 - posted 07-31-2019 09:03 PM
Thanks again. They are on order. This unit has an IMB and I've done that battery already, following the procedure, nervously but successfully, twice. Oddly enough we were going to change the server battery at our last service but I forgot to order them and it never got done. [Roll Eyes]


Leo Enticknap
Film God

Posts: 7474
From: Loma Linda, CA
Registered: Jul 2000


 - posted 07-31-2019 09:13 PM
The IMB battery is much easier than the Dolphin (old school Doremi SDI media block) board: you don't have to attack it vertically (or go to the hassle of taking the server totally out of the rack in order to be able to put it on its side and work with the battery holder horizontal), and can ease the battery out with a spudger, rather than have to mess around with a plastic straw over the spring clamp to avoid the risk of accidentally shorting the two contacts as you remove the battery from the holder (a short would nuke the certificate and brick the board). Doing all that in three minutes, plus getting the new one in, is not something I look forward to. I haven't lost a Dolphin yet, though.


Dave Macaulay
Film God

Posts: 2321
From: Toronto, Canada
Registered: Apr 2001


 - posted 07-31-2019 09:29 PM
I keep the server plugged in (just one power cord) so that standby power is on; this keeps the Dolphin RTC and certs alive (according to one of the Doremi support guys). It takes the worry out of changing the battery, even when a server is well past the "change battery now" date. Just don't drop a battery onto the semi-powered motherboard; a tissue spread below it offers some protection.
This is also good for replacing the motherboard battery, since standby power keeps the BIOS settings while there's no battery.


Carsten Kurz
Film God

Posts: 4340
From: Cologne, NRW, Germany
Registered: Aug 2009


 - posted 08-01-2019 03:57 AM
I even do Dolphin battery swaps fully powered. I put some plastic sheet below the battery holder to prevent the battery from dropping onto conductive parts. Also, I use a small piece of sticky tape and plastic pliers to pry out the battery carefully.

Isn't that 'frozen error' about ata5 normal when no drive is inserted? Also note that the log contains older events, not just the current state. So, if a drive had issues at some point in the past, they may still turn up as log analyzer events, though stamped with the date they actually occurred.

- Carsten


Marcel Birgelen
Film God

Posts: 3357
From: Maastricht, Limburg, Netherlands
Registered: Feb 2012


 - posted 08-01-2019 04:56 PM
No, it's a warning you're about to ingest a very profitable, yet terrible Disney movie.

But let me let it go...

Sorry, bad humor day today.

Yes, the "frozen" message is normal when the CRU bay is empty and nothing particular to worry about.
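
[Editor's sketch] For the curious, the SErr value in those "frozen" lines is a bit field, and the set bits can be listed with a short script. The bit positions and short names below are as recalled from the Linux libata error handler and should be treated as an assumption to check against your own kernel source; the helper names are invented for this example.

```python
import re

# Assumed SError bit layout (bit number -> short name), recalled from the
# Linux libata error handler. Verify against your kernel before relying on it.
SERR_BITS = {
    0: "RecovData", 1: "RecovComm",
    8: "UnrecovData", 9: "Persist", 10: "Proto", 11: "HostInt",
    16: "PHYRdyChg", 17: "PHYInt", 18: "CommWake", 19: "10B8B",
    20: "Dispar", 21: "BadCRC", 22: "Handshk", 23: "LinkSeq",
    24: "TrStaTrns", 25: "UnrecFIS", 26: "DevExch",
}

SERR_FIELD = re.compile(r"SErr (0x[0-9a-fA-F]+)")


def decode_serr(value: int) -> list[str]:
    """List the names of the bits set in an SErr value, lowest bit first."""
    return [name for bit, name in sorted(SERR_BITS.items()) if value & (1 << bit)]


def decode_line(line: str) -> list[str]:
    """Find the SErr field in a kern.log line and decode it, if present."""
    match = SERR_FIELD.search(line)
    if match is None:
        return []
    return decode_serr(int(match.group(1), 16))


line = "kernel: ata5: exception Emask 0x10 SAct 0x0 SErr 0x4010000 action 0xe frozen"
print(decode_line(line))
```

Under that assumed table, both values in the report (0x4010000 and 0x4040000) decode to link-state and device-exchange flags rather than data or CRC errors, which would fit a hot-swap bay being emptied or probed rather than a failing disk.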




All times are Central (GMT -6:00)  

The Film-Tech Forums are designed for various members related to the cinema industry to express their opinions, viewpoints and testimonials on various products, services and events based upon speculation, personal knowledge and factual information through use, therefore all views represented here allow no liability upon the publishers of this web site and the owners of said views assume no liability for any ill will resulting from these postings. The posts made here are for educational as well as entertainment purposes and as such anyone viewing this portion of the website must accept these views as statements of the author of that opinion and agrees to release the authors from any and all liability.

© 1999-2020 Film-Tech Cinema Systems, LLC. All rights reserved.