Film-Tech Cinema Systems
Film-Tech Forum ARCHIVE


  

   
Author Topic: Failed LMS/TMS
Aaron Mange
Film Handler

Posts: 3
From: Tucson, Az, u.s.a
Registered: Sep 2016


 - posted 12-09-2019 05:00 PM
Our IBM x3650 M4 server, running Windows Server 2008 R2 and Cinedigm TCC, has entered a reboot cycle. We are unable to boot from USB or disc; this includes Windows install, Linux live and Acronis recovery discs and USB drives.

Initially we received a BSOD APC index mismatch error. Nothing we have tried, including outside support (Sonic), has worked. Before we drop an outrageous amount on a replacement, are there any last ideas I might try?

Leo Enticknap
Film God

Posts: 7474
From: Loma Linda, CA
Registered: Jul 2000


 - posted 12-09-2019 05:25 PM
If it won't boot anything, this says to me that this is a hardware problem, not a hard drive or corrupted software problem.

Maybe replace the BIOS battery, and pull and reseat the memory boards as a first step?

Aaron Mange
Film Handler

Posts: 3
From: Tucson, Az, u.s.a
Registered: Sep 2016


 - posted 12-09-2019 05:46 PM
Thank you for the tips.
I was pretty sure it was a hardware issue as soon as it failed to boot from any bootable disc or drive. Sonic has continued to suggest I re-image the device to resolve the issue. They fail to understand when I say "the device will not boot anything" and sent me the Acronis True Image recovery to try anyway.

It is suggested I send the device in to be looked at; however, I wonder how smart that would be given they can't even guess as to what the issue is. I feel we'd be throwing money at a moving target.

Leo Enticknap
Film God

Posts: 7474
From: Loma Linda, CA
Registered: Jul 2000


 - posted 12-09-2019 06:07 PM
Shipping one of those things is a significant chunk of change straightaway, and if they'd charge you to look at it even if they fail to diagnose the problem, that doesn't strike me as a very sensible option.

However, might there be a local IT tech in town who could test the PSUs and memory for you? A guy from a PC repair store should be able to do that, and that would likely be more cost effective than shipping a 50lb server long haul. Unless the mobo itself is pooped, it probably is cost effective to repair the server, if the bad component can be identified.

If you want to try yourself, power supply testers are pretty cheap, though if you can't even boot as far as the BIOS, testing the RAM is trickier. If the server has more than one RAM board, you could try removing each of them in turn, to see if the computer will boot OK with one of them out. If it will, the board you removed is your culprit.
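
The "pull each RAM board in turn" routine can be sketched roughly as below. This is only an illustration of the logic, not a tool: `boots_with` is a hypothetical stand-in for a person powering the server on with the given boards fitted and reporting whether it got to POST, and the slot names are made up. It also assumes only one board is bad and that the server will still start with any single board removed, which depends on the board's memory population rules.

```python
from typing import Callable, List, Optional, Sequence


def find_bad_module(
    modules: Sequence[str],
    boots_with: Callable[[List[str]], bool],
) -> Optional[str]:
    """Pull one module at a time; return the one whose absence lets the box boot.

    `boots_with` is a hypothetical stand-in for a manual power-on test with
    the listed modules installed (True means the machine reached POST).
    """
    for suspect in modules:
        remaining = [m for m in modules if m != suspect]
        if boots_with(remaining):
            return suspect  # machine boots without it, so it is the culprit
    return None  # no single removal helped; look at PSU, CPU or motherboard


# Simulated bench test: pretend the (made-up) module "DIMM3" is faulty.
slots = ["DIMM1", "DIMM2", "DIMM3", "DIMM4"]
culprit = find_bad_module(slots, lambda installed: "DIMM3" not in installed)
print(culprit)
```

If no single removal gets the machine to boot, the function returns `None`, which in practice points away from one bad RAM board and back toward the PSU, CPU or motherboard.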

Marcel Birgelen
Film God

Posts: 3357
From: Maastricht, Limburg, Netherlands
Registered: Feb 2012


 - posted 12-09-2019 06:11 PM
Hardware corruption can't be excluded, but I don't think this is a memory-related problem, at least not if it's a server with proper parity-protected memory.

Does booting Windows in Safe Mode also not work?

Usually, if you press F8 right after the BIOS has finished its POST routine and Windows starts to load, you get a small boot menu that allows you to start Windows in Safe Mode.

If that works, then you can selectively start to disable devices in your system. I've seen this before on a desktop machine, where the sound driver was misbehaving.

Leo Enticknap
Film God

Posts: 7474
From: Loma Linda, CA
Registered: Jul 2000


 - posted 12-09-2019 06:12 PM
But would an NFG sound card prevent a Linux distro and an Acronis image recovery CD from booting as well?

Aaron Mange
Film Handler

Posts: 3
From: Tucson, Az, u.s.a
Registered: Sep 2016


 - posted 12-09-2019 06:17 PM
Safe mode, recovery mode, and using an install disc to attempt a repair all failed. I've decided to call in a local specialist to check out the system; thanks for the suggestion. I will, however, update everyone as to what the issue was, if it's found and fixed.

Marcel Birgelen
Film God

Posts: 3357
From: Maastricht, Limburg, Netherlands
Registered: Feb 2012


 - posted 12-09-2019 06:20 PM
quote: Leo Enticknap
But would an NFG sound card prevent a Linux distro and an Acronis image recovery CD from booting as well?

No, it shouldn't. But booting from a USB stick or other devices often fails for multiple other reasons, especially on servers: incorrect BIOS settings, incorrectly formatted drives or incompatible block sizes. Sometimes, when a RAID is involved, the RAID controller will register itself into the BIOS and boot from the array anyway.

Aaron also didn't state where the boot fails. Does it start loading the Linux kernel and panic somewhere halfway through? Or doesn't it load at all and just try to boot Windows?

If it fails halfway through loading the Linux kernel, for example, the location where it fails may provide a clue as to what's wrong with the system.

quote: Aaron Mange
Safe mode, recovery mode, and using an install disc to attempt a repair all failed. I've decided to call in a local specialist to check out the system; thanks for the suggestion. I will, however, update everyone as to what the issue was, if it's found and fixed.

On what error did Safe Mode fail?

What is that Recovery Mode you're talking about?

And what error did you get from the install disc?
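
One way to organize the "where does the boot fail?" question is as a small lookup from the observed stage to the next things to check. The sketch below is only that: the stage names (`BootStage`) and the helper `next_checks` are invented for illustration, and the suggestions are paraphrased from posts in this thread rather than taken from any vendor procedure.

```python
from enum import Enum
from typing import Dict, List


class BootStage(Enum):
    """Hypothetical labels for where an external-media boot attempt stops."""
    NO_POST = "never reaches the BIOS POST screen"
    IGNORES_MEDIA = "skips the USB stick or disc and tries to boot Windows"
    KERNEL_PANIC = "starts loading the Linux kernel, then panics"


# Next checks, paraphrased from suggestions made in this thread.
NEXT_CHECKS: Dict[BootStage, List[str]] = {
    BootStage.NO_POST: [
        "test the power supplies",
        "reseat or swap the memory",
        "suspect the motherboard",
    ],
    BootStage.IGNORES_MEDIA: [
        "check the BIOS boot settings",
        "check how the boot media was formatted",
        "check whether the RAID controller is forcing a boot from the array",
    ],
    BootStage.KERNEL_PANIC: [
        "note the exact point where the kernel stops; it may hint at the bad device",
    ],
}


def next_checks(stage: BootStage) -> List[str]:
    """Return the suggested follow-up checks for an observed failure stage."""
    return NEXT_CHECKS[stage]


for item in next_checks(BootStage.IGNORES_MEDIA):
    print("-", item)
```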

John Thomas
Film Handler

Posts: 75
From: Boston, MA
Registered: Sep 2011


 - posted 12-09-2019 09:11 PM
If you can get to the point of attempting Windows safe mode, you can attempt to boot from external media.

Remove the RAID controller card and see if that changes anything.

Steve Guttag
We forgot the crackers Gromit!!!

Posts: 12814
From: Annapolis, MD
Registered: Dec 1999


 - posted 12-10-2019 07:02 AM
2019 has been a very bad year for me and equipment failure. I have changed out more motherboards this year than all previous years combined. As that seems to have been my "hammer," I'm seeing you as having a "nail."

I've changed out motherboards on multiple brands of servers too (SMS servers like Dolby and GDC), not TMS/LMS systems, per se. However, their symptoms resemble yours...don't boot up well...hang at random times, self reboot.

So far, in every case, changing the motherboard resolved most or all of the issue. In one instance, the software also needed to be reloaded (I attribute that to bad hardware allowing software to be corrupted as well). All of those servers are running at this point, and we are talking about probably 6 or 7 now.

I'm wondering if the effect of RoHS assembly/soldering isn't showing itself in long-term stability more so than outright component failure, though it could be the odd capacitor that manifests itself that way. In my MB story above, they are not all the same model motherboards either, so it wasn't like it was just a bad run.

I know what TMSes that run the TCC cost, so it is probably worth the effort to change the memory (I haven't had that problem yet, but one server company always seems to suspect it, so there is something there...if it is outright bad, most of the time the server will "yell" a bit...if it just has issues, you get random failures) and motherboard. All in, it is typically much less than $1000 plus your time.

Another technique I've used in troubleshooting computer hardware is removing cards, one at a time, to see if boot up will change. I've had a mediablock suck down the power supply such that it prevented boot up as well.

Marcel Birgelen
Film God

Posts: 3357
From: Maastricht, Limburg, Netherlands
Registered: Feb 2012


 - posted 12-10-2019 08:41 AM
quote: Steve Guttag
I'm wondering if the effect of RoHS assembly/soldering isn't showing itself in long-term stability more so than outright component failure, though it could be the odd capacitor that manifests itself that way. In my MB story above, they are not all the same model motherboards either, so it wasn't like it was just a bad run.

There has been some limited research into this, and since we're now about 14 years later, I think we can conclude that RoHS did very much have a negative impact on equipment reliability. The first-generation solder replacements in particular have proven to become brittle over time, and others have severe corrosion issues. Also, we're still fighting the late effects of the capacitor plague, because some of those bad capacitors still made it into equipment which was produced way after 2007.

Mark Gulbrandsen
Resident Trollmaster

Posts: 16657
From: Music City
Registered: Jun 99


 - posted 12-14-2019 10:28 PM
quote: Steve Guttag
I know what TMSes that run the TCC cost so it is probably worth the effort to change the memory

Since 98% of my customers went with the GDC VPF program, because it was way easier to deal with, what I did for most of their racks, because they are independents and looking to save a lot of $$$$, was to procure 2 year old server pulls. I found a reliable company in the South East to get them from. The reliability has basically been at 99%, and they all spent their two year life in a server clean room...

Have only lost two original Dell 2950 servers so far, and frankly, because these servers are so inexpensive to buy, it's not worth the time to troubleshoot what's wrong with the original unit. The minimum time they have lasted is 6.5 years, and all but two of the 40 some odd systems I built with the Dell 2950s are still flying along just fine. Since they are NOT in a clean room, they get cleaned out once a year.

I've been installing Dell R-720 and R-730 (two so far) in place of the failed servers. They can be procured for $300 to $1000 with dual 6+ core processors and 20+ GB of RAM, depending on budget. I just move the OS drives over to an R-720 and import the foreign config on the PERC 6, and you've got Windows back after a driver update. Then add the media drives as the second foreign config, and that gets you 95% of the way back to online status. I actually did the last one remotely with the help of another cinema tech who was in the area and moved the hard drives over for me.

Those that have large external IBM media storage arrays may find that those are not compatible with the Dell. I have not had an instance to try that yet... But since the model R-730 is also made in 16 and 24 drive versions, you could also choose to go that route for more $$$$. Then you can toss the array in the trash along with the dead server.

I hated those IBM servers in the Cinedigm racks and fortunately only had three to deal with for about 5 years. You will find that most IT guys also hate IBM gear. The 8 minute long boot time for W-2008 is ludicrous. The Strong racks which used Dells were way more responsive on boot up with W-2008.

Mark

Steve Guttag
We forgot the crackers Gromit!!!

Posts: 12814
From: Annapolis, MD
Registered: Dec 1999


 - posted 12-15-2019 11:15 AM
Mark's math and estimates have always been a little off...2 out of 40 is 5%, not 2%...still low but he's off by over a factor of 2.

FWIW...all of my original Cinedigm systems are still operational, including three of the dreaded IBM/InSight systems, which I agree are more troublesome than the Dell ones put together by Strong. I attribute that to better hardware and Will York at Strong.

Mark Gulbrandsen
Resident Trollmaster

Posts: 16657
From: Music City
Registered: Jun 99


 - posted 12-15-2019 02:32 PM
Well, it's actually two out of 46. I also installed a number of Dell 2900 towers at theaters that were only two or three screens.

Yes, Will York is an amazing IT guy...

Mark
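
For anyone following the percentages in the last few posts, here is the arithmetic as a quick sketch. The helper name `failure_rate_percent` is just for illustration; the counts (2 failures out of 40, then out of 46) are the ones quoted above.

```python
def failure_rate_percent(failed: int, total: int) -> float:
    """Share of units that failed, as a percentage."""
    return 100.0 * failed / total


rate_40 = failure_rate_percent(2, 40)  # Steve's figure: 2 out of 40
rate_46 = failure_rate_percent(2, 46)  # Mark's corrected count: 2 out of 46

print(f"2 of 40: {rate_40:.2f}% failed, {100 - rate_40:.2f}% still running")
print(f"2 of 46: {rate_46:.2f}% failed, {100 - rate_46:.2f}% still running")

# How far a 2% figure is from the 2-of-40 rate.
print(f"ratio vs 2%: {rate_40 / 2.0:.1f}x")
```

Either way the failure rate lands between 4% and 5%, so the surviving share is closer to 95% or 96% than to 99%.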

Marco Giustini
Film God

Posts: 2713
From: Reading, UK
Registered: Nov 2007


 - posted 12-15-2019 04:45 PM
As Steve says, remove all the extra boards and all the drives. Try booting a Linux distribution from USB. If it does not work, then it's MB or CPU or memory (reseat everything). If it works, add one component at a time until it stops working.
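
The strip-down-then-add-back approach can be sketched as a simple loop. As with any such sketch, `boots_with` is a hypothetical placeholder for a manual test (fit the listed parts, try the Linux USB stick, report whether it booted), and the component names are invented for the example.

```python
from typing import Callable, List, Optional, Sequence

CORE = "motherboard/CPU/memory"


def first_failing_component(
    components: Sequence[str],
    boots_with: Callable[[List[str]], bool],
) -> Optional[str]:
    """Start from a bare system and add parts until the boot breaks.

    Returns CORE if even the bare system will not boot, the name of the
    component whose addition broke the boot, or None if everything works.
    """
    installed: List[str] = []
    if not boots_with(installed):
        return CORE  # bare system fails: MB, CPU or memory
    for part in components:
        installed.append(part)
        if not boots_with(installed):
            return part  # boot stopped working when this part went back in
    return None


# Simulated run: pretend the (made-up) "RAID card" is what breaks the boot.
parts = ["NIC", "RAID card", "media drives"]
bad = first_failing_component(parts, lambda fitted: "RAID card" not in fitted)
print(bad)
```

Adding parts back one at a time keeps each test to a single change, so the first failed boot points at the part that was just fitted.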



All times are Central (GMT -6:00)  


© 1999-2020 Film-Tech Cinema Systems, LLC. All rights reserved.