Problem with some boards (One might be EDID, other unknown)

Started by JeroenV, January 14, 2014, 04:07:55 PM

JeroenV

Hi, we're using the A20 boards with Debian. They're the heart of a system consisting of two devices and a 4-line display, all controlled by software I wrote that starts automatically when Linux boots.

A system at a customer's site stopped working after a day or two. (It had a problem; they tried to restart it by powering it off and on again several times, but nothing happened.) A technical specialist went by with some spare parts and got the system working again once, when both the SD card and the board were replaced. As he wanted to know which of the two had caused the system to hang, he then replaced them one after another, but never got the system to work again. Later a third board, brought by someone else, was used to get the system back up.

I've put both boards that failed in that customer's system into my test environment and started each of them up 4 times to see what happens. I then started up with a board that always works fine, to copy the log folders to a USB stick and finally to my laptop. I've looked for differences in the dmesg files:

board 1
1. (1-dmesg.3) starts up fine and stays working for several minutes before I reboot the system
2. (1-dmesg.2) doesn't start up
3. (1-dmesg.1) starts up fine and stays working for several minutes before I reboot the system
4. (1-dmesg.0) starts up fine and stays working for several minutes before I power down the system

board 2
1. (2-dmesg.3) starts up fine, but hangs anyway after a couple of minutes [and I connected another mouse to the USB hub, as the first one didn't work very well]
2. (2-dmesg.2) doesn't start up
3. (2-dmesg.1) doesn't start up
4. (2-dmesg.0) doesn't start up

The one major difference I see in the dmesg logging of board 1 points to a problem with initializing EDID: the following lines are present in 3 out of 4 log files, but not in 1-dmesg.2:
[    3.405766] ParseEDID
[    3.422719] EDID version: 1.3
[    3.432020] PCLK=146250000 X 1680 1784 1960 2240 Y 1050 1053 1059 1089 fr 59 NP
[    3.445040] usb 2-1.1: new high-speed USB device number 3 using sw-ehci
[    3.455239] Patching 146250000 pclk to 146000000
[    3.463775] Using above mode as preferred EDID mode
[    3.474729] disp_clk: Could not find a matching pll-freq for 146850000 pclk
[    3.487530] disp_clk: Could not find a matching pll-freq for 26150000 pclk
[    3.500393] disp_clk: Could not find a matching pll-freq for 146250000 pclk
[    3.513093] asoc: sndhdmi <-> sunxi-hdmiaudio.0 mapping ok
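For reference, the mode line in this excerpt can be checked with a little arithmetic. This is only a sketch, assuming that the four X numbers and the four Y numbers are active size, sync start, sync end and total, so that the last number in each group is the total number of pixels per line and lines per frame. The function name is made up for illustration. Under that assumption the refresh rate comes out just below 60 Hz, which would be consistent with the "fr 59" in the log if that value is truncated, and the patched pixel clock changes it by only about 0.1 Hz.

```python
def refresh_hz(pclk_hz, h_total, v_total):
    """Refresh rate = pixel clock / (pixels per line * lines per frame)."""
    return pclk_hz / (h_total * v_total)


# Values copied from the log excerpt. Treating the last X and Y numbers
# as the totals is an assumption about the log format.
H_TOTAL, V_TOTAL = 2240, 1089

reported = refresh_hz(146_250_000, H_TOTAL, V_TOTAL)  # pclk reported via EDID
patched = refresh_hz(146_000_000, H_TOTAL, V_TOTAL)   # pclk after "Patching ..."

print(f"reported: {reported:.2f} Hz, patched: {patched:.2f} Hz")
```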

Sadly, there's no such easy-to-spot difference in the dmesg logging of board 2, so I wonder what is going on there.
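Comparing boot logs by eye is error-prone, because every line starts with a timestamp that differs from boot to boot. Below is a minimal Python sketch of how such a comparison could be automated. It strips the leading kernel timestamp and lists the messages that appear in a log from a good boot but not in a log from a failed one. The function names are hypothetical, and it assumes every log line starts with a timestamp in the `[    3.405766]` form shown above.

```python
import re

# Matches the leading kernel timestamp, e.g. "[    3.405766] ".
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def normalize(lines):
    """Strip timestamps and surrounding whitespace, dropping empty lines."""
    stripped = (TIMESTAMP.sub("", line).strip() for line in lines)
    return [s for s in stripped if s]


def only_in_first(good_lines, bad_lines):
    """Return messages present in the good boot but absent from the bad one."""
    bad = set(normalize(bad_lines))
    return [msg for msg in normalize(good_lines) if msg not in bad]


def compare_files(good_path, bad_path):
    """Print every message found only in the log of the good boot."""
    with open(good_path) as good, open(bad_path) as bad:
        for msg in only_in_first(good, bad):
            print(msg)
```

Calling `compare_files("1-dmesg.3", "1-dmesg.2")` would then print the messages that only the successful boot produced. Messages that contain values changing between boots (device numbers, addresses) will also show up, so the output still needs a manual look.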

As I didn't know what EDID stands for, I looked it up on Wikipedia. As it is the data a display sends to describe itself to the display driver, I'm not surprised my software doesn't start when Linux starts up without EDID; the first component initialized is the display, as it is used to show status information about the two devices.

Does anyone have any idea why EDID functionality can become unstable? Could it be wiring? An unstable power supply? A connection not soldered precisely enough? etc.

And any idea where to look for a clue about what is going wrong on board 2?

Actually, another customer called today saying their system no longer starts up after a restart, although it had been working fine for a couple of days. I certainly hope not many more boards will start hanging.

P.S.: I wanted to attach the 8 log files in case someone might find them useful, but under "Attachments and other options" there is, strangely enough, no attachments option ...

JeroenV

Actually, there was a third board in a reject bin, and it did the following when I started it 4 times in my test environment:
1. starts up
2. doesn't start up
3. doesn't start up
4. starts up

Again, there's not a substantial difference in the dmesg log files between the two times the system did start up and the two times it didn't ...