A20 SOM DRAM Speed

Started by anthonym, May 26, 2016, 11:03:16 AM


anthonym

Hi All,

I am working on a product using an A20 SOM (Rev D) and I am suffering from random crashes usually after about 5 hours.

I have built about 5 products and only one appears to suffer from this issue.

After a good Google, I now suspect that the issue may be related to DRAM timings, and I would like to reduce the DRAM speed to 384MHz to see if it goes away.

I am struggling to do this. I have tried recompiling u-boot after changing the board defconfig and copying it to my SD card, with no luck: meminfo still shows 480MHz.

Anyone got any ideas as to where the A20 SOM gets its DRAM timings when booting from an SD card?

Thanks in advance.


JohnS

linux-sunxi.org has data on this (for example, about the BROM).

After that, read uboot source / boot only into uboot so you can check timings with a scope / logic analyser.

Once you have the timings right you can boot Linux and check again.

John

LubOlimex

Hey,

The RAM on A20-SOM boards from hardware revision D should work flawlessly at 480MHz. Consider that something else might be the root cause of the hangs.

Can you tell me if the issue persists with these two images (suitable for A20-SOM boards from hardware revision D):

1) mainline uboot, sunxi kernel 3.4.103+, Debian Jessie file system: https://www.olimex.com/wiki/images/c/cc/A20-SOM_Rev_D_mainline_uboot_GMAC_master_sunxi_kernel_3.4.103_jessie_NAND_rel_6.torrent

2) sunxi uboot, sunxi kernel 3.4.90+, Debian Wheezy file system: https://www.olimex.com/wiki/images/9/97/A20-SOM_Rev_D_debian_3.4.90_camera_release_3.torrent

3) I'd also recommend trying these two suggestions to improve stability:

3.1) Lower the A20 speed to 960MHz. By default we set it to 1008MHz.

3.2) Enable the "ondemand" CPU governor. It is set to "performance" by default (performance means always forcing 1008MHz, even when there is nothing going on).

Both suggestions can be applied with commands for a quick test, or by editing rc.local so that the values are set at boot (typically "nano /etc/rc.local").

You can find supported frequencies with: "dmesg | grep freq".

You can check currently set frequency with: "cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq".

You can set new frequency with: "echo 960000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq".

You can check currently set governor behavior with: "cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor".

You can set new governor with: "echo ondemand > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor".
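
A minimal sketch that wraps the two "echo" commands above in one function, so the same lines can be tried by hand and then pasted into rc.local. The function name apply_cpufreq and the CPUFREQ_DIR override are illustrative assumptions, not part of the Olimex images; the sysfs paths are the ones quoted above.

```shell
#!/bin/sh
# Sketch: cap the CPU frequency and select a governor via sysfs.
# CPUFREQ_DIR is overridable only so the function can be dry-run against
# a scratch directory; on the board, leave it at the default.
CPUFREQ_DIR="${CPUFREQ_DIR:-/sys/devices/system/cpu/cpu0/cpufreq}"

apply_cpufreq() {
    max_khz="$1"    # value in kHz, e.g. 960000 for 960MHz
    governor="$2"   # e.g. ondemand or performance
    echo "$max_khz"  > "$CPUFREQ_DIR/scaling_max_freq"
    echo "$governor" > "$CPUFREQ_DIR/scaling_governor"
    # Read the values back, since the kernel may reject or clamp them.
    echo "max_freq=$(cat "$CPUFREQ_DIR/scaling_max_freq") governor=$(cat "$CPUFREQ_DIR/scaling_governor")"
}
```

Run as root, "apply_cpufreq 960000 ondemand" should print the values the kernel actually accepted, which is worth checking before relying on them at boot.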

Best regards,
Lub/OLIMEX
Technical support and documentation manager at Olimex

anthonym

Thanks,

Will try as you have suggested and reduce the CPU speed.

I will post again to let you know the result once the board has been left to run for a while.

Best regards,

Anthony

anthonym

Hi,

I tried setting the max CPU speed to 960MHz as suggested and, unfortunately, the board fell over again sometime during the night.

I do not wish to enable the CPU ondemand governor as I am concerned that it will only mask the problem (there will be periods of time when our application will likely make the CPU ramp up to max).

Is there any point in me trying the default images given that I have 4 other units performing perfectly?

Any help you can give would be greatly appreciated.

Best regards,

Anthony

LubOlimex

Just test with this one: https://www.olimex.com/wiki/images/9/97/A20-SOM_Rev_D_debian_3.4.90_camera_release_3.torrent

We are getting reports of unstable behavior for some of the A20 boards when using newer kernels. Even the 3.4.103 sunxi kernel causes random freezes and crashes. Testing with the image above would clear any reason to believe it is a software or configuration issue.

Of course, if 4 boards identical to the faulty one, placed in the same setup and conditions (with the same hardware setup and card) work without problems then it is probably a hardware problem with the 5th board (the one that doesn't work reliably).

Best regards,
Lub/OLIMEX
Technical support and documentation manager at Olimex

aventuri

hello guys,

i'd like to bring this thread back to the top because we too are experiencing random "sudden death" of the A20-SOM when installed on our carrier designs (we have two: a simpler one, and a more complex one that deals with low-speed external signals, up to 12MHz for I2S).

under lighter loads, everything goes fine, so our main business case would be safe but..

under stress test loads (actually a couple of memtester processes to keep both cores busy), after a few hours the device will FREEZE, without any message on the serial console, in dmesg or in the logs.

it looks like the kernel doesn't even have the "time" to say something.
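
since the freeze leaves nothing in the logs, a stress run along these lines can at least record when the board was last alive. this is only a rough sketch: the function names, the 200M memtester size, the log path and the thermal_zone0 path are assumptions to adapt, not the exact setup described above.

```shell
#!/bin/sh
# Sketch: run one stress process per core and keep a synced heartbeat log,
# so the time of a silent freeze can be read back after a reset.
# STRESS_CMD and both file paths are assumptions; adjust to your setup.
STRESS_CMD="${STRESS_CMD:-memtester 200M}"   # no loop count: runs until killed
HEARTBEAT_LOG="${HEARTBEAT_LOG:-/var/log/stress-heartbeat.log}"
TEMP_FILE="${TEMP_FILE:-/sys/devices/virtual/thermal/thermal_zone0/temp}"

start_stress() {
    n="$1"   # number of background stress processes, e.g. 2 for two cores
    i=0
    while [ "$i" -lt "$n" ]; do
        $STRESS_CMD >/dev/null 2>&1 &
        i=$((i + 1))
    done
}

heartbeat() {
    # Temperature is logged raw (millidegrees); "unknown" if no sensor file.
    temp="$(cat "$TEMP_FILE" 2>/dev/null || echo unknown)"
    echo "$(date '+%Y-%m-%d %H:%M:%S') temp=$temp" >> "$HEARTBEAT_LOG"
    sync   # flush to storage so the last line survives a hard freeze
}
```

a typical run would be "start_stress 2" followed by "while true; do heartbeat; sleep 10; done"; after a freeze, the last line of the log gives the time and temperature just before the hang.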

we started seeing this issue on our mainline 4.7.x kernel, then we moved to a legacy Debian-based distro with kernel 3.4.108 to gather more data on the case. still the same issue!

we tested 5 different A20 SOMs on different carrier boards; it's not easy to pinpoint the right mix of HW/SW that DOESN'T trigger the fault. it usually shows up.

the temp sensor in the SoC, available on the 4.7.x kernel, reports a maximum of 52 °C, which is consistent with an external temperature meter (when turning the SW loads on/off, both move between 40 and 52 °C).

when testing the 3.4.x legacy kernel, we only found the temp sensor of the AXP209, and it still reports a temperature between 45 and 50 °C.

we tested with the case both open and closed, with no change. in any case, temperature does not look like the issue.

we DO NOT push the AXP209 setup for the SoC voltage, nor the memory config; we are using the stock config available in uboot 2016 / kernel 4.7 DTS. we are looking for stability, not higher performance.

a couple of SOMs look like they can keep up with the stress test. the other three are failing. that's not a great score.. :-|

finally we found this thread and we indeed tested the suggested distro with k3.4.90:
https://www.olimex.com/wiki/images/9/97/A20-SOM_Rev_D_debian_3.4.90_camera_release_3.torrent

this indeed looks to run stable on a couple of SOMs and HW carriers that we previously found troublesome.

now, after the introduction, our questions:

* is this "random death" of the A20-SOM still on Olimex's radar, after the last post in May 2016?

* did you correlate the freeze with higher CPU loads? (we cannot tell from the thread's posts)

* is there a deeper/better explanation of the crash root cause, at least relative to the Olimex post from last May that talks about "changes between kernels"?

* did you pinpoint the SW changes that led to the freezes, between 3.4.90 and 3.4.103? it should not be too difficult, as the changes introduced in the "ancient" 3.4.x should not be that large.

that's all for my first post about this topic.

let me tell you we are not looking for a "solution ready to go NOW!" (..that would be really too good to be true, given that EVERY design using a SOM introduces so many variables that 'could' make a difference..)

BUT we'd just like to understand whether the other freeze cases you observed could give some better indication of where we need to dig further on this kind of trouble.

bests

andrea

aventuri

sorry for replying to myself, but we found a way to keep the SOM stable for a day under the stress test load and want to share the hint with others who might run into this issue.

Set the uboot DRAM clock to 360MHz to play on the safe side, as described here:

https://linux-sunxi.org/Mainline_U-boot#DRAM_Settings

we made the switch in our uboot. a10-meminfo-static confirms the 360MHz clock setting and the system is stable under the stress test.
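
for anyone wanting to repeat it, here is a rough sketch of the change, assuming a mainline u-boot tree where the DRAM clock is the CONFIG_DRAM_CLK line in the board defconfig. the helper name set_dram_clk, the defconfig name and the toolchain prefix are assumptions; check them against your own tree.

```shell
#!/bin/sh
# Sketch: rewrite (or append) CONFIG_DRAM_CLK in a U-Boot defconfig.
set_dram_clk() {
    defconfig="$1"   # path to the board defconfig
    mhz="$2"         # desired DRAM clock in MHz, e.g. 360
    if grep -q '^CONFIG_DRAM_CLK=' "$defconfig"; then
        sed "s/^CONFIG_DRAM_CLK=.*/CONFIG_DRAM_CLK=$mhz/" "$defconfig" \
            > "$defconfig.tmp" && mv "$defconfig.tmp" "$defconfig"
    else
        echo "CONFIG_DRAM_CLK=$mhz" >> "$defconfig"
    fi
    # Print the resulting line so the change can be confirmed.
    grep '^CONFIG_DRAM_CLK=' "$defconfig"
}

# Assumed usage from the top of the U-Boot tree (not executed here):
#   set_dram_clk configs/A20-Olimex-SOM-EVB_defconfig 360
#   make CROSS_COMPILE=arm-linux-gnueabihf- A20-Olimex-SOM-EVB_defconfig
#   make CROSS_COMPILE=arm-linux-gnueabihf-
# The SPL, which initialises the DRAM, is read from the raw card at an
# 8 KiB offset, so the image is written with dd rather than copied as a
# file onto a partition (replace sdX with the actual card device):
#   dd if=u-boot-sunxi-with-spl.bin of=/dev/sdX bs=1024 seek=8
```

if meminfo still reports the old clock after this, the card is probably still booting an older SPL, so it is worth double-checking which device the dd step actually wrote to.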

BTW the max temp under load is ~56 °C:

[02:86:04:c2:1e:ca] # cat /sys/devices/virtual/thermal/thermal_zone0/temp
55700
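
the value is in millidegrees Celsius, so 55700 reads as 55.7 °C. a tiny sketch of the conversion (millideg_to_c is just an illustrative name, and it assumes a non-negative reading):

```shell
#!/bin/sh
# Sketch: convert a thermal_zone reading in millidegrees Celsius into
# degrees with one decimal place, truncated (assumes a non-negative value).
millideg_to_c() {
    m="$1"
    printf '%d.%d\n' "$((m / 1000))" "$(( (m % 1000) / 100 ))"
}
```

on the board it can be fed straight from sysfs: millideg_to_c "$(cat /sys/devices/virtual/thermal/thermal_zone0/temp)".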

we do not see significant impact on performance for our use case.

our systems are running at about 30% CPU load with a couple of processes that DO NOT stress the memory (a steady 100MB in use out of 1GB available).

as we look for stability instead of "pure performance", we are pretty fine this way.

BTW downclocking the CPU with cpufreq-set without changing the DRAM clock speed DOESN'T improve reliability.

i'm wondering why the default u-boot defconfig for the A20-SOM pushes the DRAM clock to 480MHz. this SOM is supposed to target embedded setups that should run 24/7.

are we the only ones experiencing these hiccups?

bests

Andrea