LIME2 Rev.L lock-up freeze instability

Started by kimfaint, June 13, 2023, 04:56:47 AM


faraz

So, more people are facing the same issue. I am not alone.
What partially solved my problems was masking the reboot and shutdown commands.
I have good uptime on one site and issues on the other.
I am not sure which board revision is at the site that has stayed up (uptime is almost a year now). The site that keeps going down uses Rev L boards.

I am going to implement the scaling option suggested above and see if that helps, and also learn how to configure a watchdog to reboot the system.
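One possible way to get a watchdog reboot, sketched here under assumptions rather than taken from this thread: systemd can feed the hardware watchdog itself when `RuntimeWatchdogSec` is set in the `[Manager]` section of its configuration. The helper name `write_watchdog_conf`, the file name `watchdog.conf` and the 10 s timeout are all illustrative choices, and whether the A20 watchdog driver in your image accepts that timeout should be checked in the kernel log.

```shell
#!/bin/sh
# Sketch: write a systemd drop-in that arms the hardware watchdog.
# write_watchdog_conf DIR [TIMEOUT]
#   DIR      drop-in directory, normally /etc/systemd/system.conf.d
#   TIMEOUT  value for RuntimeWatchdogSec (10s is an assumed default)
write_watchdog_conf() {
    dir=$1
    timeout=${2:-10s}
    mkdir -p "$dir" || return 1
    printf '[Manager]\nRuntimeWatchdogSec=%s\n' "$timeout" > "$dir/watchdog.conf"
}

# Usage (as root), followed by a reboot or `systemctl daemon-reexec`:
#   write_watchdog_conf /etc/systemd/system.conf.d 10s
```

This only helps if the freeze also stops systemd from feeding the watchdog, which is likely for a full lock-up but not guaranteed.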
I am also logging current_now, voltage_now and Temperature_now values.
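A minimal sketch of such logging, for anyone who wants to do the same. The function name `log_power` is made up, and the sysfs locations in the usage comment (`/sys/class/power_supply/axp20x-battery` and `/sys/class/thermal/thermal_zone0/temp`) are assumptions that may differ on your image. The kernel reports `current_now` in µA, `voltage_now` in µV and thermal zone temperatures in millidegrees Celsius.

```shell
#!/bin/sh
# log_power DIR THERMAL_FILE
#   DIR           power-supply sysfs directory with current_now / voltage_now
#   THERMAL_FILE  file holding the temperature in millidegrees Celsius
# Prints one timestamped line; unreadable values are logged as NA.
log_power() {
    dir=$1
    thermal=$2
    current=$(cat "$dir/current_now" 2>/dev/null || echo NA)
    voltage=$(cat "$dir/voltage_now" 2>/dev/null || echo NA)
    temp=$(cat "$thermal" 2>/dev/null || echo NA)
    printf '%s current_uA=%s voltage_uV=%s temp_mC=%s\n' \
        "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$current" "$voltage" "$temp"
}

# Usage (paths are assumptions, adjust to your board):
#   while :; do
#       log_power /sys/class/power_supply/axp20x-battery \
#                 /sys/class/thermal/thermal_zone0/temp >> /var/log/power.log
#       sync    # flush to storage so the last lines survive a freeze
#       sleep 60
#   done
```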

LubOlimex

Do you boot from the eMMC?

Can you check what is the revision of the stable board?

Is your Linux image based on Olimage?

How do you power the board and do you have a Li-Po battery?

The differences between revision K and the latest L1 are quite significant. Most notably, the eMMC and the RAM are different, so the problem is probably related to how your image supports those two. You can check the hardware changes here:

https://github.com/OLIMEX/OLINUXINO/blob/master/HARDWARE/A20-OLinuXino-LIME2/hardware_revision_changes_log.txt
Technical support and documentation manager at Olimex

weszta-t

I can confirm KriszK's solution.
We had the same freeze problem, but changing the governor to performance solved
the problem. :)

mbosschaert

I'm also still having these freezes on all of my Rev.L boards when a SATA HDD (or SSD) is connected. This happens irrespective of the Olimex Debian version, the power supply, whether a Li-Po battery is connected, etc.

From this conversation I understand that Rev.K2 used to be stable in this respect.

The most important changes between consecutive revisions of the board concern the RAM, the eMMC and the power supervisors. As I'm not using the eMMC, and given the suggestion that this is a power-handling problem, I am considering replacing the power-supervisor component with the one used in Rev.K2. Reading the revision document confuses me a bit, however. Line 201 (Rev.K1) states that MCP121T475I-TT is used, but line 215 (Rev.L) states that U14 is changed from MCP120T475I-TT to MCP121T475I-TT. The latter suggests that Rev.K1 has MCP120T475I-TT on U14. Or did I miss something here?

@LubOlimex, could you confirm this?

As there is sufficient space on the board around U14, I think exchanging this component should be possible.

Does this sound like a worthwhile test, or is the idea unrealistic or completely insane?

LubOlimex

Compare the BOMs (bills of materials) to confirm which part is fitted. But I doubt that the change from MCP120T475I-TT to MCP121T475I-TT could cause any sort of issue.

Improper timings for the RAM are more likely. Are the board and revision properly listed at the start of the boot process? For example, does it say:

CPU:   Allwinner A20 (SUN7I)
ID:    A20-OLinuXino-LIME2 Rev.L

mbosschaert

Quote from: LubOlimex on May 09, 2024, 02:48:31 PM
CPU:   Allwinner A20 (SUN7I)
ID:    A20-OLinuXino-LIME2 Rev.L

It identifies like this:
CPU: ARMv7 Processor [410fc074] revision 4 (ARMv7), cr=10c5387d
OF: fdt: Machine model: Olimex A20-OLinuXino-LIME2

I'm running an up-to-date Olimex Debian Bullseye.

LubOlimex

But what does it say at the start of boot over the serial console? It is important that the revision is properly listed as revision L. Often people delete the contents of the EEPROM (thus disabling automatic board recognition by the Olimage), then also fail to configure the board manually via the u-boot-tools, and the board then loads some generic revision A preset whose settings might be incompatible with the actual board revision: RAM timings, a different LAN controller and so on.
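If the serial output has been captured to a file, the check can be scripted. This is only a sketch: the function name `board_rev` and the file name `boot.log` are hypothetical, and the pattern assumes the `ID:` line looks like the example quoted earlier in this thread.

```shell
#!/bin/sh
# board_rev LOGFILE
#   Prints the revision from the first U-Boot "ID:" line in a saved
#   serial log; returns non-zero if no such line is present.
board_rev() {
    sed -n 's/^ID:[[:space:]]*A20-OLinuXino-LIME2 Rev\.\([A-Za-z0-9]*\).*/\1/p' "$1" \
        | head -n 1 \
        | grep .
}

# Usage, with boot.log being a capture of the serial console:
#   board_rev boot.log || echo "no revision line found, check the EEPROM"
```

A result other than the revision printed on the board itself would point to the preset problem described above.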

aleix

Hello. Here we have about 30 A20 boards, all revision L. Five of them have no eMMC; the other 25 have eMMC and are the industrial version.

All of them have been tested with the OS on uSD cards, all class 10 or higher.

With the image provided, all updated, freezes are really common. Typically within 3 weeks (sometimes a few days), the board freezes without any trace in logs or kernel messages.

With Alpine Linux installed, as provided by this phenomenal script by user unicorn, boards either never freeze or they do only very occasionally.

Factors that we have seen increase the freeze frequency are unstable voltage and very long wires (antenna effect). But eventually, with Olimage installed, the boards always freeze in less than a month.

Lately, things have improved by following these approaches with Olimage (many are commented in this thread already):

```
# cpufreq-set -r -g performance
# echo 960000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
# echo 960000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
```

The command `cpufreq-set` does not produce permanent changes, though. Also, 960000 could be replaced with 816000 (as recommended here), but so far we haven't needed that.
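Since these settings are lost on reboot, they have to be reapplied at every boot. The following is a sketch of a helper that does the same thing as the commands above through sysfs, for every CPU that has a cpufreq directory. The name `pin_cpufreq` is made up, and how it gets run at boot (for example from a systemd oneshot unit) is left open here.

```shell
#!/bin/sh
# pin_cpufreq KHZ [SYSFS_CPU_ROOT]
#   Selects the performance governor and pins the minimum and maximum
#   frequency to KHZ for every CPU that exposes a cpufreq directory.
pin_cpufreq() {
    khz=$1
    root=${2:-/sys/devices/system/cpu}
    for policy in "$root"/cpu[0-9]*/cpufreq; do
        [ -d "$policy" ] || continue
        echo performance > "$policy/scaling_governor"
        # Same order as the manual commands: max first, then min.
        echo "$khz" > "$policy/scaling_max_freq"
        echo "$khz" > "$policy/scaling_min_freq"
    done
}

# Usage (as root):
#   pin_cpufreq 960000      # or 816000 for the lower setting
```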

Another approach that seems to improve stability is installing the OS to the internal eMMC with `olinuxino-sd-to-emmc`. Good statistics on these attempts are hard to get, so drawing false conclusions is easy. However, it would seem that this reduces the freezing. That said, we have still had some freezes even with all these settings (though 816000 has not been tried yet).

We are now experimenting with a further approach, creating a swap space:

```
# fallocate -l 150M /swapfile
# chmod 600 /swapfile
# mkswap /swapfile
# swapon /swapfile
```

To make it permanent:

```
# echo '/swapfile none swap sw 0 0' >> /etc/fstab
```

For now (though it is too soon to say), no freezes have been caught with this swap space (after a week). Interesting fact: although swap usually remains at 0, we have already caught some events where around 1 MB of swap was in use. After that, we run

```
# swapoff -a && swapon -a
```

to free the swap space again. Perhaps this is a naïve approach. But could it be that RAM, which is usually at

```
$ free -mh
               total        used        free      shared  buff/cache   available
Mem:           998Mi        46Mi       829Mi        10Mi       122Mi       917Mi
```

sometimes spikes (don't know why), so that it makes the board freeze?
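To test the spike hypothesis, memory and swap usage could be sampled periodically and the last samples before a freeze inspected. This is a sketch only: `mem_snapshot` is a made-up name, and it reads the standard `MemAvailable`, `SwapTotal` and `SwapFree` fields of `/proc/meminfo`.

```shell
#!/bin/sh
# mem_snapshot [MEMINFO]
#   Prints available RAM and used swap in kB, read from a
#   /proc/meminfo-style file (defaults to /proc/meminfo).
mem_snapshot() {
    awk '
        $1 == "MemAvailable:" { avail  = $2 }
        $1 == "SwapTotal:"    { stotal = $2 }
        $1 == "SwapFree:"     { sfree  = $2 }
        END { printf "mem_available_kB=%d swap_used_kB=%d\n", avail, stotal - sfree }
    ' "${1:-/proc/meminfo}"
}

# Usage: sample every 10 s and flush, so the last lines survive a freeze.
#   while :; do
#       echo "$(date -u +%H:%M:%S) $(mem_snapshot)" >> /var/log/mem.log
#       sync
#       sleep 10
#   done
```

If the last entries before a freeze show no drop in available memory, a RAM spike becomes a less likely explanation.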

Do you think this approach could lead to something?