Corrupt SD card (file system)

Started by dudo, January 21, 2014, 04:27:58 PM

dudo

Hi everyone.

After a few weeks of using an A13 OLinuXino in an embedded device, the SD card's file system became corrupted. Some files are missing or damaged, so the system can no longer boot into the final application. The problem is reproducible. Here is the configuration:
1. A13 OLinuXino running from an SDHC card
2. Debian Wheezy running Xorg and a custom application
3. rootfs: ext4 rw
4. kernel 3.4.67
5. peripherals: ASIX-based network card (from Olimex, the "blue" one), USB to serial adapter connected to one of the USB host connectors, custom device connected to UEXT as COM1 (RS232 connection), LCD
6. the power supply provides enough current for all components

Some important notes:
a) the custom application appends (writes) to a file 1, 2 or 3 times per minute without any kind of sync call; the file can grow to tens of MB
b) frequent data communication over COM1 (UEXT)
c) !!sudden power off without shutdown!!
d) reinstalling Debian makes the SD card functional again

What could be causing the file system corruption, and how can it be avoided?
Is it acceptable to switch off such a device without a shutdown, and what impact can that have on the SD card even with a journaling fs like ext4?

Thanks.

Lurch

Power loss without shutdown is about the best way to corrupt the system. Some file systems can recover, but you should avoid this at all costs - add a LiPo battery (it will charge while power is present).
There was also a note in the forum about SD card wear - you can redirect some stuff to RAM.

dudo

If we eliminate switching the power off without shutdown, is there any other way an SD card could get corrupted within just a few weeks?
What about the power source?
Any suggestions?

Thanks.

yt7pwr

On a PC, a bad power supply can cause bad sectors on a hard drive, but that is more of a mechanical problem caused by an inadequate supply. Try running fdisk to check the partitions, and then run fsck.ext4 /dev/sdX (where X is your SD card) with the partition unmounted. This should repair most structural damage to the file system.
You did not write in the first post how your app writes data to the SD card: by track/sector (like the dd command), or through the Linux file system (/home/xxxx/yyyy) with open/close on every entry? Did you try a different SD card?
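
For reference, the difference between an unsynced append (as described in the first post) and a synced one can be sketched in a few lines of Python. This is only an illustration, not dudo's actual application: the function name append_record is made up here, and os.fsync only narrows the window in which data sits in the page cache. It cannot protect against a card that misbehaves when power is cut in the middle of a write.

```python
import os


def append_record(path: str, record: bytes) -> None:
    """Append one record and ask the kernel to push it to the device.

    Hypothetical helper for illustration; not taken from the thread.
    """
    # O_APPEND makes every write land at the current end of the file.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        view = memoryview(record)
        # os.write may write fewer bytes than requested, so loop until done.
        while len(view) > 0:
            written = os.write(fd, view)
            view = view[written:]
        # Flush file data and metadata from the page cache to the card.
        os.fsync(fd)
    finally:
        os.close(fd)
```

With one to three records per minute, the extra cost of a sync per record should be small, but every sync is also a physical write to the card, which matters for wear.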

dudo

Through the Linux file system, with regular file operations. The same corruption happened on a few other SD cards from the same manufacturer. I also noticed some strange behaviour of the FTDI USB serial driver. It behaves as if the USB serial cable was unplugged and plugged in again, without any physical action.

yt7pwr

Make a RAM disk and write your data to it as a test. If your app does not use the X server, you will have plenty of free RAM to experiment with. I would also try ext3: less advanced, but more robust.
About the strange FTDI behaviour: does the dmesg output contain any info about this issue?
For now, everything points to a bad power supply (poor filtering, or a voltage drop when the SD card is written or another peripheral demands more power). Do you have a voltmeter and/or ammeter (the quality of the filtering can only be seen with an oscilloscope)? Last on the checklist is the board itself. If your project is serious, you should have at least two boards.
Good luck :)
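
One way to try the RAM disk suggestion is to let the application append to a file on a tmpfs mount and move the accumulated data to the SD card only occasionally. The sketch below assumes exactly that setup; the function name and both paths are hypothetical, and it assumes nothing else writes to the RAM file while the flush runs. Anything still sitting in RAM is lost on power failure, so this trades durability for fewer card writes.

```python
import os


def flush_ram_buffer(ram_path: str, sd_path: str) -> int:
    """Move everything buffered in ram_path to the end of sd_path.

    Returns the number of bytes moved. Hypothetical helper: assumes a
    single writer, so no data arrives between the read and the truncate.
    """
    if not os.path.exists(ram_path):
        return 0
    with open(ram_path, "rb") as src:
        data = src.read()
    if not data:
        return 0
    with open(sd_path, "ab") as dst:
        dst.write(data)
        dst.flush()             # Python buffer -> kernel page cache
        os.fsync(dst.fileno())  # page cache -> storage device
    # Empty the RAM buffer only after the data has been synced to the card.
    with open(ram_path, "wb"):
        pass
    return len(data)
```

Called every few minutes with, say, a file under /dev/shm as ram_path (an assumption; any tmpfs mount would do), this turns many small card writes into one larger one.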

dudo

Here is the part of the dmesg dump concerning the FTDI USB to serial cable/driver. These messages appear when something is plugged into one of the 3 USB connectors. In other situations the ethernet driver can also be deregistered and the eth device goes down, just like the FTDI driver. Very strange behaviour, and it also affects the SD card, as the [mmc...] messages in the dump show, but with what consequences? In this case a USB stick (storage) was plugged in, but it can also be reproduced with a USB keyboard.
...
[ 3789.777269] [mmc-err] smc 0 err, cmd 25,  RTO
[ 3789.777286] [mmc-err] In data write operation
[ 3789.777296] [mmc-msg] found data error, need to send stop command
[ 3789.777312] [mmc-err] sdc 0 send stop command failed
[ 3789.777354] mmcblk0: timed out sending r/w cmd command, card status 0xc00900
[ 3789.777363] mmcblk0: command error, retrying timeout
[ 3790.422622] hub 2-1:1.0: port 2 disabled by hub (EMI?), re-enabling...
[ 3790.422985] usb 2-1.2: USB disconnect, device number 4
[ 3790.423397] ftdi_sio ttyUSB0: FTDI USB Serial Device converter now disconnected from ttyUSB0
[ 3790.423447] ftdi_sio 2-1.2:1.0: device disconnected
[ 3790.665745] usb 2-1.2: new full-speed USB device number 5 using sw-ehci
[ 3790.783900] ftdi_sio 2-1.2:1.0: FTDI USB Serial Device converter detected
[ 3790.784006] usb 2-1.2: Detected FT232RL
[ 3790.784016] usb 2-1.2: Number of endpoints 2
[ 3790.784025] usb 2-1.2: Endpoint 1 MaxPacketSize 64
[ 3790.784033] usb 2-1.2: Endpoint 2 MaxPacketSize 64
[ 3790.784041] usb 2-1.2: Setting MaxPacketSize 64
[ 3790.784767] usb 2-1.2: FTDI USB Serial Device converter now attached to ttyUSB0
[ 3791.163341] usb 2-1.3: new high-speed USB device number 6 using sw-ehci
[ 3791.310574] scsi0 : usb-storage 2-1.3:1.0
[ 3792.310288] sd 0:0:0:0: [sda] 3891200 512-byte logical blocks: (1.99 GB/1.85 GiB)
[ 3792.310890] sd 0:0:0:0: [sda] Write Protect is off
[ 3792.310907] sd 0:0:0:0: [sda] Mode Sense: 03 00 00 00
[ 3792.311634] sd 0:0:0:0: [sda] No Caching mode page found
[ 3792.311647] sd 0:0:0:0: [sda] Assuming drive cache: write through
[ 3792.317141] sd 0:0:0:0: [sda] No Caching mode page found
[ 3792.317161] sd 0:0:0:0: [sda] Assuming drive cache: write through
[ 3792.413266] sd 0:0:0:0: Attached scsi generic sg0 type 0
[ 3792.661725]  sda: sda1
[ 3792.665367] sd 0:0:0:0: [sda] No Caching mode page found
[ 3792.665385] sd 0:0:0:0: [sda] Assuming drive cache: write through
[ 3792.665398] sd 0:0:0:0: [sda] Attached SCSI removable disk
[ 3800.264980] [mmc-err] smc 0 err, cmd 25,  RTO
[ 3800.264996] [mmc-err] In data write operation
[ 3800.265006] [mmc-msg] found data error, need to send stop command
[ 3800.265022] [mmc-err] sdc 0 send stop command failed
[ 3800.265062] mmcblk0: timed out sending r/w cmd command, card status 0xc00900
[ 3800.265071] mmcblk0: command error, retrying timeout
[ 3800.659482] usb 2-1.3: USB disconnect, device number 6
...
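
To check over a longer log how often USB connect/disconnect events fall close to SD card write errors, a small script along these lines may help. It is a rough sketch that assumes the timestamp and message layout seen in the dump above; the function names and the one-second window are arbitrary choices, not anything the kernel defines.

```python
import re

# Matches "[ 3789.777269] message" as printed by dmesg.
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")


def parse_dmesg(text):
    """Return a list of (timestamp_seconds, message) tuples."""
    events = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            events.append((float(match.group(1)), match.group(2)))
    return events


def usb_events_near_mmc_errors(text, window=1.0):
    """Return USB connect/disconnect events within `window` seconds
    of any line tagged [mmc-err]."""
    events = parse_dmesg(text)
    mmc_times = [t for t, msg in events if "[mmc-err]" in msg]
    hits = []
    for t, msg in events:
        is_usb = ("USB disconnect" in msg
                  or re.search(r"new \S+ USB device", msg) is not None)
        if is_usb and any(abs(t - mt) <= window for mt in mmc_times):
            hits.append((t, msg))
    return hits
```

Feeding it the saved output of dmesg gives a quick list of USB events that coincide with mmc errors, which could support or weaken the power supply theory.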


The USB to serial cable is not in use, and /dev/ttyUSB0 is not open.
To return to my original post: I'm having problems with a corrupted file system and am trying to find the source of the problem.

Thanks.

dudo

I would like to ask one more question. Is there a better file system than ext4 for the kind of case I described in my original (first) post?
I also noticed that mkfs.ext4 sets the default "Maximum mount count" (as reported by dumpe2fs) to -1 and "Check interval" to 0. This means there is no periodic fsck of the partition; a check runs only when the kernel detects that one is needed. Any comments?
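
As a side note on those two fields, the small Python sketch below reads a saved dumpe2fs header and reports whether periodic checks are enabled at all. The function name is made up, and the layout it expects ("Name:   value") is an assumption based on typical dumpe2fs -h output. Both values can be changed with tune2fs (-c for the mount count, -i for the interval).

```python
def periodic_fsck_enabled(dumpe2fs_header: str) -> bool:
    """Return True if either a mount-count or a time-based check is set.

    Hypothetical helper; expects "Field name:   value" lines.
    """
    fields = {}
    for line in dumpe2fs_header.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    # Use the first token only: the interval line may carry a trailing
    # note such as "(<none>)" or "(6 months)".
    max_mounts = int(fields.get("Maximum mount count", "-1").split()[0])
    interval = int(fields.get("Check interval", "0").split()[0])
    # A mount count of 0 or -1 and an interval of 0 both mean "disabled".
    return max_mounts > 0 or interval > 0
```

With the values quoted above (-1 and 0) this returns False, which matches the observation that no periodic check is scheduled.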