Volevo conoscere lo stato di salute del disco esterno ma ho ucciso Docker

Since I am using a very old 1TB HDD (15 years old?) I am sometimes scared it could fail from one day to another. Today I wanted to see if I could find some symptoms using a software-based solution. DuckDucking a bit I found this article(archived) that talks about smartctl.

I gave it a try and, after installing it using apt, I ran: sudo smartctl -i /dev/sda and it returned:

smartctl 7.2 2020-12-30 r5155 [aarch64-linux-5.15.84-v8+] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               SAMSUNG
Product:              HD103UJ
User Capacity:        1.000.204.886.016 bytes [1,00 TB]
Logical block size:   512 bytes
Serial number:        152D20329000
Device type:          disk
Local Time is:        Sun Nov  5 14:36:25 2023 CET
SMART support is:     Unavailable - device lacks SMART capability.

Sadly, my old HDD does not support SMART :(. I don’t know what actually happened but after this command the HDD became read-only and all my docker containers started to fail (my docker uses the HDD and not the SD for its data). I could use the services but any actions that had to require a write was failing (in one of the container logs I could read something like read-only file system).

I first tried the old way: let’s restart the containers. But docker-compose down did not succeed and the logs contained a tons of line like these:

ERROR: for self-hosted_mongodb_1  container da0f4865943a7c338660006e857dfada61b1fff08d1fb02585e10737f13a5270: driver "overlay2" failed to remove root filesystem: unlinkat /mnt/ext-drive/docker-data/overlay2/d893045b5f5181a43d51abe2989fbc627c89e8dedb7026606902086f6f399ff3: read-only file system
Removing network self-hosted_default
ERROR: error while removing network: network self-hosted_default id 114fecd466ee36774d0bef247ba8127dbab79c9ce25f091bae3a8d3480337546 has active endpoints

At this point, I was a little bit scared. I did not want to restart the Pi because I thought Docker would not be able to restart anymore and it could have blocked the Pi startup. I searched for something online and one of the Stackoverflow answers made me even more scary. My dmesg output was not great (no colors here, but of these lines were red):

[23542.142493] EXT4-fs warning (device sda1): ext4_end_bio:348: I/O error 10 writing to inode 16129270 starting block 7139300)
[23542.142566] Buffer I/O error on device sda1, logical block 7139042
[23542.142588] Buffer I/O error on device sda1, logical block 7139043
[23544.614305] sd 0:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x07 driverbyte=DRIVER_OK cmd_age=0s
[23544.614326] sd 0:0:0:0: [sda] tag#0 CDB: opcode=0x2a 2a 00 20 9d 4a b8 00 00 08 00
[23544.614332] blk_update_request: I/O error, dev sda, sector 547179192 op 0x1:(WRITE) flags 0x0 phys_seg 1 prio class 0
[23544.614343] EXT4-fs warning (device sda1): ext4_end_bio:348: I/O error 10 writing to inode 16129026 starting block 68397400)
[23544.614355] Buffer I/O error on device sda1, logical block 68397143
[23545.128173] sd 0:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x07 driverbyte=DRIVER_OK cmd_age=0s
[23545.128194] sd 0:0:0:0: [sda] tag#0 CDB: opcode=0x2a 2a 00 3a 07 23 10 00 00 88 00
[23545.128200] blk_update_request: I/O error, dev sda, sector 973546256 op 0x1:(WRITE) flags 0x800 phys_seg 17 prio class 0
[23545.128263] Aborting journal on device sda1-8.
[23545.206305] sd 0:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x07 driverbyte=DRIVER_OK cmd_age=0s
[23545.206331] sd 0:0:0:0: [sda] tag#0 CDB: opcode=0x2a 2a 00 3a 04 08 00 00 00 08 00
[23545.206339] blk_update_request: I/O error, dev sda, sector 973342720 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
[23545.206351] Buffer I/O error on dev sda1, logical block 121667584, lost sync page write
[23545.206386] JBD2: Error -5 detected when updating journal superblock for sda1-8.
[23547.145991] EXT4-fs error (device sda1): ext4_journal_check_start:83: comm dockerd: Detected aborted journal
[23547.230310] sd 0:0:0:0: [sda] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x07 driverbyte=DRIVER_OK cmd_age=0s
[23547.230338] sd 0:0:0:0: [sda] tag#0 CDB: opcode=0x2a 2a 00 00 00 08 00 00 00 08 00
[23547.230346] blk_update_request: I/O error, dev sda, sector 2048 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0
[23547.230359] Buffer I/O error on dev sda1, logical block 0, lost sync page write
[23547.230400] EXT4-fs (sda1): I/O error while writing superblock
[23547.230408] EXT4-fs (sda1): Remounting filesystem read-only
[23584.896419] br-114fecd466ee: port 3(vethe8b1298) entered disabled state
[23584.896882] veth29119cf: renamed from eth0
[23585.006119] overlayfs: upper fs is r/o, try multi-lower layers mount
[23586.118680] br-114fecd466ee: port 11(veth362085a) entered disabled state
[23586.118897] vethff16379: renamed from eth0
[23586.224521] overlayfs: upper fs is r/o, try multi-lower layers mount

I tried a few times to restart the containers, without luck. I also tried to restart the docker service (sudo service docker restart) with the same result.

At this point, I decided to try the shutdown and hope for the best. It worked :) When the Pi started, all the docker containers restarted and everything was back!