Thursday, September 4, 2014

Fix bad sectors in Linux with hdparm

Kernel messages like these are the beginning of the end for a hard drive:

[4248398.645517] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[4248398.645522] ata2.00: BMDMA stat 0x24
[4248398.645527] ata2.00: failed command: READ DMA EXT
[4248398.645535] ata2.00: cmd 25/00:08:07:24:23/00:00:55:00:00/e0 tag 0 dma 4096 in
[4248398.645536]          res 51/40:00:0d:24:23/40:00:55:00:00/00 Emask 0x9 (media error)
[4248398.645540] ata2.00: status: { DRDY ERR }
[4248398.645543] ata2.00: error: { UNC }
[4248398.784319] ata2.00: configured for UDMA/133
[4248398.784340] sd 1:0:0:0: [sdb] Unhandled sense code
[4248398.784343] sd 1:0:0:0: [sdb]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[4248398.784349] sd 1:0:0:0: [sdb]  Sense Key : Medium Error [current] [descriptor]
[4248398.784354] Descriptor sense data with sense descriptors (in hex):
[4248398.784357]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
[4248398.784369]         55 23 24 0d
[4248398.784374] sd 1:0:0:0: [sdb]  Add. Sense: Unrecovered read error - auto reallocate failed
[4248398.784380] sd 1:0:0:0: [sdb] CDB: Read(10): 28 00 55 23 24 07 00 00 08 00
[4248398.784392] end_request: I/O error, dev sdb, sector 1428366349
[4248398.784419] ata2: EH complete
[4249453.881503] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[4249453.881510] ata2.00: failed command: READ SECTOR(S) EXT
[4249453.881519] ata2.00: cmd 24/00:01:0d:24:23/00:00:55:00:00/e0 tag 0 pio 512 in
[4249453.881520]          res 51/40:00:0d:24:23/40:00:55:00:00/00 Emask 0x9 (media error)
[4249453.881524] ata2.00: status: { DRDY ERR }
[4249453.881527] ata2.00: error: { UNC }
[4249454.020324] ata2.00: configured for UDMA/133
[4249454.020350] ata2: EH complete

What does it all mean?  The basic message is that there was a read error and the drive couldn't automatically move that data to another, presumably good, sector of the hard drive:
[4248398.784374] sd 1:0:0:0: [sdb]  Add. Sense: Unrecovered read error - auto reallocate failed

Skip two lines ahead and the kernel tells us the drive and the sector where the error occurred: /dev/sdb, sector 1428366349.  You can confirm this by running the following hdparm command (as root or with sudo):
root@tv:/home/khanh# hdparm --read-sector 1428366349 /dev/sdb

The output should look similar to the following, confirming that our sector has a read error:
/dev/sdb:
reading sector 1428366349: FAILED: Input/output error
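
As a cross-check, the sector number can be recovered from the log itself: the last four bytes of the sense data dump (55 23 24 0d) are the failing sector address in hex. The sketch below converts that value to decimal and, for convenience, pulls device/sector pairs out of "I/O error" lines. The function name failed_sectors is made up for illustration, and it assumes log lines shaped like the end_request line above:

```shell
# Sense data bytes "55 23 24 0d" read as one hex number, printed in decimal.
printf '%d\n' 0x5523240d   # prints 1428366349

# Hypothetical helper: read kernel log text on stdin and print
# "device sector" for every "I/O error, dev X, sector N" line, de-duplicated.
failed_sectors() {
    sed -n 's/.*I\/O error, dev \([a-z0-9]*\), sector \([0-9]*\).*/\1 \2/p' | sort -u
}

# Usage (as root or with sudo, if dmesg is restricted):
#   dmesg | failed_sectors
```

For the log above this would list "sdb 1428366349", the same sector passed to hdparm.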

Most of the time, we're able to clear the error by writing zeros to the sector, which gives the drive a chance to rewrite it or remap it to a spare.  ***WARNING*** DOING THIS COULD/WILL IRREPARABLY DAMAGE THE FILE IN THIS SECTOR!!!
Of course, in my case, this drive is used as a DVR (hence the "tv" hostname) and just holds a bunch of MPEG2 files for my TV recordings.  Zeroing a single 512-byte sector somewhere in a recording doesn't ruin the file beyond use.  So, we're going to write the zeros and assure hdparm that we know what we're doing:
root@tv:/home/khanh# hdparm --yes-i-know-what-i-am-doing --write-sector 1428366349 /dev/sdb
/dev/sdb:
re-writing sector 1428366349: succeeded

The device sdb reports success in writing to the sector, and now we should be able to read clean zeros back from the sector with hdparm:
root@tv:/home/khanh# hdparm --read-sector 1428366349 /dev/sdb
/dev/sdb:
reading sector 1428366349: succeeded
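
The read, write, re-read sequence above can be wrapped in a small script. This is only a sketch: the function names sector_is_bad and fix_sector and the DRY_RUN variable are made up for illustration, and the bad-sector check relies on nothing more than the "FAILED" text hdparm printed above. By default it only prints the write command instead of running it.

```shell
# Hypothetical helper: succeeds if hdparm reports the sector as unreadable.
# Usage: sector_is_bad /dev/sdX SECTOR
sector_is_bad() {
    hdparm --read-sector "$2" "$1" 2>&1 | grep -q 'FAILED'
}

# Hypothetical helper: zero one sector, but only if it currently fails to read.
# DRY_RUN defaults to 1, in which case the write command is printed, not run.
# Usage: fix_sector /dev/sdX SECTOR
fix_sector() {
    dev=$1
    sector=$2
    # Refuse anything that is not a plain decimal sector number.
    case $sector in
        ''|*[!0-9]*) echo "sector must be a decimal number" >&2; return 2 ;;
    esac
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "hdparm --yes-i-know-what-i-am-doing --write-sector $sector $dev"
        return 0
    fi
    if ! sector_is_bad "$dev" "$sector"; then
        echo "sector $sector on $dev reads fine; not touching it"
        return 0
    fi
    # Overwrite the sector with zeros, then confirm it reads back.
    hdparm --yes-i-know-what-i-am-doing --write-sector "$sector" "$dev" &&
        ! sector_is_bad "$dev" "$sector"
}

# Usage:
#   fix_sector /dev/sdb 1428366349             # dry run, prints the command
#   DRY_RUN=0 fix_sector /dev/sdb 1428366349   # really writes (as root)
```

Skipping sectors that still read fine is a deliberate choice here: the write destroys whatever the sector held, so it should only happen when the data is already unreadable.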
