Published On: 11 February 2025
Last Updated: 11 February 2025

I migrated back to Debian last month, since most of my daily work happens comfortably in Linux, and for the times when I need Windows, I use PCIe passthrough. The main goals were better performance and more flexibility in managing my disk configuration. The details are in the previous post for those interested.

I am using some pretty old hard disks in this system. One of them, the 160 GB one, has been with me since 2008, and it still works; according to its SMART data, it has been powered on for only 24801 hours at the time of writing, which is about 2.8 years. Hard disks do last long.

But for some reason, during boot and while running a ZFS scrub, the kernel logged errors like these:

kernel: ata1.00: exception Emask 0x0 SAct 0x7000 SErr 0x40000 action 0x0 
kernel: ata1.00: irq_stat 0x40000008 
kernel: ata1: SError: { CommWake } 
kernel: ata1.00: failed command: READ FPDMA QUEUED 
kernel: ata1.00: cmd 60/10:60:10:0a:00/00:00:00:00:00/40 tag 12 ncq dma 8192 in 
kernel: ata1.00: status: { DRDY ERR } 
kernel: ata1.00: error: { UNC } 
kernel: ata1.00: configured for UDMA/133 
kernel: sd 0:0:0:0: [sda] tag#12 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=3s 
kernel: sd 0:0:0:0: [sda] tag#12 Sense Key : Medium Error [current] 
kernel: sd 0:0:0:0: [sda] tag#12 Add. Sense: Unrecovered read error - auto reallocate failed 
kernel: sd 0:0:0:0: [sda] tag#12 CDB: Read(10) 28 00 00 00 0a 10 00 00 10 00 
kernel: I/O error, dev sda, sector 2590 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0 
kernel: zio pool=$import vdev=/dev/disk/by-id/ata-ST3160215AS_9RA824JP-part1 error=5 type=1 offset=270336 size=8192 flags=721089 
kernel: ata1: EH complete 
kernel: ata1.00: exception Emask 0x0 SAct 0x300000 SErr 0x40000 action 0x0 
kernel: ata1.00: irq_stat 0x40000008 
kernel: ata1: SError: { CommWake } 
kernel: ata1.00: failed command: READ FPDMA QUEUED 
kernel: ata1.00: cmd 60/10:a0:10:4c:a1/00:00:12:00:00/40 tag 20 ncq dma 8192 in 
kernel: ata1.00: status: { DRDY ERR } 
kernel: ata1.00: error: { UNC } 
kernel: ata1.00: configured for UDMA/133 
kernel: sd 0:0:0:0: [sda] tag#20 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=5s 
kernel: sd 0:0:0:0: [sda] tag#20 Sense Key : Medium Error [current] 
kernel: sd 0:0:0:0: [sda] tag#20 Add. Sense: Unrecovered read error - auto reallocate failed 
kernel: sd 0:0:0:0: [sda] tag#20 CDB: Read(10) 28 00 12 a1 4c 10 00 00 10 00 
kernel: I/O error, dev sda, sector 312560670 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0 
kernel: zio pool=$import vdev=/dev/disk/by-id/ata-ST3160215AS_9RA824JP-part1 error=5 type=1 offset=160030007296 size=8192 flags=721089 
kernel: ata1: EH complete 
kernel: ata1.00: exception Emask 0x0 SAct 0x80c00000 SErr 0x40000 action 0x0 
kernel: ata1.00: irq_stat 0x40000008 
kernel: ata1: SError: { CommWake } 
kernel: ata1.00: failed command: READ FPDMA QUEUED 
kernel: ata1.00: cmd 60/10:b0:10:0a:00/00:00:00:00:00/40 tag 22 ncq dma 8192 in 
kernel: ata1.00: status: { DRDY ERR } 
kernel: ata1.00: error: { UNC } 
kernel: ata1.00: configured for UDMA/133 
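Each failure shows up twice in this log: as a SCSI READ(10) command in the "CDB:" line and as a single sector in the "I/O error" line. Here is a small POSIX sh sketch (the function name decode_read10 is mine, not from any tool) that decodes the CDB bytes, assuming the standard READ(10) layout: opcode 0x28 in byte 0, a 4-byte big-endian LBA in bytes 2 to 5, and a 2-byte transfer length in bytes 7 and 8.

```shell
#!/bin/sh
# Decode a SCSI READ(10) CDB given as the ten hex bytes after "CDB: Read(10)".
# Byte 0: opcode 0x28; bytes 2-5: LBA (big-endian); bytes 7-8: transfer
# length in logical blocks (512 bytes each on this disk).
decode_read10() {
  local lba len
  # Split the byte string into the function's positional parameters $1..$10.
  set -- $1
  if [ "$#" -ne 10 ] || [ "$1" != "28" ]; then
    echo "not a READ(10) CDB" >&2
    return 1
  fi
  lba=$(( 0x$3$4$5$6 ))
  len=$(( 0x$8$9 ))
  echo "lba=$lba len=$len last=$(( lba + len - 1 ))"
}

decode_read10 "28 00 00 00 0a 10 00 00 10 00"   # prints: lba=2576 len=16 last=2591
decode_read10 "28 00 12 a1 4c 10 00 00 10 00"   # prints: lba=312560656 len=16 last=312560671
```

So each failing command asked for 16 sectors (8 KiB, matching size=8192 in the zio lines), and the sectors named in the "I/O error" lines, 2590 and 312560670, fall inside those two ranges.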

I ran both the short and the long SMART self-tests, and for some reason both reported that everything was OK. But the error kept coming back on every reboot, even after I tightened the cables, so it looked like SMART was lying. Funnily enough, the ZFS pool continued to operate without any issues. This drive holds my media pool, which mostly contains photos, videos (which are synced to the cloud), games, and other stuff that can be downloaded again.
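The zio lines in the log give byte offsets inside the pool's partition, and both are close to its edges. A rough sketch, assuming the standard OpenZFS on-disk layout of four 256 KiB vdev labels (L0 and L1 at the start of the vdev, L2 and L3 at the end, with the vdev size first rounded down to a multiple of 256 KiB), checks which label an offset falls into. The function name which_label is hypothetical, and the partition size is copied from the fdisk output of this disk.

```shell
#!/bin/sh
# Assumption: OpenZFS keeps four 256 KiB labels per vdev, two at the start
# and two at the end of the (256 KiB-aligned) vdev.
LABEL_SIZE=262144
# Size of partition 1 in bytes, from fdisk: 312559616 sectors * 512.
VDEV_SIZE=$(( 312559616 * 512 ))

# Print the label (L0..L3) and byte offset within it for a zio offset.
which_label() {
  local off end
  off=$1
  end=$(( VDEV_SIZE / LABEL_SIZE * LABEL_SIZE ))
  if [ "$off" -lt $(( 2 * LABEL_SIZE )) ]; then
    echo "L$(( off / LABEL_SIZE )) +$(( off % LABEL_SIZE ))"
  elif [ "$off" -ge $(( end - 2 * LABEL_SIZE )) ] && [ "$off" -lt "$end" ]; then
    off=$(( off - (end - 2 * LABEL_SIZE) ))
    echo "L$(( 2 + off / LABEL_SIZE )) +$(( off % LABEL_SIZE ))"
  else
    echo "not in a label"
  fi
}

which_label 270336          # prints: L1 +8192
which_label 160030007296    # prints: L2 +8192
```

Under that assumption, both offsets land 8 KiB into a label, L1 and L2 respectively. Since ZFS keeps four copies of the label, this could be one reason the pool kept working despite the read errors, if the assumption holds.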

After a bit of searching, I landed on this article about a similar issue, where the author fixed it by forcing the drive to reallocate the bad sectors, so I did the same. In my case, the sectors with read errors were 2590 and 312560670 (from the "I/O error" lines in the log).

As Chris describes in his wiki, I tried reading those sectors with dd, and the reads indeed failed every time.

dd if=/dev/sda of=/dev/null skip=2590 bs=512 count=1
dd if=/dev/sda of=/dev/null skip=312560670 bs=512 count=1
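The same single-sector read can be wrapped in a loop to check several sectors at once. A minimal POSIX sh sketch (check_sectors is a hypothetical name; status=none assumes GNU dd, as on Debian) that counts the bytes dd actually delivers for each sector:

```shell
#!/bin/sh
# Report whether each 512-byte sector of a device (or image file) reads in full.
# Usage: check_sectors DEVICE SECTOR...   (read-only)
check_sectors() {
  local dev s n
  dev=$1
  shift
  for s in "$@"; do
    # A read error (or a sector past the end) yields fewer than 512 bytes.
    # dd's own error messages are discarded; the kernel logs the details anyway.
    n=$(dd if="$dev" bs=512 skip="$s" count=1 status=none 2>/dev/null | wc -c)
    if [ "$n" -eq 512 ]; then
      echo "$s ok"
    else
      echo "$s unreadable"
    fi
  done
}

# Example (as root): check_sectors /dev/sda 2590 312560670
```

Counting bytes, rather than relying only on dd's exit status, also flags a sector number past the end of the device, where dd exits successfully but reads nothing.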

Oddly enough, as the sector numbers show, these two sectors sit at opposite ends of the disk. Interestingly, fdisk showed that the disk contains a valid GPT (GUID Partition Table), even though I had added the whole disk directly to the zpool without creating any partitions on it. So it seems zpool creates a GPT on its own:

Disk /dev/sda: 149.05 GiB, 160040803840 bytes, 312579695 sectors
Disk model: ST3160215AS     
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 9E3FB707-838A-8348-B0B6-9B7A84780AE2

Device         Start       End   Sectors  Size Type
/dev/sda1       2048 312561663 312559616  149G Solaris /usr & Apple ZFS
/dev/sda9  312561664 312578047     16384    8M Solaris reserved 1
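The numbers in the log and in this table can be lined up. A sketch under the assumption that the zio offsets are relative to the start of partition 1: zio_to_sector converts an offset to an absolute sector, and classify_sector names the region of this particular disk that a sector belongs to. The backup GPT positions assume the usual 128-entry, 32-sector partition entry array mirrored just before the last sector (312579694, from the 312579695-sector disk size); both function names are hypothetical.

```shell
#!/bin/sh
# Numbers below are copied from the fdisk output of this specific disk.
PART1_START=2048        # first sector of partition 1 (the ZFS partition)

# Convert a zio byte offset (relative to partition 1) to an absolute sector.
zio_to_sector() {
  echo $(( $1 / 512 + PART1_START ))
}

# Name the on-disk region containing an absolute 512-byte sector number.
classify_sector() {
  local s
  s=$1
  if   [ "$s" -eq 0 ]; then echo "protective MBR"
  elif [ "$s" -eq 1 ]; then echo "primary GPT header"
  elif [ "$s" -le 33 ]; then echo "primary GPT partition entries"
  elif [ "$s" -ge 2048 ] && [ "$s" -le 312561663 ]; then echo "partition 1 (ZFS)"
  elif [ "$s" -ge 312561664 ] && [ "$s" -le 312578047 ]; then echo "partition 9 (reserved)"
  elif [ "$s" -ge 312579662 ] && [ "$s" -le 312579693 ]; then echo "backup GPT partition entries"
  elif [ "$s" -eq 312579694 ]; then echo "backup GPT header"
  else echo "unallocated"
  fi
}
```

For example, zio_to_sector 270336 gives 2576 and zio_to_sector 160030007296 gives 312560656, the same LBAs as in the two "CDB: Read(10)" lines of the log, and classify_sector places both 2590 and 312560670 inside partition 1.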

To force reallocation, I used the dd commands from Chris’ wiki to write zeroes to those sectors:

dd if=/dev/zero of=/dev/sda bs=512 seek=2590 count=1
dd if=/dev/zero of=/dev/sda bs=512 seek=312560670 count=1
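The two dd writes can also be wrapped with a guard and a read-back. This is a hedged sketch, not the procedure from Chris’ wiki: zero_bad_sector and read_sector are hypothetical names, the refuse-if-readable guard is an extra precaution rather than part of the original commands, and it is destructive on a real disk. Setting DIRECT=1 makes the reads use iflag=direct (GNU dd) so they bypass the page cache; without it, the read-back may simply return the freshly cached zeroes.

```shell
#!/bin/sh
# DESTRUCTIVE on a real disk: overwrites one 512-byte sector with zeroes.
# Set DIRECT=1 to make reads bypass the page cache (GNU dd iflag=direct).

# Read one 512-byte sector to stdout; errors are silenced, the byte count tells.
read_sector() {
  dd if="$1" bs=512 skip="$2" count=1 status=none ${DIRECT:+iflag=direct} 2>/dev/null
}

# Zero sector $2 of device/image $1, but only if it currently cannot be read in full.
zero_bad_sector() {
  local dev s
  dev=$1
  s=$2
  # Extra precaution: never overwrite a sector that still reads fine.
  if [ "$(read_sector "$dev" "$s" | wc -c)" -eq 512 ]; then
    echo "sector $s is readable, not overwriting" >&2
    return 1
  fi
  # Same write as the commands above; conv=notrunc keeps image files intact,
  # conv=fsync flushes the data before dd exits.
  dd if=/dev/zero of="$dev" bs=512 seek="$s" count=1 conv=notrunc,fsync status=none || return 1
  # Read back: expect exactly 512 bytes, none of them non-zero.
  if [ "$(read_sector "$dev" "$s" | wc -c)" -eq 512 ] &&
     [ "$(read_sector "$dev" "$s" | tr -d '\000' | wc -c)" -eq 0 ]; then
    echo "sector $s now reads back as zeroes"
  else
    echo "sector $s still does not read back as zeroes" >&2
    return 1
  fi
}
```

For example, as root, with DIRECT=1 exported first: zero_bad_sector /dev/sda 2590.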

After this, a scrub (zpool scrub) completed successfully without reporting any errors.

Later, when I rebooted the system, the media ZFS pool vanished! ZFS could not import it. In hindsight, I think this is what happened: the partition table was already loaded in memory, and those sectors were not used for data, so when I forced zeroes onto them, the disk reallocated those sectors (now containing zeroes) without anything complaining. That corrupted the partition table on disk, and therefore ZFS could not import the pool.

Linux allows you to do crazy things, which is why I love it 😀
But you need to be careful not to shoot yourself in the foot, which seems to be what happened here.

Anyway, how to fix it? Enter gdisk.

gdisk /dev/disk/by-id/ata-ST3160215AS_9RA824JP
GPT fdisk (gdisk) version 1.0.9

Partition table scan:
  MBR: not present
  BSD: not present
  APM: not present
  GPT: present

Found valid GPT with corrupt MBR; using GPT and will write new
protective MBR on save.

Command (? for help): p
Disk /dev/disk/by-id/ata-ST3160215AS_9RA824JP: 312579695 sectors, 149.0 GiB
Sector size (logical/physical): 512/512 bytes
Disk identifier (GUID): 9E3FB707-838A-8348-B0B6-9B7A84780AE2
Partition table holds up to 128 entries
Main partition table begins at sector 2 and ends at sector 33
First usable sector is 34, last usable sector is 312579661
Partitions will be aligned on 2048-sector boundaries
Total free space is 3628 sectors (1.8 MiB)

Number  Start (sector)    End (sector)  Size       Code  Name
   1            2048       312561663   149.0 GiB   BF01  zfs-e2bf5828b7d30746
   9       312561664       312578047   8.0 MiB     BF07  

As the gdisk message shows, the GPT is valid but the MBR is corrupt, and gdisk will write a new protective MBR on save.

I went ahead and let gdisk do exactly that by issuing the write command (w). Then I tried importing the ZFS pool again, and it worked! I still don’t know why ZFS creates a GPT on disks that are added directly without partitioning, but it saved my day nevertheless.
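The state gdisk reported can also be checked read-only from the shell. A sketch assuming 512-byte sectors and the layout from the UEFI specification: a protective MBR in sector 0 with the 0x55AA boot signature and a first partition entry of type 0xEE, and a GPT header starting with the signature "EFI PART" at sector 1. The names hex_at and check_gpt_disk are hypothetical.

```shell
#!/bin/sh
# Print $3 bytes of file/device $1 starting at byte offset $2, as hex without spaces.
hex_at() {
  dd if="$1" bs=1 skip="$2" count="$3" status=none | od -An -v -tx1 | tr -d ' \n'
}

# Read-only check of the protective MBR (sector 0) and GPT header signature (sector 1).
check_gpt_disk() {
  local dev
  dev=$1
  # MBR boot signature: bytes 510-511 must be 0x55 0xAA.
  if [ "$(hex_at "$dev" 510 2)" = "55aa" ]; then
    echo "MBR signature: present"
  else
    echo "MBR signature: missing"
  fi
  # First MBR partition entry starts at byte 446; its type byte (offset 4)
  # is 0xEE in a GPT protective MBR.
  if [ "$(hex_at "$dev" 450 1)" = "ee" ]; then
    echo "protective MBR entry: yes"
  else
    echo "protective MBR entry: no"
  fi
  # GPT header signature "EFI PART" (hex 45 46 49 20 50 41 52 54) at byte 512.
  if [ "$(hex_at "$dev" 512 8)" = "4546492050415254" ]; then
    echo "GPT header: present"
  else
    echo "GPT header: missing"
  fi
}
```

For example, as root: check_gpt_disk /dev/disk/by-id/ata-ST3160215AS_9RA824JP. It only looks at signatures; gdisk also verifies the CRC32 checksums of the GPT header and partition entries, so treat this as a quick first look rather than a replacement.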
