Rebuild Netgear ReadyNAS RAID

Category: Techie
19 Nov 2013
Written by Andy

Having manually created a secondary array using my spare 500G disks, I found that the X-RAID2 setup my NAS had selected wouldn't expand them alongside the pair of 1TB disks already in the array. You can read more here: Creating Additional Volumes on ReadyNAS

One of the 500s decided to give up the good fight, and I purchased a pair of new 1TB disks to expand the array (I keep taking pictures, and 500G didn't seem like enough space!). Adding the first disk into the array and joining it to my manual one wasn't too hard: a quick fdisk to create a compatible partition, then mdadm --add to include it in the array. My problem came when I thought I'd be clever and split my disk batches, so both arrays would have one new and one older disk in them.
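
For reference, that earlier step boiled down to something like the following. The device and array names here are placeholders rather than what I actually typed, so substitute your own:

:~# fdisk /dev/sdX       (create a single partition of type fd - Linux raid autodetect)
:~# mdadm --add /dev/mdN /dev/sdX1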

I pulled the last 500G from one array and one of the 1TB disks from the other, then put the new disk into slot 1 to rebuild the 1TB array again.

However, the ReadyNAS just wouldn't rebuild, and no amount of power-cycling to try and force a rebuild was helping! It would detect the disk, the LCD displayed "testing disk", and from /var/log/messages I could see various partitioning going on, but no re-sync.

Here's what I did to solve it:

I figured most of this out by reading through logs and seeing where the failure was. /var/log/frontview/expand_md.log showed the automated rebuild tasks that the system was performing, and I tried to emulate them on the command line.
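
If you want to watch what the system is attempting in real time while re-inserting the disk, tailing that log is enough:

:~# tail -f /var/log/frontview/expand_md.log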

These are the lines that put me on the right path:
[2013/11/19 00:30:33  1665] Succeed to convert mbr to gpt /dev/sdb.
[2013/11/19 00:30:33  1665] Repair md: new drive partition written
[2013/11/19 00:30:33  1665] new dev:/dev/sda1 blocks:976759008 md_id:0 md_cfg:0
[2013/11/19 00:30:33  1665] used dev:/dev/sdb1 blocks:4194304 md_id:0 md_cfg:0
[2013/11/19 00:30:33  1665] used dev:/dev/sdb2 blocks:524288 md_id:0 md_cfg:1
[2013/11/19 00:30:33  1665] used dev:/dev/sdb3 blocks:972041912 md_id:0 md_cfg:2
[2013/11/19 00:30:33  1665] gpt sig:0,mbr sig:0,fake type:0
[2013/11/19 00:30:33  1665] get disk /dev/sdb format is (GPT=2,MBR=1,MX=3,MISC=-1): 2
[2013/11/19 00:30:33  1665] Add /dev/sda2, to /dev/md1, partition_no=2
[2013/11/19 00:30:33  1665] Will add /dev/sda2 to /dev/md1
[2013/11/19 00:30:33  1665] No device add to /dev/md0
[2013/11/19 00:30:34  1665] ADD1 pdev /dev/sda2 to md /dev/md1, level=1, repair:=1, degraded=1, part_idx=1
[2013/11/19 00:30:34  1665]  Repair: 2, 2, 1
[2013/11/19 00:30:34  1665] LINE 3473: exec command: mdadm --zero-super /dev/sda2
mdadm: Couldn't open /dev/sda2 for write - not zeroing
[2013/11/19 00:30:34 1665] LINE 3480: exec command: mdadm -a /dev/md1 /dev/sda2
mdadm: cannot find /dev/sda2: No such file or directory
[2013/11/19 00:30:34 1665] Expand RAID /dev/md1 failed err:0x100, degraded=1, repair=1
mdadm: cannot find /dev/sda2: No such file or directory

The X-RAID2 disks use GPT (GUID Partition Table) rather than a standard MBR partition table. It also appears that the built-in utilities are failing to build the GPT partitions correctly, so when it gets to the stage of adding the partition to the software RAID, it finds that it doesn't exist and aborts!
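
You can confirm this yourself: despite the log claiming the partition was written, the kernel never sees a /dev/sda2. A quick look at the device nodes and the kernel's partition list shows what actually exists:

:~# ls -l /dev/sda*
:~# cat /proc/partitions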

My first step was to print the valid partition table on /dev/sdb (disk 2) as a reference for creating a new one on /dev/sda:

:~# gdisk -l /dev/sdb
GPT fdisk (gdisk) version 0.7.0
Partition table scan:
  MBR: protective
  BSD: not present
  APM: not present
  GPT: present

Found valid GPT with protective MBR; using GPT.
Disk /dev/sdb: 1953525168 sectors, 931.5 GiB
Logical sector size: 512 bytes
Disk identifier (GUID): 34EBEFB5-A580-4548-AF13-26FB254D14F5
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 1953525134
Partitions will be aligned on 64-sector boundaries
Total free space is 4092 sectors (2.0 MiB)

Number  Start (sector)    End (sector)  Size       Code  Name
   1              64         8388671   4.0 GiB     FD00
   2         8388672         9437247   512.0 MiB   FD00
   3         9437248      1953521072   927.0 GiB   FD00

This shows three partitions: partition 3 is the main data volume, and I believe partition 1 holds the OS and partition 2 is swap.
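
That last part is partly a guess on my side; I didn't verify it at the time, but the standard checks below should confirm which md device holds swap and which hold mounted filesystems.

:~# cat /proc/swaps
:~# df -h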

So I set out to create them on the other disk. Note that the start sector is 64; by default gdisk uses 2048, so you need to use the advanced settings in order to exactly match the partition layout of the other disk. First, delete the existing partition(s):

:~# gdisk /dev/sda
Command (? for help): d
Using 1

Then go into advanced, and set sector alignment to 64:

Command (? for help): x

Expert command (? for help): l
Enter the sector alignment value (1-65536, default = 2048): 64

Expert command (? for help): m

Once back in the main menu, create the partitions, matching the sizes on disk /dev/sdb:

Command (? for help): n
Partition number (1-128, default 1): 1
First sector (34-1953525134, default = 34) or {+-}size{KMGTP}: 64
Last sector (64-1953525134, default = 1953525134) or {+-}size{KMGTP}: +4G
Current type is 'Linux/Windows data'
Hex code or GUID (L to show codes, Enter = 0700): FD00
Changed type of partition to 'Linux RAID'

Repeat for the other 2 partitions, then commit the changes:

Command (? for help): w
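
After writing the table, it is worth checking that the kernel has actually picked up the new partitions before carrying on. If /dev/sda1 to /dev/sda3 don't appear, forcing a re-read of the partition table should sort it (I'm assuming blockdev is available on the ReadyNAS firmware, which it normally is on a Debian-based system):

:~# cat /proc/partitions
:~# blockdev --rereadpt /dev/sda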

It should now be as simple as adding the new partitions to the RAID arrays. The only snag is working out which partition belongs to which array. This information can be gleaned using mdadm, which can list the member disks and sizes of the different software RAID arrays:

:~# mdadm -D --scan -vv | less
/dev/md/3:
        Version : 1.2
  Creation Time : Sat Jan 12 23:24:41 2013
     Raid Level : raid1
     Array Size : 488382841 (465.76 GiB 500.10 GB)
  Used Dev Size : 488382841 (465.76 GiB 500.10 GB)
   Raid Devices : 2
  Total Devices : 1
    Persistence : Superblock is persistent

    Update Time : Tue Nov 19 00:31:50 2013
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           Name : 2CB05DBED910:3
           UUID : b1ca31e8:3be8568e:4dd96138:abd63a8b
         Events : 39930

    Number   Major   Minor   RaidDevice State
       2       8       33        0      active sync   /dev/sdc1
       1       0        0        1      removed
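
Alternatively, examining one of the surviving partitions on /dev/sdb reports the UUID of the array it belongs to, which can then be matched against the detail output above. I didn't do it this way at the time, but it should save some scrolling:

:~# mdadm --examine /dev/sdb2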

In my case, /dev/sda1 was part of /dev/md0, /dev/sda2 was part of /dev/md1, and /dev/sda3 was part of /dev/md2. mdadm was then used to add each partition to its array:

:~#  mdadm --add /dev/md2 /dev/sda3
:~#  mdadm --add /dev/md0 /dev/sda1
:~#  mdadm --add /dev/md1 /dev/sda2

These commands add the partitions to the arrays and trigger the rebuild. The LCD on the front of the NAS also displayed the status. For a command-line view, you can try the following:

:~# watch cat /proc/mdstat
Every 2.0s: cat /proc/mdstat                                                      Tue Nov 19 01:50:44 2013

Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md2 : active raid5 sda3[2] sdb3[1]
      972040704 blocks super 1.2 level 5, 64k chunk, algorithm 2 [2/1] [_U]
      [====>................]  recovery = 22.9% (223358464/972040704) finish=226.4min speed=55089K/sec

md1 : active raid1 sdb2[1]
      524276 blocks super 1.2 [2/1] [_U]

md3 : active raid1 sdd[3] sdc1[2]
      488382841 blocks super 1.2 [2/1] [U_]
      [============>........]  recovery = 61.1% (298729088/488382841) finish=44.0min speed=71735K/sec

md0 : active raid1 sda1[2] sdb1[1]
      4193268 blocks super 1.2 [2/1] [_U]
        resync=DELAYED

unused devices: <none>

NOTE: I have not rebooted since the rebuild, so I am unsure how happy the array will be at next boot.
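
If you want a little reassurance before risking that reboot, it is probably worth waiting for the re-syncs to finish and checking that each array reports itself clean, along these lines:

:~# cat /proc/mdstat
:~# mdadm --detail /dev/md2 | grep State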