Solaris 10 SVM/SDS Mirrored Root Disk Replacement

Submitted on September 25, 2012 – 3:08 pm

The following is a standard process for replacing a failed boot disk mirrored with SVM on a Solaris 10 Sun server. Your hardware must support hot-swappable disks for this process to be performed without booting into single-user mode.

Environment:

Sun Fire V240
SunOS Release 5.10
UltraSPARC-IIIi

The following two root disks are mirrored with SVM:

c0t0d0 (sd3) Fujitsu MAT3073N SUN72G SCSI Disk Drive
c0t1d0 (sd0) Fujitsu MAT3073N SUN72G SCSI Disk Drive

Scenario:

c0t1d0 has failed and needs to be replaced

1) Identifying the failed disk

The failed disk can be identified as the one whose submirrors are in the “maint” state:

# /usr/sbin/metastat -ac
d6 m 20GB d16 d26 (maint)
d16 s 20GB c0t0d0s6
d26 s 20GB c0t1d0s6 (maint)
d3 m 4.0GB d13 d23 (maint)
d13 s 4.0GB c0t0d0s3
d23 s 4.0GB c0t1d0s3 (maint)
d1 m 4.0GB d11 d21 (maint)
d11 s 4.0GB c0t0d0s1
d21 s 4.0GB c0t1d0s1 (maint)
d0 m 4.0GB d10 d20 (maint)
d10 s 4.0GB c0t0d0s0
d20 s 4.0GB c0t1d0s0 (maint)

Additionally, the failed disk will show the “W” (“Write” error) state in metadb:

# /usr/sbin/metadb
flags first blk block count
a m p luo 16 8192 /dev/dsk/c0t0d0s4
a p luo 8208 8192 /dev/dsk/c0t0d0s4
W p l 16 8192 /dev/dsk/c0t1d0s4
W p l 8208 8192 /dev/dsk/c0t1d0s4
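
The same check can be scripted. The following is a minimal sketch, assuming the metadb output layout shown above (flag letters first, device path in the last column). The function name failed_replicas is hypothetical, and it looks only for the “W” flag discussed here.

```shell
# failed_replicas: read `metadb` output on stdin and print each device
# whose flag columns contain "W" (write errors).
# Assumes the device path is the last field, preceded by two numeric columns.
failed_replicas() {
    awk '$NF ~ /^\/dev\/dsk\// {
        flags = ""
        for (n = 1; n <= NF - 3; n++) flags = flags $n
        if (flags ~ /W/) print $NF
    }' | sort -u
}

# Possible usage on the affected host:
#   /usr/sbin/metadb | failed_replicas
```

Reading from stdin keeps the function free of side effects, so it can be tried against saved output before running it on a live system.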

Take extra care when identifying the failed disk and the corresponding MD devices. A scripted solution, similar to the one below, may help avoid manual mistakes.

# For each disk with submirrors in the "maint" state, list its failed MD devices
for i in `/usr/sbin/metastat -ac | grep maint | egrep "c.t." | awk '{print $4}' | awk -F's' '{print $1}' | sort -u`
do
    echo "Failed disk ${i} contains the following failed MD devices:"
    /usr/sbin/metastat -ac | grep maint | grep "${i}"
    echo ""
done

Output:

Failed disk c0t1d0 contains the following failed MD devices:
d26 s 20GB c0t1d0s6 (maint)
d23 s 4.0GB c0t1d0s3 (maint)
d21 s 4.0GB c0t1d0s1 (maint)
d20 s 4.0GB c0t1d0s0 (maint)

2) The next step is to detach and clear the MD devices:

# /usr/sbin/metadetach -f d0 d20
# /usr/sbin/metadetach -f d1 d21
# /usr/sbin/metadetach -f d3 d23
# /usr/sbin/metadetach -f d6 d26

# /usr/sbin/metaclear d20
# /usr/sbin/metaclear d21
# /usr/sbin/metaclear d23
# /usr/sbin/metaclear d26

Note: depending on the size of the partitions, the “metaclear” operation may take some time. To automate things a bit, use a simple loop as shown below. Don’t forget to substitute the names of the metadevices on your system:

for i in d20 d21 d23 d26 ; do /usr/sbin/metaclear $i ; done

Sample output:

# /usr/sbin/metadetach -f d6 d26
d6: submirror d26 is detached
# /usr/sbin/metadetach -f d3 d23
d3: submirror d23 is detached
# /usr/sbin/metadetach -f d1 d21
d1: submirror d21 is detached
# /usr/sbin/metadetach -f d0 d20
d0: submirror d20 is detached

d20: Concat/Stripe is cleared
d21: Concat/Stripe is cleared
d23: Concat/Stripe is cleared
d26: Concat/Stripe is cleared
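
The mirror/submirror pairs for step 2 can also be derived from the metastat output rather than typed by hand. The sketch below assumes the “metastat -ac” layout shown in step 1. The function name detach_clear_cmds is hypothetical, and it only prints the commands so they can be reviewed before anything is run.

```shell
# detach_clear_cmds: read `metastat -ac` output on stdin and print (not run)
# the metadetach/metaclear commands for every submirror marked (maint).
detach_clear_cmds() {
    awk '
        # Mirror line, e.g. "d6 m 20GB d16 d26 (maint)": remember the parent
        # mirror of each submirror listed on it.
        $2 == "m" {
            for (n = 4; n <= NF; n++)
                if ($n ~ /^d[0-9]+$/) parent[$n] = $1
        }
        # Submirror line in maintenance state, e.g. "d26 s 20GB c0t1d0s6 (maint)".
        $2 == "s" && /\(maint\)/ { cnt++; failed[cnt] = $1 }
        END {
            for (k = 1; k <= cnt; k++)
                if (parent[failed[k]] != "")
                    print "/usr/sbin/metadetach -f " parent[failed[k]] " " failed[k]
            for (k = 1; k <= cnt; k++)
                print "/usr/sbin/metaclear " failed[k]
        }'
}

# Possible usage: review the generated list first, then run it by hand.
#   /usr/sbin/metastat -ac | detach_clear_cmds
```

Printing instead of executing is a deliberate choice here: a wrong metadetach on the surviving submirror is exactly the manual mistake this procedure tries to avoid.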

3) Delete the state database replicas (metadb) from the failed disk, then check the result with metastat. The first command may take a while, so don’t panic.

# /usr/sbin/metadb -d c0t1d0s4

# /usr/sbin/metastat -ac

Sample output:

d6               m   20GB d16
    d16          s   20GB c0t0d0s6
d3               m  4.0GB d13
    d13          s  4.0GB c0t0d0s3
d1               m  4.0GB d11
    d11          s  4.0GB c0t0d0s1
d0               m  4.0GB d10
    d10          s  4.0GB c0t0d0s0
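
To confirm that the replicas on the failed disk are really gone, the remaining replicas can be counted per slice. This is a small sketch that assumes the metadb output layout shown in step 1. The function name replica_counts is hypothetical.

```shell
# replica_counts: read `metadb` output on stdin and print "<device> <count>"
# for every slice that still holds state database replicas.
replica_counts() {
    awk '$NF ~ /^\/dev\/dsk\// { count[$NF]++ }
         END { for (dev in count) print dev, count[dev] }' | sort
}

# Possible usage:
#   /usr/sbin/metadb | replica_counts
```

At this point only c0t0d0s4 should be listed. After the “metadb -c 2 -a c0t1d0s4” command in step 7, c0t1d0s4 should appear again with a count of 2.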

4) Run cfgadm and verify the status of the failed disk

# /usr/sbin/cfgadm -al | grep c0t1d0

Output:

# /usr/sbin/cfgadm -al | grep c0t1d0
c0::dsk/c0t1d0                 disk         connected    configured   unknown

5) Remove the disk

# /usr/sbin/cfgadm -c unconfigure c0::dsk/c0t1d0

Run cfgadm again and verify that the failed disk now shows up as “unconfigured”. The second time you run the cfgadm command, it will take a minute to re-scan your disks, so be patient.

# /usr/sbin/cfgadm -al | grep c0t1d0
c0::dsk/c0t1d0                 disk         connected    unconfigured unknown
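
If you would rather check the state in a script than by eye, the occupant column can be pulled out of the cfgadm listing. This is a minimal sketch, assuming the five-column “cfgadm -al” layout shown above (Ap_Id, Type, Receptacle, Occupant, Condition). The function name occupant_state is hypothetical.

```shell
# occupant_state: read `cfgadm -al` output on stdin and print the Occupant
# column (4th field) for the attachment point given as $1.
occupant_state() {
    while read ap_id type receptacle occupant condition; do
        if [ "$ap_id" = "$1" ]; then
            echo "$occupant"
        fi
    done
}

# Possible usage: only pull the disk once this prints "unconfigured".
#   /usr/sbin/cfgadm -al | occupant_state c0::dsk/c0t1d0
```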

6) Physically replace the failed disk, then configure the replacement with cfgadm. I guess I don’t need to remind you about the importance of unplugging the correct drive.

# /usr/sbin/cfgadm -al
c0::dsk/c0t1d0                 disk         connected    unconfigured unknown

# /usr/sbin/cfgadm -c configure c0::dsk/c0t1d0

Note: if you run into the error below when executing “cfgadm -c configure”, try re-running the same command a minute later and see if it works this time. The reason for this failure is that it takes the system some time to rescan SCSI paths and detect new devices. It may take a while for cfgadm to configure a large disk, so find something to do…

cfgadm: Hardware specific failure: failed to configure SCSI device: I/O error
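
The “try again a minute later” advice can be wrapped in a small retry helper. This is only a sketch: the function name retry and its two arguments (number of attempts, delay in seconds) are hypothetical, and the values in the usage line are illustrative rather than recommended.

```shell
# retry ATTEMPTS DELAY COMMAND...: run COMMAND until it succeeds, waiting
# DELAY seconds between tries; give up after ATTEMPTS tries.
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while :; do
        if "$@"; then
            return 0
        fi
        if [ "$n" -ge "$attempts" ]; then
            echo "giving up after $n attempts: $*" >&2
            return 1
        fi
        echo "attempt $n failed; retrying in ${delay}s" >&2
        sleep "$delay"
        n=`expr "$n" + 1`
    done
}

# Possible usage: up to 5 tries, one minute apart.
#   retry 5 60 /usr/sbin/cfgadm -c configure c0::dsk/c0t1d0
```

Backticks and expr are used instead of newer shell arithmetic so that the helper stays close to the style of the other commands in this article.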

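7) Run “format” and verify the disk information. Then copy the partition table from the good mirror disk to the replacement with “prtvtoc” and “fmthard”, install the boot block, re-create the state database replicas, and re-create and attach the submirrors. The “prtvtoc” step may take a long time.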

# format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c0t0d0 <SUN72G cyl 14087 alt 2 hd 24 sec 424>
          /pci@1c,600000/scsi@2/sd@0,0
       1. c0t1d0 <drive not available>
          /pci@1c,600000/scsi@2/sd@1,0
Specify disk (enter its number):

Note: there is a chance that “prtvtoc” may give you an error along the lines of “/dev/rdsk/c0t1d0s2: Cannot get disk geometry”. What to do: run “format”; select the disk you just replaced (in this example it appeared as “c0t1d0 <drive not available>”); from the list of “Available Drive Types”, select your drive type (or “Auto configure”, if you don’t see the correct drive type in the list); type “current” to verify you are working with the correct disk; type “format”. If you still get an error saying “Format failed”, then it is likely that your replacement disk is defective. It happens more often than you’d think…

# /usr/sbin/prtvtoc /dev/rdsk/c0t0d0s2 | /usr/sbin/fmthard -s - /dev/rdsk/c0t1d0s2

# /usr/sbin/installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk /dev/rdsk/c0t1d0s0

# /usr/sbin/metadb -c 2 -a c0t1d0s4

# /usr/sbin/metainit d20 1 1 c0t1d0s0
# /usr/sbin/metainit d21 1 1 c0t1d0s1
# /usr/sbin/metainit d23 1 1 c0t1d0s3
# /usr/sbin/metainit d26 1 1 c0t1d0s6

# /usr/sbin/metattach d0 d20
# /usr/sbin/metattach d1 d21
# /usr/sbin/metattach d3 d23
# /usr/sbin/metattach d6 d26
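
One check that can be slotted in right after the fmthard command is to compare the slice tables of the two disks. The sketch below assumes that prtvtoc marks its header and comment lines with a leading “*”, so that only the slice rows remain once those lines are dropped. The function name vtoc_slices and the /tmp file names are hypothetical.

```shell
# vtoc_slices: read `prtvtoc` output on stdin and keep only the slice table,
# dropping the "*" comment lines (device name, geometry, column headers).
vtoc_slices() {
    grep -v '^\*'
}

# Possible usage: the two slice tables should be identical after fmthard.
#   /usr/sbin/prtvtoc /dev/rdsk/c0t0d0s2 | vtoc_slices > /tmp/vtoc.good
#   /usr/sbin/prtvtoc /dev/rdsk/c0t1d0s2 | vtoc_slices > /tmp/vtoc.new
#   cmp -s /tmp/vtoc.good /tmp/vtoc.new && echo "partition tables match"
```

The comment lines are stripped because they include the device name, which always differs between the two disks.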

8) Run “/usr/sbin/metastat -ac” a few times until you confirm the new disk is synced up with the good mirror.
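
Instead of re-running metastat by hand, the wait can be scripted. This sketch assumes that a mirror that is still syncing is flagged in the “metastat -ac” output with a marker beginning “(resync”, in the same position where “(maint)” appeared in step 1. That marker format is an assumption, so check it against your own output first. The function name mirrors_settled is hypothetical.

```shell
# mirrors_settled: read `metastat -ac` output on stdin; succeed (exit 0)
# only when no line is flagged "(maint..." or "(resync...".
mirrors_settled() {
    awk 'BEGIN { bad = 0 }
         /\(maint/ || /\(resync/ { bad = 1 }
         END { exit bad }'
}

# Possible usage: poll once a minute until the mirrors are clean.
#   until /usr/sbin/metastat -ac | mirrors_settled; do sleep 60; done
```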
