<note important> This guide is only for recovering ClearOS 7 installations. For ClearOS 6, go here. </note>
If you've come to this page because you need the information contained here, let us begin by saying that we are sorry your system is not working and we hope this page will help. There are a number of reasons why you may need to use the rescue image. It is a valuable tool for anyone supporting ClearOS and as such is included on every ClearBOX appliance by default. Here are just some of the reasons why you may need to use the rescue image which is contained on your ClearOS installation CD/DVD/USB/ISO:
<note warning>This guide is intended as an instructional howto. The commands listed here are not to be used verbatim but are intended to illustrate examples. This guide is provided without warranty as to its accuracy or applicability to your specific situation. ClearCenter will not be liable for data lost as a result of following this guide. Please take all precautions necessary to preserve your own data.</note>
Your problem is likely causing you stress, and this can lead to extreme reactions which can destroy data. In this guide we will attempt to point out key validation points in the diagnosis, as well as ways to validate that the changes you made actually took effect. Some of this process is complex and this guide will NOT necessarily meet the needs of your particular problem. That being said, if it doesn't fix the problem, you will at least know what your problem is NOT.
You may have multiple problems. For example, a failed disk will affect your RAID, GRUB, Kernel and init process. The boot of ClearOS goes through the following stages:
Powering on your system will result in a series of tests. If your Power On Self Test (POST) completes, the system will usually beep once and transition straight to the first cylinder on the first device as listed in your BIOS. This first cylinder should contain boot code called GRUB (GRand Unified Bootloader). GRUB under ClearOS displays a menu item and a countdown timer. This will transition to a black screen which will fill up quickly with text on 5.x and to a graphical screen on 6.x. From here the kernel will load devices and hand over the boot process to the init scripts.
Knowing where the process stops is key to understanding where to start fixing the issue. Error messages are critical, and it is a good idea to write them down or Google them if you don't understand WHERE the boot is failing.
All ClearOS installations contain a rescue image. To successfully start this image you need to tell your BIOS to boot from the installation media. This can require a modification of the boot order in your BIOS, or perhaps your BIOS supports a keystroke which allows you to select your boot device (often F12). You will need to use the same mechanisms that you used to install the system. In most cases this is straightforward. With systems that required disk drivers in order to see partitions, you will need to use those same methods to mount the disks to modify the 'root' password.
At the start screen, navigate with the arrows to select 'Troubleshooting'
At the troubleshooting screen, select 'Rescue a ClearOS System' and press <ENTER>.
After a while a blue screen will come up.
Use the arrow or tab keys to select 'Continue'. Press <ENTER>.
The system will attempt to find your ClearOS partition. If this step was not successful, you may need to load special drivers or contact support for assistance. If it finds your partition, it will notify you that the partition was found and mounted under '/mnt/sysimage'. Press <ENTER>.
For extra measure, we will notify you that your partition is mounted under '/mnt/sysimage'. Press <ENTER>.
You will be dropped to a command prompt. Your prompt will look similar to the following:
You may be here in this document because you have lost the first disk in your RAID array and the system is unbootable, or because you need to use the rescue mode for the repair instead of the regular OS. If this is the case, you may have identified the bad disk and replaced it already, or perhaps you just need to look around and assess the damage. If you did add a new disk, the rescue CD will ask you to initialize that disk.
Survey the landscape by running the following and take an inventory of your physical disks and their partitions:
fdisk -l | less
Here is an example of what a RAID disk will look like from running that command:
Disk /dev/sda: 2000.3 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          15      120456   fd  Linux raid autodetect
/dev/sda2              16        7664    61440592+  fd  Linux raid autodetect
/dev/sda3            7665      243201  1891950952+  fd  Linux raid autodetect
If you've replaced the disk that failed, you will notice that it does not have any partitions yet. Ideally, the replacement disk will have the same geometry as the surviving RAID member. If not, you will need to get a little more technical and ensure that the partition sizes either match or are greater than the originals:
255 heads, 63 sectors/track, 243201 cylinders
Write this information down. You will need it later so that you can make the new disk with the same geometry and information. Of particular note is the start and end numbers, the partition number, the type and lastly which drive has the asterisk '*' character.
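Rather than copying the numbers by hand, the partition lines can be reduced to the fields this guide asks you to record. The following is a minimal sketch, assuming 'awk' is available in your rescue environment and that the listing uses the column layout shown above; the function name 'summarize_partitions' is hypothetical and not part of ClearOS.

```shell
# Hypothetical helper: reads `fdisk -l` output on stdin and prints, for each
# partition line, the device, boot flag, start, end and partition type id.
summarize_partitions() {
    awk '/^\/dev\// {
        if ($2 == "*") {
            # Bootable partition: the asterisk occupies its own column.
            boot = "yes"; start = $3; last = $4; id = $6
        } else {
            boot = "no";  start = $2; last = $3; id = $5
        }
        print $1, "boot=" boot, "start=" start, "end=" last, "id=" id
    }'
}
```

For example, 'fdisk -l /dev/sda | summarize_partitions' would print one line per partition on the surviving disk. Treat the result as a convenience only and compare it against the raw fdisk listing before relying on it.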
Locate your unformatted, unpartitioned disk and run the following (in this example our disk is /dev/sdb):
fdisk /dev/sdb
You will enter the fdisk menu system which will look like this:
[root@server ~]# fdisk /dev/sdb

The number of cylinders for this disk is set to 243201.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)

Command (m for help):
There are several commands that you will use here. To familiarize yourself with the tool, type 'm' on the keyboard and press <ENTER>. This will show you a list of commands. Note the 'p' command, which shows you the proposed layout of the disk. Run this now by pressing 'p'.
On your blank disk it will not show any partitions, so let's make one. Type 'n' for a new partition. It will ask whether you want a primary or an extended partition. You can have up to four primary partitions, or three primary partitions plus one extended partition that holds multiple logical partitions. Typically the first partition will be primary, so type 'p' for primary. When it asks for the partition number, use '1' for the first. It will ask for the start cylinder and offer a default. If that matches your notes from the other drive, accept it; otherwise enter the value from your notes. It will then ask for the end cylinder; supply that as well. When it is completed, type 'p' to view your partition. Repeat this process for each partition.
You will likely need to change the type of the partition from 83 to something else. If this is the case, do the following and supply the correct hex code:
Command (m for help): t
Partition number (1-4): 1
Hex code (type L to list codes): fd
Review your changes using the 'p' command.
You will likely need to set the active partition (the asterisk) on the correct partition. Do the following or similar:
Command (m for help): a
Partition number (1-4): 1
Review your changes and ensure that the information is correct. If you want to abort the proposed changes type 'q' to quit. If you want to write these changes and commit the partition proposal to disk, type 'w' for write.
Double-check your work by running 'fdisk -l'. You can limit the results to just the disks that are part of your RAID by listing their letters in brackets like this:
fdisk -l /dev/sd[ab]
or this if you have five disks:
fdisk -l /dev/sd[abcde] | less
Familiarize yourself with this command:
cat /proc/mdstat
This command is useful for watching what your MultiDisk RAID is doing RIGHT now. Here is an output that shows one RAID volume:
[root@gateway-utah ~]# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sda1 sdb1
      120384 blocks [2/2] [UU]

unused devices: <none>
Let's take this apart.
It is also useful to watch this file continuously, as it displays the current status. This is especially handy while you are rebuilding, since you can follow the progress bar:
watch cat /proc/mdstat
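When several arrays are listed, it can be easy to overlook one that is running with a missing member. The following is a minimal sketch, assuming 'awk' is available and that the status line ends with the bracketed member map (for example '[UU]', where an underscore marks a missing member); the function name 'degraded_arrays' is hypothetical.

```shell
# Hypothetical helper: reads /proc/mdstat-style text on stdin and prints the
# name of every array whose member map contains "_" (a missing member).
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }          # remember the array being described
        /\[[U_]+\]/ && name != "" {
            if ($NF ~ /_/) print name        # e.g. [U_] means one member is absent
            name = ""
        }'
}
```

Running 'degraded_arrays < /proc/mdstat' prints nothing when every listed array has all of its members.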
Multidisk arrays are usually assembled by the /etc/mdadm.conf file. However, you are likely in this section because your RAID is not assembling…and how can it if mdadm.conf does not exist. Moreover, you CANNOT assemble disks in the rescue CD using a typical command like:
mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
# ^^^^^^^^^^^^^^^ This won't work in Rescue Mode ^^^^^^^^^^^^^^^ #
Ok, so how do we assemble our disks? First, let's check our disk members (do this on all partitions which comprise your RAID):
mdadm --examine /dev/sda1
You should get results like this:
/dev/sda1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 03e965cf:42e2070c:eeb11af9:065b0b59
  Creation Time : Wed Aug  4 11:56:07 2010
     Raid Level : raid1
  Used Dev Size : 120384 (117.58 MiB 123.27 MB)
     Array Size : 120384 (117.58 MiB 123.27 MB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 1

    Update Time : Thu Aug 30 10:40:57 2012
          State : clean
 Active Devices : 1
Working Devices : 1
 Failed Devices : 1
  Spare Devices : 0
       Checksum : 819e9b9b - correct
         Events : 27850

      Number   Major   Minor   RaidDevice State
this     0       8        1        0      active sync   /dev/sda1

   0     0       8        1        0      active sync   /dev/sda1
   1     1       0        0        1      faulty removed
Check and make sure that the State is clean. If it is not clean you may have difficulties reassembling your array.
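Since this check has to be repeated for every member partition, it may help to reduce each report to the fields that matter. The following is a minimal sketch, assuming 'awk' is available and that the output uses the 0.90 metadata layout shown above (newer metadata versions label their fields differently); the function name 'member_summary' is hypothetical.

```shell
# Hypothetical helper: reads `mdadm --examine <partition>` output on stdin and
# prints the array UUID, the member state and the event counter on one line.
member_summary() {
    awk -F ' : ' '
        /^ *UUID :/   { uuid = $2 }
        /^ *State :/  { state = $2 }
        /^ *Events :/ { events = $2 }
        END { print "uuid=" uuid, "state=" state, "events=" events }'
}
```

For example, 'mdadm --examine /dev/sda1 | member_summary' gives a single line per member, which makes it easier to spot a member whose state is not clean or whose UUID differs from the others.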
Now let's probe our disks and see what arrays we can find. Start by creating the file /etc/mdadm.conf. In it you will tell mdadm which devices to scan:
DEVICE /dev/sd[abcd]1
DEVICE /dev/sd[abcd]2
DEVICE /dev/sd[abcd]3
With the above file, we will be scanning the first three partitions on four different drives for multidisk signatures. You will need to customize it to suit your needs. Now, let's see what is there:
mdadm --examine --scan
This information is vital to assembling your array. If the output looks good, append this to your new /etc/mdadm.conf:
mdadm --examine --scan >> /etc/mdadm.conf
From here you can assemble your devices by name:
mdadm --assemble --scan /dev/md0
mdadm --assemble --scan /dev/md1
Now check your assembled RAID arrays using 'cat /proc/mdstat'.
If this method does not work, you may have to try other means. Another way to see what is on the disks is to run an exhaustive probe and then assemble the arrays manually:
mdadm -QE --scan
ARRAY /dev/md1 level=raid1 num-devices=2 UUID=03e965cf:42e2070c:eeb11af9:065b0b59
ARRAY /dev/md2 level=raid1 num-devices=2 UUID=c10a6566:11ce9088:da0e5da7:e1449030
ARRAY /dev/md3 level=raid1 num-devices=2 UUID=6d83baec:d8c4f50b:3ccc3173:326118cf
If you are familiar with Multidisk technology, you will notice that the output is very similar to the contents of the mdadm.conf file. In rescue mode, this information is critical because you can ONLY assemble disks using the UUID numbers.
Let's assemble md1:
mdadm --assemble --uuid 03e965cf:42e2070c:eeb11af9:065b0b59 /dev/md1
Notice that we do not list the /dev/sdX1 disks. This is because the assemble will use the UUID, which should be the same on each member. This is the same UUID that appeared in the output of 'mdadm --examine /dev/sda1'.
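Typing long UUIDs by hand invites mistakes, so the assemble commands can be derived from the scan output instead. The following is a minimal sketch, assuming 'awk' is available and that the scan prints ARRAY lines containing a 'UUID=' field as in the example above; the function name 'assemble_commands' is hypothetical. It only prints the commands so that you can review them before running anything.

```shell
# Hypothetical helper: reads ARRAY lines (as printed by `mdadm -QE --scan`) on
# stdin and prints one UUID-based assemble command per array. Nothing is executed.
assemble_commands() {
    awk '$1 == "ARRAY" {
        uuid = ""
        for (i = 3; i <= NF; i++)
            if ($i ~ /^UUID=/) uuid = substr($i, 6)   # strip the "UUID=" prefix
        if (uuid != "")
            print "mdadm --assemble --uuid " uuid " " $2
    }'
}
```

For example, 'mdadm -QE --scan | assemble_commands' would list one command per detected array. Run them one at a time and check the result after each.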
Now check the status using 'cat /proc/mdstat'.
Once the device is assembled, you can add the partitions that you created on your replacement disk (/dev/sdb1 in this example) to the array that you just assembled:
mdadm --manage /dev/md1 --add /dev/sdb1
Now check the status using 'cat /proc/mdstat' or with 'watch cat /proc/mdstat'.
<note>A rebuild of the array will start over from the beginning if one of the disks enters an incomplete state before the sync is complete. A reboot will also cause the sync to restart from the beginning.</note>
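Because an interrupted sync starts over, it is worth confirming that the rebuild has finished before rebooting. The following is a minimal sketch, assuming 'awk' is available and that the kernel reports progress on a line containing 'recovery =' or 'resync =' followed by a percentage and a 'finish=' estimate; the function name 'rebuild_progress' is hypothetical.

```shell
# Hypothetical helper: reads /proc/mdstat-style text on stdin and prints the
# array name, percentage complete and estimated time left for each active sync.
rebuild_progress() {
    awk '
        /^md[0-9]+ :/ { name = $1 }
        /(recovery|resync) *=/ {
            pct = ""; eta = ""
            for (i = 1; i <= NF; i++) {
                if ($i ~ /%$/)       pct = $i
                if ($i ~ /^finish=/) eta = substr($i, 8)   # strip "finish="
            }
            print name, pct, "eta=" eta
        }'
}
```

Running 'rebuild_progress < /proc/mdstat' prints nothing once no array is syncing, which is a reasonable signal that it is safe to proceed.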
<note>This section needs to be written as ClearOS7 uses grub2 and not grub</note>
By default, drives are mounted in read-only mode. You can remount a read-only drive in read/write mode with a command such as:
mount -o remount,rw /sysroot
Change /sysroot to whichever mount point you want to remount (for example, '/mnt/sysimage' if that is where the rescue environment mounted your system).
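To validate that the remount actually took, the mount options can be read back from the kernel's mount table. The following is a minimal sketch, assuming 'awk' is available and that '/proc/mounts' uses its usual layout (device, mount point, filesystem type, options); the function name 'mount_mode' is hypothetical.

```shell
# Hypothetical helper: prints "ro" or "rw" for the given mount point, or
# "not-mounted" if it is absent. $1 = mount point, $2 = optional mounts table
# (defaults to /proc/mounts). If a mount point appears twice, the last entry wins.
mount_mode() {
    awk -v mp="$1" '
        $2 == mp {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "ro" || opts[i] == "rw") mode = opts[i]
        }
        END { print (mode == "" ? "not-mounted" : mode) }
    ' "${2:-/proc/mounts}"
}
```

For example, 'mount_mode /sysroot' should print 'rw' after a successful remount.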