
Rescue Mode for ClearOS 7

This guide is only for recovering ClearOS 7 installations. For ClearOS 6, refer to the ClearOS 6 version of this guide.

If you've come to this page because you need the information contained here, let us begin by saying that we are sorry your system is not working and we hope this page will help. There are a number of reasons why you may need to use the rescue image. It is a valuable tool for anyone supporting ClearOS and as such is included on every ClearBOX appliance by default. Here are just some of the reasons why you may need to use the rescue image which is contained on your ClearOS installation CD/DVD/USB/ISO:

  • Hard drive failure
  • RAID problems
  • GRUB problems
  • Kernel boot issues
  • Init boot issues

This guide is intended as an instructional howto. The commands listed here are not to be used verbatim; they are intended to illustrate examples. This guide is provided without warranty as to accuracy or applicability to your specific situation. ClearCenter will not be liable for data lost as a result of following this guide. Please take all precautions necessary to preserve your own data.

DON'T PANIC

Likely your problem is causing you stress, and this can lead to extreme reactions which can destroy data. In this guide we will attempt to point out key validation points in the diagnosis, and also ways to verify that the changes you made actually took effect. Some of this process is complex and this guide will NOT necessarily meet the needs of your particular problem. That being said, if it doesn't fix the problem, you will at least know what your problem is NOT.

Establishing the point of failure

You may have multiple problems. For example, a failed disk will affect your RAID, GRUB, Kernel and init process. The boot of ClearOS goes through the following stages:

  • BIOS/POST
  • GRUB
  • Kernel
  • Init

Powering on your system will result in a series of tests. If your Power On Self Test (POST) completes, it will usually beep once and transition straight to the first sector of the first device as listed in your BIOS. This first sector should contain boot code called GRUB (the Grand Unified Boot Loader). GRUB under ClearOS presents a menu entry and a countdown timer. This will transition to a screen that quickly fills with text, or to a graphical boot screen, depending on your version and settings. From here the kernel will load devices and hand the boot process over to the init scripts.

So where the process stops is key to understanding where to start fixing the issue. Error messages are critical; it is a good idea to write them down, or to Google them if you don't understand WHERE it is failing.

Rescue Image

Starting the Rescue Image

All ClearOS installations contain a rescue image. To successfully start this image you need to tell your BIOS to boot from the installation media. This can require a modification of the boot order in your BIOS, or your BIOS may support a keystroke which allows you to select the boot device (often F12). You will need to use the same mechanisms that you used to install the system. In most cases this is straightforward. With systems that required additional disk drivers in order to see partitions, you will need to load those same drivers in rescue mode in order to mount the disks.

Install media start screen

At the start screen, navigate with the arrow keys to select 'Troubleshooting'.

Press <ENTER>.

At the troubleshooting screen, select 'Rescue a ClearOS System' and press <ENTER>.

After a while the next screen will come up.

Select '1' (Continue). This option will attempt to find your ClearOS installation and mount it under /mnt/sysimage. Press <ENTER>.

You will then see:

At this point, press <ENTER>.

You will be dropped to a command prompt.

Your prompt will look similar to the following:

sh-4.2#

If you do not get this far and it does not drop you into a shell, you will need to manually assemble your file system. See further below.

Now type:

chroot /mnt/sysimage

At this point you should have access to the command line on your ClearOS installation, so you can use commands such as cd and pwd. ClearOS should automatically have the vi and nano editors available for you to use.
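
A quick sanity check that you really are inside your installation is to display the release file. The exact file name can vary, so try either of the following:

cat /etc/clearos-release
cat /etc/redhat-release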

Special circumstances

Manually assembling your file system

You may be here because option 1) above, which automatically assembles your file system, did not work. In this case you may need to assemble it manually. Start by selecting option 3), which will drop you to a command prompt.

Survey the landscape by running the following and take an inventory of your physical disks and their partitions:

fdisk -l | less

You are looking for a disk with at least 2 partitions and perhaps more. An EFI/GPT installation without the LVM will have at least three partitions, one of the “EFI System” type, one containing /boot and one for the root file system. There may also be a SWAP partition which you can ignore. A conventional BIOS installation won't have the “EFI System” partition.

To help identify the layout you can use the command

blkid

Here are a couple of examples, firstly on a GPT disk:

sh-4.2# blkid
/dev/nvme0n1p1: SEC_TYPE="msdos" UUID="5C18-CE03" TYPE="vfat" PARTLABEL="EFI System Partition" PARTUUID="9a37b632-2be9-4726-ae20-0e3ebb52cdbd"
/dev/nvme0n1p2: UUID="5dedca90-5dd5-4210-b6aa-0f2d96c6b722" TYPE="xfs" PARTUUID="08dc394d-2e1d-4b35-ad89-0768f8b6745d"
/dev/nvme0n1p3: UUID="b719d753-6db1-4d69-a72e-97bfde3c1a7a" TYPE="xfs" PARTUUID="d34693a7-83a7-4a60-ba02-2d69d131fe50"
/dev/nvme0n1p4: UUID="b5035d6d-c564-4c2c-9e8b-d269c9cfd5fa" TYPE="swap" PARTUUID="09faf8f2-1cce-4db0-8e52-301201820ef6"
/dev/nvme0n1p5: UUID="okFsKQ-yTWf-CZ3J-Rbiw-2HDf-1A6k-2b7KFG" TYPE="LVM2_member" PARTUUID="d18a34a2-76de-4fe5-9452-e16b8fac05a2"

And secondly a conventional disk on a BIOS system:

sh-4.2# blkid
/dev/sda1: UUID="9b6feebc-3f1f-4676-a1be-c52fbb9f0ce7" TYPE="xfs"
/dev/sda2: UUID="5cfe9b8f-97ff-48ba-8b7c-c284992612d5" TYPE="xfs"
/dev/sda3: UUID="76001404-a9d6-4e67-b407-b2aa5149d31b" TYPE="swap"
/dev/sda4: UUID="e97dfea7-60e2-4dc4-8e30-3d5a02f838d1" TYPE="xfs"
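
If the lsblk tool is available in your rescue environment, it can also give a handy tree view of the disks, partitions, filesystem types and current mount points in one go:

lsblk -o NAME,FSTYPE,SIZE,MOUNTPOINT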

Now try inspecting the partitions. You can mount them to see what is in them, e.g.:

mount /dev/nvme0n1p1 /mnt/sysimage -t vfat

Then have a look in it:

ls /mnt/sysimage

You are looking for output something like:

bin   dev  flex  lib    lost+found  mnt  proc  run   store  tmp  var
boot  etc  home  lib64  media       opt  root  sbin  srv  sys    usr

If you have /etc, /usr and /boot then you have found it. If you don't have it, unmount the partition and try the next one:

umount /mnt/sysimage
mount /dev/nvme0n1p2 /mnt/sysimage -t xfs
ls /mnt/sysimage
umount /mnt/sysimage
mount /dev/nvme0n1p3 /mnt/sysimage -t xfs
ls /mnt/sysimage

In this case the root partition was /dev/nvme0n1p3. When you mount it you will find the boot folder is empty. Of the partitions you have been mounting and looking at, one should have contained at least the folders efi, grub and grub2. Mount that one onto boot:

mount /dev/nvme0n1p2 /mnt/sysimage/boot -t xfs

On an EFI/GPT system only, you will then find boot/efi is empty. You need to mount your vfat drive here:

mount /dev/nvme0n1p1 /mnt/sysimage/boot/efi -t vfat

Hopefully all is in order now and you can chroot as in the earlier instructions:

chroot /mnt/sysimage

On one system rescued like this the chroot did not work, but success was achieved by mounting the filesystems directly onto /:

mount /dev/nvme0n1p3 / -t xfs
mount /dev/nvme0n1p2 /boot -t xfs
mount /dev/nvme0n1p1 /boot/efi -t vfat

in which case there is no need to chroot.

If you have problems mounting and umounting and get stuck, a reboot will take you straight back to the beginning as nothing you have done so far is permanent.
This would be much easier if you had a copy of your /etc/fstab available as you would immediately know which disk and partition was which and how to mount it.
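
Once you do find and mount the root partition, you can read that fstab straight away; it will confirm which partition belongs at each mount point (this assumes the root partition is mounted at /mnt/sysimage as above):

cat /mnt/sysimage/etc/fstab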

RAID issues

Checking partitions

You may be in this section because you have lost the first disk in your RAID array and the system is unbootable, or because you need to use rescue mode for the repair instead of the regular OS. If this is the case you may have identified the bad disk and replaced it already, or perhaps you just need to look around and assess the damage. If you did add a new disk, the rescue CD will ask you to initialize that disk.

Survey the landscape by running the following and take an inventory of your physical disks and their partitions:

fdisk -l | less

Here is an example of what a RAID disk will look like from running that command:

Disk /dev/sda: 2000.3 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1          15      120456   fd  Linux raid autodetect
/dev/sda2              16        7664    61440592+  fd  Linux raid autodetect
/dev/sda3            7665      243201  1891950952+  fd  Linux raid autodetect

Making partitions on the new disk to match the old

If you've replaced the disk that failed, you will notice that it does not have any partitions yet. Ideally, the replacement disk will have the same geometry as the other RAID members; if not, you will need to get a little more technical and ensure that the partition sizes either match or are larger than the originals:

255 heads, 63 sectors/track, 243201 cylinders

Write this information down. You will need it later so that you can partition the new disk with the same geometry. Of particular note are the start and end numbers, the partition number, the type (Id), and lastly which partition carries the boot flag (the asterisk '*' character).

Locate your unformatted/unpartitioned disk and run the following (in this example our disk is /dev/sdb):

fdisk /dev/sdb

You will enter the fdisk menu system which will look like this:

[root@server ~]# fdisk /dev/sdb

The number of cylinders for this disk is set to 243201.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): 

There are several commands that you will use here, but to familiarize yourself with the tool, type 'm' on the keyboard and press <enter>. This will show you a list of commands. Note the 'p' command; this shows you the proposed layout of the disk. Run it now by pressing 'p'.

On your blank disk it will not show any partitions, so go ahead and let's make one. Type 'n' for new partition. It will ask whether you want a primary or extended partition. You can have up to 4 primary partitions, or 3 primary partitions plus one extended partition which can hold many logical partitions. Typically the first partition will be primary. Type 'p' for primary. When it asks which partition, use '1' for the first. It will ask what the start cylinder is and by default will show [1]. If that matches your notes from the other drive then accept it. It will ask for the end cylinder; supply that as well. When it is completed, type 'p' to view your partition. Repeat this process for each partition; a sketch of the dialogue is shown below.
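
The exact prompt wording varies between fdisk versions (newer versions prompt in sectors rather than cylinders), and the cylinder numbers here are only illustrative, matching the first partition of the example disk above; always use the numbers you wrote down from your own good disk:

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-243201, default 1): 1
Last cylinder (1-243201, default 243201): 15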

You will likely need to change the partition type from the default 83 (Linux) to something else, such as fd (Linux raid autodetect). If this is the case, do the following and supply the correct hex code:

Command (m for help): t
Partition number (1-4): 1
Hex code (type L to list codes): fd

Review your changes using the 'p' command.

You will likely need to set the active partition (the asterisk) on the correct partition. Do the following or similar:

Command (m for help): a
Partition number (1-4): 1

Review your changes and ensure that the information is correct. If you want to abort the proposed changes type 'q' to quit. If you want to write these changes and commit the partition proposal to disk, type 'w' for write.

Double-check your work by running fdisk -l or you can limit the results to just the disks that are part of your RAID by listing them in brackets like this:

fdisk -l /dev/sd[ab]

or this if you have 5 disks

fdisk -l /dev/sd[abcde] | less
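
As an alternative shortcut, if both the surviving disk and the replacement use conventional MBR partition tables and the replacement is at least as large, you can copy the partition layout in one step with sfdisk rather than recreating it by hand. In this example /dev/sda is assumed to be the good disk and /dev/sdb the new one; double-check the result with fdisk -l afterwards. GPT disks need a different tool (sgdisk) and are not covered by this shortcut:

sfdisk -d /dev/sda | sfdisk /dev/sdb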

RAID with MultiDisk

Checking MultiDisk Status

Familiarize yourself with this command:

cat /proc/mdstat

This command is useful for watching what your MultiDisk RAID is doing RIGHT now. Here is an output that shows one RAID volume:

[root@gateway-utah ~]# cat /proc/mdstat 
Personalities : [raid1] 
md1 : active raid1 sda1[0] sdb1[1]
      120384 blocks [2/2] [UU]
      
unused devices: <none>

Let's take this apart.

  • This RAID is RAID 1:
    • Personalities : [raid1]
  • This RAID has one block device that is working:
    • /dev/md1
  • This RAID is made up of two partitions:
    • /dev/sda1
      • raid member [0]
    • /dev/sdb1
      • raid member [1]
  • There are two disks in this array
    • [2/2]
  • There are two working disks in this array
    • [2/2] (a failed member would report this: [2/1])
  • Both drives are up
    • [UU] (failed members will look like underscores: [U_])

Another useful technique is to watch this file as its status changes. This is especially handy when an array is rebuilding, as you can see the progress bar:

watch cat /proc/mdstat

Assembling your disks

Multidisk arrays are usually assembled by the /etc/mdadm.conf file. However, you are likely in this section because your RAID is not assembling…and how can it if mdadm.conf does not exist. Moreover, you CANNOT assemble disks in the rescue CD using a typical command like:

mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
# ^^^^^^^^^^^^^^^ This won't work in Rescue Mode ^^^^^^^^^^^^^^^ #

Ok, so how do we assemble our disks? First, let's check our disk members (do this on all partitions which comprise your RAID):

mdadm --examine /dev/sda1

You should get results like this:

/dev/sda1:
          Magic : a92b4efc
        Version : 0.90.00
           UUID : 03e965cf:42e2070c:eeb11af9:065b0b59
  Creation Time : Wed Aug  4 11:56:07 2010
     Raid Level : raid1
  Used Dev Size : 120384 (117.58 MiB 123.27 MB)
     Array Size : 120384 (117.58 MiB 123.27 MB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 1

    Update Time : Thu Aug 30 10:40:57 2012
          State : clean
 Active Devices : 1
Working Devices : 1
 Failed Devices : 1
  Spare Devices : 0
       Checksum : 819e9b9b - correct
         Events : 27850


      Number   Major   Minor   RaidDevice State
this     0       8        1        0      active sync   /dev/sda1

   0     0       8        1        0      active sync   /dev/sda1
   1     1       0        0        1      faulty removed

Check and make sure that the State is clean. If it is not clean you may have difficulties reassembling your array.
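
To compare the key fields across several members at once you can run something like the following, adjusting the device names to your own layout:

mdadm --examine /dev/sda1 /dev/sdb1 | grep -E 'UUID|State|Events'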

Now let's probe our disks and see what arrays we can find. Start by creating a file called /etc/mdadm.conf. In it, tell mdadm which devices to scan:

DEVICE /dev/sd[abcd]1
DEVICE /dev/sd[abcd]2
DEVICE /dev/sd[abcd]3
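
If using an editor in the rescue shell is awkward, one way to create this file is with a here-document, adjusting the device patterns to match your own disks:

cat > /etc/mdadm.conf << 'EOF'
DEVICE /dev/sd[abcd]1
DEVICE /dev/sd[abcd]2
DEVICE /dev/sd[abcd]3
EOF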

In the above file, we will be scanning the first three partitions on four different drives for multidisk signatures. You will need to customize the above to suit your needs. Now, let's see what is there:

mdadm --examine --scan

This information is vital to assembling your array. If the output looks good, append this to your new /etc/mdadm.conf:

mdadm --examine --scan >> /etc/mdadm.conf

From here you can assemble your devices by name:

mdadm --assemble --scan /dev/md0
mdadm --assemble --scan /dev/md1

Now check to see your assembled RAID arrays:

cat /proc/mdstat

If this method does not work, you may have to try other means. Another way to see what is on your disks is to do an exhaustive probe and assemble manually:

mdadm -QE --scan
ARRAY /dev/md1 level=raid1 num-devices=2 UUID=03e965cf:42e2070c:eeb11af9:065b0b59
ARRAY /dev/md2 level=raid1 num-devices=2 UUID=c10a6566:11ce9088:da0e5da7:e1449030
ARRAY /dev/md3 level=raid1 num-devices=2 UUID=6d83baec:d8c4f50b:3ccc3173:326118cf

If you are familiar with Multidisk technology, you will notice that the output is very similar to the contents of the mdadm.conf file. In rescue mode, this information is critical because you can ONLY assemble disks using the UUID numbers.

Let's assemble md1:

mdadm --assemble --uuid 03e965cf:42e2070c:eeb11af9:065b0b59 /dev/md1

Notice that we do not put in the /dev/sdX1 disks. This is because the assembly will use the UUID, which should be the same on each member. You will notice that this UUID was present when we ran the 'mdadm --examine /dev/sda1' command.

Now check the status using 'cat /proc/mdstat'.

Once the device is assembled, you can add the partitions that you created on your replacement disks (/dev/sdb1 in my example here):

mdadm --manage /dev/md0 --add /dev/sdb1

Now check the status using 'cat /proc/mdstat' or with 'watch cat /proc/mdstat'.
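
For a more detailed per-member view of an assembled array, including rebuild progress, you can also run:

mdadm --detail /dev/md0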

A rebuild of the array will start over from the beginning if one of the disks drops out before the sync is complete. A reboot will also cause the sync to restart from the beginning.

GRUB issues

Why me?

This section still needs to be written, as ClearOS 7 uses grub2 rather than the legacy grub.
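
Until then, here is a generic grub2 recovery sketch for a CentOS 7 based system such as ClearOS 7, run from inside the chroot described earlier. This is not ClearOS-specific guidance: the target disk /dev/sda is only an example, and on EFI systems the directory name under /boot/efi/EFI/ varies between installations, so check it first.

# BIOS/MBR system: reinstall the boot loader to the disk that holds /boot, then regenerate its config
grub2-install /dev/sda
grub2-mkconfig -o /boot/grub2/grub.cfg

# EFI system: usually only the config needs regenerating; check the directory name first
ls /boot/efi/EFI/
grub2-mkconfig -o /boot/efi/EFI/<your-efi-directory>/grub.cfg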

Making changes to your drive

By default, drives are mounted in read-only mode. You can remount a read-only filesystem in read/write mode with a command such as:

mount -o remount,rw /sysroot

Change /sysroot to whichever partition you want to remount.
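
Before rebooting out of rescue mode, it is a good habit to flush any pending writes and, where possible, return the filesystem to read-only. The remount may refuse if files are still open; in that case the sync on its own is normally enough:

sync
mount -o remount,ro /sysroot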
