freebsd-questions: When a System Dies; Getting back in operation again

This post is taken from the freebsd-questions mailing list, and is a perfect example of why I consider the Linux/BSD world to be superior to Windows, at least in terms of systems administration. On Unix, it’s all just files, devices and bitstreams…

From: n j
To: Martin McCormick <martin@dc.cis.[obfuscateddomain].edu>
Cc: freebsd-questions@freebsd.org

When a System Dies; Getting back in operation again.

> … What is the best way to restore the full system?
> Can I use the FreeBSD installation disk in rescue mode?

I experienced such a situation just two weeks ago. My primary problem was that I had to do the restore over the network (no attached tape drives, no external HDDs). I wanted to use ssh to grab the dump from the backup server, but ended up using netcat, which worked great.

Here’s basically what I did, including the backup from the not-yet-dead machine (note: I used an intermediate backup server, but it should be possible to pipe dump directly to restore):

  1. dump -0Laf - / | ssh backup-server "cat > dump.root"
  2. boot the new machine from CD disc1 (FreeBSD <7) or livefs disc (FreeBSD >7)
  3. create and newfs partitions as explained in this thread (at least the size of backup, can be larger)
  4. enter rescue (fixit) mode, create mount points for the new partitions (mkdir /mnt.root), mount the partitions (e.g. mount /dev/da0s1a /mnt.root), change directory to the mount point (cd /mnt.root), and configure the NIC (ifconfig)
  5. start netcat (nc -l 55555 | restore -rvf -)
  6. on backup-server: cat dump.root | nc new-machine 55555
  7. repeat for usr and var partitions
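
The direct pipe mentioned above (dump on the old machine straight into restore on the new one, with no intermediate file) might look roughly like the sketch below. It is untested and rests on assumptions: the names send_fs, receive_fs, run and DRY_RUN are made up for illustration, the host name new-machine and port 55555 are borrowed from the steps above, and the old machine must still be running. With DRY_RUN=1 the functions only print the pipeline, so it can be checked before anything is written to disk.

```shell
#!/bin/sh
# Sketch only: send_fs, receive_fs, run and DRY_RUN are illustrative names,
# not part of dump(8), restore(8) or nc(1).

# Run a pipeline given as a string, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$1"
    else
        sh -c "$1"
    fi
}

# On the NEW machine (fixit mode): listen on a port and restore into an
# already mounted partition.
# usage: receive_fs /mnt.root 55555
receive_fs() {
    run "cd $1 && nc -l $2 | restore -rvf -"
}

# On the OLD machine: dump a filesystem straight onto the network.
# usage: send_fs / new-machine 55555
send_fs() {
    run "dump -0Laf - $1 | nc $2 $3"
}
```

Start receive_fs on the new machine first, then send_fs on the old one, and repeat the pair for each filesystem with the matching mount point.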

Notes:

  1. if security is an issue, ssh out from the new machine to the backup server with port forwarding (ssh -R 55555:localhost:55555 backup-server) and pipe the backup to localhost (cat dump.root | nc localhost 55555); my initial idea was to start sshd in fixit mode (see my post to the list “fixit console with sshd”), which turned out to be too much trouble.
  2. restore uses TMPDIR to store some temporary files during the restore process; fixit mode has limited free space, and when it is exhausted the restore will fail, so it is a good idea to point TMPDIR at an available partition (e.g. export TMPDIR=/mnt.var while restoring the usr partition, then use a subdirectory of usr as TMPDIR to restore the var partition)
  3. [IMPORTANT!] after the restore process is over, manually check restored etc/fstab and etc/rc.conf (currently mounted as /mnt.root/…) to fix:
    1. partition names (e.g. /dev/da0s1a might become /dev/amrd0s1a)
    2. ethernet interface names (e.g. em0 might become bge0)
    3. IP addresses, to avoid a conflict if the old box is still running
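
The fstab and rc.conf fixes in note 3 can be made by hand in an editor; as a rough sketch, they could also be scripted as below. The helper name replace_in_file is hypothetical, the device and interface names are the examples from the note, and the sketch assumes that sed and cp are available in the fixit environment and that the old and new strings contain no regular-expression metacharacters and no "|".

```shell
#!/bin/sh
# Sketch only: replace_in_file is an illustrative helper, not a FreeBSD tool.
# usage: replace_in_file OLD NEW FILE
# Replaces every occurrence of OLD with NEW in FILE and keeps the
# untouched original as FILE.orig.
replace_in_file() {
    old=$1
    new=$2
    file=$3
    cp -p "$file" "$file.orig" || return 1
    sed "s|$old|$new|g" "$file.orig" > "$file"
}

# Example calls, using the names from the note above:
#   replace_in_file /dev/da0s1 /dev/amrd0s1 /mnt.root/etc/fstab
#   replace_in_file ifconfig_em0 ifconfig_bge0 /mnt.root/etc/rc.conf
```

Reading the result against the .orig copy (for instance with diff) before rebooting is still worthwhile, since a wrong fstab entry can leave the machine unable to mount its root filesystem.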

You should now be able to safely reboot and log into your new machine.

Regards,

Nino


Comments

One response to “freebsd-questions: When a System Dies; Getting back in operation again”

  1. Strange – the last thing I twittered before seeing this post …
    “Can’t understand why any IT person with respect for themselves would actually prefer to use Windows on the server side. It’s awful..”