Sophos Group Ltd.

05/13/2024 | Press release | Distributed by Public on 05/13/2024 02:30

Extracting data from encrypted virtual disks: six methods

This article explains various techniques and readily available tools for extracting data from an encrypted virtual disk. For incident-response situations in which the entire virtual disk has been encrypted, these tools and techniques may - may - enable the investigating team to retrieve data from the encrypted system.

Efforts to extract data from encrypted virtual disks can potentially lead to multiple positive outcomes: recovering customer data that is irretrievable via standard methods, helping rebuild virtualized customer infrastructure that has been compromised, and / or enriching an incident investigation timeline. So far, we've used these techniques successfully in DFIR investigations involving the LockBit, Faust / Phobos, Rhysida, and Akira ransomware groups.

We'll say this at the beginning of the article and we'll say it again at the end: Results are not guaranteed. No data-extraction method in existence is certain to yield full data from an encrypted VM. We will also highlight that while these methods have seen quite a high success rate in extracting forensic data that is valuable for the investigation (such as event logs, registry forensics, and the like), the success rate of retrieving data that can be used as part of the recovery process of production systems, such as databases, is much lower.

We strongly recommend that any recovery attempts should be conducted on "working copies" and not the originals, lest the attempts cause unintended further damage to the devices.
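
One way to confirm that a working copy is a faithful duplicate of the original before any recovery attempt is to compare cryptographic hashes of the two files. The following is a minimal Python sketch of that check; the function names and the 1 MiB read size are illustrative choices of ours, not part of any tool discussed in this article.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in fixed-size chunks so very large disk images fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_faithful_copy(original_path, copy_path):
    """True if both files hash to the same SHA-256 value."""
    return sha256_of_file(original_path) == sha256_of_file(copy_path)
```

If the two digests differ, the copy should be remade before any of the methods below are attempted on it.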

In the next section we'll discuss in which situations retrieval may be possible and to what extent. After that, we'll list some factors to take into consideration as you select which methods you'll attempt. Finally, we'll look at each method, listing the prerequisites (the tools required to attempt the method; all are required) and flagging other considerations. In the discussion of the most labor-intensive method, we'll walk through the details of the process. In this article, references to "virtual disks," "VM's," or "disk images" all refer to the same thing and can be any image of a disk such as VHD, VHDX, VMDK, RAW, and so on. All six techniques apply to Windows; a few also may work on Linux, and we'll note those in each case.

What is file / disk encryption?

When ransomware encrypts a virtual disk (or any file), the data has been essentially randomized, rendering the file unreadable by the operating system. The most well-known method of decrypting a file (returning the file to its original, readable state) is via a decryptor, a software tool or program designed to reverse the process of encryption, making encrypted files readable again.

In ransomware attacks, the decryptor is created and controlled by the threat actor. In those situations, unless the ransom is paid or the decryptor becomes publicly available, other methods of data recovery must be considered.

Ransomware binaries prioritize speed over thorough encryption. Encrypting entire files would be too time-consuming, so the attackers aim to inflict maximum damage swiftly, minimizing the window for intervention. Consequently, while smaller files like documents are usually fully encrypted, larger ones such as virtual disks may have significant portions left unencrypted. This provides investigators with opportunities to employ diverse techniques for extracting information from these virtual disks.
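
As a rough illustration of partial encryption, the sketch below estimates which regions of a file look encrypted by measuring Shannon entropy per chunk. This is our own illustrative approach, not something the tools in this article are known to do; the 1 MiB chunk size and the 7.9 bits-per-byte threshold are assumptions, and compressed data also scores high, so treat the output as a hint only.

```python
import math
from collections import Counter


def shannon_entropy(data):
    """Shannon entropy of a bytes object, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def high_entropy_chunks(path, chunk_size=1024 * 1024, threshold=7.9):
    """Yield (offset, entropy) for each chunk at or above the threshold."""
    with open(path, "rb") as handle:
        offset = 0
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            entropy = shannon_entropy(chunk)
            if entropy >= threshold:
                yield offset, entropy
            offset += len(chunk)
```

A file in which only the first few chunks are reported is a plausible candidate for the methods described below.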

Which method to use: Considerations

There are multiple methods that can be used when looking to extract data from an encrypted Windows VM. (A few of these techniques are applicable to Linux recovery attempts as well, and we'll indicate those.) In this article we will cover six:

  • Method 1: Mounting the drive
  • Method 2: RecuperaBit
  • Method 3: bulk_extractor
  • Method 4: EVTXtract
  • Method 5: Scalpel, Foremost, and other file-recovery tools
  • Method 6: Manual carving of the NTFS partition

Which to try first? The following six considerations may help you in deciding which method is appropriate.

File size
Experience has shown that the larger the size of the virtual disk, the greater the chance of successful recovery. For Windows machines, this is largely because most VMs will have multiple partitions, usually three - recovery, boot, and the C: (user-visible) partition. (For this article, let's assume the drive is mapped to the usual C:.) The first two partitions hold little data of use for an incident investigation, but because ransomware commonly encrypts only the first portion of the VM file, these are often the only partitions that end up encrypted.

This, therefore, often leaves the C: partition, where customer data and potential forensic data is housed, untouched. This can help investigators to rebuild a compromised virtual device and enrich an incident investigation.

Conversely, if the VM file is relatively small, the likelihood of recovering data is lessened. However, there still may be an opportunity to harvest event logs or registry hives.

Tools
As with any other problem in incident response, there exist multiple methods and tools for tackling the same issue. Some tools may perform better than others depending on the type of encryption. It is worth trying multiple tools to get the result you need if your first attempt fails or only partially works.

It is also important to note that tools do stop getting updated and / or supported, so consider looking for additional tools not mentioned in this guide. The tools that we are using are third-party tools, or in some cases tools that are already part of Windows or Linux (this includes Windows Subsystem for Linux [WSL]). Throughout this article and in our everyday investigations, we acknowledge the great contribution the creators of those tools have made to defense efforts, especially in those cases in which the tools were not designed with encryption in mind.

Time
The time available to complete the task is something worth considering; the hardware / equipment you have available may play a part in this. For instance, manual carving (Method 6) is one available option, but this can take a long time; specifically, it can require a lot of processor power, which could slow down your device during the process. As a result, you may be unable to use your forensic-examination device for other daily duties until the process completes. (Because of this, if it is not time-sensitive, we recommend you start the manual carving process towards the end of the working day and leave your device running overnight.) Different solutions take varying amounts of time and this needs to be considered.

Storage
Available storage space should be factored into your decision. Manual carving, for instance, can require quite a bit of storage space, as it will recreate a copy of the file; in other words, if you are trying to recover a 1TB virtual hard disk, you may well need at least another 1TB for the results. This is also true with some of the file recovery tools (Method 5), particularly if the master file table (MFT) is corrupt, since in that situation the tool could "recover" huge files that do not actually exist.
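
A quick pre-flight check can confirm that the destination has room before a long-running recovery starts. The sketch below is illustrative only; the function name and the 10% safety margin are our assumptions.

```python
import os
import shutil


def has_room_for_copy(image_path, destination_dir, safety_factor=1.1):
    """True if destination_dir has at least safety_factor x the image size free."""
    needed = int(os.path.getsize(image_path) * safety_factor)
    free = shutil.disk_usage(destination_dir).free
    return free >= needed
```

For file-recovery tools that may "recover" oversized files from a corrupt MFT, a larger safety factor may be sensible.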

File types and priorities
Clients occasionally ask us to recover specific files (particularly Word documents and PDFs), as they are not interested in anything else. If that is the case, and you do not need any further data for the investigation as all the TTPs have been accounted for, it may be more useful for you to run an automated media file recovery tool over the VM, rather than doing a full recovery of the whole disk.

Need
In a related vein, the enterprise's need to recover the data should be weighed in recovery decisions. For example, if the business plans to rebuild the device, they have a working backup of the data, and it's not crucial to the investigation, what is to be gained by recovering data from it? Does it need to happen? (Probably not.) A clear understanding of the business need for recovery of this specific VM leads to better allocation of precious incident-response resources.

Methods of extraction: Six techniques

The methods below cover multiple ways of attempting to extract data from a virtual machine. This is not an exhaustive list, since new methods and tools are being developed all the time; researching newer techniques and / or tools is always encouraged, and we ourselves will likely update this article as we add techniques to our own repertoire. With such a variety of options available, familiarizing yourself with the basics of each of these, then applying that knowledge to the considerations listed above, is likely the best approach - and one that gets easier with experience and practice.

All that said, though the list that follows is not in a strict order, we suggest that Method 1 should be the first step in any attempted recovery, for reasons that will be clear.

Method 1: Just mount it

Just because you have been told that the VM is encrypted doesn't necessarily mean that it is. (Yes, cybercriminals sometimes lie.) We have encountered clients who have mistakenly thought their files were encrypted when, in fact, the attacker had simply changed the file extensions. In addition, we have seen instances where attackers' encryption processes have failed and actually just renamed the file.

Always try this method first as it just might work - and save a lot of time. If it doesn't succeed, you'll have lost little time and have done nothing to impede other methods of retrieval. If, on the other hand, the method succeeds and the drive does mount, you can then access the file(s) and copy and paste from them as desired. In addition, because you are simply mounting the VM, endpoint protection (that is, antimalware / antivirus packages) should not detect or remove any malicious files. This will be useful if you plan to collect samples for labs submission. Some tips for success with this method:

  • Try the 7-Zip GUI archiver; we have had a lot of success with 7-Zip in this situation
  • Mount the drive
  • If that's not working, try FTK or any other third-party mounting tool
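
Before mounting, a quick look at the first bytes of the file can hint at whether it was merely renamed rather than encrypted. The sketch below checks for the leading signatures of a few common virtual disk formats. The signature table is our own short list, not an exhaustive one: raw / flat images and fixed-size VHD files carry no leading signature, so a result of None does not prove the file is encrypted.

```python
# Leading byte signatures for a few common virtual disk formats.
KNOWN_SIGNATURES = {
    b"vhdxfile": "VHDX",
    b"KDMV": "VMDK (sparse extent)",
    b"# Disk DescriptorFile": "VMDK (text descriptor)",
    b"conectix": "VHD (dynamic or differencing)",
}


def identify_disk_image(path):
    """Return a format name if the file starts with a known signature, else None."""
    with open(path, "rb") as handle:
        head = handle.read(64)
    for signature, name in KNOWN_SIGNATURES.items():
        if head.startswith(signature):
            return name
    return None
```

A recognized signature on a file with a ransomware extension is a good sign that mounting is worth attempting.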

Method 2: RecuperaBit

RecuperaBit, created by Andrea Lazzarotto, is an automated tool that will rebuild any NTFS partitions that it can find in the encrypted VM. If it can find an NTFS partition, it will re-create the folder structure of that partition on the device being used for examination. If successful, you can then access the file(s) and copy and paste from them as desired from the newly created directory/folder structure.

It is a Python script, so it will work on any OS that supports Python 3. It's easy to use, and only a few options are needed to get it to rebuild the encrypted VM. Experience has shown that, on average, you should get a 'yes' or 'no' as to whether it can rebuild anything of use within about 20 minutes. After that, if it can manage the rebuild, it will take approximately another 20 minutes to recreate the partition for you.

It's important to know that running RecuperaBit will likely set off endpoint-protection detections if ransom.exe or other malicious files are present. For this reason, if you choose to use RecuperaBit in situations where you hope to recover that executable for further analysis, you should run it in an environment where endpoint protections can be safely disabled - hence the prerequisite of a sandbox.

At the time of this writing, RecuperaBit can be downloaded from GitHub. There is a user guide on the GitHub page for the tool.

Method 3: bulk_extractor

Bulk_extractor (called bulk-extractor on its kali.org page, but the same program in either case) is a free tool that runs on Windows or Linux. It was created by Simson Garfinkel. It can recover system files such as Windows event logs (.EVTX) as well as media files. This tool is automated, so the investigator can start it and let it run, perhaps after hours, in hope it will recover something.

It is possible to configure it for specific file types or other artifacts by altering its config file. This can be very useful to speed analysis up in scenarios where you're hoping for quick, focused, or specific results - for example, EVTX files only - rather than trying to recover the whole of the partition.

As with RecuperaBit in Method 2, running bulk_extractor will likely set off endpoint-protection detections if ransom.exe or other malicious files are present. For this reason, if you choose to use bulk_extractor in situations where you hope to recover that executable for labs submission or similar analysis, you should run it in an environment where endpoint protections can be safely disabled - hence the above prerequisite of a sandbox.

At the time of this writing, bulk_extractor for Linux can be downloaded from GitHub. There is a user guide on the GitHub page for the tool.

Method 4: EVTXtract

This specialized tool searches a block of data (in this case, an encrypted VM) for complete or partial .evtx files. If it finds any, the tool pulls them back into their original structure, which is XML. This is an automated tool that is built to run on Linux only.

XML files are notoriously difficult to work with. In this case, the file will consist of incorrectly embedded EVTX fragments, so expect the output to be a bit unwieldy. To make it easier to review this tool's output, you will have to massage the data. A couple of suggestions for doing this effectively:

  • Attempt to convert the file to CSV format for easier viewing
  • Use the grep command to filter the output for dates in YYYY-MM-DD form (or any other date format), event IDs, keywords, or known IoCs indicating activity on the day of interest
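
As an alternative to line-by-line grep, the filtering idea can be applied to whole records so that a match on a date keeps the rest of the event with it. The sketch below assumes the recovered output contains XML records wrapped in <Event ...> and </Event> tags and that it fits in memory; both are assumptions on our part, and the function name is hypothetical.

```python
import re

# Non-greedy match of one <Event ...> ... </Event> record, across newlines.
EVENT_BLOCK = re.compile(r"<Event\b.*?</Event>", re.DOTALL)


def filter_events(text, needles):
    """Return the <Event> blocks that contain every string in needles."""
    return [
        block
        for block in EVENT_BLOCK.findall(text)
        if all(needle in block for needle in needles)
    ]
```

For example, passing a date string and an event-ID tag together returns only the records that contain both.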

Please note that this tool, just as the name indicates, recovers EVTX files or fragments only. If you are seeking other artifacts, you will need to use a different tool.

At the time of this writing, EVTXtract can be downloaded from GitHub. There is a user guide on the GitHub page for the tool.

Method 5: Scalpel, Foremost, or other file-recovery tools

Turning our attention from EVTX-recovery tools to those designed to restore other types of files, Scalpel and Foremost are two of many free file recovery tools currently available. Though both are older tech, the Sophos IR team has had excellent results with these two in our investigations.

The original version of Scalpel, released in 2005, was based on Foremost, and the two carving and indexing applications are similar in approach. Both mainly recover media and document files, which makes them useful if your investigation is seeking documents, PDFs, or the like. For either one, the config file can be modified to focus on specific file types, or be left alone for a fuller (though slower) catch-all effort.
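
To illustrate the general idea behind carving by file type, the sketch below pulls out byte ranges that begin with a known header and end with the next matching footer. It is a simplified illustration of header / footer carving, not the actual implementation of Scalpel or Foremost; the function name and the 10 MiB size cap are our assumptions.

```python
def carve_between(data, header, footer, max_size=10 * 1024 * 1024):
    """Return byte strings that start with header and end with the next footer."""
    carved = []
    start = data.find(header)
    while start != -1:
        # Look for the footer only within max_size bytes of the header.
        end = data.find(footer, start + len(header), start + max_size)
        if end != -1:
            carved.append(data[start:end + len(footer)])
        start = data.find(header, start + 1)
    return carved
```

For PDFs, the header would be b"%PDF-" and the footer b"%%EOF"; note that a PDF saved incrementally can contain more than one %%EOF marker, so this naive version may truncate such files.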

As mentioned, neither of these programs retrieves system files; other tools will be needed for that work. In addition, files recovered from these may kick off endpoint-protection detections if any malicious files are present (for instance, malicious PDFs from a phishing campaign). For this reason we recommend that investigators run these tools in a sandbox environment, where endpoint protection can be disabled, if such files must be preserved for the investigation.

As noted above, both these programs are older technology, which means that recovery of newer filetypes may not be feasible with these tools. Other tools exist, and the reader is invited to investigate those, but as easily available options these are both solid performers.

Foremost can be downloaded from GitHub, and there is a user guide on the GitHub page for the tool. It was originally developed by the US Air Force Office of Special Investigations and The Center for Information Systems Security Studies and Research. The version on GitHub does not appear to be actively maintained.

Likewise, at the time of this writing, Scalpel can be downloaded from GitHub. There is a user guide on the GitHub page for the tool. As stated on its GitHub page, this tool is not actively maintained.

Method 6: Manual carving of the NTFS partition

In contrast to the tools and techniques summarized above, manual carving takes preparation and some finer understanding of the options available to you. We'll make some recommendations for how to plan your effort, and then walk you through the specifics of working with dd, the powerful Linux utility you'll use for this work.

(Some background: DD originally stood for "data definition" and is truly one of computing's Elder Gods; it celebrates its 50th anniversary of existence in June 2024. New dd users are warned that typos can be catastrophic in this utility, earning it its alternate name of "disk destroyer"; it has been described as "a Swiss Army knife, but one that's all blades and no handle." It is recommended that investigators familiarize themselves with dd basics before proceeding. We also suggest typing the dd command into a text editor, making sure everything is correct, and then copying and pasting the command at the command line.)

Proper manual carving requires that investigators set three switches in dd prior to running the utility - bs (bytes per sector), skip (the offset, in sectors, of the NTFS partition you aim to recreate), and count (the size of that partition, in sectors). These calculations aren't necessarily difficult, but they do take time and they are not optional. This section walks you through the steps for calculating all three.

In addition, the processing itself is rather slow, potentially taking hours to complete correctly. (As mentioned above, we generally recommend you start the manual carving process at the end of the working day and leave your device running overnight.) With some practice, however, the calculation of the switch values may take the investigator only a few minutes - and if you calculate the size of the partition you are going to carve before attempting to carve the partition, you reduce the likelihood of wasting time and processing power. So do that.

Note finally that this process is space-intensive, likely taking up the same amount of space the VM itself does, since you are essentially copying the VM. For example, if you're working with a 100GB VM file, you'll need another 100GB plus space in which to extract the files you want.

The process has four main steps:

  1. Analyze the encrypted VM for available NTFS partitions
  2. Carve the largest NTFS partition out and into a new file
  3. If the newly created file is intact enough, mount it in Windows
  4. Extract the artifacts you need

The utility that does the copying, dd, is built into Linux. The command is as follows:

sudo dd if=*** of=***.img bs=*** skip=*** count=*** status=progress

Again - and this cannot be emphasized enough - dd is entirely unforgiving of typos. Proceed with caution. The command and its switches may be understood as follows:

sudo = User needs to have highest privileges for this tool

dd = The utility itself

if = Stands for 'input file' - this value is the path and file name of the encrypted VM

of = Stands for 'output file' - this is the path and file name of the recreated partition. Suggested file extension is .img (for example, newfilename.img)

bs = The bytes per sector of the partition you are carving out; this value must be entered in bytes

skip = The offset value, in sectors, of the NTFS partition you are carving out, from the start of the disk / VM file

count = The size, in sectors, of the NTFS partition you are carving out

status = An optional switch; status=progress periodically prints transfer statistics, so you can see how many bytes have been copied
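
Because typos in dd are so costly, one option is to assemble the command string programmatically and review it before pasting it at the command line. The sketch below only builds the text of the command; it does not run dd. The function name and the sanity checks are our own, and shlex.join requires Python 3.8 or later.

```python
import shlex


def build_dd_command(input_path, output_path, bs, skip, count):
    """Return the dd command line as a string, after basic sanity checks."""
    if input_path == output_path:
        raise ValueError("input and output must be different files")
    if bs <= 0 or count <= 0 or skip < 0:
        raise ValueError("bs and count must be positive; skip must not be negative")
    # shlex.join quotes any argument containing spaces or shell metacharacters.
    return shlex.join([
        "sudo", "dd",
        f"if={input_path}",
        f"of={output_path}",
        f"bs={bs}",
        f"skip={skip}",
        f"count={count}",
        "status=progress",
    ])
```

Printing the returned string and reading it back against your notes is a cheap final check before running it.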

As mentioned above, there are three values you must calculate and provide for the switches in this command: bs, skip, and count. The easiest way to work these values out is to use a GUI hex editor such as Maël Hörz's HxD (which is Windows freeware), but a command-line tool such as xxd will work if preferred. The screen captures below show the steps using HxD.

Switches: Gathering the basic values

Start HxD and load in the encrypted VM file. Click the Offset column at the far left to change it to show values in decimal (base10). In HxD this is denoted by the letter D in brackets, as shown in Figure 1.

Figure 1: The offset values are now displayed in decimal numbers

Next, open Data inspector from the View dropdown, as shown in Figure 2.

Figure 2: The View dropdown in HxD with the Data inspector option selected

Now find the potential NTFS partitions. Highlight the very top left byte, then use the search function to search for the following hexadecimal string - as opposed to a decimal string or a text string, if such options are available.

EB 52 90 4E 54 46 53 20 20 20 20

Pay attention to which tab is open in the Find box, as shown in Figure 3.

Figure 3: Seeking the hex string that indicates the start of an NTFS sector

The above hexadecimal string is the signature found at the start of an NTFS partition, so this search will find any potential NTFS partitions that you can carve out. There will likely be many presented in a list, as shown in Figure 4.

Figure 4: A fruitful search for potentially salvageable NTFS partitions
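
The same search can be scripted for files too large to browse comfortably in a hex editor. The sketch below scans a file for the hexadecimal string given above and reports each match as a decimal byte offset, which is the form needed for the skip calculation. The function name and the 16 MiB read size are our assumptions.

```python
NTFS_SIGNATURE = bytes.fromhex("EB 52 90 4E 54 46 53 20 20 20 20")


def find_ntfs_headers(path, chunk_size=16 * 1024 * 1024):
    """Yield the decimal byte offset of every NTFS signature in the file."""
    overlap = len(NTFS_SIGNATURE) - 1
    with open(path, "rb") as handle:
        base = 0      # file offset of buffer[0]
        buffer = b""
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            buffer += chunk
            position = buffer.find(NTFS_SIGNATURE)
            while position != -1:
                yield base + position
                position = buffer.find(NTFS_SIGNATURE, position + 1)
            # Keep a short tail so a signature split across two reads is found.
            keep = buffer[-overlap:]
            base += len(buffer) - len(keep)
            buffer = keep
```

Not every hit will be a usable partition header; matches can also come from backup boot sectors or from disk images stored inside the VM, so each candidate still needs the checks described in the following steps.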

When you select one of these results, you will be presented with the header of the NTFS partition in the hex viewer window, as shown in Figure 5.

Figure 5: The header is shown above the selected NTFS partition

The header contains the basic information you need for the bs, skip, and count values required in the dd command. Next, we'll explain how to calculate those three values. You'll want to do these in order.

To calculate the bs (bytes per sector) value

Working from the start of the NTFS partition you have selected, highlight the bytes at offset 11 and 12, as shown in Figure 6. The value shown as Int16 in the data inspector is the value needed. In this example, the bs value is 512. (This value will almost always be 512. Almost.)

Figure 6: The bytes for the bs value are highlighted, and the data inspector shows that the value is indeed 512
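
For readers who prefer to script this step, the two bytes at offsets 11 and 12 can be decoded as a 16-bit little-endian integer. The sketch below is illustrative; the function name is ours, and it reads the value as unsigned, which gives the same result as the Int16 field for typical sector sizes such as 512 or 4096.

```python
import struct


def read_bytes_per_sector(header):
    """Return the 16-bit little-endian value at offsets 11-12 of an NTFS header."""
    if len(header) < 13:
        raise ValueError("need at least the first 13 bytes of the header")
    (value,) = struct.unpack_from("<H", header, 11)
    return value
```

Here, header is the block of bytes read from the file starting at the offset of the EB byte of the chosen NTFS header.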

To calculate the skip value

Now that you have the bs value, calculate the skip value by dividing the header offset value by the bs value. This calculation provides the sector value of where the NTFS partition starts.

For instance, the header offset decimal value for the NTFS partition highlighted in Figure 7 is 00576716800. (So we're clear, the following screen captures are not from the same partition as the one in the screen captures shown above. As predicted above, though, you can see that the bs value for this NTFS partition - the bytes at offsets 11 and 12 - is once again 512.)

Figure 7: The header offset value is shown in the green box

In order to calculate the skip value, divide that value by the bs value (that is, 512). In other words, do the following:

576716800 / 512 = 1126400

1126400 is the skip value.
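
The division can be scripted with a check that the offset is a whole number of sectors, since dd's skip switch counts whole blocks of bs bytes. This is a minimal sketch; the function name is ours, and rejecting offsets that do not divide evenly is our own design choice rather than a rule stated in this article.

```python
def skip_in_sectors(header_offset_bytes, bytes_per_sector):
    """Convert a decimal byte offset into a sector count for dd's skip switch."""
    skip, remainder = divmod(header_offset_bytes, bytes_per_sector)
    if remainder:
        raise ValueError(
            f"offset {header_offset_bytes} is not a multiple of {bytes_per_sector}"
        )
    return skip
```

With the values from this example, skip_in_sectors(576716800, 512) returns 1126400.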

To calculate the count value

Locate and highlight the eight bytes that start at the 41st byte from the start of the NTFS header. To find this value, in the screen below, go down two rows from the first (EB) byte of the header, go across to the 08 column, and highlight the following eight bytes, as shown in Figure 8.

Figure 8: Finding the count value (highlighted)

The eight highlighted bytes run all the way to column 15, as shown (so, the 41st through 48th bytes of the header, at offsets 40-47). The value that is shown as Int64 in the Data inspector is the count value - in the figure above, 1995745279. This value is in sectors, and the dd command needs it in sectors, so no conversion is needed - note the value and you're done.
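
In a script, the same eight bytes can be decoded as a 64-bit little-endian integer starting at offset 40 (the 41st byte) of the header. The sketch below is illustrative, with a function name of our choosing; it reads the value as unsigned, which matches the Int64 field for any realistic partition size.

```python
import struct


def read_total_sectors(header):
    """Return the 64-bit little-endian value at offsets 40-47 of an NTFS header."""
    if len(header) < 48:
        raise ValueError("need at least the first 48 bytes of the header")
    (value,) = struct.unpack_from("<Q", header, 40)
    return value
```

The returned number is already in sectors, so it can be used directly as the count switch.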

Which partition to choose?

We said above that you should choose the largest available partition to carve out. The count value indicates how large the partition is. If the partition is only a few sectors in size, it is likely not worth carving out. To increase the chances of successfully carving out the C: drive, the best approach would be to find the largest partition in the initial list of NTFS partitions and carve that one out.

The largest partition should be approximately the same size as the overall VM file. However, the VM file size is shown in bytes, whereas the NTFS size is shown in total sectors. To compare them, convert the partition's size from sectors into bytes.

To do so, multiply the partition's size in sectors (the count value shown in the Data inspector) by the bs value. So, using the numbers we found in the above examples:

1995745279 x 512 = 1021821582848 bytes (approximately 951.65 GB, at 1024^3 bytes per GB)
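
When the signature search returns many candidates, the size comparison can be automated. The sketch below converts each candidate's sector count to bytes and picks the largest one that would fit inside the VM file. The tuple layout, the function names, and the "must fit inside the file" filter are our assumptions; the filter is meant to discard headers whose size field is garbage, and it may be too strict for sparse or dynamically expanding disk formats.

```python
def partition_size_bytes(total_sectors, bytes_per_sector):
    """Size of a partition in bytes, given its sector count and sector size."""
    return total_sectors * bytes_per_sector


def pick_largest(candidates, vm_file_size):
    """Pick the largest plausible candidate, or None if there is none.

    candidates: iterable of (header_offset, bytes_per_sector, total_sectors).
    """
    def size(candidate):
        return partition_size_bytes(candidate[2], candidate[1])

    plausible = [
        c for c in candidates
        if size(c) > 0 and c[0] + size(c) <= vm_file_size
    ]
    return max(plausible, key=size, default=None)
```

With the example values, partition_size_bytes(1995745279, 512) returns 1021821582848.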

Ready, set…

You now have the three values you require to use the dd utility. Enter the needed values into the dd command, paste the command at the command line if you followed our advice to compose it in a text editor, hit Enter, and dd will carve out the chosen NTFS partition.

When completed, mount the new file that you just carved. You should then be able to recover what you need. If the drive does not mount, try 7-Zip (or other archiving tools), other mounting tools, or FTK.
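
For analysts working where dd is not available, the copy that dd performs can be approximated in Python. To be clear, this is a substitute for dd rather than the method described in this article, and the function name and chunk size are our assumptions. It takes the same bs, skip, and count values and copies that range of the input file to a new file.

```python
def carve_partition(input_path, output_path, bs, skip, count, chunk_sectors=2048):
    """Copy count sectors of bs bytes, starting skip sectors into input_path.

    Returns the number of bytes written, which is less than count * bs
    if the input file ends early.
    """
    remaining = count * bs
    written = 0
    with open(input_path, "rb") as source, open(output_path, "wb") as target:
        source.seek(skip * bs)
        while remaining > 0:
            chunk = source.read(min(chunk_sectors * bs, remaining))
            if not chunk:
                break  # input ended before count sectors were read
            target.write(chunk)
            written += len(chunk)
            remaining -= len(chunk)
    return written
```

As with dd, run this only against a working copy, and make sure the output path is not the input file.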

To recap, Figure 9 shows an annotated diagram of the NTFS header and where the values are located.

Figure 9: A colorful look at an NTFS header (count value is marked as "total sectors in file system")

Conclusion

Once more, we caution the reader that results are not guaranteed; the best method of retrieving data encrypted in an attack is to pull a copy from a clean, unaffected backup. However, these methods may help the investigating team claw back data in situations where there's no other choice.

When is it time to give up? Sadly, data cannot always be recovered fully, in part, or even at all. Expect results to vary, sometimes for no reason that can be determined. It's up to you, in consultation with the business stakeholder, to decide when to walk away from the process.

Acknowledgements

The authors wish to thank the creators of the software mentioned above. The editor wishes to thank Jonathan Espenschied for the Swiss-Army-knife-with-no-handle description of dd. Some information in this article was originally presented as part of CyberUK in May 2024.