This article focuses on the use of the Linux tool dd in the forensic imaging process, along with several tools that have been derived from it. It also briefly covers the issue of data completeness when preparing to conduct a forensic acquisition.
DISCLAIMER: This article was written by myself and was previously posted on a now-defunct website on 2018-01-28. I backed up the original contents of the article prior to the website shut-down and am now reposting it here for preservation. All information was correct and accurate at the time this article was written.
dd is a command-line tool primarily used in Unix-like operating systems. It serves a very simple, yet useful purpose: to copy data from a specified source to a specified destination. Typically, this will be done bit-by-bit, regardless of any file systems or operating systems that may be present.
The dd command is typically installed by default in most GNU/Linux distributions under a package called coreutils. However, its derivatives, which will be shown later in this post, may need to be installed manually. If necessary, I will include more details about the manual installation of these tools as they are mentioned.
Owing to the many implementations of the Linux operating system, it is not uncommon to find dd installed on devices running Android[1]. In addition, the dd tools can be used over a network with utilities like cryptcat[2]. However, this article will be focusing on traditional storage device imaging using the dd tool(s) in a non-live, local environment.
Linux Block Devices
Because everything in Linux is technically interpreted as a file, dd can interact with a plethora of data. Among the most important are the “special” files in Linux, such as block devices.
These block devices are of the most interest to us, as they can represent physical drives attached to your host system, ranging from hard drives to optical drives and even NVMe devices. Drives attached to a Linux host system will be assigned a special device file in the /dev directory by the kernel. The naming convention for these device files includes, but is not limited to:
| /dev/sdX | SATA / SCSI / Serial drives |
Where ‘X’ is a letter of the alphabet starting from ‘a’ (or a number starting from 0, for floppy drives), denoting the order of the devices. For example, your primary SATA hard drive which boots into Linux will be /dev/sda, while a secondary SATA hard drive will be /dev/sdb.
In addition, these block devices will often have similar files denoting partitions for each drive, which is usually done by appending a number to the block device. For example, the first partition of your primary SATA hard drive will be
/dev/sda1 (typically the ‘boot’ partition). However, for the purposes of this article, I am only going to be focusing on the raw device files for the drives themselves and not those of the partitions.
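To make the relationship between partition files and raw device files concrete, here is a minimal sketch of a shell helper that maps a partition device file back to its parent drive by stripping the trailing partition number. The function name `parent_disk` is hypothetical, and the pattern only covers the /dev/sdX style names discussed above; any other name is passed through unchanged.

```shell
# Hypothetical helper: map a partition device file (e.g. /dev/sda1) to the
# raw device file of the drive it belongs to (e.g. /dev/sda).
# Assumption: only /dev/sdX style names are handled; anything else is
# printed unchanged.
parent_disk() {
    case $1 in
        /dev/sd[a-z]*[0-9])
            # Strip the trailing partition number.
            printf '%s\n' "$1" | sed 's/[0-9]*$//'
            ;;
        *)
            printf '%s\n' "$1"
            ;;
    esac
}

parent_disk /dev/sda1    # prints /dev/sda
parent_disk /dev/sdb     # prints /dev/sdb
```

This only manipulates names; it does not check that the device actually exists on the host.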
In a Forensic Environment
In the context of digital forensic investigation, the dd tool and its derivatives can be used to read data from the device file of an attached drive and write this data to a raw image file. Bear in mind that the data you acquire from a device such as a hard drive may not necessarily be complete (see ‘Data Completeness’ section below). The resulting raw image file can then be easily imported into an appropriate analysis suite, or interrogated with other command-line tools.
I personally prefer to use Linux for performing digital forensics whenever I can, and I find
dd, along with its variants, to be invaluable tools. I would highly recommend using low-level command-line tools like
dd to better understand the forensic process before utilizing the well-known commercial tools.
Data Completeness
Understanding the issue of data completeness is fundamental in the forensic acquisition process, especially when dealing with Hard Disk Drives (HDDs). There are caveats to consider when using imaging tools such as dd, one of them being that they may not have access to ALL of the data stored on a device. In most cases, this relates to ‘hidden areas’ commonly found on hard drives, which are typically inaccessible to the Operating System or the BIOS.
The two most common ‘hidden areas’ of a hard drive are known as the Host Protected Area (HPA) and the Device Configuration Overlay (DCO). The HPA was implemented to allow manufacturers to store diagnostic, monitoring and recovery tools on-disk. The DCO was introduced in ATA-6 and is used by manufacturers to change features between drive models and/or alter the observable capacity of the disk. These two hidden areas are simply sectors on the drive which have been specified as ‘protected’ by the drive’s configuration[3].
The third and perhaps most important ‘area’ to be aware of is referred to as the Service Area, or System Area. This area can occupy a significant part of the drive’s total capacity and is used to store information such as:
- SMART data
- Defective sector lists (P/G lists)
- Firmware code
- ATA passwords
- Servo information
Accessing the data contained in the Service Area is normally only possible via vendor-owned proprietary commands. However, Todd Shipley demonstrated via a proof-of-concept that it is possible to write data to this area, which has interesting implications for anti-forensics[4].
It is vital to not only be aware of these hidden areas, but also to try to account for them when conducting an investigation. Some commercial forensic software will take measures to deal with these areas, although more so the HPA/DCO than the Service Area. It is not uncommon for such software, or even certain write blockers, to check for the existence of HPA/DCO areas before an image is acquired[5]. On Linux, it is possible to ‘remove’ these areas by using tools such as hdparm, but this is outside the scope of this article.
Finally, while I am on the topic of data completeness, the USB flash drive I will be using to show off the functionality of the dd tools (see ‘Testing Preparation’ section below) has similar issues. All USB devices contain a hierarchy of data known as ‘descriptors’ which are used to provide information to the host system, mainly to determine the appropriate driver(s). This information includes, but is not limited to:
- Vendor and Manufacturer data
- USB device type
- Supported USB versions
- Configuration details
- Serial number of the device
- Number of endpoints
The primary descriptor found on USB drives is known as the ‘device descriptor’, which encompasses the entire device and is at the top of the hierarchy[6]. Although the device descriptor contains forensically relevant information about the USB drive, this data is stored in the Read-Only Memory (ROM) chip and will NOT be imaged when using tools like dd.
Testing Preparation
I am going to be using the
dd tools on a Linux system to acquire an image of an unmounted 1GB USB flash drive, without any additional hardware. In a forensic environment, the drive being imaged would ideally be connected to an appropriate hardware write blocker to preserve data integrity, along with any other procedures being taken to ensure data completeness.
You may wonder why the USB device I will be connecting to the Linux host system is assigned a block device with the naming convention /dev/sdX, considering the device is not connected through a SATA/SCSI interface. This is because Linux handles USB mass storage devices through its SCSI subsystem. Since kernel version 3.15, this includes working support for a protocol called USB Attached SCSI (UAS)[7], although many flash drives are still handled by the older usb-storage driver, which also carries SCSI commands over USB.
In either case, the SCSI command set is used for communicating with the USB device, which is why the block device uses the SCSI naming convention. You can see this process in the dmesg output when the USB device is connected to the host system.
WARNING: It is very important to note before I continue that dd is very unforgiving, especially if you enter the incorrect source and destination values. Therefore, if you are unfamiliar with dd, I would highly recommend you run it in a controlled environment first, lest you risk corrupting or destroying your data. Always ensure you know how the tools and commands work before you run them in a live environment!
Tool #1: DD
As mentioned previously, the standard dd tool is installed by default on most GNU/Linux distributions under the coreutils package[8]. Using dd on the Linux command-line is very simple. Given that the block device we want to image is /dev/sdb, a typical dd command might look like this:
dd if=/dev/sdb of=USB_image.dd bs=4k conv=noerror,sync status=progress
if=/dev/sdb: This is our input (source) file (if), which in this case is the block device associated with the USB device.
of=USB_image.dd: The output (destination) file (of), which will be a raw image file consisting of all the accessible data on the USB device acquired by dd.
bs=4k: This specifies the size, in bytes, of the data blocks to be copied from the input file. If this option is not specified, it will default to a block size of 512 bytes, analogous to the traditional sector size on a hard drive. In this case, I used a block size of 4k (4096 bytes), which was optimal for my setup[9]. Larger block sizes are more efficient, but I would recommend using smaller block sizes where possible: with conv=noerror,sync, a read error causes the whole block to be zero-filled, so a larger block size risks zero-filling readable data alongside the unreadable sector.
conv=noerror,sync: This option is vital if you run the dd command against a disk you suspect of having ‘bad’ or ‘defective’ blocks/sectors. Normally, the dd tool will abruptly terminate the command if a read error is encountered from the source drive, which the noerror parameter prevents. However, you will also need to use the sync option in conjunction with noerror, which will pad any unreadable ‘bad’ blocks with zeros in the output file. Bear in mind that should this occur, the resulting image will not match the original drive when hashes are calculated for each. To counter this, you can calculate hashes at specified intervals using the dcfldd tool (see DCFLDD tool section below).
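The command above can be rehearsed safely without touching a real drive. The following is a minimal sketch that uses an ordinary file as a stand-in for the block device; the file names and the 1 MiB size are assumptions made purely for illustration. It images the stand-in with the same bs and conv options and then compares SHA-256 hashes of the source and the image. The status=progress option is left out only to keep the output quiet.

```shell
# Rehearse the imaging command on a scratch file instead of a real drive.
# Assumption: source.bin stands in for /dev/sdb; all names are illustrative.
workdir=$(mktemp -d)

# Create a 1 MiB stand-in "device" filled with random data. The size is an
# exact multiple of the 4k block size, so conv=sync has nothing to pad.
head -c 1048576 /dev/urandom > "$workdir/source.bin"

# Image it with the same options used against the block device.
dd if="$workdir/source.bin" of="$workdir/USB_image.dd" bs=4k conv=noerror,sync 2>/dev/null

# Hash source and image; with no read errors the two values should match.
src_hash=$(sha256sum "$workdir/source.bin" | cut -d ' ' -f 1)
img_hash=$(sha256sum "$workdir/USB_image.dd" | cut -d ' ' -f 1)

if [ "$src_hash" = "$img_hash" ]; then
    echo "hashes match"
else
    echo "hashes differ"
fi
```

Note that if the source were not a multiple of the block size, conv=sync would pad the final partial block with zeros and the two hashes would differ, even without any read errors.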
Reading the MBR
I personally do not use traditional dd for forensic imaging; however, it is very useful for extracting key excerpts of data from a drive. For example, the following dd command will extract the first 512 bytes of the accessible data, known as the Master Boot Record (MBR):
dd if=/dev/sdb of=USB_mbr.dd bs=512 count=1
count=1: This specifies how many blocks, whose size we define with bs, are to be extracted. In the above command, I only required a single block, starting at the beginning of the accessible data. This particular block of data is also referred to as the MBR ‘boot sector’ (ending with the 0x55AA signature), which contains partition and Operating System information, as well as boot code used by the BIOS.
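As a quick sanity check on an extracted boot sector, the signature bytes can be read back with dd itself. The sketch below builds a small scratch image to stand in for /dev/sdb (an assumption for illustration, as are the file names), writes the bytes 0x55 0xAA at offset 510, extracts the first sector as above, and then prints its last two bytes in hexadecimal.

```shell
# Build a small scratch image and give its first sector an MBR-style
# boot signature. Assumption: disk.img stands in for /dev/sdb.
workdir=$(mktemp -d)
dd if=/dev/zero of="$workdir/disk.img" bs=512 count=4 2>/dev/null
# Write bytes 0x55 0xAA (octal 125 252) at offset 510 without truncating.
printf '\125\252' | dd of="$workdir/disk.img" bs=1 seek=510 conv=notrunc 2>/dev/null

# Extract the first 512-byte sector, as in the command above.
dd if="$workdir/disk.img" of="$workdir/USB_mbr.dd" bs=512 count=1 2>/dev/null

# Read the last two bytes of the extracted sector and print them as hex.
signature=$(dd if="$workdir/USB_mbr.dd" bs=1 skip=510 count=2 2>/dev/null | od -An -tx1 | tr -d ' \n')
echo "boot signature bytes: $signature"    # prints: boot signature bytes: 55aa
```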
A few other less common parameters used with
dd, along with their function, are described as follows:
skip=X: Where ‘X’ is an integer. This option will skip X blocks, of the block size set with bs, at the start of the input file. For example, if an input file of 100 blocks is imaged with skip=1, the resulting output file will be 99 blocks in size, having excluded the first block.
conv=sparse: This option should generally be used to save space on the file system, as any zeroed blocks in the output file won’t be written to disk. For further reading into sparse files, I recommend this resource.
status=progress: This option will cause dd to show periodic transfer statistics such as the number of bytes copied, the elapsed time and the data transfer rate. It is typically used for convenience, but can also help determine optimal block sizes.
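The skip example above can be reproduced on a scratch file. This is a minimal sketch, assuming GNU dd and using hypothetical file names: a 100-block input is copied once with skip=1 and once with conv=sparse, and the resulting sizes are printed.

```shell
# Assumption: input.bin is a 100-block scratch file standing in for a device.
workdir=$(mktemp -d)
dd if=/dev/zero of="$workdir/input.bin" bs=512 count=100 2>/dev/null

# Skip the first block of the input; the remaining 99 blocks are copied.
dd if="$workdir/input.bin" of="$workdir/skipped.bin" bs=512 skip=1 2>/dev/null

# Same input copied with conv=sparse: zeroed blocks may be left unallocated
# on file systems that support sparse files.
dd if="$workdir/input.bin" of="$workdir/sparse.bin" bs=512 conv=sparse 2>/dev/null

skipped_size=$(wc -c < "$workdir/skipped.bin")
echo "skipped copy: $skipped_size bytes"
echo "sparse copy:  $(wc -c < "$workdir/sparse.bin") bytes"
```

The skipped copy should come out at 99 x 512 bytes. For the sparse copy, comparing the apparent size with the space reported by du is one way to see whether the file system actually left the zeroed blocks unallocated.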
Tool #2: DCFLDD
The first tool I will cover that has been forked from the DD project is called
dcfldd, which was developed by the Department of Defense Computer Forensic Lab and is considered to be an enhanced version of the traditional
dd. It boasts notable improvements over the original such as:
- Multiple output file support
- Hash verification
- Hashing during data transfer
- Split output file support
- Log file support
- In-built status progress
Bear in mind that dcfldd does not support any output format other than ‘RAW’, meaning this tool cannot be used to output to forensic formats such as AFF, EWF, E01, etc. In addition, this tool should not be used when dealing with disks you suspect of containing defective sectors, due to a known issue in the tool itself[10].
Most of the common Linux distributions contain dcfldd in their core repositories, and it can be very easily installed from the command line. For a list of commands to help install dcfldd, please check for the appropriate distribution here. Note that Arch Linux and CentOS distributions will require additional repositories to be set up before dcfldd can be installed.
When using dcfldd on the Linux command-line, given that the block device we want to image is /dev/sdb, a typical command would look like this:
dcfldd if=/dev/sdb of=USB_Image.dd of=USB_Image2.dd bs=4k conv=noerror,sync hash=sha256 hashwindow=100MB sha256log=USB_Image.hash
The if, of, bs and conv parameters have not changed from their usage in the dd tool; please refer to the previous demonstration of dd for more details on these options.
of=USB_Image2.dd: In the command above, I have specified a second output file option, with a different file name, meaning I end up with two identical images of the source file (
/dev/sdb). This may not be useful when dealing with very large datasets, however it does allow an examiner to save an image to different locations if necessary.
hash=sha256: This option selects the SHA-256 hashing algorithm to be used when calculating a cryptographic hash of the input and output files. The hashing algorithms MD5, SHA-1, SHA-256, SHA-384 and SHA-512 are currently supported within dcfldd. I would not recommend using MD5 or SHA-1 as they have been cryptographically broken[11].
hashwindow=100MB: As mentioned previously, this option will calculate a hash of the data in specified intervals, in this case, every 100MB of data. This can be seen in the output log file specified with the next option.
sha256log=USB_Image.hash: A good example of the logging functionality of dcfldd, this option designates a separate file which will store the calculated hashes we specified previously. With the command above, this file contains a hash value for each 100MB of data, including the value for the whole data at the end. To check that this last hash value matched the source device, I ran sha256sum against the block device.
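The idea behind windowed hashing can be illustrated with coreutils alone. The following is only a sketch of the concept, not how dcfldd implements it, and the log layout is my own rather than the one dcfldd writes. It assumes a 1 MiB scratch file and an arbitrary 256 KiB window: each window is hashed separately, and a hash of the whole file is appended at the end.

```shell
# Emulate the idea of hashwindow= with coreutils only (not dcfldd itself).
# Assumptions: image.bin is a scratch file; the window size is arbitrary.
workdir=$(mktemp -d)
image="$workdir/image.bin"
log="$workdir/image.hash"
window=262144                                  # 256 KiB per window

head -c 1048576 /dev/urandom > "$image"        # 1 MiB of test data

size=$(wc -c < "$image")
windows=$(( (size + window - 1) / window ))    # round up to cover a short tail

: > "$log"
i=0
while [ "$i" -lt "$windows" ]; do
    start=$(( i * window ))
    # Hash one window: skip i windows of input, then read exactly one.
    hash=$(dd if="$image" bs="$window" skip="$i" count=1 2>/dev/null | sha256sum | cut -d ' ' -f 1)
    echo "window $i (offset $start): $hash" >> "$log"
    i=$(( i + 1 ))
done

# Finish with a hash over the whole image.
echo "Total (sha256): $(sha256sum "$image" | cut -d ' ' -f 1)" >> "$log"
cat "$log"
```

If a later re-hash disagrees with the total, the per-window values narrow down which region of the image changed.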
The dcfldd tool contains many other options and I would recommend reading through the man page if you want to take full advantage of its functionality. Like traditional dd, dcfldd also contains the status option, except that it operates with a simple on/off parameter instead (e.g. status=on).
As with dd, I do not personally use dcfldd for forensic acquisition, primarily due to the reported issues it has with defective sectors. However, the ability to calculate a hash value at specified intervals can prove very useful in some circumstances.
Tool #3: DC3DD
The second derivative of dd that I am covering is
dc3dd, which was developed by the Department of Defense Cyber Crime Center. DC3DD is syntactically and functionally very similar to the previous tool
dcfldd. However, there are some slight differences between the two, the most notable being that the
conv=noerror,sync option and the progress bar are built into
dc3dd by default. Additionally,
dc3dd allows for automatic hash verification, which is a very useful feature not found in the other DD tools.
Again, most of the common Linux distributions contain dc3dd in their core repositories, and it can be very easily installed from the command line. For a list of commands to help install dc3dd, please check the appropriate distribution here. Note that CentOS will require either the EPEL, Repoforge or CERT Forensic repositories to be set up beforehand.
dc3dd on the Linux command-line has plenty of options for forensic examiners. Given the block device we want to image is
/dev/sdb, a typical
dc3dd command would look like this:
dc3dd if=/dev/sdb hof=USB_Image.dd log=USB_Image.log hash=sha256 hash=sha512 hlog=USB_Image.hash
hof=USB_Image.dd: This option will calculate a hash of the specified output file, as well as compare this value to the one calculated for the input file. Should the hashes match, the command will output ‘[ok]’ next to the hash values.
log=USB_Image.log: This option will write the contents of STDOUT to a specified file. This is useful because if you plan on using dc3dd multiple times, you can write to the same log file each time, as it will not be overwritten.
hlog=USB_Image.hash: Specifies a file where the hash value comparison is written to. If the hash verification is written to STDOUT, it will appear here and in the file specified by log.
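To show what this verification amounts to, the comparison can be reproduced by hand with coreutils. This is a minimal sketch under stated assumptions: the function name `verify_image` is hypothetical, the scratch files stand in for /dev/sdb and USB_Image.dd, and the log lines are my own layout rather than the format dc3dd writes. Results are appended to the log, so repeated runs do not overwrite earlier entries.

```shell
# Hypothetical helper: compare input and output hashes by hand, roughly
# what hof= automates. The output format is illustrative, not dc3dd's.
verify_image() {
    src=$1
    img=$2
    logfile=$3
    rc=0
    for tool in sha256sum sha512sum; do
        src_hash=$("$tool" "$src" | cut -d ' ' -f 1)
        img_hash=$("$tool" "$img" | cut -d ' ' -f 1)
        if [ "$src_hash" = "$img_hash" ]; then
            echo "$tool $img_hash [ok]" >> "$logfile"
        else
            echo "$tool $img_hash [MISMATCH]" >> "$logfile"
            rc=1
        fi
    done
    return "$rc"
}

# Usage on scratch files (stand-ins for /dev/sdb and USB_Image.dd).
workdir=$(mktemp -d)
head -c 65536 /dev/urandom > "$workdir/source.bin"
cp "$workdir/source.bin" "$workdir/USB_Image.dd"
verify_image "$workdir/source.bin" "$workdir/USB_Image.dd" "$workdir/USB_Image.log" && echo "image verified"
```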
As seen before in dcfldd, I manually specified which hash algorithms I wanted the tool to use with the hash= option. I used two (SHA-256 and SHA-512), with the same rationale that MD5 and SHA-1 are broken. It is worth noting that dc3dd has different names for the options seen in the previous tools, which I have listed as follows:
count = cnt: Will read a specified number of blocks from the input file. The size of these blocks can be altered with the ssz option.
skip = iskip/oskip: Here skip is split into two options for input and output. iskip will specify the number of blocks to skip at the start of the input file and oskip will specify the same but for the output file.
bs = ssz: The default block size in dc3dd is 512, but this can be manually overridden using ssz. Bear in mind that this will still accept suffixed values like ‘4k’ (4096).
The dc3dd tool is an excellent choice for forensic examiners due to its hash verification and advanced logging features. In the forensic imaging process, I personally use a combination of this tool and the next one, ddrescue. As a side note, dc3dd is the imaging tool utilised in Bruce Nikkel’s sfsimage program, which I highly recommend checking out here.
Tool #4: DDRESCUE
The final tool I will be covering is technically not a derivative of dd, but it functions in a very similar way and is very useful for forensic imaging, despite being considered a ‘data recovery’ tool. The tool ddrescue was developed as part of the GNU project and is not to be mistaken for dd_rescue, a similarly named tool which ddrescue is considered to be an improvement upon[12]. Because ddrescue is primarily focused on data recovery, it is the ideal tool to utilise on devices that are suspected to contain ‘bad’ blocks.
Like the previous two tools,
ddrescue is fairly easy to install on most Linux distributions due to its inclusion in their core repositories. For a list of distributions with appropriate commands and instructions needed to install
ddrescue, please refer to this resource.
Despite being more oriented towards data recovery,
ddrescue still provides options which will prove useful for forensic practitioners. As before, assuming the device we want to image is
/dev/sdb, a typical
ddrescue command would look like this:
ddrescue -d /dev/sdb USB_Image.dd USB_Image.map
Note how we do not need to specify the if and of options for the input and output files respectively, as seen in the other tools.
-d / --idirect: This option specifies direct disc access for the input file and will bypass the kernel cache. Note that not all systems support direct disc access and ddrescue will warn you if your system does not.
USB_Image.map: This is the third parameter of ddrescue and despite being optional, it is highly recommended you use a map file. Note that the map file does not need to obey any naming convention like *.map and can be named however you wish. The map file will contain important information about the imaging process, specifically whether there were any read errors during the acquisition. Additionally, should the imaging process be interrupted for any reason, the map file will keep track of the recovered data and as long as the same map file is specified, the imaging can resume where it left off.
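Because the map file is plain text, it can be checked for read errors with a few lines of shell. The sketch below rests on my understanding of the layout described in the GNU ddrescue manual, which should be treated as an assumption: comment lines begin with ‘#’, the first non-comment line records the current position and status, and each following line holds a position, a size (both in hexadecimal) and a status character, where ‘+’ marks a finished block and ‘-’ a bad-sector block. The function name `summarise_map` and the sample map contents are hand-written for illustration, not real tool output.

```shell
# Hypothetical helper: total the bytes per status in a ddrescue map file.
# Assumed layout: '#' comments, one current-status line, then lines of
# "pos size status" with '+' = finished and '-' = bad sector.
summarise_map() {
    seen_status_line=0
    rescued=0
    bad=0
    other=0
    while read -r pos size status rest; do
        case $pos in
            '#'*|'') continue ;;               # skip comments and blank lines
        esac
        if [ "$seen_status_line" -eq 0 ]; then
            seen_status_line=1                 # skip the current-status line
            continue
        fi
        case $status in
            +) rescued=$(( rescued + $size )) ;;
            -) bad=$(( bad + $size )) ;;
            *) other=$(( other + $size )) ;;   # not yet tried, trimmed or scraped
        esac
    done < "$1"
    echo "rescued=$rescued bad=$bad other=$other"
}

# Hand-written sample map (illustrative values only).
workdir=$(mktemp -d)
cat > "$workdir/sample.map" <<'EOF'
# Mapfile (hand-written sample)
# current_pos  current_status  current_pass
0x00020000     +               1
#      pos        size  status
0x00000000  0x00010000  +
0x00010000  0x00000200  -
0x00010200  0x0000FE00  +
EOF

summarise_map "$workdir/sample.map"    # prints: rescued=130560 bad=512 other=0
```

A non-zero bad count is a prompt to record the affected offsets in your notes, since those regions of the image do not reflect the source.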
The documentation for
ddrescue is very robust and well worth reading through if you have time. From the manual I have picked out some features that I found particularly interesting:
- Specifying the option -R will read the input file in reverse passes. Like many of the options, this is used mainly to maximise recovered data on bad disks.
- Should ddrescue encounter bad sectors on the input file you are imaging, it will not write zeros to the output file in their place like the other dd tools will do.
- The physical block size will be dynamically decreased to maximise recovered data should ddrescue encounter bad sectors on the input file.
- Any interface (ATA, SATA, SCSI, etc.) supported by your kernel can be used with ddrescue.
- The -i option can be used to specify a starting position on the input file. The option defaults to offset 0 if not specified.
I decided to discuss data completeness at the beginning of this article because I believe it is very important for forensic practitioners to understand exactly what data they are acquiring from a device, and consider that it may not always be ‘forensically complete’.
All the tools covered in this article have their own strengths and weaknesses, so your individual circumstances will be the biggest factor in deciding which you want to use. However, I would always utilise ddrescue for imaging drives whenever possible due to its focus on recovering ‘good’ data. Of course, you may not necessarily be in a position to choose one over the other, which is why I emphasise learning how each of them works.
I am aware that there are many other uses/options for the tools covered in this article, but I wanted to show that fundamentally, the forensic imaging process can be completed with command-line tools on Linux. Finally, this is by no means an exhaustive list of imaging tools fit for every possible scenario you may come across.
Shema, M., Davis, C. and Cowen, D. (2004). Anti-Hacker Tool Kit, 3rd Ed. California: McGraw-Hill.
Nikkel, B. (2016). Practical Forensic Imaging. San Francisco: No Starch Press, Inc.
Shipley, T. and Door, B. (2016). Hiding Data from Forensic Imagers – Using the Service Area of a Hard Disk Drive [Accessed 2018-01-28].
Shipley, T. and Door, B. (2012). Forensic Imaging of Hard Disk Drives- What we thought we knew [Accessed 2018-01-28].
Larabel, M. (2014). USB Attached SCSI (UAS) Is Now Working Under Linux [Accessed 2018-01-28].
Lyle, J. R. and Wozar, M. R. (2007). Issues with Imaging Drives Containing Faulty Sectors. Digital Investigation: Volume 4.