What is Deduplication?

By Rachel Burkot

Updated: May 17, 2024

Deduplication is a process used to eliminate redundant data. During the process, a computer’s hard drive is scanned for large sequences of data across comparison windows. The sequences picked out while scanning are typically eight kilobytes or larger. If a sequence is found elsewhere on the storage system, the existing copy is referenced instead of the sequence being stored again.
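The scanning step can be illustrated with a minimal Python sketch. It assumes fixed, non-overlapping eight-kilobyte windows and uses SHA-256 digests to compare them; the function name find_duplicate_windows and the choice of hash are illustrative assumptions, not details given in this article, and any partial window at the end of the data is ignored.

```python
import hashlib

WINDOW = 8 * 1024  # 8 KB comparison window, matching the figure in the text


def find_duplicate_windows(data: bytes, window: int = WINDOW) -> dict[int, int]:
    """Map the offset of each repeated window to the offset of its first occurrence."""
    first_seen: dict[bytes, int] = {}  # digest -> offset where it first appeared
    duplicates: dict[int, int] = {}    # duplicate offset -> original offset
    # Step through the data one full window at a time (assumption: fixed windows).
    for offset in range(0, len(data) - window + 1, window):
        digest = hashlib.sha256(data[offset:offset + window]).digest()
        if digest in first_seen:
            duplicates[offset] = first_seen[digest]  # seen before: reference it
        else:
            first_seen[digest] = offset              # new sequence: remember it
    return duplicates
```

For data made of three windows in which the third repeats the first, the function returns a single entry that maps the third window's offset back to offset 0.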

Successful deduplication can eliminate a large amount of redundant data, leading to obvious benefits. Duplicated data takes up unnecessary room in the system, and when it is removed, the user is left with more storage space on the computer. This can help the system run more efficiently because it is not bogged down with the extra data. Additionally, with less data to store there is less data to back up or send over a network, so bandwidth use can improve noticeably.

Deduplication involves pointing each duplicate of a large block of data to the first location where that data was stored and deleting the extra copies. The deleted copies remain indexed, so the data can still be retrieved wherever it is needed. Often, the exact same data can be stored in many places on a hard drive, sometimes as many as 100. If each copy takes up one megabyte of space, deduplication will reduce the space used on the hard drive from 100 megabytes to about one, plus a small amount for the index. The additional space that is gained is very beneficial for a computer’s hard drive.
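The arithmetic in this example can be checked with a short Python sketch. The function name space_savings is hypothetical, and the calculation assumes that the references and index take a negligible amount of space.

```python
def space_savings(copies: int, size_mb: float) -> tuple[float, float, float]:
    """Return (logical MB, physical MB, percent saved) for identical copies."""
    logical = copies * size_mb  # space used without deduplication
    physical = size_mb          # one stored copy; index overhead assumed negligible
    saved_pct = 100.0 * (logical - physical) / logical
    return logical, physical, saved_pct


logical, physical, saved = space_savings(copies=100, size_mb=1.0)
print(f"{logical:.0f} MB -> {physical:.0f} MB ({saved:.0f}% saved)")
# prints "100 MB -> 1 MB (99% saved)"
```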

Additional benefits of deduplication include reducing the amount of backup space needed by as much as 90 percent, lowering costs for power, floor space and cooling, improving service levels, reducing some kinds of errors and making it possible to recover data from several different points in time. A drawback of deduplication is that it typically identifies duplicate data using cryptographic hash functions. Collisions are extremely unlikely with a strong hash, but if two different pieces of data did produce the same hash, they would be treated as duplicates and one of them would be lost. Also, deduplication removes redundancy by design. If the person who authorized the procedure is not aware of this, reliability can be adversely affected, because damage to the single stored copy affects every location that references it.
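One common way to guard against the collision risk is to compare the actual bytes whenever two hashes match, and only then treat the data as a duplicate. The Python sketch below illustrates the idea; it is not taken from any particular product. The class VerifyingStore and the helper weak_hash are hypothetical, and the hash is deliberately cut down to a single byte so that collisions are guaranteed to occur in a small demonstration.

```python
import hashlib
from typing import Callable


def weak_hash(segment: bytes) -> bytes:
    """Deliberately weak 1-byte hash, used only to force collisions in this demo."""
    return hashlib.sha256(segment).digest()[:1]


class VerifyingStore:
    """Deduplicating store that confirms a hash match with a byte comparison."""

    def __init__(self, hash_fn: Callable[[bytes], bytes] = weak_hash) -> None:
        self.hash_fn = hash_fn
        self.buckets: dict[bytes, list[bytes]] = {}  # digest -> distinct segments

    def put(self, segment: bytes) -> tuple[bytes, int]:
        """Store a segment and return a (digest, index) reference to it."""
        digest = self.hash_fn(segment)
        bucket = self.buckets.setdefault(digest, [])
        for index, existing in enumerate(bucket):
            if existing == segment:      # true duplicate: the bytes match exactly
                return digest, index
        bucket.append(segment)           # new data or a hash collision: keep it
        return digest, len(bucket) - 1

    def get(self, ref: tuple[bytes, int]) -> bytes:
        """Return the segment that a reference points to."""
        digest, index = ref
        return self.buckets[digest][index]
```

With only 256 possible hash values, storing 1,000 distinct segments forces many collisions, yet every segment can still be read back intact because colliding segments are kept side by side rather than merged. The price is an extra byte comparison on every hash match.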

Data deduplication works by first segmenting the data that is being processed. Each segment is identified, typically by its hash, and compared to data that is already in the system. If the segment is unique, it is stored on disk. If it is a duplicate, a reference to the existing copy is created instead. Deduplication can be implemented using products such as Data Domain, which works with data and storage systems to filter through data, referencing or storing each segment as appropriate.
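The segment, identify, compare and store-or-reference steps can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a description of how Data Domain or any other product works: the class DedupStore is hypothetical, segments have a fixed size, SHA-256 is used as the identifier, and everything is held in memory rather than on disk.

```python
import hashlib


class DedupStore:
    """In-memory sketch of a deduplicating store with fixed-size segments."""

    def __init__(self, segment_size: int = 8 * 1024) -> None:
        self.segment_size = segment_size
        self.segments: dict[str, bytes] = {}  # identifier -> unique segment

    def write(self, data: bytes) -> list[str]:
        """Store data and return the list of references needed to rebuild it."""
        references = []
        for start in range(0, len(data), self.segment_size):
            segment = data[start:start + self.segment_size]
            key = hashlib.sha256(segment).hexdigest()  # identify the segment
            if key not in self.segments:               # unique: store it once
                self.segments[key] = segment
            references.append(key)                     # always record a reference
        return references

    def read(self, references: list[str]) -> bytes:
        """Rebuild the original data by following its references in order."""
        return b"".join(self.segments[key] for key in references)


store = DedupStore(segment_size=4)
refs = store.write(b"abcdabcdwxyzabcd")
print(len(refs), len(store.segments))  # prints "4 2"
```

In the usage lines, four segments are written but only two are stored, because "abcd" appears three times. Calling store.read(refs) returns the original sixteen bytes.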
