What is Deduplication?

By Rachel Burkot
Updated: May 17, 2024

Deduplication is a process used to eliminate redundant data. During the process, a computer’s hard drive is scanned for large sequences of data across comparison windows. Sequences of eight kilobytes or more are typically picked out as candidates. If the same sequence is found elsewhere on the storage system, the existing copy is referenced rather than stored again.

A successful deduplication can eliminate a large amount of redundant data on a computer, leading to obvious benefits. Duplicate data takes up unnecessary room in the system, and removing it leaves the user with more storage space. Less stored data also means less data to read, copy and back up, which can help the system run more efficiently. Bandwidth use can improve as well, because data that is already stored does not need to be sent over the network again.

Deduplication keeps the first copy of a sequence of data and replaces the extra copies with references to that first location. The references are indexed so that the data can be rebuilt whenever it is needed. Often, the exact same data can be stored in as many as 100 different places on a hard drive. If each copy takes up one megabyte of space, deduplication will reduce the space used on the hard drive from 100 megabytes to just one, plus a small amount for the references. The process works by archiving the data, and the additional space that is gained is very beneficial for a computer’s hard drive.
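The arithmetic in this example can be checked with a few lines of Python. This is only an illustrative sketch: the function name space_after_dedup is made up for this article, and it assumes the copies are byte-for-byte identical and ignores the small overhead of the references and index.

```python
from typing import Tuple


def space_after_dedup(copies: int, size_mb: float) -> Tuple[float, float, float]:
    """Return (before_mb, after_mb, percent_saved) for identical copies.

    Assumption: every copy is identical, and reference/index overhead
    is ignored, so only one copy remains after deduplication.
    """
    before = copies * size_mb
    after = size_mb if copies > 0 else 0.0
    saved = 0.0 if before == 0 else 100.0 * (before - after) / before
    return before, after, saved


before, after, saved = space_after_dedup(copies=100, size_mb=1.0)
print(f"{before:.0f} MB -> {after:.0f} MB ({saved:.0f}% saved)")
```

With 100 copies of one megabyte each, this prints "100 MB -> 1 MB (99% saved)".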

Additional benefits of deduplication include reducing the amount of backup space needed by as much as 90 percent, reducing costs such as power, space and cooling requirements, restoring a higher level of service, eliminating many different kinds of errors and recovering data at several different points. A drawback of deduplication is that it typically identifies duplicate data using cryptographic hash functions. Collisions, in which two different pieces of data produce the same hash, are extremely unlikely but not impossible, and a collision or other type of error could result in the loss of data. Also, because deduplication removes redundancy, damage to the single stored copy affects every reference to it, so the computer’s reliability can be adversely affected if the person who authorized the procedure is not aware of this.
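One common way to guard against the collision risk is to compare the actual bytes whenever two hashes match, instead of trusting the hash alone. The Python sketch below illustrates the idea and does not describe how any particular product works. The names weak_hash and VerifyingStore are hypothetical, and the hash is deliberately cut down to a single byte so that collisions are easy to trigger in a demonstration.

```python
import hashlib
from typing import Callable, Dict, List, Tuple


def weak_hash(segment: bytes) -> bytes:
    # Deliberately weak: only 1 byte of SHA-256, so collisions are common.
    # A real system would keep the full digest.
    return hashlib.sha256(segment).digest()[:1]


class VerifyingStore:
    """Hypothetical store that verifies bytes before treating data as duplicate."""

    def __init__(self, hash_fn: Callable[[bytes], bytes] = weak_hash) -> None:
        self.hash_fn = hash_fn
        # hash -> list of distinct segments that share that hash
        self.buckets: Dict[bytes, List[bytes]] = {}

    def put(self, segment: bytes) -> Tuple[bytes, int]:
        """Store a segment and return a (hash, position) reference to it."""
        key = self.hash_fn(segment)
        bucket = self.buckets.setdefault(key, [])
        for position, existing in enumerate(bucket):
            if existing == segment:  # byte-for-byte check, not just the hash
                return key, position
        bucket.append(segment)  # same hash, different data: keep both
        return key, len(bucket) - 1

    def get(self, ref: Tuple[bytes, int]) -> bytes:
        key, position = ref
        return self.buckets[key][position]


store = VerifyingStore()
originals = [str(i).encode() for i in range(300)]
refs = [store.put(segment) for segment in originals]

# 300 distinct segments cannot fit into 256 one-byte hashes without
# collisions, yet every segment is still recovered intact.
print(all(store.get(ref) == seg for ref, seg in zip(refs, originals)))
```

The trade-off is an extra read and comparison on every hash match, which is why some systems rely on the strength of the hash alone.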

Data deduplication works by first segmenting each piece of data that is processed. Each segment is identified and compared to data that is already in the system. If the segment is unique, it is stored on disk. If it is a duplicate, a reference to the existing copy is created instead. One commercial implementation is Data Domain, a line of deduplicating storage products that works with data and storage systems to filter through data, referencing, eliminating or storing each segment, as appropriate.
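The segment, identify, compare and store-or-reference steps can be sketched in Python as follows. This is a minimal in-memory illustration, not how Data Domain or any other product is actually implemented. The DedupStore class and its method names are invented for this example, and it assumes fixed-size 8 KB segments identified by SHA-256 fingerprints.

```python
import hashlib
from typing import Dict, List

SEGMENT_SIZE = 8 * 1024  # assumed fixed segment size of 8 KB


class DedupStore:
    """Hypothetical in-memory store that keeps each unique segment once."""

    def __init__(self, segment_size: int = SEGMENT_SIZE) -> None:
        self.segment_size = segment_size
        self.segments: Dict[str, bytes] = {}   # fingerprint -> unique segment
        self.files: Dict[str, List[str]] = {}  # file name -> fingerprints

    def write(self, name: str, data: bytes) -> None:
        refs: List[str] = []
        for start in range(0, len(data), self.segment_size):
            segment = data[start:start + self.segment_size]
            fingerprint = hashlib.sha256(segment).hexdigest()
            if fingerprint not in self.segments:
                self.segments[fingerprint] = segment  # unique: store it
            refs.append(fingerprint)  # always record a reference
        self.files[name] = refs

    def read(self, name: str) -> bytes:
        # Rebuild the file by following its references in order.
        return b"".join(self.segments[fp] for fp in self.files[name])

    def stored_bytes(self) -> int:
        return sum(len(segment) for segment in self.segments.values())


store = DedupStore()
payload = bytes(16 * 1024)  # 16 KB of zeros: two identical 8 KB segments
store.write("a.bin", payload)
store.write("b.bin", payload)
print(store.stored_bytes())  # 8192
```

Here 32 KB of input across two files is held as a single 8 KB segment, and either file can still be rebuilt in full with read().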

Discussion Comments
By indemnifyme — On Sep 21, 2011

@sunnySkys - If you do decide you want to deduplicate your computer (besides deleting duplicate files yourself) make sure you get a good program. As the article said, deduplication can result in a loss of data if you don't know what you're doing!

By sunnySkys — On Sep 20, 2011

I think my computer could use some deduplication. I was going through some of my downloaded files recently and I noticed I had a lot of duplicate files. It seems like I download stuff, forget I downloaded it, and then download it again!

And of course, that's not taking into account all the duplication my computer may be doing without my help!
