Content-based chunk placement scheme for decentralized deduplication on distributed file systems

Research output: Contribution to journal › Article › peer-review

2 Scopus citations

Abstract

The rapid growth of data volume causes several problems, such as storage limitations and increased data management costs. Distributed File Systems (DFSs) are widely used to store and manage such massive data, and data deduplication schemes, which increase available storage capacity by eliminating duplicated data, are being extensively studied to reduce storage consumption. However, the deduplication process incurs performance overhead, such as additional disk I/O. In this paper, we propose a content-based chunk placement scheme that increases the deduplication rate on a DFS. To avoid the performance overhead of a centralized deduplication process, we run lessfs on each chunk server, so that our system performs the deduplication process in a decentralized fashion at each chunk server. Moreover, we use consistent hashing for chunk allocation and failure recovery. Our experimental results show that the proposed system reduces storage usage by 60% compared to a system without consistent hashing.
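The core idea in the abstract, routing each chunk to a server chosen by the chunk's own content hash on a consistent-hash ring, so identical chunks always land on the same server and can be deduplicated there locally, can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the server names, the virtual-node count, and the use of SHA-1 as the fingerprint function are assumptions for the example.

```python
import hashlib
from bisect import bisect_right


class ConsistentHashRing:
    """Content-based chunk placement via consistent hashing (illustrative sketch)."""

    def __init__(self, servers, vnodes=100):
        # Each server gets `vnodes` points on the ring to even out load
        # (vnodes=100 is an assumed value, not from the paper).
        points = []
        for server in servers:
            for i in range(vnodes):
                points.append((self._hash(f"{server}#{i}"), server))
        points.sort()
        self._hashes = [h for h, _ in points]
        self._servers = [s for _, s in points]

    @staticmethod
    def _hash(key: str) -> int:
        # SHA-1 chosen here for illustration; any uniform hash works.
        return int(hashlib.sha1(key.encode()).hexdigest(), 16)

    def server_for_chunk(self, chunk: bytes) -> str:
        # The chunk's content fingerprint picks its position on the ring,
        # so duplicate chunks are always routed to the same chunk server,
        # where a local deduplicating store (e.g. lessfs) can eliminate them.
        fp = self._hash(chunk.hex())
        idx = bisect_right(self._hashes, fp) % len(self._hashes)
        return self._servers[idx]


ring = ConsistentHashRing(["chunkserver-1", "chunkserver-2", "chunkserver-3"])
target = ring.server_for_chunk(b"some chunk of file data")
```

Because placement depends only on chunk content and the set of servers, adding or removing a server remaps only the chunks whose ring segment changed, which is what makes consistent hashing attractive for failure recovery.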

Original language: English
Pages (from-to): 173-183
Number of pages: 11
Journal: Lecture Notes in Computer Science
Volume: 7971
State: Published - 2013

Keywords

  • Chunk placement
  • Consistent hashing
  • Deduplication
  • Distributed file system
