Abstract
The rapid growth of data volume causes several problems, such as storage limitations and increasing data management costs. To store and manage massive data, Distributed File Systems (DFS) are widely used. Furthermore, to reduce storage consumption, data deduplication schemes are being studied extensively. Data deduplication increases the available storage capacity by eliminating duplicated data. However, the deduplication process incurs performance overhead, such as additional disk I/O. In this paper, we propose a content-based chunk placement scheme that increases the deduplication rate on a DFS. To avoid the performance overhead of the deduplication process, we use lessfs in each chunk server, so that our system performs deduplication in a decentralized manner on each chunk server. Moreover, we use consistent hashing for chunk allocation and failure recovery. Our experimental results show that the proposed system reduces storage space by 60% compared with a system without consistent hashing.
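The abstract's chunk-allocation idea is that routing chunks by a hash of their content sends identical chunks to the same chunk server, where a local deduplicator (here, lessfs) can eliminate them; consistent hashing additionally limits how many chunks must move when servers join or fail. A minimal sketch of such a hash ring is below; the class name, virtual-node count, and server labels are illustrative assumptions, not the paper's implementation.

```python
import hashlib
from bisect import bisect_right


class ConsistentHashRing:
    """Illustrative consistent-hash ring for content-based chunk placement.

    Each server is mapped to many points ("virtual nodes") on the ring to
    even out load; a chunk goes to the first server point at or after the
    hash of its content fingerprint.
    """

    def __init__(self, servers, vnodes=100):
        # Sorted list of (hash value, server) pairs forming the ring.
        self.ring = sorted(
            (self._hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )
        self._keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(key):
        # Any stable hash works; SHA-1 is used here purely for illustration.
        return int(hashlib.sha1(key.encode()).hexdigest(), 16)

    def locate(self, chunk_fingerprint):
        """Return the chunk server responsible for a content fingerprint."""
        h = self._hash(chunk_fingerprint)
        # First ring point clockwise from h, wrapping around at the end.
        idx = bisect_right(self._keys, h) % len(self.ring)
        return self.ring[idx][1]


ring = ConsistentHashRing(["cs1", "cs2", "cs3"])
# Identical content always lands on the same server, so that server's
# local deduplication can eliminate the duplicate.
assert ring.locate("sha1-of-chunk-A") == ring.locate("sha1-of-chunk-A")
```

Because placement depends only on the chunk fingerprint and the ring, adding or removing a chunk server remaps only the chunks whose ring segment changed hands, which is what makes this structure attractive for failure recovery.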
| Original language | English |
|---|---|
| Pages (from-to) | 173-183 |
| Number of pages | 11 |
| Journal | Lecture Notes in Computer Science |
| Volume | 7971 |
| State | Published - 2013 |
Keywords
- Chunk placement
- Consistent hashing
- Deduplication
- Distributed file system