Data Deduplication Project

Data Deduplication Introduction

This project implements data deduplication using a client-server architecture. The servercli folder contains the server and the client folder contains the client, so a client can interact with the server from any machine on the network.

The client sends files to the server, which performs the deduplication. The server stores a file only if it does not duplicate one already on the server; otherwise, it saves a reference to the existing file.
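The store-or-reference decision described above can be sketched as follows. This is a minimal in-memory illustration, not the project's actual code: the `DedupStore` class and its methods are hypothetical names, and it assumes duplicates are detected by hashing the content and then confirming with a byte-for-byte comparison.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical in-memory store: each distinct content is kept exactly once.
class DedupStore {
public:
    // Returns the ID of the blob that holds `content`.
    // A new blob is created only when no identical content is stored yet.
    std::size_t put(const std::string& content) {
        const std::size_t h = std::hash<std::string>{}(content);
        const auto range = by_hash_.equal_range(h);
        for (auto it = range.first; it != range.second; ++it) {
            // Hashes can collide, so confirm with a full comparison.
            if (blobs_[it->second] == content) {
                return it->second;  // duplicate: reuse the existing blob
            }
        }
        blobs_.push_back(content);
        const std::size_t id = blobs_.size() - 1;
        by_hash_.emplace(h, id);
        return id;
    }

    // Returns the content stored under `id`.
    const std::string& get(std::size_t id) const {
        assert(id < blobs_.size());
        return blobs_[id];
    }

    // Number of distinct contents currently stored.
    std::size_t unique_count() const { return blobs_.size(); }

private:
    std::vector<std::string> blobs_;                             // unique contents
    std::unordered_multimap<std::size_t, std::size_t> by_hash_;  // hash -> blob ID
};
```

Calling `put` twice with the same content returns the same ID and leaves `unique_count()` unchanged, which corresponds to saving a reference instead of a second copy.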

To support this, the server maintains a record of each client's file names and the locations of those files on the server.
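One way to picture this bookkeeping is a map from (client, file name) to a server-side location. The sketch below is an assumption for illustration only: `ClientIndex`, the client identifiers, and the paths such as `store/0` are hypothetical and do not come from the project.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical index: (client, file name) -> location of the file on the server.
class ClientIndex {
public:
    // Records where a client's file lives on the server.
    // Several entries may point at the same location when files are duplicates.
    void record(const std::string& client, const std::string& file_name,
                const std::string& server_path) {
        assert(!server_path.empty());
        index_[std::make_pair(client, file_name)] = server_path;
    }

    // Returns a pointer to the stored location, or nullptr if the client
    // has no file with that name.
    const std::string* lookup(const std::string& client,
                              const std::string& file_name) const {
        const auto it = index_.find(std::make_pair(client, file_name));
        return it == index_.end() ? nullptr : &it->second;
    }

private:
    std::map<std::pair<std::string, std::string>, std::string> index_;
};
```

Because the key includes the client, two clients can use the same file name without clashing, and two different names can resolve to one stored file.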

Running the Server

To compile the server:

g++ -o a server -lboost_serialization -lstdc++fs

The server takes one command-line argument, the port number (for example, 8000):

./a 8000

Note: The server executable's file name should be a single character.
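A program that takes a port number on the command line usually validates it before use. The helper below is a hedged sketch of such a check; `parse_port` is a hypothetical function and is not taken from the project's server code.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>

// Hypothetical helper: converts a command-line argument to a TCP port.
// Returns the port (1..65535) on success, or -1 if the text is not a valid port.
int parse_port(const char* text) {
    assert(text != nullptr);  // argv entries are never null
    char* end = nullptr;
    errno = 0;
    const long value = std::strtol(text, &end, 10);
    // Reject empty input, trailing non-digit characters, and overflow.
    if (end == text || *end != '\0' || errno != 0) {
        return -1;
    }
    if (value < 1 || value > 65535) {
        return -1;
    }
    return static_cast<int>(value);
}
```

With this helper, `parse_port(argv[1])` would yield 8000 for the invocation shown above and -1 for input such as `80a` or `70000`.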

Running the Client

To compile the client:

g++ -o cli cli.cpp

The client takes two command-line arguments: the IP address of the server and the port number (the same port the server was started with).

./cli 127.0.0.1 8000

Note: In this example the client and the server run on the same machine, so we use 127.0.0.1 (the loopback address) as the IP of the server.
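The two client arguments map naturally onto an IPv4 socket address. The following sketch uses the POSIX calls `inet_pton` and `htons` to build one; `make_server_address` is a hypothetical helper, and the project's cli.cpp may do this differently.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>

#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: fills `out` with the server's IPv4 address and port.
// Returns false if `ip` is not a valid dotted-decimal IPv4 address.
bool make_server_address(const char* ip, std::uint16_t port, sockaddr_in& out) {
    assert(ip != nullptr);
    std::memset(&out, 0, sizeof(out));
    out.sin_family = AF_INET;
    out.sin_port = htons(port);  // convert the port to network byte order
    return inet_pton(AF_INET, ip, &out.sin_addr) == 1;
}
```

The resulting `sockaddr_in` can then be passed to `connect` on a TCP socket. For `127.0.0.1`, the address resolves to the loopback interface, so the connection never leaves the local machine.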

Implementation Details

For more details on the implementation of this project, see the project's repository on GitHub.

Time and Space Complexity

Operation	Time Complexity	Space Complexity
Deduplication	O(n)	O(n)
Server Operation	O(1)	O(n)
Client Operation	O(1)	O(n)