Description
In this project, you need to build a simple distributed database management system (D2_DB) that
will operate in 2 Linux virtual instances in the Google Cloud Platform. Your team should explore and
implement data structure concepts for creating the databases, its management system, and security.
In addition, your team should build an analytics engine to perform basic data analytics. Your database
should handle requests handle from 2 users (one user for each VM instance). The database
management system layer should provide a command-line interface (Graphical User Interface is not
required), and perform various functionalities of database management system listed below.
Functional Requirements:
Module 1: DB Design
1. You need to identify one or two linear data structure(s), which can be used for query, and
data processing.
2. Once the data is processed it can be stored in a customized file format, which will be
considered as your persistent storage
Customized File Format cannot be in any of these formats – JSON/XML/CSV/Serialized or
binary
3. You need to maintain data dictionary or meta data file for each local DBMS, and a global
metadata file; You can use same customized file format to maintain the data dictionary.
Project Requirement
Module 2: Query Implementation
1. Your application should validate and execute the following operations.
(Standard SQL command and queries only)
a. Create database
b. Use database
c. Create table
d. Insert into table
e. Select from table with single where condition (AND || OR || NOT are not required)
f. Update one column with single where condition (AND not required)
g. Delete a row with single where condition
Module 3: Transaction Processing
1. Your system should identify what is a transaction and what is a normal query.
2. In case of transaction, you cannot write the data immediately to persistent storage, you need
to perform the operation in the linear data structure.
3. Transaction follows ACID properties, so you need to consider that. [Hint: It can be achieved
by scanning logs]
{You do need to work on concurrency control. Single distributed transaction is sufficient}
Module 4: Log Management
You need to create 3 Logs – JSON format is allowed for Logs
a. General Logs: query execution time, state of the database (e.g. how many tables are
there with number of records at a given time)
b. Event Logs: changes in database, concurrent transactions, crash reports, etc.
c. Query Logs: capture user queries and timestamp of query submission
{Do not use any in-build package or libraries; You need to perform normal file read write operations
to capture the query, timestamp, login details, data change etc.}
Module 5: Data Modelling – Reverse Engineering
1. Your application should provide option for generating an ERD based on current database
state.
2. User will provide the database name, and the application will create ERD based on the
metadata, and data files.
{You do not have to create any graphical ERD, a text file containing tables, columns, relationships,
cardinality will be sufficient}
Module 6: Export Structure & Value
1. Your application should provide an option for exporting structure and values (in standard
SQL format) for each database, which is selected by the end user.
2. This is like SQL dump (structure+value) created by any standard DBMS Export option.
3. You cannot use any external packages, and as mentioned, your export format must be in SQL
4. Your data dump must capture the current state of the database (E.g. if there is any update
performed on a data, it must be reflected in the data dump. Do not write create or insert
statements from console input to a file as the SQL export. You must dynamically generate it)
Project Requirement
Module 7: Analytics
1. The D2_DB application should provide some basic analytics – The results must be written in
files, and displayed on screen
a. How many queries (valid + invalid) are submitted by Database. (E.g. ran in VM1
instance)
E.g.: >> count queries;
“user SDEY submitted 10 queries for DB1 running on Virtual Machine 1”
“user Alex submitted 3 queries for DB2 running on Virtual Machine 2”
b. How many Update operations are successfully executed by Tables
E.g. >> count update DB1;
“Total 9 Update operations are performed on Employee”
“Total 3 Update operations are performed on Department”
Module 8: User Interface & Login Security
1. Your user interface design should be basic console based
2. It must provide access to the valid user only.
3. User interface can be menu driven, and provide options for registration or login
4. For registration, it should accept the userID, password, and security questions/answers – and
store all information in the User_Profile text file
a. Use some hashing like md5, Sha1 etc. and store the hashed UserID, hashed password
in the file. (you can use Java libraries for hashing)
b. You do not have to encrypt security question/answer.
5. If a registered user wants to access the system he/she/they needs to provide valid UserID,
and password, which will be hashed, and checked with the entry available in the User_Profile
text file. The security answer will also be asked at login. Since you are building a distributed
database management system, User_Profile text file needs to be present in both vistual
machine instances.
6. Once the users gain access, they get 5 options.
E.g.
1. Write Queries
2. Export
3. Data Model
4. Analytics
7. The “Write Queries” option should work for both normal queries and transaction.
Deliverables:
Phase 1 (Feasibility Study and Design): (2%)
• Each group will work on the problem and identify a solution.
• Each group will perform a feasibility study and submit a short-recorded group presentation
with their meeting Logs. (only 1-member can present for 10 min. as group representative)
• Each group will submit their tentative design, and implementation plan (maximum 5 pages.
Include citations if you use any source for background study, coding skills, or concepts.)
Phase 2 (Go-Live and Project Closure): (20%)
• Each group will complete the task incorporating the changes suggested in Phase 1
Project Requirement
• Each group will complete the project implementation and record the details in the form of a
final report
• Each group will submit a report (40 to 65 pages) and a recorded group presentation
explaining major code blocks, and demonstration of the product. (every member needs to
present)
o Final Report should contain architecture details, design details, implementation
details of each module, use case, test case, meeting logs etc.
• There will be a synchronous Q&A session with each group at the end of the course
Visual Representation for your understanding
GCP VM 1 running Linux GCP VM 2 running Linux
D2_DB Package
Java Code + Data Files_VM1 +
Folders_VM1 + Meta Data_Local_VM1
+ Meta Data_Global
D2_DB Package
Java Code + Data Files_VM2 +
Folders_VM2 + Meta Data_Local_VM2
+ Meta Data_Global
ssh to either VM and execute your Java
code, which will perform the
Distributed DB operation
Local Machine
Cloud Machine 1 Cloud Machine 2

