Description
The objective of this assignment is to test your understanding of MongoDB's pipeline aggregate() method, sharding, and replication. The Assignment is worth 5.0% of your final grade and is marked out of 100. To complete the assignment, you will need to use MongoDB, a document-oriented database management system. A brief Instruction containing all the commands you need is given in the Appendix to this Assignment. Read it carefully before starting the Assignment.

In the course of answering the assignment questions, you will need to issue command-line instructions and use the mongo shell of a single-sharded, a multi-sharded, and a multi-sharded and replicated MongoDB installation. Scripts for deploying each of these installations are provided in the Instruction.

There are two parts to this assignment. The first part is about using (pipeline) aggregate() methods in the mongo shell of a single-sharded deployment. The collection documents for the first part contain data about marinas, boats, sailors, and boat reservations made by sailors. These documents are contained in the file ass4_reserves_17.json. Import this file into a collection named reserves. Note that the data in ass4_reserves_17.json differ slightly from your reserves collection in Assignment 3.

Important: In your answers to the assignment questions, show the commands and methods you issued and used, and the answers MongoDB produced on the standard output.

Part I [62 marks]
To answer the questions in this part of the assignment, use the reserves collection you imported from the file ass4_reserves_17.json.

Question 1.
[10 marks] Retrieve all unique sailors. In your answer, sailor documents should have the structure of a simple (non-embedded) document, like:
{_id: <'reserves.sailor.sailorId'>, name: <'reserves.sailor.name'>, skills: <'reserves.sailor.skills'>, address: <'reserves.sailor.address'>}
e.g.
{"_id": 110, "name": "Paul", "skills": ["row"], "address": "Upper Hutt"}

Question 2.
[14 marks] Find the sailor who made the maximum number of reservations. In your answer, sailor documents should have the structure of a simple (non-embedded) document containing the fields sailorId, name, address, and no_of_reserves.

Question 3.
[6 marks] Find the total number of reserves made by sailors. In your answer, the output document should contain just one field, named total_reserves.

Question 4.
[22 marks] Find the average number of reserves made by all sailors. If you develop a statement containing a multistage aggregate() method that produces a correct result, you get 22 marks. If you develop a multi-step procedure using manual interventions and get the correct result, you get 14 marks.
Hint: The result produced by your (single) pipeline aggregation statement may be incorrect, so check it manually. If you realize your statement produced an incorrect result, explain why it did and develop one that produces a correct result.

Question 5.
[10 marks] The sailor Paul from Upper Hutt made a reservation for the boat Mermaid in the Port Nicholson marina. When a supervisor checked the accepted reserves, he realized that, given Paul's skills, Paul is not qualified to drive Mermaid. So people from the Port Nicholson marina asked you, as a respected MongoDB expert, to design a procedure that returns all boats a given sailor is qualified to drive. What is your solution to the problem?
Note: A sailor is qualified to drive a boat only if the boat's driven_by array is not empty (or null) and is a subset of the sailor's skills array.

Bonus Question
[15 bonus marks] Use the aggregate() method and JavaScript to find the sailors who made more than the average number of reserves. You will get 15 bonus marks if you get a correct result, but do not manually insert constants into the aggregate() method stages to answer the question.

Part II [38 marks]
Before you start answering the Part II questions, carefully read the Instruction in the Appendix.
In this part, you are going to make a number of MongoDB deployments with different numbers of shards. One of the deployments will be both sharded and replicated. You will experiment with these deployments, and the knowledge gained from the experiments will help you answer the questions.

Note: Each MongoDB deployment occupies a few hundred GB on disk. To avoid wasting disk space, perform a cleanall command whenever you finish experimenting with a deployment (see the Instruction for the cleanall command).

Question 6. Sharding
[22 marks] Read all subquestions of this question carefully. You will use the same MongoDB deployment to answer several subquestions. Use the sha-mongo script to produce deployments with 2 shards, 5 shards, and 10 shards. Populate each of them with the user collection by passing the test parameter to the sha-mongo script.
a) [2 marks] Which partitioning method has been used for sharding the user collection?
b) [4 marks]
i. Roughly how many chunks does each shard have in the case of 2, 5, and 10 shards? [2 marks]
ii. To which shard does the document with user_id: 55555 belong in the case of 2, 5, and 10 shards? [2 marks]
c) [2 marks] Use the configuration with 10 shards. Connect to the mongo shell of the server storing the shard that contains the user document with user_id: 55555.
i. Retrieve the document with user_id: 55555. [1 mark]
ii. Retrieve the document with user_id: 1. [1 mark]
d) [2 marks] Use the configuration with 10 shards. Now connect to the shell of the mongos process.
i. Retrieve the document with user_id: 55555. [1 mark]
ii. Retrieve the document with user_id: 1. [1 mark]
e) [4 marks] Explain MongoDB's behavior in subquestions c) and d) above.
f) [8 marks] Stop the mongod server storing the shard that contains the document with user_id: 55555, and connect to the shell of the mongos process again.
i. Retrieve the document with user_id: 55555. [2 marks]
ii. Retrieve the document with user_id: 1. [2 marks]
iii. Roughly what percentage of your database became unavailable? Will it become available again if you restart the mongod server? [4 marks]

Question 7. Sharding and Replication
[16 marks] Use the sharep-mongo script to produce a deployment with 2 shards having 3 replicas each. Populate your replica sets with the user collection by passing the test parameter to the sharep-mongo script.
In this question, you will need information about the status of your replica sets (e.g. the ports of the master and slave servers, and which servers are up and which are down). Unfortunately, the sharep-mongo script does not offer this option, so you will need to obtain it manually. To get information about the status of a replica set, connect to the mongo shell of any of the replica servers, switch to the database of interest, and type rs.status().
a) [2 marks] Find the port number of the master server of the replica set rs0. The port number is the last five digits of the value of the server's name field.
b) [2 marks] Connect to the master server of the replica set rs0.
i. Retrieve the document with user_id: 1 from the mydb.user collection. [1 mark]
ii. Insert the document {"user_id": 100000, "name": "Steve", "number": 0} into the mydb.user collection. [1 mark]
c) [2 marks] Connect to a slave server of the replica set rs0.
i. Retrieve the document with user_id: 1 from the mydb.user collection. [1 mark]
ii. Insert the document {"user_id": 100001, "name": "Steve", "number": 1} into the mydb.user collection. [1 mark]
d) [2 marks] Stop the master server of the replica set rs0.
i. View the status of the replica set rs0 and describe it briefly. [1 mark]
ii. Connect to the mongos shell. Retrieve the document with user_id: 1 from the mydb.user collection. Insert the document {"user_id": 100001, "name": "Steve", "number": 1} into the mydb.user collection. [1 mark]
e) [2 marks] Stop the remaining slave server of the replica set rs0.
i. View the status of the replica set rs0 and describe it briefly. [1 mark]
ii. Connect to the mongos shell. Retrieve the document with user_id: 1 from the mydb.user collection. [1 mark]
f) [6 marks] Briefly describe what you have learned by doing subquestions b), c), d), and e) of Question 7.

Submission Instruction:
Submit your answers to the assignment questions via the school's electronic submission system, and hand in a printed version in the hand-in box on the second floor of the Cotton Building. Please do not submit any .odt, .zip, or similar files. Also, do not submit your files in deep directory trees; putting all files in the same directory is fine. Additionally, submit your commands for Questions 1, 2, 3, 4, and 5 as separate .txt files.

Appendix A
Short Instruction for Using MongoDB on ECS Workstations

1. MongoDB Scripts
Before you try to use MongoDB on our school workstations, you have to type
[~] % need mongodb
This command sets up everything needed to deploy MongoDB configurations. You may want to add need mongodb to your .cshrc file to avoid typing it whenever you log on.
Our programmer Royce Brown produced the following four scripts for deploying different MongoDB configurations:
[~] % single-mongo
[~] % rep-mongo
[~] % sha-mongo
[~] % sharep-mongo
You can run these scripts at the command-line prompt of your home directory. Run them without any parameters to see what each one does. In Assignment3_17 you will use single-mongo. In Assignment4_17 you will use single-mongo, sha-mongo, and sharep-mongo.
After starting a MongoDB configuration (e.g. single-mongo start), you can:
Import a collection from a file into a database by typing:
[~] % mongoimport --db <database_name> --collection <collection_name> --file <file_name>
Connect to a mongo shell by typing:
[~] % mongo
>

2. Useful mongo shell commands
To get help:
> help
To see existing databases while in a mongo(s) shell:
> show dbs
To use an existing database, or to define a new one:
> use <database_name>
To see collections in the current database:
> show collections
To exit from a mongo(s) shell:
CTRL/d or exit
Note: The default database is test. If you do not issue a use command, all your commands will be executed against the test database.
Warning: In all deployments, the same ports are assigned to servers. After finishing a session, you have to stop all servers of your deployment to release the ports for other uses. If you fail to do so, you will cause trouble for other people (potentially including yourself) who want to use the same workstation. Later, if you want to use the same deployment again, just run start and your deployment will resume functioning reliably. If you don't plan to use a deployment again, don't forget to run cleanall.
You are strongly advised to use MongoDB from the school lab workstations. The school does not guarantee that MongoDB will work from school servers. You may install and use MongoDB on your laptop, but the school does not take any responsibility for the results you obtain.
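The rs.status() check described in Question 7 returns a document whose members array has one entry per replica server, including its name (of the form host:port), its stateStr (e.g. "PRIMARY" or "SECONDARY") and its health (1 if the server is reachable, 0 if it is down). As a convenience, the sketch below condenses that output into one line per server. The helper names summarizeReplicaSet and primaryPort are hypothetical and are not part of the sharep-mongo script; the code is plain JavaScript, so it can be pasted into the mongo shell of any replica server.

```javascript
// Hypothetical helper (not provided by the ECS scripts): turn the document
// returned by rs.status() into a short summary, one entry per replica server.
// Usage in the mongo shell: summarizeReplicaSet(rs.status())
function summarizeReplicaSet(status) {
  return status.members.map(function (m) {
    // A member's name has the form "<host>:<port>"; take the text after the
    // last colon as the port number.
    var port = Number(m.name.slice(m.name.lastIndexOf(":") + 1));
    return {
      port: port,
      state: m.stateStr,     // e.g. "PRIMARY" (master) or "SECONDARY" (slave)
      up: m.health === 1     // health is 1 when the server is reachable
    };
  });
}

// Hypothetical helper: port of the member currently reporting PRIMARY,
// or null if no member reports itself as PRIMARY.
function primaryPort(status) {
  var primary = summarizeReplicaSet(status).filter(function (s) {
    return s.state === "PRIMARY";
  })[0];
  return primary ? primary.port : null;
}
```

For example, primaryPort(rs.status()) run while connected to a member of rs0 returns the master's port; a null result means that no member currently reports itself as PRIMARY.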

