Description
Question 1: Most Likely Mutation Tree
This question is inspired by this research article: Spada et al. J Clin Microbiol. 2004 Sep; 42(9): 4230–4236. and this episode of the erstwhile popular TV show Numb3rs https://www.hulu.com/watch/315041 (need a hulu subscription).
Viruses have RNA which mutate rapidly. Let us assume that the RNA of a viral species can be written as an l letter string made up of A, C, T and G. While replicating, a virus can mutate through random changes in k out of these l positions with probability proportional to 2−k2.
We collect viral samples starting from a single individual and after sequencing, we observe n mutants A1,…,An, but we do not know which individual mutated to another nor what the order of these mutations were. We wish to reconstruct the mutation tree that connects Ai to Aj if Ai mutated into Aj or vice-versa.
Assume that l is large enough that the same RNA sequence cannot be obtained through two different sequences of mutations.
You are given a weighted undirected graph G whose vertices are the RNA sequences A1,…,An and there is an edge between any two nodes (Ai,Aj) with weight 2−d(i,j)2, where d(i,j) is the the number of differences between Ai and Aj.
A spanning tree T of G then represents a possible history of mutations, the likelihood of which is given by the product of the edge weights of T.
Show how to compute the most likely spanning tree T in this graph.
Answer 1 (Expected Length: 6 lines)
your answer here
Question 2 (A): Distance Between Clusters
Let G be a weighted undirected graph with vertices V and edges E.
Assume that all edge weights are unique and let T be a MST for this graph.
Let us partition the vertices into two clusters V1 and V2. If an edge e:(u,v) satisfies u∈V1 and v∈V2, we will call it partition crossing.
We define the distance between these clusters as the minimum weight among all partition crossing edges of the graph.
Show that the minimum weight partition crossing edge must belong to the MST T.
Hint: Attempt a proof by contradiction.
Answer 2(A) ( Expected Length: 6 lines)
your answer here
Question 2 (B) : Distance Between Point Clusters
Let Q: {(x1,y1),…,(xn,yn)} be the coordinates of n points on a plane. Given a partition of Q into two clusters Q1 and Q2, the distance between these clusters is defined as the smallest among the pairwise distances taking one point in Q1 and one in Q2:
d(Q1,Q2)=min(xi,yi)∈Q1,(xj,yj)∈Q2 (yj−yi)2+(xj−xi)2.1, \mathcal{Q}_2) = \min{ (x_i, y_i) \in \mathcal{Q}_1, (x_j, y_j) \in \mathcal{Q}_2}\ \sqrt{(y_j-y_i)^2 + (x_j – x_i)^2} ,.
Given such a partition of Q, write an efficient algorithm to calculate the distance between them. In particular, we would like you to use the idea from 2(A) for this task. Also, what is the complexity of your method?
Answer 2 (B) (Expected Length: 8 lines)
your answer here
Question 2(C): Finding maximally separated clusters
Now you are given Q as above (n points scattered in the plane), but without a partition. Use the idea from 2(B) to partition the set Q into maximally separated clusters Q1 and Q2. Ie, find Q1 and Q2 such that d(Q1,Q2) is maximized.
Answer 2(C) (Expected Size: 4 lines)
your answer here
Question 3: Circular dependencies
Jane Programmer (of the dreaded dynamic programming assignment) has been reviewing her school’s course catalog. Classes in her department are organized into 8 divisions — PBNJ-1000 through PBNJ-8999, with each division more difficult than the last. But Jane has noticed some issues — some upper division classes have lower division prerequisites, but the opposite is true as well. In fact, her current course, PBNJ-3104, requires PBNJ-2400 and PBNJ-2400 requires PBNJ-1300, but PBNJ-1300 requires PBNJ-3104!
She wants to report these circular dependencies to the dean of the department by finding two classes which depend on each other and have the largest difference in class number. In the example above, PBNJ-3104 and PBNJ-1300 depend on each other and have a difference of 3104 – 1300 = 1804.
Given a list of classes and their prerequisites, describe an algorithm that will help Jane find the pair of classes with maximum difference between their class numbers.
What is the running time of your algorithm in terms of number of classes and prereqs?
HINT: Your algorithm should involve the strongly connected components of a graph.

