CS2103 Project 2 — LRU Cache

$30.00

Download Details:

  • Name: Assignment-2-nkt2fi.zip
  • Type: zip
  • Size: 439.87 KB

Category:

Description

5/5 - (1 vote)

Introduction
In this project you will use object-oriented design techniques to implement and encapsulate an abstract data type (ADT)known as a cache. A cache is a data structure that provides accelerated access to a relatively small subset of data thatnormally take a relatively long time to access. Caches are associative such that the user/client (we will use both namesinterchangeably) requests the value associated with a certain key. For example, in a web cache, a key might be the URL of apage on a web server, and the associated value might be the contents of the web page itself. For an image cache that allowsthe client to access frequently used images more quickly than having to load them from disk, the key might be the name ofthe image (e.g., “smurf”), and the value would be the image itself (e.g., the bytes contents of a PNG file). A cache serves asan intermediary between the user/client and a data provider — the web server or file system, in our two examples above.By using a cache to store a small subset of data in faster storage, the client application can often be accelerated significantly.Caches are ubiquitous in modern computer design; CPU caches, disk caches, web caches, etc., are all commonplace.By definition, caches can only store a subset of all possible (key,value) combinations that a user might request. Why? Thereason is that caches provide accelerated access to data — if they could contain an infinite number of data and provide themfaster than the data provider (the web server in our example above), then there would be no need for the data provider at all– you would just store everything in the cache. The fact that caches can only store a relatively small number of (key,value)pairs compared to the set of all possible such pairs that the user might request, means that the implementer of the cache hasto think carefully about which data should be stored. In addition, the implementer of the cache must ensure that access todata is as fast as possible — determining whether or not a (key,value) pair is already stored in the cache should happen inconstant time. Recall that constant time means that the amount of time does not depend on the number of data in thecache. How caches work
The figure above provides a schematic for how caches work. Suppose you wish to cache a large collection of images inRAM so that your program doesn’t have to load them from disk every time it needs to display them. The “data provider” inthis example would thus be the disk (the definitive location where all the images are stored). Whenever the client needs toaccess a particular image, it calls the get method of the cache and specifies the name of the image that it wants — “smurf”, inthis example.
Cache miss
In the cache’s implementation of the get method, the first thing it does is check whether an object with a key of “smurf” isalready stored in the cache. Let’s suppose the cache is empty, so that “smurf” is not found. In this case, we have a cache miss– a key requested by the client was not found in the cache. Whenever the cache has a miss, it must forward the get requestto (i.e., call the get method of) the data provider. While the data provider has access to all possible images the client mightever request, it is also relatively slow compared to the cache — disk (even SSD) is typically slower than RAM, for example.Once the data provider has determined the value that is associated with the key — i.e., in our example, loaded the image withthe specified name from disk — it returns the result back to the cache. The cache then stores the (key,value) pair in its owninternal memory — which, by assumption, is much faster than whatever storage is used by the data provider — and thenreturns the result back to the client. All in all, the amount of time taken by the cache to execute the get method isrepresented by “Delta T for cache miss” in the figure.
Cache hit
Now, let’s suppose that, at some time in the near future, the client requests the “smurf” image again from the cache. Sincenot much time has passed (and, more to the point, not many other other images have been requested from the cache), it islikely that the “smurf” image is still stored in the cache. In this case, the cache can return the value associated with the key”smurf” directly — without needing to consult the data provider. The total time needed to service the client’s get requestis now much reduced — represented in the figure by “Delta T for cache hit”. This is the whole point of the cache –hopefully, for a large fraction of requested keys, the associated values can be retrieved more quickly than byconsulting the data provider itself.
Eviction
Whenever the cache has a miss, it needs to consult the data provider, store the (key,value) pair in its own storage, and thenreturn the value to the client. Keep in mind that the cache’s internal storage is finite (and usually much smaller compared tothe data provider’s). If the cache’s storage is full, some existing (key,value) pair must be evicted from the cache and replacedwith the new pair. But which (key,value) pair should be replaced?
There is no definitive answer to this question, and different programmers have different strategies for trying to determinewhich will give the highest performance. In general, you want to pick your eviction strategy so as to minimize the expectednumber of cache misses. In this assignment, you will implement a commonly used heuristic known as least-recently used(LRU) — it assumes that the (key,value) pair that was requested least recently will likely not be requested in the near future.In other words, when you need to evict an element stored in the cache’s internal memory, you should evict the item that wasleast recently used.
Here’s an example of how the LRU eviction strategy works. Suppose the cache has enough internal memory to store just 2(key,value) pairs, and suppose that the following sequence of keys is requested by the client: smurf garfield margesnoopy garfield snoopy. How would the cache’s internal memory look when serving each get request, and does the cachehit or miss for each request? The answers are shown in the table below:
Requested key Cache’s memory before request Hit or miss Eviction
smurf -, – miss –
garfield smurf, – miss –
marge smurf, garfield miss smurf
snoopy marge, garfield miss garfield
garfield marge,snoopy miss marge
snoopy garfield,snoopy hit –
Implementing an LRU cache in Java
In this project, you will implement and encapsulate an LRU cache in a Java class called LRUCache. You will also write testcode both to verify that your own cache is written correctly and to catch errors in other people’s possibly incorrectly written(buggy) caches. You will be graded on the correctness of both the cache and the cache tester.
The LRUCache class you create must have a public constructor that takes two parameters. The first is the DataProvider thatthe cache will consult for every cache miss. How exactly is the data provider defined? We want to be flexible, so we willdefine a data provider as any Java class that implements a single method: U get (T key). Notice that, in contrast toArrayList, which takes only one type parameter, our cache takes two type parameters: Here, T is the type of the key, and Uis the type of the value associated with the key. For example, T might be String (the name of the image), and U might beImage. See the DataProvider interface in the project Zip file.
In addition to specifying the data provider, you also have to specify the capacity of the cache, which is equal to themaximum number of elements (i.e., (key,value) pairs) that the cache can store at one time. It is crucial that your cacheallocate exactly enough memory to accomodate capacity memory slots — no more and no less — so that our automatic testcode will evaluate your cache correctly.
The only methods that your LRU cache needs to implement are U get (T key) and int getNumMisses (). The getmethod has the exact same signature as in the DataProvider interface — indeed the Cache interface is actually a subinterfaceof DataProvider. The getNumMisses reports the number of cache “misses” that have occurred since the Cache wasinstantiated. Usage example
Here is an example of how the cache might be used: First, there needs to be a data provider. In our image cache example,this might look something like this:
public class ImageLoader implements DataProvider<String, Image> {
private final String _imageRootDirectory;
public ImageLoader (String imageRootDirectory) {
_imageRootDirectory = imageRootDirectory;
}
public Image get (String key) {
return loadImageFromDiskAtDirectory(key, imageRootDirectory);}
}
Next, we instantiate an LRUCache object and pass it an ImageLoader as the data provider along with an initial capacity (Ichose 128 arbitrarily):
final Cache<String, Image> cache = new LRUCache<String, Image>(new ImageLoader(“pathToMyImages”), 128);Now that the cache has been instantiated, we can use it:
final Image smurf = cache.get(“smurf”);
final Image garfield = cache.get(“garfield”);
// …
Note that this was just an example. While you are free to test your cache with whatever objects you like, I would suggestsomething simpler than actually loading images from disk.
LRU eviction policy
The hardest part of this assignment is implementing the LRU eviction policy so that it executes in constant time in theaverage case. In particular, your LRUCache should not just do a linear search through the cache’s internal memory lookingfor (and updating) the least-recently used item — you must devise a faster procedure. We suggest that you creatively use botha linked list and a hashtable to do so. Accordingly, you are welcome to use Java’s built-in HashMap class, which providesan off-the-shelf, high-performance hashtable. See the Javadoc on HashMap for more information.
Important: In this assignment you may not use the Java SDK LinkedHashMap class (as that would ruin all the fun).Testing an LRU cache in Java
In this project, you will be graded not only on your implementation of the LRUCache, but also on how thoroughly you test it.Your test code should hopefully help you debug your own cache implementation; in addition, we will also test your testerby using it to validate other students’ cache implementations. In particular, we will use your submitted CacheTest to test avariety of LRUCache implementations — some of which are buggy, some of which are fully correct — and give you creditaccording to how accurately your tester reports errors.
During your testing, you should create your own DataProvider — whose inner workings you can observe and controlcompletely since you are writing it — that you should instantiate and pass to the LRUCache constructor.
As noted in class, it is not mathematically possible to test for all possible errors. We will test your tester against a set oferrors that commonly occur when students try to implement an (but fail). Also: your tester is not required to verify that theasymptotic time-costs of the LRUCache methods are constant. As long as the public methods of the LRUCache you are testingoperate correctly, then your tester should consider them “correct”.
Requirements
R1 (35 points): Implement the Cache interface in an LRUCache class that implements a least-recently-used (LRU)eviction policy:
The LRUCache class should contain a public constructor with the following signature: public LRUCache(DataProvider provider, int capacity). Your LRUCache must contain exactly the number of (key,value) pairs specified in the constructor’s capacityparameter. Our automatic testing code will verify that the (key,value) pairs that are evicted — and thus must bereloaded from the DataProvider — are exactly correct according to the LRU policy described above.
1.
The get method of your LRUCache must execute in constant time in the average case. This is actually the cruxof the assignment — devising an algorithm that can find the LRU item, update it, and return the value associatedwith the key, all within a constant amount of time.
All of the gritty details of your cache implementation should be hidden — i.e., encapsulated — from theclient/user. In addition, you should not make any unnecessary variables, methods, or utility classes visible(public) to the client.
Your cache must not only return the correct value associated with a key, but it must also do so in constant time.Submissions that do not achieve constant time in the average case for the get method will receive at most50% of the total points for this part of the assignment.
R2 (15 points): Implement a class called CacheTest that instantiates one or more LRUCache objects and conductsjunit unit tests them for correctness. Your CacheTest must satisfy the following criteria:
All tests must pass on a correctly implemented LRUCache implementation.
At least 1 unit test must fail on an incorrectly implemented LRUCache implementation.
2.
Teamwork
You may work as a team on this project; the maximum team size is 2.
Style
Your code must adhere to reasonable Java style. In particular, please adhere to the following guidelines:
Each class name should be a singular noun that can be easily pluralized.
Class names should be in CamelCase; variables should be in mixedCase. Avoid “magic numbers” in your code (e.g., for (int i = 0; i < 999 /*magic number*/; i++)). Instead, useconstants, e.g., private static final int NUM_ELEPHANTS_IN_THE_ROOM = 999;, defined at the top of your classfile.
Use whitespace consistently.
No method should exceed 50 lines of code (for a “reasonable” maximum line length, e.g., 100 characters). If yourmethod is larger than that, it’s probably a sign it should be decomposed into a few helper methods.
Use comments to explain non-trivial aspects of code.
Use a comment to explain what each method does, what parameters it takes, and what it returns.
Use the final keyword whenever possible.
Use the most restrictive access modifiers (e.g., private, default, protected>, public), for both variables andmethods, that you can.
Declare variables using the weakest type (e.g., an interface rather than a specific class implementation) you can; itheninstantiate new objects according to the actual class you need. This will help to ensure maximum flexibility of yourcode. For example, instead of
final ArrayList<String> list = new ArrayList();
use
final List<String> list = new ArrayList<String>();. Getting started
Download the Project2 starter file. Unit testing
When testing your cache, it will be necessary to implement a DataProvider. We recommend that you endow your providerwith some additional methods that will enable you to verify that the cache is working correctly. For instance, if you knowthat a (key,value) pair should already be stored in the cache, and the client calls get on that same key (resulting in a cachehit), then the cache should not call the get method of your data provider — the cache should just return the value associatedwith the key immediately. By implementing a “test” data provider that offers some auxiliary methods (e.g., getNumFetches),
you can verify the correctness (or incorrectness) of the LRUCache you are testing.
What to Submit
Create a Zip file containing LRUCache.java and CacheTest.java. Submit the Zip file you created to Canvas. Submissiondeadline: Wednesday, November 7, at 11:59pm EDT.