Homework 8

Due Monday 18-Mar, at 8pm



  1. Mentor Meeting [0 pts]
    You are required to meet with your mentor every week. Failing to meet with your mentor will result in -5 on the assignment. Remember, a mentor meeting is a quick, in-person check-in before the submission deadline of this assignment. The goal of the mentor meeting is to see how you are doing, get feedback, and provide advice and tips for the class.

  2. COLLABORATIVE: Big-O Calculation (Manually graded) [20 pts]
    In a triple-quoted string at the top of your file (just below your name), include solutions to this exercise. For each of the following functions:

    1. State in just a few words what it does in general.
    2. Write the Big-O time or number of loops for each line of the program, then state the resulting Big-O of the whole program.
    3. Provide an equivalent Python function that is more than a constant-factor faster (so its worst-case Big-O runtime is in a different function family). The better your solution's Big-O runtime, the more points you get!
    4. Write the Big-O time or number of loops for each line of the new program, then state the resulting Big-O of the whole program.

    def slow1(lst): # N is the length of the list lst
        assert(len(lst) >= 2)
        a = lst.pop()
        b = lst.pop(0)
        lst.insert(0, a)
        lst.append(b)

    def slow2(lst): # N is the length of the list lst
        counter = 0
        for i in range(len(lst)):
            if lst[i] not in lst[:i]:
                counter += 1
        return counter

    import string
    def slow3(s): # N is the length of the string s
        maxLetter = ""
        maxCount = 0
        for c in s:
            for letter in string.ascii_lowercase:
                if c == letter:
                    if s.count(c) > maxCount or \
                       s.count(c) == maxCount and c < maxLetter:
                        maxCount = s.count(c)
                        maxLetter = c
        return maxLetter

    def slow4(a, b): # a and b are lists with the same length N
        n = len(a)
        assert(n == len(b))
        result = abs(a[0] - b[0])
        for c in a:
            for d in b:
                delta = abs(c - d)
                if (delta > result):
                    result = delta
        return result
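
    To illustrate the kind of per-line annotation that step 2 asks for, here is a sketch on a hypothetical function, hasDuplicates, which is not one of the four functions above. The costs assume Python's built-in list (where slicing and the in operator are O(N)), and the faster version assumes the elements are hashable.

```python
def hasDuplicates(lst):            # N is the length of the list lst
    for i in range(len(lst)):      # N loops
        if lst[i] in lst[i+1:]:    # O(N) slice plus O(N) search, per loop
            return True            # O(1)
    return False                   # O(1)
# Total: N loops * O(N) per loop = O(N**2)

def hasDuplicatesFaster(lst):      # N is the length of the list lst
    # Assumes the elements are hashable so they can be placed in a set.
    uniqueValues = set(lst)        # O(N)
    return len(uniqueValues) < len(lst)   # O(1)
# Total: O(N)
```

    Your own answers can follow the same shape: a cost next to each line, then one line stating the total.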

  3. COLLABORATIVE: movieAwards(oscarResults) [10 pts]
    Write the function movieAwards(oscarResults) that takes a set of tuples, where each tuple holds the name of a category and the name of the winning movie, then returns a dictionary mapping each movie to the number of the awards that it won. For example, if we provide the set:
    {
      ("Best Picture", "The Shape of Water"),
      ("Best Actor", "Darkest Hour"),
      ("Best Actress", "Three Billboards Outside Ebbing, Missouri"),
      ("Best Director", "The Shape of Water"),
      ("Best Supporting Actor", "Three Billboards Outside Ebbing, Missouri"),
      ("Best Supporting Actress", "I, Tonya"),
      ("Best Original Score", "The Shape of Water")
    }
    the program should return:
    {
      "Darkest Hour" : 1,
      "Three Billboards Outside Ebbing, Missouri" : 2,
      "The Shape of Water" : 3,
      "I, Tonya" : 1
    }

    Note: Remember that sets are unordered! For the example above, the returned dictionary may list its entries in a different order than what we have shown, and that is ok.
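
    One possible approach, shown only as a minimal sketch, is to loop over the tuples once and keep a running count per movie in a dictionary. The variable names here are our own choices, not requirements.

```python
def movieAwards(oscarResults):
    awardCounts = dict()
    # Each tuple is (category, movie); only the movie matters for counting.
    for (category, movie) in oscarResults:
        awardCounts[movie] = awardCounts.get(movie, 0) + 1
    return awardCounts
```

    Because dictionaries compare equal regardless of entry order, a test such as movieAwards({("Best Picture", "X")}) == {"X": 1} does not depend on the order in which the set is traversed.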

  4. COLLABORATIVE: largestSumOfPairs(a) [10 pts]
    Write the function largestSumOfPairs(a) that takes a list of integers, and returns the largest sum of any two elements in that list, or None if the list is of size 1 or smaller. So, largestSumOfPairs([8,4,2,8]) returns the largest of (8+4), (8+2), (8+8), (4+2), (4+8), and (2+8), or 16.

    The naive solution is to try every possible pair of numbers in the list. This runs in O(n**2) time and is much too slow. Your solution should be more efficient.
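
    As a sketch of one way to avoid checking every pair: the largest sum must come from the two largest elements, so a single pass that tracks those two values is enough. This is one possible O(n) approach, not necessarily the intended one.

```python
def largestSumOfPairs(a):
    if len(a) < 2:
        return None
    # Track the two largest values seen so far.
    largest, second = max(a[0], a[1]), min(a[0], a[1])
    for value in a[2:]:
        if value > largest:
            largest, second = value, largest
        elif value > second:
            second = value
    return largest + second
```

    Sorting the list and adding the last two elements would also work, at O(n log n) instead of O(n).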

  5. COLLABORATIVE: instrumentedBubbleSort(a) [10 pts]
    Write the function instrumentedBubbleSort(a). It should work just like the version given in the course notes, except instead of returning None, it should return a tuple of two values: the number of comparisons and the number of swaps.
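
    The course notes are not reproduced here, so the following is only a sketch. It assumes the notes' bubble sort makes repeated passes of adjacent comparisons over a shrinking range and stops after a pass with no swaps. If the version in the notes differs, the counts will differ too, and the notes' version is the one to follow.

```python
def instrumentedBubbleSort(a):
    # ASSUMPTION: this loop structure mirrors the course-notes bubble sort.
    comparisons = 0
    swaps = 0
    end = len(a)
    swapped = True
    while swapped:
        swapped = False
        for i in range(1, end):
            comparisons += 1              # one comparison per adjacent pair
            if a[i-1] > a[i]:
                a[i-1], a[i] = a[i], a[i-1]
                swaps += 1
                swapped = True
        end -= 1                          # the largest value is now in place
    return (comparisons, swaps)
```

    Under that assumption, instrumentedBubbleSort([3, 1, 2]) sorts the list in place and returns (3, 2).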

All problems below here are SOLO.



  1. containsPythagoreanTriple(a) [10 pts]
    Write the function containsPythagoreanTriple(a) that takes a list of positive integers and returns True if there are 3 values (a,b,c) anywhere in the list such that (a,b,c) form a Pythagorean Triple (where a**2 + b**2 == c**2). So [1,3,6,2,5,1,4] returns True because of (3,4,5).

    A naive solution would be to check every possible triple (a,b,c) in the list. That runs in O(n**3). You'll have to do better than that.
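
    One way to beat O(n**3), sketched here under the assumption that set membership tests are O(1) on average: store every square in a set, then check each pair (a,b) to see whether a**2 + b**2 is one of those squares. This runs in O(n**2) and is only one possible approach.

```python
def containsPythagoreanTriple(a):
    # All squares of values in the list, for O(1) average-case lookups.
    squares = set(value**2 for value in a)
    for i in range(len(a)):
        for j in range(i + 1, len(a)):
            # Since all values are positive, a matching c is larger than
            # both a[i] and a[j], so it is a different element of the list.
            if a[i]**2 + a[j]**2 in squares:
                return True
    return False
```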

  2. instrumentedSelectionSort(a) [10 pts]
    Write the function instrumentedSelectionSort(a). It should work just like the version given in the course notes, except instead of returning None, it should return a tuple of two values: the number of comparisons and the number of swaps.
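
    As with bubble sort, the course notes are not reproduced here, so this is only a sketch. It assumes the notes' selection sort finds the minimum of the unsorted portion on each pass and then always swaps it into place, even when it is already there. If the notes' version skips that swap, the swap count will be lower.

```python
def instrumentedSelectionSort(a):
    # ASSUMPTION: this loop structure mirrors the course-notes selection sort.
    comparisons = 0
    swaps = 0
    n = len(a)
    for startIndex in range(n):
        minIndex = startIndex
        for i in range(startIndex + 1, n):
            comparisons += 1
            if a[i] < a[minIndex]:
                minIndex = i
        # Swap on every pass, even if startIndex == minIndex (assumed).
        a[startIndex], a[minIndex] = a[minIndex], a[startIndex]
        swaps += 1
    return (comparisons, swaps)
```

    Under that assumption, instrumentedSelectionSort([3, 1, 2]) sorts the list in place and returns (3, 3).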

  3. Facebook friends [15 pts]
    When performing tasks on large amounts of data, we frequently need to format the dataset in a way that allows us to perform our task efficiently. For example, if we want to study social networks to find mutual friendships between specific users, we may want to format our data in such a way where we don't need to iterate over the entire dataset for every single search. This is especially true if we need to perform many searches. Perhaps we have recently discussed a data type that is well suited to this task...?

    For this problem, we will provide text files containing hundreds of anonymized Facebook friendships. Please download them here if you haven't already, move the unzipped text files to the same directory as hw8.py, and then open the files in a text editor to see how the data is represented. Individual users are identified by a unique integer. Each line consists of a pair of integers representing a "friendship" between two users. For example, the line "123 234" indicates that User #123 lists User #234 as a friend. See below for a very short example of how these text files are formatted:

    403 401
    403 402
    401 402
    402 400
    401 399
    402 403
    403 399
    401 403
    402 403
    402 401
    400 402
    399 401
    403 402
    399 403
    Given a specific user, we want to be able to efficiently search for all of the IDs listed as friends by that user. If we have to perform lots and lots of searches, an inefficient approach would be to iterate over every line of the text file for each query, and when a line begins with the user's ID, the friend's ID is appended to a list, and the list is returned after the entire file has been searched. This is potentially very slow for large datasets and many queries! We don't want to have to iterate over the entire dataset every time we want to know who one person is friends with.
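
    To make the inefficient approach concrete, here is a sketch of it. The name slowFriendsOfUser and its parameters are hypothetical and are not part of the assignment. Every call rereads the whole file, so q queries over a file of n lines cost O(q*n).

```python
def slowFriendsOfUser(userId, filename):
    # Hypothetical helper: scans the entire file on every single query.
    friends = []
    with open(filename, "rt") as f:
        for line in f:
            parts = line.split()
            # Keep the second ID whenever the line begins with userId.
            if len(parts) == 2 and int(parts[0]) == userId:
                friends.append(int(parts[1]))
    return friends
```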

    Let's do better than that. We'll format the dataset once in a way where we can search for a specific user's friends easily, and then we'll create some functions that will quickly perform those searches on our formatted dataset. We should be able to call these functions many times without having to worry about how big our dataset is.

    1. First, write the function formatDataset(filename), which reads the specified file and returns friendData, which is the dataset represented in a format of your choosing. For example, you might choose to return friendData as a string, though this is likely a poor choice. You could also return friendData as some sort of list or set or dictionary. It's up to you! This function does not have to be exceptionally efficient. Just make sure formatDataset(filename) runs in O(n**3) time or better, and does not time out the autograder (i.e., it should run in under 1 minute). You should be able to do better than O(n**3) pretty easily, as our solution function runs in O(n). This function will only be called once, and once we have friendData, we'll use that as an input for our queries.

      Hint: In this problem, n scales with the number of unique users in the input file, which (in this problem) means that it also scales with the number of lines. For large enough datasets, the number of friends each user has is fairly constant. (For example, maybe user 123 has 10 friends in a dataset of 100 ids. User 123 would also have about 10 friends in a dataset of 1000 ids.)


    2. Now, write the function friendsOfUser(id, friendData) where id is the integer representing a specific user, and friendData contains the stored output of your previous function. (Do not loop over the original input file for each query!) Your function should return the set of user IDs identified by the input user as Facebook friends.

      For example, if friendData is the formatted data from our previous example, friendsOfUser(401, friendData) should return:
      {402, 399, 403}


    3. Finally, write the function mutualFriends(id1, id2, friendData) where id1 and id2 are integers representing two specific users, and friendData contains the stored output of formatDataset(filename). (Again, do not loop over the original input file for each query!) Your function should return the set of user IDs identified by BOTH users as Facebook friends.

      For example, if friendData is the formatted data from our previous example, mutualFriends(401, 403, friendData) should return:
      {402, 399}


    Here's the catch: friendsOfUser(id, friendData) and mutualFriends(id1, id2, friendData) must run efficiently! The autograder is going to test these functions many times with different user IDs. For full credit, friendsOfUser(id, friendData) and mutualFriends(id1, id2, friendData) must run in O(n) time or better (where n is the number of unique user IDs in the input file)! Only partial credit is given for a function that runs in O(n**2), and no credit for any function that times out the autograder. This means you can't simply check the entire text file for every combination of users. Remember that formatDataset(filename) does not have to be exceedingly efficient, so think carefully about how you create friendData to make searches most efficient!
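
    As one possible design, sketched under the assumption that friendData is a dictionary mapping each user ID to the set of IDs that user lists as friends: the file is read once, and each query is then a dictionary lookup plus, for mutual friends, a set intersection. Other representations can also meet the requirements.

```python
def formatDataset(filename):
    # ASSUMED format for friendData: {userId: set of friend IDs}.
    friendData = dict()
    with open(filename, "rt") as f:
        for line in f:
            parts = line.split()
            if len(parts) != 2:
                continue              # skip blank or malformed lines
            user, friend = int(parts[0]), int(parts[1])
            if user not in friendData:
                friendData[user] = set()
            friendData[user].add(friend)
    return friendData

def friendsOfUser(id, friendData):
    # Return a copy so callers cannot modify friendData by accident.
    return set(friendData.get(id, set()))

def mutualFriends(id1, id2, friendData):
    # IDs listed as friends by BOTH users.
    return friendsOfUser(id1, friendData) & friendsOfUser(id2, friendData)
```

    With this layout, the cost of a query depends on how many friends the queried users have rather than on the size of the file.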

    Hint 2: When testing your functions, first make sure everything works on a very small dataset, like the example given above. Then try it on successively larger datasets from the zip file. Does the time increase proportionally to the size of the file / number of unique user IDs? Or is it too slow to run on the largest dataset? Use these files to empirically approximate the efficiency of your functions. Do they match your expectation?

    Note: With the exception of the short example above, the datasets used in this assignment are real! Their source is Snap Datasets: Stanford Large Network Dataset Collection (Jure Leskovec and Andrej Krevl, Jun. 2014).

  4. friendsOfFriends(d) [15 pts]
    Background: we can create a dictionary mapping people to sets of their friends. For example, we might say:
    d = { }
    d["jon"] = set(["arya", "tyrion"])
    d["tyrion"] = set(["jon", "jaime", "pod"])
    d["arya"] = set(["jon"])
    d["jaime"] = set(["tyrion", "brienne"])
    d["brienne"] = set(["jaime", "pod"])
    d["pod"] = set(["tyrion", "brienne", "jaime"])
    d["ramsay"] = set()
    With this in mind, write the function friendsOfFriends(d) that takes such a dictionary mapping people to sets of friends and returns a new dictionary mapping all the same people to sets of their friends of friends. For example, since Tyrion is a friend of Pod, and Jon is a friend of Tyrion, Jon is a friend-of-friend of Pod. This set should exclude any direct friends, so Jaime does not count as a friend-of-friend of Pod (since he is simply a friend of Pod) despite also being a friend of Tyrion's.

    Thus, in this example, friendsOfFriends should return:
    {
      'tyrion': {'arya', 'brienne'},
      'pod': {'jon'},
      'brienne': {'tyrion'},
      'arya': {'tyrion'},
      'jon': {'pod', 'jaime'},
      'jaime': {'pod', 'jon'},
      'ramsay': set()
    }
    Note 1: your function should not modify the initial provided dictionary!

    Note 2: you may assume that everyone listed in any of the friend sets also is included as a key in the dictionary.

    Note 3: you may assume that a person will never be included in their own friend set. You should also not include people in their own friends-of-friends sets!

    Note 4: you may not assume that if Person1 lists Person2 as a friend, Person2 will list Person1 as a friend! Sometimes friendships are only one-way. =(

    Hint: How is this similar to or different from the Facebook friends problem?
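
    A minimal sketch of one approach: for each person, take the union of their friends' friend sets, then remove the person and their direct friends. It relies on Note 2 (every friend is also a key) and builds new sets throughout, so the input dictionary is left unchanged.

```python
def friendsOfFriends(d):
    result = dict()
    for person in d:
        fof = set()                   # a fresh set, so d is never modified
        for friend in d[person]:
            fof |= d[friend]          # add everyone this friend lists
        # Exclude direct friends and the person themselves.
        result[person] = fof - d[person] - {person}
    return result
```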