CMUQ 15-121 Sets & Maps
1. Introduction
In this set of notes we’ll talk about two new abstract data types as well as the Java implementations of them: Sets and Maps (also called dictionaries). You’ve seen both of these before, as they exist in Python.
2. Set
A set is a collection of elements typically used in situations where you want to quickly determine if something is present in the set.
2.1. Main Properties
A set has two, main properties:
- It is non-sequenced. This means that unlike an array, the elements in the set do not have indexes. You can’t access the n-th item in a set, because the set items are not sequenced.
- No duplicates are allowed. An item can only appear once in the set.
2.2. Sets in Java
https://docs.oracle.com/javase/8/docs/api/java/util/Set.html
Set
in Java is an interface. That interface is also a Collection
, so it has the following:
isEmpty()
size()
add(value)
contains(value)
remove(value)
- It implements
Iterable
There are two main classes available that implement Set
: TreeSet
and HashSet
. The names imply how they are built.
2.2.1. HashSet
https://docs.oracle.com/javase/8/docs/api/java/util/HashSet.html
A HashSet
implements a Set
using a hash table. Items are hashed into buckets, and if needed the table is periodically enlarged to maintain performance. On average, operations on a HashSet
are all \(O(1)\). For the most part, if you want a set, this is the implementation you should use.
2.2.2. TreeSet
https://docs.oracle.com/javase/8/docs/api/java/util/TreeSet.html
A TreeSet
implements a Set
using a tree data structure. This means that add
, contains
, and remove
are all \(O(\textrm{log } N)\). Why would you ever use this instead of a HashSet
? With a TreeSet
, the items are stored in order. That means that you can iterate through them (using the Iterator
) and they will be in order. (If you don’t need order, then just use a HashSet
.)
3. Maps
A map, also called a dictionary, is used to map keys to values.
3.1. Main Properties
A map has a number of important properties:
- It is a collection of key-value pairs. (Each key is associated with one value.)
- Keys must be unique. (You can’t insert two things with the same key.)
At first glance, these properties are limiting: What if I want to associate more than one thing with a key? For example, imagine I’m building a music manager and want to use a map to link the playlist name with the songs in the list. The key is the playlist name, but it can only map to one thing. So how do I associate it with more than one song? Easy, I put those songs in a list (or a set), and associate the list with the key.
3.2. Maps in Java
https://docs.oracle.com/javase/8/docs/api/java/util/Map.html
Map
is a Java interface. It has the following methods:
isEmpty()
clear()
size()
– Return the number of key-value pairsput(key, value)
– Put a new key, value pair into the map. If they key already exists in the map, replace the pairing and return the old value.get(key)
– Return the value mapped to the key.containsKey(key)
– Returns true there is mapping for this key and false otherwise.remove(key)
– Removes the key (and value) from the map.keySet()
– Returns a set with all the keys from the map.
Much like sets, there are two main classes that implement Map
: HashMap
and TreeMap
.
3.2.1. HashMap
https://docs.oracle.com/javase/8/docs/api/java/util/HashMap.html
A HashMap
implements a Map
using a hash table. The key is hashed and the key value pair is stored in the appropriate bucket and if needed the table is periodically enlarged to maintain performance.
The main operations average \(O(1)\) performance due to the hashtable. If you need a map, you probably want to use a HashMap
.
3.2.2. TreeMap
https://docs.oracle.com/javase/8/docs/api/java/util/TreeMap.html
You are actually already intimately familiar with a TreeMap
: You’ve built one. They KeyValueBinarySearchTree
you constructed for homework 6 is very similar to a TreeMap
. There are a few differences, however:
- A
TreeMap
doesn’t use a binary search tree, instead it uses something called a red-black tree. (As a side note,TreeSet
also uses a red-black tree.) - Due to the red-black tree, the tree always stays balanced, providing guaranteed complexities.
The main operations are \(O(\textrm{log } N)\) due to the tree. Why would you use a TreeMap
instead of HashMap
? Much like sets, it is about order. You can iterate through the keys of a TreeMap
in order. The keys of a HashMap
are not usefully ordered.
4. Examples
We’ll do some examples using these in class. Check out the demo code repository to find them. (After class, of course.)