class: center, middle # Data Structures for Interviews 2014-10-22 Justin Zhao, ADI Adapted from Zack Newman adicu.com --- # Expectations for this talk I assume familiarity with Java (if you know some other language, you can probably figure it out). I didn't check all of my code. The is **not** a replacement for 3134. It should be a good preview that gets you ready for some basic interviews. PLEASE stop me if I say something that doesn't make sense. > He who asks is a fool for five minutes, but he who does not ask remains a fool forever. > .right[*Chinese Proverb*] --- # Gameplan * Review of Big O * Data Structures * Sorting Algorithms * Sample Interview Questions --- # What Will Be Covered * Functionality, operations, applicability of data structures * Overview of important data structure algorithms and performance * Sample interview questions and how to tackle them with data structures --- # What Will Not Be Covered * The answers to every interview question * Best interview strategies (interview practice) * Implementation details (3134) > It's probably good for you to understand details of implementation, but not necessary for most interviews --- # Review of Big O * "Big O" is the "asymptotic runtime" * Expressed as function of the size of the inputs * Consider best, worst, and average cases  --- # Review of Big O /** * Return the index of the minimum element in an integer array. */ public static int findMin(int[] array) { int minIndex = 0; for (int i = 0; i < array.length; i++) { if (array[i] < array[minIndex]) { minIndex = i; } } } --- # Review of Big O /** * Return the index of the minimum element in an integer array. */ public static int findMin(int[] array) { int minIndex = 0; // O(1) for (int i = 0; i < array.length; i++) { // n times if (array[i] < array[minIndex]) { // O(1) minIndex = i; // O(1) } } } // O(1) + n (O(1) + O(1)) = O(1) + n O(1) = O(1) + O(n) = O(n) --- # Review of Big O /** * Return the index of array (which must be sorted) at which value can be * found, or -1 if it is absent. */ public static int find(int[] array, int value) { return findHelper(array, 0, array.length - 1, value); } // O(lg n) private static int findHelper(int[] array, int minIndex, int maxIndex, int value) { if (maxIndex < minIndex) { return -1; } int index = (maxIndex + minIndex) / 2; if (value < array[index]) { return findHelper(array, minIndex, index, value); } else if (value > array[index]) { return findHelper(array, index + 1, maxIndex, value); } else { return index; } } --- # Review of Big O Which performance is better? `O(10000 n)` vs. `O(n)` `O(log n)` vs. `O(log(log n))` `O(n log n)` vs. `O(n log k)` `O(0.000001 n^2)` vs. `O(9999999 n^1.999999)` `O(2^n)` vs. `O(3^n)` `O(n!)` vs. `O(n^n)` --- # Abstract Data Type (ADT) * What is a "data structure" anyway? We define ADTs first * An interface for interacting with data * lists, stacks, sets, queues, maps, trees, priority queue etc. * Defines operations and results, but not how they're implemented --- # Data Structures * Data Structure is an implementation of ADT * For example, an "ArrayList" or "HashMap" * Many ADTs can be implemented as the same Data Structure. --- # Data Structure Primitives ## Arrays int[] array = {1, 3, 5, 2, 6, 9}; /* * Index: 0 1 2 3 4 5 * Value: 1 3 5 2 6 9 */ --- # Data Structure Primitives ## Linked lists  --- # Data Structure Primitives ## Linked lists // Attributes public class Node { public int value; public Node next; public Node(int value, Node next) { this.value = value; this.next = next; } } // Methods public interface MyList { public int get(int index); public void update(int index, int value); public void append(int value); public String toString(); } --- # List Performance
Array
List
Access
O(1)
O(n)
Update
O(1)
O(n)
Append
O(1)/O(n)
O(1)
Traversal
O(n)
O(n)
Always think about the performance of the data structure How about an "arraylist"? --- # Lists: Standard Library import java.util.LinkedList; import java.util.ArrayList; import java.util.List; // ... List
a = new ArrayList
(); List
b = new LinkedList
(); for (int i = 0; i < 10; i++) { a.add(i); b.add(i); } a.set(5, 0); b.remove(5); System.out.println(a); // [0, 1, 2, 3, 4, 0, 6, 7, 8, 9] System.out.println(b); // [0, 1, 2, 3, 4, 6, 7, 8, 9] --- # Linked list: Question How do you reverse a linked list? --- # Linked list: Question public static Node reverse(Node head) { Node prev = null; Node curr = head; Node next; while (curr != null ) { next = curr.next; curr.next = prev; prev = curr; curr = next; } return prev; } --- # Stacks  --- # Stacks // Last-in first-out data structure public interface MyStack { // Add a value to the stack public void push(int value); // Remove a value from the stack public int pop(); // See the value on the "top" of the stack (next to be removed) public int peek(); } MyStack a = ...; // [ ] a.push(1); // [ 1 ] a.push(2); // [ 2 1 ] a.peek(); // returns 2 a.push(3); // [ 3 2 1 ] a.pop(); // [ 2 1 ], returns 3 a.push(4); // [ 4 2 1 ] a.peek(); // returns 4 --- # Stacks
Array
List
Push
O(1)/O(n)
O(1)
Pop
O(1)
O(1)
Peek
O(1)
O(1)
--- # Stacks: Standard Library Stack
a = new Stack
(); a.push(1); a.push(2); System.out.println(a.pop()); // 2 a.push(3); System.out.println(a); // [1, 3] --- # Stacks: Question Write a function to determine if a string consisting of the characters '{', '}', '[', and ']' is balanced. For example, "{[]}" is balanced, and "{[}]" is not. --- # Stacks: Question public static boolean isBalanced(String braces) { Stack
stack = new Stack
(); for (int i = 0; i < braces.length(); i++) { switch (braces.charAt(i)) { case '{': stack.push('{'); break; case '[': stack.push('['); break; case '}': if (stack.pop() != '{') { return false; } break; case ']': if (stack.pop() != '[') { return false; } break; } } return stack.isEmpty(); } --- # Queues  --- # Queues // First-in first-out data structure public interface MyQueue { // Add a value to the back of the queue public void enqueue(int value); // Remove a value from front of the queue public int dequeue(); // See the value on the "front" of the queue (next to be removed) public int peek(); } MyStack a = ...; // [ ] a.enqueue(1); // [ 1 ] a.enqueue(2); // [ 1 2 ] a.peek(); // returns 1 a.enqueue(3); // [ 1 2 3 ] a.dequeue(); // [ 2 3 ], returns 1 a.enqueue(4); // [ 2 3 4 ] a.peek(); // returns 2 --- # Queues
Array
List
Enqueue
O(1)/O(n)
O(1)
Dequeue
O(1)
O(1)
Peek
O(1)
O(1)
--- # Queues: Standard Library import java.util.Queue; import java.util.ArrayDeque; Queue
stack = new ArrayDeque
(); --- # Queues: Question Given one queue and one stack, how do you reverse the stack? --- # Queues: Question Stack
stack = new Stack
(); Queue
queue = new ArrayDeque
(); stack.push(1); stack.push(2); stack.push(3); System.out.println(stack); // [1, 2, 3] while (!stack.isEmpty()) { queue.add(stack.pop()); } while (!queue.isEmpty()) { stack.push(queue.remove()); } System.out.println(stack); // [3, 2, 1] --- # Hash Maps An absolutely essential concept. Supported operations: * insert() * delete() * find() Why is it so impressive? All of these operations can be done in constant O(1) time. --- # Hash Maps Think of a hash map like a really really big array. array[0] = 1 // insert(): O(1) array[1] = null // delete(): O(1) x = array[2] // find(): O(1) An array that, instead of integer indexes, has indexes that can be anything. map["0"] -> 1 map["Adam Cannon"] -> 1004 map["BullsAndCows"] -> 9001 map["thisisareallylongstringthatistoocoolomgomgomgomgomgomgadiadiadilove"] -> 0 Insert, find, and delete are all O(1) now. That's it? --- # Hash Maps No. :( An infinite size array for infinite possible indexes is unreasonable. As it turns out, however, there's a way to simplify anything (String, object, integer) into an integer This is called the **Hash function**: maps an object (of some type) to an integer. Example: x % 10027 --- # Hash Maps **Hash function**: maps an object (of some type) to an integer. Example: public int hash(String s) { int hashVal = 0; for (int i = 0; i < s.length(); i++) { hashVal = s.charAt(i) + hashVal * 37; } return hashVal; } --- # Hash Maps For insert(): 1. Take any thing `X` that maps to `Y` 2. Simplify `X` into an integer `i` using a hash function 3. Set `array[i] = Y` For find(): 1. Take any thing `X` that you want to find `Y` for. 2. Simplify `X` into an integer `i` using the same hash function 3. Set `Y = array[i]` --- # Hash Maps This is awesome! Well, there are some problems with hash maps * Requires a lot of extra memory and a good hash function to avoid collisions * What is a "good" hash function? * Finding max or min is kind of difficult But in general: excellent for what it can do. --- # Hash Maps: Standard library import java.util.HashSet; import java.util.HashMap; // ... HashSet
set = new HashSet
(); set.add("dog"); set.add("cat"); set.add("fish"); System.out.println(set.contains("dog")); // true System.out.println(set.contains("horse")); // false HashMap
map = new HashMap
(); map.put("Jenny", "867-5309"); System.out.println(map.get("Jenny")); // 867-5309 --- # Hash Maps: Question Implement a word counter (not case sensitive, and ignoring special characters). Example: "Gabba gabba hey, gabba gabba hey!" -> { "gabba": 4, "hey": 2 } --- # Hash Maps: Question import java.util.HashMap; public static HashMap
countWords(String document) { HashMap
counts = new HashMap
(); for (String word : document.split(" ")) { String key = word.toLowerCase().replaceAll("[^a-zA-Z]", ""); if (counts.containsKey(key)) { counts.put(key, counts.get(key) + 1); } else { counts.put(key, 1); } } return counts; } --- # Other Hash-y Things Hashtables, HashSets, Dictionaries (Python) Slight variations, but all revolve around the same idea of index simplification via a hash function for constant time insert, delete, and find. If you are really interested, lookup how to resolve hash map collisions or textbook hash functions --- # Trees  --- # Trees public class Node { public int value; public Node left; public Node right; public Node(int value, Node left, Node right) { this.value = value; this.left = left; this.right = right; } public Node(int value) { this(value, null, null); } } --- # Trees These are **mad** useful. Can represent hierarchical data: parse trees in NLP, database indexes, etc. Fast Vocabulary * Root – The top node in a tree. * Parent – The converse notion of child. * Siblings – Nodes with the same parent. * Descendant – a node reachable by repeated proceeding from parent to child. * Ancestor – a node reachable by repeated proceeding from child to parent. * Leaf – a node with no children. * Edge – connection between one node to another. * Path – a sequence of nodes and edges connecting a node with a descendant. * Level – The level of a node is defined by 1 + the number of connections between the node and the root. * Height – The height of a node is the number of edges on the longest downward path between the root and a leaf. --- # Trees: Binary Search Tree **Binary Search Tree**: every value in the left subtree of a node with a value is less than that value; every value in the right subtree of a node with a value is greater than that value. This gives us O(log(n)) retrieval (in the average but not worst case; more on that later). --- # Trees: Binary Search Tree  Before we move further... --- # Recursion public int fib(int n) { if (n < 2) { return 1; } else { return fib(n - 1) + fib(n - 2); } } --- # Trees: Binary Search Tree public boolean find(Node root, int value) { if (root == null) { return false; } if (value < root.value) { return find(root.left, value); } else if (value > root.value) { return find(root.right, value); } else { return true; } } --- # Trees: Binary Search Tree Worst Case: Unbalanced Tree  --- # Trees: Traversal We can traverse the tree in one of a few ways. ## Pre-Order (XLR)  FBADCEGIH --- # Trees: Traversal We can traverse the tree in one of a few ways. ## In-Order (LXR)  ABCDEFGHI --- # Trees: Traversal We can traverse the tree in one of a few ways. ## Post-Order (LRX)  ACEDBHIGF --- # Trees: Question How do we do a level-order traversal of a tree?  --- # Trees: Question public void levelOrder(Node root) { Queue
queue = new Queue
(); queue.add(root); while (!queue.isEmpty()) { Node curr = queue.remove(); System.out.println(curr.value); if (curr.left != null) { queue.add(curr.left); } if (curr.right != null) { queue.add(curr.right); } } } --- # Trees: Heap A Priority Queue is an ADT is a list where each element has a "priority" associated with it. Operations: * `insert_with_priority()` * `pull_highest_priority_element()` --- # Trees: Heap The maximally efficient way to implement this ADT is with a heap (`O(log n)` performance for both operations) Heap Property: Parents are always more important than their children  --- # Trees: Heap To add an element: insert and reorder the tree (`O(log n)`) To pull the highest priority element: pop the root out and reorder the tree (`O(log n)`) --- # More About Trees AVL Trees (Trees that always stay balanced using rotation) Red-Black Trees (Trees that always stay balanced using colors (fun!)) B-Trees (Trees designed for lots of data) --- # Graphs ADT consisting of Nodes and Edges  --- # Graphs Basic operations: * `adjacent(G, x, y)`: tests whether there is an edge from node x to node y. * `neighbors(G, x)`: lists all nodes y such that there is an edge from x to y. * `add(G, x, y)`: adds to G the edge from x to y, if it is not there. * `delete(G, x, y)`: removes the edge from x to y, if it is there. * `get_node_value(G, x)`: returns the value associated with the node x. * `set_node_value(G, x, a)`: sets the value associated with the node x to a. * `get_edge_value(G, x, y)`: returns the value associated to the edge (x,y). * `set_edge_value(G, x, y, v)`: sets the value associated to the edge (x,y) to v. --- # Graphs: Implementation One way is to use an "adjacency list": We have a list of nodes and every node stores a list of adjacent nodes. public class Node { public int value; public ArrayList
edges; } public class Edge { public Node destination; public int weight; } public class Graph { public ArrayList
nodes; } --- # Graphs: Performance Say we have V nodes and E edges. What is the performance for these operations? * Add vertex * Add edge * Remove vertex * Remove edge Depends on implementation, but this is something you look up. --- # Graphs: Searching Problem: Given a node, can we reach this other node? Search! --- # Graphs: Depth First Search (DFS) bool search(Node root, Node dest) { if (root.value == dest.value) return true; root.visited = true; for (Node n : root.adjacent) { if (!n.visited) { if (search(n, dest)) return true; } } return false; } --- # Graphs: Breadth First Search (BFS) bool search(Node root, Node dest) { Queue q = new Queue(); if (root.value == dest.value) return true; root.visited = true; q.enqueue(root); while (!q.isEmpty()) { Node r = q.dequeue(); for (Node n in r.adjacent) { if (!n.visited) { if (search(n, dest)) return true; queue.enqueue(n); } } } return false; } --- # Graphs: More Interesting Graph Problems (wiki): * Djikstra's * Floyd-Warshall * Ford-Fulkerson * Kruskal's --- # Sorting The problem: given an array containing many values, permute the array so that those values are in order. You'll definitely be expected to have a high-level understanding of these and know the runtimes. Maybe you'll have to implement one. --- # Sorting ## A common tool: swapping private static void swap(int[] array, int i, int j) { int tmp = array[i]; array[i] = array[j]; array[j] = tmp; } --- # Sorting: Selection Sort  Basically, find the minimum element n times. --- # Sorting: Selection Sort public static void selectionSort(int array[]) { for (int i = 0; i < array.length; i++) { int minIndex = i; for (int j = i; j < array.length; j++) { if (array[j] < array[minIndex]) { minIndex = j; } } swap(array, i, minIndex); } } General Information * O(n^2) performance * O(1) space --- # Sorting: Bubble Sort  The maximum elements bubble to the top --- # Sorting: Bubble Sort public static void bubbleSort(int array[]) { for (int i = 0; i < array.length; i++) { for (int j = 0; j < array.length - 1; j++) { if (array[j] > array[j + 1]) { swap(array, j, j + 1); } } } } General information * O(n^2) performance * O(1) space --- # Sorting: Merge sort  1. Split the array in half and sort each subarray. 2. Weave the arrays back together in one fluid pass. --- # Sorting: Merge sort Here's where I'm going to stop implementing things and just explain them. Wikipedia is great. Merge sort is recursive. function mergeSort(A): split A into A_beginning and A_end at the middle mergeSort(A_beginning) mergeSort(A_end) return merge(A_beginning, A_end) General Information * O(n log n) performance * O(n) space --- # Sorting: Quicksort  "Divide and Conquer" --- # Sorting: Quicksort function quickSort(A): pick a "pivot" element in A move all elements less than the pivot to the beginning of A move all elements greater than the pivot to the end of A put the pivot in the middle quicksort the beginning of A recursively quicksort the end of A recursively General Information * O(n log n) performance * Absolute worst case O(n^2) performance --- # Sorting: Counting sort Comparison-based sorts can't be faster than n * log(n). BUT non-comparison based ones can. There are catches, though. public void countingSort(int[] array, int max) { int[] counts = new int[max]; for (int i = 0; i < array.length; i++) { counts[array[i] - 1]++; } int i = 0; for (int j = 0; j < max; j++) { while (counts[j] > 0) { array[i++] = j + 1; } } } This is a prime example of how NOT to implement a hash map. --- # Sorting: In General Anything slower than O(n^2) is not okay. O(n^2) sorting algorithms are generally simpler, easier to code, and require less auxillary space O(n log n) is the best comparison based sorting algorithm performance you can get (proven mathematically). These, however, tend to be more complicated and require more memory. Faster than O(n log n) sorting sorting usually makes at least one huge assumption about our data set (e.g. counting sort) --- # Sorting: An Interview Question You are given an unsorted array of size n. However, you are also told that where each element is in the unsorted array right now is at most k-1 positions away from its final correctly sorted position. You are told that 1 < k < n. What is the best sorting strategy and what is our performance? --- # Sorting: An Interview Question Solution (in Python): import heapq def better_sort(array, k): final_array = [] heap = [] for x in xrange(k): heapq.heappush(heap, array[x]) index = k while heap: final_array.append(heapq.heappop(heap)) if index < len(array): heapq.heappush(heap, array[index]) index += 1 return final_array Performance: O(n log k) --- # Other things to know: * How to code in your interview language * How to use Unix (scripting, regexes) * How a computer works (bits and bytes) * Object-oriented programming --- # Sample Interview Questions https://github.com/adicu/interview_help --- # Other resources * Cracking the Coding Interview * Google / Wikipedia * ICPC * CLRS * https://adicu.com/resources 