Data Structures for Interviews

class: center, middle

# Data Structures for Interviews

2014-10-22

Justin Zhao, ADI

Adapted from Zack Newman

adicu.com

---

# Expectations for this talk

I assume familiarity with Java (if you know some other language, you can probably
figure it out). I didn't check all of my code.

The is **not** a replacement for 3134. It should be a good preview that gets you
ready for some basic interviews.

PLEASE stop me if I say something that doesn't make sense.

> He who asks is a fool for five minutes, but he who does not ask remains a fool forever.
> .right[*Chinese Proverb*]

---

# Gameplan

* Review of Big O

* Data Structures

* Sorting Algorithms

* Sample Interview Questions

---

# What Will Be Covered

* Functionality, operations, applicability of data structures

* Overview of important data structure algorithms and performance

* Sample interview questions and how to tackle them with data structures

---

# What Will Not Be Covered

* The answers to every interview question

* Best interview strategies (interview practice)

* Implementation details (3134)

> It's probably good for you to understand details of implementation, but not necessary for most interviews

---

# Review of Big O

* "Big O" is the "asymptotic runtime"

* Expressed as function of the size of the inputs

* Consider best, worst, and average cases

![Big O](http://i.stack.imgur.com/WcBRI.png)

---

# Review of Big O

/**
     * Return the index of the minimum element in an integer array.
     */
    public static int findMin(int[] array) {
        int minIndex = 0;
        for (int i = 0; i < array.length; i++) {
            if (array[i] < array[minIndex]) {
                minIndex = i;
            }
        }
    }

---

# Review of Big O

/**
     * Return the index of the minimum element in an integer array.
     */
    public static int findMin(int[] array) {
        int minIndex = 0; // O(1)
        for (int i = 0; i < array.length; i++) { // n times
            if (array[i] < array[minIndex]) { // O(1)
                minIndex = i; // O(1)
            }
        }
    } // O(1) + n (O(1) + O(1)) = O(1) + n O(1) = O(1) + O(n) = O(n)

---

# Review of Big O

/**
     * Return the index of array (which must be sorted) at which value can be
     * found, or -1 if it is absent.
     */
    public static int find(int[] array, int value) {
        return findHelper(array, 0, array.length - 1, value);
    } // O(lg n)

private static int findHelper(int[] array, int minIndex, int maxIndex,
            int value) {
        if (maxIndex < minIndex) {
            return -1;
        }

int index = (maxIndex + minIndex) / 2;
        if (value < array[index]) {
            return findHelper(array, minIndex, index, value);
        } else if (value > array[index]) {
            return findHelper(array, index + 1, maxIndex, value);
        } else {
            return index;
        }
    }

---

# Review of Big O

Which performance is better?

`O(10000 n)` vs. `O(n)`

`O(log n)` vs. `O(log(log n))`

`O(n log n)` vs. `O(n log k)`

`O(0.000001 n^2)` vs. `O(9999999 n^1.999999)`

`O(2^n)` vs. `O(3^n)`

`O(n!)` vs. `O(n^n)`

---

# Abstract Data Type (ADT)

* What is a "data structure" anyway? We define ADTs first

* An interface for interacting with data

* lists, stacks, sets, queues, maps, trees, priority queue etc.

* Defines operations and results, but not how they're implemented

---

# Data Structures

* Data Structure is an implementation of ADT

* For example, an "ArrayList" or "HashMap"

* Many ADTs can be implemented as the same Data Structure.

---

# Data Structure Primitives

## Arrays

int[] array = {1, 3, 5, 2, 6, 9};

/*
       * Index: 0 1 2 3 4 5
       * Value: 1 3 5 2 6 9
       */

---

# Data Structure Primitives

## Linked lists

![Linked List](http://upload.wikimedia.org/wikipedia/commons/1/1b/C_language_linked_list.png)

---

# Data Structure Primitives

## Linked lists

// Attributes
      public class Node {
          public int value;
          public Node next;

public Node(int value, Node next) {
              this.value = value;
              this.next = next;
          }
      }

// Methods
      public interface MyList {
          public int get(int index);

public void update(int index, int value);

public void append(int value);

public String toString();
      }

---

# List Performance

<table>
  <th>
    <td>Array</td> <td>List</td>
  </th>
  <tr>
    <td>Access</td>    <td>O(1)</td>  <td>O(n)</td>
  </tr>
  <tr>
    <td>Update</td>    <td>O(1)</td>  <td>O(n)</td>
  </tr>
  <tr>
    <td>Append</td>    <td>O(1)/O(n)</td>  <td>O(1)</td>
  </tr>
  <tr>
    <td>Traversal</td> <td>O(n)</td>  <td>O(n)</td>
  </tr>
</table>

Always think about the performance of the data structure

How about an "arraylist"?

---

# Lists: Standard Library

import java.util.LinkedList;
        import java.util.ArrayList;
        import java.util.List;

// ...

List<Integer> a = new ArrayList<Integer>();
        List<Integer> b = new LinkedList<Integer>();

for (int i = 0; i < 10; i++) {
            a.add(i);
            b.add(i);
        }
        a.set(5, 0);
        b.remove(5);
        System.out.println(a); // [0, 1, 2, 3, 4, 0, 6, 7, 8, 9]
        System.out.println(b); // [0, 1, 2, 3, 4, 6, 7, 8, 9]

---

# Linked list: Question

How do you reverse a linked list?

---

# Linked list: Question

public static Node reverse(Node head) {
          Node prev = null;
          Node curr = head;
          Node next;

while (curr != null ) {
              next = curr.next;
              curr.next = prev;
              prev = curr;
              curr = next;
          }
          return prev;
      }

---

# Stacks

![Stack](http://www.cs.cmu.edu/~mrmiller/15-121/Homework/hw8/stack.png)

---

# Stacks

// Last-in first-out data structure
    public interface MyStack {
        // Add a value to the stack
        public void push(int value);

// Remove a value from the stack
        public int pop();

// See the value on the "top" of the stack (next to be removed)
        public int peek();
    }

MyStack a = ...; // [ ]
    a.push(1); // [ 1 ]
    a.push(2); // [ 2 1 ]
    a.peek(); // returns 2
    a.push(3); // [ 3 2 1 ]
    a.pop(); // [ 2 1 ], returns 3
    a.push(4); // [ 4 2 1 ]
    a.peek(); // returns 4

---

# Stacks

<table>
  <th>
    <td>Array</td> <td>List</td>
  </th>
  <tr>
    <td>Push</td>      <td>O(1)/O(n)</td>  <td>O(1)</td>
  </tr>
  <tr>
    <td>Pop</td>       <td>O(1)</td>  <td>O(1)</td>
  </tr>
  <tr>
    <td>Peek</td>      <td>O(1)</td>  <td>O(1)</td>
  </tr>
</table>

---

# Stacks: Standard Library

Stack<Integer> a = new Stack<Integer>();

a.push(1);
        a.push(2);
        System.out.println(a.pop()); // 2
        a.push(3);
        System.out.println(a); // [1, 3]

---

# Stacks: Question

Write a function to determine if a string consisting of the characters '{', '}',
'[', and ']' is balanced.

For example, "{[]}" is balanced, and "{[}]" is not.

---

# Stacks: Question

public static boolean isBalanced(String braces) {
        Stack<Character> stack = new Stack<Character>();
        for (int i = 0; i < braces.length(); i++) {
            switch (braces.charAt(i)) {
                case '{': stack.push('{');
                    break;
                case '[': stack.push('[');
                    break;
                case '}': if (stack.pop() != '{') { return false; }
                    break;
                case ']': if (stack.pop() != '[') { return false; }
                    break;
            }
        }
        return stack.isEmpty();
    }

---

# Queues

![Queue](http://www.mmrelearning.ca/pluginfile.php/8007/mod_page/content/1/Unit_7_Linked_Lists/queue.jpg)

---

# Queues

// First-in first-out data structure
    public interface MyQueue {
        // Add a value to the back of the queue
        public void enqueue(int value);

// Remove a value from front of the queue
        public int dequeue();

// See the value on the "front" of the queue (next to be removed)
        public int peek();
    }

MyStack a = ...; // [ ]
    a.enqueue(1); // [ 1 ]
    a.enqueue(2); // [ 1 2 ]
    a.peek(); // returns 1
    a.enqueue(3); // [ 1 2 3 ]
    a.dequeue(); // [ 2 3 ], returns 1
    a.enqueue(4); // [ 2 3 4 ]
    a.peek(); // returns 2

---

# Queues

<table>
  <th>
    <td>Array</td> <td>List</td>
  </th>
  <tr>
    <td>Enqueue</td>   <td>O(1)/O(n)</td>  <td>O(1)</td>
  </tr>
  <tr>
    <td>Dequeue</td>   <td>O(1)</td>  <td>O(1)</td>
  </tr>
  <tr>
    <td>Peek</td>      <td>O(1)</td>  <td>O(1)</td>
  </tr>
</table>

---

# Queues: Standard Library

import java.util.Queue;
      import java.util.ArrayDeque;

Queue<String> stack = new ArrayDeque<String>();

---

# Queues: Question

Given one queue and one stack, how do you reverse the stack?

---

# Queues: Question

Stack<Integer> stack = new Stack<Integer>();
        Queue<Integer> queue = new ArrayDeque<Integer>();

stack.push(1);
        stack.push(2);
        stack.push(3);
        System.out.println(stack); // [1, 2, 3]
        while (!stack.isEmpty()) {
            queue.add(stack.pop());
        }
        while (!queue.isEmpty()) {
            stack.push(queue.remove());
        }
        System.out.println(stack); // [3, 2, 1]

---

# Hash Maps

An absolutely essential concept.

Supported operations:

* insert()
* delete()
* find()

Why is it so impressive? All of these operations can be done in constant O(1) time.

---

# Hash Maps

Think of a hash map like a really really big array.

array[0] = 1 // insert(): O(1)
    array[1] = null // delete(): O(1)
    x = array[2] // find(): O(1)

An array that, instead of integer indexes, has indexes that can be anything.

map["0"] -> 1
    map["Adam Cannon"] -> 1004
    map["BullsAndCows"] -> 9001
    map["thisisareallylongstringthatistoocoolomgomgomgomgomgomgadiadiadilove"] -> 0

Insert, find, and delete are all O(1) now. That's it?

---

# Hash Maps

No. :( An infinite size array for infinite possible indexes is unreasonable.

As it turns out, however, there's a way to simplify anything (String, object, integer) into an integer

This is called the **Hash function**: maps an object (of some type) to an integer.

Example:

x % 10027

---

# Hash Maps

**Hash function**: maps an object (of some type) to an integer.

Example:

public int hash(String s) {
        int hashVal = 0;
        for (int i = 0; i < s.length(); i++) {
            hashVal = s.charAt(i) + hashVal * 37;
        }
        return hashVal;
    }

---

# Hash Maps

For insert():

1. Take any thing `X` that maps to `Y`
2. Simplify `X` into an integer `i` using a hash function
3. Set `array[i] = Y`

For find():

1. Take any thing `X` that you want to find `Y` for.
2. Simplify `X` into an integer `i` using the same hash function
3. Set `Y = array[i]`

---

# Hash Maps

This is awesome!

Well, there are some problems with hash maps

* Requires a lot of extra memory and a good hash function to avoid collisions
* What is a "good" hash function?
* Finding max or min is kind of difficult

But in general: excellent for what it can do.

---

# Hash Maps: Standard library

import java.util.HashSet;
    import java.util.HashMap;

// ...

HashSet<String> set = new HashSet<String>();
    set.add("dog");
    set.add("cat");
    set.add("fish");
    System.out.println(set.contains("dog")); // true
    System.out.println(set.contains("horse")); // false

HashMap<String, String> map = new HashMap<String, String>();
    map.put("Jenny", "867-5309");
    System.out.println(map.get("Jenny")); // 867-5309

---

# Hash Maps: Question

Implement a word counter (not case sensitive, and ignoring special characters).

Example:

"Gabba gabba hey, gabba gabba hey!" -> { "gabba": 4, "hey": 2 }

---

# Hash Maps: Question

import java.util.HashMap;

public static HashMap<String, Integer> countWords(String document) {
        HashMap<String, Integer> counts = new HashMap<String, Integer>();
        for (String word : document.split(" ")) {
            String key = word.toLowerCase().replaceAll("[^a-zA-Z]", "");
            if (counts.containsKey(key)) {
                counts.put(key, counts.get(key) + 1);
            } else {
                counts.put(key, 1);
            }
        }
        return counts;
    }

---

# Other Hash-y Things

Hashtables, HashSets, Dictionaries (Python)

Slight variations, but all revolve around the same idea of index simplification via a hash function for constant time insert, delete, and find.

If you are really interested, lookup how to resolve hash map collisions or textbook hash functions

---

# Trees

![Tree](http://upload.wikimedia.org/wikipedia/commons/f/f7/Binary_tree.svg)

---

# Trees

public class Node {
        public int value;
        public Node left;
        public Node right;

public Node(int value, Node left, Node right) {
            this.value = value;
            this.left = left;
            this.right = right;
        }

public Node(int value) {
            this(value, null, null);
        }
    }

---

# Trees

These are **mad** useful. Can represent hierarchical data: parse trees in NLP, database indexes, etc.

Fast Vocabulary

* Root – The top node in a tree.
* Parent – The converse notion of child.
* Siblings – Nodes with the same parent.
* Descendant – a node reachable by repeated proceeding from parent to child.
* Ancestor – a node reachable by repeated proceeding from child to parent.
* Leaf – a node with no children.
* Edge – connection between one node to another.
* Path – a sequence of nodes and edges connecting a node with a descendant.
* Level – The level of a node is defined by 1 + the number of connections between the node and the root.
* Height – The height of a node is the number of edges on the longest downward path between the root and a leaf.

---

# Trees: Binary Search Tree

**Binary Search Tree**: every value in the left subtree of a node with a value is less than that value; every value in the right subtree of a node with a value is greater than that value.

This gives us O(log(n)) retrieval (in the average but not worst case; more on that later).

---

# Trees: Binary Search Tree

![BST](http://upload.wikimedia.org/wikipedia/commons/d/da/Binary_search_tree.svg)

Before we move further...

---

# Recursion

public int fib(int n) {
          if (n < 2) {
              return 1;
          } else {
              return fib(n - 1) + fib(n - 2);
          }
      }

---

# Trees: Binary Search Tree

public boolean find(Node root, int value) {
        if (root == null) {
            return false;
        }
        if (value < root.value) {
            return find(root.left, value);
        } else if (value > root.value) {
            return find(root.right, value);
        } else {
            return true;
        }
    }

---

# Trees: Binary Search Tree

Worst Case: Unbalanced Tree

![BST worst](http://epaperpress.com/sortsearch/images/fig33.gif)

---

# Trees: Traversal

We can traverse the tree in one of a few ways.

## Pre-Order (XLR)

![preorder](http://upload.wikimedia.org/wikipedia/commons/d/d4/Sorted_binary_tree_preorder.svg)

FBADCEGIH

---

# Trees: Traversal

We can traverse the tree in one of a few ways.

## In-Order (LXR)

![inorder](http://upload.wikimedia.org/wikipedia/commons/7/77/Sorted_binary_tree_inorder.svg)

ABCDEFGHI

---

# Trees: Traversal

We can traverse the tree in one of a few ways.

## Post-Order (LRX)

![postorder](http://upload.wikimedia.org/wikipedia/commons/9/9d/Sorted_binary_tree_postorder.svg)

ACEDBHIGF

---

# Trees: Question

How do we do a level-order traversal of a tree?

![levelorder](http://upload.wikimedia.org/wikipedia/commons/d/d1/Sorted_binary_tree_breadth-first_traversal.svg)

---

# Trees: Question

public void levelOrder(Node root) {
        Queue<Node> queue = new Queue<Node>();
        queue.add(root);
        while (!queue.isEmpty()) {
            Node curr = queue.remove();
            System.out.println(curr.value);
            if (curr.left != null) {
                queue.add(curr.left);
            }
            if (curr.right != null) {
                queue.add(curr.right);
            }
        }
    }

---

# Trees: Heap

A Priority Queue is an ADT is a list where each element has a "priority" associated with it.

Operations:

* `insert_with_priority()`
* `pull_highest_priority_element()`

---

# Trees: Heap

The maximally efficient way to implement this ADT is with a heap (`O(log n)` performance for both operations)

Heap Property: Parents are always more important than their children

![heap](http://upload.wikimedia.org/wikipedia/commons/3/38/Max-Heap.svg)

---

# Trees: Heap

To add an element: insert and reorder the tree (`O(log n)`)

To pull the highest priority element: pop the root out and reorder the tree (`O(log n)`)

---

# More About Trees

AVL Trees (Trees that always stay balanced using rotation)

Red-Black Trees (Trees that always stay balanced using colors (fun!))

B-Trees (Trees designed for lots of data)

---
# Graphs

ADT consisting of Nodes and Edges

![heap](http://upload.wikimedia.org/wikipedia/commons/a/a2/Directed.svg)

---

# Graphs

Basic operations:

* `adjacent(G, x, y)`: tests whether there is an edge from node x to node y.
* `neighbors(G, x)`: lists all nodes y such that there is an edge from x to y.
* `add(G, x, y)`: adds to G the edge from x to y, if it is not there.
* `delete(G, x, y)`: removes the edge from x to y, if it is there.
* `get_node_value(G, x)`: returns the value associated with the node x.
* `set_node_value(G, x, a)`: sets the value associated with the node x to a.
* `get_edge_value(G, x, y)`: returns the value associated to the edge (x,y).
* `set_edge_value(G, x, y, v)`: sets the value associated to the edge (x,y) to v.

---

# Graphs: Implementation

One way is to use an "adjacency list": We have a list of nodes and every node stores a list of adjacent nodes.

public class Node {
      public int value;
      public ArrayList<Edges> edges;
    }

public class Edge {
      public Node destination;
      public int weight;
    }

public class Graph {
      public ArrayList<Node> nodes;
    }

---

# Graphs: Performance

Say we have V nodes and E edges.

What is the performance for these operations?

* Add vertex
* Add edge
* Remove vertex
* Remove edge

Depends on implementation, but this is something you look up.

---
# Graphs: Searching

Problem: Given a node, can we reach this other node?

Search!

---
# Graphs: Depth First Search (DFS)

bool search(Node root, Node dest) {
      if (root.value == dest.value)
        return true;
      root.visited = true;
      for (Node n : root.adjacent) {
        if (!n.visited) {
          if (search(n, dest))
            return true;
        }
      }
      return false;
    }

---
# Graphs: Breadth First Search (BFS)

bool search(Node root, Node dest) {
      Queue q = new Queue();
      if (root.value == dest.value)
        return true;
      root.visited = true;
      q.enqueue(root);
      while (!q.isEmpty()) {
        Node r = q.dequeue();
        for (Node n in r.adjacent) {
          if (!n.visited) {
            if (search(n, dest))
              return true;
            queue.enqueue(n);
          }
        }
      }
      return false;
    }

---
# Graphs: More

Interesting Graph Problems (wiki):

* Djikstra's
* Floyd-Warshall
* Ford-Fulkerson
* Kruskal's

---

# Sorting

The problem: given an array containing many values, permute the array so that
those values are in order.

You'll definitely be expected to have a high-level understanding of these and
know the runtimes. Maybe you'll have to implement one.

---

# Sorting

## A common tool: swapping

private static void swap(int[] array, int i, int j) {
        int tmp = array[i];
        array[i] = array[j];
        array[j] = tmp;
    }

---

# Sorting: Selection Sort

![Selection Numbers](http://upload.wikimedia.org/wikipedia/commons/9/94/Selection-Sort-Animation.gif)

Basically, find the minimum element n times.

---

# Sorting: Selection Sort

public static void selectionSort(int array[]) {
        for (int i = 0; i < array.length; i++) {
            int minIndex = i;
            for (int j = i; j < array.length; j++) {
                if (array[j] < array[minIndex]) {
                    minIndex = j;
                }
            }
            swap(array, i, minIndex);
        }
    }

General Information

* O(n^2) performance
* O(1) space

---

# Sorting: Bubble Sort

![Bubble Numbers](http://upload.wikimedia.org/wikipedia/commons/c/c8/Bubble-sort-example-300px.gif)

The maximum elements bubble to the top

---

# Sorting: Bubble Sort

public static void bubbleSort(int array[]) {
         for (int i = 0; i < array.length; i++) {
            for (int j = 0; j < array.length - 1; j++) {
                if (array[j] > array[j + 1]) {
                    swap(array, j, j + 1);
                }
            }
         }
     }

General information

* O(n^2) performance
* O(1) space

---

# Sorting: Merge sort

![Merge Numbers](http://upload.wikimedia.org/wikipedia/commons/c/cc/Merge-sort-example-300px.gif)

1. Split the array in half and sort each subarray.
2. Weave the arrays back together in one fluid pass.

---

# Sorting: Merge sort

Here's where I'm going to stop implementing things and just explain them.
Wikipedia is great.

Merge sort is recursive.

function mergeSort(A):
        split A into A_beginning and A_end at the middle
        mergeSort(A_beginning)
        mergeSort(A_end)
        return merge(A_beginning, A_end)

General Information

* O(n log n) performance
* O(n) space

---

# Sorting: Quicksort

![Quicksort Numbers](http://upload.wikimedia.org/wikipedia/commons/9/9c/Quicksort-example.gif)

"Divide and Conquer"

---

# Sorting: Quicksort

function quickSort(A):
        pick a "pivot" element in A
        move all elements less than the pivot to the beginning of A
        move all elements greater than the pivot to the end of A
        put the pivot in the middle
        quicksort the beginning of A recursively
        quicksort the end of A recursively

General Information

* O(n log n) performance
* Absolute worst case O(n^2) performance

---

# Sorting: Counting sort

Comparison-based sorts can't be faster than n * log(n). BUT non-comparison based
ones can. There are catches, though.

public void countingSort(int[] array, int max) {
        int[] counts = new int[max];
        for (int i = 0; i < array.length; i++) {
            counts[array[i] - 1]++;
        }
        int i = 0;
        for (int j = 0; j < max; j++) {
            while (counts[j] > 0) {
                array[i++] = j + 1;
            }
        }
    }

This is a prime example of how NOT to implement a hash map.

---

# Sorting: In General

Anything slower than O(n^2) is not okay.

O(n^2) sorting algorithms are generally simpler, easier to code, and require less auxillary space

O(n log n) is the best comparison based sorting algorithm performance you can get (proven mathematically). These, however, tend to be more complicated and require more memory.

Faster than O(n log n) sorting sorting usually makes at least one huge assumption about our data set (e.g. counting sort)

---

# Sorting: An Interview Question

You are given an unsorted array of size n. However, you are also told that where each element is in the unsorted array right now is at most k-1 positions away from its final correctly sorted position. You are told that 1 < k < n.

What is the best sorting strategy and what is our performance?

---

# Sorting: An Interview Question

Solution (in Python):

import heapq

def better_sort(array, k):
        final_array = []
        heap = []
        for x in xrange(k):
            heapq.heappush(heap, array[x])
        index = k
        while heap:
            final_array.append(heapq.heappop(heap))
            if index < len(array):
                heapq.heappush(heap, array[index])
                index += 1
        return final_array

Performance: O(n log k)

---

# Other things to know:

* How to code in your interview language

* How to use Unix (scripting, regexes)

* How a computer works (bits and bytes)

* Object-oriented programming

---

# Sample Interview Questions

https://github.com/adicu/interview_help

---

# Other resources

* Cracking the Coding Interview

* Google / Wikipedia

* ICPC

* CLRS

* https://adicu.com/resources

![ADI](http://adicu.com/img/logo.png)