E4620 Numerical Methods for Data Analysis

Department of Electrical Engineering, Columbia University

E4620 is typically taught once per year in the Fall semester. The information below is meant to provide a snapshot of the material covered.

Overview

Course description

E4620 introduces students to the mathematical and computational foundations of data analysis. Students will learn to view data analysis through the lens of linear algebra, specifically through matrix factorization techniques. The course takes a principled approach to the theory and computational complexity of numerical linear algebra algorithms for a range of problems, including data fitting, data classification, clustering, and data reduction. Theory and algorithms will be illustrated with a wide range of engineering applications. The course can be loosely broken down into four parts:

  • Fundamentals: vector spaces, geometry, linear independence

  • Matrices: factorizations, spectrum, linear systems, and graphs

  • Least Squares: least squares, regularization, gradient descent, constrained least squares

  • Data Analysis and Reduction: principal component analysis, K-means clustering, spectral clustering, and the perceptron algorithm
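As a small, unofficial illustration of how the Matrices and Least Squares parts connect (this is not taken from the course slides), the sketch below fits a straight line by least squares. It first computes a QR factorization with modified Gram-Schmidt, then solves R x = Q^T b by back substitution. It is written in plain Python so it runs without extra libraries; the function names gram_schmidt_qr and least_squares are hypothetical, and the sketch assumes A has linearly independent columns. In practice a library routine such as numpy.linalg.lstsq would normally be used instead.

```python
def gram_schmidt_qr(A):
    """Factor A = Q R with modified Gram-Schmidt.

    A is a list of m rows, each a list of n floats. The columns of A are
    assumed linearly independent. Returns Q as a list of n orthonormal
    columns (each of length m) and R as an n x n upper-triangular list.
    """
    m, n = len(A), len(A[0])
    Q = []
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = [A[i][j] for i in range(m)]  # j-th column of A
        for k in range(j):
            # Project the partially orthogonalized v (not the original column) onto q_k.
            R[k][j] = sum(Q[k][i] * v[i] for i in range(m))
            v = [v[i] - R[k][j] * Q[k][i] for i in range(m)]
        R[j][j] = sum(x * x for x in v) ** 0.5
        if R[j][j] == 0.0:
            raise ValueError("columns of A are linearly dependent")
        Q.append([x / R[j][j] for x in v])
    return Q, R


def least_squares(A, b):
    """Minimize ||A x - b|| using the QR factorization: solve R x = Q^T b."""
    Q, R = gram_schmidt_qr(A)
    n, m = len(R), len(b)
    c = [sum(Q[k][i] * b[i] for i in range(m)) for k in range(n)]  # c = Q^T b
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution on upper-triangular R
        x[i] = (c[i] - sum(R[i][j] * x[j] for j in range(i + 1, n))) / R[i][i]
    return x


# Fit y ≈ x0 + x1 * t to the points (0, 0), (1, 1), (2, 3).
A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
b = [0.0, 1.0, 3.0]
x = least_squares(A, b)
print(x)  # approximately [-0.1667, 1.5]
```

The same fit can be checked against the normal equations A^T A x = A^T b, which give an intercept of -1/6 and a slope of 3/2; the QR route avoids forming A^T A, which squares the condition number of the problem.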

Lecture slides

Fall’25 update: This is a new course; slides after topic 13 and additional material will be added soon. Slides for the least squares topics are taken from Lieven Vandenberghe's ECE133A class at UCLA.

1. Vector spaces
2. Vector space geometry
3. Linear independence
4. Matrix basics
5. Eigenvalues and vectors
6. Inverse matrices
7. QR decomposition
8. Gram-Schmidt orthogonalization
9. Least squares I: Problem formulation and solution methods
10. Least squares II: Data fitting
11. Least squares III: Multi-objective problems
12. Singular value decomposition
13. Principal component analysis
14. Fundamental theorem of linear algebra
15. The randomized SVD
16. Spectral clustering
17. Gradient descent and the perceptron algorithm

Additional material

a. K-means clustering
b. Pseudo-inverses
c. Positive definite matrices and Cholesky factorization

Textbook

There is no official textbook for the class; however, the material loosely follows references 1 and 2 below.

1. Introduction to Applied Linear Algebra – Vectors, Matrices, and Least Squares, Boyd & Vandenberghe, Cambridge University Press
2. Numerical Linear Algebra, Trefethen & Bau, SIAM
3. Linear Algebra Done Right, Axler, Open Access
4. Data-Driven Science and Engineering, Brunton & Kutz, Cambridge University Press

Note: References 1 and 3 are freely available online.

Prerequisites

There are no formal prerequisites for this class beyond the ability to write short scripts in a high-level language such as Python, Julia, or MATLAB.