Zezhou Huang

You can also call me Zachary

I'm currently a Ph.D. student at Columbia University, advised by Eugene Wu.

My research areas are DBMS, Machine Learning, Data Integration, Data Wrangling, and HCI.

I'm currently building a factorized DBMS (database management system) to manage databases with large join graphs and to efficiently execute semiring aggregation queries, which are at the heart of data analytics (e.g. Pearson correlation, PCA, SVD...) and machine learning tasks (e.g. linear regression, support vector machines, regression trees...).

The key insight is to apply theoretical ideas from PGMs (Probabilistic Graphical Models) to DBMSs, including variable elimination, greedy ordering, and junction trees. There is a strong mapping between inference tasks in PGMs and aggregation queries in DBMSs.
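
To make the mapping concrete, here is a minimal sketch (not the actual system) of a COUNT query over a chain join, computed two ways: by materializing the join, and by eliminating one variable at a time as in PGM variable elimination. The relations R, S, T and the function names are hypothetical, chosen only for illustration, and the count semiring (natural numbers with + and ×) stands in for the more general semirings mentioned above.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy relations for a chain join R(a, b) JOIN S(b, c) JOIN T(c, d).
R = [(1, 10), (2, 10), (3, 20)]
S = [(10, 100), (10, 200), (20, 100)]
T = [(100, 7), (100, 8), (200, 9)]


def naive_count(R, S, T):
    """Enumerate every combination of rows and count those that join."""
    return sum(
        1
        for (a, b), (b2, c), (c2, d) in product(R, S, T)
        if b == b2 and c == c2
    )


def factorized_count(R, S, T):
    """Push the aggregation through the joins, one variable at a time."""
    # Eliminate d: for each c, the number of matching T rows.
    msg_c = defaultdict(int)
    for c, _d in T:
        msg_c[c] += 1

    # Eliminate c: for each b, sum the incoming message over matching S rows.
    msg_b = defaultdict(int)
    for b, c in S:
        msg_b[b] += msg_c.get(c, 0)

    # Eliminate b and a: sum the message over R rows.
    return sum(msg_b.get(b, 0) for _a, b in R)


print(naive_count(R, S, T), factorized_count(R, S, T))
```

Both functions return the same count, but the second never builds the join result: each step touches one relation and passes a small per-key message forward, which is the same message-passing pattern used for marginal inference along a chain in a graphical model.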

Our system is up to three orders of magnitude faster than PostgreSQL on the TPC-DS dataset. We find that these theoretical ideas have wide applications for analytics in DBMSs, and we are continuously looking for applications that solve more real-world problems.


Publication:


Research Projects:


Personal Projects: