This is a short post on my experience taking
XCS224n,
or "NLP with Deep Learning," this fall through the Stanford Center for Professional Development (SCPD).
Before taking
this course, I couldn't find much information about it online. Here, I will
share what it covers, how it differs from the better-known
CS224n, and who I think would benefit from it.
What is XCS224n?
XCS224n is an entirely online version of Stanford University's on-campus course
CS224n, also called "NLP with Deep Learning," taught by
Christopher Manning. Because it is offered through the SCPD, its target
audience is working professionals with some machine learning experience.
The course cost
$1595 when I took it,
although the SCPD is increasing tuition to
$1750
for all of their "AI Professional Program" courses.
Some fast facts: the course is an in-depth and technical survey of natural language processing using neural networks.
It covers the progression of NLP technology from the first efforts to create word vectors, on to
RNN-based approaches for various tasks (e.g. classification, NER, translation) and the
motivation for attention in RNNs, leading into
the current Transformer-based state of affairs. Along the way, Prof. Manning gives a detailed
review of general deep learning concepts, such as gradient descent, computation graphs, and the backpropagation algorithm.
The course requires comfort with multivariable calculus, probability, and linear algebra.
Basic knowledge of data structures and algorithms
is also helpful, both for understanding course concepts (e.g., topological sort for backprop) and
for seeing how they are applied (such as how stacks are used in dependency parsers). All
coding assignments are done in Python and require PyTorch and NumPy. The course
provides lectures and tutorials on these libraries if you are not familiar with them; that said,
I would personally advise getting some hands-on experience with both before taking this course.
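To make the "topological sort for backprop" connection concrete, here is a minimal sketch of reverse-mode differentiation on a tiny scalar computation graph. It is not taken from the course materials. The `Value` class and the `topological_order` and `backward` functions are hypothetical, written only for this illustration. Nodes are first ordered so that each one comes after its inputs. Gradients are then propagated in the reverse of that order, so a node's gradient is fully accumulated before it is passed on to its inputs.

```python
import math


class Value:
    """A scalar node in a computation graph (hypothetical helper for illustration)."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self.parents = parents          # nodes this value was computed from
        self.local_grads = local_grads  # d(self)/d(parent), one per parent

    def __add__(self, other):
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def tanh(self):
        t = math.tanh(self.data)
        return Value(t, (self,), (1.0 - t * t,))


def topological_order(root):
    """Return graph nodes so that each node appears after all of its parents (DFS post-order)."""
    order, visited = [], set()

    def visit(node):
        if node in visited:
            return
        visited.add(node)
        for parent in node.parents:
            visit(parent)
        order.append(node)

    visit(root)
    return order


def backward(root):
    """Backpropagation: walk nodes in reverse topological order, applying the chain rule."""
    root.grad = 1.0
    for node in reversed(topological_order(root)):
        for parent, local in zip(node.parents, node.local_grads):
            # Accumulate, because a node may feed into several downstream nodes.
            parent.grad += local * node.grad
```

For example, with `x = Value(2.0)`, `w = Value(3.0)`, and `y = x * w + x`, calling `backward(y)` leaves `x.grad == 4.0` (that is, `w + 1`) and `w.grad == 2.0`. Because `x` is used twice, its gradient is the sum of the two paths through the graph.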
The course is conducted entirely online and lasts around 10 weeks. It
is designed to require an average of 10-15 hours per week, which I found to be
roughly accurate.
Course materials are delivered through a combination of GitHub and SCPD's own online portal.
The course is conducted asynchronously: students are provided with a series of
CS224n video lectures pre-recorded by Prof. Manning, edited for brevity and split into shorter
sub-videos for ease of navigation. The SCPD portal keeps track of your
progress on each lecture as you move through the curriculum. The lectures can be
watched at any pace; however, there are five assignments with
hard deadlines distributed throughout the 10 weeks. At the end, all students
with a score of 70% or higher receive a certificate from the SCPD acknowledging that they
passed the course.
There is no interaction with students of the on-campus CS224n, nor is there
in-person instruction by Prof. Manning. Each student is assigned a Course Facilitator (CF);
all CFs are Stanford affiliates who have previously taken CS224n. Communication with CFs, course staff,
and fellow students happens in a Slack community maintained by the SCPD.
The main difference between XCS224n and CS224n is the final project. CS224n culminates
in a final project in which students, working individually or in groups,
apply what they have learned in the course to
a problem of their choosing.
XCS224n does not have a final project.
If this is a dealbreaker for you, there does seem
to be a way of taking CS224n
online.
The cost is in the neighborhood of
$5000 and, unlike XCS224n, grants
Stanford academic credit.
I cannot speak further to this as I didn't take this option.
Isn't all of this available online for free?
Stanford has made the lecture videos for CS224n
(in fact, both the Winter 2019 and Winter 2021 iterations)
available on YouTube for free. These, in and of themselves, are some of the best
resources available anywhere for getting an introduction to NLP.
Since the assignments are also available on the CS224n website, you could
definitely go through the course curriculum on your own.
I won't go into the broader pros and cons of paid courses versus
self-guided learning; what you prefer would depend on your learning style, personal
circumstances, and professional goals. That said, I can think of several benefits to taking the course
in this instance.
First, having access to the CFs was
by far the best part of the course. One difficulty in self-studying a topic is that when you
have a question, you may not have anyone readily available to give you a precise, reliable answer.
Stack Overflow or Reddit may have some gems, but your mileage will vary. Having someone
available via Slack or email to answer whatever question you may have, especially ones that
are theoretical or research-oriented, vastly enhances the learning experience.
The course staff also organized various events, such as talks by the CFs on
careers in machine learning and issues in machine learning ethics, and a live Q&A session with Prof. Manning.
If you are in the process of switching careers, you may find the opportunity to connect with and
learn from the CFs, Prof. Manning, and fellow students to be valuable.
On a more technical note, Assignments 4 and 5 require the use of GPUs. The SCPD
provides 65 hours of computing credits for Microsoft Azure to enable everyone
to complete the assignments. It's not required to use Azure; you can use whatever
cloud provider you prefer, or go local.
I was able to do the assignments on a local GPU with 6 GB of memory, and I saw
some chatter on the course Slack about students successfully using Apple M1/M2 chips.
But for anyone who needs it, it is nice to have the computing credits and technical support from
the CFs.
In closing this section, I would recommend that anyone who is interested in
XCS224n take advantage of the free lecture videos and assignments beforehand. After watching the
first couple of lectures and trying the first assignment, you can decide
whether to go back and review more fundamental ML, keep going on your own,
or enroll in the course. Another consideration is that your employer may cover part or
all of the tuition
as part of employee professional development programs.
More on the Assignments
There are five graded assignments in the course. They are a mixture of multiple-choice
quizzes,
coding portions that are 100% autograded via Gradescope, and (mostly extra credit)
written portions that are more math-focused. These assignments are, in varying
degrees, condensed versions of those in CS224n. The topics of the assignments are:
- Exploring Word Embeddings
- Understanding and Implementing Word2Vec
- Neural Transition-based Dependency Parsing
- Neural Machine Translation with RNNs
- Self-attention, Transformers, and Pretraining
Assignments 1 and 2 cover various techniques for training word vectors
from large text corpora. Assignment 3 deals with dependency
parsing, which is the problem of assigning directional relationships to
the words of a sentence that collectively capture the grammatical structure and
intended meaning of that sentence. Assignment 4 involves implementing and
training an LSTM-based model to translate from Cherokee to English, and Assignment 5 applies
a generative Transformer model to a task involving real-world knowledge.
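Assignment 3 is where the stacks mentioned earlier come in. Here is a minimal sketch of a generic arc-standard transition system that uses a stack and a buffer with SHIFT, LEFT-ARC, and RIGHT-ARC transitions. It illustrates the general technique and is not the assignment's code. In a neural parser, a model predicts the next transition at each step. In this sketch the transition sequence is supplied by hand, and the `parse` function and its string transition names are hypothetical.

```python
def parse(words, transitions):
    """Apply arc-standard transitions to a sentence; return (head, dependent) arcs."""
    stack = ["ROOT"]       # partially processed words, with a ROOT sentinel at the bottom
    buffer = list(words)   # words not yet processed, in sentence order
    arcs = []              # collected (head, dependent) pairs

    for t in transitions:
        if t == "SHIFT":
            if not buffer:
                raise ValueError("cannot SHIFT: buffer is empty")
            stack.append(buffer.pop(0))
        elif t == "LEFT-ARC":
            # The second item on the stack becomes a dependent of the top item.
            if len(stack) < 2 or stack[-2] == "ROOT":
                raise ValueError("cannot LEFT-ARC here")
            dependent = stack.pop(-2)
            arcs.append((stack[-1], dependent))
        elif t == "RIGHT-ARC":
            # The top item becomes a dependent of the item beneath it.
            if len(stack) < 2:
                raise ValueError("cannot RIGHT-ARC here")
            dependent = stack.pop()
            arcs.append((stack[-1], dependent))
        else:
            raise ValueError(f"unknown transition: {t}")
    return arcs
```

For the sentence "I ate fish", the sequence SHIFT, SHIFT, LEFT-ARC, SHIFT, RIGHT-ARC, RIGHT-ARC yields the arcs `("ate", "I")`, `("ate", "fish")`, and `("ROOT", "ate")`. This makes "ate" the head of the sentence.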
Overall, I was happy with the pacing and difficulty level of the assignments.
The coding portions were balanced between writing data preprocessing code
and implementing models and training pipelines in PyTorch
according to a spec to perform the actual tasks.
The written portions were thought-provoking and provided
mathematical rigor to concepts that are often introduced in
more handwavy ways.
Because this course skips the final project, the assignments cover only about half of the curriculum.
Subsequent modules delve deeper into applications of the Transformer
architecture, along with techniques for training, improving, and evaluating models in NLP.
Conclusion
I had a good experience with this course, and would recommend it to anyone
looking for an efficient but thorough introduction to (or review of) modern NLP.
The decision to pursue the course through the SCPD versus self-study comes down
to the need for guidance from the CFs and fellow students,
financial considerations, and professional considerations such as certification and networking.
I think that the SCPD option gets you through the curriculum faster.
Either way, this course will leave you prepared to apply existing
NLP models to real-world problems, develop your own models,
and follow along with the latest research.