Computer Science E6998 section 009, Semantic Technologies in IBM Watson™

Spring 2013

 

Time:

Friday 2:15-4 pm

Location:

627 Mudd

 

 

 

 

Professor:

Alfio Massimiliano Gliozzo

Office Hours:

Friday 4-6pm (by appointment)

Email:

ag3366 [at] columbia.edu

 Phone:

 

Office:

726 CEPSR

 

 

 

 

 

 

Teaching Assistant:

Or Biran

Phone:

 

Email:

orb [at] cs.columbia.edu

Office Hours:

Wed 2-4pm

Office:

702 CEPSR

 

 

 

 


 

 

Course Description

IBM Watson is ushering in a new era of computing: cognitive systems. Able to process natural language, generate hypotheses, algorithmically test possible responses, navigate Big Data, deliver evidence-based insights, and learn through iterations and outcomes, this new class of computing behaves more like the world’s most sophisticated computer, the human brain. Since its historic debut on Jeopardy!, IBM Watson has been put to work in healthcare and financial services, helping to transform these industries.

This course covers background concepts for working with IBM Watson and semantic technology in general, highlighting current research problems and describing existing solutions. Students will gain an appreciation for what it’s like to work in one of the most advanced software research environments in the world. They will have the opportunity to learn from the developers of IBM Watson and to contribute their own insights, ideas and solutions to problems.

This is a seminar-style class taught directly by the developers of IBM Watson. Guest speakers from the IBM Watson team will present their research areas.

Students will be required to carry out a research project in an area of interest to the Watson Technology team, contributing to the advancement of the state of the art in the field.

 

Textbook

 

 

 

Selected papers from IBM Journal of Research and Development

Volume 56, Issue 3.4, “This is Watson”.

 

 

Requirements

 

Students will design and carry out a research project. A list of possible projects will be provided by the professor, but students may also propose projects of their own, provided they are approved by the professor. Throughout the course, students will submit incremental versions of their project. There will be no midterms or finals.

 

Research projects will be assigned in the areas of Natural Language Processing, Machine Learning and Information Retrieval, with a particular focus on developing components and techniques that could benefit the IBM Watson technology. Each project will involve describing the state of the art for the selected task, identifying an innovative solution to the given problem, coding UIMA-based text analytics to implement the proposed solution, and evaluating the technique on benchmark tasks.

 

All students are required to have a Computer Science Account for this class. To sign up for one, go to the CRF website and then click on "Apply for an Account".

 

 

Syllabus

 

Date

Topic

Speaker

Reading   (* means optional)

Jan 25th

Introduction: The JEOPARDY! Challenge

Alfio Gliozzo

1.      Special Questions and techniques *

2.      Simulation, learning, and optimization techniques in Watson's game strategies *

3.      In the game: The interface between Watson and Jeopardy! *

4.      Jeopardy! historical data *

Feb 1st

The DeepQA architecture

Alfio Gliozzo

1.      Textual resource acquisition and engineering

2.      Introduction to “This is Watson”

Feb 8th

The DeepQA architecture

Natural Language Processing Background

Alfio Gliozzo

1.      Finding needles in the haystack: Search and candidate generation

2.      Question analysis: How Watson reads a clue

Feb 15th

Natural Language Processing in Watson

Alfio Gliozzo

1.      Textual evidence gathering and analysis

2.      Deep parsing in Watson

Feb 22nd

Knowledge representation Background

Structured Knowledge in Watson (basic)

Semantic Web

Alfio Gliozzo

1.      Typing candidate answers using type coercion

2.      Structured data and inference in DeepQA

3.      Automatic knowledge extraction from documents

4.      http://www.w3.org/RDF/ *

5.      http://www.w3.org/2004/OWL/ *

Mar 1st

Domain Adaptation

Alfio Gliozzo

1.      Domain Adaptation

Mar 8th

UIMA

Siddharth Patwardhan

1.      UIMA overview and setup *

2.      UIMA tutorials and users guides *

3.      UIMA tools *

4.      UIMA references *

5.      UIMA async scaleout *

Mar 15th

UIMA (hands on)

Siddharth Patwardhan

 

1.      Making Watson fast

 

Recommended: bring a laptop to class. Make sure Java and Eclipse are installed. (If you have never used Eclipse, go over an online tutorial such as this one.)

 

Mar 22nd

SPRING BREAK

 

 

Mar 29th

Midterm Student Workshop

 

Apr 5th

Distributional Semantics

Alfio Gliozzo

1.      From Distributional to Contextual Similarity

2.      Using Distributional Similarity for Lexical Expansion in Knowledge-based Word Sense Disambiguation

3.      Semantic Domains in Computational Linguistics

4.      http://www.machinelinking.com/ *

5.      www.jobimtext.org *

Apr 12th

Distributional Semantics I

Alfio Gliozzo

 

Apr 19th

Machine Learning and Strategy in Watson

David Gondek

Apr 26th

Advanced Semantic Analysis, Sources Linked Data and Text, TyCor, Answer Lookup, Evidence Diffusion, Semantic Technologies (vision), Crowdsourcing, Information Extraction

Chris Welty

 

May 17th

Final Student Workshop

 

 

 

Slides from the classes are available on CourseWorks. (If you are auditing the class, contact Or to get access.)

 

Links to Resources

 

§  DeepQA publications website

§  Videos on Watson

§  Apache UIMA

 

 

Prerequisites

 

Students must have previously taken at least one of Artificial Intelligence, Natural Language Processing, Machine Learning or Search Engine Technology.

 

 

About the instructor

 


Alfio Gliozzo is a research staff member at the IBM T.J. Watson Research Center. He is currently a technical leader on the DeepQA team, coordinating a research team focused on unsupervised learning from text. At the same time, he is a key contributor to the Watson core technology for domain adaptation. He has been involved in both academic research and industry for 12 years, building a significant track record of delivering semantic technologies across different applications, patents and scientific publications.

 

 

Guest lecturers will include

 

Eric Brown

David Gondek

Chris Welty

Siddharth Patwardhan

 

 

Paper

 

We’ve published a paper on this course! 

 

Alfio Gliozzo, Or Biran, Siddharth Patwardhan and Kathleen McKeown. Semantic Technologies in IBM Watson™. To appear in the Proceedings of the Workshop on Teaching NLP/CL (TNLP) at the Association for Computational Linguistics (ACL). 2013. (pdf)