Under the barrage of information from the Internet, executives and professionals are more often sorting information than reading what is useful. Computer-generated text summaries may soon help decide what is important and what is not.
Columbia Digital News System
Kathy McKeown, professor and chairman of computer science, has led a collaboration with several of her colleagues to develop a suite of programs that can present updated news summaries. The Columbia Digital News System (CDNS) provides links to live news sources, then summarizes, classifies and presents the new information, which can include text, images, video and structured documents such as databases. The programs seek out news updates on the Internet, summarize information extracted from the news articles, show links to related images and create a taxonomic structure of documents and images.
"With the explosion of available online information, it is becoming ever more difficult to access only information that is needed," said McKeown, who is working with Columbia colleague Shih-Fu Chang on the project. "Automatically generated summaries can aid a person in determining whether or not to read the full text or access the full data set."
McKeown is conducting research in natural language processing and generation in three main areas. The first, summary generation, involves the generation of natural language text, or summaries, from news articles on topics such as terrorism. In the second, statistical natural language, she is using statistical analysis of large texts to identify constraints on how words are used. Such results can be used to create an automated computer dictionary. Finally, she is working on the generation of multimedia explanation, developing techniques to coordinate language (spoken or written) and graphics.
Reporters covering breaking news events on deadline frequently find they have multiple, often conflicting, sources of information, including wire service reports, newspaper Internet sites, television news and firsthand notes. CDNS can extract salient bits of information (for example, the death toll in a terrorist attack) from different sources and present them to the user as updates become available. Professionals such as stock analysts, investors and government regulatory officials will also find continuous news updates useful.
An artificial intelligence system called MAGIC gathers selected patient information from a hospital's databases and generates a multimedia display alerting caregivers to a patient's condition.
One demonstration of the system tracked the World Trade Center bombing and presented maps of the area and images of the scene; another summarized a series of terrorist attacks in Colombia. WebSEEk searches and catalogues the images that accompany the summaries.
The goal of the CDNS summaries is to brief the user on information found in multiple documents retrieved from searches. In a user focus group, journalists told the investigators that these summaries would be helpful in determining the reliability of the information and whether it is worth following up with the original sources.
Unlike other summarizers, CDNS does not extract sentences from the original material. It uses natural language processing techniques, including information extraction techniques developed as part of the Advanced Research Projects Agency message understanding systems, to extract structured information from the documents. That information might include important facts about related terrorist events: where they took place, who the perpetrators are believed to be, whether hostages were taken, etc.
Text generation tools developed by McKeown's group look for similarities and differences across the articles, identify which facts to include in a summary and decide what words to use to convey those facts. Updates tell how perceptions of an event have changed over time.
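The idea of comparing structured facts across articles can be illustrated with a minimal Python sketch. It is not the CDNS implementation: the record layout, the function names `compare_reports` and `summarize`, and the sample data are all invented for illustration, and it assumes an earlier extraction step has already turned each article into a small dictionary of facts.

```python
from collections import defaultdict

def compare_reports(reports):
    """Split extracted facts into those all sources agree on and those in conflict."""
    values = defaultdict(dict)  # field -> {value: [sources reporting it]}
    for report in reports:
        for field, value in report["facts"].items():
            values[field].setdefault(value, []).append(report["source"])
    agreed = {f: next(iter(v)) for f, v in values.items() if len(v) == 1}
    conflicts = {f: v for f, v in values.items() if len(v) > 1}
    return agreed, conflicts

def summarize(reports):
    """Turn the agreed and conflicting facts into a short template-based summary."""
    agreed, conflicts = compare_reports(reports)
    parts = []
    if agreed:
        facts = "; ".join(f"{field}: {value}" for field, value in sorted(agreed.items()))
        parts.append(f"All sources agree on {facts}.")
    for field, by_value in sorted(conflicts.items()):
        claims = ", ".join(
            f"{value} ({' and '.join(sources)})" for value, sources in by_value.items()
        )
        parts.append(f"Sources differ on {field}: {claims}.")
    return " ".join(parts)

# Invented sample records standing in for the output of an extraction step.
reports = [
    {"source": "Wire A", "facts": {"location": "Bogota", "deaths": 5}},
    {"source": "Wire B", "facts": {"location": "Bogota", "deaths": 7}},
]
summary = summarize(reports)
print(summary)
# All sources agree on location: Bogota. Sources differ on deaths: 5 (Wire A), 7 (Wire B).
```

A real generator would choose wording far more carefully than these fixed templates do; the sketch only shows how agreement and disagreement between sources can be detected once the facts are in structured form.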
McKeown expects to develop a multilingual version that would produce an English paragraph that summarizes, for example, a series of Japanese news articles on the same event.
STREAK and PLANDOC
McKeown's research group is developing STREAK, which generates brief descriptions of basketball games and will ultimately produce one-paragraph summaries using box score statistics that place current results in context by making comparisons to previous games. Jointly with Bellcore, McKeown has developed a system, PLANDOC, that creates reports based on the work of telephone planning engineers.
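As a rough illustration of placing a box score figure in the context of previous games, the following Python sketch produces a single sentence from a point total and a list of earlier totals. It is a hypothetical toy, not STREAK itself: the function name `describe_performance`, its parameters and the phrasing rules are assumptions made for this example.

```python
def describe_performance(player, points, previous_points):
    """Return one sentence that places `points` in the context of earlier games."""
    sentence = f"{player} scored {points} points"
    if not previous_points:
        # No history available, so report the bare fact.
        return sentence + "."
    best = max(previous_points)
    average = sum(previous_points) / len(previous_points)
    if points > best:
        return f"{sentence}, topping a previous high of {best}."
    if points == best:
        return f"{sentence}, matching a previous high."
    if points > average:
        games = len(previous_points)
        return f"{sentence}, above an average of {average:.1f} over the previous {games} games."
    return sentence + "."

# Invented figures for illustration only.
print(describe_performance("Player A", 30, [22, 18, 26]))
# Player A scored 30 points, topping a previous high of 26.
```

The sketch picks at most one historical comparison per sentence; which comparisons are worth mentioning, and how to fold them into fluent prose, is the harder problem the research addresses.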
MAGIC
McKeown has collaborated with Steve Feiner, associate professor of computer science, on software that can automatically report the status of patients after they have undergone heart bypass surgery. Currently, a resident places a telephone call to brief an intensive care nurse on patient status and anticipated needs upon delivery to an intensive care unit. At Columbia-Presbyterian Medical Center, as at many large academic medical centers, there is a considerable amount of patient information available on-line, but no convenient way to retrieve the information in a timely manner.
Feiner and McKeown, along with Mukesh Dalal, assistant professor of computer science, and Desmond Jordan, associate professor of clinical anesthesiology at Columbia's College of Physicians and Surgeons, have created an artificial intelligence system called MAGIC, which can automatically generate multimedia displays that include text, automated speech and graphics. It gathers selected information quickly and automatically from databases on a hospital's computers and generates coherent visual and audio displays, including spoken text telling caregivers of a patient's condition.
McKeown's web site may be found at http://cs.columbia.edu/~kathy/.