IATS-XII Information Technology (IT) Abstracts

IATS XII

Vancouver

15-21 August 2010

Hosted by:

University of British Columbia

Proceedings of the
Tibetan Information Technology Panel
(Paper Abstracts)

Robert R. Chilton
Asian Classics Input Project (ACIP)
“Sorting Unicode Tibetan Data: From Collation Algorithm to Implementation”

Support for collating Unicode Tibetan in culturally expected order appeared in Mimer SQL in 2004 and in Microsoft Windows Vista in 2006. Both Mimer and Microsoft implement a "standard" collation schema for Tibetan-script data; however, because they implement it in technically different ways, the two systems produce somewhat different sort orders. These two implementations are described and compared. The Common Locale Data Repository (CLDR) information for "standard" Tibetan collation (which serves for both Tibetan and Dzongkha) is presented and explained. A standalone cross-platform applet that implements the Mimer/CLDR collation will be distributed.
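
The abstract does not spell out the collation rules, but the core difficulty can be illustrated. Sorting by raw Unicode code points files a syllable under its first written letter, whereas culturally expected order files it under its root letter, so dkar (དཀར) belongs with ka (ཀ), not with da (ད). The Python sketch below contrasts the two orders. Its root-letter heuristic is a hypothetical simplification covering only prefix + root and superscript stacks. It is not the Mimer, Microsoft, or CLDR rule set, it breaks ties by plain code-point order rather than traditional sub-ordering, and it always takes the prefix reading for three-consonant syllables that begin with a prefix letter, which is wrong for some spellings.

```python
"""Toy contrast: code-point order vs. root-letter order for Tibetan syllables.

Hypothetical sketch, NOT the Mimer, Microsoft, or CLDR collation rules.
"""

BASE = range(0x0F40, 0x0F6D)       # Tibetan base consonants U+0F40..U+0F6C
SUBJOINED = range(0x0F90, 0x0FBD)  # subjoined consonants U+0F90..U+0FBC
SUBJ_OFFSET = 0x50                 # subjoined letter = base letter + 0x50

PREFIXES = {0x0F42, 0x0F51, 0x0F56, 0x0F58, 0x0F60}  # ga, da, ba, ma, 'a
SUPERSCRIPTS = {0x0F62, 0x0F63, 0x0F66}              # ra, la, sa
SUBSCRIPTS = {0x0FAD, 0x0FB1, 0x0FB2, 0x0FB3}        # subjoined wa, ya, ra, la


def _stack_root(cps: list[int]) -> int:
    """Root of the consonant stack that starts at cps[0]."""
    if (len(cps) > 1 and cps[1] in SUBJOINED and cps[1] not in SUBSCRIPTS
            and cps[0] in SUPERSCRIPTS):
        return cps[1] - SUBJ_OFFSET  # superscript over root, e.g. ska -> ka
    return cps[0]                    # bare letter, or letter + subscript


def root_letter(syllable: str) -> int:
    """Guess the root letter of one syllable (simplified heuristic)."""
    cps = [ord(c) for c in syllable]
    n_consonants = sum(1 for c in cps if c in BASE or c in SUBJOINED)
    # Assumed rule: a prefix letter followed by another base consonant, in a
    # syllable of three or more consonants, is read as prefix + root.
    if (cps[0] in PREFIXES and len(cps) > 1 and cps[1] in BASE
            and n_consonants >= 3):
        return _stack_root(cps[1:])
    return _stack_root(cps)


def root_key(syllable: str) -> tuple[int, list[int]]:
    # Primary key: root letter; tie-break: plain code-point sequence.
    return (root_letter(syllable), [ord(c) for c in syllable])


# ga, dkar, kha, ka, skad (written as escapes to avoid font issues)
words = ["\u0f42", "\u0f51\u0f40\u0f62", "\u0f41", "\u0f40", "\u0f66\u0f90\u0f51"]
print(sorted(words))                # code-point order: ka, kha, ga, dkar, skad
print(sorted(words, key=root_key))  # root-letter order: ka, dkar, skad, kha, ga
```

A real collator replaces the heuristic with full syllable-structure rules and multi-level weights; the point here is only that the primary sort key must be the root letter rather than the first code point.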

Paul G. Hackett
Columbia University
“The Use of yig-cha and chos-kyi-rnam-grangs in Computing Lexical Cohesion for Tibetan Topic Boundary Detection”

To properly implement even a simple Tibetan Information Retrieval (IR) system, some form of segmentation (n-gram, POS tagging, dictionary substring matching, etc.) must be performed. To take Tibetan indexing to a more sophisticated level, however, some form of topic detection must be employed. This paper reports the results of applying one technique for topic boundary detection, lexical cohesion, to Tibetan. The resources developed and deployed, the theoretical model used, and its potential applications are discussed.
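
As a rough illustration of lexical-cohesion boundary detection in general, the Python sketch below follows a TextTiling-style block comparison: for each gap between units (e.g. sentences), it compares the vocabulary in a window before and after the gap and proposes a boundary at local minima of similarity. This is an assumed generic technique, not the paper's model; the paper's use of yig-cha and chos-kyi-rnam-grangs resources is not modeled, the `window` and `threshold` values are hypothetical, and the input is assumed to be already segmented (here in Wylie).

```python
from collections import Counter
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two bag-of-words vectors."""
    dot = sum(count * b[tok] for tok, count in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def gap_scores(units: list[list[str]], window: int = 2) -> list[float]:
    """Cohesion across each gap; scores[j] is the gap before units[j + 1]."""
    scores = []
    for i in range(1, len(units)):
        left = Counter(t for u in units[max(0, i - window):i] for t in u)
        right = Counter(t for u in units[i:i + window] for t in u)
        scores.append(cosine(left, right))
    return scores


def boundaries(units: list[list[str]], window: int = 2,
               threshold: float = 0.1) -> list[int]:
    """Indices of units that start a new topic (local minima below threshold)."""
    scores = gap_scores(units, window)
    result = []
    for j, s in enumerate(scores):
        prev_s = scores[j - 1] if j > 0 else float("inf")
        next_s = scores[j + 1] if j + 1 < len(scores) else float("inf")
        if s <= prev_s and s <= next_s and s < threshold:
            result.append(j + 1)
    return result


# Hypothetical pre-segmented units (Wylie tokens).
units = [
    ["sems", "can", "snying", "rje"],
    ["snying", "rje", "sems", "can"],
    ["stong", "pa", "nyid"],
    ["stong", "pa", "nyid", "rten", "'brel"],
]
print(boundaries(units))  # [2]: a topic shift before the third unit
```

Raw token overlap is the simplest cohesion signal; resources such as thesauri or category lists could, in principle, let related but non-identical terms count as cohesive.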

William A. Magee
Dharma Drum Buddhist College
“Tibetan in the Virtual Classroom: Displaying Complex Scripts in Second Life and OpenSim”

An important part of the "Hopkins Tibetan Treasures Research Archive" (http://haa.ddbc.edu.tw) is the development of Tibetan language-learning systems for use in the virtual environments of Second Life and OpenSim. This paper describes the methodological difficulties encountered and the solutions adopted in completing this innovative aspect of the Hopkins Archive project. Setting aside programming details and pedagogical theory, the paper focuses on the main technical problem that had to be overcome: displaying complex scripts in the virtual classroom. The project grouped its solutions under two technical paradigms: Fixed-content Language Display and Real-time Input Language Display. This paper describes seven specific tools, built on these two paradigms, for teaching the Tibetan language inside Second Life and OpenSim. The project also adopted generic designs for writing and displaying languages with complex and unique scripts (such as most Buddhist research languages: Sanskrit, Chinese, Japanese, etc.). Working from the templates created for the virtual Tibetan classroom, scholars of Sanskrit, Chinese, and other languages can readily design their own language locales. The paper concludes with a discussion of virtual classroom desiderata and a call for a consortium of universities to create and host a virtual environment for teaching the Buddhist studies research languages in Second Life and OpenSim.

Tashi Tsering
China Center for Tibetology
“Introducing Qomolangma Tibetan Unicode Fonts and Qomolangma Wylie Input Method for Windows Vista and Windows 7”

Over the past two years, Tibetan computer experts from CTRC and Tibetan University have designed and created ten new Unicode Tibetan fonts, which cover all types of traditional Tibetan script, including Drutsa and Chuyig. We gave the fonts names that indicate their script styles: Qomolangma-Uchen Sarchen, Qomolangma-Uchen Sarchung, Qomolangma-Uchen Sutung, Qomolangma-Uchen Suring, Qomolangma-Drutsa, Qomolangma-Betsu, Qomolangma-Tsuring, Qomolangma-Tsutong, Qomolangma-Tsumachu and Qomolangma-Chuyig. In addition to the fonts, the same team developed a Tibetan input method for Windows Vista and Windows 7 based on Wylie transliteration. This keyboard gives users an alternative to the Microsoft Tibetan keyboard for typing Tibetan in Windows. The fonts and the keyboard will be presented at the seminar, and a CD containing the font files and the keyboard will be distributed there free of charge.
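
To illustrate the basic task a Wylie-based input method performs, the Python sketch below converts space-separated Wylie syllables into Unicode Tibetan by greedy longest match on consonant letters, with each space becoming a tsheg (U+0F0B). It is a hypothetical toy, not the Qomolangma implementation. It handles only syllables that need no stacked consonants (so skad would come out unstacked and therefore wrong), and it ignores punctuation, Sanskrit letters, and the Extended Wylie period disambiguator (as in g.yag).

```python
# Toy Wylie -> Unicode converter (hypothetical; no consonant stacking).
CONSONANTS = {
    "k": "\u0f40", "kh": "\u0f41", "g": "\u0f42", "ng": "\u0f44",
    "c": "\u0f45", "ch": "\u0f46", "j": "\u0f47", "ny": "\u0f49",
    "t": "\u0f4f", "th": "\u0f50", "d": "\u0f51", "n": "\u0f53",
    "p": "\u0f54", "ph": "\u0f55", "b": "\u0f56", "m": "\u0f58",
    "ts": "\u0f59", "tsh": "\u0f5a", "dz": "\u0f5b", "w": "\u0f5d",
    "zh": "\u0f5e", "z": "\u0f5f", "'": "\u0f60", "y": "\u0f61",
    "r": "\u0f62", "l": "\u0f63", "sh": "\u0f64", "s": "\u0f66", "h": "\u0f67",
}
# Inherent "a" has no mark; other vowels map to combining vowel signs.
VOWEL_SIGNS = {"a": "", "i": "\u0f72", "u": "\u0f74", "e": "\u0f7a", "o": "\u0f7c"}
LETTER_A = "\u0f68"  # base letter for vowel-initial syllables
TSHEG = "\u0f0b"     # intersyllabic mark; a space in Wylie


def syllable_to_unicode(syl: str) -> str:
    out = []
    if syl and syl[0] in VOWEL_SIGNS:
        out.append(LETTER_A)  # e.g. "om" is written on the letter A
    pos = 0
    while pos < len(syl):
        if syl[pos] in VOWEL_SIGNS:
            out.append(VOWEL_SIGNS[syl[pos]])
            pos += 1
            continue
        for n in (3, 2, 1):  # greedy: try "tsh" before "ts" before "t"
            chunk = syl[pos:pos + n]
            if len(chunk) == n and chunk in CONSONANTS:
                out.append(CONSONANTS[chunk])
                pos += n
                break
        else:
            raise ValueError(f"unsupported Wylie at {syl[pos:]!r}")
    return "".join(out)


def wylie_to_unicode(text: str) -> str:
    return TSHEG.join(syllable_to_unicode(s) for s in text.split())


print(wylie_to_unicode("bod yig"))  # བོད་ཡིག
```

A production input method must also decide when a consonant becomes subjoined, which depends on Tibetan syllable-structure rules (prefixes, superscripts, subscripts) that this sketch omits.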

Tsering Gya (presenter) and Dbangphyug Tsering
Qinghai Normal University
“Research on a Standard for POS Tagging of Contemporary Tibetan for TIP”

The Provincial Key Laboratory of Tibetan Information Processing and Machine Translation at Qinghai Normal University has devised Part of Speech (POS) Tagging Norms for Tibetan Information Processing (TIP) based on automatic segmentation and manual analysis of a large-scale Tibetan corpus. To support the two goals of automatic segmentation and automatic tagging of the Tibetan corpus, the norms were developed by first dividing words into function words ("empty words") and content words, and then determining the major categories. The major categories were then divided into subcategories and into subclasses at various depths.
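
The classification described above forms a tree: content versus function words at the top, then major categories, then subcategories at varying depths. The Python sketch below shows one way such a hierarchical tag set might be stored and collapsed to a coarser level, for instance to evaluate a tagger at the major-category level. Every tag name is hypothetical and is not taken from the Qinghai Normal University norms; the Wylie tokens are assumed to be already segmented.

```python
from typing import Optional

# Hypothetical tag hierarchy (child -> parent); NOT the published norms.
PARENT: dict[str, Optional[str]] = {
    "content": None,   # top level: content words
    "function": None,  # top level: function ("empty") words
    "n": "content",    # noun
    "nr": "n",         # personal name
    "ns": "n",         # place name
    "v": "content",    # verb
    "a": "content",    # adjective
    "k": "function",   # case particle
    "kg": "k",         # genitive particle
    "ka": "k",         # agentive particle
}


def path(tag: str) -> list[str]:
    """Ancestry of a tag from the top level down, e.g. kg -> [function, k, kg]."""
    chain = []
    node: Optional[str] = tag
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain[::-1]


def coarsen(tag: str, depth: int) -> str:
    """Map a fine tag to its ancestor at the given depth (0 = top level)."""
    chain = path(tag)
    return chain[min(depth, len(chain) - 1)]


tagged = [("bod", "ns"), ("kyi", "kg"), ("rgyal po", "n")]
print([(w, coarsen(t, 1)) for w, t in tagged])
# [('bod', 'n'), ('kyi', 'k'), ('rgyal po', 'n')]
```

Storing only parent links keeps the tag set easy to extend with deeper subclasses without changing code that works at coarser levels.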

last updated: 27 May 2012