Proceedings of the
Tibetan Information Technology Panel
(Paper Abstracts)
Robert R. Chilton
Asian Classics Input Project (ACIP)
Sorting Unicode Tibetan Data: From Collation Algorithm to Implementation
Support for collating Unicode Tibetan in culturally expected order appeared in Mimer SQL in 2004 and in Microsoft Windows Vista in 2006. Both Mimer and Microsoft implement a "standard" collation schema for Tibetan-script data; however, because the two implementations differ technically, the sort orders they produce are not identical. The two implementations are described and compared. The Common Locale Data Repository (CLDR) information for "standard" Tibetan collation (which serves for both Tibetan and Dzongkha) is presented and explained. A standalone cross-platform applet that implements the Mimer/CLDR collation will be distributed.
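As a small illustration of why a dedicated collation is needed at all, the sketch below contrasts plain code point order with a toy ordering by root letter, the principle followed by Tibetan dictionaries. It is not the Mimer, Microsoft, or CLDR algorithm: the `ROOT_LETTER` table and the `toy_key` function are hypothetical, hand-labelled for three sample syllables only.

```python
# Toy illustration only: NOT the Mimer, Microsoft, or CLDR collation.
# Tibetan dictionaries file a syllable under its root letter, so the
# prefixed syllable bka' belongs with ka, not with ba.

KA = "\u0f40"                 # ka
GA = "\u0f42"                 # ga
BKA = "\u0f56\u0f40\u0f60"    # bka' (prefix ba + root ka + suffix 'a)

# Hand-labelled root letters for the sample syllables (assumption for this demo;
# a real implementation must derive the root letter from the syllable itself).
ROOT_LETTER = {KA: KA, GA: GA, BKA: KA}


def toy_key(syllable):
    """Sort first by root letter, then by the raw syllable string."""
    return (ROOT_LETTER[syllable], syllable)


words = [BKA, GA, KA]

# Plain string comparison orders by code point, so bka' lands after ga.
by_code_point = sorted(words)

# Ordering by root letter places bka' in the ka section, before ga.
by_root_letter = sorted(words, key=toy_key)
```

With these three syllables, code point order yields ka, ga, bka', while the root-letter key yields ka, bka', ga, which is the order a reader of a Tibetan dictionary would expect.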
Paul G. Hackett
Columbia University
The Use of yig-cha and chos-kyi-rnam-grangs in Computing Lexical Cohesion for Tibetan Topic Boundary Detection
To properly implement even a simple Tibetan Information Retrieval (IR) system, segmentation of one form or another (n-gram, POS-tagging, dictionary substring matching, etc.) must be performed. To take Tibetan indexing to a more sophisticated level, however, some form of topic detection must be employed. This paper reports the results of applying one technique for topic boundary detection, Lexical Cohesion, to Tibetan. The resources developed and deployed, the theoretical model used, and its potential applications are discussed.
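The general idea behind lexical-cohesion boundary detection can be sketched as follows. This is a generic block-comparison sketch, not the model used in the paper: it assumes the text has already been segmented into tokens, and the function names (`cosine`, `gap_scores`, `weakest_gap`) and the `window` parameter are illustrative choices.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def gap_scores(tokens, window=3):
    """Score each gap by the similarity of the token blocks on either side."""
    scores = []
    for gap in range(window, len(tokens) - window + 1):
        left = Counter(tokens[gap - window:gap])
        right = Counter(tokens[gap:gap + window])
        scores.append((gap, cosine(left, right)))
    return scores


def weakest_gap(tokens, window=3):
    """Return the gap index where lexical cohesion is lowest."""
    return min(gap_scores(tokens, window), key=lambda pair: pair[1])[0]


# Placeholder tokens: two "topics" with disjoint vocabulary.
sample = ["a", "b", "a", "b", "a", "b", "x", "y", "x", "y", "x", "y"]
boundary = weakest_gap(sample, window=3)
```

In this placeholder input the vocabulary changes completely after the sixth token, so the lowest-cohesion gap falls at index 6. Real Tibetan text would first need the segmentation step described above, and curated lexical resources could replace the raw token counts.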
William A. Magee
Dharma Drum Buddhist College
Tibetan in the Virtual Classroom: Displaying Complex Scripts in Second Life and OpenSim
An important part of the "Hopkins Tibetan Treasures Research Archive" (http://haa.ddbc.edu.tw) is the development of Tibetan language-learning systems for use in the virtual environments of Second Life and OpenSim. This paper describes the methodological difficulties encountered and the solutions adopted in completing this innovative aspect of the Hopkins Archive project. Leaving aside programming details and pedagogical theory, the main technical problem to overcome was the display of complex scripts in the virtual classroom. To group the solutions, the project pursued two technical paradigms: Fixed-content Language Display and Real-time Input Language Display. This paper describes seven specific tools, built on these two paradigms, that were developed to teach the Tibetan language inside Second Life and OpenSim. The project also adopted generic designs for writing and displaying languages with complex and unique scripts (such as most Buddhist research languages: Sanskrit, Chinese, Japanese, etc.). Working from the templates created for the virtual Tibetan classroom, scholars of Sanskrit, Chinese, and so forth can readily design their own language locales. The paper concludes with a discussion of virtual classroom desiderata and a call for a consortium of universities to create and host a virtual environment for teaching the Buddhist studies research languages in Second Life and OpenSim.
Tashi Tsering
China Center for Tibetology
Introducing Qomolangma Tibetan Unicode Fonts and Qomolangma Wylie Input Method for Windows Vista and Windows 7
Over the past two years, Tibetan computer experts from CTRC and Tibetan University have designed and created ten new Unicode Tibetan fonts, which cover all types of traditional Tibetan script styles, including Drutsa and Chuyig. Each font's name indicates its type: Qomolangma-Uchen Sarchen, Qomolangma-Uchen Sarchung, Qomolangma-Uchen Sutung, Qomolangma-Uchen Suring, Qomolangma-Drutsa, Qomolangma-Betsu, Qomolangma-Tsuring, Qomolangma-Tsutong, Qomolangma-Tsumachu and Qomolangma-Chuyig. In addition to the fonts, the same team developed a Tibetan input method for Windows Vista and Windows 7 based on Wylie transliteration. This keyboard gives users another option for typing Tibetan on Windows, alongside the Microsoft Tibetan keyboard. The fonts and the keyboard will be presented at the seminar, and a CD containing the font files and the keyboard will be handed out free of charge.
Tsering Gya (presenter) and Dbangphyug Tsering
Qinghai Normal University
Research on a Standard for POS Tagging of Contemporary Tibetan for TIP
The Provincial Key Laboratory of Tibetan Information Processing and Machine Translation at Qinghai Normal University has devised Part of Speech (POS) Tagging Norms for Tibetan Information Processing (TIP), based on automatic segmentation and manual analysis of a large-scale Tibetan corpus. To support the two goals of automatic segmentation and automatic tagging of the Tibetan corpus, the norms were developed by first distinguishing function ("empty") words from content words and then determining the major categories. The major categories were then divided into sub-categories and into subclasses of varying depth.
last updated: 27 May 2012