IATS X

Oxford


6-12 September 2003




Hosted by: St Hugh’s College, Oxford

Tibetan Information Technology Panel
(Paper Abstracts)

Robert R. Chilton
Asian Classics Input Project (ACIP)
“Sorting Unicode Tibetan using a Multi-Weight Collation Algorithm”

This paper considers how to sort Unicode Tibetan data using a well-understood and widely used collation methodology ("International String Ordering"). An algorithm for sorting Tibetan-script data is an essential component of any computer system that fully supports Tibetan-script languages. The paper describes a standard method for implementing collation of Unicode Tibetan data with a collation element table in which each collation element is assigned sort weights at multiple levels.
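The multi-level idea can be illustrated with a minimal Python sketch. The weights in `TABLE` are hypothetical toy values, not those of ISO/IEC 14651 or of the paper's table, and the sketch ignores the identification of root letters, prefixes and superscripts on which real Tibetan ordering depends. It shows only how a sort key is assembled level by level (all primary weights, then all secondary, then all tertiary), so that a lower-level difference decides the order only when the higher levels tie.

```python
# Toy collation element table: character -> (primary, secondary, tertiary).
# All weights are hypothetical, chosen only to illustrate the mechanism.
# A weight of 0 means "ignorable at this level".
TABLE = {
    "\u0F40": (0x20, 0x05, 0x02),  # TIBETAN LETTER KA
    "\u0F41": (0x21, 0x05, 0x02),  # TIBETAN LETTER KHA
    "\u0F42": (0x22, 0x05, 0x02),  # TIBETAN LETTER GA
    "\u0F72": (0x30, 0x05, 0x02),  # TIBETAN VOWEL SIGN I
    "\u0F74": (0x31, 0x05, 0x02),  # TIBETAN VOWEL SIGN U
    "\u0F0B": (0x00, 0x06, 0x02),  # TIBETAN MARK INTERSYLLABIC TSHEG
    "\u0F0C": (0x00, 0x06, 0x03),  # TIBETAN MARK DELIMITER TSHEG BSTAR
}
LEVELS = 3


def sort_key(text: str) -> tuple[int, ...]:
    """Build a multi-level sort key: primary weights, then secondary, then tertiary.

    A 0 separator closes each level; since it is lower than any real weight,
    a string that is a prefix of another at some level sorts first.
    """
    elements = [TABLE[ch] for ch in text]  # KeyError for characters outside the toy table
    key: list[int] = []
    for level in range(LEVELS):
        key.extend(e[level] for e in elements if e[level] != 0)
        key.append(0)
    return tuple(key)


words = ["\u0F42", "\u0F40\u0F74", "\u0F41", "\u0F40\u0F0C",
         "\u0F40\u0F72", "\u0F40", "\u0F40\u0F0B"]
for w in sorted(words, key=sort_key):
    print(ascii(w), sort_key(w))
```

With these toy weights the order is KA, KA + tsheg, KA + tsheg bstar, KI, KU, KHA, GA: the two tsheg marks are ignorable at the primary level, separate the tsheg-final strings from bare KA only at the secondary level, and differ from each other only at the tertiary level.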

Paul G. Hackett
Columbia University
“An Entropy-based Assessment of the Unicode Encoding for Tibetan”

This paper analyzes the Unicode encoding scheme for Tibetan from the standpoint of morphological entropy. Two levels of entropy can be distinguished in Tibetan: syllable-level entropy (a measure of the probability of the sequential occurrence of syllables) and letter-level entropy (a measure of the probability of the sequential occurrence of letters). Syllable-level entropy is a purely statistical quantity that depends on the domain of the literature sampled, whereas letter-level entropy is relatively domain-independent. Letter-level entropy can be estimated statistically, but a theoretical upper bound can also be postulated on the basis of language-dependent morphological rules. This paper presents both theoretical and statistical estimates of letter-level entropy for Tibetan, and examines the Tibetan Unicode encoding scheme in light of an entropy-based language model, with respect to coding ambiguity, data compression, and related issues.
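As a rough illustration of the statistical side of such an analysis, the sketch below computes two plug-in estimates from a sample: the unigram entropy H(X) = -Σ p(x) log2 p(x), and the conditional entropy H(X_i | X_{i-1}) of a symbol given the one before it. Treating each Unicode code point as one "letter" is an assumption made here for simplicity; the paper's definition of a letter and its estimators may differ, and plug-in estimates like these are biased on small samples.

```python
import math
from collections import Counter
from typing import Sequence


def unigram_entropy(symbols: Sequence) -> float:
    """Plug-in estimate of H(X) in bits per symbol."""
    counts = Counter(symbols)
    n = sum(counts.values())
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def conditional_entropy(symbols: Sequence) -> float:
    """Plug-in estimate of H(X_i | X_{i-1}) in bits, from adjacent pairs."""
    pairs = Counter(zip(symbols, symbols[1:]))
    prev = Counter(symbols[:-1])  # counts of each symbol in the "previous" position
    n = sum(pairs.values())
    h = 0.0
    for (a, _b), c in pairs.items():
        p_pair = c / n                 # p(a, b)
        p_next_given_prev = c / prev[a]  # p(b | a)
        h -= p_pair * math.log2(p_next_given_prev)
    return h


# Each code point is one symbol here (an assumption); a real study would use a large corpus.
sample = "\u0F56\u0F7C\u0F51\u0F0B"  # bod + tsheg ("Tibet")
print(unigram_entropy(sample), conditional_entropy(sample))
```

Because stacked consonants are encoded in Unicode as a base letter followed by subjoined letters, the choice of symbol unit (code point, letter, or stack) directly changes these estimates, which is one way the encoding scheme enters such an analysis.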

David Newman, et al.
University of Virginia
“The Use of Technology in Representing Cultural Geography in Tibetan Studies: GIS, XML and Flash”

This paper will discuss technical issues in using GIS technologies to combine a variety of data sets into an integrated GIS model for the collaborative online publication of place studies on Tibet. We will focus on the creation of an XML DTD for a gazetteer and its application to Tibetan places, including how to handle variant toponyms, the relationship between contemporary and historical toponyms and geographical regions, and the relationship between administrative and ethno-linguistic regions. We will also discuss, more briefly, the problem of linking such textual resources on Tibetan places to GIS-based digital maps of Tibet that display broad statistical data. The specific technical model illustrated uses Flash to display XML data sets, but the emphasis will be on how this approach provides a comprehensive solution for documenting Tibetan places.
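A gazetteer that records variant toponyms lends itself to a simple name index. The fragment below is hypothetical: the element and attribute names (`place`, `name`, `region`, `type`, `lang`) are invented for illustration and are not the project's DTD. The sketch uses Python's standard `xml.etree.ElementTree` to map every recorded name, standard or variant, to the place it identifies, and to group a place's regions by type (administrative versus traditional).

```python
import xml.etree.ElementTree as ET

# Hypothetical gazetteer fragment; element and attribute names are illustrative,
# not taken from the project's actual DTD.
GAZETTEER = """
<gazetteer>
  <place id="p1">
    <name type="standard" lang="en">Lhasa</name>
    <name type="variant" lang="bo-Latn">lha sa</name>
    <name type="variant" lang="zh-Latn">Lasa</name>
    <region type="administrative">Tibet Autonomous Region</region>
    <region type="traditional">dbus</region>
  </place>
</gazetteer>
"""


def build_name_index(root: ET.Element) -> dict[str, set[str]]:
    """Map each case-folded name (standard or variant) to the ids of places bearing it."""
    index: dict[str, set[str]] = {}
    for place in root.iter("place"):
        for name in place.findall("name"):
            key = name.text.strip().casefold()
            index.setdefault(key, set()).add(place.get("id"))
    return index


def regions_by_type(root: ET.Element, place_id: str) -> dict[str, list[str]]:
    """Group the regions of one place by their type attribute."""
    place = root.find(f".//place[@id='{place_id}']")
    if place is None:
        raise KeyError(place_id)
    grouped: dict[str, list[str]] = {}
    for region in place.findall("region"):
        grouped.setdefault(region.get("type"), []).append(region.text.strip())
    return grouped


root = ET.fromstring(GAZETTEER.strip())
index = build_name_index(root)
print(index["lha sa"])
print(regions_by_type(root, "p1"))
```

Mapping each name to a set of ids, rather than a single id, leaves room for the same toponym to identify more than one place.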

Tashi Tsering
University of Virginia
“On the Design of a Tibetan Font Converter”

Because many different encoding schemes for Tibetan script are in use in the community, there is a strong need for converters that translate files between these encodings. This paper discusses some of the issues involved in designing and developing a Tibetan font converter, and reports on the author's work toward a more comprehensive and more efficient converter.
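One straightforward design for such a converter is a mapping table applied with greedy longest-match lookup, so that a multi-glyph sequence representing a stack is converted as a unit before its parts are tried individually. The sketch below assumes a made-up legacy encoding in which ASCII characters stand for font glyphs; every table entry is hypothetical and is not taken from any real Tibetan font encoding or from the author's converter.

```python
# Hypothetical legacy-glyph -> Unicode table (illustrative only).
LEGACY_TO_UNICODE = {
    "k": "\u0F40",         # TIBETAN LETTER KA
    "K": "\u0F41",         # TIBETAN LETTER KHA
    "g": "\u0F42",         # TIBETAN LETTER GA
    "S": "\u0F66",         # TIBETAN LETTER SA
    "Sk": "\u0F66\u0F90",  # SA + TIBETAN SUBJOINED LETTER KA (a stack drawn with two glyphs)
    "-": "\u0F0B",         # TIBETAN MARK INTERSYLLABIC TSHEG
}


def convert(text: str, table: dict[str, str] = LEGACY_TO_UNICODE) -> str:
    """Convert legacy text to Unicode by greedy longest-match table lookup."""
    max_len = max(len(k) for k in table)
    out: list[str] = []
    i = 0
    while i < len(text):
        # Try the longest possible key first so stacks win over their parts.
        for length in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + length]
            if chunk in table:
                out.append(table[chunk])
                i += length
                break
        else:
            raise ValueError(f"no mapping for {text[i]!r} at position {i}")
    return "".join(out)


print(ascii(convert("Sk-k")))
```

Because the lookup always tries the longest key first, `"Sk"` yields the stack SA + subjoined KA rather than SA followed by a full KA. A production converter could compile the table into a trie for speed and would need a second table for the reverse direction; the plain loop keeps the idea visible.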

Christopher E. Walker
University of Chicago
“Lhasa Verb Database for Language Learning”

This paper explores the production issues, in both information technology and verb classification, raised by the Lhasa Verbs project. While other major languages abound with instructional CD-ROMs aimed at language students, such learning tools have been slow to appear for Tibetan. On a pragmatic level, the Lhasa Verbs CD-ROM is an attempt to remedy this situation by providing an interactive database with audio and query functions. Beyond this, the database raises important questions about the future of Tibetan lexical items in new forms of information technology. If the traditional and limited categories of tha dad pa and tha mi dad pa are expanded into the richer schemes that non-native learners need, how can we work towards a consensus on classification that will aid computational analysis? If intentionality, as expressed through verbal auxiliaries, is a crucial feature of the central dialect, how do we reconcile it with syntactic frames based on studies of the classical language? The Lhasa Verbs project also points to future directions in Tibetan input methods, the display of Unicode in databases, the use of classroom video, and ways in which technology can highlight characteristics of the Tibetan language that traditional pedagogy has overlooked (tones, vowel length). This paper argues that as Unicode emerges as the standard for text encoding, we should also push for greater standardization of Tibetan lexical classification, encompassing colloquial discourse.
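To make the classification question concrete, the sketch below stores a few verbs with two independent features, the traditional tha dad pa / tha mi dad pa distinction (often glossed as transitive / intransitive) and a volitional flag, in an in-memory SQLite database, and queries on both. The schema, column names and three sample entries are illustrative assumptions, not the Lhasa Verbs project's data model.

```python
import sqlite3

# Illustrative schema: transitivity and volitionality are separate columns,
# so queries can combine them freely.
SCHEMA = """
CREATE TABLE verbs (
    wylie        TEXT PRIMARY KEY,  -- Wylie transliteration
    gloss        TEXT NOT NULL,
    transitivity TEXT NOT NULL CHECK (transitivity IN ('tha dad pa', 'tha mi dad pa')),
    volitional   INTEGER NOT NULL   -- 1 = volitional, 0 = non-volitional
)
"""

SAMPLE = [
    ("za", "eat", "tha dad pa", 1),
    ("'gro", "go", "tha mi dad pa", 1),
    ("na", "be ill", "tha mi dad pa", 0),
]


def open_db() -> sqlite3.Connection:
    """Create an in-memory database loaded with the sample verbs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO verbs VALUES (?, ?, ?, ?)", SAMPLE)
    return conn


def find_verbs(conn: sqlite3.Connection, transitivity: str, volitional: bool) -> list[str]:
    """Return Wylie forms matching both features, in alphabetical order."""
    rows = conn.execute(
        "SELECT wylie FROM verbs WHERE transitivity = ? AND volitional = ? ORDER BY wylie",
        (transitivity, int(volitional)),
    )
    return [r[0] for r in rows]


conn = open_db()
print(find_verbs(conn, "tha mi dad pa", volitional=True))
```

Storing each feature in its own column, rather than folding them into a single class label, is one way to let a finer-grained scheme coexist with the traditional two-way classification.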

last updated: 27 May 2012