Tibetan Information Technology Panel
(Paper Abstracts)
Robert R. Chilton
Asian Classics Input Project (ACIP)
Sorting Unicode Tibetan using a Multi-Weight Collation Algorithm
This paper considers the question of how to sort Unicode Tibetan data
using a collation methodology ("International String Ordering") that is
well understood and in widespread use. An algorithm for sorting
Tibetan-script data is an essential component of any computer system
which fully supports Tibetan-script languages. The paper describes a
standard method for implementing collation of Unicode Tibetan data using
a collation element table in which the collation elements are assigned
multiple levels of sort weight.
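As a rough illustration of the idea, the following minimal Python sketch builds multi-level sort keys from a collation element table. The table `TOY_TABLE`, the function `sort_key`, and every weight value are hypothetical and invented for this example; they are not taken from the paper and do not reflect real Tibetan collation order. Each character maps to (primary, secondary, tertiary) weights, and the key lists all primary weights first, then all secondary, then all tertiary, so that lower-level differences matter only when higher levels tie.

```python
from typing import Dict, List, Tuple

Weights = Tuple[int, int, int]  # (primary, secondary, tertiary)

# Hypothetical collation element table. The weights are illustrative only.
TOY_TABLE: Dict[str, Weights] = {
    "\u0F40": (1, 1, 1),  # TIBETAN LETTER KA
    "\u0F41": (2, 1, 1),  # TIBETAN LETTER KHA
    "\u0F42": (3, 1, 1),  # TIBETAN LETTER GA
    "\u0F72": (0, 2, 1),  # TIBETAN VOWEL SIGN I (primary weight 0 in this toy table)
    "\u0F74": (0, 3, 1),  # TIBETAN VOWEL SIGN U (primary weight 0 in this toy table)
}


def sort_key(text: str, table: Dict[str, Weights] = TOY_TABLE) -> Tuple[int, ...]:
    """Build a multi-level sort key for text.

    Raises KeyError for characters missing from the table; a fuller
    implementation would need a fallback rule for such characters.
    """
    elements = [table[ch] for ch in text]
    key: List[int] = []
    for level in range(3):
        if level > 0:
            key.append(0)  # level separator, lower than any real weight
        # Zero weights are ignorable at this level and are skipped.
        key.extend(w[level] for w in elements if w[level] != 0)
    return tuple(key)


words = ["\u0F41", "\u0F40\u0F74", "\u0F40", "\u0F40\u0F72"]
ordered = sorted(words, key=sort_key)
```

With these invented weights, the bare letter KA sorts before KA with a vowel sign, and both sort before KHA, because the primary weights are compared before any secondary weight is consulted.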
Paul G. Hackett
Columbia University
An Entropy-based Assessment of the Unicode Encoding for Tibetan
This paper presents an analysis of the Unicode encoding scheme for Tibetan
from the standpoint of morphological entropy. We can speak of two levels
of entropy in Tibetan: syllable-level entropy (a measure of the probability
of the sequential occurrence of syllables), and letter-level entropy
(a measure of the probability of the sequential occurrence of letters).
Syllable-level entropy is a purely statistical calculation that is a
function of the domain of the literature sampled, while letter-level
entropy is relatively domain-independent. Letter-level entropy can be
calculated statistically, though a theoretical upper bound can also
be postulated based on language-dependent morphology rules. This paper
presents both theoretical and statistical estimates of letter-level
entropy for Tibetan, and explores the Tibetan Unicode encoding scheme
in relation to coding ambiguity, data compression, and other issues
analyzed in light of an entropy-based language model.
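The abstract does not state the formulas used, so the following is only a minimal sketch, assuming the standard Shannon definitions. The function names `unigram_entropy` and `conditional_entropy` are hypothetical. The first estimates the entropy of the letter distribution on its own; the second estimates the entropy of the next letter given the previous one, which is one common way to capture "sequential occurrence". Both take any sequence of symbols, such as a string of letters or a list of syllables.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def unigram_entropy(symbols: Sequence[Hashable]) -> float:
    """Shannon entropy, in bits per symbol, of the symbol distribution."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def conditional_entropy(symbols: Sequence[Hashable]) -> float:
    """Estimate H(next | previous), in bits, from adjacent symbol pairs."""
    pairs = Counter(zip(symbols, symbols[1:]))
    firsts = Counter(symbols[:-1])  # how often each symbol starts a pair
    total = sum(pairs.values())
    h = 0.0
    for (prev, _nxt), count in pairs.items():
        p_pair = count / total         # joint probability of the pair
        p_cond = count / firsts[prev]  # probability of next given previous
        h -= p_pair * math.log2(p_cond)
    return h


# Placeholder sample: two symbols in strict alternation.
sample = "abab"
h_unigram = unigram_entropy(sample)   # two equally likely symbols
h_cond = conditional_entropy(sample)  # next symbol is fully determined
```

In the placeholder sample the two symbols are equally frequent, so the unigram entropy is one bit, while the conditional entropy is zero because each symbol fully determines its successor. Estimates on real text depend on the corpus sampled.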
David Newman, et al.
University of Virginia
The Use of Technology in Representing Cultural Geography in Tibetan
Studies: GIS, XML and Flash
This paper discusses technical issues surrounding the use of GIS
technologies to integrate a variety of data sets into a single
GIS model for the collaborative publication of place studies
on Tibet on the Internet. We will focus on the creation of an XML DTD
for a gazetteer and how we have utilized it for Tibetan places, including
issues of how to deal with variant toponyms, the relationship between
contemporary and historical toponyms and geographical regions, the relationship
between administrative and ethno-linguistic regions and so forth. In
addition, we will more briefly discuss the problem of relating such
textual resources on Tibetan places to GIS-based digital maps of Tibet
that show broad statistical data. The specific technical model illustrated
uses Flash to display XML data sets, but the focus will be
on how this provides a comprehensive solution to the documentation
of Tibetan places.
Tashi Tsering
University of Virginia
On the Design of a Tibetan Font Converter
Since many different encoding schemes for Tibetan script are in use
in the community, there is a strong need for tools that convert
files between these encodings. This paper discusses some of the issues
related to the design and development of a Tibetan font converter, and
reports on the work of the author in developing a more comprehensive
and more efficient converter.
Christopher E. Walker
University of Chicago
Lhasa Verb Database for Language Learning
This paper explores the production issues pertaining to both information
technology and verbal classifications in the Lhasa Verbs project. While
other major languages abound with instructional CD-ROMs aimed at the
language student, Tibetan has been late to see such learning
tools. On a pragmatic level, the Lhasa Verbs CD-ROM is an attempt to remedy
this situation by providing an interactive database including audio and
query functions. But more than that, this database raises important
questions as to the future of Tibetan lexical items in new forms of
information technology. By expanding the traditional and limited categories
of tha dad pa and tha mi dad pa into richer schemes necessary for the
non-native learner, how can we work towards a consensus of classification
which will aid in computational analysis? If intentionality as expressed
through verbal auxiliaries is a crucial feature of the central dialect, how
do we merge the syntactic frames based on studies of the classical language?
The Lhasa Verbs project also points to future directions with Tibetan input
methods, the display of Unicode in databases, the use of classroom video,
and ways in which technology can be employed to highlight those
characteristics of the Tibetan language that traditional pedagogy has
overlooked (tones, vowel length). This paper argues that as Unicode emerges
as the standard of text encoding, we should also push for greater standards
of Tibetan lexical classification encompassing colloquial discourse.
Last updated: 27 May 2012