Full-Text Electronic Databases of Pre-modern Japanese Literature

By Satoko Shimazaki


This project seeks to introduce significant electronic full-text databases for pre-modern Japanese literature. There are numerous on-line sites reproducing classical works today, but databases that are useful for researchers in Japanese studies are rare. Since e-texts cannot substitute for annotated scholarly print editions, their major utility is the capacity to search the texts for specific words and word combinations. Those with full Japanese language capability on their computers can, of course, simply download the texts and then use the search functions in their word-processing programs. For such users, section 4 below lists full-text databases that lack search functions of their own.

Everyone else will need to rely on the search functions provided by the sites themselves, which are the focus of this report: specifically, the Japanese Text Initiative at the University of Virginia, the various electronic databases at Kokubungaku Kenkyû Shiryôkan (NIJL), and the Kichô shiryô gazô at Kyoto University, which offers digitized images of original manuscripts. Online full texts of Japanese literature are still fairly new, and there are still many inconveniences for users. But e-texts are a rapidly expanding field, which is sure to bring about significant changes in textual research methods.

Each site described below is divided into the following four sections:

Introduction: a general overview of the project.
Basic information: background information on the site and its purpose.
Content: features of its texts and search functions.
Future prospects: plans for improving the site and its longer-term goals.


1. Introduction: Index of E-texts
2. Searchable Full-Text Databases
    A) Japanese Text Initiative
    B) Kokubungaku kenkyû shiryôkan (NIJL)
        a. NKBT Full-Text Database
        b. Nijûichi-dai shû Database
        c. Renga Database
3. Imaged Manuscripts
4. Other On-line Texts for Japanese Literature


The best starting point for locating particular electronic text versions of classical (but not modern) Japanese literature texts is the PMJS index of "Translations of Classical Japanese Works."   This crucial resource lists translations into European languages of all major classical works, as well as where they can be found in electronic form, either on the Web or CD-ROM.



Japanese Text Initiative
URL: http://etext.lib.virginia.edu/japanese

Introduction: The Japanese Text Initiative (JTI) is presently the most important site for pre-modern electronic texts. JTI was created for research purposes and contains a search function that allows one to look for any word or character in the entire JTI collection, either by itself or in a selected context. As JTI is primarily intended for English-speaking scholars, users are unlikely to face technical problems in downloading the materials as long as their interface can read and input Japanese characters. Although the selection of texts is still limited at present, the database is constantly being updated with new materials. The project has received international acclaim: JTI was named the winner of the second annual Digital Archives Award by Digital Frontier Kyoto, Japan, in October 2000. With the inclusion of a greater number of texts, JTI will become an indispensable site for scholars of pre-modern Japanese literature.

Basic information:  JTI is an ongoing collaborative electronic project between the University of Virginia library and the University of Pittsburgh library, based at the University of Virginia’s Electronic Text Center. Scholars in both the United States and Japan have participated in this project, including Professors J. Thomas Rimer and Mae J. Smethurst of the University of Pittsburgh and Professor Lewis Cook of Queens College.

The site intends to provide on-line web texts of both classical and modern Japanese literature in Japanese characters. The e-texts are intended not as a substitute for the current printed editions but as a way to add search capabilities for words and characters, overcoming the limitations of hard copies. The site was originally aimed at English-speaking scholars and students, and thus English translations are provided for works where possible. Materials download in HTML format, and it is therefore possible to search in both English and Japanese.

As of December 2000, JTI includes nearly 44 works, mostly pre-modern texts, along with scholars’ introductions. The site is constantly updated. It is the most rapidly growing database of on-line Japanese texts with a comprehensive word search function.

Content: The key features of this site are the useful organization of the texts and the rapid and comprehensive search functions.

Each text is visually very well structured. For long works such as the Tale of Genji, a small frame with a table of contents appears on the right-hand side while the text itself is displayed in the larger window. No table of contents is provided for shorter works; instead, the text appears with a break between chapters.

Source information and annotations on the text are provided for every work. Each text contains an acknowledgment section, which identifies who input, proofread, and tagged the text, as well as an editorial note indicating the manuscript used to reproduce the work on-line and giving links to other sites where the work is also available. Some materials, such as Kokinshû, contain an introduction in which scholars give an overview of the work and bibliographical information. The most complete section in JTI is the Noh database, where notes, English translations, a list of technical terms in Japanese, and a glossary of Japanese Noh terms are provided.

What distinguishes JTI from other electronic literature text sites is that it permits rapid word or character searches. Two methods of search are possible: simple search and compound search. In the simple search, one can input a word or character and choose to search within the full text or within the bibliographical information. As an option, the search can be limited to all the works of a particular author or to a single work. The compound search function, in addition to the options offered in simple search, allows one to look for a character, character string, or word near another character or string. Thus one may conduct a search for word X near word Y within a range of 40, 80, or 120 characters.
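For readers who have downloaded a text and want to reproduce a "word X near word Y" search locally, the idea can be sketched in a few lines of Python. This is only an illustration, not JTI's implementation: the function name `near_matches`, the choice to measure distance between the starting positions of the two strings, and the sample line are all assumptions made for the example.

```python
def near_matches(text, x, y, window=40):
    """Return (x_pos, y_pos) pairs where y starts within `window`
    characters of where x starts (an assumed definition of "near")."""
    if not x or not y:
        return []
    # Collect every starting position of each search string.
    x_hits = [i for i in range(len(text)) if text.startswith(x, i)]
    y_hits = [i for i in range(len(text)) if text.startswith(y, i)]
    # Keep the pairs that fall inside the chosen range.
    return [(i, j) for i in x_hits for j in y_hits
            if i != j and abs(i - j) <= window]


# Hypothetical sample line, used only to exercise the function.
sample = "春の夜の夢の浮橋とだえして峰にわかるる横雲の空"
print(near_matches(sample, "夢", "峰", window=40))  # [(4, 13)]
print(near_matches(sample, "夢", "峰", window=5))   # []
```

Passing 40, 80, or 120 as `window` mirrors the three ranges offered by the compound search; because Python strings are sequences of Unicode characters, the range is counted in characters rather than bytes.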

In the "Tips for search," JTI states that searches can be performed in Japanese characters, English, Rômaji, or combinations of Japanese and English. Despite this notice, a search tends to return only matches in the exact form in which the word is written in the text. The same word may also appear both in kanji and in its kana equivalent. Therefore, it is safest to try the search in various forms to get the most accurate data.
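The advice to try several written forms can also be followed on a downloaded text. The sketch below is an assumption-laden illustration rather than anything JTI provides: the function name `search_variants` and the sample string are hypothetical, and the list of variant spellings must be supplied by the user, since the code does not convert between kanji, kana, and Rômaji.

```python
def search_variants(text, variants):
    """Map each written form to the list of positions where it occurs."""
    hits = {}
    for form in variants:
        positions = []
        start = text.find(form)
        while start != -1:
            positions.append(start)
            # Continue looking just past the previous match.
            start = text.find(form, start + 1)
        hits[form] = positions
    return hits


# Hypothetical sample text containing both a kana and a kanji spelling.
sample = "はなの色は移りにけり。花の色はうつりにけり。"
print(search_variants(sample, ["花", "はな", "hana"]))
# {'花': [11], 'はな': [0], 'hana': []}
```

Comparing the counts for each form shows at a glance which spellings the text actually uses, so that no occurrence is missed because of a kanji and kana mismatch.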

The results present excerpts of the matching passages and the number of matches in the text. If one has cookies enabled in one’s browser, the selected characters or words appear highlighted in red in the results.

Future prospects: In the short term, JTI intends to put on-line all twenty classical works in J. Thomas Rimer’s A Reader’s Guide to Japanese Literature (New York: Kodansha, 1988). In the longer term, they plan to add pre-twentieth-century works such as Shinkokinshû and the other anthologies of the Hachidaishû, as well as those twentieth-century works in Professor Rimer’s guide that are free from copyright restrictions.


Kokubungaku Kenkyû Shiryôkan
URL: http://www.nijl.ac.jp/databases/databases.htm

Introduction:  The NIJL (National Institute of Japanese Literature = Kokuritsu Kokubungaku Kenkyû Shiryôkan) in Tokyo offers various on-line text databases, including the Iwanami Nihon Koten bungaku taikei, the Nijûichi-dai shû, a collection of renga, the Eiri Genji monogatari, and a collection of texts of modern poetry. Here, I will introduce only the first three databases, since the others are either still under construction or inaccessible to those without a Japanese operating system. All three are provisionally open to the public until they are replaced by a final version. Each has an excellent search function and is, therefore, a useful site for researchers in pre-modern Japanese literature.

The NKBT Database is particularly pioneering in offering the entire digitized text of Iwanami’s Nihon Koten bungaku taikei on the Web. The full text can be downloaded, and the online version offers two search functions: word search and a frequency-ordered list of characters or words. The drawbacks of this database compared to JTI at the University of Virginia are that it is extremely slow and that some characters are not distinguishable on a non-Japanese Windows system. If this improves after the drastic reconstruction scheduled for January 2001, this database will be the most comprehensive one for researchers in pre-modern Japanese literature.

The Nijûichi-dai shû and renga databases, even though both are still in progress, allow rapid word search and downloading, and their text is free from “mojibake” (garbled characters). Since JTI does not include the Nijûichi-dai shû or any renga collections at present, this is the only site with search functions for both sources. The Nijûichi-dai shû database can to a limited extent be used as a substitute for the Kokka taikan by those without access to the CD-ROM version of the latter.


Basic information:  Nihon koten bungaku honbun database (jikken ban) [NKBT-DB] is a site operated by the Kokubungaku kenkyû shiryôkan. It contains the 560 full texts of pre-modern Japanese literature included in the 100-volume Nihon Koten Bungaku Taikei (kyû ban) from Iwanami Shoten. For each work, the site provides the full text, word search, and a list of the letters and characters in the text in frequency order. NKBT-DB is the largest on-line pre-modern text database.

NKBT-DB has been running since April 1999 as a provisional research site. The purpose was to improve the system before releasing the final version on the web. The test version will be suspended from January 12th, 2001 for the construction of the improved version, which will open, possibly, on February 1st, 2001. The present provisional site performs slowly since all interactions are processed via the main computer. In the new system, Kokubungaku kenkyû shiryôkan will move all databases to a dedicated server, which will improve the downloading and search procedures. The renewed version is also expected to correct the present version where necessary.

NKBT-DB is intended solely for research purposes, and thus its users are limited: the site allows access only to researchers, participating scholars at the center, those with special admission, university faculty, librarians, and undergraduate and graduate students. Undergraduate students may use the site only for their thesis projects. Membership is required in order to gain access to the database. It is free at the moment, but this might change after the renewal of the system. Membership can be obtained by filling out an application on-line; within a few weeks, a user ID and a password are sent to the applicant via e-mail. From undergraduate students, Kokubungaku kenkyû shiryôkan requires, in addition to the application, a proposal for the thesis project and a letter of approval from an advisor.

Since it is controlled by the main computer of the Kokubungaku kenkyû shiryôkan, the database is available to users only at restricted times. Downloading of materials and searches can be performed only between 9:30 and 21:00, Japan time, and the main server is also down on Saturdays, Sundays, holidays, over the New Year period, and at the end of each month. This inconvenience should be resolved after the database is transferred to a dedicated server.

Content:  NKBT-DB contains the largest collection of pre-modern texts and has a useful search function; however, characters are often not distinguishable on a non-Japanese operating system, and downloading is extremely slow.

NKBT-DB offers two options for accessing the database: HTML format or Telnet. The HTML format, which is the only option for those using English Windows, is recommended because it is visually easier to read and faster. However, Kokubungaku kenkyû shiryôkan states that the Telnet format includes more comprehensive information. After selecting the format, one must narrow the search by the first hiragana of the title of the text and by the period it belongs to: jôdai, chûko, chûsei, or kinsei. It is only after this procedure that a list of titles appears. In this sense, the site is designed for those with a particular text in mind. An ID and password are required, and one can then choose to download the full text, search for a word in the text, or obtain the list of letters and characters in frequency order.

This database is not designed merely for viewing, as the full text is hard to read due to its small font. Rather, the full text is useful as a supplement after a word search. Also, in this provisional version, it is important to pay attention to the size of a document indicated before downloading, since even the smallest material takes time to download.

Despite these disadvantages, NKBT-DB is still extremely useful due to its excellent search functions. First of all, the letter and character frequency search (moji hindo risuto no sakusei) organizes all words, letters, characters, and symbols in the text in frequency order. Each entry is given with the number of times it appears in the text. At the top of the result, the number of different words, characters, and symbols in the text is indicated, as well as their cumulative total. The word search (goi mata ha moji retsu kensaku to hyôji) can be done only in Japanese characters. After one enters a word, letter, character, or short sentence, the result provides the number of hits in the text with the page and line number of each. The context in which the word, character, or short sentence appears is also shown. This information allows one to return to the text and easily find the intended place.
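Both functions are easy to approximate on a text one has already downloaded. The following Python sketch is an illustration under stated assumptions, not a description of how NKBT-DB works internally: the function names `char_frequency` and `find_in_context`, the decision to ignore whitespace, the `width` parameter, and the sample lines are all hypothetical, and only line numbers are reported because page numbers depend on the printed edition.

```python
from collections import Counter


def char_frequency(text):
    """List (character, count) pairs in descending order of frequency,
    ignoring whitespace."""
    counts = Counter(ch for ch in text if not ch.isspace())
    return counts.most_common()


def find_in_context(lines, query, width=5):
    """Return (line_number, excerpt) for every occurrence of `query`,
    with up to `width` characters of context on each side."""
    results = []
    if not query:
        return results
    for line_no, line in enumerate(lines, start=1):
        pos = line.find(query)
        while pos != -1:
            left = max(0, pos - width)
            right = pos + len(query) + width
            results.append((line_no, line[left:right]))
            pos = line.find(query, pos + 1)
    return results


# Hypothetical sample lines, used only to exercise the functions.
lines = ["いづれの御時にか", "女御更衣あまたさぶらひたまひける中に"]
freq = char_frequency("".join(lines))
print(len(freq), sum(n for _, n in freq))  # distinct characters, total characters
print(find_in_context(lines, "更衣", width=2))  # [(2, '女御更衣あま')]
```

The two numbers printed first correspond to the summary shown at the top of the frequency list (the number of different characters and their cumulative total), and each hit from `find_in_context` pairs a line number with a short excerpt so that the passage can be located in the full text.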

Future prospects: NKBT-DB's slow downloading, which has been the major concern, should improve as soon as the database is switched to a dedicated server. As for the other problem, the minor "mojibake" that appears in English Windows, the only solution seems to be to access the site through a Japanese operating system. As the database was not originally designed for those accessing it from a different interface, it is unlikely that Kokubungaku kenkyû shiryôkan would change the entire system for this purpose.


Basic information: Nijûichi-dai shû database (NDS-DB) contains the full text of the twenty-one major waka collections from Kokin wakashû to Shinshoku kokin wakashû. The source is the Shôho version of the Nijûichi-dai shû held at the Kokubungaku kenkyû shiryôkan. This is still an ongoing project, and the site, at present, is still a provisional one. Unlike the NKBT database, NDS-DB requires no ID or password and is open to the public. The site allows for rapid search, and every character is distinguishable even on a non-Japanese computer.

Content:  NDS-DB is visually very well organized and contains a good word search function.

Word search can be done for the word itself or for the word in a certain context. One can limit the search to a particular collection or search across all twenty-one collections. There is also an option to further limit the word search to prefaces, annotations, poems, or footnotes.

Future Prospects: The completed version of the NDS-DB is almost finished, as the site is now being reviewed by specialists for each collection. Eventually, NDS-DB will provide a bridge to the newest version of the Kokka Taikan by appending the poem numbers. Brief notes on the readings of the poems will also be added as a reference for research. The completed version should allow one to cut, paste, and write into the poems freely after downloading them.


Basic information: Kokubungaku Kenkyû Shiryôkan provides very little information on this database. Basically, this is a site for finding renga by period, key words, or author. It contains a large number of renga from various sources; however, a full list of the texts used to create the site is not given. Source information is provided only in the search results. This site is, at present, also a provisional one. Even so, it performs very well, even on a non-Japanese operating system.

Content:  The most useful feature of the site is its comprehensive search function. One can limit a search by specific year (up to two), by key words (up to two) in the title or the text, and by the author of the hokku or wakiku. The result includes the title, the renga, the authors, the year of composition, and bibliographical information. The word or character searched for is highlighted in the result.


Kichô Shiryô Gazô (Kyôto University)
URL: http://ddb.libnet.kulib.kyoto-u.ac.jp/exhibit/index.html

Introduction:  The Kichô Shiryô Gazô [KSG] database created by Kyôto University is the most significant site for original materials on pre-modern Japanese literature. It has a large collection, reproduced on-line in clear images. In the United States, it is difficult for scholars to gain access to these valuable materials; through this site, however, one can read them on screen by enlarging the image to its original size. Every page of the text is very well scanned, and all characters are distinguishable on-screen. Kyoto University aims eventually to add the full text beside the images to allow for word searches. Once this becomes possible, KSG can be used both to look at the original material and to search for a key term in the text.

Basic information:  KSG is an on-line project owned by Kyoto University. It aims to create a "desktop library" that provides access to the valuable original pre-modern texts owned by the University. The project started in 1996 and has been open to the public since 1998. As of December 2000, Kyoto University has included one hundred and twenty-six full texts.

KSG started with the intention of allowing public access to valuable pre-modern texts and preserving the old materials on the web while the writing is still legible. As a result, the site is open to anyone; however, it is aimed at researchers in pre-modern Japanese studies. Kyoto University has devoted its time to putting as many works as possible on-line rather than to providing commentaries, modern translations, or English translations for the general viewer. The only exception to this policy is Konjaku monogatari shû, which includes a full introduction with a scholar’s interpretation, annotation, and translation.

In putting up the actual pages of the old materials, Kyoto University has paid particular attention to the following three points: 1. making all words decipherable; 2. providing an image in which one can distinguish the original writing from stains or worm-eaten parts, and in which red writing added later is clear; 3. allowing enlargement to view small details, especially in pictorial images. The result is satisfactory. All texts download quickly, and all of the writing is distinguishable when enlarged.

Content:  Each text is well organized and comes with a small window on the left containing the table of contents. The image first appears at a size that fits the screen, but clicking on it enlarges it to the actual size of the document. Each item is provided with bibliographical information, and some include commentaries and an introduction.

There are five ways of finding materials. To get an overview of the whole collection, it is easiest to click on "subete no shiryô wo ichiran suru." Another way is to limit the texts by type: newly added documents (shinki tsuika gazô), national treasures (kokuhô), important cultural properties (jûyô bunkazai), valuable documents (kichô shiryô), materials held by individual departments (bukyoku shozô gazô), and materials for beginners (shoshinsha muke), which include a summary of the content and an introduction. Subject search is also possible by going to "naiyô kara sagasu." If one has a particular title in mind, one can search by the first hiragana sound of the title. Finally, if the work is included in a special collection such as the Ishinshiryô gazô database, zôkeishoin bon, fujikawa bunko, or tanimura bunko, one can jump directly to the index of that collection.

Future Prospects: Kyoto University aims eventually to improve the quality of the pictures so that it can put up works that require a clearer image, such as maps. In the long run, they also hope to include text information and data along with the images. The word search function is another concern: it will become possible in KSG once the texts have been created and the keywords organized. This function, which may take a while to appear, would turn the site into a powerful research tool as well as a significant resource for viewing original materials.


Introduction:  The on-line sources listed here are open to the public but do not have word search functions. In this sense, they are not intended as research tools unless one uses a Japanese interface with a search function. Most of the works provided here have not received a final check by a scholar. Therefore, some information may be less accurate than on the University of Virginia or Kokubungaku Kenkyû Shiryôkan sites.

Nihon Bungaku tô Tekisuto Fairu  (Text Files for Japanese Literature)
URL: http://kuzan.f-edu.fukui-u.ac.jp/bungaku.htm

This site, maintained by Okajima Akihiro of Fukui University, provides links to downloadable pre-modern Japanese texts and itself contains a large private collection. Most of the entries are based on the list by M. Shibata at Meisei University and on other private sources. Works are organized in chronological order and download quickly. Among the private sites without search functions, Okajima’s is the most comprehensive.

Aozora Bunko
URL: http://www.aozora.gr.jp/main.html

Aozora Bunko is an internet library created by volunteers with the support of the Toyota Foundation. It puts up works whose copyrights have expired (fifty years after the death of the author). The collection is large and is constantly updated. Most of the emphasis is on texts of modern Japanese literature, particularly from the first half of the 20th century. All works can be downloaded in HTML, text file, or richer text format.

Yôkyoku 350-banshû Nyûryoku

This is an ongoing volunteer project to input the 350 plays in the Noh repertoire, based on the Meicho zenshû edition of Yôkyoku sanbyaku gojyû ban and the Akao Shômondô version Yôkyoku sanbyaku gojyû ban shû. Roughly 70% of the project has already been completed. The name of the person in charge of each work is indicated in the full list of works. The purpose of the project is not research but to provide access to Noh texts for anyone interested. The site does not include any search functions for the texts themselves.