Instructions : Semantic linking - Chinese Text Project

Semantic linking

This feature refers to links between a unique instance of a word (i.e. a "token") in the textual database and a specific usage class in the CTP dictionary (a "type"). The idea is to give the system a way to tell which occurrences of a particular character correspond to a particular usage of a word. This matters because, unlike a human reader, an automated system cannot reliably determine on its own which instances of a character such as "墨" are a single-character word meaning something like "black ink", which refer to Mozi or the Mohist school, and which are in fact part of a compound word such as "墨子" or "墨翟". Manually entered semantic links allow the system to take these different usages of a character into account. Because this data must be entered by hand, and can be generated automatically only in certain relatively simple cases, at present it does not cover all characters in the database.
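The token/type distinction can be pictured as a small data structure. The Python sketch below is purely illustrative: the `Token` and `Sense` classes, the passage identifiers, the offset-based addressing and the toy passages are all assumptions, not the Chinese Text Project's actual storage format. It contrasts a plain string search, which returns every 墨 indiscriminately, with a lookup over hand-entered links, which can separate the usages.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Token:
    """One occurrence of a word in a passage (hypothetical addressing scheme)."""
    text_id: str  # hypothetical passage identifier
    offset: int   # position of the word's first character in the passage
    length: int   # number of characters, e.g. 1 for 墨, 2 for 墨子

@dataclass(frozen=True)
class Sense:
    """A usage class ("type") in a dictionary entry (hypothetical structure)."""
    headword: str
    gloss: str

INK = Sense("墨", "black ink")
MOHISTS = Sense("墨", "Mozi or the Mohist school")
MOZI = Sense("墨子", "Mozi (the person)")

# Toy passages, invented for illustration.
passages = {
    "p1": "以墨書之",
    "p2": "儒墨之是非",
    "p3": "墨子曰",
}

# Hand-entered semantic links: each token points to exactly one sense.
links = {
    Token("p1", 1, 1): INK,
    Token("p2", 1, 1): MOHISTS,
    Token("p3", 0, 2): MOZI,  # the 墨 here is part of the compound 墨子
}

def string_matches(passages, query):
    """Plain substring search: every occurrence, whatever it means."""
    hits = []
    for text_id, text in passages.items():
        start = text.find(query)
        while start != -1:
            hits.append((text_id, start))
            start = text.find(query, start + 1)
    return hits

def tokens_with_sense(links, sense):
    """Only the occurrences a human has linked to the given sense."""
    return sorted(
        (token for token, linked in links.items() if linked == sense),
        key=lambda t: (t.text_id, t.offset),
    )
```

Here `string_matches(passages, "墨")` returns all three occurrences with no way to tell them apart, whereas `tokens_with_sense(links, INK)` returns only the one manually linked to the "ink" usage.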

Dictionary entries

One application of the semantic linking data is to provide detailed dictionary entries with many examples. Because the system knows which instances of 墨 mean "ink" and which refer to the Mohists, it can give specific examples of each usage. Since this is an electronic rather than a printed resource, in principle the number of examples is limited only by the number of actual historical instances. For brevity, the dictionary pages themselves show five to ten examples, but clicking the link to the right of an entry searches the relevant section of the database for all occurrences known to the system.
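Producing a dictionary page can be thought of as grouping linked tokens by sense and truncating each group for display, while the "all occurrences" link returns the untruncated group. The sketch below assumes a hypothetical flat link format `(text_id, offset, length, sense_id)`, invented sense identifiers, a toy corpus, and a page limit of five; none of these reflect how the site is actually implemented.

```python
from collections import defaultdict

# Toy corpus: seven passages using 墨 as "ink", one using it for the Mohists.
passages = {f"ink{i}": "以墨書之" for i in range(7)}
passages["moh1"] = "儒墨之是非"

# Hypothetical link records: (text_id, offset, length, sense_id).
links = [(f"ink{i}", 1, 1, "墨/ink") for i in range(7)]
links.append(("moh1", 1, 1, "墨/Mohists"))

PAGE_LIMIT = 5  # the dictionary page shows only a handful of examples

def examples_by_sense(passages, links, limit=None, context=3):
    """Group linked occurrences by sense as (text_id, snippet) pairs.

    If `limit` is given, keep only the first `limit` examples of each sense.
    """
    grouped = defaultdict(list)
    for text_id, offset, length, sense in links:
        text = passages[text_id]
        # Show a few characters of context either side of the linked word.
        snippet = text[max(0, offset - context): offset + length + context]
        grouped[sense].append((text_id, snippet))
    if limit is None:
        return dict(grouped)
    return {sense: found[:limit] for sense, found in grouped.items()}

page = examples_by_sense(passages, links, limit=PAGE_LIMIT)
everything = examples_by_sense(passages, links)
```

In this toy setup the page view holds five "ink" examples while the full view holds all seven; the same grouped data serves both, and only the truncation differs.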

Searching for proper names

As well as providing definitions for many terms, the semantic linking data also encodes equivalences between proper names. One common criticism of digital textual resources is that their ease of use can lead researchers to overlook less obvious references to historical individuals. For example, someone interested in references to the philosopher Xunzi might search the database for "荀子" and be surprised to find relatively few occurrences. The researcher might then try another form of his name and be pleased to find far more references to "荀卿". But a researcher who stopped there would still miss many other direct references to Xunzi: besides "荀卿", there are also many references to him as "孫卿" or "孫卿子", as well as cases where only his surname "荀" is used in a way that unmistakably refers to him.
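Name equivalence can be sketched as an alias set attached to a single dictionary entry. In the Python sketch below, the alias list, the toy passage, and the `manual_links` argument (standing in for hand-entered links on surname-only uses such as "荀") are assumptions for illustration. Two points are worth noting: overlapping aliases such as "孫卿" and "孫卿子" must not be counted twice, so the scan prefers the longest alias at each position; and a bare "荀" cannot usefully be found by string matching, since searching for the single character would also return every "荀卿" and every unrelated use of 荀, so such cases can only come from manual links.

```python
# Hypothetical alias set for the dictionary entry on Xunzi.
XUNZI_ALIASES = ["荀子", "荀卿", "孫卿子", "孫卿"]

def find_aliases(text, aliases):
    """Return (offset, name) for non-overlapping alias matches, longest alias first."""
    by_length = sorted(aliases, key=len, reverse=True)
    hits = []
    i = 0
    while i < len(text):
        for name in by_length:
            if text.startswith(name, i):
                hits.append((i, name))
                i += len(name)  # skip past the match so 孫卿 inside 孫卿子 is not re-counted
                break
        else:
            i += 1  # no alias starts here
    return hits

def person_mentions(text, aliases, manual_links=()):
    """Combine alias matches with hand-linked (offset, length) tokens, e.g. a bare 荀."""
    hits = set(find_aliases(text, aliases))
    hits.update((offset, text[offset:offset + length]) for offset, length in manual_links)
    return sorted(hits)

# Toy passage: 孫卿子 at offset 0, 荀卿 at 5, surname-only 荀 at 9.
text = "孫卿子曰，荀卿言，荀曰"
# Naive counting sees the 孫卿子 at offset 0 twice (as 孫卿子 and as 孫卿).
naive_count = sum(text.count(name) for name in XUNZI_ALIASES)
mentions = person_mentions(text, XUNZI_ALIASES, manual_links=[(9, 1)])
```

Plain substring counting reports three alias hits here for what are only two alias occurrences; the longest-match scan finds exactly two, and the hand-entered link supplies the surname-only mention.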

A closely related problem with textual searches is that of false positives: the two characters "荀子" may appear together in that order without referring to the philosopher Xunzi. A search for "桓公" illustrates this vividly: although a majority of instances refer to 小白 (Duke Huan of Qi), the brother of 公子糾, many do not, since other rulers also bore the posthumous title 桓公. Similarly, a search for "孟子" will return many references to the influential Confucian thinker, but also some referring to the wives of various rulers.
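Semantic links also offer a way to filter false positives: each string match can be checked against the links and sorted into occurrences linked to the intended person, occurrences linked to someone else, and occurrences with no link yet. The sketch below is hypothetical throughout (toy passages, invented person identifiers, links keyed by `(text_id, offset)`). Keeping the unlinked group separate, rather than discarding it, reflects the fact that the linking data does not yet cover the whole database.

```python
def classify_matches(passages, query, links, person):
    """Split string matches of `query` by the referent recorded in `links`.

    `links` maps (text_id, offset) of a matched name to a person identifier.
    """
    result = {"confirmed": [], "other": [], "unlinked": []}
    for text_id, text in passages.items():
        start = text.find(query)
        while start != -1:
            referent = links.get((text_id, start))
            if referent is None:
                result["unlinked"].append((text_id, start))
            elif referent == person:
                result["confirmed"].append((text_id, start))
            else:
                result["other"].append((text_id, start))
            start = text.find(query, start + 1)
    return result

# Toy data: three passages containing 桓公, two of them linked by hand.
passages = {"t1": "桓公問管仲", "t2": "桓公立", "t3": "桓公曰"}
links = {
    ("t1", 0): "qi-huan-gong",     # 小白, Duke Huan of Qi (hypothetical id)
    ("t2", 0): "other-huan-gong",  # a different ruler titled 桓公 (hypothetical id)
}
split = classify_matches(passages, "桓公", links, "qi-huan-gong")
```

A plain search would return all three passages; with the links, only `t1` is confirmed, `t2` is excluded as a different ruler, and `t3` is flagged as not yet linked.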

To address these problems, the Chinese Text Project search function displays a warning whenever a textual search term matches a proper name in the CTP dictionary, and suggests related searches that may help. Such searches can also be made directly through the Advanced search function, or by clicking the links in dictionary entries.
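The warning and suggestions can be sketched as an index from every alias to the dictionary entries that use it. The Python sketch below is an illustration under stated assumptions, not the site's implementation: the `NameEntry` class, the wording of the notice, and the entry descriptions (one of them marked as illustrative) are invented, though the aliases for Xunzi are those discussed above.

```python
from dataclasses import dataclass

@dataclass
class NameEntry:
    """A proper-name entry with its equivalent forms (hypothetical structure)."""
    description: str
    aliases: list[str]

def build_index(entries):
    """Map every alias to the list of entries it can refer to."""
    index = {}
    for entry in entries:
        for alias in entry.aliases:
            index.setdefault(alias, []).append(entry)
    return index

def search_notice(query, index):
    """Return a warning with related searches, or None if the query is not a known name."""
    matches = index.get(query, [])
    if not matches:
        return None
    lines = [f'"{query}" matches {len(matches)} proper name(s) in the dictionary:']
    for entry in matches:
        related = [alias for alias in entry.aliases if alias != query]
        lines.append(f"- {entry.description}; related searches: {', '.join(related) or '(none)'}")
    return "\n".join(lines)

entries = [
    NameEntry("Xunzi, the philosopher", ["荀子", "荀卿", "孫卿", "孫卿子"]),
    NameEntry("Mencius, the Confucian thinker", ["孟子", "孟軻"]),
    NameEntry("a ruler's wife (illustrative entry)", ["孟子", "吳孟子"]),
]
index = build_index(entries)
```

Because one alias can belong to several entries, a query such as "孟子" produces a notice listing both candidates, each with its other names offered as follow-up searches, while a query that is not a known name produces no notice at all.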