Chinese Text Project

2012-09-24 07:32:19 Word clouds
Posted by: admin (CTP Admin)
Word clouds have been added for all texts on the site. These show a graphical representation of unexpectedly frequent words and characters in a particular text, with more frequent and/or unusual words in larger fonts. To display the word cloud for a particular text, click on the icon shown on its contents page under "Media".

2012-09-25 12:35:05 Word clouds
Posted by: bao_pu (Scott Barnwell)
The description says "word clouds on this site highlight unusually frequent words in a text, rather than simply frequent words." When looking at the Daodejing, the largest word was Tianxia 天下, which surprised me, and after looking at the Xunzi, Mozi, Zhuangzi, Mengzi, Guanzi, Huainanzi, Lüshi Chunqiu and Shenzi, it does not seem to be that unusual a word. But I assume that grouped with all the others (the classics, histories, etymology, military, etc.) it IS an unusual word. This is a drawback of drawing data from such a large group of texts, but I still think the word clouds are interesting.

2012-09-25 14:24:40 Word clouds
Posted by: dsturgeon (Donald Sturgeon)
It's true that "天下" is a relatively common word. But at the same time, if you compare the number of times the word "天下" appears in a text with the total number of characters in that text, then of all the pre-Qin and Han texts on the site, only the Shen Bu Hai exceeds the Dao De Jing in frequency of "天下".

In the Dao De Jing, we have 61 天下's in 5278 characters, so the frequency of 天下 in the Dao De Jing is about 61/5278 * 100% =~ 1.16%. In pre-Qin and Han texts, we have about 10608 天下's in around 5.5 million characters, so the overall frequency of 天下 in pre-Qin and Han texts is closer to 10608/5520329 * 100% =~ 0.19%. 1.16% is considerably more than 0.19% (roughly six times as much), which is why it is statistically significant that the Dao De Jing should mention "天下" so often relative to other texts.

The Mozi certainly talks about "天下" a lot, but statistically speaking, the Mozi has this word appearing 523 times in 80421 characters, which gives us 0.65% - well above average, but by no means as frequent as the same word in the Dao De Jing. The Shuo Yuan is an example of a text in which the word "天下" appears about as often as it does generally in pre-Qin and Han texts; in the word cloud for the Shuo Yuan, "天下" does appear, but in a fairly small font: ctext.org/media.pl?if=en&id=7
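
For anyone who wants to check the arithmetic, here is a minimal Python sketch that recomputes the percentages from the counts quoted in this post. The dictionary layout and the function name `relative_frequency` are illustrative assumptions, not part of the site; values are rounded to two decimal places when printed.

```python
# Counts quoted in the post: (occurrences of 天下, total characters in the text).
counts = {
    "Dao De Jing": (61, 5278),
    "Mozi": (523, 80421),
    "Pre-Qin and Han (all)": (10608, 5520329),
}


def relative_frequency(occurrences, total_chars):
    """Return occurrences per character of text, expressed as a percentage."""
    return 100 * occurrences / total_chars


frequencies = {name: relative_frequency(*pair) for name, pair in counts.items()}

for name, pct in frequencies.items():
    print(f"{name}: {pct:.2f}%")

# How many times more frequent 天下 is in the Dao De Jing than in the corpus overall.
ratio = frequencies["Dao De Jing"] / frequencies["Pre-Qin and Han (all)"]
print(f"Dao De Jing vs. overall: {ratio:.1f}x")
```

Note that this divides a count of two-character words by a count of single characters, exactly as the figures in the post do, so the percentages are comparable with each other but are not the share of the text's characters taken up by 天下.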

The idea is something like this: if you were to pick, at random, a contiguous selection of pre-Qin and Han Chinese text of the same length as the Dao De Jing, it would be an extremely likely result that the text you chose would contain fewer occurrences of the word "天下" than the Dao De Jing. Typically, there just aren't that many 天下's. In fact, you'd pretty much have to have chosen the Shen Bu Hai or the Dao De Jing itself to get the opposite result, since all other texts contain fewer occurrences of this word than the Dao De Jing. In this sense, the Dao De Jing uses the word "天下" disproportionately often as compared to other texts.
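
The thought experiment in the paragraph above can be sketched in Python as follows. This is only an illustration under stated assumptions: it is not how the site builds its word clouds, the function name `fraction_of_windows_below` is made up for the example, and the corpus used in the demonstration is a synthetic string, not real text.

```python
import random


def fraction_of_windows_below(corpus, word, window_len, target_count,
                              trials=10_000, seed=0):
    """Estimate how often a randomly placed contiguous window of
    `window_len` characters contains fewer than `target_count`
    occurrences of `word`."""
    if window_len > len(corpus):
        raise ValueError("window is longer than the corpus")
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    last_start = len(corpus) - window_len
    below = 0
    for _ in range(trials):
        start = rng.randint(0, last_start)  # inclusive on both ends
        window = corpus[start:start + window_len]
        if window.count(word) < target_count:
            below += 1
    return below / trials


# Synthetic stand-in corpus: one 天下 in every 100 characters.
toy_corpus = ("天下" + "之" * 98) * 10

# No 100-character window of this string can hold two 天下's,
# so every sampled window falls below a target of 2.
estimate = fraction_of_windows_below(toy_corpus, "天下",
                                     window_len=100, target_count=2)
print(estimate)  # 1.0
```

With real data, `corpus` would be the concatenated pre-Qin and Han text, `window_len` would be 5278 and `target_count` would be 61; a result close to 1.0 would correspond to the claim that almost any same-length selection contains fewer 天下's than the Dao De Jing.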

This may be surprising, and of course there can be many different explanations as to the reason for it, but it does appear to be a statistical fact that on the face of it seems worthy of attention.

If you want to look at the actual numbers, you can do a normal search on the site and then click on "Statistics" to see numbers of occurrences of a character or word in different texts. If you then click on the link for "Export data", you'll get a table that you should be able to copy and paste into Excel (or import into any other spreadsheet program) to process further. This functionality is still under construction, but might already be useful if you're interested in looking at character and word frequencies.

2012-09-26 15:40:53 Word clouds
Posted by: bao_pu (Scott Barnwell)
Thanks Donald,

You've given the Daodejing as containing 5278 characters. Where can I find the total number of characters in each text?

2012-09-26 16:01:14 Word clouds
Posted by: dsturgeon (Donald Sturgeon)
That functionality hasn't been officially released yet either; at the moment the counts only appear in the data you get when you click on "Export data" as described above, in the column currently labeled "Text length". The counts do not include punctuation or any emendations applied to the texts. Please don't rely on these numbers at present, though, as this feature has not yet been properly tested and some data may be incorrect.
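
As a rough illustration of processing the exported table outside a spreadsheet, here is a Python sketch that turns counts and the "Text length" column into percentages. The exact export format is an assumption (tab-separated, double-quoted fields, with a first column headed "Section/title"), the function name `frequencies_from_export` is made up, and the two data rows are invented placeholders, not real site data.

```python
import csv
import io


def frequencies_from_export(table_text, term):
    """Map each row's title to the percentage frequency of `term`,
    using the "Text length" column as the denominator.

    Assumes tab-separated, double-quoted fields (an unverified guess
    about the export format)."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    results = {}
    for row in reader:
        length = int(row["Text length"])
        if length == 0:
            continue  # skip empty sections to avoid dividing by zero
        results[row["Section/title"]] = 100 * int(row[term]) / length
    return results


# Invented placeholder rows, only to show the shape of the data.
sample = (
    '"Section/title"\t"天"\t"Text length"\n'
    '"Example section A"\t"3"\t"100"\n'
    '"Example section B"\t"1"\t"200"\n'
)

print(frequencies_from_export(sample, "天"))
# {'Example section A': 3.0, 'Example section B': 0.5}
```

A table that contains only the header row and no data rows yields an empty dictionary.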

2012-09-26 19:58:33 Word clouds
Posted by: bao_pu (Scott Barnwell)
It seems it is not fully functional yet, as you said. When I search the Daodejing for 天 and click "Export data", all I see in the right-hand box is:
"Section/title" "天" "Text length"

I get that same result for some other texts as well.

2012-09-27 00:09:12 Word clouds
Posted by: dsturgeon (Donald Sturgeon)
The "stats" function is intended to give a summary of the number of occurrences of a search term in subdivisions of a text (e.g. chapters, volumes, etc.). Since the Daodejing doesn't have any, searching it in stats mode doesn't give you any data. Try searching in "Daoism" instead.


