Estimating the Entropy of Chinese Using the Sliding-Window Entropy Estimator
Abstract
Three different text sources, namely a Chinese newspaper, the classical novel "Red Chamber Dream" and the modern prose work "The Sahara", are selected for small-sample studies of the entropy of Chinese. We use the sliding-window entropy estimator with the window size fixed at 1000 characters. By varying the number of window shifts up to 1000, we obtain entropy estimates of Chinese for the three text sources. To improve the slow rate of convergence of the sliding-window entropy estimator, we adopt the restricted sliding-window estimator due to Kontoyiannis et al. Experimental indications are that modern Chinese has an entropy of less than 4.5 bits/character and that this entropy is less than that of classical Chinese.
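As an illustration of the kind of estimator the abstract refers to, here is a minimal sketch of a match-length sliding-window entropy estimate. It assumes the commonly cited form in which, for each shift position i, L_i is one plus the length of the longest string starting at i that also starts somewhere in the preceding window of n characters, and the estimate is N·log2(n) / Σ L_i over N shifts. The function names, the cap on match length, and the indexing are assumptions made for this sketch, not details taken from the article, and the restricted variant is not shown.

```python
import math


def longest_match_length(text, i, window):
    """Longest l such that text[i:i+l] also starts at some j in [i-window, i-1].

    Matches are allowed to run past position i (overlap), and the length is
    capped at `window`; both choices are assumptions of this sketch.
    """
    best = 0
    for j in range(max(0, i - window), i):
        l = 0
        while i + l < len(text) and l < window and text[j + l] == text[i + l]:
            l += 1
        best = max(best, l)
    return best


def sliding_window_entropy(text, window=1000, shifts=1000):
    """Entropy estimate in bits per character from `shifts` window positions."""
    if len(text) < window + shifts:
        raise ValueError("text is too short for the requested window and shifts")
    total = 0
    for k in range(shifts):
        i = window + k  # the window text[i-window:i] slides one character per shift
        total += longest_match_length(text, i, window) + 1
    return shifts * math.log2(window) / total
```

Because Python strings are sequences of Unicode code points, a Chinese text passed as a `str` is processed one character at a time, which matches the bits/character unit used in the abstract. The defaults `window=1000` and `shifts=1000` mirror the settings stated there.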
Article Details
Licensee MJS, Universiti Malaya, Malaysia. This article is an open-access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).