# New Research

2013/09/01

# Doing Data Science with Statistical Information

## An Approach to the Apportionment of Legislative Seats

### Associate Professor Etsuo Kumagai

Data Science Research Group Members: Professor Yutaka Kano, Associate Professor Etsuo Kumagai, and Assistant Professor Kengo Kamatani. The following is an introduction by Associate Professor E. Kumagai.

Statistics computed from data include the sample mean and the sample variance. By treating the sample as a set of random variables and investigating the distributions of these statistics and the parameters of interest, we can derive their properties.

We consider two important concepts in statistics: the Fisher information, which concerns the parameters of a distribution, and the Kullback-Leibler information, which concerns the difference between two distributions. The latter is always greater than or equal to zero, and it is zero when the two distributions are identical, because it is formulated as the difference of two entropy-like quantities defined with respect to the two distributions.
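As a sketch of these two quantities, assume a density $f(x;\theta)$ with a scalar parameter $\theta$ and two discrete distributions $p$ and $q$ on the same finite set. This notation is introduced here for illustration and does not come from the original text. The standard definitions are then:

$$
I(\theta) = \mathbb{E}_\theta\!\left[\left(\frac{\partial}{\partial\theta}\log f(X;\theta)\right)^{2}\right],
\qquad
D(p\,\|\,q) = \sum_i p_i \log\frac{p_i}{q_i}
= \Bigl(-\sum_i p_i \log q_i\Bigr) - \Bigl(-\sum_i p_i \log p_i\Bigr).
$$

In the last expression, the first bracket is the cross-entropy of $q$ relative to $p$ and the second is the entropy of $p$, which is one way to read the difference described above. By Gibbs' inequality, $D(p\,\|\,q) \ge 0$, with equality exactly when $p = q$.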

The less biased a distribution is, the larger its entropy; the more biased it is, the smaller its entropy. For example, if a die almost always shows the number 1, its entropy is small, but if the die shows every number with equal probability, its entropy is large.
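The die example can be checked numerically. The following is a minimal sketch using the Shannon entropy in nats; the loaded die's probabilities (0.95 for the number 1, 0.01 for each other face) are hypothetical values chosen only for illustration.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy in nats; outcomes with zero probability contribute 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Fair die: every face has probability 1/6.
fair_die = [1 / 6] * 6

# Hypothetical loaded die: shows 1 with probability 0.95 (illustrative values).
loaded_die = [0.95] + [0.01] * 5

h_fair = shannon_entropy(fair_die)
h_loaded = shannon_entropy(loaded_die)

print(f"fair die:   {h_fair:.4f} nats")
print(f"loaded die: {h_loaded:.4f} nats")
```

The fair die attains the maximum possible entropy for six outcomes, `log(6)`, while the loaded die's entropy is much closer to zero.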

Now, as an application of statistical information, we consider the apportionment problem, which bears on reducing the disparity in the value of one vote in an election.

In Japan, this problem concerns equity in deciding the apportionment of seats among small electoral districts for the House of Representatives and the House of Councillors. Using Rényi's information, which is based on the Kullback-Leibler information, we show that the five well-known divisor methods are all included as special cases. This leads to a more equitable apportionment, one that relates the distribution of seats more directly to the distribution of the population.
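For readers unfamiliar with divisor methods, the following is a minimal sketch of how they assign seats. It assumes that the five methods referred to above are the classical ones usually named after Adams, Dean, Hill, Webster, and Jefferson, and it uses the ordinary highest-averages procedure, not the Rényi-information formulation described in the text. The function name `apportion`, the populations, and the seat count are hypothetical.

```python
import math

# Divisor d(n) applied to a region that currently holds n seats.
# Assumption: these are the five classical divisor methods.
DIVISORS = {
    "Adams": lambda n: n,
    "Dean": lambda n: 2 * n * (n + 1) / (2 * n + 1),  # harmonic mean of n, n+1
    "Hill": lambda n: math.sqrt(n * (n + 1)),         # geometric mean of n, n+1
    "Webster": lambda n: n + 0.5,                     # arithmetic mean of n, n+1
    "Jefferson": lambda n: n + 1,
}


def apportion(populations, total_seats, divisor):
    """Give seats one at a time to the region with the largest population / d(n).

    A divisor of 0 is treated as infinite priority, so methods with d(0) = 0
    give every region one seat first (assumes total_seats >= number of regions).
    """
    seats = [0] * len(populations)

    def priority(i):
        d = divisor(seats[i])
        return math.inf if d == 0 else populations[i] / d

    for _ in range(total_seats):
        winner = max(range(len(populations)), key=priority)
        seats[winner] += 1
    return seats


# Hypothetical populations of three regions sharing 10 seats.
populations = [5300, 3300, 1400]
results = {name: apportion(populations, 10, d) for name, d in DIVISORS.items()}

for name, seats in results.items():
    print(f"{name:9s} {seats}")
```

With these illustrative numbers, the Adams, Dean, and Hill divisors give the smallest region a second seat, while the Webster and Jefferson divisors give that seat to the largest region. This shows how the choice of method changes the value of a vote across regions.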

Applying such statistical studies, we work with our graduate students on both theoretical and practical problems. Examples include comparing downside risk measures such as Value at Risk (VaR), comparing the performance of neural networks (NN) and support vector machines (SVM) in regression problems, and choosing prior distributions in Bayesian statistics.