Unsupervised Learning of Hierarchical Part-of-Speech Tags with the Infinite Tree Hidden Markov Model

March 9, 2017

Overview

Introduction

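The Infinite Tree Hidden Markov Model (Infinite Tree HMM, hereafter iTHMM) is a hidden Markov model whose state transition probabilities are represented by a tree-structured stick-breaking process.

As with the Infinite HMM (IHMM) I implemented previously, the number of states is determined from the data, but the iTHMM can additionally learn hierarchical states.

Take natural-language parts of speech as an example: the parent "noun" has children such as "noun - person name", but the conventional IHMM cannot learn hierarchical states of this kind.

I found the implementation about as difficult as the NPYLM I implemented previously.

You need the mental stamina to keep going until the end.

The paper is also very dense, so this article explains only the key points.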

References

This is required reading before attempting an implementation.

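Stick-Breaking Process

To implement the iTHMM, you need to understand how the stick-breaking process (hereafter SBP) works.

Let the base distribution be $G_0$, and consider the case where the $k$-th data point $\theta_k$ is drawn as $\theta_k \sim G_0$.

First, prepare a stick of length 1.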

image

Break this stick at ratio $\gamma_1 \sim {\rm Be}(1, \alpha)$ of its current length, and stand the broken-off piece at position $\theta_1$.

image

The ratio $\gamma_k$ satisfies $0 < \gamma_k < 1$ regardless of $k$.

Keep in mind that $\gamma_k$ is a ratio, not a stick length.

Next, break the remaining stick at ratio $\gamma_2 \sim {\rm Be}(1, \alpha)$ and stand the piece at position $\theta_2$.

image

Likewise, break the remaining stick at ratio $\gamma_3 \sim {\rm Be}(1, \alpha)$ and stand the piece at position $\theta_3$.

image

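Repeating this infinitely many times generates a discrete distribution $G$ from $G_0$.

The final length of the stick standing at position $\theta_k$ is the probability of $\theta_k$.

Because we started from a stick of length 1, the probabilities always sum to 1.

The $G$ generated this way is a multinomial distribution, so using it in an HMM yields a prior probability for each of infinitely many states.

Next, consider the length $\pi_k$ of the stick broken off at the $k$-th break.

This amounts to choosing the remaining part of the stick at each of the first $k-1$ breaks and then choosing the broken-off part at the last break, so it can be written as follows.

\[\begin{align} \pi_k = \gamma_k \prod_{j=1}^{k-1}(1-\gamma_j) \end{align}\]

Once again, note that $\pi_k$ is a stick length whereas $\gamma_k$ is a ratio.

The infinite-dimensional multinomial distribution obtained this way is denoted $\pi=\{\pi_1, \pi_2, \ldots\}$.

The shape of $\pi$ varies widely with the hyperparameter $\alpha$, but the posteriors of $\gamma_k$ and $\pi_k$ can be obtained from data.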
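The construction of $\pi_k$ in equation (1) can be sketched in a few lines of Python. This is only a minimal illustration, not code from my repository: the function name `stick_breaking`, the truncation to `num_sticks` sticks, and the value $\alpha = 2$ are assumptions made for the example.

```python
import random

def stick_breaking(alpha, num_sticks, rng):
    """Return the first `num_sticks` stick lengths pi_k and the leftover length."""
    remaining = 1.0                          # part of the stick not yet broken off
    lengths = []
    for _ in range(num_sticks):
        gamma = rng.betavariate(1.0, alpha)  # break ratio gamma_k (not a length)
        lengths.append(gamma * remaining)    # pi_k = gamma_k * prod_{j<k} (1 - gamma_j)
        remaining *= 1.0 - gamma
    return lengths, remaining

lengths, leftover = stick_breaking(alpha=2.0, num_sticks=10, rng=random.Random(0))
```

The truncated lengths plus the leftover always add up to 1, which is the finite counterpart of the statement that the probabilities sum to 1.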

Now suppose we are given realizations ${\cal D} = \{x_1, x_2, \ldots\}$ drawn from $\pi$.

The posterior $p(\pi \mid {\cal D})$ of $\pi$ given ${\cal D}$ can be expressed as a product of the posteriors of the individual random variables $\gamma_k$ that make up the SBP. Writing $n_0(k)$ for the number of elements of ${\cal D}$ equal to $k$ and $n_1(k)$ for the number greater than $k$, the posterior of $\gamma_k$ is

\[\begin{align} \gamma_k \mid {\cal D} \sim {\rm Be}(1+n_0(k), \alpha+n_1(k)) \end{align}\]

The expectation of this $\gamma_k$ is computed as follows.

\[\begin{align} \mathbb E[\gamma_k \mid {\cal D}] = \frac{1+n_0(k)}{1+\alpha+n_0(k)+n_1(k)} \end{align}\]

Therefore, from equations (1) and (3), the posterior expectation of $\pi_k$ is

\[\begin{align} \mathbb E[\pi_k \mid {\cal D}] &= \frac{1+n_0(k)}{1+\alpha+n_0(k)+n_1(k)}\prod_{j=1}^{k-1}\left\{ 1-\frac{1+n_0(j)}{1+\alpha+n_0(j)+n_1(j)} \right\}\\ &= \frac{1+n_0(k)}{1+\alpha+n_0(k)+n_1(k)}\prod_{j=1}^{k-1}\frac{\alpha+n_1(j)}{1+\alpha+n_0(j)+n_1(j)}\\ \end{align}\]
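As a small worked example of the posterior expectation of $\pi_k$, the following sketch counts $n_0(k)$ and $n_1(k)$ directly from a list of observed indices. The function name `expected_stick_lengths` and the toy data `[1, 1, 2, 3]` are made up for illustration.

```python
def expected_stick_lengths(data, alpha, num_sticks):
    """E[pi_k | D] for k = 1..num_sticks, built from E[gamma_k | D]."""
    expected = []
    remaining = 1.0                            # product of (1 - E[gamma_j]) for j < k
    for k in range(1, num_sticks + 1):
        n0 = sum(1 for x in data if x == k)    # observations equal to k
        n1 = sum(1 for x in data if x > k)     # observations greater than k
        e_gamma = (1.0 + n0) / (1.0 + alpha + n0 + n1)
        expected.append(e_gamma * remaining)
        remaining *= 1.0 - e_gamma
    return expected

probs = expected_stick_lengths(data=[1, 1, 2, 3], alpha=1.0, num_sticks=3)
```

With this data and $\alpha = 1$, the first stick gets $3/6 = 0.5$, the second $0.5 \times 2/4 = 0.25$, and the third $0.25 \times 2/3$.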

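Tree-Structured Stick-Breaking Process

In the stick-breaking example above the broken-off sticks have no parent-child relationships at all, whereas the tree-structured stick-breaking process (hereafter TSSB) introduces such relationships by building a tree whose nodes are the broken-off sticks.

The TSSB is a prior distribution for hierarchical clustering: before any data is actually observed, it models what kind of partition would result if the data were clustered.

It helps to think of "data" here as each tag $s_t$ in a tag sequence $s_1 \ldots s_{\infty}$.

The "prior for clustering" answers the following question before any tag sequence has been observed: if we were about to observe a tag sequence $s_1 \ldots s_{\infty}$, what value would each tag $s_t$ take?

Here each cluster is regarded as a part of speech.

The TSSB decides where in the tree each $s_t$ is placed by processing sticks recursively, so let us first walk through that procedure.

(Each data point to be clustered is called a "customer". Its value is determined only once it has been placed.)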

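1. A stick is given

image

2. Break it

Break the stick at ratio $\nu_1$.

image

3. Decide whether to stop

With probability $\nu_1$ the customer stays on the broken-off stick.

Otherwise the customer descends to the child sticks.

image

If the customer stays at the parent, the procedure ends here.

4. Break

Break the child stick at ratio $\psi_1$.

With probability $\psi_1$ the customer stops at the broken-off stick.

image

Here we assume the customer does not stop and moves on.

5. Break

Break the remaining child stick at ratio $\psi_2$.

With probability $\psi_2$ the customer stops at the broken-off stick.

image

This time we assume the customer stops here.

6. Return to 1

Once the customer has stopped at a child stick, the same procedure is applied recursively to that stick, starting from step 1.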

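The TSSB procedure is a repetition of steps 1 to 6. To visualize what happens next, the stick is first broken at ratio $\nu_2$.

image

With probability $\nu_2$ the customer stays at the parent, and with probability $1-\nu_2$ it descends to the children.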

image

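Doing this for every tag in the sequence $s_1 \ldots s_{\infty}$ subdivides the tree as shown below.

image

The tree structure looks like this.

image

Squashing the resulting TSSB from above turns it into a single stick, which shows that it is the same as the prior over tags in the IHMM.

image

Stick lengths represent probabilities just as before, but the TSSB additionally lets us consider the hierarchy among the sticks.

With an ordinary SBP each cluster can be identified by an integer $k=1,2,\ldots$, whereas each cluster in a TSSB (that is, each stick) is identified by a variable-length integer sequence $\boldsymbol s$.

The root node is the empty sequence [].

The children of the root are written [1], [2], [3], and so on from left to right.

Likewise, the children of node [1] are written [1,1], [1,2], [1,3], and so on.

A node that lies to the left of another node comes earlier in lexicographic order.

For example, of nodes [1,2] and [1,3], [1,2] is the one on the left.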
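The stop-or-descend procedure and the tuple-style node indices can be sketched as follows. This is an illustrative sketch under my own assumptions, not the code in my repository: `sample_node` is a hypothetical name, the ratios are drawn lazily from ${\rm Be}(1, \alpha)$ and ${\rm Be}(1, \gamma)$ the first time a node is visited, and a depth cap `max_depth` is added so the example always terminates quickly.

```python
import random

def sample_node(alpha, gamma, max_depth, rng, nu, psi):
    """Draw one customer's node as a tuple such as (2, 3); () is the root."""
    node = ()
    while True:
        if len(node) >= max_depth:
            return node                    # depth cap: behave as if nu were 1 here
        if node not in nu:
            nu[node] = rng.betavariate(1.0, alpha)   # stop probability of this node
        if rng.random() < nu[node]:
            return node                    # the customer stays at this node
        k = 1                              # otherwise scan the children from the left
        while True:
            child = node + (k,)
            if child not in psi:
                psi[child] = rng.betavariate(1.0, gamma)  # break ratio for child k
            if rng.random() < psi[child]:
                break                      # descend into the k-th child
            k += 1
        node = child

rng = random.Random(1)
nu, psi = {}, {}                           # shared by all customers of this TSSB
nodes = [sample_node(1.0, 1.0, 3, rng, nu, psi) for _ in range(100)]
```

Because `nu` and `psi` are shared between calls, all customers are placed on the same randomly drawn tree, and nodes with long sticks collect more customers.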

CDP Representation of the TSSB

In a TSSB, the probability $\pi_{\boldsymbol s}$ that a customer stops at node $\boldsymbol s$ is defined as follows.

\[\begin{align} \pi_{\boldsymbol s} &= \nu_{\boldsymbol s}\phi_{\boldsymbol s}\prod_{\boldsymbol s' \prec \boldsymbol s} \phi_{\boldsymbol s'}(1 - \nu_{\boldsymbol s'})\\ &= \nu_{\boldsymbol s}\prod_{\boldsymbol s' \prec \boldsymbol s}(1 - \nu_{\boldsymbol s'})\cdot \prod_{\boldsymbol s'\preceq \boldsymbol s}\phi_{\boldsymbol s'}\\ \nu_{\boldsymbol s} &\sim {\rm Be}(1, \alpha)\\ \psi_{\boldsymbol sk} &\sim {\rm Be}(1, \gamma)\\ \phi_{\boldsymbol sk} &= \psi_{\boldsymbol sk}\prod_{j=1}^{k-1}(1-\psi_{\boldsymbol sj}) \end{align}\]

$\psi_{\boldsymbol sk}$ is the ratio at which the stick for the $k$-th child of node $\boldsymbol s$ is broken off.

These equations show that a TSSB contains two SBPs: a vertical one defined by $\nu$ and a horizontal one defined by $\psi$.
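To make the vertical and horizontal factors concrete, here is a minimal sketch that evaluates $\pi_{\boldsymbol s}$ for a node given fixed ratios. The name `stop_probability`, the dictionary layout, and the numeric values are assumptions for illustration; the root is treated as having no horizontal factor (that is, $\phi_{[]} = 1$), which is also my assumption.

```python
def stop_probability(node, nu, psi):
    """pi_s for a node given as a tuple, e.g. (2, 3, 2); () is the root."""
    prob = nu[node]                          # stop at s itself
    for depth in range(len(node)):
        ancestor = node[:depth]
        prob *= 1.0 - nu[ancestor]           # vertical SBP: pass each strict ancestor
        k = node[depth]
        phi = psi[ancestor + (k,)]           # horizontal SBP: stick length phi
        for j in range(1, k):
            phi *= 1.0 - psi[ancestor + (j,)]
        prob *= phi
    return prob

nu = {(): 0.5, (2,): 0.25}
psi = {(1,): 0.5, (2,): 0.5}
p_root = stop_probability((), nu, psi)
p_child = stop_probability((2,), nu, psi)
```

For node [2] the vertical part is $0.25 \times (1 - 0.5)$ and the horizontal part is $0.5 \times (1 - 0.5)$, so the product is $0.03125$.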

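Consider the state in which a customer has stopped at node $\boldsymbol s_{[2,3,2]}$, as in the figure below.

image

First, the vertical direction: looking only at depth, it forms the following SBP.

image

Next, the horizontal direction: there is one SBP for each depth.

image image image

After a customer is added, $n_0(\boldsymbol s)$, $n_1(\boldsymbol s)$, $m_0(\boldsymbol s)$ and $m_1(\boldsymbol s)$ must be updated in order to update the posterior of equation (3). For the example in the figure above the updates are as follows.

image

Adding just a single customer already requires updating this many counts.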
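The bookkeeping for one customer can be sketched as below. This is my reading of the counts, stated as an assumption: $n_0$ counts stops and $n_1$ counts passes in the vertical SBP, while $m_0$ counts the child that was chosen and $m_1$ counts the siblings passed over in each horizontal SBP. The function name `add_customer` and the use of `Counter` are illustrative choices.

```python
from collections import Counter

def add_customer(node, n0, n1, m0, m1):
    """Update vertical (n0, n1) and horizontal (m0, m1) counts for one customer."""
    n0[node] += 1                        # stopped at the node itself
    for depth in range(len(node)):
        ancestor = node[:depth]
        n1[ancestor] += 1                # passed through every strict ancestor
        k = node[depth]
        m0[ancestor + (k,)] += 1         # chose the k-th child at this depth
        for j in range(1, k):
            m1[ancestor + (j,)] += 1     # passed over the siblings to its left

n0, n1, m0, m1 = Counter(), Counter(), Counter(), Counter()
add_customer((2, 3, 2), n0, n1, m0, m1)
```

For the node [2,3,2] this touches one stop count, three pass counts, three chosen-child counts and four passed-sibling counts, which matches the impression that a single customer triggers many updates.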

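Infinite Tree-Structured Markov Model

We have seen that the TSSB gives us hierarchical states together with their probabilities, and therefore a model with an infinite tree structure.

As it stands, however, the TSSB only represents the state distribution $p(s_j)$. To use it in an HMM we need the state transition probabilities $p(s_j \mid s_i)$.

In other words, for every node on the TSSB we need the transition probabilities to all other nodes on the same TSSB.

Suppose clustering the customers has produced the following TSSB.

The tree on the left shows only the structure of the TSSB on the right.

image

How should the transition probabilities from each state to the other states be defined?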

image

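In the iTHMM, transition probabilities are represented by giving every node its own TSSB, as shown below.

image

These TSSBs have the same shape as the original tree, but their stick lengths differ.

In these TSSBs, the transition probability to, say, $\boldsymbol s_{[1]}$ is found at the following location.

image

The length of the stick at the position corresponding to the node in the original tree is the transition probability itself.

In the iTHMM, the transition TSSB held by each node is denoted $\boldsymbol \pi^{\boldsymbol s}$.

Note that the meaning differs depending on whether the index is a superscript or a subscript.

The subscripted $\pi_{\boldsymbol s}$ is a stick length, whereas the superscripted $\boldsymbol \pi^{\boldsymbol s}$ refers to the TSSB held by node $\boldsymbol s$ of the tree.

That is, \(\boldsymbol \pi^{\boldsymbol s} = \{\pi^{\boldsymbol s}_{[]},\pi^{\boldsymbol s}_{[1]},\pi^{\boldsymbol s}_{[2]},\ldots\}\).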

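Hierarchical TSSB

Because the nodes of a TSSB form a tree, it is natural to assume that the TSSBs $\boldsymbol \pi^{\boldsymbol s}$ representing the transitions out of each node are not independent but have parent-child dependencies.

For example, if node [2,3] represents "noun - proper noun", the transition probabilities $\boldsymbol \pi^{[2,3]}$ out of [2,3] should be influenced by the transition probabilities $\boldsymbol \pi^{[2]}$ of its parent node [2] (the state representing "noun").

image

Since a TSSB consists of two SBPs, vertical and horizontal, passing the parent's influence on to the child requires a hierarchical SBP.

This appears from equation (20) onward in the reference, but I reproduce it here as well.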

In the parent SBP, the break ratios $\beta'_k$ are drawn from a beta distribution as follows.

\[\begin{align} \beta'_k \sim {\rm Be}(1, \gamma) \end{align}\]

When the stick is broken at these ratios, the length $\beta_k$ of the $k$-th broken-off stick is

\[\begin{align} \beta_k = \beta'_k\prod_{j=1}^{k-1}(1-\beta'_j) \end{align}\]

The symbols follow the reference. Note that $\beta'_k$ is a break ratio and $\beta_k$ is a stick length.

Next, suppose that \(\boldsymbol \pi=\{\pi_1,\pi_2,\ldots\}\), which is influenced by $\boldsymbol \beta$, is generated from this \(\boldsymbol \beta = \{\beta_1,\beta_2,\ldots\}\) by a Dirichlet process:

\[\begin{align} \boldsymbol \pi \sim {\rm DP}(\alpha, \boldsymbol \beta) \end{align}\]

Then each break ratio $\gamma_k$ of the sticks that make up $\boldsymbol \pi$ is generated as

\[\begin{align} \gamma_k \sim {\rm Be}\left(\alpha\beta_k,\alpha\left(1-\sum_{j=1}^k\beta_j\right)\right) \end{align}\]

Given data ${\cal D}$, the posterior expectation of $\gamma_k$ is

\[\begin{align} \mathbb E[\gamma_k \mid {\cal D}] = \frac { \alpha\beta_k+n_0(k) }{ \alpha(1-\sum_{j=1}^{k-1}\beta_j)+n_0(k)+n_1(k) } \end{align}\]
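The posterior expectation of $\gamma_k$ under a parent SBP translates directly into code. The sketch below is only an illustration; the function name `expected_ratio` and the numbers are assumptions, and `beta` holds the parent stick lengths (not the ratios).

```python
def expected_ratio(k, alpha, beta, n0, n1):
    """E[gamma_k | D] with parent stick lengths beta; beta[j - 1] is beta_j."""
    mass_left = 1.0 - sum(beta[: k - 1])     # 1 - sum_{j<k} beta_j
    return (alpha * beta[k - 1] + n0) / (alpha * mass_left + n0 + n1)

beta = [0.5, 0.25, 0.125]                    # parent ratios were 0.5, 0.5, 0.5
prior_only = expected_ratio(k=2, alpha=2.0, beta=beta, n0=0, n1=0)
with_data = expected_ratio(k=2, alpha=2.0, beta=beta, n0=1, n1=1)
```

With no observations the expectation falls back to the parent's break ratio $\beta_2 / (1 - \beta_1) = 0.5$, which is a useful sanity check for an implementation.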

All that remains is to work out what equation (15) looks like for a TSSB, but there is one problem.

Equation (20) of the iTHMM paper reads

\[\begin{align} \nu_{\boldsymbol s} \sim {\rm Be}\left(\alpha\nu'_{\boldsymbol s},\alpha\left(1-\sum_{\boldsymbol u \preceq \boldsymbol s}\nu'_{\boldsymbol u}\right)\right) \end{align}\]

This applies equation (14) to the vertical SBP, where \(\nu'_{\boldsymbol s}\) denotes \(\nu_{\boldsymbol s}\) in the parent TSSB.

Equation (20) of the iTHMM paper and equation (14) of this article should be essentially the same apart from notation, yet comparing them reveals the following discrepancy.

image

I believe the iTHMM paper is the one in error.

To explain why, here are equations (22) and (23) of the iTHMM paper.

\[\begin{align} \mathbb E[\nu_{\boldsymbol s} \mid {\cal D}] &= \frac { \alpha\nu'_{\boldsymbol s}+n_0(\boldsymbol s) }{ \alpha(1-\sum_{\boldsymbol u \prec \boldsymbol s}\nu'_{\boldsymbol u})+n_0(\boldsymbol s)+n_1(\boldsymbol s) }\\ \mathbb E[\psi_{\boldsymbol sk} \mid {\cal D}] &= \frac { \alpha\psi'_{\boldsymbol sk}+m_0(\boldsymbol sk) }{ \alpha(1-\sum_{j=1}^{k-1}\psi'_{\boldsymbol sj})+m_0(\boldsymbol sk)+m_1(\boldsymbol sk) }\\ \end{align}\]

These are the TSSB versions of equation (15) above. Looking at the term \(1-\sum_{\boldsymbol u \prec \boldsymbol s}\nu'_{\boldsymbol u}\) in the denominator, \(\sum_{\boldsymbol u \prec \boldsymbol s}\nu'_{\boldsymbol u}\) is a sum of ratios and can therefore easily exceed 1.

If it exceeds 1, the term \(\alpha(1-\sum_{\boldsymbol u \prec \boldsymbol s}\nu'_{\boldsymbol u})\) in the denominator becomes negative, and \(\mathbb E[\nu_{\boldsymbol s} \mid {\cal D}]\) can become negative.

$\nu_{\boldsymbol s}$ must satisfy $0 < \nu_{\boldsymbol s} < 1$, so it must never take a negative value.

If, however, the \(\nu'_{\boldsymbol u}\) in \(\sum_{\boldsymbol u \prec \boldsymbol s}\nu'_{\boldsymbol u}\) were stick lengths rather than ratios, their total would never exceed 1 and this term could never become negative.

I therefore conclude that this is a simple notational slip in the iTHMM paper.

In fact, of the stick lengths of the vertical and horizontal SBPs, the iTHMM paper assigns a symbol only to the horizontal one, \(\phi_{\boldsymbol sk} = \psi_{\boldsymbol sk}\prod_{j=1}^{k-1}(1-\psi_{\boldsymbol sj})\) in equation (10) above. The stick length of the vertical SBP is not assigned to any symbol.

Moreover, $\alpha$ is overloaded in the iTHMM paper, which causes confusion.

So, for the rest of this article, I define the following.

\[\begin{align} \nu_{\boldsymbol s} &\sim {\rm Be}(1, \alpha)\nonumber\\ \psi_{\boldsymbol sk} &\sim {\rm Be}(1, \gamma)\nonumber\\ \phi_{\boldsymbol sk} &= \psi_{\boldsymbol sk}\prod_{j=1}^{k-1}(1-\psi_{\boldsymbol sj})\nonumber\\ \mu_{\boldsymbol s} &= \nu_{\boldsymbol s}\prod_{\boldsymbol s' \prec \boldsymbol s}(1 - \nu_{\boldsymbol s'})\\ \pi_{\boldsymbol s} &= \nu_{\boldsymbol s}\phi_{\boldsymbol s}\prod_{\boldsymbol s' \prec \boldsymbol s} \phi_{\boldsymbol s'}(1 - \nu_{\boldsymbol s'})\nonumber\\ &= \nu_{\boldsymbol s}\prod_{\boldsymbol s' \prec \boldsymbol s}(1 - \nu_{\boldsymbol s'})\cdot \prod_{\boldsymbol s'\preceq \boldsymbol s}\phi_{\boldsymbol s'}\nonumber\\ &= \mu_{\boldsymbol s} \cdot \prod_{\boldsymbol s'\preceq \boldsymbol s}\phi_{\boldsymbol s'}\\ \boldsymbol \pi^{\boldsymbol s} &\sim {\rm HTSSB}(\sigma, \boldsymbol \pi^{\boldsymbol s'}) \end{align}\]

Written this way, it is clear that a TSSB is the product of a vertical SBP and horizontal SBPs, one for each depth.

With these symbols, the correct posterior expectations of the break ratios of each SBP in a TSSB are as follows.

\[\begin{align} \mathbb E[\nu_{\boldsymbol s} \mid {\cal D}] &= \frac { \sigma\mu'_{\boldsymbol s}+n_0(\boldsymbol s) }{ \sigma(1-\sum_{\boldsymbol u \prec \boldsymbol s}\mu'_{\boldsymbol u})+n_0(\boldsymbol s)+n_1(\boldsymbol s) }\\ \mathbb E[\psi_{\boldsymbol sk} \mid {\cal D}] &= \frac { \sigma\phi'_{\boldsymbol sk}+m_0(\boldsymbol sk) }{ \sigma(1-\sum_{j=1}^{k-1}\phi'_{\boldsymbol sj})+m_0(\boldsymbol sk)+m_1(\boldsymbol sk) }\\ \end{align}\]

You cannot implement the model without understanding exactly what equations (21) and (22) mean.

Illustrated, they look like this.

image

image

Furthermore, \(\mu'_{\boldsymbol s}\) and \(\phi'_{\boldsymbol sk}\) are themselves generated from the parent's TSSB, so the computation must recurse until it reaches the transition TSSB of the root node.
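The recursion for the vertical SBP can be sketched roughly as follows. This is a simplified illustration under several assumptions of mine, not the code in my repository: the class `Tssb` and the functions `expected_nu` and `expected_mu` are hypothetical names, only the vertical counts are modeled, the top-level TSSB (the one without a parent) is assumed to use the non-hierarchical expectation with prior ${\rm Be}(1, \alpha)$, and stick lengths are approximated by products of expected ratios.

```python
from collections import Counter

class Tssb:
    """Vertical counts of one transition TSSB plus a link to its parent TSSB."""
    def __init__(self, parent=None):
        self.parent = parent
        self.n0 = Counter()    # customers that stopped at each node
        self.n1 = Counter()    # customers that passed each node

def expected_nu(tssb, node, alpha, sigma):
    """E[nu_s | D] for `node` in `tssb`, recursing up through parent TSSBs."""
    n0, n1 = tssb.n0[node], tssb.n1[node]
    if tssb.parent is None:
        return (1.0 + n0) / (1.0 + alpha + n0 + n1)
    mu = expected_mu(tssb.parent, node, alpha, sigma)              # mu'_s
    passed = sum(expected_mu(tssb.parent, node[:d], alpha, sigma)
                 for d in range(len(node)))                        # sum over u < s
    return (sigma * mu + n0) / (sigma * (1.0 - passed) + n0 + n1)

def expected_mu(tssb, node, alpha, sigma):
    """Vertical stick length mu_s = nu_s * prod_{u < s} (1 - nu_u), in expectation."""
    mu = expected_nu(tssb, node, alpha, sigma)
    for d in range(len(node)):
        mu *= 1.0 - expected_nu(tssb, node[:d], alpha, sigma)
    return mu

root = Tssb()
root.n1[()] += 2           # two customers passed the root ...
root.n0[(1,)] += 2         # ... and stopped at node [1]
child = Tssb(parent=root)  # a child TSSB with no customers yet
```

A child TSSB without customers reproduces its parent's expected ratios, because $\mu'_{\boldsymbol s} / (1 - \sum_{\boldsymbol u \prec \boldsymbol s} \mu'_{\boldsymbol u}) = \nu'_{\boldsymbol s}$. The sketch recomputes parent values repeatedly, so a real implementation would cache them.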

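iTHMM and the HCDP

When it comes to hierarchical Dirichlet processes, commonly used realizations include the Chinese restaurant process and the Pitman-Yor process.

For example, suppose a distribution $G$ is generated from the base distribution $G_0$ by a Dirichlet process, and the $k$-th observation $\theta_k$ is generated from $G(\theta)$.

\[\begin{align} G &\sim {\rm DP}(\alpha, G_0)\\ \theta_k &\sim G(\theta) \end{align}\]

Suppose these $\theta_k$ are being clustered with a Chinese restaurant process, and that the first through $(n-1)$-th data points have already been observed and clustered.

Then the $n$-th observation $\theta_{n}$ is generated from the following distribution.

\[\begin{align} p(\theta_n \mid \theta_1,\ldots,\theta_{n-1}) = \frac{\alpha}{\alpha + n - 1}G_0(\theta)+\frac{n-1}{\alpha+n-1}\left(\frac{1}{n-1}\sum_{k=1}^{n-1}\delta_{\theta_k}\right) \end{align}\]

\(\left(\frac{1}{n-1}\sum_{k=1}^{n-1}\delta_{\theta_k}\right)\) is the empirical distribution of the data observed so far (a multinomial distribution in this case).

Equation (25) means that, after observing $\theta_1,\ldots,\theta_{n-1}$, $\theta_n$ is generated from $G_0(\theta)$ with probability $\frac{\alpha}{\alpha + n - 1}$ and from the empirical distribution with probability $\frac{n-1}{\alpha+n-1}$.

When $\theta_n$ is generated from the empirical distribution, the customer is added to the empirical distribution to update it. When it is generated from $G_0(\theta)$, the customer is added to the empirical distribution and, in addition, a proxy customer is added to $G_0(\theta)$ so that $G_0(\theta)$ is updated as well.

Thus, customer-based clustering with a hierarchical Dirichlet process requires adding "proxy customers" appropriately.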
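The mixture of base distribution and empirical distribution in equation (25) can be checked numerically with a small sketch. The function name `crp_predictive`, the four-symbol vocabulary and the uniform base distribution are assumptions made for the example.

```python
from collections import Counter

def crp_predictive(value, observed, alpha, base_prob):
    """p(theta_n = value | theta_1..theta_{n-1}) as a base/empirical mixture."""
    n_prev = len(observed)                                  # this is n - 1
    from_base = alpha / (alpha + n_prev) * base_prob(value)
    from_data = Counter(observed)[value] / (alpha + n_prev)  # weight times empirical
    return from_base + from_data

vocab = ["a", "b", "c", "d"]
uniform = lambda value: 1.0 / len(vocab)
observed = ["a", "a", "b"]
p = {v: crp_predictive(v, observed, alpha=1.0, base_prob=uniform) for v in vocab}
```

Here "a" gets $0.25 \times 0.25 + 2/4 = 0.5625$, an unseen symbol gets only the base share $0.0625$, and the four probabilities sum to 1.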

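Returning to the iTHMM: the TSSB that represents its transition probabilities consists of vertical and horizontal SBPs, and each of these SBPs is a hierarchical Dirichlet process influenced by the parent TSSB.

\[\begin{align} \boldsymbol \pi^{\boldsymbol s} \sim {\rm HTSSB}(\sigma, \boldsymbol \pi^{\boldsymbol s'}) \end{align}\]

Since a hierarchical Dirichlet process can also be expressed as a Chinese restaurant process, equation (25) must hold for the HTSSB as well.

Therefore, when new data is generated from a child TSSB, a proxy customer must be added to the parent TSSB. How should this be done?

Let us first check what is generated from what when the HTSSB is expressed as a Chinese restaurant process.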

image

From the CRP viewpoint, the break ratios of the SBPs that make up $\boldsymbol \pi^{\boldsymbol s}$ are regarded as being generated from the parent TSSB.

The figure shows only the vertical SBP, but the horizontal one works the same way.

In a TSSB, $\nu_{\boldsymbol s}$ means "the probability of stopping at that position".

Hence an observation $\theta_k$ generated from $\nu_{\boldsymbol s}$ is either "pass" or "stop".

(More precisely, the customer "stops" with probability $\nu_{\boldsymbol s}$ and "passes" with probability $1-\nu_{\boldsymbol s}$.)

Let $n_1(\boldsymbol s)$ be the number of customers that passed node $\boldsymbol s$, $n_0(\boldsymbol s)$ the number that stopped there, and $n(\boldsymbol s)=n_0(\boldsymbol s) + n_1(\boldsymbol s)$ their total. Applying equation (25) to the HTSSB then gives the following.

\[\begin{align} p(\theta_n=\text{stop} \mid \theta_1,\ldots,\theta_{n-1}) &= \frac{\sigma}{\sigma + n(\boldsymbol s)}\nu'_{\boldsymbol s}+\frac{n(\boldsymbol s)}{\sigma+n(\boldsymbol s)}\left(\frac{n_0(\boldsymbol s)}{n(\boldsymbol s)}\right)\\ p(\theta_n=\text{pass} \mid \theta_1,\ldots,\theta_{n-1}) &= \frac{\sigma}{\sigma + n(\boldsymbol s)}(1-\nu'_{\boldsymbol s})+\frac{n(\boldsymbol s)}{\sigma+n(\boldsymbol s)}\left(\frac{n_1(\boldsymbol s)}{n(\boldsymbol s)}\right) \end{align}\]

$\theta_k$ is assigned to one of two clusters, "stop" or "pass".
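The two probabilities above simplify to $(\sigma\nu'_{\boldsymbol s} + n_0(\boldsymbol s)) / (\sigma + n(\boldsymbol s))$ and $(\sigma(1-\nu'_{\boldsymbol s}) + n_1(\boldsymbol s)) / (\sigma + n(\boldsymbol s))$, which also avoids dividing by $n(\boldsymbol s)$ when the node is empty. A minimal sketch, with the function name `stop_pass_probabilities` and the numbers chosen only for illustration:

```python
def stop_pass_probabilities(n0, n1, sigma, parent_nu):
    """Predictive probabilities of "stop" and "pass" at one node of a child TSSB."""
    n = n0 + n1
    p_stop = (sigma * parent_nu + n0) / (sigma + n)          # parent share + own stops
    p_pass = (sigma * (1.0 - parent_nu) + n1) / (sigma + n)  # parent share + own passes
    return p_stop, p_pass

p_stop, p_pass = stop_pass_probabilities(n0=3, n1=1, sigma=2.0, parent_nu=0.5)
empty_stop, empty_pass = stop_pass_probabilities(n0=0, n1=0, sigma=2.0, parent_nu=0.5)
```

With three stops, one pass, $\sigma = 2$ and $\nu'_{\boldsymbol s} = 0.5$ the stop probability is $4/6$, and an empty node simply returns the parent's value.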

image

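Drawn in CRP style, this looks like the figure above: "pass" and "stop" tables are created, and each customer sits at one of them.

$n_1(\boldsymbol s)$ is the total number of customers seated at "pass" tables, and $n_0(\boldsymbol s)$ is the total number seated at "stop" tables.

What matters now, however, is how often the parent's $\nu'_{\boldsymbol s}$ was used when deciding between "stop" and "pass".

From equations (28) and (29), when a customer stops at $\boldsymbol s$ in the child TSSB, the stop decision is made using the parent TSSB's \(\nu'_{\boldsymbol s}\) with probability $\frac{\sigma}{\sigma + n(\boldsymbol s)}$, and using the node's own empirical distribution (that is, $\nu_{\boldsymbol s}$) with probability $\frac{n(\boldsymbol s)}{\sigma+n(\boldsymbol s)}$.

We therefore run the following Bernoulli trial and, if the second outcome comes up, add a customer to node $\boldsymbol s$ of the parent TSSB.

\[\begin{align} \left[\frac{n(\boldsymbol s)}{n(\boldsymbol s)+\sigma}, \frac{\sigma}{n(\boldsymbol s) + \sigma}\right] \end{align}\]

However, the corresponding equation (30) of the iTHMM paper reads as follows.

\[\begin{align} \left[\frac{n(\boldsymbol s)}{n(\boldsymbol s)+\sigma}, \frac{\sigma}{n(\boldsymbol s) + \sigma}\nu'_{\boldsymbol s}\right] \end{align}\]

I am not confident which one is correct, but the paper's version looks odd to me because $\frac{n(\boldsymbol s)}{n(\boldsymbol s)+\sigma}$ is a ratio whereas $\frac{\sigma}{n(\boldsymbol s) + \sigma}\nu'_{\boldsymbol s}$ is a probability.

To be safe, my implementation supports both equation (30) and equation (31).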

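Addendum (2017/04/07)

I received a comment from the author of the paper, Dr. Mochihashi.

Apparently equation (31) is the correct one.

When the second outcome comes up in the Bernoulli trial above, customers are added as follows (only the vertical CDP is shown).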

image

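However, proxy customers must also be removed correctly when a customer is removed, so in practice a dedicated CRP is used instead of a Bernoulli trial.

When the customer is judged to have been generated from the parent, a new table is created and the customer is seated there. When it is judged to have been generated from the child, an existing table is chosen with probability proportional to its number of customers and the customer is seated there.

This CRP is a special one whose tables carry no labels, as shown below. With it we can consider a multinomial distribution over (number of tables + 1) outcomes.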

image

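We sample from this multinomial distribution and, if the outcome is a new table, add a customer to the parent TSSB.

Conversely, when a customer is removed from node $\boldsymbol s$, a table is chosen with probability proportional to its number of customers and one customer is removed from it.

If the table becomes empty, a customer is removed from node $\boldsymbol s$ of the parent TSSB.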
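A label-free CRP of this kind could look roughly like the sketch below. It is an illustration under my own assumptions rather than the class in my repository: `ProxyTables` is a hypothetical name, and the weight of the new-table outcome is left as a parameter `new_table_weight` so that either form of the Bernoulli weights discussed above can be plugged in by the caller.

```python
import random

class ProxyTables:
    """Tables at one node; each table corresponds to one proxy customer in the parent."""
    def __init__(self):
        self.tables = []                     # tables[i] = number of customers at table i

    def add_customer(self, new_table_weight, rng):
        """Seat a customer; return True if a new table (proxy customer) was created."""
        weights = self.tables + [new_table_weight]
        choice = rng.choices(range(len(weights)), weights=weights)[0]
        if choice == len(self.tables):
            self.tables.append(1)
            return True
        self.tables[choice] += 1
        return False

    def remove_customer(self, rng):
        """Unseat a customer; return True if a table emptied (remove the proxy too)."""
        choice = rng.choices(range(len(self.tables)), weights=self.tables)[0]
        self.tables[choice] -= 1
        if self.tables[choice] == 0:
            del self.tables[choice]
            return True
        return False

rng = random.Random(0)
node = ProxyTables()
proxies = sum(node.add_customer(new_table_weight=1.0, rng=rng) for _ in range(20))
proxies -= sum(node.remove_customer(rng) for _ in range(20))
```

Because a proxy customer is created exactly when a table is created and removed exactly when a table empties, adding 20 customers and then removing 20 leaves no tables and no outstanding proxies.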

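Emission Probabilities

For the HMM emission probability $p(w_t \mid s_t)$, we take the tree hierarchy into account and assume that the emission distribution of a parent tag influences that of its child tags.

image

We therefore generate it from the parent's emission distribution using a Pitman-Yor process.

\[\begin{align} p(\cdot \mid \boldsymbol s) \sim {\rm PYP}(p(\cdot \mid \boldsymbol s'), d_{\mid \boldsymbol s \mid}, \theta_{\mid \boldsymbol s \mid}) \end{align}\]

The hyperparameters $d_{\mid \boldsymbol s \mid}$ and $\theta_{\mid \boldsymbol s \mid}$ are shared among all nodes of the same depth.

The root node has no parent, so $p(\cdot \mid \boldsymbol s_{[]})$ is generated from the uniform distribution $H$.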
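For reference, the usual Chinese-restaurant predictive rule of a Pitman-Yor process with discount $d$ and concentration $\theta$ can be sketched as follows. This is the standard textbook form rather than a transcription of my repository; the function name `pyp_probability`, the dictionary layout of a node, and the toy counts are assumptions.

```python
def pyp_probability(word, node, parent_prob, d, theta):
    """Predictive p(word | node) of a Pitman-Yor restaurant backed by its parent."""
    c_w = node["customers"].get(word, 0)     # customers eating `word`
    t_w = node["tables"].get(word, 0)        # tables serving `word`
    c = sum(node["customers"].values())      # all customers in the restaurant
    t = sum(node["tables"].values())         # all tables in the restaurant
    own = (c_w - d * t_w) / (theta + c)
    backoff = (theta + d * t) / (theta + c) * parent_prob(word)
    return own + backoff

vocab = ["the", "a", "cat", "dog"]
uniform = lambda word: 1.0 / len(vocab)      # stands in for the root's base H
node = {"customers": {"the": 3, "a": 1}, "tables": {"the": 2, "a": 1}}
probs = {w: pyp_probability(w, node, uniform, d=0.5, theta=1.0) for w in vocab}
near_base = pyp_probability("the", node, uniform, d=0.5, theta=1e9)
```

In this toy example "the" gets $0.4 + 0.125 = 0.525$, the distribution sums to 1, and a very large $\theta$ pushes the result toward the parent's $0.25$.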

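On the Infinite Tree Structure

The iTHMM has four components: the "infinite tree", the "TSSB", the "HTSSB" and the "HPYLM".

image

As for the infinite tree itself, I wrote above that it is built with a TSSB,

image

but in practice it can be implemented in any way as long as it holds pointers to $\boldsymbol \pi^{\boldsymbol s}$.

This is because the iTHMM never actually uses the $p(s_j)$ shown in this figure. Only $p(s_j \mid s_i)$ matters.

Fixing the Depth

In principle the iTHMM has nodes down to infinite depth, but according to the paper better results are obtained with a fixed depth.

This is achieved by always setting $\nu_{\boldsymbol s}$ to 1 whenever $\boldsymbol s$ is at or beyond a given depth.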

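Implementation Notes

The iTHMM constantly deals with three things: the parent node, the parent TSSB, and the node at one's own position within the TSSB held by the parent node.

There is also a great deal of recursion, so unless you keep track of exactly what must be computed for which node, you will end up abandoning the implementation halfway.

Here are the pitfalls I noticed while implementing the model and the items that should be debugged.

The word "customer" has appeared often in this article. In iTHMM training, a "customer" is a count of tag $\boldsymbol s_j$ following tag $\boldsymbol s_i$.

For example, suppose state $[1,2]$ is observed following state $[1]$.

Then a customer is added to node $\boldsymbol s_{[1,2]}$ of the TSSB $\boldsymbol \pi^{[1]}$.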

image

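Adding it to the vertical and horizontal CDPs gives the following.

image

If this customer is judged to have been generated from the parent TSSB, a proxy customer is added to node $\boldsymbol s_{[1,2]}$ of the parent TSSB.

image

Note, however, that the decision whether the customer was generated from the parent is made independently for the vertical and horizontal CDPs.

If only the vertical one is judged to have been generated from the parent, the proxy customers are placed as follows.

image

If only the horizontal one is judged to have been generated from the parent, the proxy customers are placed as follows.

image

If both the vertical and horizontal ones are judged to have been generated from the parent, the proxy customers are placed as follows.

image

This is repeated recursively upward until there is no parent TSSB left.

That is, for each proxy customer we again decide whether it was generated from its own parent.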

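HPYLM

In the Pitman-Yor process the parameter $\theta_{\mid \boldsymbol s \mid}$ is the concentration, and as it goes to $\infty$ the generated distribution coincides with the base distribution.

So deliberately set $\theta_{\mid \boldsymbol s \mid}$ to a huge value and check that the emission distributions of all nodes $\boldsymbol s$ become close to uniform.

HTSSB

By equation (21), the concentration of the HTSSB is $\sigma$.

So deliberately set $\sigma$ to a huge value and check that the $\boldsymbol \pi^{\boldsymbol s}$ of every node $\boldsymbol s$ takes the same form as $\boldsymbol \pi^{[]}$ (the structure is of course the same, but the individual stick lengths should match as well).

Removing Customers

After adding all customers in the training data and then removing all of them, check that no customers, proxy customers included, remain in the model.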

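Experiments

I ran experiments on the original text of Alice's Adventures in Wonderland.

The iTHMM implementation and the experiment code are on GitHub.

https://github.com/musyoku/unsupervised-pos-tagging/tree/master/infinite-tree-hmm

Apart from running TreeTagger to obtain an accurate morphological analysis, no preprocessing was applied.

As you will see when you run the code, the stick indices are colored in the terminal to make them easier to read.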

image

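The number next to each word is the number of times it was assigned to that tag.

The case $M=1$

First I trained the model with the depth fixed to 1.

This yields nearly the same result as the earlier IHMM.

After training, I estimated the state sequence with the Viterbi algorithm and plotted how it corresponds to TreeTagger's analysis, as shown below.

image

The figure is normalized per row and shows how much of each gold tag falls into each predicted tag.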
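The per-row normalization behind such a plot can be sketched as follows. The function name `row_normalized_confusion` and the tiny tag lists are made up for illustration; the actual plotting code in my repository may differ.

```python
from collections import Counter, defaultdict

def row_normalized_confusion(gold_tags, predicted_states):
    """For each gold tag, the fraction of its tokens assigned to each predicted state."""
    counts = defaultdict(Counter)
    for gold, pred in zip(gold_tags, predicted_states):
        counts[gold][pred] += 1
    table = {}
    for gold, row in counts.items():
        total = sum(row.values())
        table[gold] = {pred: c / total for pred, c in row.items()}  # rows sum to 1
    return table

gold = ["NN", "NN", "VB", "NN"]
pred = [3, 3, 1, 4]
table = row_normalized_confusion(gold, pred)
```

In this toy example two thirds of the "NN" tokens fall into state 3 and one third into state 4, while all "VB" tokens fall into state 1.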

Next are the tree structure and the words assigned to each node.

[]
one (33) more (29) much (24) two (23) time (21) now (18) that (13) only (10) all (9) things (14) way (13) three (13) thing (13) yet (13) first (9) is (10) down (10) what (12) right (12) not (10) day (12) it (7) so (10) saying (12) for (7) and (7) as (9) was (7) tea (9) next (9) look (11) 
	[0,]
	, (1889) ! (280) ? (100) than (7) 
	[1,]
	was (243) s (151) had (134) were (57) 've (36) 're (31) never (23) all (20) heard (19) found (16) are (13) gave (13) is (11) saw (11) made (10) kept (10) did (10) only (8) both (7) very (6) opened (6) soon (5) quite (5) put (5) shook (5) hastily (4) ever (4) drew (4) stood (4) sounded (3) taught (3) 
	[2,]
	. (950) ! (87) ? (64) 
	[3,]
	she (416) it (191) ' (153) they (120) he (76) alice (72) there (54) i (44) you (34) this (19) who (14) we (14) that (13) things (4) everybody (4) people (3) ever (3) nobody (3) was (2) anything (2) nothing (2) else (2) these (2) two (1) something (1) easily (1) crumbs (1) ferrets (1) sentenced (1) 
	[4,]
	the (1286) a (472) her (116) his (78) this (49) your (49) an (49) their (39) some (35) its (34) my (30) any (28) one (21) another (15) that (13) being (10) every (9) these (8) two (7) great (7) such (6) little (6) each (6) four (6) soo (6) our (6) very (5) white (3) ten (3) tea (2) as (1) 
	[5,]
	<eos> (1101) 
	[6,]
	it (187) her (70) them (66) you (56) that (50) me (46) all (40) him (33) this (28) course (22) here (20) once (15) last (14) us (9) first (8) which (8) sight (8) hearts (7) things (6) cats (6) nonsense (6) silence (5) afraid (5) least (5) reply (5) fact (4) next (3) otherwise (3) instance (3) executed (2) meant (2) 
	[7,]
	to (412) and (15) would (15) meekly (1) 
	[8,]
	n't (162) you (61) not (53) have (15) me (7) ever (6) better (6) they (6) only (5) it (5) i (5) all (4) never (4) even (3) rather (3) old (3) cats (3) just (2) this (2) soon (2) nearly (2) he (2) some (2) hardly (2) cross (2) that (1) alice (1) well (1) she (1) either (1) bats (1) 
	[9,]
	very (67) no (52) not (43) so (40) quite (38) all (30) been (24) it (23) looking (22) herself (19) got (17) getting (15) too (14) going (14) much (13) n't (12) just (11) never (10) come (9) beginning (8) nothing (7) something (7) good (7) how (7) on (6) left (6) grown (6) always (6) soon (6) certainly (5) lying (5) 
	[10,]
	and (313) who (26) which (24) but (23) my (13) turning (7) with (6) after (4) feeling (4) time (1) alice (1) 
	[11,]
	said (258) thought (21) with (17) cried (13) added (6) shouted (5) and (4) old (3) poor (3) please (3) continued (3) interrupted (3) father (3) exclaimed (3) except (3) pleaded (3) mary (3) trying (2) asked (2) lives (2) tossing (2) screamed (2) throwing (1) filled (1) raising (1) 
	[12,]
	alice (163) herself (39) william (4) five (3) ann (3) fellow (3) she (2) seven (1) execute (1) 
	[13,]
	as (180) that (168) if (76) so (74) what (72) when (57) it (54) how (33) there (21) or (19) till (15) but (12) before (11) sure (10) whether (10) who (9) perhaps (9) only (8) everything (7) where (7) though (7) why (6) said (4) quite (4) let (4) yer (4) all (4) yet (3) anything (3) glad (3) still (3) 
	[14,]
	in (291) at (159) to (151) with (116) on (78) for (55) by (51) into (47) about (46) all (42) like (35) of (34) down (27) after (26) such (25) from (22) over (22) upon (18) off (17) out (16) round (16) as (13) near (12) took (11) is (10) under (10) among (9) up (8) behind (8) followed (8) through (8) 
	[15,]
	oh (37) why (17) however (16) no (15) down (8) yes (8) now (7) sure (7) twinkle (6) indeed (5) please (5) never (3) like (3) myself (3) five (3) bats (3) hush (3) besides (3) ah (3) sir (2) curiouser (2) tut (2) beheaded (2) edwin (2) morcar (2) thump (2) two (1) even (1) too (1) remarking (1) 
	[16,]
	do (86) could (70) would (64) did (58) 'll (44) must (32) 'd (27) can (26) should (26) will (24) ca (23) are (22) might (21) wo (19) does (16) shall (16) had (11) may (11) have (9) let (6) only (5) sha (5) has (3) almost (3) makes (3) dare (3) now (1) said (1) told (1) altogether (1) usually (1) 
	[17,]
	then (68) dear (21) here (15) way (6) more (4) collar (1) 
	[18,]
	be (114) know (70) see (53) 'm (45) think (44) have (38) get (36) go (35) say (32) do (29) like (26) tell (24) make (19) wish (19) come (17) wonder (15) find (14) eat (14) take (14) put (13) remember (12) speak (11) am (11) look (10) ask (10) grow (10) mean (10) talk (10) begin (9) hear (9) give (9) 
	[19,]
	is (35) well (19) for (9) makes (8) far (6) long (5) soon (5) suddenly (5) many (4) honour (4) begins (4) large (3) hard (3) usual (3) drink (3) forgetting (3) please (2) eagerly (2) often (2) doth (2) quickly (2) i (1) politely (1) w (1) squeaked (1) desperate (1) curled (1) became (1) 
	[20,]
	i (383) you (194) what (30) how (17) we (10) please (6) dinah (5) why (5) where (4) which (4) now (3) one (2) but (1) explanations (1) 
	[21,]
	said (94) went (63) began (45) looked (38) was (33) thought (26) is (23) seemed (22) came (19) got (18) felt (17) replied (16) ran (14) tried (14) added (13) spoke (13) knew (12) ought (12) waited (11) asked (10) sat (10) turned (9) walked (9) repeated (9) used (7) took (7) called (6) go (6) hurried (6) remarked (6) noticed (6) 
	[22,]
	come (13) first (4) treacle (2) unimportant (2) wake (2) exactly (1) 
	[23,]
	and (268) but (76) for (31) while (16) because (11) exactly (3) hold (3) even (2) turn (2) suppose (2) unless (2) bill (1) sir (1) singing (1) 
	[24,]
	does (4) ootiful (4) came (3) not (2) either (1) the (1) nine (1) 
	[25,]
	beau (4) next (3) first (2) it (1) soup (1) 
	[26,]
	up (66) on (63) out (41) off (41) down (34) back (27) about (24) again (15) away (15) round (14) so (10) anxiously (10) close (8) angrily (7) silent (7) ready (7) nothing (6) over (6) together (6) running (6) talking (5) done (5) herself (5) written (5) mad (5) much (4) slowly (4) growing (4) lessons (4) croquet (4) used (4) 
	[27,]
	well (17) 
	[28,]
	but (15) now (4) are (4) shall (2) 
	[29,]
	alice (64) he (22) that (1) 
	[30,]
	turtle (45) thing (27) hare (24) rabbit (20) idea (6) size (6) hurry (6) witness (6) minutes (6) deal (6) remark (5) box (5) gardeners (5) children (4) pigs (4) man (4) height (4) timidly (3) key (3) plan (3) girl (3) lady (3) gloves (2) change (2) boy (2) guests (2) verse (2) difficulty (2) crab (2) figure (2) quadrille (2) 
	[31,]
	just (14) once (7) 
	[32,]
	out (22) one (10) sort (8) tired (5) some (4) full (2) entirely (1) 
	[33,]
	of (365) and (57) for (25) than (6) o (4) voice (1) thanked (1) kissed (1) 
	[34,]
	little (77) mock (43) march (24) great (23) white (23) very (22) first (15) poor (15) same (14) good (14) other (14) long (13) large (13) right (11) whole (10) next (9) curious (9) three (8) last (8) old (7) own (7) few (7) jury (6) best (6) bright (5) guinea (5) young (5) sudden (5) glass (4) most (4) melancholy (4) 
	[35,]
	again (49) tone (23) now (17) voice (15) before (10) soup (8) high (7) eagerly (3) leaves (3) that (1) afterwards (1) 
	[36,]
	or (38) very (18) without (16) rather (7) half (5) merely (2) suddenly (1) 
	[37,]
	queen (62) king (52) hatter (48) gryphon (45) duchess (35) time (35) mouse (32) dormouse (31) cat (28) head (26) way (22) moment (22) door (19) caterpillar (19) rabbit (16) court (16) eyes (15) voice (14) minute (14) words (14) little (13) table (12) bit (12) baby (12) air (12) side (11) house (11) end (11) day (10) other (10) question (10) 

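Next are the emission probabilities of each state, sorted by probability.

Each row lists the word, the number of assignments, and the probability.

These probabilities are computed by the HPYLM and are not derived directly from the assignment counts.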

[]
one	33	0.00830262
more	29	0.00631613
much	24	0.00532289
two	23	0.00508451
time	21	0.00496532
now	18	0.00456802
that	13	0.00413099
only	10	0.00357478
all	9	0.00337613
things	14	0.0033364
way	13	0.00297883
three	13	0.00297883
thing	13	0.00297883
yet	13	0.00289937
first	9	0.00278018
is	10	0.00278018
down	10	0.00278018
what	12	0.00274045
right	12	0.00274045
not	10	0.00258153
day	12	0.00258153
it	7	0.0025418
so	10	0.0025418
saying	12	0.0025418
for	7	0.00238288
and	7	0.00238288
as	9	0.00238288
was	7	0.00238288
tea	9	0.00238288
next	9	0.00238288
look	11	0.00238288

[0,]
,	1889	0.829873
!	280	0.122935
?	100	0.043849
than	7	0.00298795

[1,]
was	243	0.261667
s	151	0.162501
had	134	0.14419
were	57	0.0612213
've	36	0.0385799
're	31	0.033192
never	23	0.0245876
all	20	0.0213819
heard	19	0.0202773
found	16	0.0170419
are	13	0.0138091
gave	13	0.0137978
is	11	0.0116756
saw	11	0.0116427
made	10	0.0105791
kept	10	0.0105683
did	10	0.0105656
only	8	0.00845366
both	7	0.00734099
very	6	0.00626612
opened	6	0.00625748
soon	5	0.00519394
quite	5	0.0051907
put	5	0.0051826
shook	5	0.00517774
hastily	4	0.00411906
ever	4	0.00411637
drew	4	0.00409747
stood	4	0.00409747
sounded	3	0.00302529
taught	3	0.0030226

[2,]
.	950	0.862663
!	87	0.0788369
?	64	0.0579473

[3,]
she	416	0.331847
it	191	0.152286
'	153	0.121947
they	120	0.0956134
he	76	0.0605008
alice	72	0.0573125
there	54	0.0429442
i	44	0.0349615
you	34	0.0269853
this	19	0.0150115
who	14	0.0110163
we	14	0.0110154
that	13	0.0100756
things	4	0.0030487
everybody	4	0.0030346
people	3	0.002243
ever	3	0.00224224
nobody	3	0.00223824
was	2	0.00144797
anything	2	0.00144702
nothing	2	0.00144606
else	2	0.00144511
these	2	0.00143939
two	1	0.00066285
something	1	0.000644175
easily	1	0.000641126
crumbs	1	0.000640364
ferrets	1	0.000640173
sentenced	1	0.000639411

[4,]
the	1286	0.533527
a	472	0.195767
her	116	0.0480518
his	78	0.0322826
this	49	0.0202536
your	49	0.0202501
an	49	0.0202495
their	39	0.0161001
some	35	0.0144433
its	34	0.0140254
my	30	0.0123663
any	28	0.0115358
one	21	0.00865551
another	15	0.00614331
that	13	0.00532354
being	10	0.00406994
every	9	0.0036531
these	8	0.00323828
two	7	0.00283678
great	7	0.00282382
such	6	0.00240841
little	6	0.00240829
each	6	0.00240781
four	6	0.00240781
soo	6	0.00240722
our	6	0.00240722
very	5	0.00199526
white	3	0.0011636
ten	3	0.0011636
tea	2	0.000754014
as	1	0.000339078

[5,]
<eos>	1101	0.999809

[6,]
it	187	0.246139
her	70	0.0919705
them	66	0.0867072
you	56	0.0735411
that	50	0.0656586
me	46	0.0603458
all	40	0.052475
him	33	0.0432164
this	28	0.0366447
course	22	0.0287261
here	20	0.0261019
once	15	0.0195144
last	14	0.0181906
us	9	0.0115985
first	8	0.0103081
which	8	0.0102855
sight	8	0.0102788
hearts	7	0.00896128
things	6	0.00767938
cats	6	0.00765054
nonsense	6	0.00764603
silence	5	0.00633979
afraid	5	0.00633754
least	5	0.00632807
reply	5	0.00632627
fact	4	0.00501102
next	3	0.00371605
otherwise	3	0.00369306
instance	3	0.00369126
executed	2	0.00238051
meant	2	0.00237826

[7,]
to	412	0.929552
and	15	0.0334122
would	15	0.0334089
meekly	1	0.00180655

[8,]
n't	162	0.429177
you	61	0.161309
not	53	0.140097
have	15	0.0392747
me	7	0.0180433
ever	6	0.0154137
better	6	0.0154056
they	6	0.0153982
only	5	0.0127982
it	5	0.012779
i	5	0.0127539
all	4	0.0101421
never	4	0.0101051
even	3	0.00745935
rather	3	0.00745639
old	3	0.00744531
cats	3	0.00744162
just	2	0.00480394
this	2	0.00480394
soon	2	0.00480394
nearly	2	0.0048032
he	2	0.00479951
some	2	0.00479655
hardly	2	0.00478547
cross	2	0.00477809
that	1	0.00219877
alice	1	0.00216257
well	1	0.00215519
she	1	0.00214706
either	1	0.00214263
bats	1	0.00212933

[9,]
very	67	0.102957
no	52	0.0798241
not	43	0.0660135
so	40	0.06139
quite	38	0.0582775
all	30	0.0460035
been	24	0.0366856
it	23	0.0351963
looking	22	0.0335999
herself	19	0.0289826
got	17	0.0258958
getting	15	0.0228245
too	14	0.021294
going	14	0.0212775
much	13	0.0195521
n't	12	0.018197
just	11	0.0166819
never	10	0.0151359
come	9	0.0135848
beginning	8	0.0120543
nothing	7	0.0105289
something	7	0.0105083
good	7	0.0104981
how	7	0.0104981
on	6	0.00896239
left	6	0.00895622
grown	6	0.00894696
always	6	0.00894593
soon	6	0.00866966
certainly	5	0.00743702
lying	5	0.007401

[10,]
and	313	0.741227
who	26	0.0611391
which	24	0.0564009
but	23	0.0540365
my	13	0.0303331
turning	7	0.0161165
with	6	0.0137508
after	4	0.00900868
feeling	4	0.00900847
time	1	0.00192169
alice	1	0.00190713

[11,]
said	258	0.70246
thought	21	0.0566899
with	17	0.0457937
cried	13	0.0348835
added	6	0.0158115
shouted	5	0.0130895
and	4	0.0103865
old	3	0.00764277
poor	3	0.00764277
please	3	0.00764222
continued	3	0.00763734
interrupted	3	0.0076368
father	3	0.00763463
exclaimed	3	0.00763409
except	3	0.00763192
pleaded	3	0.00763192
mary	3	0.00763192
trying	2	0.00492889
asked	2	0.00491533
lives	2	0.00491262
tossing	2	0.00490991
screamed	2	0.00490937
throwing	1	0.00218519
filled	1	0.00218465
raising	1	0.00218248

[12,]
alice	163	0.743363
herself	39	0.177166
william	4	0.0173541
five	3	0.0127897
ann	3	0.0127864
fellow	3	0.0127864
she	2	0.00822996
seven	1	0.00366264
execute	1	0.00365444

[13,]
as	180	0.196739
that	168	0.183626
if	76	0.0829332
so	74	0.0807676
what	72	0.0785814
when	57	0.0621456
it	54	0.058886
how	33	0.0356747
there	21	0.0227722
or	19	0.0205783
till	15	0.0161962
but	12	0.0129274
before	11	0.0118314
sure	10	0.0107315
whether	10	0.0107239
who	9	0.00963364
perhaps	9	0.00963364
only	8	0.00856828
everything	7	0.00745276
where	7	0.00744931
though	7	0.00744739
why	6	0.00635676
said	4	0.00417664
quite	4	0.00417052
let	4	0.00416898
yer	4	0.00415941
all	4	0.00397123
yet	3	0.00309137
anything	3	0.00308448
glad	3	0.00307835
still	3	0.0030749

[14,]
in	291	0.194388
at	159	0.106151
to	151	0.10081
with	116	0.0774196
on	78	0.0520151
for	55	0.036655
by	51	0.033959
into	47	0.0312852
about	46	0.0306269
all	42	0.0279753
like	35	0.023276
of	34	0.0226055
down	27	0.0179426
after	26	0.0172539
such	25	0.0165835
from	22	0.0145862
over	22	0.0145781
upon	18	0.0119003
off	17	0.011242
out	16	0.0105756
round	16	0.0105675
as	13	0.00858032
near	12	0.00789168
took	11	0.00722323
is	10	0.00657902
under	10	0.00655277
among	9	0.00588432
up	8	0.00522194
behind	8	0.00521951
followed	8	0.00521588
through	8	0.00521588

[15,]
oh	37	0.212711
why	17	0.0971442
however	16	0.0913311
no	15	0.085558
down	8	0.0451807
yes	8	0.0450979
now	7	0.0394628
sure	7	0.0393386
twinkle	6	0.0335434
indeed	5	0.0277992
please	5	0.0277772
never	3	0.0162323
like	3	0.0162323
myself	3	0.0162116
five	3	0.0162047
bats	3	0.0161978
hush	3	0.0161909
besides	3	0.0161909
ah	3	0.0161909
sir	2	0.0104316
curiouser	2	0.0104109
tut	2	0.0104109
beheaded	2	0.0104109
edwin	2	0.0104109
morcar	2	0.0104109
thump	2	0.0104109
two	1	0.00480064
even	1	0.0046847
too	1	0.00466538
remarking	1	0.00464468

[16,]
do	86	0.134067
could	70	0.109067
would	64	0.0996921
did	58	0.0903152
'll	44	0.0684385
must	32	0.0496888
'd	27	0.0418764
can	26	0.040318
should	26	0.0403139
will	24	0.037189
ca	23	0.0356265
are	22	0.0340743
might	21	0.0325036
wo	19	0.0293766
does	16	0.0246928
shall	16	0.0246912
had	11	0.0168829
may	11	0.0168768
have	9	0.01376
let	6	0.00907466
only	5	0.0075368
sha	5	0.00750193
has	3	0.00438313
almost	3	0.00438272
makes	3	0.00437903
dare	3	0.00437903
now	1	0.00129716
said	1	0.00127049
told	1	0.00125777
altogether	1	0.00125613
usually	1	0.00125366

[17,]
then	68	0.589524
dear	21	0.18086
here	15	0.128699
way	6	0.0504617
more	4	0.0331071
collar	1	0.006958

[18,]
be	114	0.102717
know	70	0.0630108
see	53	0.047658
'm	45	0.0404378
think	44	0.0395353
have	38	0.0341401
get	36	0.0323201
go	35	0.0314276
say	32	0.0287499
do	29	0.0260075
like	26	0.0233199
tell	24	0.0214849
make	19	0.0169773
wish	19	0.0169723
come	17	0.0151872
wonder	15	0.0133722
find	14	0.0124647
eat	14	0.0124647
take	14	0.0124637
put	13	0.0115712
remember	12	0.0106547
speak	11	0.00976114
am	11	0.00975217
look	10	0.00890449
ask	10	0.00885463
grow	10	0.00885463
mean	10	0.00884965
talk	10	0.00884965
begin	9	0.0079571
hear	9	0.0079571
give	9	0.00794713

[19,]
is	35	0.255978
well	19	0.138299
for	9	0.0647994
makes	8	0.0573651
far	6	0.042693
long	5	0.0353816
soon	5	0.035357
suddenly	5	0.0353308
many	4	0.0279801
honour	4	0.0279473
begins	4	0.0279473
large	3	0.0206522
hard	3	0.0206277
usual	3	0.0206031
drink	3	0.0206031
forgetting	3	0.0205949
please	2	0.0132736
eagerly	2	0.0132671
often	2	0.0132507
doth	2	0.013249
quickly	2	0.0132425
i	1	0.00593105
politely	1	0.00593105
w	1	0.00589008
squeaked	1	0.00589008
desperate	1	0.00589008
curled	1	0.00589008
became	1	0.00589008

[20,]
i	383	0.575635
you	194	0.291433
what	30	0.0448229
how	17	0.0252661
we	10	0.0147383
please	6	0.0087257
dinah	5	0.00722633
why	5	0.0072228
where	4	0.00571839
which	4	0.00571755
now	3	0.00422977
one	2	0.00274181
but	1	0.00121054
explanations	1	0.00120383

[21,]
said	94	0.133853
went	63	0.0895988
began	45	0.0639307
looked	38	0.0539268
was	33	0.0465594
thought	26	0.0368307
is	23	0.0325888
seemed	22	0.0311026
came	19	0.0268625
got	18	0.0254011
felt	17	0.02397
replied	16	0.0225701
ran	14	0.0196905
tried	14	0.0196905
added	13	0.0182731
spoke	13	0.0182639
knew	12	0.0168374
ought	12	0.0168374
waited	11	0.0154109
asked	10	0.0139982
sat	10	0.0139881
turned	9	0.0125717
walked	9	0.0125625
repeated	9	0.0125579
used	7	0.00970946
took	7	0.00970946
called	6	0.00830591
go	6	0.00829213
hurried	6	0.00828754
remarked	6	0.00828754
noticed	6	0.00828202

[22,]
come	13	0.533161
first	4	0.158407
treacle	2	0.0750168
unimportant	2	0.0750088
wake	2	0.0749787
exactly	1	0.0333695

[23,]
and	268	0.639142
but	76	0.180915
for	31	0.0735226
while	16	0.0377143
because	11	0.025779
exactly	3	0.00668907
hold	3	0.00668614
even	2	0.00430756
turn	2	0.00430649
suppose	2	0.0042985
unless	2	0.00429717
bill	1	0.00192125
sir	1	0.00191459
singing	1	0.00191192

[24,]
does	4	0.237401
ootiful	4	0.237369
came	3	0.175059
not	2	0.112657
either	1	0.0500667
the	1	0.0500212
nine	1	0.0500212

[25,]
beau	4	0.345159
next	3	0.254533
first	2	0.163743
it	1	0.0728944
soup	1	0.0727486

[26,]
up	66	0.127043
on	63	0.121257
out	41	0.0787957
off	41	0.078791
down	34	0.0649291
back	27	0.0517598
about	24	0.0459731
again	15	0.0285989
away	15	0.0285886
round	14	0.0266544
so	10	0.0189785
anxiously	10	0.0189326
close	8	0.0150763
angrily	7	0.0131365
silent	7	0.0131318
ready	7	0.0131318
nothing	6	0.0112435
over	6	0.0112107
together	6	0.011206
running	6	0.0112014
talking	5	0.00930741
done	5	0.00930367
herself	5	0.00928026
written	5	0.00927557
mad	5	0.00891196
much	4	0.00746123
slowly	4	0.00735916
growing	4	0.00735447
lessons	4	0.00735447
croquet	4	0.00734979
used	4	0.00734511

[27,]
well	17	0.987676

[28,]
but	15	0.591821
now	4	0.152087
are	4	0.151978
shall	2	0.071984

[29,]
alice	64	0.733264
he	22	0.250555
that	1	0.00922331

[30,]
turtle	45	0.188243
thing	27	0.112721
hare	24	0.100004
rabbit	20	0.0832282
idea	6	0.0244168
size	6	0.0244072
hurry	6	0.0244008
witness	6	0.0243992
minutes	6	0.0243927
deal	6	0.0243911
remark	5	0.0201993
box	5	0.0201993
gardeners	5	0.0201816
children	4	0.0160058
pigs	4	0.0159817
man	4	0.0159817
height	4	0.0159737
timidly	3	0.0118187
key	3	0.0117786
plan	3	0.0117786
girl	3	0.0117722
lady	3	0.0117722
gloves	2	0.00761721
change	2	0.00759475
boy	2	0.00757871
guests	2	0.00757871
verse	2	0.00757871
difficulty	2	0.00757871
crab	2	0.00757871
figure	2	0.00757871
quadrille	2	0.00757871

[31,]
just	14	0.656861
once	7	0.323682

[32,]
out	22	0.419188
one	10	0.18865
sort	8	0.149982
tired	5	0.0923007
some	4	0.0730952
full	2	0.0346195
entirely	1	0.0153924

[33,]
of	365	0.79303
and	57	0.123484
for	25	0.0539202
than	6	0.0126111
o	4	0.00826138
voice	1	0.0017459
thanked	1	0.00173978
kissed	1	0.00173978

[34,]
little	77	0.141193
mock	43	0.0786819
march	24	0.0437698
great	23	0.0419371
white	23	0.0419316
very	22	0.0397463
first	15	0.0273015
poor	15	0.0272397
same	14	0.0254152
good	14	0.0253946
other	14	0.0250476
long	13	0.0236045
large	13	0.0235839
right	11	0.0199473
whole	10	0.0180212
next	9	0.0162586
curious	9	0.0162311
three	8	0.014441
last	8	0.0143654
old	7	0.0125341
own	7	0.0125135
few	7	0.0125066
jury	6	0.0107151
best	6	0.0106959
bright	5	0.00883707
guinea	5	0.0088302
young	5	0.0088302
sudden	5	0.0088302
glass	4	0.00701261
most	4	0.00701124
melancholy	4	0.00700437

[35,]
again	49	0.356198
tone	23	0.16643
now	17	0.122692
voice	15	0.108053
before	10	0.0715532
soup	8	0.0569455
high	7	0.0496442
eagerly	3	0.0204493
leaves	3	0.0204397
that	1	0.00590562
afterwards	1	0.00584538

[36,]
or	38	0.434449
very	18	0.204593
without	16	0.181601
rather	7	0.0781777
half	5	0.0551982
merely	2	0.0206905
suddenly	1	0.00920978

[37,]
queen	62	0.0469432
king	52	0.0393377
hatter	48	0.0361549
gryphon	45	0.0340227
duchess	35	0.0264297
time	35	0.0264278
mouse	32	0.0241768
dormouse	31	0.0234113
cat	28	0.0211146
head	26	0.0196398
way	22	0.0166463
moment	22	0.0165651
door	19	0.0142935
caterpillar	19	0.014281
rabbit	16	0.0120268
court	16	0.0120093
eyes	15	0.0112738
voice	14	0.0105395
minute	14	0.010497
words	14	0.0104957
little	13	0.00973645
table	12	0.0089784
bit	12	0.00897215
baby	12	0.00896591
air	12	0.00896591
side	11	0.0082191
house	11	0.00821286
end	11	0.00821286
day	10	0.00752227
other	10	0.00748479
question	10	0.00747855

The case $M=2$

Next are the results with the depth fixed at 2.

I ran the experiment with two different seeds, so I will show the results for the first seed first.

image

[]
more (35) one (30) much (18) once (15) day (16) down (15) yet (14) nothing (12) all (11) anything (12) not (12) well (11) two (13) enough (14) seen (14) only (9) tea (11) talking (12) indeed (11) sure (10) first (10) things (10) in (9) use (11) bill (9) here (8) mad (11) three (9) done (10) being (10) that (7) 
	[0,]
	all (1) never (1) from (1) 
		[0,0,]
		'll (52) must (37) could (35) would (29) might (23) can (23) 'd (22) should (22) never (16) shall (16) only (11) may (9) soon (9) will (7) all (6) ever (6) hardly (6) even (5) just (5) better (5) do (4) almost (3) dare (3) possibly (3) really (2) first (1) well (1) he (1) usually (1) incessantly (1) 
		[0,1,]
		in (168) at (159) with (67) into (50) for (45) on (45) to (39) by (39) after (34) all (25) from (19) upon (18) near (11) among (11) through (10) under (10) join (9) over (7) against (7) before (5) you (4) along (4) between (4) across (4) both (3) is (2) i (1) which (1) as (1) but (1) meant (1) 
		[0,2,]
		she (443) i (437) it (258) you (197) alice (132) they (115) he (95) there (62) we (25) dinah (5) people (4) nobody (4) nothing (3) anything (2) all (1) to (1) though (1) dinn (1) 
		[0,3,]
		know (64) think (30) thought (27) heard (17) wish (17) found (14) say (12) spoke (11) knew (11) wonder (10) suppose (9) saw (9) like (7) believe (7) all (7) mean (6) give (5) understand (4) remembered (4) breathe (3) hope (3) swim (3) or (2) let (2) felt (2) passed (2) says (2) fancied (2) considered (2) stretched (2) is (1) 
		[0,4,]
		not (39) just (31) never (17) turning (11) saying (9) trying (7) exactly (7) had (7) even (5) tried (5) all (5) half (5) tis (4) said (2) of (2) always (2) heard (2) n't (2) learning (2) from (1) often (1) after (1) near (1) seemed (1) 
		[0,5,]
		dear (25) oh (17) please (15) my (14) dinah (6) do (5) so (2) and (1) never (1) 
		[0,6,]
		but (106) if (78) when (65) so (57) that (57) as (49) and (42) then (32) for (32) till (16) before (10) yet (9) while (8) now (8) perhaps (8) whether (8) here (6) said (5) though (5) thought (4) even (4) really (4) suddenly (4) shall (4) all (3) found (3) still (3) i (2) first (2) nothing (2) only (2) 
		[0,7,]
		n't (171) s (158) not (56) you (44) they (8) it (6) i (5) so (3) cats (3) become (2) let (1) bats (1) 
		[0,8,]
		and (447) who (29) or (21) but (21) which (20) wondering (2) you (1) 
		[0,9,]
		of (21) let (14) will (11) tell (5) thank (2) 
		[0,10,]
		of (240) 
		[0,11,]
		' (161) do (68) did (49) would (32) could (25) ca (23) wo (21) are (15) does (12) was (11) had (11) is (9) have (8) can (6) sha (6) should (4) may (2) hastily (2) passed (1) 
		[0,12,]
		said (262) thought (25) cried (9) and (5) for (5) while (5) shouted (5) added (4) continued (3) interrupted (3) yer (3) asked (2) exclaimed (2) sighed (2) screamed (2) pleaded (2) from (2) persisted (1) muttered (1) roared (1) inquired (1) 
		[0,13,]
		to (482) would (18) and (9) will (7) always (2) 
		[0,14,]
		first (7) consider (4) give (3) does (1) 
		[0,15,]
		of (22) or (18) 
		[0,16,]
		as (139) because (10) until (5) 
		[0,17,]
		was (205) had (135) 'm (49) were (47) 've (39) 're (29) are (22) began (21) felt (14) all (13) have (9) kept (6) stood (6) both (5) am (4) fell (4) hastily (3) suddenly (3) still (3) never (2) 'd (2) i (1) passed (1) spoke (1) now (1) 
		[0,18,]
		all (38) last (19) once (12) which (8) first (6) least (5) fact (4) reply (4) asking (3) trying (2) 
		[0,19,]
		well (20) soon (12) far (7) long (6) curious (6) politely (5) hard (4) follows (3) ever (2) continued (2) large (2) usual (2) often (1) sure (1) important (1) 
		[0,20,]
		poor (7) 
		[0,21,]
		said (94) began (26) seemed (20) replied (18) see (14) added (13) ought (13) tried (12) was (10) did (9) used (8) waited (8) repeated (8) could (7) say (7) noticed (7) is (6) remarked (6) beg (6) asked (5) seems (5) drew (5) whispered (4) are (3) continued (3) meant (3) guessed (3) always (2) does (2) were (2) interrupted (2) 
		[0,22,]
		and (193) with (27) without (21) half (7) like (6) has (6) while (4) 
		[0,23,]
		tone (29) voice (19) before (12) is (9) yet (1) 
		[0,24,]
		at (12) anxiously (4) 
		[0,25,]
		is (53) was (46) am (12) makes (10) for (6) did (5) doth (3) muttered (1) 
		[0,26,]
		than (16) 
		[0,27,]
		you (26) course (16) 
		[0,28,]
		or (19) were (16) 
		[0,29,]
		
		[0,30,]
		
		[0,31,]
		she (11) began (7) ever (2) 
	[1,]
	be (117) have (51) get (37) go (34) like (24) tell (21) see (18) make (19) do (17) take (15) put (12) hear (12) got (9) remember (12) eat (12) try (11) look (10) talk (10) find (8) come (8) think (8) made (4) speak (7) keep (8) explain (7) leave (8) grow (6) change (6) call (7) help (6) learn (7) 
		[1,0,]
		it (166) her (60) them (57) you (46) me (43) him (32) this (29) time (12) us (12) sight (8) like (2) 
		[1,1,]
		! (368) ? (159) 
		[1,2,]
		again (66) here (25) now (17) together (8) herself (8) alice (5) it (4) nothing (4) aloud (3) growing (2) go (1) find (1) trouble (1) 
		[1,3,]
		turtle (45) hare (23) 
		[1,4,]
		alice (164) five (3) honour (3) me (1) 
		[1,5,]
		in (91) with (23) such (21) like (16) took (11) 
		[1,6,]
		out (55) notice (3) made (2) half (2) 
		[1,7,]
		on (97) up (79) down (65) off (49) out (42) back (29) about (27) in (27) round (19) over (17) away (16) alice (6) come (5) be (2) like (2) itself (2) take (1) 
		[1,8,]
		went (67) looked (25) got (17) turned (13) ran (12) put (12) sat (11) took (10) gave (9) made (9) came (8) walked (8) shook (7) called (6) set (5) left (5) moved (5) taught (4) puzzled (3) go (2) grow (2) making (2) spread (2) decided (2) have (1) caught (1) 
		[1,9,]
		to (65) about (46) for (29) behind (9) round (8) made (7) like (3) get (1) got (1) explain (1) 
		[1,10,]
		been (23) going (13) i (10) got (9) nothing (8) beginning (7) ready (7) begun (5) delighted (2) 
		[1,11,]
		it (29) herself (20) them (12) itself (4) help (2) settled (1) 
		[1,12,]
		ootiful (3) 
		[1,13,]
		herself (35) say (18) do (17) see (16) her (8) find (7) ask (6) begin (6) happen (6) speak (5) itself (5) know (5) think (4) listen (4) be (3) have (1) hear (1) change (1) work (1) 
		[1,14,]
		
		[1,15,]
		
		[1,16,]
		
		[1,17,]
		getting (7) now (6) grown (6) certainly (2) 
	[2,]
	<eos> (1101) 
		[2,0,]
		. (948) 
	[3,]
	every (1) old (1) how (1) three (1) then (1) 
		[3,0,]
		, (1921) where (1) difficulty (1) 
		[3,1,]
		little (73) white (23) great (23) very (21) large (17) first (16) long (16) right (14) other (13) same (12) next (11) whole (10) poor (10) good (9) curious (9) last (9) three (8) golden (7) few (7) cheshire (7) guinea (6) old (5) young (5) two (4) queer (4) best (4) sharp (4) shrill (4) most (4) unfortunate (3) short (3) 
		[3,2,]
		a (472) the (381) an (48) this (18) some (14) these (9) such (8) our (6) great (5) those (5) two (4) another (4) several (1) pennyworth (1) 
		[3,3,]
		very (87) no (53) quite (44) so (34) all (25) rather (18) too (17) only (13) any (12) as (10) good (5) perfectly (4) another (3) something (3) the (2) looking (2) a (2) next (1) most (1) 
		[3,4,]
		the (452) 
		[3,5,]
		your (15) their (1) 
		[3,6,]
		the (461) broken (2) 
		[3,7,]
		this (30) some (23) one (19) any (16) another (9) soo (4) tea (2) front (2) no (1) either (1) 
		[3,8,]
		her (123) his (69) its (41) their (39) my (31) one (28) your (28) each (5) large (3) every (2) second (1) other (1) name (1) 
		[3,9,]
		what (104) that (87) how (54) this (24) who (14) where (11) why (10) there (9) everything (6) all (5) either (5) only (4) something (4) one (3) next (2) mine (2) every (1) poor (1) rather (1) 
		[3,10,]
		your (10) beautiful (6) old (4) feeling (4) mary (3) o (3) father (3) the (1) 
		[3,11,]
		two (13) great (5) any (3) drink (3) no (1) 
		[3,12,]
		next (2) 
	[4,]
	then (9) bill (4) why (4) well (2) now (3) sir (3) alas (3) no (3) oh (1) miss (2) hush (1) prizes (1) pat (1) dears (2) yes (1) mostly (2) indeed (1) what (1) feet (1) serpent (1) remarking (1) lacie (1) lad (1) 
		[4,0,]
		oh (19) well (18) why (16) however (15) no (9) yes (9) ah (5) here (4) then (2) treacle (1) twinkle (1) 
		[4,1,]
		evidence (3) verdict (3) pardon (3) tongue (2) 
		[4,2,]
		come (19) miss (1) treacle (1) 
		[4,3,]
		now (15) bill (1) hush (1) prizes (1) 
		[4,4,]
		wow (3) hush (1) pat (1) 
		[4,5,]
		twinkle (7) sir (4) alas (1) 
	[5,]
	
		[5,0,]
		inches (5) feet (4) which (3) o (3) later (2) 
		[5,1,]
		beau (3) sure (1) 
		[5,2,]
		came (12) 
		[5,3,]
		with (5) next (4) anything (3) 
		[5,4,]
		so (25) gone (1) 
		[5,5,]
		looked (12) looking (7) changed (6) 
		[5,6,]
		close (7) come (1) 
		[5,7,]
		else (10) 
		[5,8,]
		looking (15) all (9) going (6) running (6) sitting (5) called (4) fallen (4) jumping (3) anything (1) come (1) came (1) leaving (1) 
		[5,9,]
		sure (7) afraid (6) old (6) gone (5) glad (3) 
		[5,10,]
		
		[5,11,]
		then (28) 
	[6,]
	voice (8) rabbit (7) door (7) way (6) words (5) party (5) bottle (5) mouse (4) pig (6) cat (4) house (5) pool (5) little (5) queen (3) hookah (5) whiting (5) world (3) side (3) question (3) new (4) middle (4) time (2) bit (4) place (3) pigs (4) creatures (3) reason (4) piece (3) garden (2) gryphon (2) game (2) 
		[6,0,]
		mock (45) march (23) 
		[6,1,]
		head (39) eyes (19) hand (14) voice (13) face (9) way (8) hands (8) heads (7) foot (7) mouth (7) own (7) life (6) pocket (6) arms (6) eye (6) slates (6) sister (5) history (5) feet (5) nose (5) side (4) arm (4) hair (4) chin (4) tail (4) flamingo (4) garden (3) temper (3) neck (3) knee (3) toes (3) 
		[6,2,]
		time (35) way (22) moment (21) minutes (9) rate (8) question (7) minute (5) morning (4) words (3) world (2) capital (2) party (1) witness (1) 
		[6,3,]
		queen (51) king (48) hatter (38) duchess (37) gryphon (36) dormouse (26) mouse (21) caterpillar (20) cat (17) rabbit (17) dodo (11) footman (10) cook (10) pigeon (8) soldiers (6) knave (6) trial (5) lory (5) game (4) others (4) rest (4) executioner (4) youth (3) words (2) party (2) owl (2) jurors (2) master (2) panther (2) voice (1) world (1) 
		[6,4,]
		door (15) jury (13) table (12) air (11) court (11) dance (11) baby (10) other (10) house (9) sea (9) wood (8) glass (7) queen (6) garden (6) ground (6) chimney (6) box (5) window (5) trees (5) words (4) pool (4) hall (4) mushroom (4) distance (4) players (4) game (3) lobster (3) sort (3) sky (3) jurymen (3) shore (3) 
		[6,5,]
		majesty (10) soup (6) mouse (3) william (3) ann (3) fellow (3) thing (1) 
		[6,6,]
		sort (13) kind (5) opportunity (4) tired (3) number (3) pack (2) sorts (2) piece (1) 
		[6,7,]
		much (26) many (6) 
		[6,8,]
		minute (14) tone (7) moral (6) pair (4) girl (1) air (1) hour (1) present (1) 
		[6,9,]
		end (7) top (7) fan (4) roof (4) bottom (4) officers (3) edge (3) boots (2) flurry (1) 
		[6,10,]
		thing (29) rabbit (16) witness (9) hurry (9) cat (8) key (7) question (6) deal (5) size (5) height (4) gardeners (3) creatures (2) lady (2) candle (2) kid (1) bat (1) 
		[6,11,]
		oop (4) side (2) 
		[6,12,]
		low (10) melancholy (5) offended (3) sulky (2) new (1) 
	[7,]
	that (27) little (10) having (5) being (4) things (1) shrinking (2) 
		[7,0,]
		of (122) little (1) 
		[7,1,]
		that (60) things (12) little (1) 
		[7,2,]
		that (10) 
		[7,3,]
		things (4) 
	[8,]
	one (1) else (1) prettier (1) 
	[9,]
	
	[10,]
	
		[10,0,]
		high (8) 
		[10,1,]
		clock (1) 
		[10,2,]
		clock (1) 
		[10,3,]
		
		[10,4,]
		way (3) editions (2) clock (1) 
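
The per-node listings that follow appear to use a simple layout: a header line such as `[0,1,]` giving the node's path in the tree (`[]` being the root), then one row per word holding the word, its count, and its probability, with a blank line between nodes. For readers who want to analyze these listings, here is a minimal sketch of a parser under that assumption. The name `parse_listing` is hypothetical and is not part of the implementation described in this article; rows that do not match the three-column shape are skipped.

```python
import re
from typing import Dict, List, Tuple

Path = Tuple[int, ...]          # e.g. (0, 1) for the header "[0,1,]"
Row = Tuple[str, int, float]    # (word, count, probability)

# Matches "[]", "[0,]", "[0,1,]" and so on.
HEADER = re.compile(r"^\[((?:\d+,)*)\]$")


def parse_listing(text: str) -> Dict[Path, List[Row]]:
    """Parse per-node listings into a dict keyed by node path."""
    nodes: Dict[Path, List[Row]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = HEADER.match(line)
        if match:
            # "0,1," -> (0, 1); the empty string gives the root path ().
            current = tuple(int(x) for x in match.group(1).split(",") if x)
            nodes[current] = []
            continue
        if current is None:
            continue
        # Split from the right so the word itself is kept whole.
        parts = line.rsplit(None, 2)
        if len(parts) != 3:
            continue
        word, count, prob = parts
        try:
            nodes[current].append((word, int(count), float(prob)))
        except ValueError:
            # Not a "word count probability" row; ignore it.
            continue
    return nodes
```

With the result in hand, something like `max(rows, key=lambda r: r[2])` picks the most probable word of a node, and nodes whose row list is empty correspond to the empty states in the listing.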
[]
more	35	0.00730701
one	30	0.006486
much	18	0.0038998
once	15	0.00348929
day	16	0.00348929
down	15	0.00324298
yet	14	0.00307878
nothing	12	0.00303773
all	11	0.00287353
anything	12	0.00287353
not	12	0.00287353
well	11	0.00287353
two	13	0.00283248
enough	14	0.00279142
seen	14	0.00279142
only	9	0.00246302
tea	11	0.00246302
talking	12	0.00246302
indeed	11	0.00246302
sure	10	0.00246302
first	10	0.00242197
things	10	0.00242197
in	9	0.00225776
use	11	0.00225776
bill	9	0.00225776
here	8	0.00225776
mad	11	0.00225776
three	9	0.00221671
done	10	0.00221671
being	10	0.00217566
that	7	0.00205251

[0,]
all	1	0.0267383
never	1	0.0118546
from	1	0.0113656

[0,0,]
'll	52	0.142338
must	37	0.10113
could	35	0.095719
would	29	0.0792374
might	23	0.0626694
can	23	0.0622041
'd	22	0.0599639
should	22	0.0599639
never	16	0.0436076
shall	16	0.0434816
only	11	0.0297495
may	9	0.0242924
soon	9	0.0242515
will	7	0.018798
all	6	0.0163898
ever	6	0.0160532
hardly	6	0.0159682
even	5	0.0133044
just	5	0.0132651
better	5	0.0132235
do	4	0.0105585
almost	3	0.00772628
dare	3	0.0077262
possibly	3	0.00772587
really	2	0.00502243
first	1	0.00236103
well	1	0.0022785
he	1	0.00227404
usually	1	0.00223152
incessantly	1	0.00223152

[0,1,]
in	168	0.208492
at	159	0.197545
with	67	0.0831194
into	50	0.0619553
for	45	0.055815
on	45	0.055737
to	39	0.0483133
by	39	0.0482739
after	34	0.0420749
all	25	0.0310586
from	19	0.0234734
upon	18	0.0221743
near	11	0.0134681
among	11	0.0134485
through	10	0.012205
under	10	0.0122049
join	9	0.0109609
over	7	0.00847434
against	7	0.00847339
before	5	0.00602484
you	4	0.00482082
along	4	0.00474228
between	4	0.00474224
across	4	0.00474209
both	3	0.00351816
is	2	0.00235278
i	1	0.00108933
which	1	0.00106946
as	1	0.00105092
but	1	0.00105065
meant	1	0.00103048

[0,2,]
she	443	0.247936
i	437	0.244591
it	258	0.144353
you	197	0.110214
alice	132	0.0737999
they	115	0.0642863
he	95	0.0530882
there	62	0.0346066
we	25	0.0138897
dinah	5	0.0026957
people	4	0.00213203
nobody	4	0.00213179
nothing	3	0.00157735
anything	2	0.00101246
all	1	0.000501971
to	1	0.000461907
though	1	0.000456917
dinn	1	0.000451966

[0,3,]
know	64	0.212005
think	30	0.0990499
thought	27	0.0892049
heard	17	0.0559245
wish	17	0.0558614
found	14	0.0459586
say	12	0.0393126
spoke	11	0.0359881
knew	11	0.0359285
wonder	10	0.0326064
suppose	9	0.0293438
saw	9	0.0292848
like	7	0.0226922
believe	7	0.0226393
all	7	0.0225844
mean	6	0.0193172
give	5	0.0160552
understand	4	0.0126735
remembered	4	0.0126729
breathe	3	0.00935075
hope	3	0.00935075
swim	3	0.00874768
or	2	0.00621034
let	2	0.00620975
felt	2	0.00614956
passed	2	0.00614898
says	2	0.00602977
fancied	2	0.0060286
considered	2	0.0060286
stretched	2	0.0060286
is	1	0.00300973

[0,4,]
not	39	0.229724
just	31	0.182385
never	17	0.0997819
turning	11	0.0639649
saying	9	0.0521351
trying	7	0.0403759
exactly	7	0.0402983
had	7	0.0393506
even	5	0.028622
tried	5	0.0285423
all	5	0.0280732
half	5	0.0274403
tis	4	0.022548
said	2	0.0110274
of	2	0.010935
always	2	0.010874
heard	2	0.0107965
n't	2	0.0107919
learning	2	0.0107136
from	1	0.00509726
often	1	0.00487737
after	1	0.00487645
near	1	0.00487569
seemed	1	0.00487569

[0,5,]
dear	25	0.288381
oh	17	0.195368
please	15	0.172115
my	14	0.160489
dinah	6	0.0675188
do	5	0.0559545
so	2	0.021065
and	1	0.00965161
never	1	0.00955071

[0,6,]
but	106	0.162613
if	78	0.119534
when	65	0.0995654
so	57	0.0873393
that	57	0.0872794
as	49	0.0750575
and	42	0.0641311
then	32	0.0488775
for	32	0.0487037
till	16	0.0242974
before	10	0.0151486
yet	9	0.0135833
while	8	0.0121106
now	8	0.0120375
perhaps	8	0.0120094
whether	8	0.0120088
here	6	0.00866655
said	5	0.00753624
though	5	0.00743473
thought	4	0.00593341
even	4	0.00593275
really	4	0.00589997
suddenly	4	0.00589931
shall	4	0.00589891
all	3	0.00467107
found	3	0.00436449
still	3	0.00436323
i	2	0.00292867
first	2	0.0028974
nothing	2	0.00283072
only	2	0.00282979

[0,7,]
n't	171	0.372941
s	158	0.344544
not	56	0.121856
you	44	0.0956935
they	8	0.0170535
it	6	0.0126871
i	5	0.0105424
so	3	0.0061477
cats	3	0.00612495
become	2	0.00394071
let	1	0.00179582
bats	1	0.00175748

[0,8,]
and	447	0.825912
who	29	0.0532398
or	21	0.0384745
but	21	0.0384674
which	20	0.0362563
wondering	2	0.00333329
you	1	0.00151399

[0,9,]
of	21	0.392548
let	14	0.260506
will	11	0.203866
tell	5	0.0905871
thank	2	0.0339939

[0,10,]
of	240	0.999133

[0,11,]
'	161	0.345074
do	68	0.145551
did	49	0.104778
would	32	0.0682988
could	25	0.0532769
ca	23	0.048943
wo	21	0.0442431
are	15	0.0318612
does	12	0.0253811
was	11	0.0233202
had	11	0.023256
is	9	0.0190068
have	8	0.0167725
can	6	0.0125056
sha	6	0.0124631
should	4	0.00819241
may	2	0.00392169
hastily	2	0.00390225
passed	1	0.00177581

[0,12,]
said	262	0.758396
thought	25	0.071975
cried	9	0.0255344
and	5	0.0141351
for	5	0.0140707
while	5	0.0140374
shouted	5	0.01394
added	4	0.0111063
continued	3	0.0082075
interrupted	3	0.0081754
yer	3	0.00814216
asked	2	0.00530998
exclaimed	2	0.00527636
sighed	2	0.00524401
screamed	2	0.00524395
pleaded	2	0.00524369
from	2	0.00478941
persisted	1	0.0023779
muttered	1	0.0023779
roared	1	0.00234523
inquired	1	0.00234523

[0,13,]
to	482	0.930111
would	18	0.0343758
and	9	0.0170206
will	7	0.0131406
always	2	0.00348846

[0,14,]
first	7	0.453548
consider	4	0.253271
give	3	0.186781
does	1	0.0536724

[0,15,]
of	22	0.544955
or	18	0.444985

[0,16,]
as	139	0.901268
because	10	0.0636403
until	5	0.0311746

[0,17,]
was	205	0.32718
had	135	0.215758
'm	49	0.0780959
were	47	0.0749388
've	39	0.0620961
're	29	0.0460964
are	22	0.0349823
began	21	0.0333389
felt	14	0.0221393
all	13	0.0207111
have	9	0.0141147
kept	6	0.00929777
stood	6	0.00929719
both	5	0.00771855
am	4	0.0061182
fell	4	0.00609745
hastily	3	0.00451983
suddenly	3	0.00451885
still	3	0.00451885
never	2	0.00298257
'd	2	0.00291825
i	1	0.00138239
passed	1	0.00133944
spoke	1	0.00131828
now	1	0.00131508

[0,18,]
all	38	0.374752
last	19	0.186161
once	12	0.116867
which	8	0.0774059
first	6	0.0576105
least	5	0.0475599
fact	4	0.0376608
reply	4	0.0376594
asking	3	0.0277604
trying	2	0.0179081

[0,19,]
well	20	0.267724
soon	12	0.15962
far	7	0.091961
long	6	0.0784512
curious	6	0.0784502
politely	5	0.0649366
hard	4	0.0514278
follows	3	0.0379132
ever	2	0.024606
continued	2	0.0246004
large	2	0.0244074
usual	2	0.0244025
often	1	0.0109922
sure	1	0.0109005
important	1	0.0108947

[0,20,]
poor	7	0.970103

[0,21,]
said	94	0.274537
began	26	0.0755973
seemed	20	0.0579971
replied	18	0.0515652
see	14	0.0403972
added	13	0.0375873
ought	13	0.0374722
tried	12	0.0346055
was	10	0.0284632
did	9	0.0258911
used	8	0.0228547
waited	8	0.0228533
repeated	8	0.0228527
could	7	0.0200433
say	7	0.0199883
noticed	7	0.0199294
is	6	0.0172935
remarked	6	0.0170622
beg	6	0.0170049
asked	5	0.0141972
seems	5	0.014081
drew	5	0.014081
whispered	4	0.0111572
are	3	0.00846509
continued	3	0.00834778
meant	3	0.00829108
guessed	3	0.00823326
always	2	0.00542668
does	2	0.00542556
were	2	0.00542545
interrupted	2	0.00536763

[0,22,]
and	193	0.730377
with	27	0.101539
without	21	0.0787971
half	7	0.0257991
like	6	0.021994
has	6	0.0219809
while	4	0.0136928

[0,23,]
tone	29	0.411399
voice	19	0.268562
before	12	0.168646
is	9	0.125903
yet	1	0.0114956

[0,24,]
at	12	0.737153
anxiously	4	0.237407

[0,25,]
is	53	0.388376
was	46	0.336939
am	12	0.0868107
makes	10	0.072077
for	6	0.042784
did	5	0.0353728
doth	3	0.0206102
muttered	1	0.0059343

[0,26,]
than	16	0.986909

[0,27,]
you	26	0.614255
course	16	0.376121

[0,28,]
or	19	0.537099
were	16	0.451381

[0,29,]

[0,30,]

[0,31,]
she	11	0.539866
began	7	0.34004
ever	2	0.0901686

[1,]
be	117	0.131474
have	51	0.0575009
get	37	0.0411503
go	34	0.0390009
like	24	0.030336
tell	21	0.0226522
see	18	0.0204855
make	19	0.0204759
do	17	0.019426
take	15	0.017013
put	12	0.0150353
hear	12	0.0139376
got	9	0.012859
remember	12	0.0128495
eat	12	0.012651
try	11	0.0117613
look	10	0.0107593
talk	10	0.0106809
find	8	0.0106732
come	8	0.00962333
think	8	0.0096042
made	4	0.00852373
speak	7	0.00850652
keep	8	0.00850652
explain	7	0.00849695
leave	8	0.00849695
grow	6	0.0074356
change	6	0.00742604
call	7	0.00741839
help	6	0.00740883
learn	7	0.00740883

[1,0,]
it	166	0.355049
her	60	0.128073
them	57	0.121634
you	46	0.0980751
me	43	0.0916565
him	32	0.068097
this	29	0.0616731
time	12	0.0252715
us	12	0.0252713
sight	8	0.0167062
like	2	0.00399786

[1,1,]
!	368	0.6979
?	159	0.301323

[1,2,]
again	66	0.453779
here	25	0.171063
now	17	0.115893
together	8	0.0538064
herself	8	0.0524909
alice	5	0.033182
it	4	0.026307
nothing	4	0.0262837
aloud	3	0.0193264
growing	2	0.0124519
go	1	0.00627262
find	1	0.00572369
trouble	1	0.00559736

[1,3,]
turtle	45	0.658732
hare	23	0.33525

[1,4,]
alice	164	0.957858
five	3	0.0163776
honour	3	0.0163775
me	1	0.00468777

[1,5,]
in	91	0.560472
with	23	0.140738
such	21	0.128405
like	16	0.097714
took	11	0.0666816

[1,6,]
out	55	0.883754
notice	3	0.0451939
made	2	0.0291389
half	2	0.0290652

[1,7,]
on	97	0.19959
up	79	0.162485
down	65	0.133613
off	49	0.100638
out	42	0.0861976
back	29	0.0593866
about	27	0.0552781
in	27	0.0552711
round	19	0.0387762
over	17	0.034645
away	16	0.0325831
alice	6	0.0119878
come	5	0.00996436
be	2	0.00463563
like	2	0.00392455
itself	2	0.00373275
take	1	0.00176907

[1,8,]
went	67	0.26939
looked	25	0.0992361
got	17	0.0680404
turned	13	0.0516317
ran	12	0.0475993
put	12	0.0471245
sat	11	0.0435681
took	10	0.0395861
gave	9	0.0355285
made	9	0.0348757
came	8	0.0315478
walked	8	0.031471
shook	7	0.0274389
called	6	0.023458
set	5	0.0194516
left	5	0.0194012
moved	5	0.0193747
taught	4	0.0153426
puzzled	3	0.0113112
go	2	0.00817142
grow	2	0.00743196
making	2	0.00735509
spread	2	0.00727861
decided	2	0.00727861
have	1	0.00457272
caught	1	0.00327196

[1,9,]
to	65	0.381165
about	46	0.269432
for	29	0.169425
behind	9	0.0517722
round	8	0.0459034
made	7	0.0400984
like	3	0.0168283
get	1	0.00519212
got	1	0.00485764
explain	1	0.00480606

[1,10,]
been	23	0.271509
going	13	0.152382
i	10	0.116672
got	9	0.105026
nothing	8	0.0929335
beginning	7	0.0809854
ready	7	0.0809619
begun	5	0.0571785
delighted	2	0.021445

[1,11,]
it	29	0.423561
herself	20	0.291208
them	12	0.173539
itself	4	0.0559285
help	2	0.0265985
settled	1	0.0117982

[1,12,]
ootiful	3	0.930294

[1,13,]
herself	35	0.233659
say	18	0.119481
do	17	0.113293
see	16	0.106612
her	8	0.0524938
find	7	0.0459361
ask	6	0.0390407
begin	6	0.0389794
happen	6	0.0389794
speak	5	0.0324529
itself	5	0.032299
know	5	0.0322695
think	4	0.025773
listen	4	0.0255575
be	3	0.0198209
have	1	0.00699335
hear	1	0.00576255
change	1	0.00557858
work	1	0.0055166

[1,14,]

[1,15,]

[1,16,]

[1,17,]
getting	7	0.323901
now	6	0.276137
grown	6	0.276093
certainly	2	0.0857935

[2,]
<eos>	1101	0.998902

[2,0,]
.	948	0.999779

[3,]
every	1	0.0179871
old	1	0.0179102
how	1	0.0178845
three	1	0.0117411
then	1	0.00534644

[3,0,]
,	1921	0.998852
where	1	0.000419659
difficulty	1	0.000417655

[3,1,]
little	73	0.190364
white	23	0.0598238
great	23	0.0598217
very	21	0.0547619
large	17	0.0442952
first	16	0.0415073
long	16	0.0415018
right	14	0.0362705
other	13	0.0338248
same	12	0.0310316
next	11	0.0285865
whole	10	0.0257933
poor	10	0.0256176
good	9	0.0233499
curious	9	0.023177
last	9	0.023177
three	8	0.0207383
golden	7	0.0179401
few	7	0.0179401
cheshire	7	0.0175901
guinea	6	0.0149724
old	5	0.0130532
young	5	0.0127047
two	4	0.010443
queer	4	0.0100918
best	4	0.0100897
sharp	4	0.0100876
shrill	4	0.0100869
most	4	0.0099119
unfortunate	3	0.0074692
short	3	0.0074692

[3,2,]
a	472	0.48343
the	381	0.390266
an	48	0.0489897
this	18	0.018289
some	14	0.0141723
these	9	0.00903112
such	8	0.00800661
our	6	0.00595744
great	5	0.00498781
those	5	0.00493292
two	4	0.00394576
another	4	0.00394489
several	1	0.00083454
pennyworth	1	0.000834481

[3,3,]
very	87	0.258463
no	53	0.157353
quite	44	0.130414
so	34	0.100655
all	25	0.0739484
rather	18	0.0531117
too	17	0.0500615
only	13	0.0382346
any	12	0.0348128
as	10	0.0292287
good	5	0.0144227
perfectly	4	0.0113703
another	3	0.00854652
something	3	0.00847223
the	2	0.00579814
looking	2	0.00549496
a	2	0.00549435
next	1	0.00274811
most	1	0.00259462

[3,4,]
the	452	0.999553

[3,5,]
your	15	0.92488
their	1	0.050263

[3,6,]
the	461	0.99526
broken	2	0.0038922

[3,7,]
this	30	0.278815
some	23	0.21328
one	19	0.176035
any	16	0.148106
another	9	0.0825712
soo	4	0.0356069
tea	2	0.0169223
front	2	0.016917
no	1	0.0078136
either	1	0.00769368

[3,8,]
her	123	0.330138
his	69	0.18498
its	41	0.109713
their	39	0.104385
my	31	0.0823428
one	28	0.07487
your	28	0.0748641
each	5	0.0129416
large	3	0.00761463
every	2	0.00497445
second	1	0.00228538
other	1	0.00223859
name	1	0.00219071

[3,9,]
what	104	0.299188
that	87	0.250199
how	54	0.154669
this	24	0.0687926
who	14	0.0398281
where	11	0.0312558
why	10	0.0283019
there	9	0.0254207
everything	6	0.0167754
all	5	0.0139691
either	5	0.0139664
only	4	0.0110867
something	4	0.0110854
one	3	0.00828437
next	2	0.00554276
mine	2	0.0052483
every	1	0.00251327
poor	1	0.00251208
rather	1	0.00243818

[3,10,]
your	10	0.288996
beautiful	6	0.171082
old	4	0.11258
feeling	4	0.111978
mary	3	0.0825724
o	3	0.0825724
father	3	0.0825711
the	1	0.0252724

[3,11,]
two	13	0.51253
great	5	0.192903
any	3	0.112937
drink	3	0.112163
no	1	0.0327132

[3,12,]
next	2	0.898737

[4,]
then	9	0.124213
bill	4	0.0609521
why	4	0.0608574
well	2	0.0483501
now	3	0.0482044
sir	3	0.0481134
alas	3	0.0481134
no	3	0.0456658
oh	1	0.035475
miss	2	0.035475
hush	1	0.0354568
prizes	1	0.0228293
pat	1	0.0228183
dears	2	0.0228001
yes	1	0.0228001
mostly	2	0.0228001
indeed	1	0.0103438
what	1	0.0102856
feet	1	0.0102528
serpent	1	0.0102346
remarking	1	0.0101435
lacie	1	0.0101435
lad	1	0.0101435

[4,0,]
oh	19	0.188795
well	18	0.179034
why	16	0.161184
however	15	0.149748
no	9	0.0900837
yes	9	0.0894809
ah	5	0.0487473
here	4	0.0386522
then	2	0.0214544
treacle	1	0.00868247
twinkle	1	0.00868103

[4,1,]
evidence	3	0.255064
verdict	3	0.255062
pardon	3	0.255062
tongue	2	0.164234

[4,2,]
come	19	0.895109
miss	1	0.0391071
treacle	1	0.0387407

[4,3,]
now	15	0.823934
bill	1	0.0471611
hush	1	0.0460144
prizes	1	0.0454465

[4,4,]
wow	3	0.560117
hush	1	0.163998
pat	1	0.162459

[4,5,]
twinkle	7	0.567353
sir	4	0.318847
alas	1	0.0690549

[5,]

[5,0,]
inches	5	0.283409
feet	4	0.224629
which	3	0.165831
o	3	0.165831
later	2	0.10704

[5,1,]
beau	3	0.700354
sure	1	0.204259

[5,2,]
came	12	0.983323

[5,3,]
with	5	0.400715
next	4	0.317458
anything	3	0.235506

[5,4,]
so	25	0.953806
gone	1	0.0314868

[5,5,]
looked	12	0.472314
looking	7	0.27302
changed	6	0.232409

[5,6,]
close	7	0.849995
come	1	0.102245

[5,7,]
else	10	0.979453

[5,8,]
looking	15	0.26656
all	9	0.158167
going	6	0.104588
running	6	0.1023
sitting	5	0.084448
called	4	0.0688772
fallen	4	0.0688755
jumping	3	0.0510216
anything	1	0.016621
come	1	0.0166063
came	1	0.0166027
leaving	1	0.0153137

[5,9,]
sure	7	0.253499
afraid	6	0.215507
old	6	0.215506
gone	5	0.179442
glad	3	0.104439

[5,10,]

[5,11,]
then	28	0.992659

[6,]
voice	8	0.0177598
rabbit	7	0.0177598
door	7	0.0177405
way	6	0.0160094
words	5	0.0141818
party	5	0.0141239
bottle	5	0.0123156
mouse	4	0.0106618
pig	6	0.0105575
cat	4	0.010546
house	5	0.0105074
pool	5	0.0105074
little	5	0.00879559
queen	3	0.00869907
hookah	5	0.00869907
whiting	5	0.00869907
world	3	0.00869907
side	3	0.00869907
question	3	0.00869907
new	4	0.00869907
middle	4	0.00869907
time	2	0.00698731
bit	4	0.00696414
place	3	0.0069487
pigs	4	0.00692939
creatures	3	0.00692939
reason	4	0.00692553
piece	3	0.00691009
garden	2	0.00690623
gryphon	2	0.00689079
game	2	0.00689079

[6,0,]
mock	45	0.658735
march	23	0.335254

[6,1,]
head	39	0.154631
eyes	19	0.0749545
hand	14	0.0550343
voice	13	0.0516598
face	9	0.035114
way	8	0.0316746
hands	8	0.0304017
heads	7	0.0272818
foot	7	0.027147
mouth	7	0.0271462
own	7	0.0271455
life	6	0.0231645
pocket	6	0.0231623
arms	6	0.0231616
eye	6	0.0231616
slates	6	0.0231616
sister	5	0.0193133
history	5	0.0192461
feet	5	0.019182
nose	5	0.0191791
side	4	0.015465
arm	4	0.0152638
hair	4	0.015263
chin	4	0.0152623
tail	4	0.0151945
flamingo	4	0.0151938
garden	3	0.0114138
temper	3	0.0112777
neck	3	0.0112106
knee	3	0.0112105
toes	3	0.0112099

[6,2,]
time	35	0.290128
way	22	0.182
moment	21	0.173391
minutes	9	0.0733594
rate	8	0.0650265
question	7	0.0568511
minute	5	0.0400679
morning	4	0.0316963
words	3	0.0236398
world	2	0.0151879
capital	2	0.01507
party	1	0.00697328
witness	1	0.00673732

[6,3,]
queen	51	0.124356
king	48	0.116925
hatter	38	0.0924441
duchess	37	0.0900311
gryphon	36	0.0876499
dormouse	26	0.0631386
mouse	21	0.0510424
caterpillar	20	0.0484673
cat	17	0.0412607
rabbit	17	0.0408989
dodo	11	0.0264948
footman	10	0.0240129
cook	10	0.0239868
pigeon	8	0.0191285
soldiers	6	0.0142386
knave	6	0.0142064
trial	5	0.0117632
lory	5	0.0117618
game	4	0.0094122
others	4	0.00934912
rest	4	0.00931718
executioner	4	0.00931657
youth	3	0.00687198
words	2	0.00465087
party	2	0.00464985
owl	2	0.00445859
jurors	2	0.00445859
master	2	0.00442705
panther	2	0.00442671
voice	1	0.00226901
world	1	0.00210929

[6,4,]
door	15	0.0620447
jury	13	0.0546899
table	12	0.0504314
air	11	0.0461763
court	11	0.0460994
dance	11	0.0460977
baby	10	0.041922
other	10	0.0417697
house	9	0.0379017
sea	9	0.0375114
wood	8	0.0332538
glass	7	0.0290779
queen	6	0.0250577
garden	6	0.0249798
ground	6	0.0240496
chimney	6	0.023971
box	5	0.0205677
window	5	0.0205669
trees	5	0.0204892
words	4	0.0167857
pool	4	0.016626
hall	4	0.016392
mushroom	4	0.0163135
distance	4	0.0162332
players	4	0.0162332
game	3	0.0122138
lobster	3	0.0121352
sort	3	0.0120566
sky	3	0.0120566
jurymen	3	0.0119789
shore	3	0.0119781

[6,5,]
majesty	10	0.337886
soup	6	0.200097
mouse	3	0.0970366
william	3	0.0966804
ann	3	0.0965904
fellow	3	0.0965897
thing	1	0.0278301

[6,6,]
sort	13	0.387921
kind	5	0.145483
opportunity	4	0.115276
tired	3	0.0848962
number	3	0.0848952
pack	2	0.0546886
sorts	2	0.0546004
piece	1	0.0245721

[6,7,]
much	26	0.806021
many	6	0.181235

[6,8,]
minute	14	0.394324
tone	7	0.194299
moral	6	0.165735
pair	4	0.108609
girl	1	0.0231675
air	1	0.0230843
hour	1	0.0230012
present	1	0.0229198

[6,9,]
end	7	0.194494
top	7	0.194306
fan	4	0.108806
roof	4	0.108617
bottom	4	0.108616
officers	3	0.0800529
edge	3	0.0800529
boots	2	0.0514905
flurry	1	0.0229264

[6,10,]
thing	29	0.264349
rabbit	16	0.145464
witness	9	0.080823
hurry	9	0.0807703
cat	8	0.0718636
key	7	0.0624239
question	6	0.0534623
deal	5	0.0441823
size	5	0.0440792
height	4	0.0349029
gardeners	3	0.0257295
creatures	2	0.0167163
lady	2	0.016556
candle	2	0.0165554
kid	1	0.00754169
bat	1	0.00743519

[6,11,]
oop	4	0.63238
side	2	0.300094

[6,12,]
low	10	0.46652
melancholy	5	0.22862
offended	3	0.133431
sulky	2	0.0857447
new	1	0.0384953

[7,]
that	27	0.574955
little	10	0.184373
having	5	0.0749973
being	4	0.0594136
things	1	0.0437965
shrinking	2	0.0281287

[7,0,]
of	122	0.990205
little	1	0.00711806

[7,1,]
that	60	0.815719
things	12	0.162827
little	1	0.0160333

[7,2,]
that	10	0.982591

[7,3,]
things	4	0.949925

[8,]
one	1	0.267095
else	1	0.265906
prettier	1	0.265822

[9,]

[10,]

[10,0,]
high	8	0.977278

[10,1,]
clock	1	0.888958

[10,2,]
clock	1	0.888958

[10,3,]

[10,4,]
way	3	0.479415
editions	2	0.313014
clock	1	0.180403
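
The listings above share a simple layout: a node index such as `[4,1,]`, followed by one row per word holding what appear to be the word's count and its probability under that state. Below is a minimal Python sketch for loading such a dump into a dictionary. It assumes whitespace-separated columns, and `parse_node_dump` is a hypothetical helper name, not part of the original implementation.

```python
def parse_node_dump(text):
    """Parse `[i,j,]` node headers followed by `word count prob` rows.

    Returns a dict mapping an index tuple, e.g. (4, 1), to a list of
    (word, count, prob) tuples. The root header `[]` maps to ().
    The column meanings are an assumption read off the listing.
    """
    nodes = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines only separate nodes
        if line.startswith("[") and line.endswith("]"):
            # "[6,3,]" -> (6, 3); the trailing comma leaves an empty piece
            current = tuple(int(p) for p in line[1:-1].split(",") if p)
            nodes[current] = []
            continue
        if current is None:
            raise ValueError("row found before any node header: %r" % raw)
        word, count, prob = line.split()
        nodes[current].append((word, int(count), float(prob)))
    return nodes


# A few rows copied from the listing, including a node with no words.
sample = "\n".join([
    "[4,1,]",
    "evidence\t3\t0.255064",
    "verdict\t3\t0.255062",
    "",
    "[5,10,]",
    "",
    "[5,11,]",
    "then\t28\t0.992659",
])

nodes = parse_node_dump(sample)
# Most probable word in state [4,1,]
top_word = max(nodes[(4, 1)], key=lambda row: row[2])[0]
```

Keying the dictionary by index tuples keeps the parent/child relation easy to recover: the parent of `(4, 1)` is `(4,)`.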

$M=2$

These are the results with the depth fixed at 2.

They also come from a run with a different random seed.

image

[]
all (25) one (17) more (18) three (16) time (14) looking (14) that (12) nothing (12) day (10) here (11) two (10) tea (13) talking (13) first (8) indeed (12) done (12) alice (10) silence (12) then (8) now (9) something (11) voice (8) tears (10) as (8) right (8) look (8) saying (10) said (7) thinking (9) everything (8) while (7) 
	[0,]
	me (2) him (1) her (2) all (1) first (1) it (1) , (2) from (2) itself (1) once (1) both (1) which (1) hold (1) dark (1) along (1) twenty (1) 
		[0,0,]
		<eos> (1101) 
		[0,1,]
		s (156) 
		[0,2,]
		what (43) afraid (7) anything (5) things (3) say (3) trying (1) 
		[0,3,]
		. (946) ! (95) ? (61) 
		[0,4,]
		again (68) here (20) it (16) her (12) hastily (7) myself (7) together (7) cats (6) first (1) trying (1) 
		[0,5,]
		out (42) one (29) hold (4) fond (3) coming (2) cats (1) 
		[0,6,]
		, (886) ! (262) ? (93) mice (3) first (2) here (1) 
		[0,7,]
		up (75) down (69) off (60) out (45) about (37) in (31) over (30) round (29) back (24) away (18) through (13) things (10) him (7) near (6) you (5) along (4) first (3) ? (3) both (2) all (1) one (1) cats (1) 
		[0,8,]
		on (72) alice (4) back (4) , (1) in (1) pale (1) 
		[0,9,]
		it (87) herself (29) them (25) her (19) me (13) us (12) him (11) anxiously (8) all (7) itself (6) this (5) from (5) anything (4) one (3) ready (2) em (2) looking (1) lying (1) 
		[0,10,]
		going (22) looking (9) trying (8) close (8) lying (7) sitting (7) near (4) beginning (4) ready (4) coming (3) obliged (3) surprised (2) delighted (2) nowhere (2) learning (2) him (1) you (1) out (1) dark (1) over (1) ever (1) filled (1) 
		[0,11,]
		it (41) all (24) once (18) them (17) her (15) last (15) alice (12) this (11) which (9) first (6) reply (4) him (4) fact (3) bill (3) livery (3) instance (3) one (2) trying (2) tiptoe (2) yesterday (1) 
		[0,12,]
		herself (29) her (11) say (10) me (5) itself (4) day (4) you (3) sea (3) him (2) this (1) cats (1) 
		[0,13,]
		gloves (5) 
		[0,14,]
		things (1) 
		[0,15,]
		
		[0,16,]
		
		[0,17,]
		
		[0,18,]
		
		[0,19,]
		
		[0,20,]
		
		[0,21,]
		
		[0,22,]
		me (12) him (1) all (1) 
	[1,]
	be (115) have (33) do (29) go (25) say (16) tell (15) take (13) come (12) talk (13) like (8) see (10) get (9) eat (8) make (10) begin (11) hear (9) speak (8) put (7) ask (9) think (7) find (7) give (7) keep (7) got (6) explain (7) seem (7) feel (7) change (6) grow (6) work (6) draw (6) 
		[1,0,]
		said (258) thought (23) cried (14) poor (7) added (7) shouted (6) continued (4) interrupted (3) screamed (3) exclaimed (3) pleaded (2) dropped (1) while (1) 
		[1,1,]
		of (374) for (29) than (13) eat (1) 
		[1,2,]
		do (75) would (75) could (51) 'll (41) did (40) must (33) will (27) should (26) ca (26) can (25) 'd (23) shall (22) does (17) wo (16) have (12) may (12) might (11) had (8) has (6) sha (4) need (2) 
		[1,3,]
		, (1026) 
		[1,4,]
		' (156) 
		[1,5,]
		know (56) think (32) see (21) course (16) suppose (12) like (11) wish (11) wonder (11) believe (7) beg (6) mean (5) swim (3) get (2) dare (2) breathe (2) have (1) go (1) eat (1) please (1) must (1) 
		[1,6,]
		in (251) at (149) with (131) to (101) on (73) into (54) for (53) by (43) about (36) after (35) all (34) like (27) upon (20) from (15) under (13) among (11) against (7) between (6) took (4) behind (4) have (1) eat (1) has (1) 
		[1,7,]
		as (182) that (95) if (79) when (60) how (19) till (19) before (17) perhaps (10) than (7) whether (6) have (2) suppose (2) 
		[1,8,]
		looked (33) found (19) put (11) gave (10) took (10) called (9) shook (8) opened (5) jumped (4) told (3) puzzled (3) held (3) taught (3) stretched (2) beat (1) 
		[1,9,]
		been (27) 
		[1,10,]
		went (52) go (12) 
		[1,11,]
		get (23) see (20) make (11) find (11) look (11) leave (8) put (5) turn (5) be (1) go (1) speak (1) kneel (1) 
		[1,12,]
		come (15) 
		[1,13,]
		let (15) tell (7) at (7) please (6) take (4) fetch (4) hold (3) call (2) consider (2) be (1) have (1) give (1) keep (1) mind (1) bring (1) if (1) 
		[1,14,]
		like (15) remember (5) help (5) matter (5) know (4) forgotten (4) join (4) finish (3) have (2) be (1) manage (1) 
		[1,15,]
		call (4) 
		[1,16,]
		
		[1,17,]
		
		[1,18,]
		
		[1,19,]
		
		[1,20,]
		
		[1,21,]
		
		[1,22,]
		
		[1,23,]
		
		[1,24,]
		all (5) got (3) try (2) hear (1) 
	[2,]
	but (3) she (2) that (2) so (1) this (1) you (1) here (1) even (1) or (1) really (1) never (1) seven (1) how (1) william (1) a (1) four (1) an (1) who (1) exactly (1) those (1) his (1) heavy (1) people (1) poor (1) unimportant (1) 
		[2,0,]
		to (484) is (5) 
		[2,1,]
		little (97) great (27) very (24) large (17) long (15) good (14) curious (12) old (6) many (5) bright (5) nice (4) every (4) queer (4) sharp (4) new (4) golden (4) melancholy (4) dead (2) excellent (2) wonderful (2) only (1) really (1) rather (1) the (1) heavy (1) sudden (1) puzzling (1) quiet (1) magic (1) 
		[2,2,]
		you (70) it (58) that (49) them (31) me (19) this (10) with (9) i (8) yourself (8) hearts (7) next (6) lessons (6) gone (5) sure (5) but (3) what (3) almost (3) which (2) really (1) being (1) just (1) where (1) twinkle (1) waiting (1) 
		[2,3,]
		the (530) 
		[2,4,]
		she (377) alice (122) he (78) they (65) it (57) i (8) nobody (5) please (4) but (4) nothing (3) next (3) everybody (3) you (2) ' (2) seven (1) fury (1) hurriedly (1) 
		[2,5,]
		n't (149) not (42) you (41) have (14) it (6) only (6) i (6) be (6) never (5) better (5) become (5) they (4) all (4) hardly (4) quite (4) just (3) ever (3) rather (2) nearly (2) soon (2) cross (2) almost (2) possibly (2) even (2) she (1) so (1) really (1) alice (1) well (1) he (1) your (1) 
		[2,6,]
		why (21) well (21) however (19) oh (18) no (16) now (12) yes (11) down (8) twinkle (6) wow (6) certainly (5) please (5) ah (5) here (4) two (3) five (3) sir (3) first (3) hush (3) miss (3) but (2) sure (2) never (2) serpent (2) alas (2) she (1) really (1) very (1) these (1) nothing (1) too (1) 
		[2,7,]
		i (397) you (206) they (52) we (25) of (16) 
		[2,8,]
		alice (149) five (2) two (1) seven (1) 
		[2,9,]
		and (224) but (111) so (42) for (31) just (18) while (14) because (10) which (9) or (8) now (7) even (6) though (6) until (5) exactly (4) still (4) only (3) where (3) are (2) that (1) sure (1) they (1) never (1) not (1) is (1) cross (1) yelled (1) 
		[2,10,]
		dear (27) oh (18) my (15) old (9) william (4) sir (4) bill (4) mary (4) ann (4) dinah (3) father (3) ma (3) fellow (3) pat (2) they (1) 
		[2,11,]
		and (168) who (33) which (7) being (5) feeling (4) but (1) 
		[2,12,]
		it (200) that (66) there (66) she (39) what (34) he (21) this (19) who (16) which (11) how (9) why (9) where (8) alice (7) everything (6) only (3) or (3) dinah (3) tis (3) here (2) these (2) certainly (2) ever (2) but (1) being (1) quite (1) william (1) her (1) 
		[2,13,]
		the (760) her (127) his (70) a (64) its (45) their (38) my (33) your (23) this (23) one (21) our (7) these (6) being (5) no (3) large (3) each (3) without (2) here (1) 
		[2,14,]
		a (430) an (40) another (15) two (8) four (3) nine (3) without (1) 
		[2,15,]
		and (286) or (26) without (16) half (13) is (9) but (8) very (8) turning (8) so (6) rather (6) are (4) 
		[2,16,]
		this (43) your (31) some (26) that (12) beautiful (11) any (10) every (8) great (5) yer (3) another (2) everybody (2) an (1) 
		[2,17,]
		are (38) am (12) makes (8) 
		[2,18,]
		well (18) soon (9) glad (9) sure (8) far (7) curious (5) large (4) hard (4) nearly (3) quickly (2) 
		[2,19,]
		or (15) 
		[2,20,]
		very (77) not (49) quite (44) so (39) such (26) as (19) all (18) too (14) getting (14) just (13) n't (12) only (10) rather (9) good (9) always (7) dreadfully (5) nearly (4) now (3) certainly (3) gone (3) perfectly (3) many (2) sure (1) even (1) i (1) these (1) nice (1) exact (1) 
		[2,21,]
		what (27) how (21) those (4) 
		[2,22,]
		no (40) any (15) some (14) it (6) nothing (5) hardly (3) which (2) so (1) they (1) dinah (1) exactly (1) 
		[2,23,]
		then (40) so (14) four (3) 
		[2,24,]
		not (4) 
		[2,25,]
		edwin (2) 
		[2,26,]
		she (1) 
		[2,27,]
		very (1) 
	[3,]
	had (1) saw (1) was (1) tried (1) sighed (1) fell (1) felt (1) hardly (1) 
		[3,0,]
		was (241) were (47) had (37) felt (17) did (9) stood (6) seemed (5) remembered (4) drew (4) made (3) ran (3) sounded (3) considered (3) goes (2) making (2) remained (2) nibbled (2) got (1) left (1) vanished (1) 
		[3,1,]
		got (18) made (16) turned (14) sat (13) came (11) ran (10) kept (10) walked (9) come (7) left (6) hurried (5) set (3) fell (3) goes (3) crowded (2) brightened (2) saw (1) was (1) tried (1) passed (1) offended (1) violently (1) 
		[3,2,]
		tried (16) had (14) seemed (11) ought (8) got (7) went (7) used (6) set (5) seems (4) ventured (3) wanted (2) were (1) occurred (1) 
		[3,3,]
		had (86) all (21) only (9) hastily (6) soon (4) suddenly (3) instantly (3) both (2) cautiously (2) saw (1) sat (1) generally (1) still (1) 
		[3,4,]
		said (96) thought (28) replied (18) asked (13) added (13) spoke (12) could (11) remarked (8) say (6) whispered (5) liked (4) passed (3) continued (3) answered (3) exclaimed (3) says (3) seemed (2) belongs (2) noticed (2) hurried (1) heard (1) 
		[3,5,]
		heard (21) knew (8) seen (7) saw (6) repeated (5) replied (4) watched (2) could (1) moved (1) 
		[3,6,]
		began (44) came (18) waited (9) swam (4) went (2) 
		[3,7,]
		've (37) never (24) ever (6) 
		[3,8,]
		then (30) 
		[3,9,]
		grow (4) 
		[3,10,]
		might (13) means (4) 
		[3,11,]
		were (4) 
		[3,12,]
		sighed (2) repeated (1) 
		[3,13,]
		'm (40) 're (25) were (13) 
	[4,]
	e (1) stupid (1) 
		[4,0,]
		turtle (49) hare (24) 
		[4,1,]
		sort (16) top (4) evening (2) 
		[4,2,]
		o (5) soo (4) beau (2) 
		[4,3,]
		minute (16) hour (2) day (1) 
		[4,4,]
		mouse (10) ootiful (2) minute (1) day (1) 
		[4,5,]
		first (9) own (7) last (7) simple (5) few (4) e (3) smallest (2) turtle (1) 
		[4,6,]
		minute (2) 
		[4,7,]
		
		[4,8,]
		oop (4) clock (3) 
	[5,]
	soup (8) minutes (3) way (3) rate (3) pardon (4) question (3) children (2) tongue (4) curiosity (4) day (2) hair (3) history (2) garden (1) honour (3) time (1) else (1) temper (2) direction (2) remark (1) moment (1) speech (1) verdict (2) wine (1) book (1) last (1) savage (1) nor (1) choice (1) crumbs (1) elegant (1) 
		[5,0,]
		time (37) moment (15) soup (1) minutes (1) children (1) 
		[5,1,]
		business (4) day (3) speech (1) 
		[5,2,]
		way (21) 
		[5,3,]
		yet (12) way (6) garden (2) 
		[5,4,]
		now (12) children (1) 
		[5,5,]
		idea (7) 
		[5,6,]
		garden (10) history (2) 
		[5,7,]
		
		[5,8,]
		majesty (9) morning (4) rate (3) business (2) lovely (2) minutes (1) wine (1) 
		[5,9,]
		else (2) rules (1) 
		[5,10,]
		
		[5,11,]
		remark (4) minutes (3) question (2) figure (2) 
	[6,]
	head (41) eyes (20) door (17) house (18) hand (14) words (14) court (13) voice (12) air (11) side (10) queen (9) game (10) face (10) pool (10) arm (9) life (9) cat (7) fan (9) other (7) feet (7) birds (8) baby (8) distance (8) tail (7) rest (7) sea (7) best (6) wood (7) table (6) trees (7) bottle (6) 
		[6,0,]
		queen (52) gryphon (46) hatter (43) king (42) duchess (36) dormouse (27) mouse (23) cat (19) caterpillar (19) rabbit (17) dodo (10) footman (9) jury (9) cook (9) pigeon (8) dance (5) knave (4) puppy (3) executioner (3) youth (3) lory (3) eaglet (3) trial (2) hedgehog (2) duck (2) soldiers (1) party (1) others (1) 
		[6,1,]
		kid (5) gardeners (3) 
		[6,2,]
		thing (33) reason (6) question (6) table (5) creatures (2) shrill (2) moment (1) footman (1) 
		[6,3,]
		white (23) three (6) young (4) guinea (4) 
		[6,4,]
		mock (49) march (24) 
		[6,5,]
		feet (4) jury (4) lobster (4) inches (3) mile (2) foot (1) well (1) 
		[6,6,]
		bit (9) end (8) tone (8) moral (7) opportunity (6) pair (5) shriek (5) number (4) pack (3) pattering (3) edge (2) 
		[6,7,]
		way (9) witness (7) side (6) size (6) cat (4) verse (3) pack (2) 
		[6,8,]
		other (19) same (15) next (13) poor (12) right (10) whole (7) first (5) best (4) cheshire (4) proper (3) glass (2) two (2) second (2) fish (1) most (1) 
		[6,9,]
		
		[6,10,]
		deal (8) 
		[6,11,]
		door (6) girl (4) bottle (3) key (3) dream (3) 
		[6,12,]
		low (9) deep (4) 
		[6,13,]
		moment (9) whiting (2) 
	[7,]
	rabbit (16) pigs (1) lady (2) so (1) 
		[7,0,]
		pigs (1) 
		[7,1,]
		two (12) 
		[7,2,]
		pigs (1) 
		[7,3,]
		rabbit (1) 
	[8,]
	voice (2) tone (1) high (2) 
		[8,0,]
		voice (14) tone (11) 
		[8,1,]
		tone (5) voice (2) 
		[8,2,]
		high (10) box (4) quadrille (3) 
	[9,]
	is (14) was (11) so (6) for (4) not (1) pardon (1) 
		[9,0,]
		is (58) was (24) so (5) matters (2) for (1) 
		[9,1,]
		before (12) 
		[9,2,]
		not (6) love (3) for (2) 
		[9,3,]
		is (1) 
	[10,]
	more (16) enough (11) use (6) right (4) notion (2) 
	[11,]
	
		[11,0,]
		much (13) tired (3) easily (3) long (1) 
		[11,1,]
		much (2) 
		[11,2,]
		much (6) 
		[11,3,]
		much (7) long (6) politely (5) extremely (2) 
	[12,]
	times (3) just (1) 
		[12,0,]
		just (8) 
		[12,1,]
		times (3) 
	[13,]
	from (3) 
	[14,]
	running (2) hunting (2) patiently (2) 
		[14,0,]
		skimming (1) 
		[14,1,]
		nearer (3) 
		[14,2,]
		
		[14,3,]
		running (1) 
	[15,]
	
		[15,0,]
		this (6) 
		[15,1,]
		nothing (2) 
		[15,2,]
		nothing (3) 
		[15,3,]
		
		[15,4,]
		
		[15,5,]
		upstairs (1) 
	[16,]
	
		[16,0,]
		often (2) 
[]
all	25	0.00612012
one	17	0.00425017
more	18	0.00403767
three	16	0.00361269
time	14	0.00335769
looking	14	0.0031877
that	12	0.0031452
nothing	12	0.0029752
day	10	0.00289021
here	11	0.00276271
two	10	0.00276271
tea	13	0.00276271
talking	13	0.00276271
first	8	0.00267771
indeed	12	0.00255021
done	12	0.00255021
alice	10	0.00250772
silence	12	0.00250772
then	8	0.00233772
now	9	0.00233772
something	11	0.00233772
voice	8	0.00229522
tears	10	0.00212523
as	8	0.00212523
right	8	0.00212523
look	8	0.00212523
saying	10	0.00208273
said	7	0.00191273
thinking	9	0.00191273
everything	8	0.00191273
while	7	0.00191273

[0,]
me	2	0.0462568
him	1	0.0462145
her	2	0.0462103
all	1	0.0290094
first	1	0.028667
it	1	0.0226065
,	2	0.0225473
from	2	0.0166643
itself	1	0.0165882
once	1	0.0108405
both	1	0.0108151
which	1	0.0106926
hold	1	0.0106926
dark	1	0.0106926
along	1	0.0106714
twenty	1	0.00478843

[0,0,]
<eos>	1101	0.99981

[0,1,]
s	156	0.99866

[0,2,]
what	43	0.690306
afraid	7	0.109754
anything	5	0.0776159
things	3	0.0454806
say	3	0.0453643
trying	1	0.0133407

[0,3,]
.	946	0.858253
!	95	0.0860305
?	61	0.0551811

[0,4,]
again	68	0.46762
here	20	0.136694
it	16	0.109271
her	12	0.0820142
hastily	7	0.0469595
myself	7	0.0469592
together	7	0.0469592
cats	6	0.0403092
first	1	0.00591422
trying	1	0.0058291

[0,5,]
out	42	0.516234
one	29	0.355854
hold	4	0.0470675
fond	3	0.0346347
coming	2	0.0223798
cats	1	0.0102115

[0,6,]
,	886	0.710361
!	262	0.209953
?	93	0.0744341
mice	3	0.00224998
first	2	0.00147127
here	1	0.000652135

[0,7,]
up	75	0.157849
down	69	0.145191
off	60	0.126204
out	45	0.0946743
about	37	0.0776831
in	31	0.0647175
over	30	0.0629717
round	29	0.0608059
back	24	0.0503143
away	18	0.0375984
through	13	0.0270499
things	10	0.0208374
him	7	0.0147952
near	6	0.01234
you	5	0.0102881
along	4	0.00812049
first	3	0.00618585
?	3	0.00606838
both	2	0.00390257
all	1	0.00196986
one	1	0.00191051
cats	1	0.0019066

[0,8,]
on	72	0.865027
alice	4	0.0459365
back	4	0.0459341
,	1	0.00996606
in	1	0.00988012
pale	1	0.00970725

[0,9,]
it	87	0.362086
herself	29	0.1202
them	25	0.103534
her	19	0.0775511
me	13	0.052553
us	12	0.0485363
him	11	0.0458858
anxiously	8	0.0325908
all	7	0.0288894
itself	6	0.0244843
this	5	0.0204327
from	5	0.0203192
anything	4	0.0160385
one	3	0.0121062
ready	2	0.00781863
em	2	0.00759134
looking	1	0.0036575
lying	1	0.00353817

[0,10,]
going	22	0.232147
looking	9	0.0923794
trying	8	0.0841704
close	8	0.0814145
lying	7	0.0729014
sitting	7	0.0725861
near	4	0.0409911
beginning	4	0.0406834
ready	4	0.0391787
coming	3	0.0303559
obliged	3	0.0300374
surprised	2	0.0194048
delighted	2	0.0194003
nowhere	2	0.0194003
learning	2	0.0194003
him	1	0.0109726
you	1	0.00939601
out	1	0.00939488
dark	1	0.00907956
over	1	0.00907844
ever	1	0.008771
filled	1	0.00876537

[0,11,]
it	41	0.209731
all	24	0.122701
once	18	0.0915225
them	17	0.0863912
her	15	0.0769386
last	15	0.0760029
alice	12	0.0607562
this	11	0.0548666
which	9	0.0453677
first	6	0.0303903
reply	4	0.0195937
him	4	0.0195057
fact	3	0.0144672
bill	3	0.014467
livery	3	0.0144658
instance	3	0.0144658
one	2	0.00974831
trying	2	0.00973971
tiptoe	2	0.00933782
yesterday	1	0.00421074

[0,12,]
herself	29	0.39479
her	11	0.149324
say	10	0.134554
me	5	0.0671446
itself	4	0.0525498
day	4	0.0521996
you	3	0.0388543
sea	3	0.0384955
him	2	0.0260531
this	1	0.0116405
cats	1	0.0116386

[0,13,]
gloves	5	0.958288

[0,14,]
things	1	0.795559

[0,15,]

[0,16,]

[0,17,]

[0,18,]

[0,19,]

[0,20,]

[0,21,]

[0,22,]
me	12	0.830654
him	1	0.059774
all	1	0.0587793

[1,]
be	115	0.141148
have	33	0.0465013
do	29	0.0355232
go	25	0.0345279
say	16	0.0189965
tell	15	0.0189567
take	13	0.0165392
come	12	0.0154037
talk	13	0.0153416
like	8	0.0142185
see	10	0.0141564
get	9	0.0129837
eat	8	0.0129464
make	10	0.0129464
begin	11	0.0129464
hear	9	0.0117612
speak	8	0.0106133
put	7	0.0105885
ask	9	0.0105512
think	7	0.00936606
find	7	0.00935364
give	7	0.00935364
keep	7	0.00915139
got	6	0.00817841
explain	7	0.00816847
seem	7	0.00815605
feel	7	0.00815605
change	6	0.00703299
grow	6	0.00697088
work	6	0.00697088
draw	6	0.00697088

[1,0,]
said	258	0.7765
thought	23	0.0686807
cried	14	0.0415823
poor	7	0.0204894
added	7	0.020489
shouted	6	0.0174771
continued	4	0.0114533
interrupted	3	0.0084416
screamed	3	0.00844119
exclaimed	3	0.00844119
pleaded	2	0.00542915
dropped	1	0.00242661
while	1	0.00241798

[1,1,]
of	374	0.896384
for	29	0.0685917
than	13	0.0307
eat	1	0.00194978

[1,2,]
do	75	0.135776
would	75	0.135512
could	51	0.0920349
'll	41	0.0739191
did	40	0.072108
must	33	0.0594357
will	27	0.0485574
should	26	0.0467457
ca	26	0.0467457
can	25	0.0449416
'd	23	0.0413201
shall	22	0.0394994
does	17	0.0304418
wo	16	0.0286301
have	12	0.0217311
may	12	0.0213838
might	11	0.0195724
had	8	0.014156
has	6	0.0105236
sha	4	0.00689134
need	2	0.00326821

[1,3,]
,	1026	0.999796

[1,4,]
'	156	0.998655

[1,5,]
know	56	0.276291
think	32	0.157604
see	21	0.103246
course	16	0.0782342
suppose	12	0.058456
like	11	0.0537449
wish	11	0.0535252
wonder	11	0.0534827
believe	7	0.0336812
beg	6	0.028731
mean	5	0.023828
swim	3	0.013904
get	2	0.00916818
dare	2	0.00895349
breathe	2	0.00892972
have	1	0.00488327
go	1	0.0046456
eat	1	0.00421719
please	1	0.00402726
must	1	0.00400324

[1,6,]
in	251	0.234395
at	149	0.139075
with	131	0.122252
to	101	0.0940336
on	73	0.0680415
into	54	0.0502844
for	53	0.0493613
by	43	0.0400154
about	36	0.0334626
after	35	0.0325277
all	34	0.0316059
like	27	0.0251131
upon	20	0.018509
from	15	0.0138364
under	13	0.011967
among	11	0.0100979
against	7	0.00635962
between	6	0.00542505
took	4	0.00356158
behind	4	0.00337495
have	1	0.000965385
eat	1	0.000808274
has	1	0.000757808

[1,7,]
as	182	0.365058
that	95	0.190363
if	79	0.15824
when	60	0.120083
how	19	0.0377551
till	19	0.0377549
before	17	0.0337392
perhaps	10	0.0196831
than	7	0.0136648
whether	6	0.011651
have	2	0.00383942
suppose	2	0.00362488

[1,8,]
looked	33	0.264519
found	19	0.151654
put	11	0.0873468
gave	10	0.0791366
took	10	0.0790788
called	9	0.0709865
shook	8	0.0629217
opened	5	0.0387597
jumped	4	0.0306662
told	3	0.0226549
puzzled	3	0.0226036
held	3	0.0226024
taught	3	0.0226024
stretched	2	0.0145385
beat	1	0.00656185

[1,9,]
been	27	0.992242

[1,10,]
went	52	0.809258
go	12	0.181551

[1,11,]
get	23	0.232949
see	20	0.202368
make	11	0.110511
find	11	0.110423
look	11	0.110278
leave	8	0.0796076
put	5	0.049235
turn	5	0.0489991
be	1	0.0116332
go	1	0.00901145
speak	1	0.00842341
kneel	1	0.0081863

[1,12,]
come	15	0.986225

[1,13,]
let	15	0.259861
tell	7	0.120345
at	7	0.119402
please	6	0.101927
take	4	0.0675862
fetch	4	0.0669119
hold	3	0.0493044
call	2	0.0318303
consider	2	0.0316288
be	1	0.0219801
have	1	0.0166509
give	1	0.0145593
keep	1	0.0145479
mind	1	0.0142242
bring	1	0.0142221
if	1	0.0141547

[1,14,]
like	15	0.298598
remember	5	0.0982814
help	5	0.0979885
matter	5	0.0979881
know	4	0.0777007
forgotten	4	0.0775835
join	4	0.0775834
finish	3	0.0572384
have	2	0.0390138
be	1	0.023264
manage	1	0.0166654

[1,15,]
call	4	0.94787

[1,16,]

[1,17,]

[1,18,]

[1,19,]

[1,20,]

[1,21,]

[1,22,]

[1,23,]

[1,24,]
all	5	0.43624
got	3	0.254916
try	2	0.164
hear	1	0.0735265

[2,]
but	3	0.0309293
she	2	0.0204384
that	2	0.0175735
so	1	0.0173932
this	1	0.015263
you	1	0.015243
here	1	0.0128255
even	1	0.0126773
or	1	0.0126613
really	1	0.0126453
never	1	0.0100836
seven	1	0.0074098
how	1	0.00740179
william	1	0.00738977
a	1	0.00736974
four	1	0.00736974
an	1	0.00734971
who	1	0.00734971
exactly	1	0.00734971
those	1	0.00480809
his	1	0.00475201
heavy	1	0.00473198
people	1	0.00227048
poor	1	0.0022144
unimportant	1	0.0021503

[2,0,]
to	484	0.989348
is	5	0.00982196

[2,1,]
little	97	0.364622
great	27	0.101236
very	24	0.0900933
large	17	0.0635605
long	15	0.0558957
good	14	0.0521808
curious	12	0.0446339
old	6	0.0219933
many	5	0.0182801
bright	5	0.0181605
nice	4	0.0144468
every	4	0.0144464
queer	4	0.0143898
sharp	4	0.0143875
new	4	0.014387
golden	4	0.014387
melancholy	4	0.014387
dead	2	0.0068406
excellent	2	0.00684014
wonderful	2	0.00684014
only	1	0.0033078
really	1	0.00330553
rather	1	0.00324525
the	1	0.00318543
heavy	1	0.00312607
sudden	1	0.00306761
puzzling	1	0.00306761
quiet	1	0.00306716
magic	1	0.0030667

[2,2,]
you	70	0.226884
it	58	0.187971
that	49	0.158096
them	31	0.100034
me	19	0.0610755
this	10	0.0320852
with	9	0.028609
i	8	0.0255456
yourself	8	0.0253613
hearts	7	0.021511
next	6	0.0189157
lessons	6	0.0188683
gone	5	0.0156684
sure	5	0.015202
but	3	0.00963387
what	3	0.00922104
almost	3	0.00917373
which	2	0.00611135
really	1	0.00281942
being	1	0.00277309
just	1	0.00277274
where	1	0.00277239
twinkle	1	0.00268078
waiting	1	0.00263685

[2,3,]
the	530	0.999607

[2,4,]
she	377	0.512055
alice	122	0.16554
he	78	0.105743
they	65	0.088121
it	57	0.0769939
i	8	0.010663
nobody	5	0.00653268
please	4	0.00518757
but	4	0.00505135
nothing	3	0.00384369
next	3	0.00382939
everybody	3	0.00382879
you	2	0.00252453
'	2	0.00245667
seven	1	0.0011253
fury	1	0.00109809
hurriedly	1	0.00109797

[2,5,]
n't	149	0.450999
not	42	0.126882
you	41	0.12336
have	14	0.0418639
it	6	0.0179667
only	6	0.0178537
i	6	0.0178511
be	6	0.0176235
never	5	0.0147653
better	5	0.0145938
become	5	0.0145912
they	4	0.011847
all	4	0.0116304
hardly	4	0.0116185
quite	4	0.0111265
just	3	0.0087028
ever	3	0.00864823
rather	2	0.00567258
nearly	2	0.00560439
soon	2	0.00555908
cross	2	0.00555776
almost	2	0.00555776
possibly	2	0.00550057
even	2	0.00512531
she	1	0.0028707
so	1	0.00280417
really	1	0.00270044
alice	1	0.00264666
well	1	0.00264491
he	1	0.00258474
your	1	0.00258474

[2,6,]
why	21	0.108504
well	21	0.107655
however	19	0.0979888
oh	18	0.0919317
no	16	0.0825572
now	12	0.0617308
yes	11	0.0563243
down	8	0.0407015
twinkle	6	0.0303802
wow	6	0.0302839
certainly	5	0.0252678
please	5	0.0251722
ah	5	0.0250759
here	4	0.0192173
two	3	0.0148597
five	3	0.0147566
sir	3	0.0147553
first	3	0.0146682
hush	3	0.0146598
miss	3	0.0146598
but	2	0.0105037
sure	2	0.00993028
never	2	0.00974265
serpent	2	0.00945536
alas	2	0.0094517
she	1	0.00491262
really	1	0.00462811
very	1	0.00462592
these	1	0.00453181
nothing	1	0.00444428
too	1	0.00433994

[2,7,]
i	397	0.570125
you	206	0.295708
they	52	0.0744463
we	25	0.0356385
of	16	0.022704

[2,8,]
alice	149	0.972539
five	2	0.0117892
two	1	0.0052686
seven	1	0.00526764

[2,9,]
and	224	0.434635
but	111	0.21509
so	42	0.0813529
for	31	0.0598283
just	18	0.0346712
while	14	0.0268204
because	10	0.019052
which	9	0.0168645
or	8	0.0152833
now	7	0.0132859
even	6	0.0114
though	6	0.0112853
until	5	0.00934324
exactly	4	0.00745856
still	4	0.00740258
only	3	0.00557562
where	3	0.00554536
are	2	0.00357514
that	1	0.0017448
sure	1	0.00171919
they	1	0.00171897
never	1	0.00166321
not	1	0.00166277
is	1	0.00163408
cross	1	0.00160491
yelled	1	0.0015764

[2,10,]
dear	27	0.257729
oh	18	0.171351
my	15	0.142432
old	9	0.0847442
william	4	0.0367488
sir	4	0.0366719
bill	4	0.0365976
mary	4	0.0365961
ann	4	0.0365961
dinah	3	0.0271343
father	3	0.0269834
ma	3	0.0269817
fellow	3	0.0269817
pat	2	0.0173678
they	1	0.00813153

[2,11,]
and	168	0.76973
who	33	0.150493
which	7	0.0312757
being	5	0.0220729
feeling	4	0.0174425
but	1	0.00384122

[2,12,]
it	200	0.372589
that	66	0.122963
there	66	0.122785
she	39	0.0722504
what	34	0.0631446
he	21	0.0388904
this	19	0.0352508
who	16	0.0295622
which	11	0.0203252
how	9	0.0165034
why	9	0.0164735
where	8	0.0142943
alice	7	0.0128043
everything	6	0.010847
only	3	0.00537145
or	3	0.00537047
dinah	3	0.0053094
tis	3	0.00524851
here	2	0.00350674
these	2	0.00347409
certainly	2	0.00344353
ever	2	0.00307179
but	1	0.00185084
being	1	0.00160845
quite	1	0.00160799
william	1	0.00157812
her	1	0.00154775

[2,13,]
the	760	0.61574
her	127	0.10277
his	70	0.056579
a	64	0.0517253
its	45	0.0363113
their	38	0.0306387
my	33	0.0265955
your	23	0.0185002
this	23	0.0183639
one	21	0.0167085
our	7	0.00551736
these	6	0.00473264
being	5	0.00392228
no	3	0.00229304
large	3	0.00229291
each	3	0.00227609
without	2	0.00148261
here	1	0.00068997

[2,14,]
a	430	0.859604
an	40	0.0796191
another	15	0.0296128
two	8	0.0156211
four	3	0.00562067
nine	3	0.00560591
without	1	0.00162075

[2,15,]
and	286	0.732843
or	26	0.0662239
without	16	0.0405535
half	13	0.0328325
is	9	0.0226055
but	8	0.0201747
very	8	0.0200708
turning	8	0.0200115
so	6	0.01497
rather	6	0.014928
are	4	0.00978499

[2,16,]
this	43	0.278143
your	31	0.200102
some	26	0.167596
that	12	0.0768934
beautiful	11	0.0701584
any	10	0.0637069
every	8	0.0507201
great	5	0.0312409
yer	3	0.0182137
another	2	0.0117619
everybody	2	0.0117616
an	1	0.00530948

[2,17,]
are	38	0.651689
am	12	0.203436
makes	8	0.134482

[2,18,]
well	18	0.258228
soon	9	0.127657
glad	9	0.127579
sure	8	0.11347
far	7	0.0985985
curious	5	0.069693
large	4	0.0552786
hard	4	0.0551277
nearly	3	0.0407738
quickly	2	0.0261448

[2,19,]
or	15	0.986186

[2,20,]
very	77	0.197612
not	49	0.125597
quite	44	0.112742
so	39	0.1
such	26	0.0663541
as	19	0.0483621
all	18	0.0458362
too	14	0.0355456
getting	14	0.0355074
just	13	0.0330532
n't	12	0.0304041
only	10	0.0253825
rather	9	0.0227707
good	9	0.0226922
always	7	0.0175121
dreadfully	5	0.0123709
nearly	4	0.00987094
now	3	0.00731052
certainly	3	0.00730783
gone	3	0.00726963
perfectly	3	0.00722963
many	2	0.00422337
sure	1	0.00228386
even	1	0.00224584
i	1	0.00224506
these	1	0.00220597
nice	1	0.00212742
exact	1	0.00208808

[2,21,]
what	27	0.515373
how	21	0.40001
those	4	0.0731193

[2,22,]
no	40	0.447324
any	15	0.166392
some	14	0.155156
it	6	0.0656061
nothing	5	0.0541155
hardly	3	0.031575
which	2	0.0206004
so	1	0.00941961
they	1	0.00936522
dinah	1	0.00917123
exactly	1	0.00917024

[2,23,]
then	40	0.698148
so	14	0.242249
four	3	0.049193

[2,24,]
not	4	0.948157

[2,25,]
edwin	2	0.895743

[2,26,]
she	1	0.796329

[2,27,]
very	1	0.794696

[3,]
had	1	0.0361748
saw	1	0.0285984
was	1	0.021278
tried	1	0.0210802
sighed	1	0.013591
fell	1	0.013591
felt	1	0.0135619
hardly	1	0.00606699

[3,0,]
was	241	0.612946
were	47	0.119403
had	37	0.0940422
felt	17	0.0428992
did	9	0.0224607
stood	6	0.0139767
seemed	5	0.01245
remembered	4	0.00973678
drew	4	0.00973678
made	3	0.00727831
ran	3	0.00727668
sounded	3	0.00719264
considered	3	0.00719232
goes	2	0.00473222
making	2	0.00464883
remained	2	0.00464818
nibbled	2	0.00464785
got	1	0.0022727
left	1	0.00218938
vanished	1	0.00210437

[3,1,]
got	18	0.129713
made	16	0.114962
turned	14	0.100203
sat	13	0.0932196
came	11	0.0787309
ran	10	0.071482
kept	10	0.07122
walked	9	0.0639741
come	7	0.048248
left	6	0.0425037
hurried	5	0.0340687
set	3	0.0207629
fell	3	0.0207621
goes	3	0.0207611
crowded	2	0.0132562
brightened	2	0.0132532
saw	1	0.00679341
was	1	0.00653827
tried	1	0.00653138
passed	1	0.00626935
offended	1	0.00601442
violently	1	0.00601036

[3,2,]
tried	16	0.186508
had	14	0.163444
seemed	11	0.127691
ought	8	0.0919395
got	7	0.0806394
went	7	0.0804077
used	6	0.0684146
set	5	0.0568819
seems	4	0.0448871
ventured	3	0.0331229
wanted	2	0.0213604
were	1	0.0102923
occurred	1	0.00959621

[3,3,]
had	86	0.612111
all	21	0.148698
only	9	0.0629786
hastily	6	0.0415474
soon	4	0.027264
suddenly	3	0.0201228
instantly	3	0.0201205
both	2	0.0129815
cautiously	2	0.0129775
saw	1	0.00628785
sat	1	0.00598607
generally	1	0.00583856
still	1	0.00583798

[3,4,]
said	96	0.40433
thought	28	0.116729
replied	18	0.075378
asked	13	0.0541294
added	13	0.0541293
spoke	12	0.0499102
could	11	0.0451528
remarked	8	0.0330327
say	6	0.0245971
whispered	5	0.020375
liked	4	0.0161569
passed	3	0.0120891
continued	3	0.0119377
answered	3	0.0119371
exclaimed	3	0.011937
says	3	0.0119365
seemed	2	0.00802243
belongs	2	0.00771727
noticed	2	0.00702836
hurried	1	0.00380497
heard	1	0.00365121

[3,5,]
heard	21	0.37861
knew	8	0.142014
seen	7	0.120478
saw	6	0.10648
repeated	5	0.0877524
replied	4	0.0695747
watched	2	0.0329422
could	1	0.0153152
moved	1	0.0147656

[3,6,]
began	44	0.568837
came	18	0.231318
waited	9	0.11435
swam	4	0.0494235
went	2	0.0235518

[3,7,]
've	37	0.549227
never	24	0.355227
ever	6	0.0866111

[3,8,]
then	30	0.993047

[3,9,]
grow	4	0.947949

[3,10,]
might	13	0.752645
means	4	0.223544

[3,11,]
were	4	0.949135

[3,12,]
sighed	2	0.599858
repeated	1	0.267628

[3,13,]
'm	40	0.510238
're	25	0.317955
were	13	0.164306

[4,]
e	1	0.0620814
stupid	1	0.027678

[4,0,]
turtle	49	0.668751
hare	24	0.326138

[4,1,]
sort	16	0.718621
top	4	0.173414
evening	2	0.0825479

[4,2,]
o	5	0.437499
soo	4	0.346671
beau	2	0.165017

[4,3,]
minute	16	0.83424
hour	2	0.095573
day	1	0.0440889

[4,4,]
mouse	10	0.701106
ootiful	2	0.130076
minute	1	0.0626843
day	1	0.060716

[4,5,]
first	9	0.232851
own	7	0.180217
last	7	0.176601
simple	5	0.127597
few	4	0.101288
e	3	0.0766211
smallest	2	0.0486707
turtle	1	0.0240066

[4,6,]
minute	2	0.90561

[4,7,]

[4,8,]
oop	4	0.543698
clock	3	0.401048

[5,]
soup	8	0.0927616
minutes	3	0.0610812
way	3	0.0505909
rate	3	0.0505385
pardon	4	0.0400307
question	3	0.0400307
children	2	0.0400307
tongue	4	0.0400133
curiosity	4	0.0400133
day	2	0.0297082
hair	3	0.0295055
history	2	0.029502
garden	1	0.0294881
honour	3	0.0294881
time	1	0.0192214
else	1	0.0190851
temper	2	0.0189978
direction	2	0.0189978
remark	1	0.0189943
moment	1	0.0189803
speech	1	0.0189803
verdict	2	0.0189768
wine	1	0.0189629
book	1	0.00852499
last	1	0.00850403
savage	1	0.00848657
nor	1	0.00847259
choice	1	0.00845512
crumbs	1	0.00845512
elegant	1	0.00843766

[5,0,]
time	37	0.669322
moment	15	0.26939
soup	1	0.0162459
minutes	1	0.0156643
children	1	0.0152778

[5,1,]
business	4	0.475851
day	3	0.351825
speech	1	0.101321

[5,2,]
way	21	0.99051

[5,3,]
yet	12	0.589967
way	6	0.291397
garden	2	0.090854

[5,4,]
now	12	0.907266
children	1	0.0627527

[5,5,]
idea	7	0.970297

[5,6,]
garden	10	0.816993
history	2	0.150882

[5,7,]

[5,8,]
majesty	9	0.400435
morning	4	0.173267
rate	3	0.121825
business	2	0.0831681
lovely	2	0.0823982
minutes	1	0.0408151
wine	1	0.0377342

[5,9,]
else	2	0.600606
rules	1	0.266932

[5,10,]

[5,11,]
remark	4	0.346538
minutes	3	0.258808
question	2	0.166433
figure	2	0.16411

[6,]
head	41	0.0519411
eyes	20	0.0249813
door	17	0.0226579
house	18	0.0226579
hand	14	0.0175927
words	14	0.0173597
court	13	0.0162967
voice	12	0.0148866
air	11	0.0137641
side	10	0.0137522
queen	9	0.0125156
game	10	0.0125014
face	10	0.0124918
pool	10	0.0124799
arm	9	0.0112434
life	9	0.0112172
cat	7	0.0112077
fan	9	0.0112077
other	7	0.00999256
feet	7	0.00998304
birds	8	0.00995925
baby	8	0.00995687
distance	8	0.00993545
tail	7	0.008687
rest	7	0.008687
sea	7	0.0086751
best	6	0.0086751
wood	7	0.0086751
table	6	0.0086751
trees	7	0.0086632
bottle	6	0.0086632

[6,0,]
queen	52	0.12904
gryphon	46	0.113981
hatter	43	0.106499
king	42	0.103993
duchess	36	0.0891059
dormouse	27	0.0657618
mouse	23	0.0567313
cat	19	0.0469326
caterpillar	19	0.0468184
rabbit	17	0.0418248
dodo	10	0.0244119
footman	9	0.0219625
jury	9	0.021944
cook	9	0.0219434
pigeon	8	0.0194181
dance	5	0.0119937
knave	4	0.00950611
puppy	3	0.00703763
executioner	3	0.00703745
youth	3	0.00701843
lory	3	0.00698057
eaglet	3	0.00698039
trial	2	0.00458835
hedgehog	2	0.00456915
duck	2	0.00449307
soldiers	1	0.00206313
party	1	0.00206299
others	1	0.00204375

[6,1,]
kid	5	0.599304
gardeners	3	0.349617

[6,2,]
thing	33	0.58575
reason	6	0.103656
question	6	0.103656
table	5	0.0859483
creatures	2	0.0322775
shrill	2	0.0321673
moment	1	0.0144228
footman	1	0.0144225

[6,3,]
white	23	0.616073
three	6	0.156741
young	4	0.102697
guinea	4	0.102697

[6,4,]
mock	49	0.668407
march	24	0.325989

[6,5,]
feet	4	0.200635
jury	4	0.200163
lobster	4	0.200066
inches	3	0.147369
mile	2	0.0947634
foot	1	0.0426322
well	1	0.0424486

[6,6,]
bit	9	0.146728
end	8	0.130102
tone	8	0.130065
moral	7	0.113353
opportunity	6	0.0966885
pair	5	0.0800246
shriek	5	0.0800246
number	4	0.0633607
pack	3	0.0467437
pattering	3	0.0466968
edge	2	0.0300334

[6,7,]
way	9	0.238008
witness	7	0.18387
side	6	0.157238
size	6	0.156756
cat	4	0.103102
verse	3	0.0756945
pack	2	0.0487232

[6,8,]
other	19	0.188282
same	15	0.148093
next	13	0.12802
poor	12	0.118021
right	10	0.0980627
whole	7	0.0681391
first	5	0.0480303
best	4	0.0382573
cheshire	4	0.0380278
proper	3	0.0280282
glass	2	0.0181827
two	2	0.0181101
second	2	0.0180299
fish	1	0.00810821
most	1	0.00806956

[6,9,]

[6,10,]
deal	8	0.973843

[6,11,]
door	6	0.306306
girl	4	0.19995
bottle	3	0.147751
key	3	0.147413
dream	3	0.147413

[6,12,]
low	9	0.676437
deep	4	0.292116

[6,13,]
moment	9	0.799454
whiting	2	0.163668

[7,]
rabbit	16	0.699735
pigs	1	0.116636
lady	2	0.0749777
so	1	0.0333713

[7,0,]
pigs	1	0.81633

[7,1,]
two	12	0.983099

[7,2,]
pigs	1	0.81633

[7,3,]
rabbit	1	0.937569

[8,]
voice	2	0.362418
tone	1	0.29988
high	2	0.174918

[8,0,]
voice	14	0.558418
tone	11	0.427944

[8,1,]
tone	5	0.710831
voice	2	0.259782

[8,2,]
high	10	0.582404
box	4	0.225192
quadrille	3	0.166401

[9,]
is	14	0.329377
was	11	0.270594
so	6	0.152955
for	4	0.113732
not	1	0.0353234
pardon	1	0.0156983

[9,0,]
is	58	0.646552
was	24	0.265412
so	5	0.0541812
matters	2	0.0203133
for	1	0.0111749

[9,1,]
before	12	0.982789

[9,2,]
not	6	0.528751
love	3	0.255184
for	2	0.169789

[9,3,]
is	1	0.860564

[10,]
more	16	0.405129
enough	11	0.276879
use	6	0.148707
right	4	0.0974659
notion	2	0.046153

[11,]

[11,0,]
much	13	0.65366
tired	3	0.142874
easily	3	0.142872
long	1	0.0466006

[11,1,]
much	2	0.931604

[11,2,]
much	6	0.977125

[11,3,]
much	7	0.357262
long	6	0.29811
politely	5	0.238138
extremely	2	0.0936238

[12,]
times	3	0.632294
just	1	0.29953

[12,0,]
just	8	0.981636

[12,1,]
times	3	0.974346

[13,]
from	3	0.930301

[14,]
running	2	0.310909
hunting	2	0.199826
patiently	2	0.199802

[14,0,]
skimming	1	0.810546

[14,1,]
nearer	3	0.936431

[14,2,]

[14,3,]
running	1	0.856724

[15,]

[15,0,]
this	6	0.972034

[15,1,]
nothing	2	0.942467

[15,2,]
nothing	3	0.961581

[15,3,]

[15,4,]

[15,5,]
upstairs	1	0.833566

[16,]

[16,0,]
often	2	0.978291

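On the Results

The paper reports that the verbs all gathered into a single state and its children.

In my experiments with $M=2$, this did not happen with seed 1, but with seed 2 almost all verbs gathered under state $[1]$.

On closer inspection, however, prepositions have gathered in $[1,6]$, which is not something we want to see given that its parent state $[1]$ is a verb state.

The perplexity and log-likelihood recorded during the experiment are shown below.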

image

image

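I am an amateur and did not really know how to compute perplexity, so I used a method I found online. Since the paper reports values around 300, I suspect something is wrong in my implementation.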
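For reference, one common definition of perplexity is the exponential of the negative average log-likelihood per word. The sketch below assumes that each sentence's log-probability (natural log) has already been computed, for example with the forward algorithm. The function name `perplexity` and its arguments are hypothetical, and whether the paper normalizes in exactly this way (for instance, whether end-of-sentence tokens are counted) is an assumption I have not verified.

```python
import math


def perplexity(sentence_log_probs, sentence_lengths):
    """Per-word perplexity from sentence log-probabilities (natural log).

    sentence_log_probs: log p(sentence) for each sentence
    sentence_lengths:   number of words in each sentence
    Assumption: normalization is by the total word count.
    """
    total_words = sum(sentence_lengths)
    if total_words == 0:
        raise ValueError("no words to evaluate")
    total_log_prob = sum(sentence_log_probs)
    # exp of the negative mean log-likelihood per word
    return math.exp(-total_log_prob / total_words)


if __name__ == "__main__":
    # Toy check: every word has probability 1/4.
    lengths = [2, 3]
    log_probs = [n * math.log(0.25) for n in lengths]
    print(perplexity(log_probs, lengths))
```

If log-probabilities are kept in base 2 instead, the exponential must also be taken in base 2. Mixing bases, or dividing by the number of sentences rather than the number of words, is a typical way to end up with values that are far off.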

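Also, while the IHMM converges after a few thousand iterations, the iTHMM needs at least 50,000 iterations.

Alice in Wonderland is only about 1,300 lines of data, yet training took more than half a day.

The reference also states that hidden states can be sampled for several thousand words per second, whereas my implementation samples hidden states for well over a hundred thousand words per second.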

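Observations

In the IHMM, the preceding work, the state transition probabilities and the prior distribution over states are kept separate. The model is built from a two-stage Dirichlet process that generates the prior $p(s)$ from a base distribution $H$ and then generates the state transition probabilities $p(s_j \mid s_i)$ from $p(s)$.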
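The two-stage structure described above can be sketched with the usual Chinese restaurant style predictive probability for a hierarchical Dirichlet process: transition counts from the current state are smoothed toward a shared prior, and that prior is estimated only from proxy customers. This is a minimal sketch and not my actual IHMM code. The names `transition_predictive`, `concentration`, and `base_concentration` are hypothetical, and it assumes every state that has a transition count also has at least one proxy customer.

```python
def transition_predictive(trans_counts, proxy_counts,
                          concentration, base_concentration):
    """Predictive p(next state | current state) under a two-stage DP.

    trans_counts: {state: number of observed transitions into it
                   from the current state}
    proxy_counts: {state: number of proxy customers in the shared prior}
    Assumption: keys of trans_counts are a subset of proxy_counts.
    """
    n_total = sum(trans_counts.values())
    m_total = sum(proxy_counts.values())
    # Probability of backing off from the transition counts to the prior
    backoff = concentration / (n_total + concentration)
    probs = {}
    for state, m in proxy_counts.items():
        prior = m / (m_total + base_concentration)
        direct = trans_counts.get(state, 0) / (n_total + concentration)
        probs[state] = direct + backoff * prior
    # Mass reserved for a state that has never been used
    probs["<new>"] = backoff * base_concentration / (m_total + base_concentration)
    return probs


if __name__ == "__main__":
    p = transition_predictive({"A": 3, "B": 1},
                              {"A": 2, "B": 1, "C": 1},
                              concentration=1.0, base_concentration=1.0)
    print(p, sum(p.values()))
```

Note that `proxy_counts` is never touched by an ordinary transition in this sketch. It changes only when a backoff to the prior actually occurs, which is the property that the prior in the IHMM relies on.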

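The iTHMM treats the root node $\boldsymbol s_{[]}$ as a special node and tries to have its $\boldsymbol \pi^{[]}$ and $p(\cdot \mid \boldsymbol s_{[]})$ play a role similar to the prior distribution in the IHMM.

However, in the IHMM the counts of the prior are updated only through proxy customers, whereas in the iTHMM, if retrospective sampling selects the root node, a customer is added to it directly. It therefore seems to me that the root node does not act purely as a prior. (It is also possible that I have simply misunderstood.)

I suspect the root node may not be needed at all, so I plan to write code and test this.

Furthermore, in the current iTHMM, $\alpha$ is made smaller with depth in the vertical CDP, but in the horizontal direction $\gamma$ is constant at every depth.

If $\gamma$ were also decreased with depth to suppress horizontal splitting, I would expect the shallow levels to have many states (more coarse categories), with fine-grained splitting becoming less likely at deeper levels.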
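To get a feel for this idea, the sketch below computes the prior expected stick lengths of a horizontal stick-breaking process whose break ratios are drawn from ${\rm Be}(1, \gamma)$, using equation (1) with each ratio replaced by its prior mean $1/(1+\gamma)$. The geometric schedule $\gamma_d = \gamma_0 \lambda^d$ is purely my assumption for illustration and does not come from the paper. The function names are hypothetical as well.

```python
def gamma_at_depth(gamma0, decay, depth):
    """Hypothetical schedule: shrink gamma geometrically with depth."""
    return gamma0 * decay ** depth


def expected_stick_lengths(gamma, num_sticks):
    """Prior mean of pi_k when each break ratio ~ Be(1, gamma).

    The ratios are independent, so E[pi_k] is the mean ratio times
    the product of the mean remainders of the earlier breaks.
    """
    mean_ratio = 1.0 / (1.0 + gamma)
    remaining = 1.0
    lengths = []
    for _ in range(num_sticks):
        lengths.append(mean_ratio * remaining)
        remaining *= 1.0 - mean_ratio
    return lengths


def sticks_to_cover(gamma, mass=0.95):
    """Smallest number of sticks whose expected lengths sum to >= mass."""
    remaining = 1.0
    k = 0
    while 1.0 - remaining < mass:
        remaining *= gamma / (1.0 + gamma)
        k += 1
    return k


if __name__ == "__main__":
    for depth in range(3):
        g = gamma_at_depth(1.0, 0.5, depth)
        print(depth, g, sticks_to_cover(g))
```

With $\gamma_0 = 1$ and $\lambda = 0.5$, covering 95% of the expected mass takes 5 sticks at depth 0, 3 at depth 1, and 2 at depth 2, so deeper levels are divided among fewer children a priori. Whether this actually helps learning is something only an experiment can tell.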

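At present, it feels as though subdivision goes too far and learning stalls.

As the paper says, the iTHMM is a first step toward capturing hidden states hierarchically, but it is a very interesting method and I will keep following it.

Closing Remarks

The theory behind the iTHMM is easy to follow, and when I found the paper I thought I could build it right away.

Once I actually started writing code, however, I realized it was far more complicated than expected, and about two weeks in I lost heart and nearly gave up.

In the end it took about three weeks to finish. I think the experience I had gained from spending half a year implementing NPYLM, another hierarchical Bayesian model, starting from no knowledge at all, made a big difference.

If I had been told to build the iTHMM with no prior experience, it would certainly have taken half a year.

Incidentally, this article has already taken a week to write and still covers only half of what I want to say, so I may keep updating it.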