# Kaldi thchs30 Notes (Part 5)

## LDA and MLLT (lines 78-85)

# LDA

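LDA here refers to Linear Discriminant Analysis, a supervised dimensionality-reduction technique: every sample in the dataset carries a class label.

The core idea of LDA is that, after projection, the within-class variance should be as small as possible and the between-class variance as large as possible. Put simply, samples of the same class should cluster tightly while different classes sit far apart. The same idea is widely used in face recognition today. The original figure illustrates the effect, with the right-hand projection being the one we want.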

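## LDA algorithm steps

1. Compute the within-class scatter matrix $S_{w}$.

2. Compute the between-class scatter matrix $S_{b}$.

3. Compute the matrix $S_{w}^{-1}S_{b}$.

4. Compute the $d$ largest eigenvalues of $S_{w}^{-1}S_{b}$ and the corresponding $d$ eigenvectors $(w_1,w_2,\ldots,w_d)$; these eigenvectors form the columns of the projection matrix $W$.

5. Map each sample feature $x_{i}$ in the dataset to a new sample $z_{i}=W^{T}x_{i}$.

6. Obtain the output dataset $D^{'} = \{(z_{1},y_{1}), (z_{2},y_{2}),\ldots,(z_{m},y_{m})\}$.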
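The six steps above can be sketched in NumPy as follows. This is a minimal illustration on synthetic data, not the code Kaldi runs: the function names `lda_fit` and `lda_transform`, the toy two-class dataset, and the choice of `d=1` are all assumptions made for the example.

```python
import numpy as np

def lda_fit(X, y, d):
    """Return an (n_features, d) projection matrix W (hypothetical helper)."""
    n_features = X.shape[1]
    overall_mean = X.mean(axis=0)
    S_w = np.zeros((n_features, n_features))
    S_b = np.zeros((n_features, n_features))
    for c in np.unique(y):
        X_c = X[y == c]
        mean_c = X_c.mean(axis=0)
        centered = X_c - mean_c
        S_w += centered.T @ centered               # step 1: within-class scatter
        diff = (mean_c - overall_mean)[:, None]
        S_b += X_c.shape[0] * (diff @ diff.T)      # step 2: between-class scatter
    # Steps 3-4: eigen-decomposition of S_w^{-1} S_b (solve avoids an explicit inverse).
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_w, S_b))
    order = np.argsort(eigvals.real)[::-1][:d]     # indices of the d largest eigenvalues
    return eigvecs[:, order].real

def lda_transform(X, W):
    """Step 5: z_i = W^T x_i, applied to every row of X."""
    return X @ W

# Toy data: two 3-dimensional classes whose means differ along the first two axes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=[0, 0, 0], scale=1.0, size=(50, 3)),
               rng.normal(loc=[5, 5, 0], scale=1.0, size=(50, 3))])
y = np.array([0] * 50 + [1] * 50)

W = lda_fit(X, y, d=1)
Z = lda_transform(X, W)                            # step 6: projected samples, labels unchanged
```

With only two classes, $S_{b}$ has rank one, so a single projection direction already carries all the class separation.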

# MLLT

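Global Semi-tied Covariance (STC), also known as Maximum Likelihood Linear Transform (MLLT), uses a linear transformation matrix, estimated under the maximum likelihood (ML) criterion, to decorrelate the feature vectors. It is a square feature-transformation matrix, introduced in the paper "Semi-tied Covariance Matrices for Hidden Markov Models". It is used to model the covariance and addresses the large parameter count of full covariance matrices.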

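In the semi-tied scheme, the covariance of each Gaussian component $m$ is assembled from two parts, $\Sigma^{(m)} = H^{r}\,\Sigma_{diag}^{(m)}\,(H^{r})^{T}$:

1. a component-specific diagonal covariance $\Sigma_{diag}^{(m)}$;
2. a semi-tied, class-dependent non-diagonal matrix $H^{r}$, which can be shared across many Gaussian components (for example, all Gaussian components of the states belonging to one monophone).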
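To see why a shared matrix lets the model keep diagonal Gaussians, the sketch below checks numerically that evaluating a Gaussian with covariance $H\,\Sigma_{diag}\,H^{T}$ is the same as evaluating a diagonal Gaussian on features transformed by $A = H^{-1}$, plus a $\log|\det A|$ term. The matrix `H`, the variances, and the helper names are made-up values for illustration only.

```python
import numpy as np

def log_gauss_full(x, mu, Sigma):
    """Log-density of N(x; mu, Sigma) with a full covariance matrix."""
    d = x.shape[0]
    diff = x - mu
    _, logdet = np.linalg.slogdet(Sigma)
    maha = diff @ np.linalg.solve(Sigma, diff)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

def log_gauss_diag(x, mu, var):
    """Log-density of N(x; mu, diag(var))."""
    d = x.shape[0]
    return -0.5 * (d * np.log(2 * np.pi) + np.sum(np.log(var))
                   + np.sum((x - mu) ** 2 / var))

rng = np.random.default_rng(0)
dim = 4
H = rng.normal(size=(dim, dim)) + 3 * np.eye(dim)  # hypothetical shared matrix, kept well-conditioned
A = np.linalg.inv(H)                               # the corresponding feature transform
var = rng.uniform(0.5, 2.0, size=dim)              # diagonal variances of one Gaussian
mu = rng.normal(size=dim)
x = rng.normal(size=dim)

# Full-covariance evaluation in the original feature space.
full = log_gauss_full(x, mu, H @ np.diag(var) @ H.T)

# Diagonal evaluation in the transformed space, with the Jacobian term log|det A|.
_, logabsdet_A = np.linalg.slogdet(A)
diag = log_gauss_diag(A @ x, A @ mu, var) + logabsdet_A
```

Both routes give the same log-likelihood, but the second stores only `dim` variances per Gaussian plus one shared `dim x dim` matrix, instead of `dim * (dim + 1) / 2` covariance entries per Gaussian.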

## MLLT procedure

1. Estimate the LDA transformation matrix (we only need the first rows of this, not the full matrix). Call this matrix $\mathbf{M}$.

2. Start a normal model building process, always using features transformed with $\mathbf{M}$. At certain selected iterations (where we will update the MLLT matrix), we do the following:
   - (a) Accumulate MLLT statistics in the current fully-transformed space (i.e., on top of features transformed with $\mathbf{M}$). For efficiency we do this using a subset of the training data.
   - (b) Do the MLLT update; let this produce a square matrix $\mathbf{T}$.
   - (c) Transform the model means by setting $\mu_{jm} \leftarrow \mathbf{T} \mu_{jm}$.
   - (d) Update the current transform by setting $\mathbf{M} \leftarrow \mathbf{T} \mathbf{M}$.
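The bookkeeping in the last two sub-steps (updating the means and composing the transforms) can be checked with a small NumPy sketch. It does not estimate MLLT: `T` is a random placeholder for whatever the update would produce, `M` is a random stand-in for the LDA matrix, and the dimensions (117 spliced, 40 after LDA) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
spliced_dim, lda_dim, num_frames = 117, 40, 200    # illustrative sizes only

X = rng.normal(size=(num_frames, spliced_dim))     # stand-in for spliced features
M = rng.normal(size=(lda_dim, spliced_dim))        # stand-in for the LDA matrix (first rows only)
T = rng.normal(size=(lda_dim, lda_dim))            # placeholder for the square MLLT update

feats = X @ M.T                                    # features in the current transformed space
mu = feats.mean(axis=0)                            # one "model mean" living in that space

mu_new = T @ mu                                    # mean update:      mu <- T mu
M_new = T @ M                                      # transform update: M  <- T M

feats_new = X @ M_new.T                            # features under the composed transform
```

Because the mean is moved by the same `T` that is folded into `M`, the mean of `feats_new` equals `mu_new`, so the model stays aligned with the features it will see on the next iteration.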

# Splice

Little has been written online about splicing, so here I quote the Kaldi documentation directly:

Frame splicing (e.g. splicing nine consecutive frames together) is typically done to the raw MFCC features prior to LDA. The program splice-feats does this. A typical line from a script that uses this is the following:

`feats="ark:splice-feats scp:data/train.scp ark:- | transform-feats $dir/0.mat ark:- ark:- |"`

and the "feats" variable would later be used as an rspecifier (c.f. Specifying Table formats: wspecifiers and rspecifiers) by some program that needs to read features. In this example we don't specify the number of frames to splice together because we are using the defaults (`--left-context=4`, `--right-context=4`, or 9 frames in total).
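As a rough picture of what splicing does to a feature matrix, here is a NumPy sketch that stacks each frame with its 4 left and 4 right neighbours. The function `splice_frames` is a hypothetical helper, not a Kaldi API, and the edge handling (repeating the first and last frame) is an assumption about how the boundaries are padded.

```python
import numpy as np

def splice_frames(feats, left_context=4, right_context=4):
    """Concatenate each frame with its neighbours (hypothetical helper).

    feats: (num_frames, dim) array. Returns (num_frames, dim * (left + 1 + right)).
    Frames beyond the utterance edges are replaced by the first/last frame (assumed).
    """
    num_frames, dim = feats.shape
    offsets = np.arange(-left_context, right_context + 1)
    # idx[t, k] is the source frame placed in the k-th slot of output frame t.
    idx = np.clip(np.arange(num_frames)[:, None] + offsets[None, :], 0, num_frames - 1)
    return feats[idx].reshape(num_frames, dim * len(offsets))

# Ten frames of 13-dimensional features (13 is just an illustrative MFCC size).
feats = np.arange(10 * 13, dtype=float).reshape(10, 13)
spliced = splice_frames(feats)
```

With the default context of 4 on each side, every 13-dimensional frame becomes a 117-dimensional vector, and the centre slot of each spliced vector is the original frame.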

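# train_lda_mllt.sh workflow

1. Estimate the LDA transformation matrix $M$ and transform the features with it.

2. Retrain the GMM on the transformed features.

3. Accumulate the MLLT statistics.

4. Update the MLLT matrix $T$.

5. Update the model means: $\mu_{jm}\leftarrow T\mu_{jm}$.

6. Update the transformation matrix: $M\leftarrow TM$.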

# References

