CAT
Introduction
After reading the basic paper about Cluster Adaptive Training (CAT), I got a good idea about the I-Vector approach and the statistics behind it. CAT provides a method to substantially reduce the number of parameters for training, while increasing the performance and accuracy of the system.
CAT relies on GMM-HMM adaptation and clusters similar speakers together, but ties the component priors and variances across all speaker clusters, so only the means vary between clusters.
Definition
The challenge is to calculate the speaker-dependent mean of each Gaussian component. This results in the following model:
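In Gales' original CAT notation (reconstructing it here as I understand it from the paper), the adapted mean of Gaussian component m for speaker s is an interpolation of the cluster means:

$$\mu_m^{(s)} = M_m \,\lambda^{(s)}, \qquad M_m = \begin{bmatrix}\mu_m^{(1)} & \cdots & \mu_m^{(K)}\end{bmatrix}$$

where K is the number of clusters and lambda^(s) is the speaker-specific cluster weight vector. The cluster means M_m and the weights lambda^(s) are the two unknowns of the model.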
To estimate these two unknown variables, EM is used. The estimation formula is defined as:
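For a fixed canonical model, the weight update has this form (the standard CAT weight estimate, written in the notation from above):

$$\hat{\lambda}^{(s)} = \Big(\sum_{m}\sum_{t} \gamma_m(t)\, M_m^{\top} \Sigma_m^{-1} M_m\Big)^{-1} \sum_{m}\sum_{t} \gamma_m(t)\, M_m^{\top} \Sigma_m^{-1} o_t$$

where gamma_m(t) is the posterior of component m at time t, Sigma_m is the tied covariance, and o_t is the observation vector. The cluster means are re-estimated analogously, interleaved in the usual EM fashion.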
The rest is model optimization. I did not include the full formulas, but these are sufficient to get an idea of CAT.
CAT Schemes
CAT also provides an adaptation scheme, called the transform scheme, which maps the means into a lower-dimensional space: all cluster vectors are concatenated to produce a so-called "canonical mean". This canonical mean is then subject to a fixed number of transforms, each of which maps values of the canonical mean vector to the resulting adapted mean.
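To make the idea concrete, here is a minimal numpy sketch (my own toy example with made-up names and sizes, not code from the paper): each cluster holds an affine transform of a shared canonical mean, and adapting to a new speaker only requires estimating a small weight vector.

```python
import numpy as np

rng = np.random.default_rng(0)

DIM = 3         # feature dimension
N_CLUSTERS = 4  # fixed number of cluster transforms

# Canonical mean for a single Gaussian component.
canonical_mean = rng.normal(size=DIM)

# One affine transform (A_c, b_c) per cluster, shared by all speakers.
A = rng.normal(size=(N_CLUSTERS, DIM, DIM))
b = rng.normal(size=(N_CLUSTERS, DIM))

# Each cluster mean is a transformed view of the canonical mean.
cluster_means = np.stack(
    [A[c] @ canonical_mean + b[c] for c in range(N_CLUSTERS)]
)  # shape: (N_CLUSTERS, DIM)

# Adapting to a new speaker only needs a small weight vector,
# not a full set of new means.
lam = np.array([0.5, 0.2, 0.2, 0.1])  # speaker cluster weights
speaker_mean = lam @ cluster_means    # interpolated, adapted mean
print(speaker_mean)
```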
When is CAT really helpful?
The primary use of CAT is as a rapid adaptation scheme for scenarios where very little adaptation data is available, which makes it essentially similar to the JFA (Joint Factor Analysis) approach.
I-Vector
I have read quite a few papers about this topic by now, but I still have no clue whether, e.g., CAT can be used to preprocess the weights of the model, whether a GMM-UBM performs better overall in that case, or whether both (GMM and CAT) can be used to train a model from which the I-Vector is then extracted.
- Adaptation scheme applied after the training of a specific model (e.g. GMM-UBM)
- The I-Vector is another approach derived from the JFA (Joint Factor Analysis) method. In contrast to JFA, the I-Vector approach simplifies all parameters into one space, the “total variability space”, extracts the “total variability vector”, or I-Vector (identity vector), from it, and measures the distances between the test and training I-Vectors.
- Reduces the high-dimensional input data to a low-dimensional, yet representative, data set.
- The channel compensation in this new approach is carried out in the low-dimensional space.
- Uses the cosine distance computed between the target speaker's I-Vector and the test I-Vector as a decision score (see the sketch after this list).
- Is defined by the total variability matrix, which contains the eigenvectors with the largest eigenvalues of the total variability covariance matrix; all other directions are discarded.
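As a toy illustration of the cosine-distance scoring mentioned above, here is a short numpy sketch (my own example; cosine_score and the threshold are made up, and a real system would first apply channel compensation such as LDA/WCCN in this low-dimensional space):

```python
import numpy as np

def cosine_score(w_target: np.ndarray, w_test: np.ndarray) -> float:
    """Cosine similarity between a target and a test I-Vector."""
    return float(
        w_target @ w_test
        / (np.linalg.norm(w_target) * np.linalg.norm(w_test))
    )

# Toy I-Vectors; real ones are typically ~400-dimensional and are
# extracted from Baum-Welch statistics via the total variability
# matrix T.
target = np.array([0.9, -0.2, 0.4])
test = np.array([0.8, -0.1, 0.5])

score = cosine_score(target, test)
print(score, score > 0.7)  # threshold would be tuned on dev data
```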
Essentially, both approaches try to reduce the number of parameters and to treat the variances as constant or simplified.
So far, no good