SocialAnxietyNerd: Kaldi + Eclipse, a very time-consuming task

Kaldi and its "structure"

After I had finally read enough of the code in Kaldi's i-vector implementation, I started to add my own.

Kaldi is not what I would call easily extendable. It has its flaws when it comes to separating objects.

In my case, there is already a class called IvectorExtractor. I only need to modify one or two methods of this class. Since it is tightly coupled to other I/O classes, it wouldn't be wise to rewrite the whole process.

So I implemented my methods by subclassing IvectorExtractor and tried to write some tests for the subclass.

Unfortunately, it is not easy to set up such a test, because the usual procedure for extracting i-vectors has huge prerequisites: we need to train a UBM, then estimate the T matrix, and only then can we extract the i-vector.

My thoughts were:

Train a usual model and test my own class once the T matrix has been estimated (naturally the easiest solution)
Write a test that generates random data and runs the estimation on that

Either way, in the end I needed to debug the code, and boy did it take time to do so.

So, to recapitulate:

Kaldi is designed to run on GridEngine, which is a batch parallel processing framework
Batch processing is done by building binary executables and running them in parallel
This in turn generates a lot of code, each program consisting of a file with an executable main method and (usually) a class
Each binary is usually just one link in a chain of executions, since we can't simply do all the work in one main()

The sheer number of binaries means they have to be executed in some kind of order.
Kaldi usually handles that by calling various bash scripts, which already ship with the current version.
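
To make the idea of a chain of binaries concrete, here is a minimal sketch of such a driver script. It is not taken from Kaldi: the three step functions are placeholders that only print a message, and the names train_ubm, estimate_t, extract_ivectors and run_pipeline are made up for illustration. The starting-stage argument is an assumption modelled on how such scripts are commonly written.

```shell
#!/usr/bin/env bash
set -e  # abort the chain as soon as one step fails

# Placeholder steps (hypothetical): in a real recipe each one would
# launch one of the compiled binaries.
train_ubm()        { echo "step 1: train UBM"; }
estimate_t()       { echo "step 2: estimate T matrix"; }
extract_ivectors() { echo "step 3: extract i-vectors"; }

# Run every step whose number is >= the requested starting stage.
run_pipeline() {
  stage="${1:-1}"
  if [ "$stage" -le 1 ]; then train_ubm; fi
  if [ "$stage" -le 2 ]; then estimate_t; fi
  if [ "$stage" -le 3 ]; then extract_ivectors; fi
}

run_pipeline 1
```

Calling run_pipeline 2 would skip the UBM training and resume at the T matrix, which is handy when only a late step in the chain needs to be rerun or debugged.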

Debugging

So, to debug the code I had written, I tried to use Eclipse as an IDE. Eclipse usually comes with the gdb debugger and can debug C++ code.

However, in the case of Kaldi, we do not just start one C++ program that we can simply pass parameters to; other steps have to be run beforehand.

To configure this behaviour, there are some options:

Debug the shell script by using gdbserver. Command: gdbserver localhost:1234 <file>
In Eclipse, "rewrite" the scripts by using a launch configuration that includes all the commands used inside the scripts (this takes a lot of time)
Cherry-pick just the part of the code where the class to be debugged is called, and hope everything works fine (the fast way)

Finally I picked the last option.
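
For completeness, the first option could look roughly like the sketch below. gdbserver launches a program rather than a script, so the idea is to prefix only the one binary call inside the script that should be debugged. The helper debug_wrap and the variables DEBUG_PORT and RUNNER are hypothetical names of mine, not something Kaldi provides.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of Kaldi): prefix a command with gdbserver
# when DEBUG_PORT is set, otherwise run the command unchanged.
# RUNNER is an assumption for dry runs: RUNNER=echo prints the command
# instead of executing it.
debug_wrap() {
  if [ -n "${DEBUG_PORT:-}" ]; then
    ${RUNNER:-} gdbserver "localhost:${DEBUG_PORT}" "$@"
  else
    ${RUNNER:-} "$@"
  fi
}

# Dry run: show what would be executed for a placeholder binary.
DEBUG_PORT=1234 RUNNER=echo debug_wrap ./my-binary arg1 arg2
```

Without RUNNER=echo, gdbserver starts the binary and waits for a debugger to connect. In a second terminal, starting gdb on the same binary and typing target remote localhost:1234 attaches to it. As far as I can tell, Eclipse can connect in the same way through a remote debugging launch configuration.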


Wednesday, May 14, 2014
