It's cool that they provided bindings for Torch, but also a little surprising. Torch is obviously a very widespread/popular deep learning framework, but I got the impression that Baidu's Silicon Valley AI Lab (SVAIL) ran mostly a custom C/C++ codebase.
Most likely the Torch bindings were added to encourage a wider variety of researchers to use their CTC implementation (i.e. it doesn't mean they've switched to Torch internally). But still interesting to see.
Edit: to add something more "helpful" to this comment, their paper links to a YouTube channel [1] that shows demos of their method, which I think is great.
At the GPU Technology Conference (GTC) 2015, Andrew Ng showed a live demo of Deep Speech (1?) [0] (demo starts around the 41-minute mark). There are other videos showing Deep Speech, but I found this one the most useful/interesting of the ones I've seen.
I think the author revised the figure(s) between the time of the parent comment (by gcr) and your comment. At least, the A * B = C figure's filename seems to imply a revision [1].
EDIT: yep, the figures were revised. Compare the corrected version [1] vs the original [2].
Yes, that was my screwup, sorry for any confusion! Despite multiple linear algebra courses and a decade working in 3D graphics, I haven't internalized the basics of matrix notation. It doesn't help that my usual sandbox (Eigen) is column-major by default, which is the wrong in-memory order for my raster-image-trained brain to visualize.
Funnily enough, I find tensor notation a bit easier, despite being less familiar.
The left card has a red circle (left corner) and a purple "x" (right corner), while the right card has a blue-green triangle (lower left corner) and a green-blue circle (lower right corner).