I figured compute shaders in the browser were a good opportunity to revisit linear algebra and ANNs, since I could expect improved performance, a better programming model, and better portability.
You have created an abstraction that is pretty portable. You'll probably be able to capture new performance enhancements as they occur on web runtimes. Maybe I'll try it out.
There is lots of work being done in model compression (quantization, simple factorization tricks, better conv kernels like depthwise separable convs, etc). We won’t let that happen!
I am aware of that research, but even with a 20x decrease in size, some models are still too big for the web (think the worldwide web, not just internet in the US).
Oftentimes researchers train huge models without thinking about model size (because they don't have to). We've seen ~200MB production models get down to ~4MB without losing much precision. I'm quite confident we'll continue that trend.
Don't forget that folks were saying this about the web when images / rich media were becoming prevalent!
200MB is still a small model, and 4MB is almost double the size of an average web page (including images). 10MB web pages are really bad, even more so for countries that are still developing their infrastructure.
I saw a talk on this paper a couple of years ago: https://arxiv.org/abs/1503.02531. The method ("distillation") is to train a smaller model on the predictions of a large model or ensemble. I'd be interested to know about other techniques as well.
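For the curious, the core of that method is softening the teacher's output distribution with a temperature before the student trains on it. Here's a minimal sketch of temperature-scaled softmax in plain JS (illustrative only; `softTargets` is a made-up name, not from any library):

```javascript
// Temperature-scaled softmax: the "soft targets" from the distillation
// paper. Higher T flattens the distribution, exposing the teacher's
// relative confidences among the wrong classes.
function softTargets(logits, temperature) {
  const scaled = logits.map(z => z / temperature);
  const max = Math.max(...scaled);                 // for numerical stability
  const exps = scaled.map(z => Math.exp(z - max));
  const sum = exps.reduce((s, e) => s + e, 0);
  return exps.map(e => e / sum);
}

const logits = [5, 2, -1];
softTargets(logits, 1); // sharply peaked distribution
softTargets(logits, 5); // much flatter; the student trains against this one
```

The student is then trained to match these softened probabilities (usually alongside the true labels), which transfers more information per example than hard labels alone.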
By the way, if you made your interface more general than deep learning, your library could become the start of an alternative to NumPy/SciPy for JS, and it would be even faster than the original Python version because it uses the GPU. Just a thought ...
(One small downside is that JS doesn't have the nice operator overloading that Python has, afaik)
We call ourselves deeplearn.js, but you can use it for general linear algebra! Our NDArrayMath layer is analogous to NumPy, and we support a large subset of it (we support many of the linear algebra kernels, broadcasting, axis reduction, etc).
We store NDArrays as floating point WebGLTextures (in rgba channels). Mathematical operations are defined as fragment shaders that operate on WebGLTextures and produce new WebGLTextures.
The fragment shaders we write operate in the context of a single output value of our result NDArray, which gets parallelized by the WebGL stack. This is how we get the performance that we do.
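To make that execution model concrete, here's a CPU-side sketch (a hypothetical helper, not the actual deeplearn.js internals): each "shader invocation" computes exactly one output element, which is why the WebGL stack can run them all in parallel.

```javascript
// CPU sketch of the fragment-shader execution model for an elementwise
// add. On the GPU, each iteration of this loop runs as an independent
// fragment shader invocation, and gl_FragCoord tells that invocation
// which output element it is responsible for.
function elementwiseAdd(a, b) {
  const out = new Float32Array(a.length);
  for (let i = 0; i < out.length; i++) {
    out[i] = a[i] + b[i]; // one invocation = one output value
  }
  return out;
}

const a = new Float32Array([1, 2, 3]);
const b = new Float32Array([10, 20, 30]);
elementwiseAdd(a, b); // Float32Array [11, 22, 33]
```

Since no invocation depends on any other's result, there is no synchronization to worry about, and throughput scales with the number of GPU cores.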
Which... is pretty much how GPGPU started in the early 2000s.
Sad/funny how we go through this cycle again.
It will be interesting to see if the industry produces a standard for GPGPU in the browser, given that on the desktop the open standard (OpenCL) is less common than the proprietary one (CUDA).
This is still done in pretty much every game engine I've worked with (for general computation used to support rendering, as much as for the rendering itself). It's frankly extremely practical, and better than many GPGPU APIs because it more closely matches what the hardware is doing internally (GPU core warps, texel caches, vertex caches, etc.).
> It will be interesting to see if the industry will produce a standard for GPGPU in the browser.
They did: WebCL. Sadly, it had multiple security issues, so the browsers that had implemented it in their beta channels (just Chrome and Firefox, I believe) ended up removing it. Now I think it's completely stalled, and no one is planning to implement it.
Also sadly, SIMD.js support is coming along extremely slowly.
And SwiftShader is quite a nice fallback for blacklisted GPUs. It simulates WebGL on the CPU and takes advantage of SIMD:
https://github.com/google/swiftshader
As I understand it, deeplearn.js is more of a kitchen than a prepared meal. Part of the library is referred to as "NumPy for the web", with classes to run linear algebra operations efficiently by leveraging the GPU. I don't see why you couldn't use those pieces to set up other networks. I think the name "deeplearn.js" is more so capitalizing on the branding momentum of "deep learning" than a sign that it only demonstrates one kind of network. I'm in the middle of introductory machine learning classes, so I hope someone will correct me if I'm wrong.
We wanted to do hardware-accelerated deep learning on the web, but we realized there was no NumPy equivalent. Our linear algebra layer has now matured to a place where we can start building a more functional automatic differentiation layer. We're going to completely remove the Graph in favor of a much simpler API by the end of January.
Once that happens, we'll continue to build higher level abstractions that folks are familiar with: layers, networks, etc.
We really started from nothing, but we're getting there :)
Thanks for the explanation! I've recently been working on my own deep learning library (for fun) and was doing something similar. Aren't GL texture lookups inexact, since they're sampled with floating-point coordinates? Do you just rely on the floating-point error being small enough that you can reliably index weights?
I ended up switching to OpenCL since I am running this on my desktop. Just curious to see what you did. Thanks!
You can set nearest neighbor interpolation for the texture (aka no interpolation), and gl_FragCoord can be used to determine which pixel the fragment shader is operating on.
It's not really a hack, it's just using the GPU's parallel computing capabilities to compute things in parallel. This technique has been around for ages.
Languages, buddy, languages. Just as language barriers slowed the spread of ideas between human cultures, the same is true in the computing world. JS is catching up with many concepts that were prevalent in other languages/environments, and thanks to JS those concepts are now becoming more accessible and popular with everyone.
It works on mobile, it's just slow. Every time we read from or write to memory, we have to pack and unpack 32-bit floats as 4 bytes without bit-shifting operators >.>
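For anyone wondering what that packing amounts to: a float32 occupies 4 bytes, one per rgba channel of the texture. Here is the round trip sketched in plain JS using typed-array views (`packFloat`/`unpackFloat` are illustrative names; inside a GLSL ES shader the same round trip has to be emulated with floating-point arithmetic, since the shader language has no bitwise operators):

```javascript
// Reinterpret a 32-bit float as 4 bytes and back via a shared buffer.
// Each byte would live in one of the r, g, b, a channels of a texture.
const buf = new ArrayBuffer(4);
const asFloat = new Float32Array(buf);
const asBytes = new Uint8Array(buf);

function packFloat(x) {
  asFloat[0] = x;
  return Array.from(asBytes); // [byte0, byte1, byte2, byte3]
}

function unpackFloat(bytes) {
  asBytes.set(bytes);
  return asFloat[0];
}

const bytes = packFloat(3.14159);
unpackFloat(bytes); // back to 3.14159 (at float32 precision)
```

JS gets this reinterpretation for free from typed arrays; doing it with only adds, multiplies, `floor`, and `exp2`-style arithmetic in a shader is the slow, fiddly part being complained about above.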
We do not send any webcam / audio data back to a server, all of the computation is totally client side. The storage API requests are just downloading weights of a pretrained model.
We're thinking about releasing a blog post explaining the technical details of this project, would people be interested?
We're using SqueezeNet (https://github.com/DeepScale/SqueezeNet), which is similar to Inception (trained on the same ImageNet dataset) but much smaller, 5MB instead of Inception's 100MB, and inference is much, much quicker.
The application takes webcam frames and infers through SqueezeNet, producing a 1000D logits vector for each frame. These can be thought of as unnormalized probabilities for each of ImageNet's 1000 classes.
During the collection phase, we collect these vectors for each class in browser memory, and during inference we pass the frame through SqueezeNet and do k-nearest neighbors to find the class with the most similar logits vector. KNN is quick because we vectorize it as one large matrix multiplication.
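A rough sketch of that nearest-neighbor step in plain JS (illustrative only: k = 1 for brevity, a loop standing in for the single large matrix multiplication, and `nearestClass` plus the data layout are made-up names, not the demo's actual code):

```javascript
// Stored examples are {label, logits} pairs collected during the
// collection phase; the query is the logits vector for a new frame.
function dot(u, v) {
  let s = 0;
  for (let i = 0; i < u.length; i++) s += u[i] * v[i];
  return s;
}

function normalize(v) {
  const n = Math.sqrt(dot(v, v));
  return v.map(x => x / n);
}

// Cosine similarity of the query against every stored vector; on the
// GPU this whole loop is one matrix-vector multiplication, which is
// why "training" is instant and inference is fast.
function nearestClass(examples, query) {
  const q = normalize(query);
  let best = -Infinity;
  let label = null;
  for (const ex of examples) {
    const sim = dot(normalize(ex.logits), q);
    if (sim > best) { best = sim; label = ex.label; }
  }
  return label;
}

const examples = [
  { label: "up",   logits: [1, 0, 0] },
  { label: "down", logits: [0, 1, 0] },
];
nearestClass(examples, [0.9, 0.1, 0]); // "up"
```

With the stored logits stacked as rows of a matrix, all the similarities come out of one matmul, and picking the top-k is a cheap final pass.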
I'm curious why you've used a different classification algorithm on top of a neural network. I would expect that a neural network on top of a pretrained network could give similar results, with the benefit of simpler code. Is performance the reason?
Training a neural network on top would require a "proper" training phase, and finding the right hyperparameters that work everywhere turned out to be tricky. Actually, this is what we did originally; in the blog post we'll try to show demos of each of the approaches and explain why they don't work.
KNN also makes training "instant", and the code much much simpler.
By the way, I think your software could become very popular on the Raspberry Pi, because it would be very cheap and fun to use it for all sorts of applications (e.g. home automation).
There's something fantastically entertaining about this. It's stupidly simple (from the outside) but interacting with the computer in such a different way is weirdly fun.
It's like when you turn on a camera and people can see themselves on a TV. A lot of people can't help but make faces at it.
Why does it not work in Edge? Please keep the web open; don't make stuff that doesn't work in a modern browser. Also, always give an option to try it anyway.