Hacker News — nsthorat's comments

Why wait?


I guess I targeted compute shaders in the browser as a good time to revisit linear algebra and ANNs, since I could expect improved performance, a better programming model, and better portability.

You have created an abstraction that is pretty portable. You'll probably be able to capture new performance enhancements as they occur on web runtimes. Maybe I'll try it out.


It doesn't work in node yet, a relevant issue: https://github.com/PAIR-code/deeplearnjs/issues/234


There is lots of work being done in model compression (quantization, simple factorization tricks, better conv kernels like depthwise separable convs, etc). We won’t let that happen!


I am aware of that research, but even with a 20x decrease in size, some models are still too big for the web (think of the worldwide web, not just the internet in the US).


Oftentimes researchers train huge models but don't think about model size (because they don't have to). We've seen ~200MB production models get down to ~4MB without losing much precision. I'm quite confident we'll continue that trend.
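For a sense of how one of the compression techniques mentioned upthread works, here's a hypothetical sketch of 8-bit linear weight quantization in plain JS. This is illustrative only, not the actual pipeline behind the ~200MB to ~4MB results; it just shows why storing weights as bytes instead of float32 gives an immediate 4x reduction.

```javascript
// 8-bit linear quantization: map each float weight onto one of 256 levels
// between the min and max weight, storing only a Uint8Array plus (min, scale).
function quantize(weights) {
  const min = Math.min(...weights);
  const max = Math.max(...weights);
  const scale = (max - min) / 255 || 1; // guard against constant weights
  const q = new Uint8Array(weights.length);
  for (let i = 0; i < weights.length; i++) {
    q[i] = Math.round((weights[i] - min) / scale);
  }
  return { q, min, scale };
}

function dequantize({ q, min, scale }) {
  return Array.from(q, (v) => v * scale + min);
}

const w = [-1.5, 0.0, 0.25, 2.0];
const packed = quantize(w);
const restored = dequantize(packed);
// Each restored weight is within half a quantization step of the original.
```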

Don't forget that folks were saying this about the web when images / rich media were becoming prevalent!


200MB is still a small model, and 4MB is almost double the size of an average web page (including images). 10MB web pages are really bad, especially for countries that are still developing their infrastructure.


>> We've seen ~200MB production models get down to ~4MB and not lose much precision.

Details please. What techniques are used to reduce the model size?


I saw a talk on this paper (https://arxiv.org/abs/1503.02531) a couple of years ago. The method is to train a smaller model on the predictions of a large model or ensemble. I'd be interested in knowing other techniques as well.
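The key ingredient in that paper is the "soft targets": the big model's logits are softened with a temperature T > 1 before the small model is trained to match them. A plain-JS sketch of that softening (the logit values here are made up for illustration):

```javascript
// Softmax with temperature, as used for distillation soft targets.
// Higher T flattens the distribution, exposing which wrong classes the
// big model considers similar to the right one.
function softmaxWithTemperature(logits, T) {
  const scaled = logits.map((z) => z / T);
  const m = Math.max(...scaled); // subtract max for numerical stability
  const exps = scaled.map((z) => Math.exp(z - m));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

const logits = [8, 2, 1];
const hard = softmaxWithTemperature(logits, 1); // nearly one-hot
const soft = softmaxWithTemperature(logits, 4); // reveals relative similarities
```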


This is just the beginning :)


Author of deeplearnjs here. We hear you, and we 100% agree. Stay tuned.


That's great to hear.

By the way, if you made your interface more general than deep learning, your library could become the start of an alternative to NumPy/SciPy for JS, and it would be even faster than the original Python version because it uses the GPU. Just a thought ...

(One small downside is that JS doesn't have the nice operator overloading that Python has, afaik)


We call ourselves deeplearn.js, but you can use it for general linear algebra! Our NDArrayMath layer is analogous to NumPy, and we support a large subset of it (we support many of the linear algebra kernels, broadcasting, axis reduction, etc).
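As a concrete illustration of the broadcasting semantics mentioned here (this is plain JS showing the NumPy-style concept, not the deeplearn.js API itself): adding a length-n vector to an m x n matrix applies the vector to every row.

```javascript
// NumPy-style broadcasting sketch: the smaller operand (a vector) is
// conceptually stretched along the matrix's row dimension.
function broadcastAdd(matrix, vector) {
  return matrix.map((row) => row.map((v, j) => v + vector[j]));
}

const m = [[1, 2, 3], [4, 5, 6]];
const bias = [10, 20, 30];
const out = broadcastAdd(m, bias); // [[11, 22, 33], [14, 25, 36]]
```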


Author of deeplearn.js here. A quick summary:

We store NDArrays as floating point WebGLTextures (in rgba channels). Mathematical operations are defined as fragment shaders that operate on WebGLTextures and produce new WebGLTextures.

The fragment shaders we write operate in the context of a single output value of our result NDArray, which gets parallelized by the WebGL stack. This is how we get the performance that we do.
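A CPU-side sketch of the execution model just described (illustrative, not the library's actual code): a "fragment shader" is a pure function of the output coordinate, run once per output element, and on the GPU it's the loop below that gets parallelized.

```javascript
// Emulate the fragment-shader model: each output value is computed
// independently from its (row, col) coordinate; no invocation can see
// another's result, which is what makes the GPU parallelization safe.
function runKernel(shader, rows, cols) {
  const out = new Float32Array(rows * cols);
  for (let r = 0; r < rows; r++) {
    for (let c = 0; c < cols; c++) {
      out[r * cols + c] = shader(r, c); // each call is independent
    }
  }
  return out;
}

// Element-wise add of two 2x2 "textures" stored as flat arrays.
const a = Float32Array.from([1, 2, 3, 4]);
const b = Float32Array.from([10, 20, 30, 40]);
const sum = runKernel((r, c) => a[r * 2 + c] + b[r * 2 + c], 2, 2);
// sum -> [11, 22, 33, 44]
```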


Which... is pretty much how GPGPU started in the early 2000s. Sad/funny how we go through this cycle again.

It will be interesting to see if the industry will produce a standard for GPGPU in the browser, given that the desktop standard is less common than a proprietary one.


This is still done in pretty much every game engine I've worked with (for general computation used to support rendering as much as for the rendering itself). It's frankly extremely practical, and often better than dedicated GPGPU APIs because it more closely matches what the hardware is doing internally (GPU core warps, texel caches, vertex caches, etc.).


> It will be interesting to see if the industry will produce a standard for GPGPU in the browser.

They did: WebCL. Sadly, it had multiple security issues, so the browsers that had implemented it in their beta channels (just Chrome and Firefox, I believe) ended up removing it. And now, I think it's totally stalled and no one is planning on implementing it.

Also sadly, SIMD.js support is coming along extremely slowly.


WebGPU conversations are ongoing: https://en.wikipedia.org/wiki/WebGPU

WebAssembly is coming along quite nicely.

And SwiftShader is a quite nice fallback for blacklisted GPUs. They simulate WebGL on the CPU and take advantage of SIMD: https://github.com/google/swiftshader


Are there plans to offer the whole zoo?

http://www.asimovinstitute.org/neural-network-zoo/


As I understand it, deeplearn.js is more of a kitchen than a prepared meal. Part of the library is referred to as "numpy for the web", with classes to run linear algebra operations efficiently by leveraging the GPU. I don't see why you couldn't use those pieces to set up other networks. I think the name "deeplearn.js" is more about capitalizing on the branding momentum of "deep learning" than a claim that it supports only one kind of network. I'm in the middle of introductory machine learning classes, so I hope someone will correct me if I'm wrong.


You're right. Some history:

We wanted to do hardware-accelerated deep learning on the web, but we realized there was no NumPy equivalent. Our linear algebra layer has now matured to a place where we can start building a more functional automatic differentiation layer. We're going to completely remove the Graph in favor of a much simpler API by the end of January.

Once that happens, we'll continue to build higher level abstractions that folks are familiar with: layers, networks, etc.

We really started from nothing, but we're getting there :)


Thanks for the explanation! I've recently been working on my own deep learning library (for fun) and was doing something similar. Aren't GL textures sampled inexactly, with floating-point coordinates? Do you just rely on the floating-point error being small enough that you can reliably index weights?

I ended up switching to OpenCL since I am running this on my desktop. Just curious to see what you did. Thanks!


You can set nearest neighbor interpolation for the texture (aka no interpolation), and gl_FragCoord can be used to determine which pixel the fragment shader is operating on.
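A sketch of the indexing trick just described (simplified, ignoring the rgba channel packing mentioned earlier in the thread): with NEAREST filtering there is no interpolation, and since gl_FragCoord sits at pixel centers (x + 0.5, y + 0.5), flooring it recovers the exact integer texel being computed.

```javascript
// Map a fragment's pixel-center coordinate to a flat index into the
// NDArray's data, the way a shader would use gl_FragCoord.
function texelIndex(fragX, fragY, texWidth) {
  const col = Math.floor(fragX); // pixel centers are at x + 0.5
  const row = Math.floor(fragY);
  return row * texWidth + col;
}

// The fragment at gl_FragCoord = (2.5, 1.5) in a 4-texel-wide texture
// is computing element 6 of the flattened array.
const idx = texelIndex(2.5, 1.5, 4);
```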


Sick! That is a world-class hack


It's not really a hack, it's just using the GPU's parallel computing capabilities to compute things in parallel. This technique has been around for ages.


Languages, buddy, languages. Just as languages were a barrier to human cultures spreading their ideas, it's analogous in the computing world. JS is catching up with many concepts that were prevalent in other languages/environments, and thanks to JS they're becoming more accessible and popular with the commoners.


kinda, ya


What do you think?


It works on mobile, it's just slow. Every time we read and write from memory we have to pack and unpack 32 bit floats as 4 bytes without bit shifting operators >.>
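On the JS side, what that pack/unpack amounts to is the same bit reinterpretation an ArrayBuffer view does natively (the pain being described is doing it inside the shader, where GLSL ES lacks integer bit operators). A sketch:

```javascript
// One 32-bit float shares a buffer with its 4 constituent bytes; DataView
// reinterprets the same bits without any arithmetic.
const buf = new ArrayBuffer(4);
const view = new DataView(buf);

view.setFloat32(0, 3.14159, true); // little-endian
const bytes = [0, 1, 2, 3].map((i) => view.getUint8(i));

// Writing the same 4 bytes back recovers the identical float bit pattern.
const buf2 = new ArrayBuffer(4);
const view2 = new DataView(buf2);
bytes.forEach((b, i) => view2.setUint8(i, b));
const roundTripped = view2.getFloat32(0, true);
```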


Isn't that what ArrayBuffers can do for you at nearly the same amortized speed as C unions?


deeplearn.js author here...

We do not send any webcam / audio data back to a server, all of the computation is totally client side. The storage API requests are just downloading weights of a pretrained model.

We're thinking about releasing a blog post explaining the technical details of this project, would people be interested?


Yes please! :)

And some quick questions:

What network topology do you use, and on what model is it based (e.g. "inception")?

What kind of data have you used to pretrain the model?


We're using SqueezeNet (https://github.com/DeepScale/SqueezeNet), which is similar to Inception (trained on the same ImageNet dataset) but is much smaller - 5MB instead of inception's 100MB - and inference is much much quicker.

The application takes webcam frames and infers through SqueezeNet, producing a 1000D logits vector for each frame. These can be thought of as unnormalized probabilities for each of ImageNet's 1000 classes.

During the collection phase, we collect these vectors for each class in browser memory, and during inference we pass the frame through SqueezeNet and do k-nearest neighbors to find the class with the most similar logits vector. KNN is quick because we vectorize it as one large matrix multiplication.
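A sketch of the KNN-as-matrix-multiply idea (illustrative, not the demo's actual code): with L2-normalized logits vectors, one pass of dot products against every stored example yields cosine similarities, and the nearest neighbor is just the argmax. Each dot product below is one row of the single large matrix multiplication.

```javascript
// Nearest-neighbor classification over stored logits vectors via dot
// products on normalized vectors (cosine similarity).
function normalize(v) {
  const n = Math.hypot(...v);
  return v.map((x) => x / n);
}

function nearestClass(stored, labels, query) {
  const q = normalize(query);
  let best = -Infinity;
  let bestLabel = null;
  for (let i = 0; i < stored.length; i++) {
    // One row of the big matrix multiply: dot(stored[i], q).
    const sim = normalize(stored[i]).reduce((acc, x, j) => acc + x * q[j], 0);
    if (sim > best) {
      best = sim;
      bestLabel = labels[i];
    }
  }
  return bestLabel;
}

const stored = [[1, 0, 0], [0, 1, 0], [0.9, 0.1, 0]];
const labels = ['cat', 'dog', 'cat'];
const label = nearestClass(stored, labels, [0.8, 0.2, 0]); // 'cat'
```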

I'll go deeper in a blog post soon :)


So you're doing nearest neighbour search on the image features from the CNN. This is alluded to in Figure 4 of the DeCAF paper: https://twitter.com/eggie5/status/907120374575505408


AlexNet paper, not DeCAF paper!


Interesting!

I'm curious why you've used a different classification algorithm on top of a neural network. I would expect that a neural network on top of a pretrained network could give similar results, with the benefit of simpler code. Is performance the reason?

Anyway, I'm looking forward to your blog post.


Training a neural network on top would require a "proper" training phase, and finding the right hyperparameters that work everywhere turned out to be tricky. Actually, this is what we did originally, in the blog post we'll try to show demos of each of the approaches and explain why they don't work.

KNN also makes training "instant", and the code much much simpler.


That makes sense.

By the way, I think your software could become very popular on the Raspberry Pi, because it would be very cheap and fun to use it for all sorts of applications (e.g. home automation).




There's something fantastically entertaining about this. It's stupidly simple (from the outside) but interacting with the computer in such a different way is weirdly fun.

It's like when you turn on a camera and people can see themselves on a TV. A lot of people can't help but make faces at it.


Why does it not work in Edge? Please keep the web open, do not make stuff that does not work in a modern browser. Also always give an option to try it anyway.


A blog post on the technical details would be great, please. Thanks in advance, since I know it'll take a bit of your time to write.


To answer the question at last, yes, I am interested.


yes

