So I just googled "differentiable Monte Carlo" and it looks like that concept has so far only been applied to ray tracing, but I would be shocked if that (or similar) doesn't become a thing in the next year or three, such that AlphaGo et al become end-to-end differentiable.
There's more to Monte Carlo Tree Search than the Monte Carlo part though. It's a tree search algorithm using Monte Carlo methods to direct the search, so it would still not be a complete NN engine.
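To make that concrete, here's a minimal UCT-style sketch in Python (the Game interface with legal_moves/play/is_terminal/result is made up for illustration, not any engine's actual API). The Monte Carlo part is only the random playout in step 3; everything else is ordinary tree search directed by those playout statistics:

    import math, random

    class Node:
        def __init__(self, state, move=None, parent=None):
            self.state, self.move, self.parent = state, move, parent
            self.children = []
            self.untried = state.legal_moves()
            self.visits, self.wins = 0, 0.0

        def uct_child(self, c=1.4):
            # Exploitation (win rate) plus exploration (visit counts) directs the search.
            return max(self.children,
                       key=lambda ch: ch.wins / ch.visits
                                      + c * math.sqrt(math.log(self.visits) / ch.visits))

    def mcts(root_state, iterations=1000):
        root = Node(root_state)
        for _ in range(iterations):
            node = root
            # 1. Selection: walk down the tree via UCT until we hit untried moves.
            while not node.untried and node.children:
                node = node.uct_child()
            # 2. Expansion: add one new child node.
            if node.untried:
                move = node.untried.pop()
                child = Node(node.state.play(move), move=move, parent=node)
                node.children.append(child)
                node = child
            # 3. Simulation: the Monte Carlo part -- a random playout to the end.
            state = node.state
            while not state.is_terminal():
                state = state.play(random.choice(state.legal_moves()))
            outcome = state.result()  # e.g. 1.0 win, 0.5 draw, 0.0 loss
            # 4. Backpropagation: update statistics up the tree.
            # (A real implementation flips the outcome per player; omitted here.)
            while node is not None:
                node.visits += 1
                node.wins += outcome
                node = node.parent
        return max(root.children, key=lambda ch: ch.visits).move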
Also, there's no reason to expect a "naked" neural net to ever hold up against current chess engines. Engines are already far beyond the strongest humans, who can be seen as naked NN players. Except the NN is far beyond anything we could even dream of building today, both in number of parameters and the complexity of a single neuron, not to mention our ability to learn.
NNs are interesting in areas where human performance is not yet attained. Chess is not one of those. Heuristic algorithms are far stronger than humans, and so it seems strange to me to suggest making engines stronger by making them more and more like humans, just with far less compute, and weaker learning methods. The notion seems inherently ill-conceived to me.
> stronger by making them more and more like humans, just with far less compute, and weaker learning methods. The notion seems inherently ill-conceived to me.
The human brain, while it may have more raw compute power than a TPU pod, also has many bottlenecks. Per current research it runs at a much slower frequency than 2GHz TPUs, and its inputs are very bottlenecked; you can't feed a brain 1TB of text in one day the way you can a neural network.
You're right of course. By compute I'm mainly referring to the number and complexity of neurons, not frequency.
You put your finger on something though, because the bottlenecks of brains are probably the reason why traditional minimax searchers are so good at chess: working memory. Humans just don't have the working memory to search to the depths computers can. Frequency is interesting, but chess is inherently "working memory-bound". If you gave Stockfish a normal time control and a grandmaster two days per move, Stockfish would still win. Past about 30 minutes of thought on a position, there's not much more progress a human can make. But if you also gave the GM a second board they were allowed to make moves on, and a notebook, I bet the GM would win, because at that point you're only comparing eval functions, and a GM would still be miles ahead. I wonder if someone's done this experiment.
Being differentiable does not make something a neural network, nor does it mean the differentiated function is meaningful (as can be the case with highly branching or otherwise complex control flow). It also doesn't always escape combinatorial explosion, and a fixed structure with limited precision strictly bounds how far a neural net could look ahead.
By making something differentiable, you're hoping the relaxation or smooth proxy doesn't break the fundamental structure, allowing you to solve an easier problem than discrete combinatorial search. Sometimes there is barely any leverageable structure. This has proved very difficult to achieve in general and is a sort of holy grail in some fields.
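As a toy illustration of what "smooth proxy" means here (the scores and temperatures below are made-up numbers): replace a hard, non-differentiable argmax over candidate moves with a softmax, and hope the soft version still reflects the discrete structure you actually care about.

    import numpy as np

    def hard_choice(scores):
        # Discrete choice: gradient w.r.t. the scores is zero almost everywhere.
        return np.eye(len(scores))[np.argmax(scores)]

    def soft_choice(scores, temperature=1.0):
        # Smooth proxy: differentiable in the scores.
        z = np.exp((scores - scores.max()) / temperature)
        return z / z.sum()

    scores = np.array([0.2, 1.5, 1.4])   # hypothetical move evaluations
    print(hard_choice(scores))           # [0. 1. 0.]
    print(soft_choice(scores, 1.0))      # roughly [0.13, 0.46, 0.42]
    print(soft_choice(scores, 0.01))     # close to the hard one-hot choice, but the
                                         # gradients become badly behaved -- the
                                         # relaxation can break the structure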
That said, there are powerful game-playing AIs that use either no (DeepNash) or minimal (DORA) search. But in general you can't get something for free; you're always paying something with an approximation.
Whether it's differentiable could depend on the downstream task. Ray tracing of convex textures is likely differentiable; a win/loss function for a given chess move maybe not.
Completely agree! Plus, less trivially, there can be a bunch of different link weight settings (for an assumed distribution of inputs) that result in nearly-symmetric behaviors, and then that is multiplied by the permutation results you have just mentioned! So, it's complicated...
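A tiny sketch of that permutation symmetry, if it helps (sizes and weights are arbitrary): reorder the hidden units of a small MLP, permute the weights consistently, and the input-output behavior is identical even though the weight setting is different.

    import numpy as np

    rng = np.random.default_rng(0)
    W1 = rng.normal(size=(4, 3))   # input -> hidden
    W2 = rng.normal(size=(1, 4))   # hidden -> output

    def forward(x, W1, W2):
        return W2 @ np.tanh(W1 @ x)

    perm = [2, 0, 3, 1]            # any reordering of the 4 hidden units
    W1_p, W2_p = W1[perm, :], W2[:, perm]

    x = rng.normal(size=3)
    print(np.allclose(forward(x, W1, W2), forward(x, W1_p, W2_p)))  # True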
These are related to recurrent neural networks evolved to maximize fitness whilst wandering through a randomly generated maze and picking up food pellets (the advantage being to remember not to revisit where you have already been.)
Back in the early 2000's (when I first started getting into this stuff), Schmidhuber's ideas were absolutely off-the-charts revolutionary compared to anything any other lab was doing. The level of creativity coming out of IDSIA was insane. I truly believe he has not been given even slightly the level of recognition he deserves. Many of the original founders of DeepMind are his direct students. E.g. Alex Graves, who is basically the pioneer of "attention" in neural networks (and we all know what architecture that led to...)
If I remember correctly, a big advantage of particle filters over Kalman filters is that the particle filters can model complex, multi-modal distributions, whereas the Kalman filter just updates the mean and covariance of a single high-dimensional Gaussian (albeit optimally). This is nice for localization problems because it allows you to keep track of multiple competing hypotheses about your current location.
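A minimal sketch of why that matters, assuming a made-up 1-D corridor with two identical doors (all numbers invented): after an ambiguous "I see a door" measurement, the particles cluster around both doors, i.e. two competing hypotheses, which a single Gaussian can't represent.

    import numpy as np

    rng = np.random.default_rng(1)
    doors = np.array([2.0, 8.0])                  # two indistinguishable landmarks

    particles = rng.uniform(0, 10, size=5000)     # uniform prior over the corridor
    weights = np.ones(len(particles)) / len(particles)

    def door_likelihood(positions, sigma=0.5):
        # Likelihood of observing "I'm at a door" from each position.
        d = np.min(np.abs(positions[:, None] - doors[None, :]), axis=1)
        return np.exp(-0.5 * (d / sigma) ** 2)

    # Measurement update + resampling.
    weights *= door_likelihood(particles)
    weights /= weights.sum()
    particles = rng.choice(particles, size=len(particles), p=weights)

    # Histogram shows two modes, one around each door.
    print(np.histogram(particles, bins=10, range=(0, 10))[0])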
Another fun fact about high-dimensional spaces: Randomly pick k points in an n-dimensional space. Now, find their average location. As the number of dimensions increases, it becomes progressively more likely that every point will be closer to the average than it is to any other point.
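Quick numerical sanity check of that claim (k and the trial count are arbitrary): draw k Gaussian points in n dimensions and see how often every point is closer to the centroid than to any other point.

    import numpy as np

    rng = np.random.default_rng(0)

    def prob_all_closer_to_mean(n_dims, k=10, trials=1000):
        hits = 0
        for _ in range(trials):
            pts = rng.standard_normal((k, n_dims))
            d_mean = np.linalg.norm(pts - pts.mean(axis=0), axis=1)
            d_pair = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
            np.fill_diagonal(d_pair, np.inf)
            hits += np.all(d_mean < d_pair.min(axis=1))
        return hits / trials

    for n in (2, 10, 100, 1000):
        print(n, prob_all_closer_to_mean(n))   # probability climbs toward 1 as n grows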
One way to circumvent the need for 5000 teraflops mentioned in the article is to exploit the fact that our eye is only capable of seeing a relatively small area in great detail at any given instant. Per viewer, we only need to finely render a tiny bit of the scene. This of course precludes the same level of realism on a shared screen, but I suspect that viewer-specific devices (e.g. http://www.scientificamerican.com/article.cfm?id=virtual-rea... ) will become the norm as we move further towards virtual reality.
I doubt that would work. A system like that would have to be able to predict where saccades land. Otherwise you would, for a short time after every saccade, be looking at a part of the screen that is low in detail. Short saccades take about 20ms. So, if the graphics pipeline has higher latency than that, it would mean we would have to be able to predict saccades.
I am not familiar with the state of the art here, but I do not think that is possible.
Just 60 FPS = ~17ms per frame. There's lag with the display response, graphics pipeline, and detecting eye movements, but it doesn't seem unreasonable for future tech.
Saccades to an unexpected stimulus normally take about 200 milliseconds (ms) to initiate, and then last from about 20–200 ms, depending on their amplitude (20–30 ms is typical in language reading).
Saccades of 20ms in duration are ones that are very near to the current center of focus (e.g. moving to the next chunk of letters while reading the words of this sentence). This just means that detailed rendering needs to extend to a slightly larger radius, but this is still significantly cheaper to render than the entire field of view. For larger jumps there is ~200ms during which the computer can attempt to predict the final destination of the saccade, and thus begin to do some preemptive computations. Once the saccade lands at the new location, assuming a rendering speed of 100fps, there would be at most 10ms before the high-res version kicked in, but again, with some degree of preemptive/predictive computation, perhaps a slightly better version could be available immediately.
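Rough latency budget using the numbers from this thread (all back-of-envelope assumptions, not measurements):

    # Back-of-envelope foveated-rendering budget.
    frame_time_ms    = 1000 / 100   # ~10 ms per frame at the assumed 100 fps
    short_saccade_ms = 20           # small jump, e.g. while reading
    saccade_init_ms  = 200          # initiation time for a saccade to a new stimulus

    # Short saccades land near the current fovea, so a slightly larger
    # high-detail radius covers them with no added latency. Long saccades give
    # the renderer roughly the initiation time to predict the landing point;
    # after landing, the worst case is about one frame:
    print(f"frame time: {frame_time_ms:.0f} ms, worst-case wait after landing: {frame_time_ms:.0f} ms")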