Hacker News | aab0's comments

This is not necessarily that surprising. Hinton's 'dark knowledge' (not cited in the paper) already showed that a remarkable amount of information is hidden in the classification probabilities emitted by a model, and that one neural net can learn a lot from and reverse-engineer another neural net given just its precise predictions.
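To make the "dark knowledge" point concrete, here is a minimal sketch of temperature-scaled softmax over some made-up teacher logits: at high temperature, the near-zero probabilities on wrong classes spread out and reveal the similarity structure a student network can learn from. The logits and class names are hypothetical.

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax: higher T spreads probability mass,
    # exposing the "dark knowledge" in the small wrong-class probabilities.
    z = np.asarray(z, dtype=float) / T
    z = z - z.max()           # numerical stability
    e = np.exp(z)
    return e / e.sum()

# Hypothetical teacher logits for classes [cat, dog, truck]
teacher_logits = [8.0, 5.0, -3.0]

hard = softmax(teacher_logits, T=1.0)   # near one-hot: little beyond the label
soft = softmax(teacher_logits, T=4.0)   # soft targets: "dog" is visibly cat-like

print(hard.round(4))
print(soft.round(4))
```

Training a student against the soft targets (plus the true labels) is the standard distillation recipe; the student recovers far more than the argmax label alone would convey.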


I quipped the other day after reading it, 'In retrospect, training the first RL agents on Doom may not have been the best idea.'


All the anecdotes I've seen of being addicted to nicotine gum or patches have been of former smokers who were using them to try to quit. That's not necessarily nicotine being addictive so much as their pre-existing tobacco addiction latching onto a replacement.


Probably. You can add in 'speaker' as a bit of metadata to the samples (this is what is meant by 'conditioning on') and teach it to speak like different people, so if you have a diverse sample of speakers and you add in 'accent' as another variable, it might well learn to disentangle individual speakers from their accents and then you can control generated accents by changing the metadata.
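As a rough sketch of what "conditioning on" means mechanically (all names and sizes here are made up): each speaker and accent gets a learned vector, and those vectors are concatenated onto every frame of input so the network can use them, and so you can swap them at generation time.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical lookup tables: one learned vector per speaker and per accent.
n_speakers, n_accents, dim = 10, 4, 8
speaker_emb = rng.normal(size=(n_speakers, dim))
accent_emb = rng.normal(size=(n_accents, dim))

def condition(audio_features, speaker_id, accent_id):
    # "Conditioning" = appending the metadata vectors to every frame,
    # so the model sees (features, who is speaking, which accent).
    n_frames = audio_features.shape[0]
    spk = np.tile(speaker_emb[speaker_id], (n_frames, 1))
    acc = np.tile(accent_emb[accent_id], (n_frames, 1))
    return np.concatenate([audio_features, spk, acc], axis=1)

frames = rng.normal(size=(100, 20))   # 100 frames of hypothetical features
x = condition(frames, speaker_id=3, accent_id=1)
print(x.shape)
```

At generation time you keep `speaker_id` fixed and vary `accent_id` (or vice versa); whether the model has actually disentangled the two depends on how diverse the training speakers were.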


"The researchers found that attempting to move at a superhuman pace (eg one action every frame), resulted in a subpar performance."

Moving at extremely fine-grained timesteps can make learning much more difficult, because now a reward arrives millions of timesteps delayed rather than hundreds or thousands. It's like trying to teach a NN to compose piano music by starting down at the 1ms raw audio level. This is part of why audio synthesis was so difficult up until recently with DeepMind's WaveNet. In theory, being able to move every frame should enable extremely superhuman performance, but in practice, you can't learn your way there. So often people will chunk data to make it easier to learn the higher-level concepts: operate on words, rather than characters, for example.
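The standard trick for this in RL is an action-repeat (frame-skip) wrapper: each agent decision is held for k frames, so a reward that arrives 1000 frames later is only 100 decisions away. A minimal sketch, with a toy environment invented for illustration:

```python
class ActionRepeat:
    """Hold each agent decision for k frames, shrinking the delay
    (measured in decisions) between an action and its eventual reward."""
    def __init__(self, env, k):
        self.env, self.k = env, k

    def step(self, action):
        total_reward, done = 0.0, False
        for _ in range(self.k):
            obs, r, done = self.env.step(action)
            total_reward += r
            if done:
                break
        return obs, total_reward, done

class ToyEnv:
    # Hypothetical environment: the only reward arrives on frame 1000.
    def __init__(self):
        self.t = 0
    def step(self, action):
        self.t += 1
        done = self.t >= 1000
        return self.t, (1.0 if done else 0.0), done

env = ActionRepeat(ToyEnv(), k=10)
decisions, done = 0, False
while not done:
    obs, reward, done = env.step(0)
    decisions += 1
print(decisions)  # 100 decisions instead of 1000 frames
```

The cost is the one noted above: the agent gives up fine-grained control, so in principle it also gives up some superhuman-speed play.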


Why not go the other way and decrease the actions per minute so the agent learns the overall point of the game, then increase the actions per minute with each game?


Or maybe extend the traditional categories of macro and micro with another one, call it 'nano'... the micro agent indicates where each unit ought to be in 9 frames, and the nano agent figures out how to take them there. Since the timescale is so short, the agent could brute-force enumerate possible moves to some extent and figure out which is optimal, like chess AI. Or use a separate network.
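For the "nano" level, the brute-force idea is tractable because the horizon is so short: with 3 moves per frame and a 9-frame window there are only 3^9 = 19,683 sequences to enumerate. A toy 1-D sketch (the positions, target, and cost function are all invented for illustration):

```python
from itertools import product

def plan(pos, target, horizon=9):
    """Hypothetical 'nano' planner: exhaustively enumerate all move
    sequences over a short horizon and keep the one ending closest
    to the target the 'micro' agent asked for."""
    best = None
    for seq in product((-1, 0, +1), repeat=horizon):   # 3^horizon sequences
        p = pos
        for a in seq:
            p += a
        cost = abs(p - target)
        if best is None or cost < best[0]:
            best = (cost, seq)
    return best[1]

seq = plan(pos=0, target=5, horizon=9)
print(sum(seq))  # net displacement of 5
```

Real unit movement is 2-D with collisions, so in practice you would prune or learn this search rather than enumerate it, but the short horizon is what makes chess-style exhaustive search even thinkable here.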

I guess that's inelegant when a deep network already has its own concept of fine-grained versus coarse-grained layers, and should be able to do this on its own with the right training method.


That sounds like an interesting research angle. The thing about AI research is there are so many open ends there are essentially unlimited research options. If you can pose it as a problem and identify a reasonable programming approach then you have an avenue for AI research. Deep Learning isn't the end of AI research. It is the beginning.


They're already sold as pets. Reportedly they're pretty good pets if you don't mind the price tag.


I did see that part. However it's still being an 'early adopter'.

10 or 20 years later, the risk you take has been dampened, because people have had fox pets for complete life-cycles, which means more of the caveats of ownership are known. And the breeding operation has had even more time to breed out undesirable behavior.

A win-win of sorts as a consumer. Of course with the caveat being that the place could go out of business because of a lack of buyers.

You did mention that they are reportedly pretty good pets; do you have a source on that? I would enjoy reading more on the subject.


Youtube videos are a source.

One issue I have with them is how high pitched and loud they can be, despite being nice otherwise.


Just follow the WP links.

"Researchers, interested organizations or individuals are welcome to use our scientific data library for legitimate research endeavours. This data is available free of charge, however we do ask that a memorandum of understanding is executed for access privileges to our data. Please send us a message using the Contact Us form and state the nature of your request."

http://www.haidasalmonrestoration.com/index.php/science/scie...


Whoops!!! I think I must've opened the page in a frenzy of tab-opening, but then either not returned to it, or closed it immediately without realizing what it was - that link has been visited, but I clearly didn't check it for the desired content. Thank you! :)


For a 4x boost, I imagine Tensorflow and Torch will get support shortly after the GPUs start shipping in real quantities.


It's a 2x boost for training since you can't use int8 for training and need fp16.

INT8 is a 4x for inference, but most people aren't using GPUs for inference atm.
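The reason int8 works for inference but not training is dynamic range: gradients need it, while a trained layer's weights and activations can be rescaled into [-127, 127] with modest error. A rough sketch of post-training symmetric int8 quantization (layer sizes and data are made up):

```python
import numpy as np

rng = np.random.default_rng(1)
w = rng.normal(size=256).astype(np.float32)   # hypothetical layer weights
x = rng.normal(size=256).astype(np.float32)   # hypothetical activations

def quantize(v):
    # Symmetric per-tensor quantization: one float scale, int8 payload.
    scale = np.abs(v).max() / 127.0
    q = np.clip(np.round(v / scale), -127, 127).astype(np.int8)
    return q, scale

wq, ws = quantize(w)
xq, xs = quantize(x)

# Inference-time dot product: int8 inputs, int32 accumulation,
# one float rescale at the end. This is where the 4x throughput lives.
approx = int(np.dot(wq.astype(np.int32), xq.astype(np.int32))) * ws * xs
exact = float(np.dot(w, x))
print(approx, exact)
```

The int8 multiply-accumulates are what the hardware runs 4x as fast; the float work is reduced to two scales per tensor.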



Fair enough; when I said most people I meant companies who are not AmaGooFaceSoft and their international equivalents. Though I can't quite tell if you're doing GPU batch predictions and storing them or doing them in realtime with Spark Streaming.

Unrelated question though: any chance you will do blog post/paper about how DSSTNE does automatic model parallelism and gets good sparse performance compared to cuSparse/etc?


The argument is also bogus from the pro-Russian perspective: if this is 'legalized cheating' and so awful and hypocritical, why weren't the Russians doing it, much less resorting to stealing urine samples? They have doctors who can write certificates too.


It's not a true power-law (assuming it's not a lognormal). It's truncated by the fact that anyone considering a startup would never have earned more than the world GDP, or more specifically, the peak tech market cap to date (Apple's $775b). Once it's truncated at a finite value, the moments become meaningful.
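A quick numerical illustration of why truncation matters: for a Pareto tail with exponent α = 1 the mean integral diverges, but capping the variable at any finite c gives E[min(X, c)] = ln(c) + 1, which is finite. The cap below is an arbitrary small number chosen so the simulation converges quickly; the argument is the same with a $775b cap.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pareto tail, exponent alpha = 1, x_min = 1: inverse-CDF sampling.
# The untruncated mean diverges (the integral of x * x^-2 is log-divergent).
alpha = 1.0
x = 1.0 / rng.random(10_000_000) ** (1.0 / alpha)

# Truncate at a cap: now E[min(X, c)] = ln(c) + 1, finite for any finite c.
cap = 1_000.0
truncated_mean = np.minimum(x, cap).mean()
print(truncated_mean)   # close to ln(1000) + 1 ≈ 7.91
```

Without the cap, the sample mean just keeps drifting upward as the sample size grows; with it, the mean (and every higher moment) settles down.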


It is probably something similar to a negative binomial distribution. (Polya distribution)

You play until your VC decides not to fund you or you actually win. This can be done over either money or time.

This only works on startups as a group; for any specific one you would use a more complex method, as trials are not independent. A discrete phase-type distribution, derived from a Markov chain.
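A toy simulation of the "play until your VC stops funding you or you win" model (the per-round probabilities are made up for illustration, not estimates):

```python
import random

random.seed(0)

# Each round: the startup wins with p_win; if it doesn't win,
# the VC funds another round with p_fund, otherwise the game ends.
p_win, p_fund = 0.05, 0.7

def one_startup():
    rounds = 0
    while True:
        rounds += 1
        if random.random() < p_win:
            return rounds, True       # exit / win
        if random.random() >= p_fund:
            return rounds, False      # funding dries up

outcomes = [one_startup() for _ in range(100_000)]
win_rate = sum(1 for _, won in outcomes if won) / len(outcomes)
print(win_rate)
```

With constant probabilities the round count is geometric; letting the probabilities depend on the startup's current state is exactly the Markov-chain generalization that gives a phase-type distribution.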

