By that I meant model libraries/implementations, e.g. Stardist [0]. Obviously there will be some research niches that are mostly implemented in TF over PyTorch or the other way around.
Yeah, after going to press they seem to have taken down the links to the PDF across the internets. Mildly irritating. Does anyone have the PDF downloaded from before it went to press?
I'm not sure how to interpret the argument of the article or the results in the appendix here.
The first table shows AdamW having the best results, which follows the argument of the article. However, the following three tables all show plain Adam producing the best results.
The way the article is written, it seems to be championing AdamW, but the results just seem to conclude that AMSGrad is bad and Adam is the best, with AdamW having a negligible performance increase over Adam in a single task.
So, weight decay is always better than L2 regularization with Adam then? We haven’t found a situation where it’s significantly worse, but for either a transfer-learning problem (e.g. fine-tuning Resnet50 on Stanford cars) or RNNs, it didn’t give better results.
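For anyone unsure what "decoupled" means here, a minimal sketch of one Adam step, contrasting the two approaches. This is illustrative only (not the article's code, and the function name and defaults are my own): L2 regularization adds the penalty to the gradient, so it gets rescaled by Adam's adaptive denominator, while AdamW-style weight decay shrinks the weight directly, outside the adaptive update.

```python
def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-8, l2=0.0, decoupled_wd=0.0):
    """One Adam step on a scalar weight, supporting both penalty styles."""
    # L2 regularization: penalty is folded into the gradient, so it is
    # divided by sqrt(v_hat) like everything else.
    g = grad + l2 * w
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    m_hat = m / (1 - beta1 ** t)   # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (v_hat ** 0.5 + eps)
    # Decoupled weight decay (AdamW): shrink the weight directly,
    # independent of the adaptive scaling above.
    w = w - lr * decoupled_wd * w
    return w, m, v
```

With the same nominal coefficient the two variants produce different updates, because the L2 term is normalized by the per-parameter second-moment estimate and the decoupled term is not.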
Are you sure you linked to the right article? The linked article is about neural translation using seq-to-seq, while TFA is about neural models for all kinds of language processing.
It's bloat due to 'introns' (useless statements that don't affect the output, like x = x * 1). And yes, just adding a fitness function to shorten program length isn't optimal. I've found it easier to evolve successful programs (letting the bloat happen) and then keep removing statements from correctly generated programs whilst checking that the output stays the same. Probably not optimal either, but I feel like it gives better results.
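The prune-and-verify idea above can be sketched as a greedy loop: try dropping each statement from the evolved program, and keep the deletion only if the output on the test inputs is unchanged. A toy version, assuming the evolved program is a linear list of Python statements over a single variable x (the representation is hypothetical, not tied to any particular GP library):

```python
def run(statements, x):
    """Execute a linear program (list of Python statements) on input x."""
    env = {"x": x}
    for stmt in statements:
        exec(stmt, {}, env)
    return env["x"]

def prune(statements, inputs):
    """Greedily remove statements whose deletion leaves the output unchanged."""
    reference = [run(statements, x) for x in inputs]
    i = 0
    while i < len(statements):
        candidate = statements[:i] + statements[i + 1:]
        try:
            if [run(candidate, x) for x in inputs] == reference:
                statements = candidate  # intron removed; retry same index
                continue
        except Exception:
            pass  # removal broke the program; keep the statement
        i += 1
    return statements

program = ["x = x * 1",   # intron: no effect
           "x = x + 2",
           "x = x * x",
           "x = x + 0"]   # intron
print(prune(program, inputs=[0, 1, 5]))  # → ['x = x + 2', 'x = x * x']
```

Note this only guarantees equivalence on the tested inputs, which matches the "checking if the output is the same" approach: a statement that only matters on untested inputs would be pruned too.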