Hacker News | desku's comments

Is there actually a new PyTorch version? I can't seem to find anything about it online.


What niche libraries do you think PyTorch is lacking? Do you have some examples of ones that exist in Tensorflow with no PyTorch equivalent?


By that I meant model libraries/implementations, e.g. Stardist[0]. Obviously there will be some research niches that are mostly implemented in TF rather than PyTorch, or the other way around.

[0]: https://github.com/mpicbg-csbd/stardist


I very much enjoyed, and would recommend, White Noise and Mao II.

As for adjacent authors, I think I'd probably list: David Foster Wallace, Pynchon, Franzen and Roth.


The link to the .pdf seems to be broken though.


GP's link is to an on-line HTML version of the book. It works fine as far as I can see.


Yeah, after going to press they seem to have taken down the links to the PDF across the internet. Mildly irritating. Does anyone have the PDF downloaded from before it went to press?


I'm not sure how to interpret the argument of the article or the results in the appendix here.

The first table shows AdamW having the best results, which follows the argument of the article. However, the following three tables all have plain Adam producing the best results.

The way the article is written, it seems to be championing AdamW, but the results seem to conclude only that AMSGrad is bad and that plain Adam is best, with AdamW giving a negligible performance increase over Adam in a single task.


From the article:

So, weight decay is always better than L2 regularization with Adam then? We haven’t found a situation where it’s significantly worse, but for either a transfer-learning problem (e.g. fine-tuning Resnet50 on Stanford cars) or RNNs, it didn’t give better results.
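The distinction the quote turns on can be sketched in a few lines. This is an illustrative toy (a single scalar weight, not the article's code or PyTorch's implementation): with "Adam + L2" the penalty is folded into the gradient and gets rescaled by Adam's adaptive denominator, while AdamW-style decoupled decay shrinks the weight directly, bypassing that rescaling.

```python
import math

def adam_step(w, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8,
              l2=0.0, decoupled_wd=0.0):
    """One Adam update on a scalar weight.

    l2:           L2 penalty folded into the gradient (classic "Adam + L2")
    decoupled_wd: decay applied directly to the weight (AdamW-style)
    """
    g = grad + l2 * w                      # L2 term enters the moment estimates
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    w = w - lr * decoupled_wd * w          # decay bypasses the adaptive scaling
    return w, m, v

# Same nominal penalty strength, two different effective updates: with L2,
# weights with a large gradient history are decayed *less* (the penalty is
# divided by sqrt(v_hat)); decoupled decay shrinks every weight uniformly.
w_l2, _, _ = adam_step(1.0, 0.5, 0.0, 0.0, t=1, l2=0.01)
w_wd, _, _ = adam_step(1.0, 0.5, 0.0, 0.0, t=1, decoupled_wd=0.01)
```

In PyTorch terms this is roughly the difference between the `weight_decay` argument of `torch.optim.Adam` (L2-style) and of `torch.optim.AdamW` (decoupled).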


I actually found this to be one of the best explanations on this topic I've read. Fully recommend the author's book too.

I'd also recommend this as a follow-up/alternative: https://arxiv.org/abs/1703.01619


Are you sure you linked to the right article? The linked article is about neural machine translation using sequence-to-sequence models, while TFA is about neural models for all kinds of language processing.


It's probably domain/industry/company dependent, but the vast majority (>90%) of NLP work I do nowadays is sequence-to-sequence models.


gameaibook.org


It's bloat due to 'introns' (useless statements that don't affect the output, like x = x * 1). And yes, just adding a fitness penalty on program length isn't optimal. I've found it easier to evolve successful programs (letting the bloat happen) and then keep removing statements from correctly generated programs whilst checking that the output stays the same. Probably not optimal either, but I feel like it gives better results.
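The pruning pass described above can be sketched as a greedy backward elimination. This is only an illustrative toy, not the commenter's actual code: programs are modeled as lists of Python statements run with `exec`, whereas a real GP system would prune nodes of an expression tree.

```python
def prune_introns(program, run, target_output):
    """Greedily drop statements while the program's output is unchanged.

    program:       list of statements (here, strings exec'd in order)
    run:           callable mapping a statement list to its output
    target_output: output of the full program, which must be preserved
    """
    pruned = list(program)
    i = 0
    while i < len(pruned):
        candidate = pruned[:i] + pruned[i + 1:]
        if run(candidate) == target_output:
            pruned = candidate          # statement was an intron; drop it
        else:
            i += 1                      # statement matters; keep and move on
    return pruned

def run(statements):
    env = {"x": 3}
    for s in statements:
        exec(s, {}, env)
    return env["x"]

prog = ["x = x * 1", "x = x + 2", "x = x * 1", "x = x * 2"]
minimal = prune_introns(prog, run, run(prog))
# the two "x = x * 1" introns are removed, leaving ["x = x + 2", "x = x * 2"]
```

Greedy single-statement removal won't catch introns that only cancel out in groups, which may be part of why this is "probably not optimal either".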


I guess a lot of standard compiler optimisation techniques could be used here -- if you care.

I don't know what that would do to the learning process -- but at least it would be useful for end results.


I don't think it'll ever be finished.


Yep, Andrej leads Tesla Autopilot now. Doubt he'll be following up here.


yep :(


I guess ancestor commenters need only wait for a weekend now :)


I get this too. Haven't been able to get references to work for a single paper yet.

