By that I meant model libraries/implementations, e.g. Stardist [0]. Obviously there will be some research niches that are mostly implemented in TF over PyTorch or the other way around.
Yeah, after going to press they seem to have taken down the links to the PDF across the internets. Mildly irritating. Does anyone have the PDF downloaded from before it went to press?
I'm not sure how to interpret the argument of the article or the results in the appendix here.
The first table shows AdamW having the best results, which follows the argument of the article. However, the following three tables all show plain Adam producing the best results.
The way the article is written, it seems to be championing AdamW, but the results just seem to conclude that AMSGrad is bad and Adam is the best, with AdamW having a negligible performance increase over Adam in a single task.
So, weight decay is always better than L2 regularization with Adam then? We haven’t found a situation where it’s significantly worse, but for either a transfer-learning problem (e.g. fine-tuning Resnet50 on Stanford cars) or RNNs, it didn’t give better results.
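For anyone unsure what "decoupled" means here, a minimal sketch of one Adam step, contrasting the two approaches. This is illustrative only (not the article's code, and the function name and defaults are my own): L2 regularization adds the penalty to the gradient, so it gets rescaled by Adam's adaptive denominator, while AdamW-style weight decay shrinks the weight directly, outside the adaptive update.

```python
def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-8, l2=0.0, decoupled_wd=0.0):
    """One Adam step on a scalar weight, supporting both penalty styles."""
    # L2 regularization: penalty is folded into the gradient, so it is
    # divided by sqrt(v_hat) like everything else.
    g = grad + l2 * w
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    m_hat = m / (1 - beta1 ** t)   # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (v_hat ** 0.5 + eps)
    # Decoupled weight decay (AdamW): shrink the weight directly,
    # independent of the adaptive scaling above.
    w = w - lr * decoupled_wd * w
    return w, m, v
```

With the same nominal coefficient the two variants produce different updates, because the L2 term is normalized by the per-parameter second-moment estimate and the decoupled term is not.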
Are you sure you linked to the right article? The linked article is about neural translation using seq-to-seq, while TFA is about neural models for all kinds of language processing.
It's bloat due to 'introns' (useless statements that don't affect the output, like x = x * 1). And yes, just adding a fitness function to shorten program length isn't optimal. I've found it easier to evolve successful programs (letting the bloat happen) and then keep removing statements from correctly generated programs whilst checking that the output stays the same. Probably not optimal either, but I feel like it gives better results.
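The prune-and-verify idea above can be sketched as a greedy loop: try dropping each statement from the evolved program, and keep the deletion only if the output on the test inputs is unchanged. A toy version, assuming the evolved program is a linear list of Python statements over a single variable x (the representation is hypothetical, not tied to any particular GP library):

```python
def run(statements, x):
    """Execute a linear program (list of Python statements) on input x."""
    env = {"x": x}
    for stmt in statements:
        exec(stmt, {}, env)
    return env["x"]

def prune(statements, inputs):
    """Greedily remove statements whose deletion leaves the output unchanged."""
    reference = [run(statements, x) for x in inputs]
    i = 0
    while i < len(statements):
        candidate = statements[:i] + statements[i + 1:]
        try:
            if [run(candidate, x) for x in inputs] == reference:
                statements = candidate  # intron removed; retry same index
                continue
        except Exception:
            pass  # removal broke the program; keep the statement
        i += 1
    return statements

program = ["x = x * 1",   # intron: no effect
           "x = x + 2",
           "x = x * x",
           "x = x + 0"]   # intron
print(prune(program, inputs=[0, 1, 5]))  # → ['x = x + 2', 'x = x * x']
```

Note this only guarantees equivalence on the tested inputs, which matches the "checking if the output is the same" approach: a statement that only matters on untested inputs would be pruned too.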