Hacker News | youngprogrammer's comments

industrialized overfitting is basically what ML researchers do


The list should go a bit earlier, with word2vec, NMT, seq2seq, attention, and self-attention.


A little late to this thread, but from my list:

LLM (foundational papers)

* Attention is all you need - transformers + self attention

* BERT - first masked LM using transformers + self attention

* GPT-3 - big LLM decoder (basis of GPT-4 and most LLMs)

* InstructGPT or Tk-Instruct (instruction tuning enables improved zero-shot learning)

* Chain of Thought (improve performance via prompting)

Some other papers that have become trendy, depending on your interests:

* RLHF - RL using human feedback

* LoRA - low-rank adapters for parameter-efficient fine-tuning

* MoE - mixture of experts, a kind of conditional ensembling

* self instruct - self label data

* constitutional ai - self alignment

* tree of thought - like CoT but a tree

* FlashAttention, Longformer - optimized attention mechanisms

* ReAct - agents
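
For anyone who wants to see the core mechanism behind several of these papers, scaled dot-product self-attention fits in a few lines of NumPy. This is a toy, single-head sketch, not a faithful implementation of any paper; the shapes and names are my own:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention.
    X: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_k) projections."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # (seq_len, seq_len) attention logits
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V                  # weighted sum of value vectors

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 16))            # 5 tokens, d_model = 16
Wq, Wk, Wv = (rng.normal(size=(16, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (5, 8)
```

Real transformers add multiple heads, masking, and output projections on top of this, but the scores/softmax/weighted-sum core is the same.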


It seems like most of these imperceptible changes could be addressed by something like ASCII folding (https://www.elastic.co/guide/en/elasticsearch/reference/curr...), but this might not apply to non-English use cases.
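
For illustration, here's a rough Python approximation of ASCII folding; this is not Elasticsearch's actual filter, just NFKD decomposition plus stripping combining marks and any remaining non-ASCII characters:

```python
import unicodedata

def ascii_fold(text: str) -> str:
    """Crude ASCII folding: decompose characters (NFKD), then drop
    combining marks and anything still outside the ASCII range."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(
        c for c in decomposed
        if not unicodedata.combining(c) and ord(c) < 128
    )

print(ascii_fold("café naïve"))  # -> "cafe naive"
```

Note that this silently drops homoglyphs from other scripts (a Cyrillic "а" just disappears rather than folding to Latin "a"), which is exactly where the non-English caveat bites.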

If you're interested in adversarial NLP, I also recommend reading this blog post on adversarial attacks on GPT-2 with universal triggers (e.g. adding "nobody" as a prefix to all inputs causes all entailments to be predicted as contradictions).


They probably didn't have a Gantt chart to help them figure out the dependencies to properly plan it on their roadmap


You could do something similar to how they trained a ML model to find antibiotics compounds: https://www.cell.com/action/showPdf?pii=S0092-8674%2820%2930.... First, train a deep learning model to learn a representation of molecules from their molecule structures. Then feed in the thousand or so known compounds that produce pleasant or unpleasant smells as training data with some score of "pleasantness". We can then use this model to quickly score millions of compounds and select candidates to test.
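
A heavily simplified sketch of that pipeline: the antibiotics paper used a graph neural network over molecular structures, but here I substitute a random forest over made-up fingerprint vectors, so every name and number below is hypothetical:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)

# Stand-ins for real data: feature vectors for ~500 labeled compounds
# and their "pleasantness" scores (all hypothetical random data).
labeled_fps = rng.random((500, 128))
pleasantness = rng.random(500)

# 1. Fit a model mapping structure features -> pleasantness score.
model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(labeled_fps, pleasantness)

# 2. Score a large unlabeled library and shortlist candidates to test.
library_fps = rng.random((10_000, 128))
scores = model.predict(library_fps)
candidates = np.argsort(scores)[::-1][:20]  # indices of the 20 top-scoring
```

The interesting part in practice is step 1's representation learning; with real molecules you'd featurize structures (e.g. learned graph embeddings) rather than use random vectors.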


I love the explanation.


Anecdote: our caterer in Silicon Valley said there was a supply issue for tofu.


So I guess they import the beans, manufacture the tofu, then ship it back to us? Sounds inefficient, though I'm not sure how much demand there is for a large-scale domestic tofu manufacturing industry in the US.


Trans-oceanic shipping is so incredibly cheap that companies will do things like this if it saves them a bit of money.

see: https://www.telegraph.co.uk/news/uknews/1534286/12000-mile-t...


US Tofu is by and large for animal feed or oil, not tofu.


Soy, you mean?


This is nothing like the sharpshooter fallacy. The analysis determined the average performance of analysts with >100 stocks rated, and 10 analysts out of 16 did better than the rest.


What? They removed all the outliers. They cooked the data to say something that made sense instead of something that was accurate.


I will agree that the methodology is not as rigorous as it could be, but where can you prove it is "wrong"?

My blog post shows that stock price targets also have a terrible track record: they are wildly off and, on average, higher than actual results.


You are the one making a controversial (I'd say fantastical) claim, so you are the one who has to prove it's "right."

Here are some basic questions for you, related to the points made above:

* WHY did you remove outliers in the 10th and 90th percentiles? What happens if you don't remove them?

* WHY did you use a 10-day window centered on dates of recommendation? What happens if you use the price on the same day?

* WHY did you choose those return horizons? What happens if you choose different ones?

* WHY did you pick out only the top 10 analysts? What happens if you don't?

* WHY did you not do statistical tests relating to removing outliers, significance, etc.?

* WHY did you choose those cutoffs for price, marketcap, and minimum analyst rating?


Outliers were removed to get a better measure of the "accuracy" of the price targets.

10-day windows were used to reduce the amount of volatility/noise in the time frame.

A one-year return horizon was used because price targets are for one year.

There are only 15 or so analysts I looked at.

I was doing this as an exploratory data analysis and didn't want to pull out my old stats textbook.

Cutoffs were chosen to reduce the volatility of the measurements, since I was looking at percentages. A stock going from $1.50 to $2.00 is a 33% increase, whereas a $100 stock would have to move to $133 for the same percentage, a far more significant change; stocks with lower market caps are also more volatile. The minimum analyst rating was chosen to eliminate analysts with a very small number of ratings, as they would be unreliable.
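
As a rough sketch of the trimming and smoothing described above (hypothetical data, NumPy only; this is not the analysis's actual code):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical percentage errors of one-year price targets vs. actuals.
target_errors = rng.normal(loc=0.15, scale=0.30, size=500)

# Drop observations outside the 10th-90th percentiles before averaging,
# so a handful of extreme misses don't dominate the "accuracy" measure.
lo, hi = np.percentile(target_errors, [10, 90])
trimmed = target_errors[(target_errors >= lo) & (target_errors <= hi)]
mean_error = trimmed.mean()

# Smooth daily prices with a 10-day moving average to reduce noise
# around the recommendation date.
prices = 100 + np.cumsum(rng.normal(0, 1, size=252))  # ~1 trading year
window = 10
smoothed = np.convolve(prices, np.ones(window) / window, mode="valid")
```

The trade-off the questions above are probing: trimming and smoothing reduce noise, but each choice (percentile cutoffs, window length) can also move the headline number, which is why sensitivity checks matter.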


In my analysis, I do hypothesize that analyst opinions become a self-fulfilling prophecy as you described. However, I would like to believe that analysts do some sort of sophisticated breakdown and analysis of a company's financial statements and market outlook when releasing a justifiable rating.

