I'm surprised more attention isn't paid to this research direction, and that nobody has tried to generalize it, for example by combining the recurrence concept with next-token prediction.
That said, despite the considerable gains, this seems to be just some hyperparameter tweaking rather than a foundational improvement.
Not just hyperparameter tweaking. Not foundational research either. Rather, engineering improvements that compound with each other (conswiglu layers, Muon optimizer).
This is one of the reasons it is my daily go-to LLM.
It shows that the x.ai team is responsive and moves quickly.
x.ai arrived late to the party, smashed out a decent model, and has dramatically improved it in just 18 months.
They have the talent, the infra, the funds and real-time access to X posts. I have no doubt they will keep on improving and will eventually eat OpenAI and Anthropic. Google is the only other big player who really is a threat.
Green flag that he references the I Ching; most original ideas come through analogy. Paul Werbos claims he invented backprop to formalize Freud's theory of “psychic energy” into an algorithm.
If you read the section of South Africa's Application Instituting Proceedings in the International Court of Justice entitled "Expressions of Genocidal Intent against the Palestinian People by Israeli State Officials and Others", you'll find a compelling case that genocidal intent was clearly expressed. There are over seven pages of quotes and citations of Israeli leaders expressing that intent.
This needs a citation. Israel developed its nukes 50 years ago with the assistance of Jewish nuclear physicists from around the world and French materials. They didn't need to steal nuclear secrets.