Also from the tutorial:
"Unlike Eff, Koka, Links, and other languages that support effect handlers, effects in Multicore OCaml are unchecked currently. A program that does not handle a performed effect fails with a runtime error."
and they are Samsung Galaxy S models, a couple of Asus ZenFones, and the Google Pixel 5.
If you're willing to add another 5 mm, there are also a couple of Sony Xperias, a Sharp Aquos, and the Google Pixel 8. And if you want to cap the height at 145 mm, it's just the Google Pixel 5.
The Flix FAQ (https://flix.dev/faq/) starts out normal but becomes increasingly hilarious towards the end :D
Some gems:
---
Q: Wait, division by zero is zero, really?
A: Yes. But focusing on this is a bit like focusing on the color of the seats in a spacecraft.
---
Q: "This site requires JavaScript"
A: People who have criticized the website for using JavaScript: [1], [2], [3], [4], [5].
People who have offered to help refactor the site to use static html: 0.
---
Q: I was disappointed to learn that Flix has feature X instead of my favorite feature Y.
A: We are deeply sorry to have let you down.
---
Q: This is – by far – the worst syntax I have ever seen in a functional language. Semicolons, braces, symbolic soup, et al. It is like if Scala, Java and Haskell had a one night stand in the center of Chernobyl.
We have a partial understanding of why distillation works: it is explained by the Lottery Ticket Hypothesis (https://arxiv.org/abs/1803.03635). But if I understand correctly, that doesn't mean you can train a smaller network from scratch. You need a lot of randomness in the initial large network for some subnetworks to come out "winning". Then you can distill those winning subnetworks into a smaller network.
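For anyone who wants the recipe spelled out, here is a toy OCaml sketch (all sizes and constants invented, nothing from the paper itself) of the lottery-ticket procedure: train an over-parameterized linear model, prune the smallest-magnitude weights, rewind the survivors to their initial random values, and retrain only that subnetwork:

    (* Toy lottery-ticket sketch: fit y = w . x by stochastic gradient
       descent, where only the first 3 of 20 inputs actually matter. *)
    let () =
      Random.self_init ();
      let n = 20 in
      let target = Array.init n (fun i -> if i < 3 then 1.0 else 0.0) in
      let init = Array.init n (fun _ -> Random.float 0.2 -. 0.1) in
      let w = Array.copy init in
      let mask = Array.make n true in
      let dot a b =
        let s = ref 0.0 in
        Array.iteri (fun i ai -> s := !s +. ai *. b.(i)) a;
        !s
      in
      let train steps =
        for _ = 1 to steps do
          let x = Array.init n (fun _ -> Random.float 2.0 -. 1.0) in
          let err = dot w x -. dot target x in
          (* LMS update, applied only to unpruned weights. *)
          Array.iteri
            (fun i xi -> if mask.(i) then w.(i) <- w.(i) -. 0.05 *. err *. xi)
            x
        done
      in
      train 2000;
      (* Prune: keep only the 25% largest-magnitude weights. *)
      let ranked = Array.init n (fun i -> i) in
      Array.sort (fun a b -> compare (abs_float w.(b)) (abs_float w.(a))) ranked;
      Array.iteri (fun rank i -> mask.(i) <- rank < n / 4) ranked;
      (* Rewind survivors to their *initial* random values and zero the
         rest; this rewinding step is the crux of the hypothesis. *)
      Array.iteri (fun i keep -> w.(i) <- if keep then init.(i) else 0.0) mask;
      train 2000;
      Array.iteri
        (fun i wi -> if mask.(i) then Printf.printf "w.(%d) = %+.3f\n" i wi)
        w

In this convex toy even a fresh reinitialization would converge; the paper's finding is that in deep networks it doesn't, which is exactly the point above about needing the original randomness.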
Note that a similar process happens in the human brain; it is called synaptic pruning (https://en.wikipedia.org/wiki/Synaptic_pruning). Relevant quote from Wikipedia (https://en.wikipedia.org/wiki/Neuron#Connectivity):
"It has been estimated that the brain of a three-year-old child has about 10^15 synapses (1 quadrillion). This number declines with age, stabilizing by adulthood. Estimates vary for an adult, ranging from 10^14 to 5x10^14 synapses (100 to 500 trillion)."
So, can a distilled 8B model (say, DeepSeek-R1-Distill-Llama-8B or whatever) be "trained up" to a larger 16B-parameter model after distillation from a superior model, or is it forever stuck at 8B parameters that can only be fine-tuned?
How is that relevant? A few counterexamples don't disprove anything. It's pretty common knowledge that the more successful or wealthy your parents were, the more likely you are to end up successful or wealthy yourself.
This does not directly prove the theory your parent comment posits, namely that better circumstances during a child's development improve the development of that child's brain. That would require success to be a good predictor of brain development, which I'm somewhat uncertain about.