Hacker News | SknCode's comments

How?


Same way you distill any model. Training-data efficiency matters only while you train the source model/ensemble. Once you have that, you are purely compute-bound during distillation.
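To make the point concrete, here is a minimal sketch of the standard soft-label distillation objective (Hinton-style KL between temperature-scaled teacher and student distributions). The function names and the temperature value are illustrative, not from any specific implementation:

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax over the last axis.
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss(student_logits, teacher_logits, T=2.0):
    # KL(teacher || student) at temperature T, scaled by T^2 so gradient
    # magnitudes stay comparable across temperatures.
    p = softmax(teacher_logits, T)               # soft teacher targets
    log_q = np.log(softmax(student_logits, T))   # student log-probs
    return (T * T) * np.mean(np.sum(p * (np.log(p) - log_q), axis=-1))
```

Note the loss only consumes the teacher's logits: once the teacher exists, generating soft targets and fitting the student to them is a pure compute problem, with no further dependence on the original training data's quality.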


I am not sure I understand this correctly.

I came to believe that LLMs work with token embeddings. Is REFRAG then only "something" in front of the LLM, where the RL policy decides which chunk embeddings get expanded back into token embeddings the LLM can consume? Or does REFRAG require you to fine-tune the LLM so it can work with both token embeddings and chunk embeddings?
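The mechanism being asked about can be sketched roughly like this: a compressor collapses each chunk of retrieved-context token embeddings into a single vector in the decoder's embedding space, and a policy picks which chunks are worth expanding back to full token embeddings. This is only an illustration of the general idea, not the actual REFRAG architecture; the random projections and the score-based "policy" are stand-ins for trained components:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, chunk_size, n_chunks = 64, 16, 8

# Hypothetical frozen projection, standing in for a trained chunk encoder
# that maps a chunk of token embeddings into the LLM's embedding space.
W_enc = rng.standard_normal((chunk_size * d_model, d_model)) * 0.01

def compress_chunk(token_embs):
    # (chunk_size, d_model) block of token embeddings -> one d_model vector.
    return token_embs.reshape(-1) @ W_enc

tokens = rng.standard_normal((n_chunks, chunk_size, d_model))
chunk_embs = np.stack([compress_chunk(c) for c in tokens])

# Stubbed "policy": score each chunk, expand only the top 2 back into
# full token embeddings; the rest enter the LLM as single chunk vectors.
scores = chunk_embs @ rng.standard_normal(d_model)
expand = set(np.argsort(scores)[-2:])

inputs = []
for i in range(n_chunks):
    if i in expand:
        inputs.extend(tokens[i])       # chunk_size token embeddings
    else:
        inputs.append(chunk_embs[i])   # one compressed chunk embedding
inputs = np.stack(inputs)

# Effective sequence length drops from 8*16 = 128 rows to 2*16 + 6 = 38.
print(inputs.shape)  # (38, 64)
```

Because the decoder now sees a mix of ordinary token embeddings and compressed chunk embeddings, some tuning of the LLM to accept that mixed input seems implied; how much is exactly the question above.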



