We got early access to Tinker from Thinking Machines and spent the week putting it through its paces here at Ramp Labs.
We used it to explore how RL post-training performance changes when you split the data by domain and train an ensemble of specialized models, versus training a single model on everything at once.
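The setup can be sketched in a few lines. This is a toy illustration only: the dataset, domain tags, and `train()` stub are hypothetical stand-ins, not Tinker's API or our actual training code.

```python
from collections import defaultdict

# Toy dataset: each example carries a domain tag (hypothetical domains).
DATASET = [
    {"domain": "math", "prompt": "What is 2 + 2?"},
    {"domain": "code", "prompt": "Reverse a linked list."},
    {"domain": "math", "prompt": "Integrate x^2 dx."},
    {"domain": "qa",   "prompt": "What is the capital of France?"},
]

def split_by_domain(dataset):
    """Group examples into per-domain buckets."""
    buckets = defaultdict(list)
    for example in dataset:
        buckets[example["domain"]].append(example)
    return dict(buckets)

def train(examples):
    """Stand-in for an RL post-training run; returns a dummy model record."""
    return {"num_examples": len(examples)}

# Ensemble arm: one specialist per domain.
specialists = {d: train(exs) for d, exs in split_by_domain(DATASET).items()}

# Baseline arm: a single generalist trained on all data at once.
generalist = train(DATASET)
```

The comparison then boils down to evaluating each specialist on its own domain against the generalist on the same slices.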
Tinker handled the heavy lifting (infrastructure, async rewards, and GPU orchestration), so we could focus on the experimentation loop instead of wrangling configs and pipelines. It's one of the smoothest experiences we've had running large-scale RL workflows. Check out our findings.