It depends on the difficulty of the puzzles. If you read the article by the Leela author (linked in another comment), you'll see a very different picture: the new DeepMind model is better than AlphaZero but much worse than the best open-source model. A transformer that isn't specially trained, like GPT-4o (and even 5o), has absolutely no chance of solving the more difficult puzzles.
Bit confused what the value add is over a framework like DSPy. This still requires you to create an eval dataset with ground truth, which is basically the only hard part of using DSPy. Easily getting the optimized prompt and having some metrics out of the box isn't worth nearly $1k/mo IMO.
Side note: I’ve had a lot of luck combining automatic prompt optimization with finetuning. There is definitely some synergy https://raw.sh/posts/chess_puzzles
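For readers unfamiliar with what "automatic prompt optimization" means in practice, here is a toy sketch of the core idea: search over candidate prompts and keep whichever scores best against a labeled eval set. This is not DSPy's actual API (its optimizers like BootstrapFewShot do considerably more); `fake_llm` and the candidate prompts are made up for illustration.

```python
# Toy illustration of automatic prompt optimization (not DSPy's actual API):
# greedily search candidate prompt prefixes, keeping whichever scores best
# against a small labeled eval set. `fake_llm` is a hypothetical stand-in
# for a real model call.

def fake_llm(prompt: str, question: str) -> str:
    # Stand-in behavior: a more verbose prompt makes the "model" answer correctly.
    return question.upper() if "step by step" in prompt else question

def evaluate(prompt: str, dataset: list[tuple[str, str]]) -> float:
    # Fraction of eval examples answered correctly -- this is where the
    # ground-truth dataset (the hard part) comes in.
    correct = sum(fake_llm(prompt, q) == gold for q, gold in dataset)
    return correct / len(dataset)

def optimize(candidates: list[str], dataset: list[tuple[str, str]]) -> str:
    # Pick the highest-scoring candidate prompt.
    return max(candidates, key=lambda p: evaluate(p, dataset))

dataset = [("e4", "E4"), ("d4", "D4")]
candidates = ["Answer:", "Think step by step, then answer:"]
best = optimize(candidates, dataset)
```

The finetuning synergy mentioned above fits naturally here: the same eval metric that scores prompts can also score finetuned checkpoints.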
Thanks for the feedback, and I love your article diving deep into DSPy! Here's how our platform is different:
1. You are absolutely right, the dataset is a big hurdle for using DSPy. That's why we offer a synthetic dataset generation pipeline for RAG, agents, and a variety of LLM pipelines. More here: https://docs.relari.ai/getting-started/datasets/synthetic
2. Relari is an end-to-end evaluation and optimization toolkit. Real-time optimization is just one part of our data-driven package for building robust and reliable LLM applications.
3. Our tools are framework agnostic. If you can build your entire application on DSPy, that's great! But often we see AI developers who want to maintain the flexibility and transparency to have their prompts / LLM modules work across different environments.
4. We provide well-designed built-in metrics as well as custom metrics learned from user feedback. We find good metrics are key to making any optimization process (prompt optimization or fine-tuning) work.
Is it possible to use this for hybrid search in combination with pg_embedding? My understanding is that hybrid search currently requires syncing with Postgres.
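For context on what the fusion step involves: hybrid search typically merges a vector-similarity ranking (e.g. from pg_embedding) with a keyword ranking (e.g. Postgres full-text search), and reciprocal rank fusion (RRF) is a common way to do it. A minimal sketch, with made-up document IDs:

```python
# Sketch of hybrid search via reciprocal rank fusion (RRF): merge a
# vector-search ranking with a keyword-search ranking into one list.
# Doc IDs and hit lists below are hypothetical.

def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    # score(d) = sum over rankings of 1 / (k + rank of d); higher is better.
    # k=60 is the conventional smoothing constant from the RRF paper.
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["doc3", "doc1", "doc7"]    # e.g. from embedding similarity
keyword_hits = ["doc1", "doc9", "doc3"]   # e.g. from tsvector matching
fused = rrf([vector_hits, keyword_hits])
```

Documents appearing high in both rankings (doc1, doc3 here) float to the top, which is the whole point of hybrid search.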
Nope, it’s a serious project; I mostly made it for personal use during my last semester of college. I rewrote it a few times and packaged it up because I think it’s genuinely useful. Langchain gets you 80% of the way there but you run into issues with it very quickly.
DankGPT is able to draw context from a library of documents (textbook, papers, class slides) to explain any topic and answer complicated reasoning problems.
It’s very similar to ChatPDF, but you can include multiple documents and it has much better context selection. In practice this leads to better answers (fewer “the source does not contain information on…” responses and fewer hallucinations).
On lichess puzzles, GPT-4o with the compiled prompt scores around 70%; I think the 270M transformer is around 95%.