Try this trick that I learned from Cohere:
- Fetch top 10*k (i.e. 100) results using the hamming distance
- Rerank by taking dot product between query embedding (full precision) and binary doc embeddings
- Show top-10 results after re-ranking
OpenAI recommends using o1 to generate the verbose plan and then chain the verbose output to a cheaper model (e.g. gpt-4o-mini) to convert it into structured data / function calls / summary etc. They call it planner-executor pattern. [1]
1. Key ideas: I basically skim through blog posts and videos from conferences to be aware of key ideas in Machine Learning
2. Fundamentals: Parallelly, I take MOOC course every once in a while to learn fundamental concepts from first principles
3. Implementation: I either use the ideas I learned from (1) and (2) in a work project or write a blog post to make it concrete on my personal blog: https://amitness.com/
You have to set 'strict' to True manually to use the same grammar-based sampling they use for structured outputs.
https://platform.openai.com/docs/guides/function-calling?api...