I'm happy to share that today we are releasing the Mockingbird LLM.
At under 10B parameters, it is an LLM trained specifically to provide optimal results for RAG and structured outputs. Although significantly smaller (and thus faster) than GPT-4 or Gemini-1.5 Pro, it performs at a comparable level for generation, citations, and structured outputs.
I think that, in addition to the benchmarks currently used for LLM evaluation (HumanEval and the like), it would be interesting to have a 'hallucination benchmark' built on a summarization-based hallucination dataset.