Following up on my last post about optimizing tool selection with differentiable programming, I’ve been thinking about how to extend those ideas to full agent workflows. This post shares some early experiments using DSPy to optimize routing and structure end-to-end for a sample customer service agent workflow. Feedback welcome!
for sure — I think there's a way here where we ought to be able to learn multiple tool calls and prompts jointly from real-world data. investigating that next.
re: different tools (apis vs mcps). in my mind, there should be no real difference in what kind of tool is called at this point, since I model selection as a softmax over a label set of tools.
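a minimal sketch of that framing — tool selection as a softmax over discrete labels, where the label set is agnostic to whether each tool is an API or an MCP server. The tool names and the logits here are hypothetical placeholders; in practice the logits would come from a router model or an LLM head:

```python
import numpy as np

# hypothetical label set -- api vs mcp makes no difference at this layer
TOOLS = ["search_api", "crm_mcp", "refund_api"]

def select_tool(logits: np.ndarray) -> tuple[str, np.ndarray]:
    """Softmax over the tool label set; returns the argmax tool and the
    full distribution (useful for a differentiable training signal)."""
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    probs = exp / exp.sum()
    return TOOLS[int(probs.argmax())], probs

# placeholder logits, e.g. produced by a router for the current turn
tool, probs = select_tool(np.array([0.2, 1.5, -0.3]))
```

keeping the distribution around (not just the argmax) is what lets a loss flow back through the selection step.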
that said, an idea I want to investigate is whether tools can live in a learned embedding space, where selection isn’t a softmax over discrete labels but a nearest-neighbor or attention mechanism over continuous vectors.
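a sketch of what that could look like — tools as vectors in a learned embedding space, with selection as scaled dot-product attention over them rather than a softmax over fixed labels. The embeddings here are random stand-ins for learned ones, and the query stands in for some encoding of the current workflow state:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                                        # embedding dimension (arbitrary)
tool_embeddings = rng.normal(size=(3, d))    # one row per tool; learned in practice
query = rng.normal(size=d)                   # stand-in for the current state encoding

# scaled dot-product attention over the tool vectors
scores = tool_embeddings @ query / np.sqrt(d)
weights = np.exp(scores - scores.max())
weights /= weights.sum()

soft_tool = weights @ tool_embeddings        # differentiable "soft" selection
nearest = int(scores.argmax())               # hard nearest-neighbor pick
```

the appeal is that new tools become new vectors rather than new labels, so the selector doesn't need a fixed-size output head.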
this is the intuition I'm developing here and in some of my other comments on this thread (see the differentiable state machine comment).
+1 - you can propagate the loss for a whole workflow across prompts + tools, which would make it much easier to build resilient workflows. or "agents" as everyone calls them now ;)
+1 - the biggest issue today is not being able to fine-tune the llm to learn the specifics of how to make a tool call better over time, which an approach like this could bring to the table.