Hacker News | kvptkr's comments

Ooh, interesting - starred and going to dig into this later today!


Interesting! I think you're right that the middleman you're describing has to be really good for something like this to actually be useful, especially for people who are already very comfortable with debugging tools and their codebase. If I understand correctly, the most productive tool for you would be one that presents you with more relevant data in a structured way (Redux-style, etc.). At first, we actually did think about building a nice IDE around all that data, but found it kind of already exists - https://pytrace.com/ - and we found it more cumbersome to use than anything else.

Our belief is that these tools can help, but there's an irreducible amount of reasoning that still has to happen, which takes time and effort, and we think a tool like this might reduce that by offloading the reasoning to LLMs. I guess what I'm saying is that there's a cap on how much useful information a tool can give a developer to help them reason better, and I'm really interested in seeing whether it's possible to reduce how much reasoning is needed in the first place. Curious to hear what you think - thanks for the thoughtful comment!


:shrug: I'm one person on the Internet, so if using the LLM makes it work better for more of your users, go with that.

I do think data filtering and visualization are an important value add. Pytrace looks cool, but it doesn't look much different from what I get if I debug JavaScript in a web browser, so I think there's a ton of room to improve. Visual representations of data transformations and code paths are a relatively unexplored area across the entire software industry, imo.

If an LLM could flat-out reduce my need to reason and fix the bug for me, great -- but I've worked with coders that I respect a lot, and remote debugging with them has always been a pain and made narrowing down issues harder. I've never enjoyed debugging something remotely where I was working through a middleperson and couldn't look at what was going on; it's helpful to have multiple eyes on the code, but not if I have to use someone else's eyes to look at what's happening.

So for me, in order to successfully reduce the amount of reasoning I need to do and overcome the downside of me not being able to visualize the timeline/data, the LLM would need to be better at fixing these bugs than professional developers in industry: developers who are already intimately familiar with the codebases I'm debugging because they wrote a significant portion of the code. It would need to be better at coding than professional humans. And I just don't think there's anyone who would say that GPT-4 is close to that level yet.

What I could see is, maybe -- if I have access to that data, and the LLM is just kind of on the side -- at that point it could offer helpful advice without a downside, because I'd still be able to debug as fast as I can using all of the available data, and if the LLM occasionally finds something I missed, then great. Peer-debugging sessions with multiple coders are great, so at least in theory I could see some value from an LLM there, even if I'm a little skeptical about its performance. And since the LLM wouldn't be sitting in front of the data, if it didn't work then no worries: the data would still be there.

But again, if people like it, then it doesn't matter what I think. Why I wouldn't use the tool is less important than why someone would use the tool, and if integration with the LLM makes people want to use the tool, then... I mean, not everyone has identical work styles. Different things might work for different people.


Oof, I'm sorry to hear that - I don't think we had any Django projects in the set of projects we were testing this out on. I've just filed an issue here and will hopefully fix it asap - https://github.com/leapingio/leaping/issues/2


No worries! Good luck with the project.


So interestingly enough, we first tried letting GPT interact with pdb through a set of directed prompts, but we found that it kept hallucinating commands, responding with incorrect syntax, and really struggling with line numbers. That's why we pivoted to getting all the relevant data GPT could need upfront and letting it synthesize that data into a single root cause.
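For what it's worth, one way to blunt the hallucinated-command problem is a validation layer that checks the model's output against a whitelist before anything reaches pdb. A minimal sketch (the function name, whitelist, and line-number sanity check here are all illustrative, not Leaping's actual code):

```python
import re
from typing import Optional

# Subset of real pdb commands and their abbreviations.
VALID_PDB_COMMANDS = {
    "break", "b", "next", "n", "step", "s", "continue", "c",
    "where", "w", "print", "p", "pp", "list", "l", "up", "down",
    "return", "r", "args", "a", "quit", "q",
}

def validate_pdb_command(raw: str) -> Optional[str]:
    """Return the command if it looks like valid pdb input, else None."""
    raw = raw.strip()
    if not raw:
        return None
    head = raw.split(maxsplit=1)[0]
    if head not in VALID_PDB_COMMANDS:
        return None  # hallucinated or malformed: re-prompt the model instead
    # 'break file:lineno' is a common spot for bad output; sanity-check it
    if head in ("break", "b") and len(raw.split()) > 1:
        target = raw.split(maxsplit=1)[1]
        if ":" in target and not re.fullmatch(r"[\w./]+:\d+", target):
            return None
    return raw
```

A rejected command would trigger a re-prompt rather than being fed to the debugger, which at least turns silent failures into retries.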

I think we're going to explore the local model approach though - you raise some really great points about having more granular control over the state of the model.


Interesting! Did you try the function calling API? I feel you on the line number troubles; it's hard to get anything consistent there. Using diffs with GPT-4 isn't much better in my experience; I didn't test it extensively, but from what I did, it rarely produced syntactically valid diffs that could just be handed to `patch`. One approach I started playing with was using tree-sitter to add markers to the code and letting the LLM specify marker ranges for deletion/insertion/replacement, but alas, I got distracted before fully following through.
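The marker idea can be sketched with the stdlib `ast` module (tree-sitter would generalize it beyond Python and to sub-statement granularity; the function names and marker format below are my own invention, not from any project):

```python
import ast

def add_markers(source: str) -> str:
    """Insert a marker comment before each top-level statement so an LLM
    can reference stable ids ("replace s1 with ...") instead of the raw
    line numbers it tends to get wrong."""
    tree = ast.parse(source)
    starts = {node.lineno: i for i, node in enumerate(tree.body)}
    out = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if lineno in starts:
            out.append(f"# <s{starts[lineno]}>")
        out.append(line)
    return "\n".join(out)

def replace_statement(source: str, marker: int, new_text: str) -> str:
    """Apply an LLM edit expressed as 'replace statement <marker>'."""
    tree = ast.parse(source)
    node = tree.body[marker]  # end_lineno is available on Python 3.8+
    lines = source.splitlines()
    before = lines[: node.lineno - 1]
    after = lines[node.end_lineno:]
    return "\n".join(before + new_text.splitlines() + after)
```

The model never emits line numbers or diff syntax, only a marker id plus replacement text, and the tool resolves the marker back to an exact source span.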

In any case, I'll keep an eye on the project, good luck! Let me know if you ever need an extra set of hands, I find this stuff pretty interesting to think about :)


Haha thanks! Yeah, I think that's definitely a logical next step. We do something similar for the larger bug resolution platform we've been working on, so it shouldn't be too hard to port over!

