Partly true. I think the consensus was it wasn't comparable because Mythos swept the entire codebase and found the vulnerabilities, whereas the open models were told where to look for said vulnerabilities.
Not really. The models were pointed specifically at the location of the vulnerability and given some extra guidance. That's an easier problem than simply being pointed at the entire code base.
Surely the Anthropic model also only looked at one chunk of code at a time. Cannot fit the entire code base into context. So supplying an identical chunk size (per file, function, whatever) and seeing if the open source model can find anything seems fair. Deliberately prompting with the problem is not.
Nobody said anything about an API with thousands of endpoints. Does that even exist? I've never seen it. Wouldn't work on it if I had seen it. Such is the life of a strawman argument.
Further, isn't a decorator in Python (like @mcp.tool) the easy way to expose what is needed to an API, if even if all we are doing is building a bridge to another API? That becomes a simple abstraction layer, which most people (and LLMs) get.
Writing a CLI for an existing API is a fool's errand.
> Writing a CLI for an existing API is a fool's errand.
I don't think your opinion is reasonable or well grounded. A CLI app can be anything including a script that calls Curl. With a CLI app you can omit a lot of noise from the context things like authentication, request and response headers, status codes, response body parsing, etc. you call the tool, you get a response, done. You'd feel foolish to waste tokens parsing irrelevant content that a deterministic script can handle very easily.
reply