Hacker News | karolist's comments

can you link to that gist? I'd be interested to read through it

I forked pi-mono to freeze it.

Here is a session of pi analyzing coding-agent package itself.

https://ontouchstart.github.io/.pi/agent/sessions/--pi-mono-...

It was created via a non-interactive CLI command in a docker container connected to a local llama.cpp server with a very limited model. $0 token cost.



My reading of that isn't that the harness matters so much as the overall platform environment the agents operate in and the approach the team takes.

    > Before Blitzy starts any work on code generation, the platform launches collaborative agents to deeply analyze the repository – mapping dependencies, understanding conventions, and capturing domain logic. This documentation process can take hours or days. When prompted to add a feature, refactor code or fix bugs, Blitzy replies with a highly detailed technical specification.
The same approach could be taken with any harness, using a skill that performs this analysis step before work begins.


What exactly are you pointing out? I read the link and the linked thread and it's not clear what position is being presented.

I don't see evidence that the harness -- rather than the approach to information indexing and agent tooling -- makes much of a difference.

You can make a case that "this harness bakes X in" (or, in the case of pi, "this harness bakes nothing in; you choose your own adventure"), but at the end of the day, skills are just markdown files, and CLIs and shell scripts can be used by any harness; they are portable. CC allows overriding the system prompt[0], and I would guess most harnesses have similar facilities. I don't see how the harness is going to have a bigger impact than the configured tooling (skills, scripts, plugins).

The extraordinary claim here is that if I configured pi and CC, Codex, etc. with the same system prompt, same tools, and same skills, pi would outperform CC and Codex. That's what it would mean to say the harness matters. That just doesn't seem right; rather, it's the configuration of tools, skills, and default prompt that matters.

[0] https://code.claude.com/docs/en/cli-reference#system-prompt-...


My point is pi-coding-agent [1] is a very well designed and implemented open source project that we all can learn from as software engineers. His blog post about his decision making [2] is also very well written.

I should've given original links instead of noisy HN threads.

[1] https://github.com/badlogic/pi-mono/blob/main/packages/codin...

[2] https://mariozechner.at/posts/2025-11-30-pi-coding-agent/

pi-coding-agent itself is not a product yet, and it won't have much value to end users beyond those using products built on top of it.


You can have an opinion about a tool as a user without ever having the ability to create such a tool yourself; that's literally what every tech and auto reviewer does.

Sure, and the less you understand about the tool’s fundamental capabilities, the less useful your opinion is. The best reviewers have deep knowledge.

You can use this logic to say all products are perfect and any criticisms of them by users are moot because their creator knows them best.

This reads as if, at best, bullet-point talking points were fed to the prompt and, given an output-length requirement, padded out to fit the space, diluting the message to the point where only an LLM could parse it.


Ditto. I don't see myself upgrading in the near future; the 64GB M1 Max I paid $2,499 for at the end of 2023 still feels like a new machine, and nothing I do can slow it down. Apple kept the OS updated for around six years in the Intel era, so I don't see how they can drop support for this one, tbh. I'm still paying for AppleCare since I depend on it so much.


To replace Kubernetes, you inevitably have to reinvent Kubernetes. By the time you build in canaries, blue/green deployments, and rolling updates with precise availability controls, you've just built a bespoke version of k8s. I'll take the industry standard over a homegrown orchestration tool any day.


We've used ECS back when we were on AWS, and now GCE.

We didn't have to invent any homegrown orchestration tool. Our infra is hundreds of VMs across 4 regions.

Can you give an example of what you needed to do?


Really? What deploys your code now? I'm an SRE; walk me through it at a high level. How do I roll back?


It used to be Google Deployment Manager, but that's being shut down soon, so now it's Terraform.

To roll back, you tell GCE to use the previous image. It handles the rolling update for you.

Our deployment process looks like this:

- Jenkins: build the code into Debian packages hosted on JFrog

- Jenkins: build a machine image with Ansible and Packer

- Jenkins: deploy the new image either to test or prod.

Test deployments create a new Instance Group that isn't automatically attached to any load balancer. You do that manually once you've confirmed everything has started ok.
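A rollback like the one described above can be sketched with gcloud. All names here (instance group, templates, region) are hypothetical, and the exact flags depend on whether the managed instance group is zonal or regional; this is an illustration, not the poster's actual setup:

```shell
# Hypothetical names -- substitute your own managed instance group,
# instance template, and region.
GROUP=api-mig
REGION=us-central1

# "Use the previous image": start a rolling update that points the
# managed instance group back at the prior image's instance template.
gcloud compute instance-groups managed rolling-action start-update "$GROUP" \
  --region="$REGION" \
  --version=template=api-template-v41 \
  --max-unavailable=0 \
  --max-surge=3

# Block until the group has finished rolling and is stable again.
gcloud compute instance-groups managed wait-until "$GROUP" \
  --stable --region="$REGION"
```

With `--max-unavailable=0`, GCE surges new VMs on the old template before draining the new ones, so capacity never dips during the rollback.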


ECS deployments. Automatically rolls back on failure. Not sexy but it works reliably.
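For reference, the ECS mechanism behind "automatically rolls back on failure" is the deployment circuit breaker. A sketch with hypothetical cluster and service names:

```shell
# Hypothetical names. Enables ECS's deployment circuit breaker, which
# halts a deployment whose tasks keep failing and rolls the service
# back to the last steady-state task definition.
aws ecs update-service \
  --cluster prod-cluster \
  --service api-service \
  --deployment-configuration \
    "deploymentCircuitBreaker={enable=true,rollback=true},maximumPercent=200,minimumHealthyPercent=100"
```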


I've moved the SaaS I'm developing to SeaweedFS, and it was rather painless. I should also move away from the minio-go SDK to the generic AWS one some day. No hard feelings toward the MinIO team on my side, though.


The number of times in the past year that a competitor's benchmarks claimed something was close to Claude and it turned out to be even remotely close in practice: 0


I honestly feel like people are brainwashed by Anthropic propaganda when it comes to Claude. I think Codex is just way better, and Kimi 2.5 (and I think GLM 5 now) are perfectly fine as a Claude replacement.


So much money is on the line for US hyperscalers that they probably pay for "pushes" on social media. Maybe Chinese companies are doing the same.


I would say that's more certain than just a "probably". I would bet that some of the ridiculous fear mongering about language models trying to escape their server, blackmail their developers, or spontaneously participate in a social network is a clandestine marketing campaign. The technology is certainly amazing and very useful, but I don't think any of these Terminator stories were boosted by the algorithms on their own.


> I think codex is just way better

Codex was super slow until 5.2 Codex. Claude models were noticeably faster.


Was this text run through an LLM before posting? Honestly, I recognize that writing style. Or have we simply spoken to machines enough that we now speak like them?


Yes. This is absolutely chatgpt-speak. I see it everywhere now. It's inescapable. At least this appears to be largely human authored and have some substance, which is generally not the case when I see these LLM-isms.


Interesting, though I've never had enough custom scripts to justify this; I prefer oh-my-zsh-plugin-style short aliases instead, e.g. https://github.com/ohmyzsh/ohmyzsh/tree/master/plugins/git


it's a parody of the infamous https://news.ycombinator.com/item?id=9224

