Hacker News | gwking's comments

I tried this with Claude Code on macOS. I created a new agent user and a wrapper to run Claude as that user, along with some scripts to set permissions and ownership so that I could run simple allow/deny commands. The only problem was that the fancy OAuth flow broke. I filed an issue with Anthropic and their ticket bot auto-closed it “for lack of interest” or whatever.

I fiddled with transferring the saved token from my keychain to the agent user keychain but it was not straightforward.

If someone knows how to get a subscription to Claude to work on another user via command line I’d love to know about it.


Someone tried this earlier this year but they ended up going with bubblewrap (what Anthropic uses for the sandbox). Here's the blog if you're interested. https://patrickmccanna.net/a-better-way-to-limit-claude-code...

I ended up creating an LXC on my homelab and providing it access there, with a self-hosted Gitea server, but that's only for side projects that I want to host, not develop actively.


My understanding is that many extension modules are already written to take advantage of multithreading by releasing the GIL when calling into C code. This allows true concurrency in the extension, and also invites all the hazards of multithreading. I wonder how many bugs will be uncovered in such extensions by the free threaded builds, but it seems like the “nuts” choice actually happened a long time ago.
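The effect is visible even from pure Python: `hashlib`, for example, releases the GIL while hashing buffers larger than a couple of kilobytes, so threads calling into that C code genuinely run concurrently. A minimal sketch (only correctness of the results is checked here, not the speedup itself):

```python
import hashlib
import threading

# CPython's hashlib releases the GIL while hashing large buffers, so
# these threads can run the C hashing code in parallel even on a
# GIL-enabled build -- exactly the kind of extension-level concurrency
# described above.
data = b"x" * (8 * 1024 * 1024)  # 8 MiB of input per thread

digests = {}

def work(name: str) -> None:
    digests[name] = hashlib.sha256(data).hexdigest()

threads = [threading.Thread(target=work, args=(f"t{i}",)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# All threads hashed identical input, so every digest must match.
assert len(digests) == 4
assert len(set(digests.values())) == 1
```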


From one “old guy” to another:

PJ = private jet

“Diamond hands” and “to the moon” are crypto trader slang

Makes me laugh answering this question, because the meaning is all there if you skip over all the meme words.


I am not that old, but I interpreted that as working in pajamas in Dubai. I guess yours makes more sense.


I wrote a very rudimentary schema and automatic migration system for SQLite. One problem I ran into recently was deploying changes that spanned two migrations, because the tool doesn’t know how to step through commits to do successive auto-migrations between schema versions. I guess there are rather obvious ways to handle this if you generate and then commit the full migration SQL for each schema change. Nonetheless, I’m curious whether this is a problem you have had to think about, whether you find it interesting or think it sounds like a bad path to go down, and whether Atlas does anything smart in this department. Thanks in advance!
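The stepwise approach described above can be sketched with SQLite's `user_version` pragma: instead of diffing the live schema against the latest model in one jump, walk the recorded versions one at a time, so a deploy spanning two migrations still applies both in order. (The migration table here is invented for illustration, not from any particular tool.)

```python
import sqlite3

# version -> SQL taking the schema from version-1 to version.
MIGRATIONS = {
    1: "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)",
    2: "ALTER TABLE users ADD COLUMN email TEXT",
}

def migrate(conn: sqlite3.Connection) -> int:
    # SQLite stores a free-form integer in PRAGMA user_version; use it
    # to record the schema version and apply only the missing steps.
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    for version in sorted(MIGRATIONS):
        if version > current:
            conn.execute(MIGRATIONS[version])
            conn.execute(f"PRAGMA user_version = {version}")
            current = version
    return current

conn = sqlite3.connect(":memory:")
assert migrate(conn) == 2   # applies both steps in order
assert migrate(conn) == 2   # re-running is a no-op
```

Because each step is applied in sequence, a deploy that jumps two schema versions works the same as two single-step deploys.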


For the record, I started using Xcode before it was called that, and people have said this almost every year since. As I recall there was a big hit to its quality when they converted it to Obj-C’s short-lived garbage collection, and it felt like it never got back to being reliable after that.


  > converted it to obj-c’s short lived garbage collection
That was around Xcode 4, IIRC. That was when Interface Builder was duct-taped (or maybe I should say intermixed) with Xcode (née Project Builder), to disastrous results in terms of performance... it's never really recovered, IMO...


Ahhh ProjectBuilder...


Speaking for myself, managing a team of 3, the simpler management interface on Hetzner compared to AWS is a major professional advantage.


This may be outdated because git’s defaults have improved a lot over the years. I first used git on a team in 2011. As I recall, there were various commands like git log -p that would show nothing for a merge commit. So without extra knowledge of git's flags you would not find what you were looking for if it was in a side path of the merge history. This caused a lot of confusion at times. We switched to a rebase approach because linear history is easier for people to use.

To answer your question directly, if somewhat glibly, I’m glad I rebased every time I go looking for something in the history because I don’t have to think about the history as a graph. It’s easier.

More to your point, there are times when blame on a line does not show the culprit. If you move code, or do anything else to that line, then you have to keep searching. Sometimes it’s easier to look at the entire patch history of a file. If there is a way to repeatedly/recursively blame on a line, that’s cool and I’d love to know about it.

I now manage two junior engineers and I insist that they squash and rebase their work. I’ve seen what happens if they don’t. The merges get tangled and crazy, they include stuff from other branches they didn’t mean to, etc. The squash/rebase flow has been a way to make them responsible for what they put into the history, in a way that is simple enough that they got up to speed and own it.


I’ve idly wondered about this sort of thing quite a bit. The next step would seem to be taking a project’s implementation-dependent tests, converting them to an independent format and verifying them against the original project, then conducting the port.


Give a coding agent some software. Ask it to write tests that maximise code coverage (source coverage if you have source code; if not, binary coverage). Consider using concolic fuzzing. Then give another agent the generated test suite and ask it to write an implementation that passes. Automated software cloning. I wonder what results you might get?
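The coverage-maximising feedback loop can be sketched in a few lines: run a candidate input, record which lines of the target executed, and keep inputs that reach new lines — the same signal an agent (or fuzzer) would need. Everything here (`target`, `lines_hit`, the candidate corpus) is a toy invented for illustration.

```python
import sys

def target(x: int) -> str:
    # Toy function under test, with three distinct branches.
    if x < 0:
        return "negative"
    if x % 2 == 0:
        return "even"
    return "odd"

def lines_hit(fn, arg):
    # Record which line numbers of fn execute for a given input,
    # using CPython's tracing hook.
    hit = set()
    def tracer(frame, event, _):
        if event == "line" and frame.f_code is fn.__code__:
            hit.add(frame.f_lineno)
        return tracer
    sys.settrace(tracer)
    try:
        fn(arg)
    finally:
        sys.settrace(None)
    return hit

# Greedy corpus selection: keep only inputs that cover new lines.
covered = set()
corpus = []
for candidate in (-1, 0, 1, 2, 3):
    new = lines_hit(target, candidate) - covered
    if new:
        corpus.append(candidate)
        covered |= new

# The surviving corpus exercises every branch of target().
assert corpus == [-1, 0, 1]
assert {target(x) for x in corpus} == {"negative", "even", "odd"}
```

A real setup would use a coverage tool rather than a hand-rolled tracer, but the feedback principle — new coverage earns an input a place in the suite — is the same.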


> Ask it to write tests that maximise code coverage

That is significantly harder to do than writing an implementation from tests, especially for codebases that previously didn't have any testing infrastructure.


Give a coding agent a codebase with no tests and tell it to write some, and it will; if you don’t tell it which framework to use, it will just pick one. No denying you’ll get much better results if an experienced developer provides it with some prompting on how to test than if you just let it decide for itself.


This is a hilariously naive take.

If you’ve actually tried this, and actually read the results, you’d know this does not work well. It might write a few decent tests, but get ready for an impressive number of tests and cases with no real coverage.

I did this literally 2 days ago and it churned for a while and spat out hundreds of tests! Great news, right? Well, no: they did stupid things like “create an instance of the class (new MyClass), now make sure it’s the right class type”. It also created multiple tests that created maps, then asserted the values existed and matched… matched the maps it created in the test… without ever touching the underlying code it was supposed to be testing.

I’ve tested this on new codebases, old codebases, and vibe-coded codebases; the results vary slightly, and you absolutely can use LLMs to help with writing tests, no doubt, but “just throw an agent at it” does not work.


This highlights something I wish were more prevalent: path coverage. I'm not sure which testing suites handle path coverage, but I know Xdebug for PHP could manage it back when I was doing PHP work. Simple line coverage doesn't tell you enough of the story, while path coverage should let you be sure you've tested all code paths of a unit. Mix that with input fuzzing and you should be able to develop comprehensive unit tests for the critical units in your codebase. Yes, I'm aware that's just one part of a large puzzle.
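The gap between branch coverage and path coverage is easy to demonstrate with a toy function (hypothetical, not from any real suite): two tests can hit both sides of both branches yet never execute the one branch *combination* that crashes.

```python
def f(a: bool, b: bool) -> float:
    # Two independent branches; 4 paths but only 2 branch pairs needed
    # for "full" branch coverage.
    x = 0 if a else 1
    y = 1 if b else x
    return 1 / (x + y)

# These two calls cover both outcomes of both branches...
assert f(True, True) == 1.0     # x=0, y=1
assert f(False, False) == 0.5   # x=1, y=x=1

# ...yet the path (a=True, b=False) -> x=0, y=0 divides by zero and was
# never exercised. Path coverage counts all four paths, not two branches.
try:
    f(True, False)
    raised = False
except ZeroDivisionError:
    raised = True
assert raised
```

Line and branch metrics would report this function as fully covered after the first two asserts, which is exactly the blind spot path coverage (and fuzzing over branch combinations) closes.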


But, did you actually give the agent access to a tool to measure code coverage?

If it can't measure whether it is succeeding in increasing code coverage, no wonder it doesn't do a great job of increasing it.

Also, it can help if you have a pair of agents (which could even be just two different instances of the same agent with different prompting) – one to write tests, and one to review them. The test-writing agent writes tests and submits them as a PR; the PR-reviewing agent reads the PR and provides feedback; the test-writing agent updates the tests in response to the feedback; iterate until the PR-reviewing agent is satisfied. This can produce much better tests than just an agent writing tests without any automated review process.


Have you tried? Beyond the first tests, going all the way up to decent coverage.


I think I've asked this before on HN but is there a language-independent test format? There are multiple libraries (think date/time manipulation for a good example) where the tests should be the same across all languages, but every library has developed its own test suite.

Having a standard test input/output format would let test definitions be shared between libraries.
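One shape such a shared format could take is a plain JSON file of test vectors, in the spirit of Wycheproof-style suites: data in, expected output out, so a Rust, Go, or Python date library could all load the same file. The format and the `add_days` op below are invented for illustration, not an existing standard.

```python
import json
from datetime import date, timedelta

# Language-neutral vectors: each case names an operation, an input,
# an argument, and the expected result, all as plain strings/numbers.
VECTORS = json.loads("""
[
  {"op": "add_days", "input": "2024-02-28", "arg": 1, "expected": "2024-02-29"},
  {"op": "add_days", "input": "2023-02-28", "arg": 1, "expected": "2023-03-01"}
]
""")

# Each implementation supplies its own mapping from op names to code;
# here the Python stdlib plays the part of the library under test.
OPS = {"add_days": lambda d, n: d + timedelta(days=n)}

results = []
for case in VECTORS:
    got = OPS[case["op"]](date.fromisoformat(case["input"]), case["arg"])
    results.append(got.isoformat())
    assert got.isoformat() == case["expected"], case
```

Note how the leap-year case falls out of the data, not the harness: the same two vectors would catch the same off-by-one in any language's implementation.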




Maybe tape?


I’ve got to imagine it would be very hard for a suite of end-to-end tests (probably the most common kind is fixture file in, assert against an output fixture file) to nail all of the possible branches and paths. Like the example here, thousands of well-made tests would be required.


I appreciate the even-tempered question. I’ve been using mypy since its early days, and when pyright was added to VS Code I was forced to reckon with their differences. For the most part I found mypy was able to infer more accurately and flexibly. At various times I had to turn pyright off entirely because of false positives. But perhaps someone else would say that I’m leaning on weaknesses of mypy; I think I’m pretty strict but who knows. And like yourself, mine is a rather dated opinion. It used to be that every mypy release was an event, where I’d have a bunch of new errors to fix, but that lessened over the years.

I suspect pyright has caught up a lot but I turned it off again rather recently.

For what it’s worth I did give up on cursor mostly because basedpyright was very counterproductive for me.

I will say that I’ve seen a lot more vehement trash talking about mypy and gushing about pyright than vice versa for quite a few years. It doesn’t quite add up in my mind.


I’ve added ecosystem regression checks to every Python type checker and typeshed via https://github.com/hauntsaninja/mypy_primer. This helped a tonne with preventing unintended or overly burdensome regressions in mypy, so glad to hear upgrades are less of an Event for you


> I will say that I’ve seen a lot more vehement trash talking about mypy and gushing about pyright than vice versa for quite a few years. It doesn’t quite add up in my mind.

agreed! mypy's been good to us over the years.

The biggest problem we're looking to solve now is raw speed: type checking is by far the slowest part of our pre-commit stack, which is what got us interested in ty.


I jumped through a bunch of hoops to get Claude Code to run as a dedicated user on macOS. This allowed me to set the group ownership and permissions of my work to control exactly what Claude can see. With a few one-liner bash scripts to recursively set permissions it worked quite well. Getting the OAuth token into that user's keychain was an utter pain though. Claude Code does a fancy authorization flow that puts the token into the current user's login keychain, and getting it into the other user's login keychain took a lot of futzing. Maybe there is a cleaner way that I missed.

When that token expired I didn't have the patience to go through it again. Using an API key looked like it would be easier.

If this is of interest to anyone else, I filed an issue that has so far gone unacknowledged. Their ticket bot tried to auto-close it after 30 days, which I find obnoxious. https://github.com/anthropics/claude-code/issues/9102#issuec...

