I have the $100/mo Claude plan, I've used 5% of my weekly and it resets this evening. I'm not a heavy user, but I don't feel like a slouch either. I don't get how people are rolling through their usage so fast.
I should probably drop to that plan. I'm averaging around $800/mo in token usage based on ccusage, but I never hit plan limits or get told to wait. I've used it quite extensively this week with a lot of changes to local infrastructure, but /usage is still showing 0% utilization across both the current and weekly sessions.
I don't know about the author, but I recently saw an article where the creator of Claude Code apparently spins up multiple instances at once (note that it could have just been a marketing ploy to get people to use more tokens).
Just use Git worktrees and a lightweight sandbox (I like macOS's native sandbox-exec) and you can spawn as many sessions as you want. I've run upwards of 30 at once on my M2 Pro with no noticeable resource impact.
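For anyone who hasn't used worktrees before, here's a minimal sketch of the setup side (needs git on PATH; the paths, branch names, and the sandbox-exec profile mentioned in the comment are all placeholders, not a real config):

```python
# Sketch: one disposable git worktree per parallel agent session,
# so concurrent sessions never step on each other's working copies.
import os
import subprocess
import tempfile

def run(*args, cwd=None):
    subprocess.run(args, cwd=cwd, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)

root = tempfile.mkdtemp()
repo = os.path.join(root, "main")
run("git", "init", "-q", repo)
run("git", "-c", "user.email=me@example.com", "-c", "user.name=me",
    "commit", "-q", "--allow-empty", "-m", "init", cwd=repo)

# One worktree per session, each on its own branch:
sessions = []
for i in range(3):
    wt = os.path.join(root, f"session-{i}")
    run("git", "worktree", "add", "-q", "-b", f"session-{i}", wt, cwd=repo)
    sessions.append(wt)
    # Inside each worktree you'd then launch the sandboxed agent, e.g.
    # (macOS only; the .sb profile is whatever you normally use):
    #   sandbox-exec -f ~/agent.sb claude

print(len(sessions))
```

Each worktree is a full checkout sharing the same object store, so 30 of them cost almost nothing beyond the working files themselves.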
I would love to know more, but I'm so ignorant about working with multiple sessions in parallel that I don't know where to start. Do you have a blog or published article to share that might get people like me into a more educated space?
I tend to have multiple sessions running when doing stuff with CC, but rarely in the same space or domain. I'll have one working on some programming task, another making changes to my NAS, and another doing research on some other topic (most recently, networking gear upgrade paths).
I went from an M1 16GB to an M5 Pro 48GB. I'm running Qwen 3.5 on it locally. I've been sending it and Opus 4.6 the same prompts against identical copies of codebases, using Claude Code for both (with ollama serving Qwen). It's about 4x slower than sending the request to Opus, and the results are not nearly as good either.
One task that I sent to both was to make a website to search transcription files generated from video files that were also provided. I wanted the transcriptions displayed and clickable, so that clicking a line makes the video skip to that point in playback. The Opus website looked nice and worked well; Qwen couldn't get the videos to play.
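The core of that clickable-transcript feature is pretty small: map each transcript cue to a start time in seconds, and have the click handler set the video element's currentTime. A sketch of the timestamp side (the SRT-style input format and helper names here are my assumptions, not what either model actually produced):

```python
import re

def to_seconds(ts: str) -> float:
    """Convert an SRT/VTT-style timestamp ("HH:MM:SS,mmm") to seconds."""
    h, m, s = ts.replace(",", ".").split(":")
    return int(h) * 3600 + int(m) * 60 + float(s)

def parse_cues(transcript: str):
    """Pair each cue's start time with its text. In the page, clicking a
    rendered line would then run roughly: video.currentTime = cue["start"]."""
    pattern = re.compile(
        r"(\d{2}:\d{2}:\d{2}[.,]\d{3}) --> \d{2}:\d{2}:\d{2}[.,]\d{3}\n(.+)")
    return [{"start": to_seconds(t), "text": line}
            for t, line in pattern.findall(transcript)]

sample = """1
00:00:05,000 --> 00:00:08,000
hello and welcome

2
00:01:30,500 --> 00:01:33,000
second topic
"""
cues = parse_cues(sample)
print(cues[1]["start"])  # 90.5
```

Searching is then just a substring or full-text match over the cue texts, carrying the start time along with each hit.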
Now, for day-to-day tasks, the M1 wasn't a slouch, but the M5 Pro is still a big step forward in terms of performance.
That's helpful insight. My prediction is that as it keeps getting more expensive for the big players to run these models, we will start to see some kind of hybrid workload where they offload some of the work to your computer for smaller agents while keeping the orchestration and planning running in the data centers.
So I think the investment in the extra hardware is worth it, even if you don't currently plan on running LLMs locally.
I get you, but I also know there are better ways to operationalize local AI. Your POV is still super helpful context. A lot of local-vs-cloud discussion stops at "slower and worse," but the useful part is understanding where it broke down (model quality, tool use, runtime setup) rather than stopping at the task performance of the two in itself.
Cmd+Space to open spotlight, type in the first 3 or 4 letters of whatever you're trying to do (an application to open, or a system setting to change) and then Return gets me about where I need to go most of the time. Cmd+Tab and Cmd+` for window selection. I don't do much else on the OS itself so my bases are covered.
I had a fun one where Opus 4.6 could not properly export a 3D model to 3MF for multi-color 3D printing. Ultimately I ended up having it output each color individually and just importing them together in the slicer.