Hacker News: pawelduda's comments

This screams AI, 100%


Did anyone test it on 5090? I saw some 30xx reports and it seemed very fast


Incredibly fast. On my 5090 with CUDA 13 (and the latest diffusers, xformers, transformers, etc.), 9 sampling steps, and the "Tongyi-MAI/Z-Image-Turbo" model I get:

- 1.5s to generate an image at 512x512

- 3.5s to generate an image at 1024x1024

- 26s to generate an image at 2048x2048

It uses almost all of the 32 GB of VRAM and close to full GPU utilization. I'm using the script from the HF post: https://huggingface.co/Tongyi-MAI/Z-Image-Turbo
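For reference, the inference loop is roughly of this shape. This is a sketch, not the canonical script: the exact pipeline class and keyword arguments are assumptions, and the HF post linked above has the authoritative version.

```python
def generate(prompt, size=1024, steps=9):
    """Sketch of Z-Image-Turbo inference via diffusers; pipeline class and
    kwargs are guesses -- see the HF model card for the real script."""
    # Imported lazily so the sketch parses without a GPU stack installed.
    import torch
    from diffusers import DiffusionPipeline

    pipe = DiffusionPipeline.from_pretrained(
        "Tongyi-MAI/Z-Image-Turbo", torch_dtype=torch.bfloat16
    ).to("cuda")
    # 9 sampling steps matches the timings quoted above.
    return pipe(prompt, num_inference_steps=steps,
                height=size, width=size).images[0]
```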


Weird, even at 2048 I don’t think it should be using all your 32GB VRAM.


It stays around 26 GB at 512x512. I still haven't profiled the execution or looked much into the details of the architecture, but I would assume it trades off memory for speed by creating caches for each inference step.


IDK, seems odd. It's an 11 GB model; I don't know what it could be caching in RAM.


Even on my 4080 it's extremely fast, it takes ~15 seconds per image.


Did you use PyTorch Native or Diffusers Inference? I couldn't get the former working yet so I used Diffusers, but it's terribly slow on my 4080 (4 min/image). Trying again with PyTorch now, seems like Diffusers is expected to be slow.


Uh, not sure? I downloaded the portable build of ComfyUI and ran the CUDA-specific batch file it comes with.

(I'm not used to using Windows and I don't know how to do anything complicated on that OS. Unfortunately, the computer with the big GPU also runs Windows.)


Haha, I know how it goes. Thanks, I'll give that a try!

Update: works great and much faster via ComfyUI + the provided workflow file.


Sounds plausible but I guess it's something that they would've confirmed, had it been true

Or it was ABS-CF but they forgot to dry the filament /s


Unless you know and trust person X, you don't want to authorize and interact with such contracts. Scammers will leave loopholes in code so they can, for example, grab all funds deposited to the contract.

Normal contracts that involve money operations would have safeguards that disallow the owner from touching balances that are not theirs. But there are billions of creative attack vectors to bypass that, either by that person X or by any third party.
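As a toy illustration of the safeguard meant here (plain Python rather than Solidity, with made-up names): the withdrawal path checks the caller's own balance, so even the owner can't drain other depositors.

```python
class ToyVault:
    """Toy model of a contract holding per-depositor balances."""

    def __init__(self, owner):
        self.owner = owner
        self.balances = {}

    def deposit(self, who, amount):
        self.balances[who] = self.balances.get(who, 0) + amount

    def withdraw(self, who, amount):
        # Safeguard: you can only take out what you put in,
        # regardless of whether you're the owner.
        if self.balances.get(who, 0) < amount:
            raise PermissionError("cannot touch balance that is not yours")
        self.balances[who] -= amount
        return amount
```

A scam contract is one that quietly omits or back-doors this check, e.g. an owner-only drain function that ignores `balances` entirely.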


The end effect certainly gives off an "understanding" vibe, even if the method of achieving it is different. The commenter obviously didn't mean it the way a human brain understands.


Why is this particular benchmark important?


Thus far, this is one of the best objective evaluations of real world software engineering...


I concur with the other commenters, 4.5 is a clear improvement over 4.


Idk, Sonnet 4.5 scores better than Sonnet 4.0 on that benchmark, but is markedly worse in my usage. The utility of the benchmark is fading as it is gamed.


I think I and many others have found Sonnet 4.5 to generally be better than Sonnet 4 for coding.


Maybe if you conform to its expectations for how you use it. 4.5 is absolutely terrible at following directions, thinks it knows better than you, and will gaslight you until specifically called out on its mistake.

I have scripted prompts for long-duration automated coding workflows of the fire-and-forget, issue description -> pull request variety. Sonnet 4 does better than you'd expect: it generates high-quality mergeable code about half the time. Sonnet 4.5 fails literally every time.


I'm very happy with it TBH, it has some things that annoy me a little bit:

- slower compared to other models that will also do the job just fine (but excels at more complex tasks),

- it's very insistent on creating loads of .MD files with overly verbose documentation on what it just did (not really what I ask it to do),

- it actually deleted a file twice and went "oops, I accidentally deleted the file, let me see if I can restore it!". I haven't seen this happen with any other agent, and the task wasn't even remotely about removing anything


The last point is how it usually fails in my testing, fwiw. It usually ends up borking something up, and rather than back out and fix it, it does a 'git restore' on the file - wiping out thousands of lines of unrelated, unstaged code. It then somehow thinks it can recover this code by looking in the git history (??).

And yes, I have hooks to disable 'git reset', 'git checkout', etc., and warn the model not to use these commands and why. So it writes them to a bash script and calls that to circumvent the hook, successfully shooting itself in the foot.

Sonnet 4.5 will not follow directions. Because of this, you can't stop it from destroying the worktree state the way you could with earlier models. For longer-running tasks, the probability of it doing this at some point approaches 100%.


> The last point is how it usually fails in my testing, fwiw. It usually ends up borking something up, and rather than back out and fix it, it does a 'git restore' on the file - wiping out thousands of lines of unrelated, unstaged code. It then somehow thinks it can recover this code by looking in the git history (??).

Man I've had this exact thing happen recently with Sonnet 4.5 in Claude Code!

With Claude I asked it to try tweaking the font weight of a heading to put the finishing touches on a new page we were iterating on. Looked at it and said, "Never mind, undo that" and it nuked 45 minutes worth of work by running git restore.

It immediately realized it fucked up and started running all sorts of git commands and reading its own log trying to reverse what it did, and then came back 5 minutes later saying "Welp, I lost everything, do you want me to manually rebuild the entire page from our conversation history?"

In my CLAUDE.md I have instructions to commit unstaged changes frequently but it often forgets and sure enough, it forgot this time too. I had it read its log and write a post-mortem of WTF led it to run dangerous git commands to remove one line of CSS and then used that to write more specific rules about using git in the project CLAUDE.md, and blocked it from running "git restore" at all.
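The "blocked it from running git restore" part boils down to a predicate over the command string, something like the sketch below (the actual hook wiring and JSON plumbing in Claude Code are not shown, and the denylist patterns here are illustrative):

```python
import re

# Illustrative denylist of git subcommands that can discard uncommitted work.
DANGEROUS = re.compile(r"\bgit\s+(restore|checkout|reset)\b")

def should_block(command: str) -> bool:
    """Return True if a shell command should be refused by the hook."""
    return bool(DANGEROUS.search(command))
```

As noted elsewhere in the thread, a command-string check like this is easy for the model to sidestep by writing the command into a script and executing that, so it's mitigation, not prevention.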

We'll see if that did the trick but it was a good reminder that even "SOTA" models in 2025 can still go insane at the drop of a hat.


The problem is that I'm trying to build workflows for generating sequences of good, high quality semantically grouped changes for pull requests. This requires having a bunch of unrelated changes existing in the work tree at the same time, doing dependency analysis on the sequence of commits, and then pulling out / staging just certain features at a time and committing those separately. It is sooo much easier to do this by explicitly avoiding the commit-every-2-seconds workaround and keeping things uncommitted in the work tree.

I have a custom checkpointing skill that I've written that it is usually good about using, making it easier to rewind state. But that requires a careful sequence of operations, and I haven't been able to get 4.5 to not go insane when it screws up.

As I said though, watch out for it learning that it can't run git restore, so it immediately jumps to Bash(echo "git restore" >file.sh && chmod +x file.sh && ./file.sh).


I think this is probably just a matter of noise. That's not been my experience with Sonnet 4.5 too often.

Every model from every provider at every version I've used has intermingled brilliant perfect instruction-following and weird mistaken divergence.


What do you mean by noise?

In this case I can't get 4.5 to follow directions. Neither can anyone else, apparently. Search for "Sonnet 4.5 follow instructions" and you'll find plenty of examples. The current top 2:

https://www.reddit.com/r/ClaudeCode/comments/1nu1o17/45_47_5...

https://theagentarchitect.substack.com/p/claude-sonnet-4-pro...


Not my experience at all, 4.5 is leagues ahead of the previous models, albeit not as good as Gemini 2.5.


I find 4.5 a much better model FWIW.


1. Set up SSH access to your PC (I recommend tailscale)

2. Install tmux and start a session on your PC so your shell stays alive and survives disconnects and whatnot.

3. Install Termux, SSH into your PC, attach to the tmux session.

Bonus: tmux layout will scale nicely to your phone aspect ratio

You can also set up a mosh connection if you expect the signal to drop frequently due to poor network quality, etc.


Don't forget to intensely shake your head after consumption for a proper brain flush


Thoughts on possible implications for users in the foreseeable future? We built a lot using dbt and can't really think of going back or switching to alternatives.


Fivetran isn't really much of a transformation layer, so this is likely just a move to lock in customers of both companies by upselling an ingestion/transformation layer to existing customers.

The bigger question mark to me is that Fivetran recently acquired Tobiko, the company behind a dbt competitor SQLMesh. The Tobiko team said their focus has been on dbt-compatibility because a lot of Fivetran customers use dbt for their transformation layer. I fear it may have just been a way to get rid of competition leading up to this deal. I can't imagine Fivetran spent a ton of money just to have 2 products that do very similar things.

We use both open-source SQLMesh as well as their cloud offering Tobiko Cloud. Following the acquisition, we were annoyed that focus was going to go to dbt compatibility because there was a bunch of stuff on their roadmap that would help us that was now deprioritized. Thankfully, they still offer great support to us and delivered a few features that have given us some quality of life improvements. With this announcement, I'm worried we're going to end up being forced to migrate to dbt...


Full disclosure: I am a PM at Fivetran who is very excited about this.

We are fully committed to open source dbt and don't want to build a 'walled garden'. Interoperability is one of the key value propositions of both Fivetran and dbt. While I'm biased, I think the main implications for users is that their favorite tooling will be with one vendor who cares about what makes them great.

You can read a bit more here: https://www.fivetran.com/blog/the-era-of-open-data-infrastru...


Thanks!


What's the lock-in?


I'm wondering too. We run dbt on-prem. Worst that could happen is we don't get any more free updates. But we have the software and it will continue to run.


The concern is that dbt-core will become stagnant.


It basically already has since they started developing dbt Fusion, so in that respect this probably doesn’t change much.

I expect they’ll keep developing Fusion but possibly as even more of a commercial-only offering than it already was.


What's the problem specifically? Are you banking on some future features? Can't fix the bugs yourself? Worried it won't be compatible with future data warehouses?

I know people don't like it these days, but you can just continue to run old software.


dbt in particular is effectively useless without maintained and up-to-date connectors to your particular database.


Do your database vendors often do breaking changes to their protocols? dbt just generates SQL and that's not going anywhere.


The cloud component is probably sticky if you have come to rely on those parts.


Are you using their cloud offering or just the software itself?


Just the software


LLMs have brought back my excitement with coding.

At work, they help me to kickstart a task - taking the first step is very often the hardest part. It helps me grok new codebases and write boring parts.

But side projects are where the real fun starts - I can materialize random ideas extremely quickly. No more hours spent on writing boilerplate or fighting the tooling. I can delegate the parts I'm not good at to the agent. Or one-prompt a feature; if I don't like the result or it doesn't work, I roll it back.

