abelanger's comments

Definitely understand the frustration. The difficulty of Hatchet being general-purpose is that being performant for every use-case can be tricky, particularly when combining many features (concurrency, rate limiting, priority queueing, retries with backoff, etc.). We should be more transparent about which combinations of use-cases we're focused on optimizing.

We spent a long time optimizing the single-task FIFO use-case, which is what we typically benchmark against. Performance for that pattern is I/O-bound at >10k tasks/s, which is a good sign (we just need better disks). So a pure durable-execution workload should perform very well.

We're focused on improving multi-task and concurrency use-cases now. Our benchmarking setup recently added support for those patterns. More on this soon!


Hatchet is not stable.


Thanks! Would love to hear more about what type of agent you're building.

We've heard pretty often that durable execution is difficult to wrap your head around, and we've also seen more of our users (including experienced engineers) relying on Cursor and Claude Code while building. So one of the experiments we've been running is ensuring that agent code written by LLMs is durable, by using our MCP server so that coding agents follow best practices while generating code: https://pickaxe.hatchet.run/development/developing-agents#pi...

Our MCP server is super lightweight and basically just tells the LLM to read the docs here: https://pickaxe.hatchet.run/mcp/mcp-instructions.md (along with some tool calls for scaffolding)

I have no idea if this is useful or not, but we were able to get Claude to generate complex agents which were written with durable execution best practices (no side effects or non-determinism between retries), which we viewed as a good sign.
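
To make "no side effects or non-determinism between retries" concrete, here's roughly the shape of code we nudge the model toward. This is just a sketch with a stand-in step helper, not Pickaxe's actual API:

    package agent

    import "time"

    // durableStep is a stand-in for whatever step primitive the engine
    // provides: a real engine records the result on first execution and
    // replays it on retries instead of re-running fn.
    func durableStep[T any](name string, fn func() (T, error)) (T, error) {
        return fn() // memoization via the event history omitted here
    }

    // Calling time.Now() inline would produce a different deadline on
    // every replay; wrapping it in a step pins the value at first run.
    func deadline() (time.Time, error) {
        return durableStep("deadline", func() (time.Time, error) {
            return time.Now().Add(time.Hour), nil
        })
    }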


Thanks! Our favorite resources on this (both have been posted on HN a few times):

- https://www.anthropic.com/engineering/building-effective-age...

- https://github.com/humanlayer/12-factor-agents

That's also why we implemented pretty much all relevant patterns in the docs (e.g. https://pickaxe.hatchet.run/patterns/prompt-chaining).

If there's an example or pattern that you'd like to see, let me know and we can get it released.


For an agent that executes locally, or an agent that doesn't execute very often, I'd agree it's arbitrary.

But programming languages make tradeoffs on those very paths (particularly spawning child processes and communicating with them, how underlying memory is accessed and modified, garbage collection).

Agents often involve a specific architecture that's useful for a language with powerful concurrency features. These features differentiate the language as you hit scale.

Not every language is equally suited to every task.


OP here - this type of "checkpoint-based state machine" is exactly what platforms which offer durable execution primitives like Hatchet (https://hatchet.run/) and Temporal (https://temporal.io/) are offering. Disclaimer: am a founder of Hatchet.

These platforms store an event history of the functions which have run as part of the same workflow, and automatically replay those when your function gets interrupted.
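
A rough sketch of that mechanism (hypothetical types; a real engine persists the history durably and handles serialization failures):

    package durable

    import "encoding/json"

    // History is a stand-in for the persisted per-run event log.
    type History struct{ events map[string][]byte }

    func (h *History) lookup(name string) ([]byte, bool) { v, ok := h.events[name]; return v, ok }
    func (h *History) record(name string, v []byte)      { h.events[name] = v }

    // Step executes fn at most once per workflow run. When the process
    // is interrupted and the workflow replays, the recorded result is
    // returned instead of re-running the side effect.
    func Step[T any](h *History, name string, fn func() (T, error)) (T, error) {
        var out T
        if raw, ok := h.lookup(name); ok {
            err := json.Unmarshal(raw, &out)
            return out, err
        }
        out, err := fn()
        if err != nil {
            return out, err
        }
        raw, _ := json.Marshal(out)
        h.record(name, raw)
        return out, nil
    }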

I imagine synchronizing memory contents at the language level would be much more overhead than synchronizing at the output level.


This is also how our orchestrator (written in Go) is structured. JP describes it pretty well here (it's a durable log implemented with BoltDB).

https://fly.io/blog/the-exit-interview-jp/
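
At its core, a durable log like that is just an append-only bucket keyed by sequence number. A minimal sketch using the bbolt fork (not our actual code):

    package durablelog

    import (
        "encoding/binary"
        "encoding/json"

        bolt "go.etcd.io/bbolt"
    )

    // appendEvent appends one event to a per-workflow bucket. Keys are
    // monotonically increasing sequence numbers, so replaying the
    // bucket in key order reproduces the original execution order.
    func appendEvent(db *bolt.DB, workflowID string, event any) error {
        return db.Update(func(tx *bolt.Tx) error {
            b, err := tx.CreateBucketIfNotExists([]byte(workflowID))
            if err != nil {
                return err
            }
            seq, err := b.NextSequence()
            if err != nil {
                return err
            }
            key := make([]byte, 8)
            binary.BigEndian.PutUint64(key, seq)
            val, err := json.Marshal(event)
            if err != nil {
                return err
            }
            return b.Put(key, val)
        })
    }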


Nice! It makes a lot of sense for orchestrating infra deployments -- we also started exploring Temporal at my previous startup for many of the same reasons, though at one level higher to orchestrate deployment into cloud providers.


Yep, though I haven’t used them, I’m vaguely aware that such things exist. I think they have a long way to go to become mainstream, though? Typical Go code isn’t written to be replayable like that.


I think there's a gap between people familiar with durable execution and those who use it in practice; it comes with a lot of overhead.

Adding a durable boundary (via a task queue) in between steps is typically the first step, because you at least get persistence and retries, and for a lot of apps that's enough. It's usually where we recommend people start with Hatchet, since it's just a matter of adding a simple wrapper or declaration on top of the existing code.

Durable execution is often the third evolution of your system (after the first pass with no durability, then adding a durable boundary).
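
To illustrate, the boundary can be as thin as this (a hedged sketch, not Hatchet's actual API): persist the task before running it, retry on failure, and your existing handler doesn't change.

    package boundary

    import "encoding/json"

    // Task is what crosses the durable boundary: once enqueued, a crash
    // in the worker loses nothing because the payload is persisted.
    type Task struct {
        Name    string          `json:"name"`
        Payload json.RawMessage `json:"payload"`
        Attempt int             `json:"attempt"`
    }

    const maxAttempts = 5

    // runWithRetries wraps an existing handler with retries; requeue
    // stands in for whatever backing store you use (backoff omitted).
    func runWithRetries(t Task, handler func(json.RawMessage) error, requeue func(Task)) error {
        if err := handler(t.Payload); err != nil {
            if t.Attempt+1 < maxAttempts {
                t.Attempt++
                requeue(t)
                return nil
            }
            return err // surface to a dead-letter queue after max attempts
        }
        return nil
    }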


What are the main differences between temporal and hatchet?


The primary difference is that Hatchet is an all-purpose platform for async jobs, so while durable execution is a pattern that we support, we have a lot of other features like concurrency and fairness control, event ingestion, custom queues, dynamic rate limiting, streaming from a background job, monitoring, alerting, DAG-based executions, etc. There's a bit more on this/our architecture here: https://news.ycombinator.com/item?id=43572733.

The reason I started working on Hatchet was because I'm a huge advocate of durable execution, but didn't enjoy using Temporal. So we try to make the development experience as good as possible.

On the underlying durable execution layer, it's the exact same core feature set.


This reads more like a pitch for open-source than anything else.

> Switching out something, even if it's open source and self-hosted, means that you're rewriting a lot of code.

The point of something open-source and self-hosted is that it resolves nearly all of the "taxes" mentioned in the article. What the article refers to as the discovery, sign-up, integration, and local development taxes are all easily solved by a good open-source local-development story.

The "production tax" (is tax the right word?) can be resolved by contributions or a good plugin/module ecosystem.


Open source is free if your time is worth nothing.


Some people just don't understand business.

People are going to find out why companies pay top dollar for closed-source alternatives vs open-source products.


I'm a big fan of https://github.com/humanlayer/12-factor-agents because I think it gets at the heart of engineering these systems for usage in your app rather than a completely unconstrained demo or MCP-based solution.

In particular you can reduce most concerns around security and reliability when you treat your LLM call as a library method with structured output (Factor 4) and own your own control flow (Factor 8). There should never be a case where your agent is calling a tool with unconstrained input.
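
Sketching Factors 4 and 8 together in Go (callModel and the schema here are hypothetical, just to show the shape):

    package agent

    import (
        "encoding/json"
        "fmt"
    )

    // SupportAction is the only shape the model may return; the model
    // never invokes tools itself.
    type SupportAction struct {
        Intent  string `json:"intent"` // "refund" or "escalate"
        OrderID string `json:"order_id"`
    }

    // callModel is a placeholder for your LLM client returning JSON.
    func callModel(prompt string) string {
        return `{"intent":"escalate","order_id":""}`
    }

    // classify treats the LLM call as a library method: prompt in,
    // validated struct out. The switch, not the model, owns control flow.
    func classify(userMsg string) (SupportAction, error) {
        var a SupportAction
        if err := json.Unmarshal([]byte(callModel(userMsg)), &a); err != nil {
            return a, fmt.Errorf("model returned invalid JSON: %w", err)
        }
        switch a.Intent {
        case "refund", "escalate":
            return a, nil
        default:
            return a, fmt.Errorf("unsupported intent %q", a.Intent)
        }
    }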


I guess I’ve got some reading and research ahead of me. I definitely would rather support the idea of treating LLM calls more like structured library functions, rather than letting them run wild.

Definitely bookmarking this for reference. Appreciate you sharing it.


> program building is an entropy-decreasing process...program maintenance is an entropy-increasing process, and even its most skillful execution only delays the subsidence of the system into unfixable obsolescence

> Only humans can decrease or resist complexity.

For a simple program, maintenance is naturally entropy-increasing: you add an `if` statement for an edge case, and the total number of paths/states of your program increases, which increases entropy.

But in very large codebases, it's more fluid, and I think LLMs have the potential to massively _reduce_ the complexity by recommending places where state or logic should be decoupled into a separate package (for example, calling a similar method in multiple places in the codebase). This is something that can be difficult to do "as a human" unless you happen to have worked in those packages recently and are cognizant of the pattern.


Congrats on the launch! I have a few questions (though I know very little about this space):

1. How often is the cause of a denied insurance claim a documentation error vs an intentional denial from an insurance company (either an automated system or medical reviewer)?

2. This feels very conceptually similar to an AI review bot, but the threshold for false positives feels higher. What does the process look like for double checking a false positive in the agent orchestration layer?


Thank you!

1. It really depends on the clinical specialty, but the average is around 25% (i.e. 250M claims a year denied because of documentation mistakes). We work with rehabs where this ratio is above 50%.

2. It's triple-checked: run the analysis twice and then verify the conclusion, so 3+ separate agent calls.
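
Roughly like this (a sketch; analyze/verify stand in for the separate agent calls):

    package review

    // Finding is a hypothetical result type for a documentation check.
    type Finding struct{ Issues []string }

    // analyze and verify are placeholders for separate agent calls.
    func analyze(chart string) (Finding, error)              { return Finding{}, nil }
    func verify(chart string, a, b Finding) (Finding, error) { return a, nil }

    // tripleCheck runs the analysis twice and then verifies the
    // conclusion: 3+ separate agent calls, as described above.
    func tripleCheck(chart string) (Finding, error) {
        a1, err := analyze(chart)
        if err != nil {
            return Finding{}, err
        }
        a2, err := analyze(chart)
        if err != nil {
            return Finding{}, err
        }
        return verify(chart, a1, a2) // disagreements go to human review
    }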


> We work with rehabs where this ratio is above 50%

Medical offices where more than 50% of the denied claims are because of documentation mistakes? I'm confused why they are still operating. Is this not malpractice of some kind?


For them it's a cost of doing business. Many of those claims will be paid after resubmission (once the mistakes are fixed, where possible), but the office has to operate with a higher amount of working capital in this case.


> For them it's a cost of doing business

"a cost of doing business" reads like they do not care that over half their denials are not real. You are working with business that do not care about humans and will use your tool to more extract profit. I'd rather them go out of business if they cannot get 50% of their claims done in a way that doesn't get them auto rejected.


Be careful what you wish for. We already have a shortage of healthcare providers. Do you want to make it harder to get an appointment? There's nothing wrong with billing at the maximum level legally and contractually allowed.


> There's nothing wrong with billing at the maximum level legally and contractually allowed.

I completely disagree with that statement. By nothing wrong, do you mean it isn't illegal?

It is horrible that some health care providers are denying more than half their claims incorrectly. A strawman argument about making it harder to get an appointment does not change that.


You seem to be mixing up providers and payers (insurers). Providers don't deny claims, but sometimes payers deny claims because the providers coded them incorrectly. There's nothing wrong with coding a claim with all services rendered and charging the maximum agreed rate. If the payers want a lower rate then they can negotiate it next time the contract is up for renewal.

Obviously it would be wrong for providers to bill for things they didn't actually do. That would be fraud. This part isn't in dispute.


Medical malpractice only applies to patient care, not administration. In other words, a provider can't be found liable for malpractice just because there's an error in the patient chart: there has to be some evidence of patient harm as well. Claim denials aren't a legal issue unless there is some sort of fraud or contract violation involved.


Now I understand your question. I can provide an example where a small documentation error may cause patient harm: wrong copy-pasting in a discharge note, copying from another person's medical record and accidentally pulling their diagnosis into the chart, and then another doctor factoring this diagnosis into their treatment plan.


I don't know then, gross negligence? IANAL, but why this is an acceptable and profitable way to operate is beyond me.


Errors on a medical chart by themselves wouldn't legally constitute gross negligence (or even regular negligence). But if there was some sort of preventable patient harm then such errors could be evidence for establishing malpractice liability and/or professional sanctions.

In general patients should always check their own chart notes for errors. Most providers now make those available online for free through a secure patient portal.


> Most execution environments are stateful (e.g., they may rely on running Jupyter kernels for each user session). This is hard to manage and expensive if users expect to be able to come back to AI task sessions later. A stateless-but-persistent execution environment is paramount for long running (multi-day) task sessions.

It's interesting how architectural patterns built at large tech companies (for completely different use-cases than AI) have become so relevant to the AI execution space.

You see a lot of AI startups learning the hard way the value of event sourcing and (eventually) durable execution, but these patterns aren't commonly adopted on Day 1. I blame the AI frameworks.

(disclaimer - currently working on a durable execution platform)
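
To make the event-sourcing connection concrete, a minimal sketch (hypothetical types): session state is never stored directly, only derived, so any stateless worker can resume a multi-day session by replaying the log.

    package session

    // Event is one entry in a session's append-only log.
    type Event struct {
        Kind string // e.g. "prompt", "tool_result", "model_output"
        Data string
    }

    // State is derived from the log, never persisted directly.
    type State struct{ Transcript []string }

    // rehydrate folds the event log into session state, which is what
    // lets a stateless worker pick up a long-running task session.
    func rehydrate(events []Event) State {
        var s State
        for _, e := range events {
            s.Transcript = append(s.Transcript, e.Kind+": "+e.Data)
        }
        return s
    }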


I see all of this as a constant negotiation of what is and isn't needed out of traditional computing. Eventually they find that what they want from any of it is determinism, unfortunately for LLMs.

