Quality indie software in a niche that Ikea isn't addressing can make a decent income, unlike a lemonade stand.
And unlike at (this hypothetical) Ikea, you wouldn't have to maintain the impression of 20x AI-augmented output to avoid being fired. You could still use AI as much as you want; you just wouldn't have to keep proving you're not underusing it.
Evals let us agree on the baseline, the measurement, etc, and check whether simple things others do perform just as well. For the same reason, instead of 'works on my box' and 'my coding style', use one of the many community evals rather than making up your own benchmark.
That heads off many of the unfalsifiable discussions & claims happening here and moves everyone forward.
A Rust version of that compiler (which the project runs on) ran at 480k claims/sec and deterministically resolved 83% of conflicts across 1 million concurrent agents (also a 393,275x input-to-output compression reduction at 1M agents, though different topics can make the compression vary).
Natively, Claude (and other LLMs) resolve conflicting claims at about a 51% rate (based on internal research).
The built-in Byzantine fault tolerance (again, in the compiler) is also pretty remarkable: it can find the right answer even if 93% of the agents/data are malicious (with only 7% of agents/data telling us the correct information).
Basically, the idea is that if you want to build autonomous systems at scale, you need to resolve disagreement at scale, and this project does a pretty nice job of that.
My question was about claims like "5x productivity boost in merged PRs (lots of open PR & merge rate goes down, but net positive)". E.g., does this change anything on SWE-bench or any other standard coding eval?
The ecosystem is 8 tools plus a Claude Code plugin; the unlock was composing those tools (I don't regularly use all 9). The 5x claim came from /insights (Claude Code).
Not for everyone, but it radically changed how I build. Senior engineer, 10+ years
Now it's trivial to run multiple projects in parallel across Claude sessions (this wasn't really manageable before using wheat)
Genuinely don't remember the last time I opened a file locally
It sounds like the answer is "No: there is no repeatable eval of the core AI coding productivity claim, definitely not on one of the many community AI coding benchmarks used for understanding & comparison, and there won't be."
Maybe there's a fundamental miscommunication here about what evals are?
Evals apply not just to LLMs but to skills, prompts, tools, and most things that change the behavior of compound AI systems, and especially to productivity claims like the ones being put forth in this thread.
The features in the post relate directly to heavily researched areas of agents that are regularly benchmarked and evaluated. They're not obscure; e.g., another recent HN frontpage item benchmarked on research and planning.
Your question makes sense, it's just not in the current scope.
We're still benchmarking the compiler at scale, and the LLM tools were created as functional prototypes to showcase a single example of the compiler's use cases.
Since much of the unlock here is finding different applications for the compiler itself, we simply don't have the bandwidth to do much benchmarking on these projects on top of maintaining the repos themselves.
All the code is open source, and there's nothing stopping anyone from running their own benchmarks if they're curious.
Speaking of embeddable, we just announced Cypher syntax for GFQL, making it the first OSS CPU/GPU Cypher query engine you can use on dataframes.
It's typically used with scale-out DBs like Databricks & Splunk for analytical apps: security/fraud/event/social data analysis pipelines, ML+AI embedding & enrichment pipelines, etc. We originally built it for the compute-tier gap here, to help Graphistry users who are making embeddable interactive GPU graph viz apps and dashboards and don't want to add an external graph DB phase to their interactive analytics flows.
We took a multilayer approach to the GPU & vectorization acceleration, including a more parallelism-friendly core algorithm. This makes fancy features pay-as-you-go instead of dragging everything down, as in most of the columnar engines that are appearing. Our vectorized core already conforms to over half of the openCypher TCK, and we're working to add the trickier bits at different layers now that the flow is established.
The core GFQL engine has been in production for a year or two now with a lot of analyst teams around the world (NATO, banks, US gov, ...) because it's part of Graphistry. The open-source Cypher support is us starting to make it easy for others to use directly as well, including LLMs :)
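If you want a concrete feel before the Cypher layer: here's a minimal GFQL sketch over plain pandas dataframes using PyGraphistry's chain() API (the dataframes and filters are invented for illustration, and the new Cypher string entry point may differ from this, so check the repo for the exact syntax):

    # Minimal GFQL-on-dataframes sketch; data & filters are made up
    import pandas as pd
    import graphistry
    from graphistry import n, e_forward

    edges = pd.DataFrame({
        'src':  ['a', 'a', 'b', 'c'],
        'dst':  ['b', 'c', 'c', 'd'],
        'kind': ['pays', 'pays', 'refers', 'pays'],
    })
    nodes = pd.DataFrame({'id': ['a', 'b', 'c', 'd']})

    g = graphistry.nodes(nodes, 'id').edges(edges, 'src', 'dst')

    # One-hop traversal: start at node 'a', follow outgoing 'pays' edges
    hits = g.chain([n({'id': 'a'}), e_forward({'kind': 'pays'}), n()])
    print(hits._nodes)  # matched nodes, as a dataframe
    print(hits._edges)  # matched edges, as a dataframe

The Cypher announcement is sugar over the same engine, so the same traversal should be expressible as a MATCH string.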
Specifically, these big companies revenue-share with app companies, who in turn increase monetization by selling your private information, especially via free apps. In exchange for Apple et al's super-high app store rake, they claim to run security vetting programs and ToS that vet who they do business with, and they tell users & courts that things are safe, even when they know they're not.
It's not rocket science for phone OSes to figure out who these companies are and, since iOS/Android users already get tracked by Apple/Google/etc, to triangulate which apps are participating.
I'm game for throwing rocks at Apple and Google, but I don't get this one.
> consumer apps embed ad SDKs → those SDKs feed location signals into RTB ad exchanges → surveillance-oriented firms sit in the RTB pipeline and harvest bid request data even without winning auctions
Would you ban ad-supported apps? Assuming the comment you're responding to is realistic, I'm not sure how the OS is to blame.
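For what it's worth, the mechanics of the quoted pipeline are pretty concrete: in OpenRTB-style exchanges, every bidder receives the bid request whether or not it wins. A rough sketch of the payload shape (field names per the OpenRTB 2.x spec; values invented):

    # Rough OpenRTB 2.x-style bid request (invented values); every bidder
    # on the exchange receives this payload, win or lose
    bid_request = {
        "id": "req-123",
        "app": {"bundle": "com.example.flashlight"},  # which app the user is in
        "device": {
            "ifa": "38400000-8cf0-11bd-b23e-10b96e40000d",  # ad ID links requests over time
            "geo": {"lat": 37.7749, "lon": -122.4194, "type": 1},  # type 1 = GPS-derived
        },
        "imp": [{"id": "1", "banner": {"w": 320, "h": 50}}],
    }
    # A surveillance "bidder" only has to log bid_request to build a
    # location history; it never needs to win the auction.

So the harvesting part is real; the open question is whether policing it is the OS vendor's job.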
Neither big player has refined-enough permissions. These set users up to give away more data than they think.
Maybe one clear example is an app needing a permission once for setup, which then remains persistent.
An easy demonstration is just looking at what Graphene has done. It's open source, and you want to say Google can't protect their users better? Certainly Graphene has some advanced features, but not everything can be dismissed so easily. Besides, just throw advanced features behind a hidden menu (which they already have!). There's no reason you can't make most users happy while also catering to power users (they'll always complain, but that's their job).
I'm not sure that's how corporate blame works. The CEO signed off on the CIO's proposal to streamline data analytics logs via WeTotallyWontSiphonOffYourDataAndSellIt Incorporated for user-improvement purposes, which happens to be owned by the CFO's brother-in-law. How were the CIO and CEO to know that a third party was selling off the data, and how was that third party to know that the sale of the data to another party who then onsold the data to the FBI would be illegal?
> How were the CIO and CEO to know that a third party was selling off the data, and how was that third party to know that the sale of the data to another party who then onsold the data to the FBI would be illegal?
Ask yourself the same question about personal health data and the answer reveals itself: the CEO and CIO know (or should know) that the vendor needs to be HIPAA-compliant or it's their necks (the CEO's and CIO's), so they look for a vendor who advertises as being HIPAA-compliant.
Pass legislation to the same effect for all PII and the CEO and CIO will then make requirements of the vendor. If the vendor lies, they get fired because the company hiring them is culpable. The vendor may also be subject to civil and/or criminal penalties. It seems simple, other than the fact that we have a federal legislature with no apparent interest in solving this problem, alongside a populace which either doesn't notice or doesn't care about that.
To answer the question more pithily: communication.
In regulated industries, like finance and taxation, regulators deliberately assign responsibility to individuals, so misconduct doesn’t get lost inside the company or within its corporate stakeholder network. That removes a lot of friction once you want to hold someone liable.
I read the parent comment as an implicit proposal to establish similar structures in tech.
If I were simultaneously also the owner of the ad platform, I'd fix it & knock out the bad players, or get ready to be sued for a decade+ of knowing malpractice.
And if I were a US citizen watching the companies involved get sued for being monopolies and abusing their position, and then seeing them cry "security" in court while knowingly doing this for a decade+, I'd feel frustrated with successive left + right US administrations & voters.
If Google & Apple & friends refused to take a rake and opened distribution, then I'd agree: net neutrality etc, not their problem.
But they own so much, so deep into the pipeline, and justify their fees to courts with "security"... and then don't do investigations. They employ some of the best security analysts in the world and have $10-30B/yr in revenue tied to app store fees alone, so they very much could take a big bite out of this if they wanted to.
> They employ some of the best security analysts in the world and have $10-30B/yr in revenue
I'll never not be impressed by how many people will defend trillion-dollar organizations and say that things are too expensive. Especially when open source projects (including forks!) implement such features.
I'm completely with you: they could do these things if they wanted to. They have the money. They have the manpower. It's just a matter of priority. And let's be honest, they're spending larger amounts on slop than on actual fixes or even on making their products better (for the user).
“Priorities” is far too soft a term in this context. These are anti-priorities: not just things they choose not to work on, but things they’ll spend big money to prevent, up to and including bribing, uh I mean lobbying, lawmakers.
Ultimately, the fact that ad SDKs have such wide access to location information is a choice by the platforms. I've long wanted meaningful process isolation between an app and its ad SDKs, but right now there are oodles of them that just squat on location data whenever the app requests it.
Apple supposedly does this with the privacy report cards.
However, I'd be shocked if a cursory audit comparing SDKs embedded in apps and disclosed data sales showed they were effectively enforcing anything at all.
Do people really still think advertising has a legitimate function?
Really, these days it's 95% psychological manipulation to get people to buy inferior-quality stuff they don't need, and 5% people actually finding what they're looking for.
Don't forget, most advertising can work fine in a "pull" mode: I need something, so I go out and look for it. These days that's something like Google (not ideal, because results are also manipulated by the highest bidder), or dedicated forums or a subreddit for real people's experiences. In the old days it would have been the yellow pages or asking a friend.
It's everyone. Especially Google, but all the big tech companies play in the same pool. Amazon, Google, Apple, Meta, etc make money selling ads, which ultimately enables the tools that result in data harvesting from everyone across the internet. I wrote a little data investigation [1] (mostly finished) that showcases how every major news organization across the globe that I scanned had some level of data collection integrated. This is just one industry, but it's an important one (it connects back to the incentives these media organizations have, which is to make money by selling ads at any cost). The EFF also released a piece on how the bidding process to buy ads is itself a massive privacy nightmare [2].
Yeah, but unlike Facebook, they weren't just caught making videos of people having sex and then paying people to watch the videos.
Also, unlike Facebook, they weren't just caught running a dark-money lobbyist network with the goal of forcing more collection of minors' private information.
Facebook is evil for many different reasons, but for a government looking to spy on its own citizens, Cloudflare is a much more attractive target. That said, I have no doubt that they're collecting copious amounts of data from both companies, whether by sale or by force.
The interesting part is that Google & Apple, as part of explaining to courts why their large app store fees are legit and not proof of monopoly positions, hid behind the security argument that they need to be the clearinghouse for what software runs on the devices. Except... they've knowingly punted on this one for 10+ years.
I would 100% agree that losing privacy through any utility-level carrier (credit cards, phone, OS provider, etc) should be disallowed by default, with any opt-ins getting a clear transparency mode and an easy opt-out. At least two areas where the US can learn from the EU on digital policy are digital marketplaces and consumer privacy protection, and this topic sits at the intersection of both.
Once my code exists and passes tests, I generally move on to having it iteratively hunt for bugs, security issues, and DRY code-reduction opportunities until it stops finding worthwhile ones.
This doesn't always work as well as I'd like, but it largely does enough. Conversely, doing this as I go has been a waste of time.
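The loop itself is nothing fancy. A rough sketch of its shape, assuming Claude Code's headless -p mode (the prompt wording and the NO_FINDINGS stop convention are mine, not a built-in):

    # Iterative AI bug-hunting sketch; assumes Claude Code's `claude -p`
    # headless mode, with a made-up NO_FINDINGS sentinel as the stop signal
    import subprocess

    PROMPT = (
        "Hunt for one worthwhile bug, security issue, or DRY reduction in "
        "this repo, fix it, and keep the tests passing. If nothing "
        "worthwhile remains, reply exactly NO_FINDINGS."
    )

    for _ in range(10):  # cap rounds so the loop can't run forever
        out = subprocess.run(
            ["claude", "-p", PROMPT], capture_output=True, text=True
        ).stdout
        if "NO_FINDINGS" in out:
            break  # diminishing returns reached: stop hunting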
The phenomenon you're describing is why Cobol programmers still exist and, simultaneously, why it's increasingly irrelevant to most programmers.
The killer feature is the ecosystem: easily and reliably reusing other libraries and tools that work out-of-the-box with other Python code written in the last few years. There are individually neato features motivating the effort involved in upgrading a widely-used language & engine as well, but that kind of thinking unfortunately misses the forest for the trees.
It's a bit surprising to me, in the age of AI coding, that this is still a problem. Most features seem friendly to bootstrapping with automation (ex: f-strings that support ' not just "), and it'd be interesting if any don't fall in that camp. The main discussion still seems to be framed by the 2024 comments, before Claude Code etc became widespread: https://github.com/orgs/pypy/discussions/5145
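(If the parenthetical refers to the Python 3.12 f-string grammar rework, that's PEP 701: expressions inside an f-string may now reuse the f-string's own quote character. My guess at the feature meant, e.g.:)

    # PEP 701 (Python 3.12): f-string expressions may reuse the same quotes
    name = "world"
    print(f"hello {"dear " + name}")  # valid on 3.12+, SyntaxError on 3.11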
The alternative is when you run a script that you last used a few years ago and now need again for some reason (very common in research), and you can end up spending way too much time making it work with your now-upgraded stack.
Sure, you could have pinned dependencies, but that's a lot of overhead for a random script...
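That overhead has shrunk a lot, for what it's worth: PEP 723 inline script metadata lets a one-off script pin its own dependencies in a comment header, and runners like uv rebuild a matching environment on demand. E.g.:

    # analysis.py - deps pinned inline per PEP 723; `uv run analysis.py`
    # recreates a matching environment, even years later
    # /// script
    # requires-python = ">=3.11"
    # dependencies = [
    #     "pandas==2.2.0",
    #     "matplotlib==3.8.2",
    # ]
    # ///
    import pandas as pd

    print(pd.__version__)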
We can play that game: items like GIL-free interpreters and memory views are pretty relevant to folks on the more demanding side of scientific computing. But my point is that this is a head-in-the-sand game when the community vastly outweighs any individual feature. My experience with the scientific computing community is that the non-PyPy portion of it is much bigger.
I'm not a PyPy maintainer, so my only horse in this race is believing CPython folks benefit from seeing the PyPy community prove Things Can Be Better. Part of that means I'd rather PyPy live on by avoiding unforced errors.
I liked that they did this work + its sister paper, but disliked how it was positioned, basically opposite of the truth.
The good: it shows that on one kind of benchmark, some flavors of agentically-generated docs don't help on that task. So naively generating these, for one kind of task, doesn't work. Thank you, useful to know!
The bad: some people assume this means these files don't work in general, or that automation can't generate useful ones.
The truth: instruction files help measurably, and just a bit of engineering lets you guarantee high scores for the typical cases. As soon as you have an objective function, you can flip it into an eval and set an AI coder to editing these files until they work.
Ex: we recently released https://github.com/graphistry/graphistry-skills for more easily using Graphistry via AI coding, and by having our authoring AI loop a bit with our evals, we jumped the scores from a 30-50% success rate to 90%+. As we encounter more scenarios (and mine them from our chats etc), it's pretty straightforward to flip them into evals and ask Claude/Codex to loop until those work well too.
We do these kinds of eval-driven AI coding loops all the time, and IMO how to engineer them should be the message, not that they don't work on average. Deeper example near the middle/end of the talk here: https://media.ccc.de/v/39c3-breaking-bots-cheating-at-blue-t...
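For anyone wanting the shape of that loop: each scenario becomes a tiny scored check, and the AI coder edits the instruction file until the score clears a bar. A rough sketch (the scenario format, fake_agent stub, and threshold are invented, not our actual harness):

    # Sketch of an eval for instruction files; scenarios & stub are invented
    scenarios = [
        {"task": "plot the graph", "must_contain": "g.plot()"},
        {"task": "color nodes by degree", "must_contain": "encode_point_color"},
    ]

    def score(run_agent) -> float:
        """Fraction of scenarios where the agent's output hits the marker."""
        hits = sum(s["must_contain"] in run_agent(s["task"]) for s in scenarios)
        return hits / len(scenarios)

    def fake_agent(task: str) -> str:
        return "g.plot()"  # stand-in; the real harness calls the AI coder

    print(score(fake_agent))  # 0.5 here; loop the AI on the instruction
                              # file and re-score until it clears, say, 0.9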
* Specification extraction. We have security.md and policy.md, often per module: threat model, mechanisms, etc. This is collaborative and gets checked in, for ourselves and for the AI. Policy is often tricky & malleable product/business/UX decision stuff, while security covers the technical layers that are more independent of that or of the broader threat model. (Rough sketch of the security.md shape after this list.)
* Bug mining. It's driven by the above and iterative: we keep running it to surface findings, adversarially analyze them, and prioritize them, repeating until diminishing returns wrt priority levels. It often leads to policy & security spec refinements. We use this pattern not just for security but for general bugs and other iterative quality & performance improvement flows - it's just a simple skill file with tweaks like parallel subagents to make it fast and reliable.
This lets the AI drive itself more easily, and in ways you explicitly care about vs noise.
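For reference, the rough shape of one of those per-module files (illustrative skeleton, not our literal template):

    # security.md (per module) - illustrative skeleton
    ## Threat model
    - Who can reach this module, with what privileges
    - Assets at stake (tokens, PII, the query engine)
    ## Mechanisms
    - Input validation & parse boundaries
    - Where authz checks are enforced
    - Logging/audit hooks
    ## Out of scope
    - Risks delegated to policy.md (product/business/UX decisions)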
In our evals for answering cybersecurity incident investigation questions, and even autonomously doing the full investigation, gpt-5.2-codex with low reasoning was the clear winner over non-codex or higher reasoning: 2x+ faster, higher completion rates, etc.
It was generally smarter than pre-5.2, so strategically better; codex likewise wrote better database queries than non-codex, and since it needs to iteratively hunt down the answer, it didn't run out the clock by drowning in reasoning.
And yes, sometimes it's nice to support a local lemonade stand. For my family's income, though, I know which segment I'd feel more confident working for.