Hacker News | jacob019's comments

Grok, please drive me to synagogue. Doors lock. I'm sorry, Dave, I'm afraid I can't do that.


Destroy all humans.


Do we know parameter counts? The reasoning models have typically been cheaper per token, but use more tokens. Latency is annoying. I'll keep using gpt-4.1 for day-to-day.


I break out Gemini 2.5 Pro when Claude gets stuck; it's just so slow and verbose. Claude follows instructions better and seems to better understand its role in agentic workflows. Gemini does something different with the context: it has a deeper understanding of the control flow and can uncover edge-case bugs that Claude misses. o3 seems better at high-level thinking and planning, questioning whether the thing should be done at all and whether the challenge actually matches the need. They're kind of like colleagues with unique strengths. o3 does well with a lot of things; I just haven't used it as much because of the cost. Will probably use it more now.


Wild. What is the rationale?


Apparently to "keep the peace" and to "protect the children" but I couldn't find any good source on this.

Intuitively, it seems to me this is the most counterproductive law ever, as living with this doubt is the best way to destroy a family.


Right, but I think it's mainly about saving taxpayer money on child support by shifting the burden to men.


Protecting the kids, I think, because if the dad is not known then the mother will have to pay for the child alone (subsidized by the government). In France, around 3% of kids are raised by dads who don't know they are not the biological father. Personally I think this law is completely unfair, but in practice I think judges will not believe the one opposing the test.


You just can't order a test for someone else (your child) without their consent (so both parents, and a judge because parents don't have absolute rights over their children).

Courts order paternity tests just fine though when there is a reasonable doubt.

The people concerned can always refuse to be tested though.


I had this thought as well and find it a bit surprising. For my own agentic applications, I have found it necessary to carefully curate the context. Instead of including an instruction that we "may automatically attach", only include an instruction WHEN something is attached. Instead of "may or may not be relevant to the coding task, it is up for you to decide", provide explicit instruction to consider the relevance and what to do when it is relevant and when it is not. When the context is short, it doesn't matter as much, but when there is a difficult problem with a long context, fine-tuned instructions make all the difference. Cursor may be keeping instructions more generic to take advantage of cached-token pricing, but the phrasing does seem rather sloppy. This is all still relatively new; I'm sure both the models and the prompts will see a lot more change before things settle down.
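A minimal sketch of the conditional context assembly described above (the function and the instruction wording are hypothetical, just to illustrate the pattern):

```python
def build_context(task: str, attachments: list[str]) -> str:
    """Assemble a prompt, emitting the attachment instruction only when needed."""
    parts = [task]
    if attachments:
        # Include the instruction WHEN something is attached, and tell the
        # model explicitly how to judge relevance, instead of a hedge like
        # "may or may not be relevant, it is up for you to decide".
        parts.append(
            "The following files are attached. For each one, decide whether "
            "it is relevant to the task; use relevant files, ignore the rest."
        )
        parts.extend(attachments)
    return "\n\n".join(parts)
```

With no attachments, the prompt carries no dead instruction at all, which keeps long contexts lean.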


That's true for Flash 2.0 at $0.40/mtok output. GPT-4.1-nano is the same price and also surprisingly capable. I can spend real money with 2.5 Flash, though; those $3.50/mtok thinking tokens add up, but it's worth it. OP is an inference provider, so there may be some bias. Open source can't compete on context length either; nothing touches 2.5 Flash for the price with long context. I've experimented with this a lot for my agentic pricing system. Open source models are improving, but they aren't really any cheaper right now. R1, for example, does quite well performance-wise, but it uses a LOT of tokens to get there, further straining its shorter context window. There's still value in the open source models; each model has unique strengths and they're advancing quickly, but the frontier labs are moving fast too and have very compelling "workhorse" offerings.
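As a back-of-envelope illustration of why "cheaper per token" can still cost more per request, here is the arithmetic with the prices above (the token counts are hypothetical usage figures, not benchmarks):

```python
def request_cost(output_tokens: int, price_per_mtok: float) -> float:
    """Dollar cost of one response at a given price per million output tokens."""
    return output_tokens / 1_000_000 * price_per_mtok

# Hypothetical: a plain model answers in 500 tokens at $0.40/mtok,
# while a reasoning model burns 4,000 tokens (mostly thinking) at $3.50/mtok.
plain_cost = request_cost(500, 0.40)        # $0.0002
reasoning_cost = request_cost(4_000, 3.50)  # $0.0140
```

The reasoning request costs roughly 70x more here, which is the tradeoff the comment is pointing at.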


I rather like date codes as versions.


But it's not clear how to interpret the date code: 05-06 could be 5th June or 6th May; same story for 06-05. Very confusing due to American-style date formatting. Version numbers are at least sequential, with a bigger number being a later version.


I think the court overstepped by ordering OpenAI to save all user chats. Private conversations with AI should be protected - people have a reasonable expectation that deleted chats stay deleted, and knowing everything is preserved will chill free expression. Congress needs to write clear rules about what companies can and can't do with our data when we use AI. But honestly, I don't have much faith that Congress can get their act together to pass anything useful, even when it's obvious and most people would support it.


Why is AI special in this regard? Why is my exchange with ChatGPT any more privileged than my DuckDuckGo search for _HIV test margin of error_?


You're right, it's not special.

This is from DuckDuckGo's privacy policy: "We don’t track you. That’s our Privacy Policy in a nutshell. We don’t save or share your search or browsing history when you search on DuckDuckGo or use our apps and extensions."

If the court compelled DuckDuckGo to log all searches, I would be equally concerned.


That's a pretty significant difference, though.

OpenAI (and other services) log and preserve your interactions, in order to either improve their service or to provide features to you (e.g., your chat history, personalized answers, etc., from OpenAI). If a court says "preserve all your user interaction logs," they exist and need to be preserved.

DDG explicitly does not track you or retain any data about your usage. If a court says "preserve all your users' interaction logs," there is nothing to be preserved.

It is a very different thing - and a much higher bar - for a court to say "write code to begin logging user interaction data and then preserve those logs."


OpenAI also claims to delete logs after 30 days if you've deleted them. Anything that you've deleted but hasn't been processed by OpenAI yet will now be open to introspection by the court.


I should have said "web search", as that's really what I meant -- DDG was just a convenient counterexample.


DuckDuckGo uses Bing.

It would be interesting to know how much Microsoft logs or tracks.


AI is not special, and that's the exact issue. The court set a precedent here. If OpenAI can be ordered to preserve all the logs, then DuckDuckGo can face the same order even if they don't want to comply.


People upload about 100x more information about themselves to ChatGPT than search engines.


How did the court overstep? Orders to preserve evidence are routine in civil cases. Customer expectations about privacy have zero legal relevance.


Sure, preservation orders are routine - but this would be like ordering phone companies to record ALL calls just in case some might become evidence later. There's a huge difference between preserving specific communications in a targeted case and mass surveillance of every private conversation. The government shouldn't have that kind of blanket power over private communications.


> but this would be like ordering phone companies to record ALL calls just in case some might become evidence later

That's not a good analogy. They're ordered to preserve records they would otherwise delete, not create records they wouldn't otherwise have.


They are requiring OpenAI to log API calls that would otherwise not be logged. I trust when OpenAI says they will not log or train on my sensitive business API calls. I trust them less to guard and protect logs of those API calls.


Change calls to text messages. The important thing is the keeping records of things unrelated to an open case which affect millions of people's privacy.


I mean, to be fair, it is related to a current open case, but the order is pretty ridiculous on its surface. It's one thing for a company and its employees to have to retain their own comms and documents; requiring that company to do the same for third parties who are related but not actually involved in the lawsuit is a bit of a stretch.

Why the NYT cares about a random ChatGPT user bypassing their paywall when an archive.ph link is posted on every thread is beyond me.


> I mean to be fair


No, it's pretty good. To refine it further: it's why you put a single user under scrutiny on litigation hold rather than the whole Exchange server.


No, it wouldn't be like that at all. Phone companies and telephone calls are covered under a different legal regime so your analogy is invalid.


Consider the opposite prevailing, where I can legally protect my warez site simply by saying "sorry, the conversation where I sent them a copy of a Disney movie was private".


The legal situation you describe is a matter of impossibility and unrelated to the OpenAI case.

In the case of a warez site they would never have logged such a "conversation" to begin with. So if the court requested that they produce all such communications the warez site would simply declare that as, "Impossibility of Performance".

In the case of OpenAI the courts are demanding that they preserve all future communications from all their end users—regardless of whether or not those end users are parties (or even relevant) to the case. The court is literally demanding that they re-engineer their product to record all communications where none existed previously.

I'm not a lawyer but that seems like it would violate FRCP 26(b)(1) which covers "proportionality". Meaning: The effort required to record the evidence is not proportional relative to the value of the information sought.

Also—generally speaking—courts recognize that a party is not required to create new documents or re-engineer systems to satisfy a discovery request. Yet that is exactly what the court has requested of OpenAI.


If specific users are violating the law, then a court can and should order their data to be retained.


The preservation order feels like a blunt instrument in a situation that needs surgical precision


Would it be possible to comply with the order by anonymizing the data?

The court is after evidence that users use ChatGPT to bypass paywalls. Anonymizing the data in a way that makes it impossible to 1) pinpoint the users and 2) reconstruct the generic user conversation history would preserve privacy and allow OpenAI to comply in good faith with the order.

The fact that they are blaring sirens and hiding behind "we can't, think about users' privacy" feels akin to willful negligence, or like they know they have something to hide.


> feels akin to willingful negligence or that they know they have something to hide

Not at all; there is a presumption of innocence. Unless a given user is plausibly believed to be violating the law, there is no reason to search their data.


Anonymizing data is really hard, and I'm not sure they'd be allowed to do it. I mean, they're accused of deleting evidence; why would they be allowed to alter it?


If it's possible evidence as part of a lawsuit, of course they can't delete it.


A targeted order is one thing, but this applies to ALL data. My data is not possible evidence as part of a lawsuit, unless you know something I don't know.


That’s… not how discovery works


The government's power to compel private companies to preserve citizens' communications needs clear limits. When the law is ambiguous about these boundaries, courts end up making policy decisions that should come from Congress. We need legislative clarity that defines exactly when and how government can access private digital communications, not case-by-case judicial expansion of government power.


My point is lawsuits make your data part of discovery retroactively. You aren’t being sued right now, but perhaps you will be.


Their point is that the discovery is asking for data of unrelated users. Necessarily so unless the claim is that all users who delete their chats are infringing.


Your point illustrates exactly why the tension between due process and privacy rights can't be fairly resolved by courts alone, since they have an inherent bias toward preserving their own discovery powers.


This is correct. We have seen this over the years in our ecommerce business. I suggest using threat levels: you are under attack, so the threat level increases until they go away. When the threat level is high, you require an exact AVS match. You might have more aggressive filtering at the IP level; real users generally won't be on datacenter IPs. Pay attention to the ASN; sometimes you'll get an attack from a network that legit customers never use, so you can just block the whole network. Keep an eye on your logs and you'll notice patterns. The attack is likely coming from a single entity; if you make it difficult to abuse your service, they will move on.
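A rough sketch of the escalating threat-level filter described above. The ASN, the IP range, and the AVS code convention are all placeholder assumptions for illustration ("Y" standing in for an exact address-and-ZIP match):

```python
import ipaddress

BLOCKED_ASNS = {64512}  # hypothetical: a network legit customers never use
DATACENTER_NETS = [ipaddress.ip_network("192.0.2.0/24")]  # example range

def allow_order(threat_level: int, avs_result: str, ip: str, asn: int) -> bool:
    """Decide whether to accept an order, tightening checks as threat rises."""
    if asn in BLOCKED_ASNS:
        return False  # block the whole network outright
    addr = ipaddress.ip_address(ip)
    if threat_level >= 2 and any(addr in net for net in DATACENTER_NETS):
        return False  # under attack: real users rarely come from datacenter IPs
    if threat_level >= 1 and avs_result != "Y":
        return False  # elevated threat: require an exact AVS match
    return True
```

At threat level 0 everything but the blocked ASN passes; each level above that adds a filter, and the levels can decay back down once the attack stops.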

