
My worry is that ASR will end up like OCR. If the multimodal large AI system is good enough (latency-wise), the advantage of domain understanding eats the other technologies alive.

In OCR, even when the characters are poorly scanned, the deep domain understanding these large multimodal AIs have lets them infer what the document actually meant: this field must be the order ID, because in the million invoices seen before, the order ID normally sits below the order date, and so on. My worry is that the same thing is going to happen in ASR.


This is both good and bad. Good ASR can often understand low-quality / garbled speech that I could not figure out, but it also "over-corrects" sometimes and replaces correct but low-prior words with incorrect but much more common ones.

With OCR the risk is you get another Xerox[1] incident where all your data looks plausible but is incorrect. Hope you kept the originals!

(This is why for my personal doc scans, I use OCR only for full text search, but retain the original raw scans forever)

[1] https://www.dkriesel.com/en/blog/2013/0802_xerox-workcentres...


This is exactly the case today. Multimodal LLMs like gpt-4o-transcribe are way better than traditional ASR, not only because of deeper understanding but because of the ability to actually prompt it with your company's specific terminology, org chart, etc.

For example, if the prompt includes that Caitlin is an accountant and Kaitlyn is an engineer, if you transcribe "Tell Kaitlyn to review my PR" it will know who you're referring to. That's something WER doesn't really capture.

BTW, I built an open-source Mac tool for using gpt-4o-transcribe with an OpenAI API key and custom prompts: https://github.com/corlinp/voibe
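To make the disambiguation idea concrete, here's a rough sketch of packing a team glossary into a priming prompt. The function name, glossary format, and wording are made up for illustration; each transcription API has its own prompt field, and this just shows the kind of string you'd bias the model with:

```python
# Sketch: turning a team glossary into a transcription prompt.
# The glossary contents and prompt format here are invented for
# illustration; the point is to bias the model toward your org's
# spellings (Caitlin vs. Kaitlyn).

def build_transcription_prompt(glossary: dict[str, str]) -> str:
    """Format name -> role pairs into a short priming prompt."""
    lines = [f"- {name}: {role}" for name, role in glossary.items()]
    return "People who may be mentioned:\n" + "\n".join(lines)

prompt = build_transcription_prompt({
    "Caitlin": "accountant",
    "Kaitlyn": "engineer",
})
print(prompt)
```

You'd then pass a string like this in the prompt field of whichever transcription API you're calling.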


Many ASR models already support prompts/adding your own terminology. This one doesn't, but full LLMs, especially such expensive ones, aren't needed for that.

Why are you 'worried' about it? Shouldn't we strive for better technology even if it means some will 'lose'?

"Better" isn't just about increasing benchmark numbers. Often, it's more important that a system fails safely than how often it fails. Automatic speech recognition that guesses when the input is unclear will occasionally be right and therefore have a lower word error rate, but if it's important that the output be correct, it might be better to insert "[unintelligible]" and have a human double-check.

It's better in terms of WER. It's not better in terms of not making shit up that sounds plausible.

Probably the answer is simply to tweak the metric so it's a bit smarter than WER: allow "unclear" output, which is penalised less than an actually incorrect answer. I'd be surprised if nobody has done that.
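A rough sketch of what such a metric could look like (the "[unclear]" token and the 0.5 penalty are arbitrary choices for illustration, not an established standard):

```python
# WER variant where the model may output "[unclear]" and be penalized
# half as much as an outright wrong word. Standard WER is the edit
# distance between reference and hypothesis divided by reference length;
# here the substitution cost depends on what was substituted.

def soft_wer(reference: list[str], hypothesis: list[str],
             unclear: str = "[unclear]", unclear_cost: float = 0.5) -> float:
    """Edit-distance WER with a reduced cost for honest 'unclear' tokens."""
    m, n = len(reference), len(hypothesis)
    # d[i][j] = min cost to align reference[:i] with hypothesis[:j]
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = float(i)                        # deletions
    for j in range(n + 1):
        d[0][j] = float(j)                        # insertions
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if reference[i - 1] == hypothesis[j - 1]:
                sub = 0.0                         # exact match
            elif hypothesis[j - 1] == unclear:
                sub = unclear_cost                # honest uncertainty, cheaper
            else:
                sub = 1.0                         # confident but wrong
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution / match
    return d[m][n] / max(m, 1)

ref = "tell kaitlyn to review my pr".split()
print(soft_wer(ref, "tell caitlin to review my pr".split()))    # confident wrong guess
print(soft_wer(ref, "tell [unclear] to review my pr".split()))  # honest "unclear"
```

Under this metric, a system that flags what it couldn't hear scores better than one that guesses wrong with equal frequency, which is the failure-mode distinction plain WER can't see.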


Ideally, you'd be able to specify exactly what you want: do you want filled pauses written out ("aaah", "umm")? Do you want a transcription of the disfluencies, restarts, etc., or just a cleaned-up version?

ASR has already proved its usefulness. Dictation tools are a prime example. Ever since Whisper came out, running ASR models locally suddenly became a thing. It opened up so many variants:

https://superwhisper.com

https://carelesswhisper.app

https://macwhisper.com


For quite a long time there will be a greater advantage to local processing for STT than for text-to-text chat, or even OCR. Being able to do STT on the device that owns the microphone means that the bandwidth off that device can be dramatically reduced, if it's even necessary for the task at hand.

This turned out to be a bug. https://x.com/om_patel5/status/2038754906715066444?s=20

One reddit user reverse engineered the binary and found that it was a cache invalidation issue.

They are doing some hidden string replacement if the Claude Code conversation talks about billing or tokens. It looks like that invalidates the cache at that point.

If that string appears anywhere in the conversation history, I think the starting text is replaced and your entire cache rebuilds from scratch.

So, nothing devious, just a bug.
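A toy model of why an early rewrite would nuke the whole cache (the chunking and hashing here are invented; this is not how Anthropic's cache actually works, just the prefix-matching idea):

```python
# Toy illustration of prefix caching: each chunk's cache key covers
# everything before it, so editing text early in the conversation
# changes every downstream key and forces a full rebuild.
# (Chunk boundaries and hashing scheme are made up for illustration.)

import hashlib

def prefix_keys(chunks: list[str]) -> list[str]:
    """Cache key per chunk = hash of the conversation up to and including it."""
    keys, running = [], hashlib.sha256()
    for chunk in chunks:
        running.update(chunk.encode())
        keys.append(running.copy().hexdigest()[:12])
    return keys

before = ["system prompt", "turn 1", "turn 2", "turn 3"]
after = ["system prompt (rewritten)", "turn 1", "turn 2", "turn 3"]

hits = sum(a == b for a, b in zip(prefix_keys(before), prefix_keys(after)))
print(f"cache hits after editing the first chunk: {hits} of {len(before)}")
```

Even though three of the four chunks are byte-identical, none of them hit the cache, because every key depends on the mutated prefix.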


I'm not sure this is the issue. I asked Claude Code a simple question yesterday. No sub agents. No web fetches. Relatively small context. Outside of peak hours. Burned 8% of my Max 5x 5hr usage limit. I've never seen anything like this before, even when the cache is cold.

> BUG 2: every time you use --resume, your entire conversation cache rebuilds from scratch. one resume on a large conversation costs $0.15 that should cost near zero.

I use it with an API key, so I can use /cost. When I did a resume, it showed the cost from what I thought was the first go. I don't think it's clear what the difference is between API key and subscription, but am I to believe that simply resuming cost me $5? The UI really makes it look like that was the original $5.


You have to actually send something

Nothing devious, but is Anthropic crediting users? In a sense, this is _like_ stealing from your customer, if they paid for something they never got.

Not seeing any quota returned on my Pro account. My weekly usage went up to 20% in about one hour yesterday before I panicked and stopped the task. It was outside of the prime hours too which are supposed to run up your quota at a slower rate.

Outside of prime hours is the normal rate. Prime hours are at a faster rate, as of about two weeks ago.

Your linked bug is a cherry-pick of the worst-case scenario for the first request after a resume.

While it should be fixed, this isn't the same usage issue everyone is complaining about.


That bug would only affect a conversation where that magic string is mentioned, which shouldn't be common.

I guess so, but for people working on the billing section of a project, or even people who include things like "add billing capability" in their CLAUDE.md, it might be an issue, I think.

Anecdotally when Claude was error 500'ing a few days ago, its retries would never succeed, but cancelling and retrying manually worked most of the time.

It looks like that is a summary and a screenshot of https://old.reddit.com/r/ClaudeAI/comments/1s7mkn3/psa_claud... ?


[flagged]


Whoa. Is Claude coming in here and generating responses about itself?

https://stopsloppypasta.ai/en/


Yep, I was going to say: this is just bad design. This kind of approach is inherently fragile; by mixing things together you are unavoidably destroying information in some sense.

Defaulting to "latest" should be caught by every static code scanner. How many times has this issue been raised?

Seriously? Don't they want their system to succeed? I can't think of a better way of alienating the target customer than this.

The best part was Doom running over the AT Protocol. Jetstream is a bit patchy, but running Doom - I would never have thought it possible.

Have you read Mike Masnick's piece? https://www.techdirt.com/2026/03/25/ai-might-be-our-best-sho...

It actually argues the complete opposite, and I liked that quite a bit: that AI allows us to get the open web back, in a way.


This has been my issue for a long time. AI CANNOT ever act as an emotional crutch. This is something companies develop for engagement, and I believe it is actively harmful in the long run.
