Hacker News | goodroot's comments

Whisper is very good in many languages.

It's also in many flavours, from tiny to turbo, and so can fit many system profiles.

That's what makes it unique and hard to replace.


Nice one! For Linux folks, I developed https://github.com/goodroot/hyprwhspr.

On Linux, there's access to the latest Cohere Transcribe model and it works very, very well. Requires a GPU though. Larger local models generally shouldn't require a subordinate model for clean up.

Have you compared WhisperKit to faster-whisper or similar? You might be able to run large-v3-turbo successfully and negate the need for cleanup.

Incidentally, waiting for Apple to blow this all up with native STT any day now. :)


How does it compare to the more well established https://github.com/cjpais/handy? Are there any standout features (for either option)? What was the reason for writing your own rather than using or improving existing software?

Not sure I know what you mean by IR...

But in this case I built hyprwhspr for Linux (Arch at first).

The goal was (is) the absolute best performance, in both accuracy & speed.

Python, via CUDA, on an NVIDIA GPU, is where that exists.

For example:

The #1 model on the Hugging Face ASR (automatic speech recognition) leaderboard is Cohere Transcribe, and it is not yet 2 weeks old.

The ecosystem choices allowed me to hook it up in a night.

Other hardware types also work great on Linux due to its adaptability.

In short, the local STT peak is Linux/Wayland.


IR was a typo, meant "it" (fixed it). I blame the phone keyboard plus insufficient proofreading on my part.

If this needs NVIDIA GPU acceleration for good performance, it is not useful to me; I have Intel graphics and handy works fine.


It works well with anything. :)

That said: If handy works, no need whatsoever to change.


I've been running Whisper large-v3 on an M2 Max through a self-hosted endpoint and honestly the accuracy is good enough that I stopped bothering with cleanup models. The bigger annoyance for me was latency on longer chunks; anything over 30 seconds starts feeling sluggish even with Metal acceleration. Haven't tried WhisperKit specifically but curious how it handles longer audio compared to the full model.

Ah yeah, longform is interesting.

Not sure how you're running it, via whichever "app thing", but...

On resource limited machines: "Continuous recording" mode outputs when silence is detected via a configurable threshold.

This outputs as you speak in more reasonable chunks; in aggregate "the same output" just chunked efficiently.

Maybe you can try hackin' that up?
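For anyone who wants to hack on that idea, here is a minimal sketch of silence-threshold chunking. It is not hyprwhspr's actual code: the function names, the RMS energy measure, and the default values for `silence_threshold` and `min_silent_frames` are all assumptions picked for illustration.

```python
import math
from typing import Iterable, Iterator, List

Frame = List[float]  # one short frame of samples, assumed to be in [-1.0, 1.0]


def rms(frame: Frame) -> float:
    """Root-mean-square energy of one frame."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def chunk_on_silence(
    frames: Iterable[Frame],
    silence_threshold: float = 0.01,  # hypothetical default, tune per microphone
    min_silent_frames: int = 3,       # hypothetical default, >= 1
) -> Iterator[List[Frame]]:
    """Yield a chunk of frames each time speech is followed by enough quiet frames."""
    chunk: List[Frame] = []
    silent_run = 0
    for frame in frames:
        if rms(frame) >= silence_threshold:
            chunk.append(frame)
            silent_run = 0
        elif chunk:
            # Quiet frame inside an open chunk: keep it in case speech resumes.
            chunk.append(frame)
            silent_run += 1
            if silent_run >= min_silent_frames:
                # Enough silence: emit the chunk without its trailing quiet frames.
                yield chunk[:-silent_run]
                chunk, silent_run = [], 0
    if chunk:
        # Flush whatever speech is left, trimming trailing silence.
        yield chunk[: len(chunk) - silent_run]
```

Each yielded chunk could then be handed to the transcription model on its own, so short pauses stay inside a chunk and longer pauses become chunk boundaries.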


Yeah that makes sense, chunking on silence would sidestep the latency issue pretty cleanly. I've been running it through a basic FastAPI wrapper so it just takes whatever audio blob gets thrown at it, no chunking logic on the server side. Might be worth adding a VAD pass before sending to Whisper though, would cut down on processing dead air too.

Maintainer of WhisperKit here, confirming we do exactly that for longform. We search for the longest "low energy" silence in the second half of the audio window and set the chunking point to the middle of that silence. It uses a version of the WebRTC VAD algorithm, and significantly speeds up longform because we can run a large number of concurrent inference requests through CoreML's async prediction API. Whisper is also pretty smart with silent portions since the encoder will tell it if there are any words at all in the chunk, and simply stop predicting tokens after the prefill step, although you could save the ~100ms encoder run entirely with a good VAD model, which our recently open-sourced pyannote CoreML pipeline can do.
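The split-point search described above can be sketched roughly like this. This is an illustration only, not WhisperKit's implementation: it assumes per-frame energies have already been computed, and `find_split_point` and its `threshold` default are hypothetical, standing in for the WebRTC-style VAD the comment mentions.

```python
from typing import List, Optional


def find_split_point(energies: List[float], threshold: float = 0.01) -> Optional[int]:
    """Return the frame index at the middle of the longest low-energy run
    in the second half of the window, or None if that half has no quiet frame."""
    half = len(energies) // 2
    best_start, best_len = -1, 0
    run_start, run_len = -1, 0
    for i in range(half, len(energies)):
        if energies[i] < threshold:
            if run_len == 0:
                run_start = i  # a new quiet run begins here
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0  # speech resumed, the current run is over
    if best_len == 0:
        return None
    return best_start + best_len // 2
```

Only the second half is searched, so a long pause early in the window is ignored and each chunk keeps a reasonable minimum length.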

Thanks for sharing! I was literally getting ready to build, essentially, this. Now it looks like I don't have to!

Have you ever considered using a foot-pedal for PTT?

Apple incidentally already has native STT, but for some reason they just don't use a decent model yet.


They do, and they even have that nice microphone F5 key for it, and an ideal OS level API making the input experience >perfect<.

Apparently they do have a better model, they just haven't exposed it in their own OS yet!

https://developer.apple.com/documentation/speech/bringing-ad...

Wonder what's the hold up...

For footpedal:

Yes, conceptually it’s just another evdev-trigger source, assuming the pedal exposes usable key/button events.

Otherwise we’d bridge it into the existing external control interface. Either way, hooks are there. :)


The only issue with Apple models is that they do not detect languages automatically, nor switch if you do between sentences.

Parakeet does both just fine.


sorry, PTT?

push-to-talk.


Nice, I've been using Hyprwhspr on Omarchy daily for a while now, it's been awesome, thanks very much.

Thanks ericd! Glad to hear.

looks like there's a nearly identically named one for Hyprland

Also, wish it was on nixpkgs, where at least it will be almost guaranteed to build forever =)


Mark Carney's book "Values" pitches a system such as this.

In better times, perhaps we have the collective will to try.


Source? Rationale?

This is - at best - ignorant hyperbole.


The QuestDB team are among the best doing it.

Love the people and their software.

Great blog Jaromir!


The same is true for database rankings (db-engines).

If entrants are not artificially inflating "organic" signals via fake content spam (Twitter/X), then the criteria themselves are losing their signal strength (StackOverflow/GitHub).

The diffusion makes it increasingly difficult to understand which channels are important and which correlate to strength in the market.

Unfortunately, these can be more than vanity metrics.

Some VCs or financial markets may use these as methods towards valuation.


Hey! Thanks for upvoting.

Happy to answer any questions about deduplication. One thing that's not included in the write-up is that we also address out-of-order indexing alongside deduplication.


The dataset link seems to be dead. Do you have a mirror?


Edit: Updated!

https://mega.nz/folder/A1BjnSYQ#NQe5qhYLVBqiRwhWRmcVtg

Article is updating too.


Thanks. Is Dedup supported on SQL COPY too?


Not for CSV import via SQL COPY sadly


Gosh that's sad.

Zoomers are in this very forum - hi Zoomers.

Hang in there.

Whatever precipitating causes led to such suffering, know that we're _here_, _now_, together.

You aren't as alone as it might seem.

And hey, try to relax a little. We'll figure it out.


i think part of the problem is these kind of messages are alienating exactly because they appear on screen. the meat-space sentiments rarely match the "thoughts and prayers" type online speech-acts, or at least, they are basically never extended as readily.


Nothing personal (I mean, seriously, nothing personal)

Little (probably hard) advice for if/when you're going to say something like that to a zoomer irl (based on personal experience from the receiving end):

The "you aren't as alone as it might seem" gets the "what you're saying is just factually incorrect and what you're trying to do is to bullshit me and maybe possibly yourself" thing going. I have never heard something like that from a person "in the weeds".

Same for "We'll figure it out". How much time have you personally spent "figuring it out" and how much time have you spent playing hot potato with the problem? How important is it compared to your own problems? I guess, not very, so there is no "us" figuring it out.

Basically, don't be a disingenuous dense motherfucker and don't bullshit other people and yourself. Not saying you personally are doing it, but there are definitely more people that do, than that don't.


For clarity, this response is personal.

Attitude is a potion or a poison.

Make the choice.

Want demons? You'll find them.

Want help? You'll find it.

Many, many people have spent time figuring it out.

Many, MANY people have gone into professions or made lifestyle choices to help.

The will to overcome your own narcissism and self pity is key to any healing.


> Whatever precipitating causes led to such suffering, know that we're _here_, _now_, together.

The article comments on this though:

"All the things that have traditionally made life worth living - love, community, country, faith, work, and family - have been 'debunked.'"

This is absolutely true and no wonder young folks are feeling down. I think the counter-culture types starting 50+ years ago wanted to tear down the old, but forgot to put something constructive in its place. (Well the leftist/Marxist types tried, but then the USSR imploded)


It's the Internet: All the "debunkings" have also been "debunked".

From the article...

Monogamy is the corollary for the debunking of love.

That's very silly. Love is much more than marriage.

"Church" foibles is the corollary for the debunking of Faith.

That's also very silly. Faith is much more than church or religion.

In other times and other cultures, spiritual insight removed the roots of suffering.


Danny is clearly upset, and I would be too. We all love the Internet. Imagine being fingered in an article as some mad SEO guy who is among "the people who ruined the Internet". The Verge is a big platform...

Also, SEO is one of the more voodoo & charlatan filled branches of technical esoterica. Very little of it is falsifiable or clear. And there is quackery as far as the eye can see.

Transparency into search algos would be better for us all. But the cost of algo transparency is transparency into adtech. And that's a hill Google will die on, and why we need alternatives.


> SEO is one of the more voodoo & charlatan filled branches of technical esoterica.

It's very, very hard for me to avoid thinking of the entire SEO industry in the same light as I think of the adtech industry: a plague that is helping to destroy everything that makes the internet good.


I much prefer adtech to SEO.

Adtech is at least trying to be a non-zero-sum game. They bring dollars to the Internet to try to get your attention off the Internet, to buy a real-world product (even if that product is itself delivered online). That allows the Internet to provide a lot of creativity for "free".

SEO is purely zero-sum, or negative-sum. There's a fixed amount of attention and they want to drag it from wherever it would naturally be to some place you don't really want it to be.

Advertising also does a ton of privacy violation and other shenanigans, because wherever there is money there is evil. But at least there's a baby somewhere in all that bathwater. SEO makes the Internet worse without improving anything at all.


> There's a fixed amount of attention and they want to drag it from wherever it would naturally be to some place you don't really want it to be.

That is exactly what ads are trying to do. It is the very essence of advertising: get your attention. This is ingrained to the extent that everyone knows "there's no such thing as bad publicity".

And it's just as much if not more 0-sum as SEO. The stated purpose of advertising is to make you spend your money on something that you otherwise wouldn't have. That's sometimes about spending your money on product A instead of A's competitors, and sometimes just to spend your money on X instead of saving/investing it.

Even worse, advertising is trying to convince you to spend irrationally: instead of doing your own cost/benefit analysis, advertising's purpose is to convince you to act out of emotion, or to outright lie about the cost and benefit of the product.


> I much prefer adtech to SEO.

I'm exactly the opposite, actually. I have to actively and constantly defend against the attacks of adtech stuff. SEO only really affects how web pages are designed.

But the two fields are pretty closely linked.


You can install an adblocker to filter out adtech.

What kind of blocker should we install to filter out the thousand enshittifications publishers would add to win the SEO game?

Worth noting in this conversation: there's a philosophy that these two techs go hand-in-glove because adtech is the alternative to spending money on SEO. Instead of trying to game the machine, just pay to show up in the "People who thought they were so important, they paid money to get your attention" slot. Much like the notion that in the absence of copyright and patents, you don't get free information but guilds and hitmen... In the absence of adtech you don't get a bright, attention-optimized, clean web but an enshittified web where companies like Procter & Gamble are trying to SEO their way into showing up above Unilever in searches for 'toilet paper'.


> You can install an adblocker to filter out adtech

If only it were that simple. If you want to avoid adtech spying on you, you have to do a whole lot more than that.


>What kind of blocker should we install to filter out the thousand enshittifications publishers would add to win the SEO game?

Well, that's what Kagi is trying to do for you. But you can definitely spend a lot of time homespinning some unholy middleware filter on Google results to try and cull down the most frequent offenders.


How about neither? Neither would be good.


> Transparency into search algos would be better for us all

I think that the jury's out on that topic. Danny's absolutely right in asserting that if the full algo were known, people would write to optimize against the algo, which would defeat the purpose of the algo.

Goodhart's Law is in full effect: "When a measure becomes a target, it ceases to be a good measure."


So then the algorithm is crap and that is why they don’t want to show it. If the “full algo” actually prioritizes high quality content we would certainly want everyone to optimize against it!


In general, these models are approximations of an ideal, or some kind of statistical summary across systems that are too complex to completely model.

There's a wide gap between "this algorithm is crap" and "this algorithm stops working if we publish the whole thing publicly and people can explicitly tune data to make number-go-up." That's like claiming a machine learning algorithm is crap because it's possible to build bespoke counter-inputs that maximize badness in the output; that's possible with most ML algorithms, but when someone's not trying to break the machine on purpose, those algorithms often work great.


>but when someone's not trying to break the machine on purpose, those algorithms often work great.

To be fair, that's the exact thing that's wrong here. Creative tools for professionals can assume good faith; no one is trying to break an IDE unless their job is QA for said IDE.

Tools for advertising almost always have bad faith actors, or those actors are the largest presence. The problem becomes untenable when the tool creator has a symbiotic relation with the bad actor.


How about sites that are shown to use SEO or game the algorithm simply cease to exist on Google after a warning period? Change the incentive structure entirely.


We know it's clickbait. But we click it anyways.

Alas!

