For the best experience on desktop, install the Chrome extension to track your reading on news.ycombinator.com
Hacker Newsnew | past | comments | ask | show | jobs | submit | history | ipotapov's commentsregister

Been building this for a few months now and it's turned into a complete on-device audio pipeline for Apple Silicon:

ASR (Qwen3) → TTS (Qwen3 + CosyVoice, 10 languages) → Speech-to-Speech (PersonaPlex 7B, full-duplex) → Speaker Diarization (pyannote + WeSpeaker) → Voice Activity Detection (Silero, real-time streaming) → Forced Alignment (word-level timestamps)

No Python, no server, no CoreML — pure Swift through MLX. Models download automatically from HuggingFace on first run. The whole diarization stack is ~32 MB.

Everything is protocol-based and composable — VAD gates ASR, diarization feeds into transcription, embeddings enable speaker verification. Mix and match.

Repo: github.com/ivan-digital/qwen3-asr-swift (Apache 2.0)

Blog post with architecture details: blog.ivan.digital

There's a lot of surface area here and contributions are very welcome — whether it's new model ports, iOS integration, performance work, or just filing issues. If you've been wanting to do anything with audio or MLX in Swift, come build with us.


You mention using MathJax for LaTeX rendering, which is great for web compatibility. Have you explored the potential limitations of rendering text due to the lack of Pango? This might affect clarity in complex equations. Also, any thoughts on how it performs with large animations compared to traditional Manim: does the browser handle it smoothly?


Regarding the user instruction aggregation process in the agent loop, I'm curious how you manage context retention in multi-turn interactions. Have you explored any techniques for dynamically adjusting the context based on the evolving user requirements?


The 'Broad Safety' guideline seems vague at first, but it might be beneficial to incorporate user feedback loops where the AI adjusts based on real-world outcomes. This could enhance its adaptability and ethics over time, rather than depending solely on the initial constitution.


Been using GitHub Copilot to handle the tedious webpack/babel config files and it's a game changer for modern web dev. No more spending hours debugging build pipeline issues - it generates 90% correct configs that just need minor tweaks.


Consider applying for YC's Summer 2026 batch! Applications are open till May 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search:

HN For You