Hacker News | warangal's comments

VITS is such a cool model (and paper): fast, minimal, trainable. Meta took it to the extreme for about 1000 languages.

It seems like you have been working on this application for some time. I will go through your code, but could you provide some context about the upgrades/changes you have made, or a post describing your efforts?

Cool nonetheless!


I'll explain in detail once I've got the big release out, but everything's been thoroughly modernized: Transformer, HiFi-GAN (now iSTFTNet w/ Snake) vocoder, et al., plus a few additions.


I may be wrong here, but the blog post seems AI-written, with repeated sequences like "the inference pipeline was rebuilt using architecture-aware fused kernels, optimized scheduling, and disaggregated serving". I don't know what that means without some code and proper context.

They also claim 3-6x inference throughput compared to Qwen3-30B-A3B without referring back to any code or paper; all I could see in the Hugging Face repo is usage of a standard inference stack like vLLM. I have looked at earlier models which were trained with help from Nvidia, but the actual nature of that "help" was never made clear! There is no release of the (India-specific) datasets they would be using. All such releases muddy the water rather than being a helpful addition, at least in my opinion!


Disagree; the post makes punctuation mistakes that only an Indian would make. So does your own comment.


Not a given. We've already seen LLMs that got SFT'd by "national teams" adopt ESL speech patterns.


They won’t make punctuation mistakes though.


Wouldn't they do exactly that if they were trained on enough text with punctuation mistakes?


No, because of post-training.


Pretty cool work! Though it leaves me wondering: if coreboot/BIOS code can directly interface with the thermal-management and battery controllers, shouldn't it be feasible to improve battery life by exposing some interface to the OS, like Apple laptops do?


I was also reading through Lobster's memory management, which (I think) currently implements "borrow first" semantics to do away with a lot of run-time reference-counting logic, which I think is a very practical approach. I also wonder whether reference-counting overhead ever becomes so large that some languages rule out RC entirely?

Tangentially, I was experimenting with a runtime library to expose such "borrow-first" semantics: such "lents" can easily be copied onto a new thread's stack to access shared memory, and are not involved in RC. Race-condition detection helps to share memory without any explicit move to a new thread. It seems to work well for simpler data structures like sequences/vectors/strings/dictionaries, but I have not figured out a proper way to handle recursive/dynamic data structures!
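CPython's own runtime is a handy place to see the trade-off being described: every strong reference pays the reference-counting bookkeeping, while a `memoryview` behaves like a borrowed, non-owning view that shares a buffer without copying. A minimal, CPython-specific sketch (illustrative only, not the library mentioned above):

```python
import sys

data = bytearray(b"hello world")  # owning, reference-counted object

# Every new strong reference bumps the refcount (the RC bookkeeping cost).
before = sys.getrefcount(data)
alias = data
assert sys.getrefcount(data) == before + 1

# A memoryview is closer to a "lent"/borrowed view: it shares the
# underlying buffer without copying, so writes through it are visible
# to the owner.
view = memoryview(data)
view[0:5] = b"HELLO"
assert bytes(data) == b"HELLO world"
```

Note that `sys.getrefcount` itself is CPython-specific, and a `memoryview` still holds one reference to keep its buffer alive; the point is only that it avoids copying, not that it escapes RC entirely.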


How do you market it? Through social media, or are there dedicated channels for sharing awareness of such Mac apps, if you don't mind sharing?


I mostly shared the launch post and ran promo campaigns on Reddit, Product Hunt, LinkedIn, and Discord, and even tried HN (got no replies here -‿-"). Since then, it's mostly word of mouth.

Thanks to my customers' feedback, I've made a lot of improvements to the app as well. It feels good getting positive feedback and hearing from people about their use cases. :)

Links:

[1] Launch post: https://www.reddit.com/r/macapps/comments/1ok5zaq/comment/nm...

[2] Product Hunt: https://www.producthunt.com/products/dedupx

[3] Reddit promo campaign for Black Friday: https://www.reddit.com/r/macapps/comments/1paarc2/dedupx_50_...

[4] HN: https://news.ycombinator.com/item?id=45763117


I work on an image search engine[0]; the main idea has been to preserve all the original metadata and directory structure while allowing semantic and metadata search from a single interface. All metadata is stored in a single JSON file, with the original paths and filenames, in case you ever need to create backups. Instead of uploading photos to a server, you could host it on a cheap VPS with enough space and index there (by default it is a local app). It is an engine, though, and doesn't provide auth or specific features like sharing albums!

[0] https://github.com/eagledot/hachi


Currently the (semantic) ML model is the weakest part: a (minorly fine-tuned) ViT-B/32 variant, acting more as a placeholder, i.e. very easy to swap for a desired model. (DINO models have been pretty great, being trained on a much cleaner and larger dataset; CLIP was one of the first image-text models!)

For the point about "girl drinking water": "girl" is the person/tagged name, and "drinking water" just re-ranks all of "girl"'s photos (rather than finding all photos of a (generic) girl drinking water).
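That filter-then-re-rank step can be sketched in a few lines of numpy. This is an illustrative sketch with made-up embeddings and a hypothetical `tags` index, not Hachi's actual code:

```python
import numpy as np

def rerank(photo_embs, tags, person, query_emb):
    """Filter photos tagged with `person`, then order them by cosine
    similarity to the text-query embedding (best match first).

    photo_embs: (n_photos, dim) image embeddings
    tags:       list of per-photo tag sets (hypothetical index)
    query_emb:  (dim,) embedding of e.g. "drinking water"
    """
    idx = np.array([i for i, t in enumerate(tags) if person in t])
    embs = photo_embs[idx]
    # cosine similarity = dot product of L2-normalized vectors
    embs = embs / np.linalg.norm(embs, axis=1, keepdims=True)
    q = query_emb / np.linalg.norm(query_emb)
    scores = embs @ q
    return idx[np.argsort(-scores)]  # photo indices, most similar first
```

The key property is that the metadata filter ("girl") shrinks the candidate set first, so the semantic score only has to order photos of that one person.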

I have been more focused on making the indexing pipeline more performant by reducing copies and speeding up bottleneck portions by writing them in Nim. The fusion of semantic features with metadata is the more interesting and challenging part, compared to choosing an embedding model!


Hi, Author here!

I have been working on this project for quite some time now. Even though the basic ideas for such search engines remain the same (extracting metadata or semantic info and providing an interface to query it), a lot of effort has gone into making those modules performant while keeping dependencies minimal. The current version is down to only 3 dependencies (numpy, markupsafe, ftfy) plus a Python installation, with no hard dependence on any particular version. A lot of code is written from scratch, including a meta-indexing engine and a minimal vector database. Being able to index any personal data from multiple devices or services without duplicating it has been the main theme of the project so far!
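To make the "minimal vector database" idea concrete, a flat (exhaustive) index needs little more than a matrix and a dot product. A toy numpy sketch in that spirit (not the project's actual Nim implementation; the class name and API are invented for illustration):

```python
import numpy as np

class TinyVectorDB:
    """Flat brute-force index: store L2-normalized vectors under string
    keys and answer top-k queries by cosine similarity."""

    def __init__(self, dim):
        self.vecs = np.empty((0, dim), dtype=np.float32)
        self.keys = []

    def add(self, key, vec):
        v = np.asarray(vec, dtype=np.float32)
        v = v / np.linalg.norm(v)          # normalize once, at insert time
        self.vecs = np.vstack([self.vecs, v])
        self.keys.append(key)

    def query(self, vec, k=5):
        q = np.asarray(vec, dtype=np.float32)
        q = q / np.linalg.norm(q)
        scores = self.vecs @ q             # cosine similarity per row
        top = np.argsort(-scores)[:k]
        return [(self.keys[i], float(scores[i])) for i in top]
```

Exhaustive search like this stays practical up to a few hundred thousand vectors, which matches the dataset sizes mentioned below; approximate indexes only become necessary well beyond that.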

We (my friend and I) have already tested it on around 180 GB of the Pexels dataset and up to 500k images of the Flickr 10M dataset. The machine learning models are powered by a framework completely written in Nim (currently not open source) whose only dependency is oneDNN (which has to be done away with to make it run on ARM machines!).

I have been mainly looking for feedback to improve some rough edges, but it has been worthwhile to work on this project, which includes code written in everything from assembly to HTML!


Using `zig cc` (clang) to set a particular libc version is one of the best decisions I have made; it saved me from those meaningless glibc mismatch errors!


It's insane that Zig was needed to achieve this, instead of a preprocessor definition like "#define GLIBC_MIN_COMPATIBLE_VERSION GLIBC_2_40".


I know it may not be what you are looking for, but most such models generate multi-scale image features through an image encoder, and those can very easily be fine-tuned for a particular task, like polygon prediction for your use case. I understand the main benefit of such promptable models is to reduce/remove this kind of work in the first place, but it could be worthwhile, and much more accurate, if you have a specific high-load task!

