Hacker News | fagerhult's comments

Claudemacs looks really good -- thanks for the pointer!

My main design goal was to have the whole chat described by a text buffer, that can be converted back and forth to the Claude dialog format. I love Emacs' built-in `shell-mode` because I can jump around in the shell buffer, kill and yank, etc. I wanted that same interaction model in the agent interface. I ended up writing a tree-sitter grammar and maybe going a bit over the top...
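The round-trip idea — a plain text buffer that can be parsed into the structured dialog format and rendered back — can be sketched as follows. This is a toy analogue in Python rather than elisp, and the `## Role` header convention is an assumption for illustration, not Greger's actual tree-sitter grammar:

```python
# Toy analogue of round-tripping a chat between a flat text buffer and a
# structured dialog format. The "## User" / "## Assistant" header convention
# is assumed for illustration only.

def buffer_to_dialog(text):
    """Parse a flat text buffer into a list of {role, content} messages."""
    dialog, role, lines = [], None, []
    for line in text.splitlines():
        if line.startswith("## "):
            if role is not None:
                dialog.append({"role": role, "content": "\n".join(lines).strip()})
            role, lines = line[3:].strip().lower(), []
        else:
            lines.append(line)
    if role is not None:
        dialog.append({"role": role, "content": "\n".join(lines).strip()})
    return dialog

def dialog_to_buffer(dialog):
    """Render the dialog back into an editable text buffer."""
    return "\n".join(f"## {m['role'].capitalize()}\n{m['content']}" for m in dialog)

buf = "## User\nHello!\n## Assistant\nHi there."
dialog = buffer_to_dialog(buf)
assert dialog_to_buffer(dialog) == buf  # lossless round trip
```

The point of a lossless round trip is that normal buffer editing (kill, yank, jump around) stays meaningful: any edited buffer can be re-parsed into a valid dialog before being sent back to the model.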

Also, I wanted something very hackable, where the entire codebase is in elisp. Claudemacs has the benefit of building on Claude Code, so it gets all the Claude Code features for free, whereas I have to implement everything from scratch. But I think of Greger as an experiment surface for novel agent patterns.

I don't think anyone has really figured out how to get the most out of agents yet, so it's great that we're all attacking it from different angles and taking inspiration from each other.


I vibed up a little chat interface https://kontext-chat.vercel.app/


It doesn't seem to want to deal with faces at all — it flags everything containing humans as 'sensitive' and declines.


Oops sorry about that, fixed now!


Here is the MusicGen paper from Facebook research: https://arxiv.org/abs/2306.05284

MusicGen is an LLM on top of EnCodec tokens, instead of working directly with audio. EnCodec is a neural audio compression algorithm that encodes audio as tokens from a codebook. It's a really clever trick!
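The codebook idea can be illustrated with a toy sketch: quantize each audio frame to the index of its nearest codebook vector, so the language model only ever sees discrete tokens. The codebook values and "frames" below are made up for illustration — real EnCodec uses learned residual vector quantization, not this:

```python
# Toy sketch of the codebook idea behind EnCodec: audio frames become
# indices into a (here, made-up) codebook, giving an LM discrete tokens.

codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]  # 4 code vectors

def nearest_code(frame):
    """Quantize one frame to the index of its nearest codebook vector."""
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(codebook[i], frame)))

def encode(frames):
    return [nearest_code(f) for f in frames]   # audio -> token sequence

def decode(tokens):
    return [codebook[t] for t in tokens]       # tokens -> (lossy) audio

frames = [(0.1, -0.1), (0.9, 0.2), (0.1, 0.8)]
tokens = encode(frames)
print(tokens)  # [0, 1, 2]
```

Decoding maps tokens back through the codebook, so generation reduces to next-token prediction followed by a decode pass — the same loop as a text LLM.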


The samples are outstanding. Even if they are cherry picked (not saying they are but even if) the output seems incredible.

https://ai.honu.io/papers/musicgen/


There was a typo in the readme, thanks for pointing this out! I add 8 channels (4 mask + 4 masked chorales). The chorales are transformed into 4-dimensional arrays, each channel representing a part of the piece. I've added some example plots to the readme to illustrate.
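A rough sketch of how four chorale parts plus their masks could be stacked into an 8-channel input. The pitch encoding and mask semantics here are assumptions for illustration (1 = position is masked out and should be infilled), not the model's exact preprocessing:

```python
# Rough sketch: stack 4 masked chorale parts + their 4 binary masks
# into 8 input channels. Encoding details are assumed for illustration.

def stack_input(parts, masks):
    """parts: 4 pitch sequences (one per voice); masks: 4 binary sequences.
    Returns 8 channels: the 4 masked-out parts followed by the 4 masks."""
    assert len(parts) == len(masks) == 4
    masked_parts = [
        [0 if m else p for p, m in zip(part, mask)]   # zero out masked steps
        for part, mask in zip(parts, masks)
    ]
    return masked_parts + masks                        # 8 channels total

parts = [[60, 62, 64], [55, 57, 59], [48, 50, 52], [36, 38, 40]]  # SATB pitches
masks = [[0, 1, 0], [0, 0, 0], [1, 1, 0], [0, 0, 1]]
channels = stack_input(parts, masks)
print(len(channels))  # 8
```

Including the masks as explicit channels lets the model see both the surviving notes and exactly which positions it is being asked to fill in.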


I'm sure your script runs a lot faster than my model :D A well-tuned heuristic script can probably harmonize as well as any black-box deep learning model. I was mostly just curious how diffusion models would handle symbolic music data. This model does reasonably well on short time scales but has no idea about long-term context.


Andreas, author of the Replicate model here -- though "author" feels wrong since I basically just stitched two amazing models together.

The thing that really strikes me is that open source ML is starting to behave like open source software. I was able to take a pretrained text-to-image model and combine it with a pretrained video frame interpolation model and the two actually fit together! I didn't have to re-train or fine tune or map between incompatible embedding spaces, because these models can generalize to basically any image. I could treat these models as modular building blocks.

It just makes your creative mind spin. What if you generate some speech with https://replicate.com/afiaka87/tortoise-tts, generate an image of an alien with Stable Diffusion, and then feed those two into https://replicate.com/wyhsirius/lia? Talking alien! Machine learning is starting to become really fun, even if you don't know anything about partial derivatives.


For the moment at least I'm personally more interested in the image applications than video use cases, but even so this is just fantastic for helping to develop an intuition about how the diffusion mechanism works.

It's admirable that you're so modest regarding the antecedent work, but sometimes it's the "obvious in hindsight" compositional insights that really open up the possibility space. Top work!


It's a nifty piece of work. Often when you're trying to get an answer from a regression model or a neural net, you have to craft your inputs so carefully that you already sort of know, intuitively, what it will figure out. In a lot of quantitative cases, the thought process of refining the input is more valuable than the actual output.

This is simply very impressive... whether or not it was humbly stitched together, you were sort of the first to do it, so take pride.

The next real magic will be reading its net and figuring out how to get [vfx/film] effects from it... which if I were you would probably occupy 22 hours of my day now.


>Talking alien!

Maybe that's what we were supposed to do all along. Not find or be found by aliens, but to invent them.


that’s exactly what we’ve been doing


This looks brilliant, I've been recording scraps of music with voice memos for years and I've been missing exactly the features you have. Great work!

I'd love a way to import my existing voice memo library into Tape It!


Glad you like it :).

Regarding import: Unfortunately, Apple doesn’t provide access to the Voice Memos library. If you have a suggestion on how to overcome that (in a way that passes the App Store review), please let us know. We’d be super grateful.


Great questions! At the moment we recommend passing dataset URIs as params to replicate.init(): https://replicate.ai/docs/guides/training-data, but of course this assumes immutable and stable URIs.

DVC would definitely be a good fit, and we have a ticket on our roadmap to integrate Replicate with DVC, Tecton, etc. https://github.com/replicate/replicate/issues/294

We also have a roadmap ticket for grouping experiments: https://github.com/replicate/replicate/issues/297, but for now we're recommending params for tags as well.

If you have ideas for the design of these features, we really appreciate feedback and comments on these Github issues!


Thank you! We have an issue on the roadmap for adding a web GUI: https://github.com/replicate/replicate/issues/295

We haven't thought about it in great detail yet, so I'd be curious to hear your thoughts and ideas if you'd like to add a comment to that issue!

