Claudemacs looks really good -- thanks for the pointer!
My main design goal was to have the whole chat described by a text buffer, that can be converted back and forth to the Claude dialog format. I love Emacs' built-in `shell-mode` because I can jump around in the shell buffer, kill and yank, etc. I wanted that same interaction model in the agent interface. I ended up writing a tree-sitter grammar and maybe going a bit over the top...
Also, I wanted something very hackable, where the entire codebase is in elisp. Claudemacs has the benefit of building on Claude Code, so it gets all the Claude Code features for free, whereas I have to implement everything from scratch. But I think of Greger as an experiment surface for novel agent patterns.
I don't think anyone has really figured out how to get the most out of agents yet, so it's great that we're all attacking it from different angles and taking inspiration from each other.
MusicGen is an LLM on top of EnCodec tokens, instead of working directly with audio. EnCodec is a neural audio compression algorithm that encodes audio as tokens from a learned codebook. It's a really clever trick!
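To make the trick concrete, here's a toy sketch of codebook quantization, the core idea behind EnCodec's tokenization (this is an illustration only, with made-up shapes and a random codebook, not the real model, which uses learned residual vector quantization):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical learned codebook: 16 codes, each a 4-dim vector.
codebook = rng.normal(size=(16, 4))

def tokenize(frames):
    """Map each frame to the index of its nearest codebook vector."""
    # distances has shape (n_frames, n_codes)
    dists = np.linalg.norm(frames[:, None, :] - codebook[None, :, :], axis=-1)
    return dists.argmin(axis=1)

# Stand-in for the encoder's output on a short audio clip: 8 frames.
frames = rng.normal(size=(8, 4))
tokens = tokenize(frames)  # a discrete token sequence an LLM can model
```

Once audio is a sequence of discrete tokens like this, next-token prediction works exactly as it does for text.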
There was a typo in the readme -- thanks for pointing it out! I use 8 channels (4 mask + 4 masked chorale). Each chorale is transformed into a 4-channel array, with each channel representing one part of the piece. I've added some example plots to the readme to illustrate.
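A rough sketch of how such an 8-channel input could be assembled (the shapes, pitch range, and masked region here are illustrative assumptions, not the repo's actual preprocessing):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical encoding: 4 voice parts x 64 time steps, MIDI-ish pitches.
parts, steps = 4, 64
chorale = rng.integers(36, 84, size=(parts, steps))

# Mask marking which cells the model must in-fill (0 = hidden).
mask = np.ones((parts, steps))
mask[1, 16:48] = 0  # e.g. hide the middle of the second part

masked_chorale = chorale * mask  # hidden entries zeroed out

# Stack into the 8 channels: 4 mask channels + 4 masked-chorale channels.
model_input = np.concatenate([mask, masked_chorale], axis=0)
```

The model then sees both where the gaps are (the mask channels) and the surviving notes (the masked-chorale channels), and learns to fill in the rest.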
I'm sure your script runs a lot faster than my model :D A well-tuned heuristic script can probably harmonize as well as any black-box deep learning model. I was mostly just curious how diffusion models would handle symbolic music data. This model does reasonably well on short time scales but has no idea about long-term context.
Andreas, author of the Replicate model here -- though "author" feels wrong since I basically just stitched two amazing models together.
The thing that really strikes me is that open source ML is starting to behave like open source software. I was able to take a pretrained text-to-image model and combine it with a pretrained video frame interpolation model and the two actually fit together! I didn't have to re-train or fine tune or map between incompatible embedding spaces, because these models can generalize to basically any image. I could treat these models as modular building blocks.
It just makes your creative mind spin. What if you generate some speech with https://replicate.com/afiaka87/tortoise-tts, generate an image of an alien with Stable Diffusion, and then feed those two into https://replicate.com/wyhsirius/lia. Talking alien! Machine learning is starting to become really fun, even if you don't know anything about partial derivatives.
For the moment at least I'm personally more interested in the image applications than video use cases, but even so this is just fantastic for helping to develop an intuition about how the diffusion mechanism works.
It's admirable that you're so modest regarding the antecedent work, but sometimes it's the "obvious in hindsight" compositional insights that really open up the possibility space. Top work!
It's a nifty piece of work. Often when you're trying to get an answer from a regression model or a neural net, you have to craft your inputs so carefully that you already sort of know, intuitively, what it will figure out. In many quantitative cases, the thought process of refining the input is more valuable than the actual output.
This is simply very impressive... whether or not it was humbly stitched together, you were sort of the first to do it, so take pride.
The next real magic will be reading its net and figuring out how to get [vfx/film] effects from it... which, if I were you, would probably occupy 22 hours of my day right now.
Regarding import: Unfortunately, Apple doesn’t provide access to the Voice Memos library. If you have a suggestion on how to overcome that (in a way that passes the App Store review), please let us know. We’d be super grateful.
Great questions! At the moment we recommend passing dataset URIs as params to replicate.init(): https://replicate.ai/docs/guides/training-data, but of course this assumes immutable and stable URIs.