exactly! I gave a heartfelt letter to my shredder the other day and it simply destroyed it. issues like these are why AI alignment research is so critical.
The model weights were only shared by FB with people who applied for research access. GitHub repos containing links to the model weights have been taken down by FB.
More VRAM => larger models. IME it is absolutely worth maxing out VRAM for the significant improvement in quality, especially with LLaMA (though even a 4090's 24 GB isn't enough to run the largest 65-billion parameter model, even with 4-bit quantization).
That said, I recommend renting a cloud GPU for a few hours and trying the larger models on them before buying a GPU of your own, just to see if the models meet your requirements.
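To get a feel for what fits before renting or buying anything, here's a back-of-the-envelope VRAM estimate: weights take roughly (parameters × bits per parameter / 8) bytes, plus some headroom for activations and the KV cache. The `overhead` factor below is my own rough assumption; real usage varies with context length, batch size, and framework.

```python
def estimate_vram_gb(n_params_billion: float, bits_per_param: int,
                     overhead: float = 1.2) -> float:
    """Rough VRAM estimate (GB) for running a model's weights.

    overhead is a hypothetical fudge factor for activations / KV cache;
    actual usage depends heavily on context length and batch size.
    """
    weight_bytes = n_params_billion * 1e9 * bits_per_param / 8
    return weight_bytes * overhead / 1e9

# 65B at 4-bit: ~39 GB incl. overhead -- over a 24 GB 4090's budget
print(round(estimate_vram_gb(65, 4), 1))
# 13B at 4-bit: ~7.8 GB -- comfortably fits on most modern cards
print(round(estimate_vram_gb(13, 4), 1))
```

This is only a sanity check, not a guarantee: quantization formats carry per-group metadata, and long contexts can blow past the 20% headroom assumed here.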
A lot of the 'look what I made with AI' images that get shared around also don't include the creator's workflow. There's usually lots of trial and error, manual painting/inpainting, multiple models involved, etc., and explaining all that is a lot harder than just saying 'I used stable diffusion'.
ugh, that's so shitty. so many people in this space seem to be absurdly demanding and angry at devs, but one thing I've noticed is that every text AI project discord I've hung out in has this sleazy, obsessive 4chan /g/ vibe hiding somewhere in it.
> the "number B" stands for "number of billions" of parameters... trained on?
No, it's just the size of the network (i.e. the number of learnable parameters). Training data is a separate number: the 30B and 65B models were each trained on ~1.4 trillion tokens, and the 7B and 13B on ~1 trillion (each token is around half a word).
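To make the parameters-vs-tokens distinction concrete, here's a tiny sketch. The "2 tokens per word" ratio is just the rough rule of thumb from the comment above, not an exact property of the tokenizer:

```python
# Parameters (model size) and tokens (training data) are independent numbers.
n_params = 65e9    # 65B learnable weights held in the network
n_tokens = 1.4e12  # ~1.4T tokens of text seen once during training

# Assuming ~2 tokens per word (rough heuristic from the thread):
approx_words = n_tokens / 2
print(f"trained on roughly {approx_words:.1e} words")
```

Note the model doesn't store the training text; the 1.4T tokens only shape the values of the 65B parameters.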
yeah, when getting DL up and running on AMD requires a datacentre card, it's no wonder CUDA is more popular. AMD is enabling ROCm on consumer GPUs now, but it's still a pain to set up, and the inertia CUDA has built up doesn't help.