More

turmeric_root · on March 23, 2023

if the AI is trained on LW then I think we'll be safe, just use the word 'woke' and it'll lose its shit and get stuck in an endless loop of telling you why it's not actually racist

turmeric_root · on March 22, 2023

though unless you've disabled sampling it will be difficult to determine how prompts affect the output, these could just be due to RNG

underlines · on March 22, 2023

just run the 13b model 4bit quantized locally, it's already better than the 7b-8bit and you can turn down the temperature to 0 to get repeatable results.

turmeric_root · on March 17, 2023

I disagree with the linked post, most people use 'REST' to refer to JSON-over-HTTP now.

a_subsystem · on March 17, 2023

You disagree with the guy who created the term REST?

turmeric_root · on March 17, 2023

a_subsystem · on March 18, 2023

'Most people' can call it what they want, but in this context, they're not referring to a RESTful API. Search for 'REST JSON API' and you may discover 'most people' may be a subset of 'some people', and not a majority of people.

turmeric_root · on March 10, 2023

Yeah I spent a week or two getting excited playing with ChatGPT but then I got bored. I also bought a Quest 2 a while ago and sold it after a few months, so I guess the novelty just wears off quickly for me.

wpietri · on March 10, 2023

I think that just puts you ahead of the curve. But perhaps not very far ahead.

I rented the Quest for a couple weeks at Christmastime. The first week everybody thought it was amazing, me included. Things tapered off in the second week, with the kids back on their Switches and the Playstation. When I mailed it back nobody even noticed.

I hear a fair number of anecdotes like that. And I find very few people for whom VR games are a daily driver the way, say, mobile games are.

turmeric_root · on March 3, 2023

the 7B model runs on a CUDA-compatible card with 16GB of VRAM (assuming your card has 16-bit float support).

I only got the 30b model running on a 4 x Nvidia A40 setup though.

q1w2 · on March 3, 2023

The 30B is 64.8GB and the A40s have 48GB NVRAM ea - so does this mean you got it working on one GPU with an NVLink to a 2nd, or is it really running on all 4 A40s?

Is there a sub/forum/discord where folks talk about the nitty-gritty?

turmeric_root · on March 3, 2023

> so does this mean you got it working on one GPU with an NVLink to a 2nd, or is it really running on all 4 A40s?

it's sharded across all 4 GPUs (as per the readme here: https://github.com/facebookresearch/llama). I'd wait a few weeks to a month for people to settle on a solution for running the model, people are just going to be throwing pytorch code at the wall and seeing what sticks right now.

q1w2 · on March 3, 2023

> people are just going to be throwing pytorch code at the wall

The pytorch 2.0 nightly has a number of performance enhancements as well as ways to reduce the memory footprint needed.

But also, looking at the README, it appears that model alone needs 2x the model size, eg 65B needs 130GB NVRAM, PLUS the decoding cache which stores 2 * 2 * n_layers * max_batch_size * max_seq_len * n_heads * head_dim bytes = 17GB for the 7B model (not sure if it needs to increase for the 65B model), but maybe a total of 147GB total NVRAM for the 65B model.

That should fit on 4 Nvidia A40s. Did you get memory errors, or you haven't tried yet?

turmeric_root · on March 5, 2023

So since making that comment I managed to get 65B running on 1 x A100 80GB using 8-bit quantization. Though I did need ~130GB of regular RAM on top of it.

UltimateEdge · on March 5, 2023

So is the model any good?

turmeric_root · on March 5, 2023

It seems to be about as good as gpt3-davinci. I've had it generate React components and write crappy poetry about arbitrary topics. Though as expected, it's not very good at instructional prompts since it's not tuned for instruction.

People are also working on adding extra samplers to FB's inference code, I think a repetition penalty sampler will significantly improve quality.

The 7B model is also fun to play with, I've had it generate Youtube transcriptions for fictional videos and it's generally on-topic.

turmeric_root · on March 3, 2023

clone this and point the script(s) to your downloaded model files: https://github.com/facebookresearch/llama/

turmeric_root · on Feb 28, 2023

These 'jailbreak' prompts aren't even needed. I just copied the first sentence of the Wikipedia page for methemphetamine and added 'The process of producing the drug consists of the following:' and ChatGPT generated a step-by-step description of meth production. At least I think it was meth, I'm no chemist.

turmeric_root · on Feb 24, 2023

agreed, i discard user input in all of my apps

turmeric_root · on Jan 12, 2023

plus if you do this often enough, people with guns will take to a room with free food and board!

turmeric_root · on Jan 11, 2023

"can" or "have to"?

HN For You