Hacker News | cocktailpeanut's comments

Hi guys, I really didn't think that it would move THIS fast, but it did, and we now have Alpaca.cpp.

As soon as I saw Alpaca.cpp on the front page, I started working on Dalai's JavaScript integration, and today I'm happy to release Dalai Alpaca.

Dalai is now something like a package manager: one command installs LLaMA or Alpaca, fully integrated with the web UI and the ultra-hackable API. For example, to install Alpaca 7B, all you need to run is: "npx dalai install alpaca 7b"

Also, I've fixed a lot of bugs since last time, so if you had trouble installing, I recommend trying again.

It officially supports Windows, Mac and Linux. There are still some quirks here and there, but it mostly works, and I'm happy to help figure things out if you're having trouble. This time it's much simpler to use because it only takes up about 4.2GB. Appreciate any feedback!


Hi HN!

I posted Dalai on HN yesterday (https://news.ycombinator.com/item?id=35127020) and have spent the last 24 hours going through all the feedback, merging the incoming pull requests, and putting together a new release.

  1. If installation failed for you for some reason, make sure you try this version.
  2. If you had trouble with models other than 7B: yes, that was my fault, since only the 7B model was working. This has been fixed, and ALL models should now work with this version.
  3. Use an existing workspace: if you already have a llama.cpp workspace folder, Dalai can now connect to it programmatically instead of creating its own repository under ~/dalai. Remember, while easy installation is important, the most important aspect of Dalai is that it lets you interact with llama.cpp using JavaScript. So even if you already have llama.cpp set up and just want to play with it from JS, you can use Dalai.
  4. New Web UI
I don't want to ramble on here, so I'll link a thread where I discuss this in more detail for anyone interested: https://twitter.com/cocktailpeanut/status/163539451761565286...

Finally, about 90% of the code in this release came from people other than me, which is amazing. Thank you to everyone who contributed.

There are some further issues I think still need to be addressed. If you have any feedback that this release doesn't address, please let me know!

And if you want to follow along with the development, you can find me on Twitter: https://twitter.com/cocktailpeanut


I tried it yesterday and it was great! Really simple to set up. Looking forward to updating and trying out the new changes. Thank you for this work!


Thank you for your work on this and appreciate the quick fixes!


Hey guys, I was so inspired by the llama.cpp project that I spent all day today building a weekend side project.

Basically, it lets you one-click install LLaMA on your machine with no bullshit. All you need to do is run "npx dalai llama".

I see that the #1 post today is a long blog post walking through how to compile the C++ code, download the files, and so on to finally run LLaMA on your machine. I have 100% automated all of this with a simple NPM package/application.

On top of that, the whole thing is a single NPM package and was built with hackability in mind. With a single JS function call, you can call LLaMA from YOUR app.

Lastly, EVEN IF you don't use JavaScript, Dalai exposes a socket.io API, so you can use whatever language you want to interact with Dalai programmatically.

I discuss this a bit more in a Twitter thread. Check it out: https://twitter.com/cocktailpeanut/status/163504032247148953...

It should "just work". Have fun!


UPDATE:

Thanks for all the feedback! I went outside to take a walk after posting this and just came back, and went through them to summarize what needs to be improved.

Basically, it looks like it comes down to the following:

  - *customize features:* Should not be difficult (will add flags)
    - *path:* customize the home directory (instead of automatically storing to $HOME)
    - *python:* some people are having issues with the python binary (since the package is essentially calling these shell commands). Maybe add a flag to specify the exact name of the python binary (such as "--python python3")
    - *avoid downloading files:* I have this issue too when I just want to install the code without downloading the full model, which takes a long time. Might add a flag to skip downloading models in case you already have them (EDIT: actually, on second thought, it's better to just let you set the source model folder, something like --model)
    - *other flags:* The rest of the flags natively supported by the llama.cpp project, such as top_k, top_p, temp, batch_size, threads, seed, n_predict, etc. (They are already in the code but just weren't exposed on the CLI or documented.)
    
  - *documentation*
    - document the machine spec
    - document the storage spec: how much space is used?
    - node version: which version of node.js is required?
    - python version: which version of python doesn't work?
Am I missing anything? Feel free to leave comments; I'll try to roll out some updates as soon as I can. To stay updated, feel free to follow me on Twitter https://twitter.com/cocktailpeanut (or you could create issues on GitHub too!)
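A minimal sketch of how the proposed flags could be parsed in Node.js with the built-in `util.parseArgs` (Node 18.3+ or 16.17+). The flag names `--home`, `--python` and `--model` are hypothetical, taken from the wish list above rather than from Dalai's actual CLI, and `parseInstallFlags` is an illustrative name.

```javascript
// Sketch only: --home, --python and --model are hypothetical flag names
// taken from the wish list above, not Dalai's actual CLI.
const { parseArgs } = require("node:util"); // Node >= 18.3 or >= 16.17
const os = require("node:os");

function parseInstallFlags(argv) {
  const { values, positionals } = parseArgs({
    args: argv,
    allowPositionals: true, // e.g. "llama", "alpaca", "7B"
    options: {
      home: { type: "string" },   // where to create the workspace
      python: { type: "string" }, // exact Python binary name, e.g. "python3"
      model: { type: "string" },  // folder that already holds the weights
    },
  });
  // parseArgs is strict by default, so unknown flags throw an error.
  return {
    positionals,
    home: values.home ?? os.homedir(), // current behaviour: store under $HOME
    python: values.python ?? "python3",
    modelDir: values.model ?? null,    // null: no local weights given
  };
}
```

For example, `parseInstallFlags(["llama", "--python", "python3.10"])` yields `python: "python3.10"` and leaves `modelDir` as `null`, which a caller could treat as "download the weights".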


I tried to run your NPX commands from the examples on a fresh WSL install of Ubuntu 20.04, but if you don't have build tools installed, they both just silently fail.

I only realized what was happening after trying to go the other route and use it in a package, where I then noticed the NPM install will give a node-gyp error about make missing.


I'm on NixOS, where you have to explicitly state dependencies (which is a good thing, except when... this happens)

Besides make (which I can quickly make available in a project environment), what other deps do you think it uses but doesn't declare? ;)


The other one I noticed is pip! A lot of the script fails without pip, and it takes until after the fairly long downloads finish to let you know it was needed.


so it needs make/gcc, python AND node available... what versions, I wonder?
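One way to avoid the late failures described above is a preflight check that runs before any multi-gigabyte download starts. This is a sketch, not Dalai's code: the tool list (`make`, `python3`, `pip3`) is an assumption drawn from this thread, versions are not checked, and `findMissingTools` is a hypothetical helper.

```javascript
// Preflight sketch: check that required tools can be run *before* starting
// long downloads. The tool list is an assumption based on this thread.
const { spawnSync } = require("node:child_process");

const DEFAULT_PREREQS = [
  { name: "make", cmd: "make", args: ["--version"] },
  { name: "python3", cmd: "python3", args: ["--version"] },
  { name: "pip3", cmd: "pip3", args: ["--version"] },
];

// Returns the names of tools that could not be run successfully.
function findMissingTools(prereqs = DEFAULT_PREREQS) {
  const missing = [];
  for (const { name, cmd, args } of prereqs) {
    const result = spawnSync(cmd, args, { stdio: "ignore" });
    // result.error is set (e.g. ENOENT) when the binary is not on PATH;
    // a non-zero exit status is also treated as unusable.
    if (result.error || result.status !== 0) missing.push(name);
  }
  return missing;
}
```

An installer could call `findMissingTools()` first and exit with a message naming the missing tools whenever the returned array is non-empty.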


I successfully used the latest version of node LTS (via NVM) and the latest versions of python-pip3 and build essentials from the Canonical apt repo, if that helps.


I don’t understand why it’s downloading at all, that shouldn’t be default behavior.

By default it should look for the models in a default location, accept arguments/flags to load them from a specific path, and then MAYBE prompt to download the models if it can't find them on any of those paths (plural).
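The lookup order proposed above (explicit path, then default locations, then maybe a download prompt) could be sketched like this. The default directories, the file name, and `resolveModelFile` itself are hypothetical examples, not paths Dalai is known to use.

```javascript
// Hypothetical lookup order: explicit folder (e.g. from a --model flag),
// then default folders; return null so the caller can offer a download.
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Example default locations; these are assumptions, not Dalai's layout.
const DEFAULT_MODEL_DIRS = [
  path.join(process.cwd(), "models"),
  path.join(os.homedir(), "llama.cpp", "models"),
];

function resolveModelFile(fileName, explicitDir = null, searchDirs = DEFAULT_MODEL_DIRS) {
  const candidates = explicitDir ? [explicitDir, ...searchDirs] : searchDirs;
  for (const dir of candidates) {
    const file = path.join(dir, fileName);
    if (fs.existsSync(file)) return file; // first match wins
  }
  return null; // not found anywhere: only now prompt to download
}
```

Only when `resolveModelFile` returns `null` would the installer ask whether to download the weights.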


UPDATE 2:

Thanks to all the pull requests, we've managed to solve most of these issues in the best possible way.

Version 0.1.0 released: https://news.ycombinator.com/item?id=35143171


I followed the initial instructions and the 7B model worked just fine.

I tried the supplementary instructions to download some of the models (7B, 13B, and 30B), and it didn't seem to work. The prompt returned nothing after waiting for several minutes.

Is there a way to run just one of the larger models?


I am going to test this out today and roll out a fix as soon as I can, hopefully tomorrow. Stay tuned.


What's the minimum spec GPU required? NVIDIA only? Any differences between Debian and Fedora Linuxes? RAM required?


This app is CPU only and gets good speeds on even mobile phone CPUs. Minimum RAM required is 5GB.


Oh wow, any way to do this on Android yet? That would be fun to tinker with, even if it's just the smaller model. Even my older Note 9 has 6GB.


Yes. Starting with the Facebook version of LLaMA-7B, you just quantize the model to 4 bits on your desktop (since that takes 14GB of RAM), then move it to your phone and follow the Android instructions in the repo: https://github.com/ggerganov/llama.cpp/#android

I've seen dozens of screenshots of it running in termux on androids by now at completely usable speeds.


Thank you for the link! Insane that this can run on a phone.

As my current potato computer has 8GB of RAM, I'll ask a friend to do it :-)


What distro and PC specs do you have success with?


I ran this on my Intel i7-7700K with 32 GB of RAM. It ran very slowly: almost 1 word per second. Not sure if I did something wrong. Distro: Ubuntu 22.04


It would be great to also understand how one can finetune this model. Thanks for the awesome work!


you may be able to use pyenv to increase compatibility across Linux distributions


My biggest concern about these LLMs was their corporate sequestration and the socioeconomic imbalances it could create. The work you are doing here is part of some amazing work to keep that in check. In summary: Bruhhhhhh. THANK YOU!


This is something to keep an eye on, really. The solution for making that sequestration impossible is twofold:

1. Know how to architect and create LLMs (including preparing the training data).

2. Have them produced on hardware that a normal citizen can acquire at a reasonable cost.


Wow that's so incredible. Thanks for putting this together!

Do you have any machine specs associated with this? Can an old-ish MacBook Pro run this service?

I'm also curious, since I'm new to all this — is it possible to run something like this on Fly.io or does it take up way too much space?


7B is the default. If it's quantized to 4 bits, that's a 3.9 GB file.
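The arithmetic behind that figure, as a rough sketch. It assumes (these numbers are not from the thread) that LLaMA-7B has about 6.74 billion parameters and that llama.cpp's `q4_0` format stores each block of 32 weights as 16 bytes of 4-bit values plus a 4-byte float scale, i.e. 20 bytes per 32 weights. Under those assumptions the file comes to about 4.2 GB, or 3.9 GiB, so the 3.9 figure matches if it is read as GiB; the unquantized fp16 weights come to about 13.5 GB, in line with the 14GB of RAM mentioned for quantizing 7B.

```javascript
// Rough size estimate. Assumptions (not from the thread): ~6.74e9 parameters
// for LLaMA-7B; q4_0 = blocks of 32 weights, each stored as 16 bytes of
// 4-bit values plus one 4-byte float scale (20 bytes per block).
// Small tensors that stay in float32 (e.g. norms) are ignored.
const GIB = 1024 ** 3;

function fp16Bytes(params) {
  return params * 2; // 2 bytes per weight
}

function q4_0Bytes(params, blockSize = 32, bytesPerBlock = 20) {
  return Math.ceil(params / blockSize) * bytesPerBlock;
}

const params7B = 6.74e9;
const fmt = (bytes) =>
  `${(bytes / 1e9).toFixed(1)} GB = ${(bytes / GIB).toFixed(1)} GiB`;
console.log(`fp16: ${fmt(fp16Bytes(params7B))}`); // fp16: 13.5 GB = 12.6 GiB
console.log(`q4_0: ${fmt(q4_0Bytes(params7B))}`); // q4_0: 4.2 GB = 3.9 GiB
```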


How powerful a computer does this need? It would be useful to see, for one thing, the minimum RAM requirements for these models.


llama.cpp needs about 40GB of RAM for the 65B model (with int4 quantization)

RamNeeded(other_size) ~= 40GB * other_size/65B
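The rule of thumb above as a tiny function. It simply scales the single 65B data point linearly and ignores fixed overheads, so treat the output as a rough estimate; `estimateRamGB` is an illustrative name.

```javascript
// Linear scaling of the single data point above: ~40 GB for 65B (int4).
// Ignores fixed overheads, so this is a rough estimate only.
function estimateRamGB(paramsBillions, refGB = 40, refParamsBillions = 65) {
  return (refGB * paramsBillions) / refParamsBillions;
}

for (const size of [7, 13, 30, 65]) {
  console.log(`${size}B -> ~${estimateRamGB(size).toFixed(1)} GB`);
}
// 7B -> ~4.3 GB
// 13B -> ~8.0 GB
// 30B -> ~18.5 GB
// 65B -> ~40.0 GB
```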


Add something like this to your instructions: "Make sure you have Node.js installed on your computer."


One step install after the steps that lead up to it.


Yeah, I'm not a Node.js/JavaScript dev at all, but this is failing to install on Fedora. I don't have time to dig into it at the moment, but if anybody knows any common gotchas that could be the issue, that would be helpful :)

Edit: I do have nodejs and npx installed


Maybe make, python and pip. From what I gather, this is a Node wrapper: Python is used to convert the model weights, and the llama.cpp binary built with make is what actually runs the model.


Does anyone know how to avoid downloading the model weights when doing `npx dalai llama`, and instead telling the install process where they are on my drive?


you could clone the repo and comment out https://github.com/cocktailpeanut/dalai/blob/main/index.js#L... i.e. the specific synchronous download call..?


Does this use the GPU? If not why? Aren't GPUs much faster than CPUs at AI?


It is usable without a GPU: it'll output text a bit faster than most people type.


I think that's exactly the point: so everyone can run it on their PCs without a GPU.


Or without a beefy GPU. I've got 8GB VRAM, which is great for Stable Diffusion but not useful for any of the language models released so far.

I think the 4-bit 7B LLaMA would work, but 7B is pretty fast without a GPU anyway.


I'm installing it here. How's the 7B model going so far?


Haha, I just finished ordering 32GB of additional memory for my PC so I can run the 65B model, if that tells you anything. I'm upgrading from 32GB -> 64GB.

7B is fine, 13B is better. Both are fun toys and almost make sense most of the time, but even with a lot of parameter tuning they're often incoherent. You can tell that they have encoded fewer relationships between concepts than the higher-parameter models we've gotten used to; it's much closer to GPT-2 than GPT-3.

They're good enough to whet my appetite and give me a lot of ideas of what I want to do, they're just not quite good enough to make those applications reliably useful. Based on the reports I'm hearing here of just how much better the 65B model is than the 7B, I decided it was worth $80 for a few new sticks of RAM to be able to use the full model. Still way cheaper than buying a graphics card capable of handling it.


Heh, you just made me upgrade as well. After originally paying 130 € for 32 GB, it’s nice that I only had to pay 70 € to double it ;) Not sure if I want to run LLMs (or if my Ryzen 5 3600 is even powerful enough), but I’ve wanted some more RAM for a while.


If I were running this in a server context, would the 50GB of RAM be required to respond to one request, or can it be used to respond to multiple requests simultaneously?


I'm very late to this question, but I believe that that amount is only required once, but the context tensor will need to be created per request. I haven't confirmed that, though.


I'd assume that all the calculations used for 1 request would already eat up that amount of memory, but I could be wrong!


I'm still holding on to a small bit of hope that the GPU market will normalise this year. Don't think that I'm the only one looking to get something highly capable but for a fair price.


> I’m still holding on to a small bit of hope that the GPU market will normalize this year.

I suspect all the people hoping it will (b/c of Stable Diffusion, etc.) are exactly the reason it won’t.


Me too. But in third-world countries it's priced insanely.


It's expensive for first-world countries too. Just look at the 4090 - it's insane that it costs 2k EUR... it's literally double the fair price (which itself is high).


Very nice. Any way to add an option to install somewhere other than ~/?


I ran "npx dalai llama" and it's just... sitting there (after I hit "y" to confirm). I checked btop++ and there's barely any downloading or CPU activity occurring, so not sure what it's doing... but does "pip3 install torch torchvision torchaudio sentencepiece numpy" take a while?

If it's actually downloading the 3.9GB of model weights or whatever, it would be pretty cool if it showed a progress bar of some sort. Stretch goal, for sure, but a very nice nicety for users.

anyway, I'll leave it be and check on it to see when it's complete. Super cool if this works!!
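On the progress-bar suggestion: a minimal sketch of the formatting half, assuming a download loop that reports the bytes received so far and the total size (for example from a Content-Length header). `renderProgress` is a hypothetical helper, not part of Dalai.

```javascript
// Minimal text progress bar. Only formats a (done, total) byte pair;
// how the download itself reports progress is left to the caller.
function renderProgress(doneBytes, totalBytes, width = 20) {
  const ratio = totalBytes > 0 ? Math.min(doneBytes / totalBytes, 1) : 0;
  const filled = Math.round(ratio * width);
  const bar = "#".repeat(filled) + "-".repeat(width - filled);
  const gb = (n) => (n / 1e9).toFixed(2);
  return `[${bar}] ${(ratio * 100).toFixed(1)}% (${gb(doneBytes)}/${gb(totalBytes)} GB)`;
}

// Typical use inside a download loop: redraw the same terminal line.
// process.stdout.write("\r" + renderProgress(done, total));
```

Writing with a leading `\r` instead of `console.log` keeps the bar on one line rather than scrolling the terminal.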


I made a comment on the other thread asking why we can't have a one-click install thing, and here it is. Nice!


Well that's pretty wild. I was wondering whether I wanted to build LLaMA tomorrow but you upended my plans in the space of 2 minutes. 10/10 well done.


There's an elephant in the room, or is it just me?

Is your script making users violate the original license agreement(§)?

For the record, I don't think Meta will go after you or anyone else. But they may decide not to make their future models available after what is happening with the Llama weights.

I realize that some people are of the opinion that AI models (weights) cannot be copyrighted at all.

--

§ the license agreement is at https://forms.gle/jk851eBVbX1m5TAv5


Yes, you are right, every project that distributes LLaMA right now is violating Meta's agreement.


I've got a weird, probably untrue conspiracy theory about this.

Stability AI releases Stable Diffusion. It goes viral and vastly outpaces the competition in the blink of an eye. Then they get sued.

Meta sees both of these things go down. Meta needs a leg up on ChatGPT but worries about legal repercussions similar to Stable Diffusion's.

Whoops, it leaked! Hey, we didn't say those dastardly devs could use it.


>But they may decide not to make their future models available after what is happening with the Llama weights.

I think that ship has probably sailed, in that no one is going to release weights in this way again. Either they will publish them outright (like Whisper) or they will keep them (almost) completely closed.


This is awesome! I've wanted to try llama.cpp and you just reduced my to-do list significantly on my Sunday :) Thanks!


Looks great! Does it work on Windows please?


For Windows:

1. Binary build https://github.com/jaykrell/llama.cpp/releases/tag/1

2. Quantized model (7B/13B/30B) https://mega.nz/folder/UjAUES6Z#bGhKkyiZX3eRrn9HcxVVfA

3. main.exe -m ggml-model-q4_0.bin -t 8 -n 128
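Step 3 can also be driven from Node.js. A sketch assuming the binary path and model file from the steps above; `buildLlamaArgs` and `runLlama` are illustrative helpers, and only the flags that appear in this thread are used: `-m` (model), `-t` (threads), `-n` (tokens to predict) and `-p` (prompt).

```javascript
// Sketch: run the llama.cpp binary with the flags shown in this thread.
// The binary path is an assumption; point it at your main/main.exe.
const { spawn } = require("node:child_process");

function buildLlamaArgs({ model, threads = 8, nPredict = 128, prompt }) {
  const args = ["-m", model, "-t", String(threads), "-n", String(nPredict)];
  if (prompt !== undefined) args.push("-p", prompt);
  return args;
}

function runLlama(binary, options) {
  // An argument array (no shell) means the prompt needs no extra quoting.
  const child = spawn(binary, buildLlamaArgs(options));
  child.stdout.on("data", (chunk) => process.stdout.write(chunk));
  child.stderr.on("data", (chunk) => process.stderr.write(chunk));
  return child;
}
```

For example: `runLlama("main.exe", { model: "ggml-model-q4_0.bin", prompt: "The Drake equation is nonsense because" })`. Passing arguments as an array sidesteps shell quoting differences between cmd.exe and Unix shells.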


Thanks. Initial test:

main.exe -m ggml-model-q4_0.bin -t 8 -n 128 -p "The Drake equation is nonsense because"

The Drake equation is nonsense because it takes parameters that can only be known AFTER the conclusion is reached. It would be like saying "I'm going to prove a theorem by starting from the conclusion, then making up the proof. The Drake equation uses the existence of extraterrestrial intelligence as the conclusion and then making up the parameters. It is nonsense.


Nice, main.exe seems to work just fine with the 7B quantized model - generates a token every 400ms on an AMD Ryzen 5 2600!

But, quantize.exe doesn't seem to work - any valid command (such as below) pauses for a split second, then returns with no output?

$ quantize.exe ggml-model-f16.bin ggml-model-q4_0.bin 2


In case this helps anyone else: I built it myself on Windows with CMake, and then everything just works.


Do you mind sharing the binaries?


Sure! https://filetransfer.io/data-package/8hxKAiaH#link

I wasn't sure where to upload them, and that link is only good for 50 downloads. Can put them somewhere else if you know a better location that doesn't require signup.


Thank you.

llama.exe is basically main.exe?

I actually learned how to compile this code via CMake/VS2019. It's sure a whole lot more complicated than it was 25 years ago when I was writing C.


Yes, llama.exe is actually the name the project produces - the other poster must have renamed it to main.exe.

I just did `scoop install cmake`, then built from the command line, was a doddle!


I actually am installing in windows via WSL/Ubuntu fwiw


My attempt does not work, and now I'm trying to figure out where the 35+ GB of data and files that were added to my hard drive are located so I can clean it all off.


I got it to work with WSL/Ubuntu in case you want to try it that way.


If it makes common unix-ish assumptions like “Python 3 executables have a ‘3’ appended to their name”, which other comments here seem to suggest it does, it won’t, even if you have the required version of python installed.


So, I actually got it working on Windows, pretty easily!

The provided `main.exe` binary worked as-is, but `quantize.exe` did not - I built myself with CMake, and `quantize.exe` started working too.


Curious too. Let me know if you try it out. Technically I think it should work.


I tried it, doesn't work. Trying the sibling post from @buzzier.


You, sir or madam, are a hero.


When I run this command: npx dalai llama

I get the following output/errors:

What exactly do I need to install prior to running that command?

  >> npx dalai llama
  exec: git clone https://github.com/ggerganov/llama.cpp.git /Users/rickg/llama.cpp in undefined
  git clone https://github.com/ggerganov/llama.cpp.git /Users/rickg/llama.cpp
  exit
  The default interactive shell is now zsh.
  To update your account to use zsh, please run `chsh -s /bin/zsh`.
  For more details, please visit https://support.apple.com/kb/HT208050.
  bash-3.2$ git clone https://github.com/ggerganov/llama.cpp.git /Users/rickg/llama.cpp
  fatal: destination path '/Users/rickg/llama.cpp' already exists and is not an empty directory.
  bash-3.2$ exit
  exit
  exec: git pull in /Users/rickg/llama.cpp
  git pull
  exit
  bash-3.2$ git pull
  Already up to date.
  bash-3.2$ exit
  exit
  exec: python3 -m venv /Users/rickg/llama.cpp/venv in undefined
  python3 -m venv /Users/rickg/llama.cpp/venv
  exit
  bash-3.2$ python3 -m venv /Users/rickg/llama.cpp/venv
  bash-3.2$ exit
  exit
  exec: /Users/rickg/llama.cpp/venv/bin/pip install torch torchvision torchaudio sentencepiece numpy in undefined
  /Users/rickg/llama.cpp/venv/bin/pip install torch torchvision torchaudio sentencepiece numpy
  exit
  bash-3.2$ /Users/rickg/llama.cpp/venv/bin/pip install torch torchvision torchaudio sentencepiece numpy
  Requirement already satisfied: torch in ./llama.cpp/venv/lib/python3.10/site-packages (1.13.1)
  Requirement already satisfied: torchvision in ./llama.cpp/venv/lib/python3.10/site-packages (0.14.1)
  Requirement already satisfied: torchaudio in ./llama.cpp/venv/lib/python3.10/site-packages (0.13.1)
  Requirement already satisfied: sentencepiece in ./llama.cpp/venv/lib/python3.10/site-packages (0.1.97)
  Requirement already satisfied: numpy in ./llama.cpp/venv/lib/python3.10/site-packages (1.24.2)
  Requirement already satisfied: typing-extensions in ./llama.cpp/venv/lib/python3.10/site-packages (from torch) (4.5.0)
  Requirement already satisfied: pillow!=8.3.*,>=5.3.0 in ./llama.cpp/venv/lib/python3.10/site-packages (from torchvision) (9.4.0)
  Requirement already satisfied: requests in ./llama.cpp/venv/lib/python3.10/site-packages (from torchvision) (2.28.2)
  Requirement already satisfied: charset-normalizer<4,>=2 in ./llama.cpp/venv/lib/python3.10/site-packages (from requests->torchvision) (3.1.0)
  Requirement already satisfied: urllib3<1.27,>=1.21.1 in ./llama.cpp/venv/lib/python3.10/site-packages (from requests->torchvision) (1.26.15)
  Requirement already satisfied: idna<4,>=2.5 in ./llama.cpp/venv/lib/python3.10/site-packages (from requests->torchvision) (3.4)
  Requirement already satisfied: certifi>=2017.4.17 in ./llama.cpp/venv/lib/python3.10/site-packages (from requests->torchvision) (2022.12.7)
  [notice] A new release of pip available: 22.3.1 -> 23.0.1
  [notice] To update, run: python3 -m pip install --upgrade pip
  bash-3.2$ exit
  exit
  exec: make in /Users/rickg/llama.cpp
  make
  exit
  bash-3.2$ make
  I llama.cpp build info:
  I UNAME_S:  Darwin
  I UNAME_P:  arm
  I UNAME_M:  arm64
  I CFLAGS:   -I. -O3 -DNDEBUG -std=c11 -fPIC -pthread -DGGML_USE_ACCELERATE
  I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -pthread
  I LDFLAGS:  -framework Accelerate
  I CC:       Apple clang version 12.0.5 (clang-1205.0.22.9)
  I CXX:      Apple clang version 12.0.5 (clang-1205.0.22.9)
  cc -I. -O3 -DNDEBUG -std=c11 -fPIC -pthread -DGGML_USE_ACCELERATE -c ggml.c -o ggml.o
  ggml.c:1364:25: error: implicit declaration of function 'vdotq_s32' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
      int32x4_t p_0 = vdotq_s32(vdupq_n_s32(0), v0_0ls, v1_0ls);
  ggml.c:1364:19: error: initializing 'int32x4_t' (vector of 4 'int32_t' values) with an expression of incompatible type 'int'
      int32x4_t p_0 = vdotq_s32(vdupq_n_s32(0), v0_0ls, v1_0ls);
  ggml.c:1365:19: error: initializing 'int32x4_t' (vector of 4 'int32_t' values) with an expression of incompatible type 'int'
      int32x4_t p_1 = vdotq_s32(vdupq_n_s32(0), v0_1ls, v1_1ls);
  ggml.c:1367:13: error: assigning to 'int32x4_t' (vector of 4 'int32_t' values) from incompatible type 'int'
      p_0 = vdotq_s32(p_0, v0_0hs, v1_0hs);
  ggml.c:1368:13: error: assigning to 'int32x4_t' (vector of 4 'int32_t' values) from incompatible type 'int'
      p_1 = vdotq_s32(p_1, v0_1hs, v1_1hs);
  5 errors generated.
  make: *** [ggml.o] Error 1
  bash-3.2$ exit
  exit
  /Users/rickg/.npm/_npx/3c737cbb02d79cc9/node_modules/dalai/index.js:153
      throw new Error("running 'make' failed")

  Error: running 'make' failed
      at Dalai.install (/Users/rickg/.npm/_npx/3c737cbb02d79cc9/node_modules/dalai/index.js:153:13)


seeing this too. did you find a solution?


updating xcode did the trick


Where does it say I need Xcode installed?

Is there a list of prerequisites?

Hey thanks, after installing Xcode, that did resolve the issue.


It's free. There's extremely cheap, and there's free. No matter how cheap something is, "free" is on a completely different level: it gives us a new assumption that enables a lot of things that aren't possible when each request is paid for.


You do have to pay for electricity which can be significant when you have multiple GPUs

