Hi guys, I really didn't think that it would move THIS fast, but it did, and we now have Alpaca.cpp.
As soon as I saw Alpaca.cpp on the front page, I started working on Dalai's JavaScript integration, and today I'm happy to release Dalai Alpaca.
Dalai is now kind of like a package manager of sorts, just one command to install LLaMA or Alpaca, and all integrated with the web UI and the ultra-hackable API. All you need to do is one command: "npx dalai install alpaca 7b"
Also, been fixing a lot of bugs since last time so if you had trouble installing, I recommend you try again.
Officially supports Windows, Mac and Linux. Still some quirks here and there but mostly works and happy to figure out if you are having trouble. This time it's much simpler to use because it only takes up like 4.2GB. Appreciate any feedback!
I posted Dalai on HN yesterday (https://news.ycombinator.com/item?id=35127020) and I have been spending the last 24 hours trying to listen to all the feedback and incorporate all the incoming pull requests and make a new release.
1. If you had trouble installing and the models were just not installing for some reason, make sure you try this version.
2. If you had trouble working with models other than the 7B one, yes, that was my fault: only the 7B one was working. This has been fixed, and ALL models should now work with this new version.
3. Use existing workspace: If you already have an existing llama.cpp workspace folder, now Dalai can connect to it programmatically instead of creating its own repository under ~/dalai. Remember, while the easy installation is important, the most important aspect of Dalai is that it lets you interact with llama.cpp with JavaScript. So even if you already have an existing llama.cpp and just want to play with it using JS, you can use Dalai.
4. New Web UI
Finally, like 90% of the code for this release came from people other than myself and it's amazing how this happened. Thank you everyone who contributed.
There are some further issues I think need to be addressed. If you have any feedback that hasn't been addressed with this release, please let me know!
Hey guys, I was so inspired by the llama.cpp project that I spent all day today to build a weekend side project.
Basically it lets you one-click install LLaMA on your machine with no bullshit. All you need to do is run "npx dalai llama".
I see that the #1 post today is a whole long blog post about how to walk through and compile cpp and download files and all that to finally run LLaMA on your machine, but basically I have 100% automated this with a simple NPM package/application.
On top of that, the whole thing is a single NPM package and was built with hackability in mind. With just one line of JS function call you can call LLaMA from YOUR app.
Lastly, EVEN IF you don't use JavaScript, Dalai exposes a socket.io API, so you can use whatever language you want to interact with Dalai programmatically.
Thanks for all the feedback! I went outside to take a walk after posting this and just came back, and went through them to summarize what needs to be improved.
Basically looks like it comes down to the following:
- *customize features:* Should not be difficult (will add flag features)
- *path:* customize the home directory (instead of automatically storing to $HOME)
- *python:* some people are having issues with the python binary (since the package is essentially calling these shell commands). Maybe add a flag to specify the exact name of the python binary (such as "--python python3")
- *avoid downloading files:* I have this issue too when I just want to install the code instead of downloading the full model which takes a long time. Might add a flag to avoid downloading models in case you already have them (EDIT: actually upon thinking about it, it's better to just set the source model folder, something like --model)
- *other flags:* The rest of the flags natively supported by the llama.cpp project, such as top_k, top_p, temp, batch_size, threads, seed, n_predict, etc. (They are already in the code but just were not exposed in the CLI and not documented)
- *documentation:*
  - document the machine spec
  - document the storage spec: how much space is used?
  - node version: which version of node.js is required?
  - python version: which version of python doesn't work?
Am I missing anything? Feel free to leave comments, will try to roll out some updates as soon as I can. To stay updated, feel free to follow me on twitter https://twitter.com/cocktailpeanut (or you could create issues on GitHub too!)
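To make the flag wish list above concrete, here is a minimal sketch of how such options could be parsed in plain Node. The flag names (`--home`, `--python`, `--model`) are just the ones floated in this thread and the defaults are my assumption. This is not Dalai's actual CLI.

```javascript
// Hypothetical flags from the wish list above; NOT Dalai's real options.
const DEFAULTS = { home: null, python: "python3", model: null };

// Parse ["llama", "--python", "python3.10"] style argument lists.
function parseFlags(argv) {
  const opts = { ...DEFAULTS, rest: [] };
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg.startsWith("--")) {
      const name = arg.slice(2);
      if (!Object.prototype.hasOwnProperty.call(DEFAULTS, name)) {
        throw new Error(`unknown flag: ${arg}`);
      }
      const value = argv[i + 1];
      if (value === undefined || value.startsWith("--")) {
        throw new Error(`missing value for ${arg}`);
      }
      opts[name] = value;
      i++; // skip the value that was just consumed
    } else {
      opts.rest.push(arg); // positional argument, e.g. "llama"
    }
  }
  return opts;
}
```

Calling `parseFlags(process.argv.slice(2))` at the top of a CLI entry point would hand the rest of the program one plain options object.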
I tried to run your NPX commands from the examples on a fresh WSL install of Ubuntu 20.04, but if you don't have build tools installed, they both just silently fail.
I only realized what was happening after trying to go the other route and use it in a package, where I then noticed the NPM install will give a node-gyp error about make missing.
I'm on NixOS, where you have to explicitly state dependencies (which is a good thing, except when... this happens)
Besides make (which I can quickly set up a project environment to make available for), what other deps do you think it uses but doesn't declare or state? ;)
The other one I noticed is pip! A lot of the script fails without pip, and it takes until after the fairly long downloads finish to let you know it was needed.
I successfully used the latest version of node LTS (via NVM) and the latest versions of python-pip3 and build essentials from the Canonical apt repo, if that helps.
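A preflight check along these lines would surface the missing tools before the long downloads start, instead of failing silently. This is only a sketch: the tool list (`make`, `python3`, `pip3`) is my guess from the failures described above, and it assumes each tool answers `--version` with exit code 0.

```javascript
const { spawnSync } = require("child_process");

// Default probe: a tool counts as present if `<tool> --version`
// can be started and exits with status 0 (an assumption about the tools).
function canRun(tool) {
  const result = spawnSync(tool, ["--version"], { stdio: "ignore" });
  return !result.error && result.status === 0;
}

// Return the subset of tools that the probe could not run.
function findMissing(tools, probe = canRun) {
  return tools.filter((tool) => !probe(tool));
}

// Fail fast, naming every missing tool at once.
function preflight(tools, probe = canRun) {
  const missing = findMissing(tools, probe);
  if (missing.length > 0) {
    throw new Error(`missing required tools: ${missing.join(", ")}`);
  }
}
```

Running `preflight(["make", "python3", "pip3"])` as the very first install step would turn the silent failure into a one-line error. The `probe` parameter is there only so the check can be exercised without touching the real system.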
I don’t understand why it’s downloading at all, that shouldn’t be default behavior.
It should have default instructions to load a file from a default place, and then arguments/flags to load from a specific path, and then MAYBE a prompt to download the models after it can’t find them on the paths, plural
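The lookup order described above (default place first, then user-supplied paths, and only then maybe a download prompt) could look roughly like this. The function name and the directory list are made up for illustration; only the file name comes from this thread.

```javascript
const fs = require("fs");
const path = require("path");

// Return the first directory in searchDirs that contains fileName,
// as a full path, or null when none of them does.
function resolveModel(fileName, searchDirs, exists = fs.existsSync) {
  for (const dir of searchDirs) {
    const candidate = path.join(dir, fileName);
    if (exists(candidate)) return candidate;
  }
  return null; // caller decides whether to offer a download
}
```

A caller would pass the default directory first and any paths from flags after it, and only ask about downloading when the result is `null`.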
I followed the initial instructions and the 7B model worked just fine.
I tried the supplementary instructions to download some of the models (7B, 13B, and 30B), and it didn't seem to work. The prompt returned nothing after waiting for several minutes.
Is there a way to run just one of the larger models?
Yes. Starting with the Facebook versions of LLaMA-7B you just quantize the model to 4bit on your desktop (since it takes 14GB of RAM) and then move it to your phone and follow the Android instructions in the repo. https://github.com/ggerganov/llama.cpp/#android
I've seen dozens of screenshots of it running in termux on androids by now at completely usable speeds.
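The memory numbers in this thread follow from simple arithmetic: parameter count times bits per weight, divided by 8. A rough sketch, assuming nominal parameter counts (7B, 65B) and counting the weights only:

```javascript
// Bytes needed to hold the weights alone, ignoring context buffers
// and any per-block metadata the file format adds.
function weightBytes(paramCount, bitsPerWeight) {
  return (paramCount * bitsPerWeight) / 8;
}

// Decimal gigabytes, matching the GB figures quoted in the thread.
function toGB(bytes) {
  return bytes / 1e9;
}

const fp16SevenB = toGB(weightBytes(7e9, 16));  // 14 GB, the figure quoted above
const q4SevenB = toGB(weightBytes(7e9, 4));     // 3.5 GB
const q4SixtyFiveB = toGB(weightBytes(65e9, 4)); // 32.5 GB
```

The files people actually report are somewhat larger than the 4-bit estimate. One likely reason is that quantized formats also store scaling data per block of weights, so treat these as lower bounds.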
I ran this on my intel i7-7700k with 32 gig ram. It ran very slow. Almost 1 word per second slow. Not sure if I did something wrong.
Distro Ubuntu 22.04
My biggest concern about these LLMs was the corporate sequestration and the potential socioeconomic imbalances it would create. The work you are doing here is part of some amazing work to check that back. In summary: Bruhhhhhh. THANK YOU!
This is something to keep an eye on, really. The solution for making that sequestration impossible is twofold:
1. to know how to architect and create LLMs (including training data readiness)
2. have them produced in hardware that is acquirable at reasonable cost for a normal citizen
Yea not a nodejs/javascript dev at all but this is failing to install on Fedora. I don't have time to dig into it at the moment but if anybody has any well known gotchas that could be the issue that would be helpful :)
Does anyone know how to avoid downloading the model weights when doing `npx dalai llama`, and instead telling the install process where they are on my drive?
Haha, I just finished ordering 32GB of additional memory for my PC so I can run the 65B model, if that tells you anything. I'm upgrading from 32GB -> 64GB.
7B is fine, 13B is better. Both are fun toys and almost make sense most of the time, but even with a lot of parameter tuning they're often incoherent. You can tell that they have encoded fewer relationships between concepts than the higher-parameter models we've gotten used to--it's much closer to GPT-2 than GPT-3.
They're good enough to whet my appetite and give me a lot of ideas of what I want to do, they're just not quite good enough to make those applications reliably useful. Based on the reports I'm hearing here of just how much better the 65B model is than the 7B, I decided it was worth $80 for a few new sticks of RAM to be able to use the full model. Still way cheaper than buying a graphics card capable of handling it.
Heh, you just made me upgrade as well. After originally paying 130 € for 32 GB, it’s nice that I only had to pay 70 € to double it ;) Not sure if I want to run LLMs (or if my Ryzen 5 3600 is even powerful enough), but I’ve wanted some more RAM for a while.
If I was running in a server context, would the 50gb of ram be required to respond to one request, or can it be used to respond to multiple requests simultaneously?
I'm very late to this question, but I believe that that amount is only required once, but the context tensor will need to be created per request. I haven't confirmed that, though.
I'm still holding on to a small bit of hope that the GPU market will normalise this year. Don't think that I'm the only one looking to get something highly capable but for a fair price.
It's expensive for first-world countries too. Just look at the 4090 - it's insane that it costs 2k EUR... it's literally double the fair price (which itself is high).
I ran "npx dalai llama" and it's just... sitting there (after I hit "y" to confirm). I checked btop++ and there's barely any downloading or CPU activity occurring, so not sure what it's doing... but does "pip3 install torch torchvision torchaudio sentencepiece numpy" take a while?
If it's actually downloading the 3.9GB of model weights or whatever, it would be pretty cool if it showed a progress bar of some sort. Stretch goal, for sure, but a very nice nicety for users.
Anyway, I'll leave it be and check on it to see when it's complete. Super cool if this works!!
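For what it's worth, the text part of a progress bar is small. A sketch, assuming the downloader knows the total size up front and can report bytes received so far (the function name is hypothetical):

```javascript
// Render a fixed-width text bar such as "[#####-----] 50%".
function renderProgress(receivedBytes, totalBytes, width = 20) {
  // Guard against an unknown total and clamp overshoot to 100%.
  const ratio = totalBytes > 0 ? Math.min(receivedBytes / totalBytes, 1) : 0;
  const filled = Math.round(ratio * width);
  const bar = "#".repeat(filled) + "-".repeat(width - filled);
  return `[${bar}] ${Math.round(ratio * 100)}%`;
}
```

Writing `"\r" + renderProgress(received, total)` to `process.stdout` on each received chunk would redraw the bar in place on most terminals.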
There's an elephant in the room, or is it just me?
Is your script making users violate the original license agreement(§)?
For the record, I don't think Meta will go after you or anyone else. But they may decide not to make their future models available after what is happening with the Llama weights.
I realize that some people are of the opinion that AI models (weights) cannot be copyrighted at all.
>But they may decide not to make their future models available after what is happening with the Llama weights.
I think that ship has probably sailed, in that no one is going to release weights in this way again. Either they will publish them outright (like Whisper) or they will keep them (almost) completely closed.
main.exe -m ggml-model-q4_0.bin -t 8 -n 128 -p "The Drake equation is nonsense because"
The Drake equation is nonsense because it takes parameters that can only be known AFTER the conclusion is reached. It would be like saying "I'm going to prove a theorem by starting from the conclusion, then making up the proof.
The Drake equation uses the existence of extraterrestrial intelligence as the conclusion and then making up the parameters. It is nonsense.
I wasn't sure where to upload them, and that link is only good for 50 downloads. Can put them somewhere else if you know a better location that doesn't require signup.
My attempt does not work, and now I'm trying to figure out where the 35+ GB of data and files that were added to my hard drive are located so I can clean it all off.
If it makes common unix-ish assumptions like “Python 3 executables have a ‘3’ appended to their name”, which other comments here seem to suggest it does, it won’t, even if you have the required version of python installed.
The default interactive shell is now zsh.
To update your account to use zsh, please run `chsh -s /bin/zsh`.
For more details, please visit https://support.apple.com/kb/HT208050.
bash-3.2$ git clone https://github.com/ggerganov/llama.cpp.git /Users/rickg/llama.cpp
fatal: destination path '/Users/rickg/llama.cpp' already exists and is not an empty directory.
bash-3.2$ exit
exit
exec: git pull in /Users/rickg/llama.cpp
git pull
exit
The default interactive shell is now zsh.
To update your account to use zsh, please run `chsh -s /bin/zsh`.
For more details, please visit https://support.apple.com/kb/HT208050.
bash-3.2$ git pull
Already up to date.
bash-3.2$ exit
exit
exec: python3 -m venv /Users/rickg/llama.cpp/venv in undefined
python3 -m venv /Users/rickg/llama.cpp/venv
exit
The default interactive shell is now zsh.
To update your account to use zsh, please run `chsh -s /bin/zsh`.
For more details, please visit https://support.apple.com/kb/HT208050.
bash-3.2$ python3 -m venv /Users/rickg/llama.cpp/venv
bash-3.2$ exit
exit
exec: /Users/rickg/llama.cpp/venv/bin/pip install torch torchvision torchaudio sentencepiece numpy in undefined
/Users/rickg/llama.cpp/venv/bin/pip install torch torchvision torchaudio sentencepiece numpy
exit
The default interactive shell is now zsh.
To update your account to use zsh, please run `chsh -s /bin/zsh`.
For more details, please visit https://support.apple.com/kb/HT208050.
bash-3.2$ /Users/rickg/llama.cpp/venv/bin/pip install torch torchvision torchaudio sentencepiece numpy
Requirement already satisfied: torch in ./llama.cpp/venv/lib/python3.10/site-packages (1.13.1)
Requirement already satisfied: torchvision in ./llama.cpp/venv/lib/python3.10/site-packages (0.14.1)
Requirement already satisfied: torchaudio in ./llama.cpp/venv/lib/python3.10/site-packages (0.13.1)
Requirement already satisfied: sentencepiece in ./llama.cpp/venv/lib/python3.10/site-packages (0.1.97)
Requirement already satisfied: numpy in ./llama.cpp/venv/lib/python3.10/site-packages (1.24.2)
Requirement already satisfied: typing-extensions in ./llama.cpp/venv/lib/python3.10/site-packages (from torch) (4.5.0)
Requirement already satisfied: pillow!=8.3.*,>=5.3.0 in ./llama.cpp/venv/lib/python3.10/site-packages (from torchvision) (9.4.0)
Requirement already satisfied: requests in ./llama.cpp/venv/lib/python3.10/site-packages (from torchvision) (2.28.2)
Requirement already satisfied: charset-normalizer<4,>=2 in ./llama.cpp/venv/lib/python3.10/site-packages (from requests->torchvision) (3.1.0)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in ./llama.cpp/venv/lib/python3.10/site-packages (from requests->torchvision) (1.26.15)
Requirement already satisfied: idna<4,>=2.5 in ./llama.cpp/venv/lib/python3.10/site-packages (from requests->torchvision) (3.4)
Requirement already satisfied: certifi>=2017.4.17 in ./llama.cpp/venv/lib/python3.10/site-packages (from requests->torchvision) (2022.12.7)
[notice] A new release of pip available: 22.3.1 -> 23.0.1
[notice] To update, run: python3 -m pip install --upgrade pip
bash-3.2$ exit
exit
exec: make in /Users/rickg/llama.cpp
make
exit
The default interactive shell is now zsh.
To update your account to use zsh, please run `chsh -s /bin/zsh`.
For more details, please visit https://support.apple.com/kb/HT208050.
bash-3.2$ make
I llama.cpp build info:
I UNAME_S: Darwin
I UNAME_P: arm
I UNAME_M: arm64
I CFLAGS: -I. -O3 -DNDEBUG -std=c11 -fPIC -pthread -DGGML_USE_ACCELERATE
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -pthread
I LDFLAGS: -framework Accelerate
I CC: Apple clang version 12.0.5 (clang-1205.0.22.9)
I CXX: Apple clang version 12.0.5 (clang-1205.0.22.9)
cc -I. -O3 -DNDEBUG -std=c11 -fPIC -pthread -DGGML_USE_ACCELERATE -c ggml.c -o ggml.o
ggml.c:1364:25: error: implicit declaration of function 'vdotq_s32' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
int32x4_t p_0 = vdotq_s32(vdupq_n_s32(0), v0_0ls, v1_0ls);
^
ggml.c:1364:19: error: initializing 'int32x4_t' (vector of 4 'int32_t' values) with an expression of incompatible type 'int'
int32x4_t p_0 = vdotq_s32(vdupq_n_s32(0), v0_0ls, v1_0ls);
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ggml.c:1365:19: error: initializing 'int32x4_t' (vector of 4 'int32_t' values) with an expression of incompatible type 'int'
int32x4_t p_1 = vdotq_s32(vdupq_n_s32(0), v0_1ls, v1_1ls);
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ggml.c:1367:13: error: assigning to 'int32x4_t' (vector of 4 'int32_t' values) from incompatible type 'int'
p_0 = vdotq_s32(p_0, v0_0hs, v1_0hs);
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ggml.c:1368:13: error: assigning to 'int32x4_t' (vector of 4 'int32_t' values) from incompatible type 'int'
p_1 = vdotq_s32(p_1, v0_1hs, v1_1hs);
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
5 errors generated.
make: *** [ggml.o] Error 1
bash-3.2$ exit
exit
/Users/rickg/.npm/_npx/3c737cbb02d79cc9/node_modules/dalai/index.js:153
throw new Error("running 'make' failed")
^
Error: running 'make' failed
at Dalai.install (/Users/rickg/.npm/_npx/3c737cbb02d79cc9/node_modules/dalai/index.js:153:13)
It's free. There's extremely cheap, and there's free. No matter how extremely cheap something is, "free" is on a completely different level and gives us a new assumption that enables a lot of things that are not possible when each request is paid (no matter how cheap it is).