More

danielhanchen · 2026-04-02T16:16:59 1775146619

Thinking / reasoning + multimodal + tool calling.

We made some quants at https://huggingface.co/collections/unsloth/gemma-4 for folks to run them - they work really well!

Guide for those interested: https://unsloth.ai/docs/models/gemma-4

Also note to use temperature = 1.0, top_p = 0.95, top_k = 64 and the EOS is "<turn|>". "<|channel>thought\n" is also used for the thinking trace!

evilelectron · 2026-04-02T17:09:47 1775149787

Daniel, your work is changing the world. More power to you.

I setup a pipeline for inference with OCR, full text search, embedding and summarization of land records dating back 1800s. All powered by the GGUF's you generate and llama.cpp. People are so excited that they can now search the records in multiple languages that a 1 minute wait to process the document seems nothing. Thank you!

danielhanchen · 2026-04-02T17:10:51 1775149851

Oh appreciate it!

Oh nice! That sounds fantastic! I hope Gemma-4 will make it even better! The small ones 2B and 4B are shockingly good haha!

qingcharles · 2026-04-04T04:53:39 1775278419

Just switched from 3.1 Flash Lite to Gemma-4 31B on the AI Studio API since there is a generous 1500/day on non-billed projects. It's doing fantastic.

polishdude20 · 2026-04-02T18:19:17 1775153957

Hey in really interested in your pipeline techniques. I've got some pdfs I need to get processed but processing them in the cloud with big providers requires redaction.

Wondering if a local model or a self hosted one would work just as well.

evilelectron · 2026-04-02T20:01:56 1775160116

I run llama.cpp with Qwen3-VL-8B-Instruct-Q4_K_S.gguf with mmproj-F16.gguf for OCR and translation. I also run llama.cpp with Qwen3-Embedding-0.6B-GGUF for embeddings. Drupal 11 with ai_provider_ollama and custom provider ai_provider_llama (heavily derived from ai_provider_ollama) with PostreSQL and pgvector.

People on site scan the documents and upload them for archival. The directory monitor looks for new files in the archive directories and once a new file is available, it is uploaded to Drupal. Once a new content is created in Drupal, Drupal triggers the translation and embedding process through llama.cpp. Qwen3-VL-8B is also used for chat and RAG. Client is familiar with Drupal and CMS in general and wanted to stay in a similar environment. If you are starting new I would recommend looking at docling.

lwhi · 2026-04-03T06:50:48 1775199048

Are you linking any of the processes using the Drupal AI module suite?

evilelectron · 2026-04-03T14:25:59 1775226359

Yes, they are all linked using Drupal's AI modules. I have an OpenCV application that removes the old paper look, enhances the contrast and fixes the orientation of the images before they hit llama.cpp for OCR and translation.

chrisweekly · 2026-04-02T20:13:48 1775160828

Disclaimer: I'm an AI novice relative to many here. FWIW last wknd I spent a couple hours setting up self-hosted n8n with ollama and gemma3:4b [EDIT: not Qwen-3.5], using PDF content extraction for my PoC. 100% local workflow, no runtime dependency on cloud providers. I doubt it'd scale very well (macbook air m4, measly 16GB RAM), but it works as intended.

patrickk · 2026-04-03T06:33:01 1775197981

For those who wish to do OCR on photos, like receipts, or PDFs or anything really, Paperless-NGX works amazingly well and runs on a potato.

polishdude20 · 2026-04-02T20:25:18 1775161518

How do you extract the content? OCR? Pdf to text then feed into qwen?

I tried something similar where I needed a bunch of tables extracted from the pdf over like 40 pages. It was crazy slow on my MacBook and innacurate

philipkglass · 2026-04-02T20:47:24 1775162844

If you have a basic ARM MacBook, GLM-OCR is the best single model I have found for OCR with good table extraction/formatting. It's a compact 0.9b parameter model, so it'll run on systems with only 8 GB of RAM.

https://github.com/zai-org/GLM-OCR

Use mlx-vlm for inference:

https://github.com/zai-org/GLM-OCR/blob/main/examples/mlx-de...

Then you can run a single command to process your PDF:

  glmocr parse example.pdf

  Loading images: example.pdf
  Found 1 file(s)
  Starting Pipeline...
  Pipeline started!
  GLM-OCR initialized in self-hosted mode
  Using Pipeline (enable_layout=true)...

  === Parsing: example.pdf (1/1) ===

My test document contains scanned pages from a law textbook. It's two columns of text with a lot of footnotes. It took 60 seconds to process 5 pages on a MBP with M4 Max chip.

After it's done, you'll have a directory output/example/ that contains .md and .json files. The .md file will contain a markdown rendition of the complete document. The .json file will contain individual labeled regions from the document along with their transcriptions. If you get all the JSON objects with

  "label": "table"

from the JSON file, you can get an HTML-formatted table from each "content" section of these objects.

It might still be inaccurate -- I don't know how challenging your original tables are -- but it shouldn't be terribly slow. The tables it produced for me were good.

I have also built more complex work flows that use a mixture of OCR-specialized models and general purpose VLM models like Qwen 3.5, along with software to coordinate and reconcile operations, but GLM-OCR by itself is the best first thing to try locally.

polishdude20 · 2026-04-02T21:56:39 1775166999

Thanks! Just tried it on a 40 page pdf. Seems to work for single images but the large pdf gives me connection timeouts

philipkglass · 2026-04-02T22:04:54 1775167494

I also get connection timeouts on larger documents, but it automatically retries and completes. All the pages are processed when I'm done. However, I'm using the Python client SDK for larger documents rather than the basic glmocr command line tool. I'm not sure if that makes a difference.

polishdude20 · 2026-04-03T04:52:43 1775191963

Yeah looks like the cli also retries as well. I was able to get it working using a higher timeout.

davidbjaffe · 2026-04-03T14:25:04 1775226304

Cool! For GLM-OCR, do you use "Option 2: Self-host with vLLM / SGLang" and in that case, am I correct that there is no internet connection involved and hence connection timeouts would be avoided entirely?

philipkglass · 2026-04-03T14:54:33 1775228073

When you self-host, there's still a client/server relationship between your self-hosted inference server and the client that manages the processing of individual pages. You can get timeouts depending on the configured timeouts, the speed of your inference server, and the complexity of the pages you're processing. But you can let the client retry and/or raise the initial timeout limit if you keep running into timeouts.

That said, this is already a small and fast model when hosted via MLX on macOS. If you run the inference server with a recent NVidia GPU and vLLM on Linux it should be significantly faster. The big advantage with vLLM for OCR models is its continuous batching capability. Using other OCR models that I couldn't self-host on macOS, like DeepSeek 2 OCR or Chandra 2, vLLM gave dramatic throughput improvements on big documents via continuous batching if I process 8-10 pages at a time. This is with a single 4090 GPU.

chrisweekly · 2026-04-02T21:40:08 1775166008

1. Correction: I'd planned to use Qwen-3.5 but ended up using gemma3:4b.

2. The n8n workflow passes a given binary pdf to gemma, which (based on a detailed prompt) analyzes it and produces JSON output.

See https://github.com/LinkedInLearning/build-with-ai-running-lo... if you want more details. :)

tehologist · 2026-04-03T04:27:11 1775190431

Python pdftools to convert to images and tesseract to ocr them to text files. Fast free and can run on CPU.

jorl17 · 2026-04-02T19:22:02 1775157722

Seconded, would also love to hear your story if you would be willing

Breza · 2026-04-03T14:41:35 1775227295

I'm very active in family history and this kind of project is massively helpful, thank you

wok4899 · 2026-04-04T11:51:18 1775303478

This is a very interesting project. If it's publicly available, would you mind sharing it? I would love to understand how it works.

Ps: found your other comments, thanks.

irishcoffee · 2026-04-03T14:23:02 1775226182

> your work is changing the world

I realize this may have been hyperbole, but it sure isn't changing the world.

a96 · 2026-04-05T10:59:07 1775386747

For relatively small values of changing or world, it sure is.

In the world of local models, Unsloth is one of the most significant projects there is.

akavel · 2026-04-02T22:57:04 1775170624

I'm trying to disable "thinking", but it doesn't seem to work (in llama.cpp). The usual `--reasoning-budget 0` doesn't seem to change it, nor `--chat-template-kwargs '{"enable_thinking":false}'` (both with `--jinja`). Am I missing something?

EDIT: Ok, looks like there's yet another new flag for that in llama.cpp, and this one seems to work in this case: `--reasoning off`.

FWIW, I'm doing some initial tries of unsloth/gemma-4-26B-A4B-it-GGUF:UD-Q4_K_XL, and for writing some Nix, I'm VERY impressed - seems significantly better than qwen3.5-35b-a3b for me for now. Example commandline on a Macbook Air M4 32gb RAM:

  llama-cli -hf unsloth/gemma-4-26B-A4B-it-GGUF:UD-Q4_K_XL  -t 1.0 --top-p 0.95 --top-k 64 -fa on --no-mmproj --reasoning-budget 0 -c 32768 --jinja --reasoning off

(at release b8638, compiled with Nix)

danielhanchen · 2026-04-03T05:26:53 1775194013

Oh very cool! Will check the `--reasoning off` flag as well!

Yep the models are really good!

Imustaskforhelp · 2026-04-02T16:49:48 1775148588

Daniel, I know you might hear this a lot but I really appreciate a lot of what you have been doing at Unsloth and the way you handle your communication, whether within hackernews/reddit.

I am not sure if someone might have asked this already to you, but I have a question (out of curiosity) as to which open source model you find best and also, which AI training team (Qwen/Gemini/Kimi/GLM) has cooperated the most with the Unsloth team and is friendly to work with from such perspective?

danielhanchen · 2026-04-02T16:59:56 1775149196

Thanks a lot for the support :)

Tbh Gemma-4 haha - it's sooooo good!!!

For teams - Google haha definitely hands down then Qwen, Meta haha through PyTorch and Llama and Mistral - tbh all labs are great!

Imustaskforhelp · 2026-04-02T17:04:06 1775149446

Now you have gotten me a bit excited for Gemma-4, Definitely gonna see if I can run the unsloth quants of this on my mac air & thanks for responding to my comment :-)

danielhanchen · 2026-04-02T17:07:03 1775149623

Thanks! Have a super good day!!

genpfault · 2026-04-03T02:14:33 1775182473

llama.cpp (b8642) auto-fits ~200k context on this 24GB RX 7900 XTX & it shows a solid 100+ tok/s ("S_TG t/s") on the first 32k of it, nice!

    ./llama-batched-bench -hf unsloth/gemma-4-26B-A4B-it-GGUF:UD-Q4_K_XL \
    -npp 1000,2000,4000,8000,16000,32000,64000,96000,128000 -ntg 128 -npl 1 -c 0
    |    PP |     TG |    B |   N_KV |   T_PP s | S_PP t/s |   T_TG s | S_TG t/s |      T s |    S t/s |
    |-------|--------|------|--------|----------|----------|----------|----------|----------|----------|
    |  1000 |    128 |    1 |   1128 |    0.416 |  2404.87 |    1.064 |   120.29 |    1.480 |   762.20 |
    |  2000 |    128 |    1 |   2128 |    0.755 |  2649.86 |    1.075 |   119.04 |    1.830 |  1162.83 |
    |  4000 |    128 |    1 |   4128 |    1.501 |  2665.72 |    1.093 |   117.08 |    2.594 |  1591.49 |
    |  8000 |    128 |    1 |   8128 |    3.142 |  2545.85 |    1.114 |   114.87 |    4.257 |  1909.47 |
    | 16000 |    128 |    1 |  16128 |    6.908 |  2316.00 |    1.189 |   107.65 |    8.097 |  1991.73 |
    | 32000 |    128 |    1 |  32128 |   16.382 |  1953.31 |    1.278 |   100.12 |   17.661 |  1819.16 |
    | 64000 |    128 |    1 |  64128 |   43.427 |  1473.74 |    1.453 |    88.12 |   44.879 |  1428.89 |
    | 96000 |    128 |    1 |  96128 |   82.227 |  1167.50 |    1.623 |    78.86 |   83.850 |  1146.42 |
    |128000 |    128 |    1 | 128128 |  133.237 |   960.69 |    1.797 |    71.25 |  135.034 |   948.86 |

spwa4 · 2026-04-03T20:12:20 1775247140

~50 tok/s on M1 Max 64Gb

danielhanchen · 2026-04-03T12:04:55 1775217895

Oh nice that's pretty good!

l2dy · 2026-04-02T16:42:43 1775148163

FYI, screenshot for the "Search and download Gemma 4" step on your guide is for qwen3.5, and when I searched for gemma-4 in Unsloth Studio it only shows Gemma 3 models.

danielhanchen · 2026-04-02T16:45:18 1775148318

We're still updating it haha! Sorry! It's been quite complex to support new models without breaking old ones

smallerize · 2026-04-02T21:02:41 1775163761

Speaking of which, do you think Step 3.5 Flash is going to happen or should I stop holding my breath?

danielhanchen · 2026-04-03T12:03:54 1775217834

Oh quants - haha I can re-investigate it - just totally forgot about them

trashcan2137 · 2026-04-03T02:05:51 1775181951

  and the EOS is "<turn|>". "<|channel>thought\n" is also used for the thinking trace!

Can someone explain this to me? Why is this faux-XML important here?

pertymcpert · 2026-04-03T05:14:28 1775193268

That’s how the model is trained to signal the end to its generation and to indicate its thinking.

sroussey · 2026-04-03T06:52:27 1775199147

These are likely individual tokens. They are super common.

rizzo94 · 2026-04-03T15:34:19 1775230459

Huge fan of the Unsloth quants! Having reasoning and tool calling this accessible locally is a massive leap forward.

The main hurdle I've found with local tool calling is managing the execution boundaries safely. I’ve started plugging these local models into PAIO to handle that. Since it acts as a hardened execution layer with strict BYOK sovereignty, it lets you actually utilize Gemma-4's tool calling capabilities without the low-level anxiety of a hallucination accidentally wiping your drive. It’s the perfect secure gateway for these advanced local models.

Wowfunhappy · 2026-04-02T21:00:48 1775163648

Hi! Do you ever make quants of the base models? I'm interested in experimenting with them in non-chat contexts.

car · 2026-04-03T02:15:43 1775182543

Yes, they are listed on huggingface. The instruction trained models have an 'it' in their name.

https://huggingface.co/collections/unsloth/gemma-4

Edit: Sorry, I'm not sure if this is a quant, but it says 'finetuned' from the Google Gemma 4 parent snapshot. It's the same size as the UD 8-bit quant though.

Wowfunhappy · 2026-04-03T03:48:04 1775188084

Only the 'it' models seem to have quants. I was really hoping to try a base model.

kristjansson · 2026-04-03T07:03:36 1775199816

Basic quantization is easy if you have enough RAM (not VRAM) to load the weights.

zaat · 2026-04-02T17:41:12 1775151672

Thank you for your work.

You have an answer on your page regarding "Should I pick 26B-A4B or 31B?", but can you please clarify if, assuming 24GB vRAM, I should pick a full precision smaller model or 4 bit larger model?

petu · 2026-04-02T20:06:51 1775160411

Try 26B first. 31B seems to have very heavy KV cache (maybe bugged in llama.cpp at the moment; 16K takes up 4.9GB).

edit: 31B cache is not bugged, there's static SWA cost of 3.6GB.. so IQ4_XS at 15.2GB seems like reasonable pair, but even then barely enough for 64K for 24GB VRAM. Maybe 8 bit KV quantization is fine now after https://github.com/ggml-org/llama.cpp/pull/21038 got merged, so 100K+ is possible.

> I should pick a full precision smaller model or 4 bit larger model?

4 bit larger model. You have to use quant either way -- even if by full precision you mean 8 bit, it's gonna be 26GB + overhead + chat context.

Try UD-Q4_K_XL.

danielhanchen · 2026-04-02T20:12:31 1775160751

Yes UD-Q4_K_XL works well! :)

mixtureoftakes · 2026-04-02T20:25:01 1775161501

what is the main difference between "normal" quants and the UD ones?

car · 2026-04-02T20:58:11 1775163491

They explain it here:

https://unsloth.ai/docs/basics/unsloth-dynamic-2.0-ggufs

For the best quality reply, I used the Gemma-4 31B UD-Q8_K_XL quant with Unsloth Studio to summarize the URL with web search. It produced 4.9 tok/s (including web search) on an MacBook Pro M1 Max with 64GB.

Here an excerpt of it's own words:

Unsloth Dynamic 2.0 Quantization

Dynamic 2.0 is not just a "bit-reduction" but an intelligent, per-layer optimization strategy.

- Selective Layer Quantization: Instead of making every layer 4-bit, Dynamic 2.0 analyzes every single layer and selectively adjusts the quantization type. Some critical layers may be kept at higher precision, while less critical layers are compressed more.

- Model-Specific Tailoring: The quantization scheme is custom-built for each model. For example, the layers selected for quantization in Gemma 3 are completely different from those in Llama 4.

- High-Quality Calibration: They use a hand-curated calibration dataset of >1.5M tokens specifically designed to enhance conversational chat performance, rather than just optimizing for Wikipedia-style text.

- Architecture Agnostic: While previous versions were mostly effective for MoE (Mixture of Experts) models, Dynamic 2.0 works for all architectures (both MoE and non-MoE).

danielhanchen · 2026-04-02T18:00:18 1775152818

Thank you!

I presume 24B is somewhat faster since it's only 4B activated - 31B is quite a large dense model so more accurate!

ryandrake · 2026-04-02T19:44:50 1775159090

This is one of the more confusing aspects of experimenting with local models as a noob. Given my GPU, which model should I use, which quantization of that model should I pick (unsloth tends to offer over a dozen!) and what context size should I use? Overestimate any of these, and the model just won't load and you have to trial-and-error your way to finding a good combination. The red/yellow/green indicators on huggingface.co are kind of nice, but you only know for sure when you try to load the model and allocate context.

danielhanchen · 2026-04-02T19:57:12 1775159832

Definitely Unsloth Studio can help - we recommend specific quants (like Gemma-4) and also auto calculate the context length etc!

ryandrake · 2026-04-02T20:05:37 1775160337

Will have to try it out. I always thought that was more for fine-tuning and less for inference.

danielhanchen · 2026-04-02T20:12:19 1775160739

Oh yes sadly we partially mis-communicated haha - there's both and synthetic data generation + exporting!

kapimalos · 2026-04-02T23:03:40 1775171020

Noob question. Why I would use this version over the original model?

piyh · 2026-04-02T23:05:39 1775171139

1/3 the RAM & CPU consumed for 99% the performance

pentagrama · 2026-04-02T19:24:39 1775157879

Hey, I tried to use Unsloth to run Gemma 4 locally but got stuck during the setup on Windows 11.

At some point it asked me to create a password, and right after that it threw an error. Here’s a screenshot: https://imgur.com/a/sCMmqht

This happened after running the PowerShell setup, where it installed several things like NVIDIA components, VS Code, and Python. At the end, PowerShell tell me to open a http://localhost URL in my browser, and that’s where I was prompted to set the password before it failed.

Also, I noticed that an Unsloth icon was added to my desktop, but when I click it, nothing happens.

For context, I’m not a developer and I had never used PowerShell before. Some of the steps were a bit intimidating and I wasn’t fully sure what I was approving when clicking through.

The overall experience felt a bit rough for my level. It would be great if this could be packaged as a simple .exe or a standalone app instead of going through terminal and browser steps.

Are there any plans to make something like that?

danielhanchen · 2026-04-02T19:54:43 1775159683

Apologies we just fixed it!! If you try again from source ie

irm https://unsloth.ai/install.ps1 | iex

it should work hopefully. If not - please at us on Discord and we'll help you!

The Network error is a bummer - we'll check.

And yes we're working on a .exe!!

pentagrama · 2026-04-03T04:34:00 1775190840

It worked! https://imgur.com/a/SOfiRhv

Thanks, will check it out tomorrow.

Hope the unsloth-setup.exe > Windows App is coming soon! I think it will expand accessibility and user base.

danielhanchen · 2026-04-03T12:05:21 1775217921

Oh nice! Glad it worked! Yes!! We're working on the app!

sillysaurusx · 2026-04-03T04:54:26 1775192066

Temperature 1.0 used to be bad for sampling. 0.7 was the better choice, and the difference in results were noticeable. You may want to experiment with this.

danielhanchen · 2026-04-03T05:27:44 1775194064

You might be right, but Google's recommendation was temp 1 etc primarily because all their benchmarks were used with these numbers, so it's better reproducibility for downstream tasks

sillysaurusx · 2026-04-03T05:29:33 1775194173

Fair, though putting a note in the readme about temperature 0.7 couldn't hurt.

I wonder why they do benchmarks with 1 instead of 0.7... that's strange. 0.7 or 0.8 at most gives noticeably better samples.

davedx · 2026-04-03T09:13:00 1775207580

Reproducibility. They're benchmarks.

sillysaurusx · 2026-04-03T09:15:26 1775207726

Reproducibility is a matter of using the same input seeds, which jax can do. 0.7 vs 1.0 would make no difference for that.

Without seeds, 0.7 would be less random than 1.0, so it'd be (slightly) more reproducible.

Kye · 2026-04-02T22:42:45 1775169765

I haven't tried a local model in a while. I can only fit E4B in VRAM (8GB), but it's good enough that I can see it replacing Claude.ai for some things.

sixhobbits · 2026-04-03T07:20:57 1775200857

Thanks for this, I gave this guide to my Claude and he oneshot the unsloth and gemma4 set up on the old macbook he runs on. It's way faster than I expected, haven't tried out local models for a few generations but will be very nice when they become useful

danielhanchen · 2026-04-03T12:04:40 1775217880

Thanks! Oh nice! Ye local models are advancing much faster than I expected!

egeres · 2026-04-02T20:14:59 1775160899

Thank you and your brother for all the amazing work, it's really inspiring to others <3

danielhanchen · 2026-04-02T20:27:24 1775161644

Thank you and appreciate it!

zkmon · 2026-04-03T08:37:39 1775205459

How does Gemma 4 26B A4B compare with Qwen3.5 35B A3B for same quants(4)

mmaunder · 2026-04-05T16:08:42 1775405322

This comment deserves it's own HN post. Thanks!

jquery · 2026-04-02T20:16:58 1775161018

Awesome!! Thank you SO much for this.

danielhanchen · 2026-04-02T20:27:34 1775161654

Appreciate it!

nnucera · 2026-04-02T22:38:01 1775169481

Wow! Thank you very much!

danielhanchen · 2026-04-03T05:34:19 1775194459

Thanks!

zobzu · 2026-04-02T21:28:26 1775165306

neat, time to update my spam filter model hehe

danielhanchen · 2026-04-03T05:35:09 1775194509

Haha! Ye the model is really good

danielhanchen · 2026-03-18T17:56:30 1773856590

Apologies on the delay - we fixed it! Please re-try with:

curl -LsSf https://astral.sh/uv/install.sh | sh

uv venv unsloth_studio --python 3.13

source unsloth_studio/bin/activate

uv pip install unsloth --torch-backend=auto

unsloth studio setup

unsloth studio -H 0.0.0.0 -p 8888

car · 2026-03-19T01:29:04 1773883744

Thank you for the follow up! Big fan of your models here, thanks for everything you are doing!

Works fine on MacOS now (chat only).

On Ubuntu 24.04 with two GPU's (3090+3070), it appears that Llama.cpp sometimes uses the CPU and not GPU. This is judging from the tk/s and CPU load for identical models run with US-studio vs. just Llama.cpp (bleeding edge).

danielhanchen · 2026-03-18T17:54:58 1773856498

If you can try again we just updated the process sorry! We did a new pypi release:

curl -LsSf https://astral.sh/uv/install.sh | sh

uv venv unsloth_studio --python 3.13

source unsloth_studio/bin/activate

uv pip install unsloth==2026.3.7 --torch-backend=auto

unsloth studio setup

unsloth studio -H 0.0.0.0 -p 8888

ontouchstart · 2026-03-18T23:05:13 1773875113

Daniel, it works now. Thanks for the hard work!

https://gist.github.com/ontouchstart/532312fcba59aec3ce7f6aa...

ontouchstart · 2026-03-18T18:24:15 1773858255

Thank you Daniel.

Here is the error message on my machine:

https://gist.github.com/ontouchstart/86ca3cbd8b6b61fa0aeec75...

It seems we might need more instructions on how to set up python (via uv) in vanilla MacOS.

ontouchstart · 2026-03-18T19:40:30 1773862830

I have been updating the gist above and will stop this

``` ../scipy/meson.build:274:9: ERROR: Dependency lookup for OpenBLAS with method 'pkgconfig' failed: Pkg-config for machine host machine not found. Giving up. ```

Too much work.

danielhanchen · 2026-03-18T01:57:06 1773799026

Glad it was helpful!

danielhanchen · 2026-03-18T01:55:57 1773798957

Oh will check and fix - thanks

danielhanchen · 2026-03-18T01:55:44 1773798944

Hey! Our primary objective for now is to provide the open source community with cool and useful tooling - we found closed source to be much more popular because of better tooling!

We have much much in the pipeline!!

brainless · 2026-03-18T02:07:43 1773799663

Thanks! How do you earn or keep yourself afloat? I really like what you guys are doing. And similar orgs. I am personally doing the same, full-time. But I am worried when I will run out of personal savings.

vessenes · 2026-03-19T15:55:15 1773935715

I've been wondering this since they started it, mostly as a concern they stay afloat. Since Daniel does the work of ten, it seems like their value:cost ratio is world-class at the very least.

With the studio release, it seems to like they could be on the path to just bootstrapping a unicorn or a 10x corn or whatever that's called, which is super interesting. Anyway, his refusal to go into details reassures me, sounds like things are fine, and they're shipping. Vai com dios

richardw · 2026-03-18T10:59:30 1773831570

Daniel is a very impressive guy. Well within the realm of “fund the people not the idea” that YC seems to do. Got a few bucks from them and probably earning from collaborations etc. Odds of them not figuring out a business model seem slim.

https://www.ycombinator.com/companies/unsloth-ai

sowbug · 2026-03-18T15:08:14 1773846494

From comments elsewhere in this thread, it sounds like Unsloth could also be getting some decent consulting revenue from larger companies.

reactordev · 2026-03-18T15:51:01 1773849061

The opportunity here is HUUUUGGGEEEE!!!

Companies have no idea what they are doing, they know they need it, they know they want it, engineers want it, they don’t have it in their ecosystem so this is a perfect opportunity to come in with a professional services play. We got you on inference training/running, your models, all that, just focus on your business. Pair that with huggingface’s storage and it’s a win/win.

zokier · 2026-03-18T17:05:10 1773853510

Investments are not income

csomar · 2026-03-18T03:08:01 1773803281

You didn't answer the parent question.

segmondy · 2026-03-18T11:58:54 1773835134

They don't owe anyone an answer.

zokier · 2026-03-18T17:04:00 1773853440

But if they want to attract users, like they seem to do, then answering would go long way.

kartaka83838 · 2026-03-18T05:53:41 1773813221

that doesnt sound reassuring?

danielhanchen · 2026-03-17T22:44:13 1773787453

Thanks! We do have normal AMD support for Unsloth but yes the UI doesn't support it just yet! Will keep you posted!

MrDrMcCoy · 2026-03-18T16:38:33 1773851913

What does "normal AMD support" mean here? I was completely unable to get it working on my Ryzen AI 9700 XT. I had to munge the versions in the requirements to get libraries compatible with recent enough ROCm, and it didn't go well at all. My last attempt was a couple weeks before studio was announced.

danielhanchen · 2026-03-17T22:42:34 1773787354

Hey will check ASAP and fix - sorry about that

danielhanchen · 2026-03-17T22:33:51 1773786831

Actually the opposite haha- more than 50% of our audience comes from large organizations eg Meta, NASA, the UN, Walmart, Spotify, AWS, Google, and the list goes on!

danielhanchen · 2026-03-17T22:32:09 1773786729

You would be surprised! Nearly every Fortune 500 company has utilized either our RL fine-tuning package or used our quants and models - the UI was primarily a culmination of pain points folks had when doing either training or inference!

We're complimentary to LM Studio - they have a great tool as well!

TheTaytay · 2026-03-17T23:54:23 1773791663

I don’t know why this is being downvoted. Danielhanchen is legit, and unsloth was early to the fine-tuning on a budget party.

danielhanchen · 2026-03-18T01:57:21 1773799041

Haha no worries at all :)

HN For You