Hacker News | evilelectron's comments

This is the way.

I'm doing something similar: a parser that looks for changes in documentation, matches them against the GraphQL schema, and generates code using Apollo. In a nutshell, it's a code generator written with Claude that generates more code; on failure it goes back to Claude to fix the generator and asks a human for review.


Hello again dot.

Look again at that dot. That's here. That's home. That's us. — Carl Sagan, Pale Blue Dot, 1994


Daniel, your work is changing the world. More power to you.

I set up a pipeline for inference with OCR, full-text search, embedding, and summarization of land records dating back to the 1800s. All powered by the GGUFs you generate and llama.cpp. People are so excited that they can now search the records in multiple languages that a one-minute wait to process a document seems like nothing. Thank you!


Oh appreciate it!

Oh nice! That sounds fantastic! I hope Gemma-4 will make it even better! The small ones (2B and 4B) are shockingly good, haha!


Just switched from 3.1 Flash Lite to Gemma-4 31B on the AI Studio API, since there's a generous 1500/day quota on non-billed projects. It's doing fantastic.

Hey, I'm really interested in your pipeline techniques. I've got some PDFs I need processed, but processing them in the cloud with the big providers requires redaction.

Wondering if a local model or a self hosted one would work just as well.


I run llama.cpp with Qwen3-VL-8B-Instruct-Q4_K_S.gguf plus mmproj-F16.gguf for OCR and translation, and another llama.cpp instance with Qwen3-Embedding-0.6B-GGUF for embeddings. The CMS side is Drupal 11 with ai_provider_ollama and a custom provider, ai_provider_llama (heavily derived from ai_provider_ollama), backed by PostgreSQL with pgvector.

People on site scan the documents and upload them for archival. The directory monitor watches the archive directories, and once a new file appears it is uploaded to Drupal. When new content is created in Drupal, Drupal triggers the translation and embedding process through llama.cpp. Qwen3-VL-8B is also used for chat and RAG. The client is familiar with Drupal and CMSes in general and wanted to stay in a similar environment. If you are starting fresh, I would recommend looking at docling.
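The directory-monitor step can be sketched with just the standard library (the polling approach and the handler are assumptions; the real system hands the file off to Drupal):

```python
import time
from pathlib import Path

def watch_directory(archive_dir, handle_new_file, poll_seconds=5, max_polls=None):
    """Poll archive_dir and call handle_new_file() once for each newly seen file.

    Note: files already present at startup are treated as new on the
    first pass; a real monitor might seed `seen` with them instead.
    """
    seen = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for path in sorted(Path(archive_dir).glob("*")):
            if path.is_file() and path not in seen:
                seen.add(path)
                handle_new_file(path)  # real system: upload to Drupal here
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(poll_seconds)
```

A production setup would more likely use inotify or a watchdog library, but a polling loop like this is easy to reason about and works on network shares.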


Are you linking any of the processes using the Drupal AI module suite?

Yes, they are all linked using Drupal's AI modules. I also have an OpenCV application that removes the old-paper look, enhances the contrast, and fixes the orientation of the images before they hit llama.cpp for OCR and translation.

Disclaimer: I'm an AI novice relative to many here. FWIW, last weekend I spent a couple of hours setting up self-hosted n8n with Ollama and gemma3:4b [EDIT: not Qwen-3.5], using PDF content extraction for my PoC. 100% local workflow, no runtime dependency on cloud providers. I doubt it'd scale very well (MacBook Air M4, a measly 16 GB RAM), but it works as intended.

For those who wish to do OCR on photos, like receipts, or PDFs or anything really, Paperless-NGX works amazingly well and runs on a potato.

How do you extract the content? OCR? PDF to text, then feed into Qwen?

I tried something similar where I needed a bunch of tables extracted from a PDF of about 40 pages. It was crazy slow on my MacBook, and inaccurate.


If you have a basic ARM MacBook, GLM-OCR is the best single model I have found for OCR with good table extraction/formatting. It's a compact 0.9B-parameter model, so it'll run on systems with only 8 GB of RAM.

https://github.com/zai-org/GLM-OCR

Use mlx-vlm for inference:

https://github.com/zai-org/GLM-OCR/blob/main/examples/mlx-de...

Then you can run a single command to process your PDF:

  glmocr parse example.pdf

  Loading images: example.pdf
  Found 1 file(s)
  Starting Pipeline...
  Pipeline started!
  GLM-OCR initialized in self-hosted mode
  Using Pipeline (enable_layout=true)...

  === Parsing: example.pdf (1/1) ===
My test document contains scanned pages from a law textbook. It's two columns of text with a lot of footnotes. It took 60 seconds to process 5 pages on a MBP with M4 Max chip.

After it's done, you'll have a directory output/example/ that contains .md and .json files. The .md file will contain a markdown rendition of the complete document. The .json file will contain individual labeled regions from the document along with their transcriptions. If you get all the JSON objects with

  "label": "table"
from the JSON file, you can get an HTML-formatted table from each "content" section of these objects.
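Pulling the tables out of the .json file takes only a few lines (the `label` and `content` field names follow the description above; the assumption that the file is a top-level list of region objects is mine):

```python
import json

def extract_tables(json_path):
    """Return the HTML 'content' of every region labeled as a table."""
    with open(json_path, encoding="utf-8") as f:
        regions = json.load(f)
    return [r["content"] for r in regions if r.get("label") == "table"]
```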

It might still be inaccurate -- I don't know how challenging your original tables are -- but it shouldn't be terribly slow. The tables it produced for me were good.

I have also built more complex workflows that use a mixture of OCR-specialized models and general-purpose VLM models like Qwen 3.5, along with software to coordinate and reconcile operations, but GLM-OCR by itself is the best first thing to try locally.


Thanks! Just tried it on a 40-page PDF. It seems to work for single images, but the large PDF gives me connection timeouts.

I also get connection timeouts on larger documents, but it automatically retries and completes. All the pages are processed when I'm done. However, I'm using the Python client SDK for larger documents rather than the basic glmocr command line tool. I'm not sure if that makes a difference.

Yeah, looks like the CLI retries as well. I was able to get it working using a higher timeout.

Cool! For GLM-OCR, do you use "Option 2: Self-host with vLLM / SGLang" and in that case, am I correct that there is no internet connection involved and hence connection timeouts would be avoided entirely?

When you self-host, there's still a client/server relationship between your self-hosted inference server and the client that manages the processing of individual pages. You can get timeouts depending on the configured timeouts, the speed of your inference server, and the complexity of the pages you're processing. But you can let the client retry and/or raise the initial timeout limit if you keep running into timeouts.
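The retry behavior described here can be approximated with a small wrapper (a generic sketch with exponential backoff, not the GLM-OCR client's actual implementation):

```python
import time

def with_retries(fn, attempts=3, base_delay=1.0):
    """Call fn(), retrying on TimeoutError with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except TimeoutError:
            if attempt == attempts - 1:
                raise  # out of retries; surface the timeout
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
```

Raising the client's configured timeout attacks the same problem from the other side; in practice you often tune both.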

That said, this is already a small and fast model when hosted via MLX on macOS. If you run the inference server with a recent NVIDIA GPU and vLLM on Linux, it should be significantly faster. The big advantage of vLLM for OCR models is its continuous batching capability. Using other OCR models that I couldn't self-host on macOS, like DeepSeek 2 OCR or Chandra 2, vLLM gave dramatic throughput improvements on big documents via continuous batching if I processed 8-10 pages at a time. This is with a single 4090 GPU.
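Keeping 8-10 pages in flight on the client side is a simple thread-pool pattern; a continuous-batching server like vLLM then packs the concurrent requests together. A sketch (`ocr_page` stands in for whatever call your client makes to the inference server):

```python
from concurrent.futures import ThreadPoolExecutor

def ocr_document(pages, ocr_page, max_in_flight=8):
    """Submit pages concurrently so a batching server stays busy.

    pages: list of page images/paths; ocr_page: callable taking one page.
    Results come back in page order.
    """
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        return list(pool.map(ocr_page, pages))
```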


1. Correction: I'd planned to use Qwen-3.5 but ended up using gemma3:4b.

2. The n8n workflow passes a given binary pdf to gemma, which (based on a detailed prompt) analyzes it and produces JSON output.

See https://github.com/LinkedInLearning/build-with-ai-running-lo... if you want more details. :)


Python pdftools to convert the PDF to images, then Tesseract to OCR them to text files. Fast, free, and runs on CPU.
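Concretely, that two-step pipeline could look like this (assuming poppler's `pdftoppm` and the `tesseract` CLI are installed; the parent comment's exact tooling may differ). Building the commands separately from running them keeps the pipeline easy to inspect:

```python
import subprocess

def ocr_pdf_commands(pdf_path, out_prefix):
    """Build the two-step command list: PDF -> PNG pages -> text files."""
    return [
        # pdftoppm writes one PNG per page, named like out_prefix-1.png
        # (zero-padded for documents with 10+ pages)
        ["pdftoppm", "-png", "-r", "300", pdf_path, out_prefix],
        # tesseract writes out_prefix-1.txt; loop over pages in practice
        ["tesseract", f"{out_prefix}-1.png", f"{out_prefix}-1"],
    ]

def run_ocr(pdf_path, out_prefix):
    for cmd in ocr_pdf_commands(pdf_path, out_prefix):
        subprocess.run(cmd, check=True)
```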

Seconded, would also love to hear your story if you would be willing

I'm very active in family history and this kind of project is massively helpful, thank you

This is a very interesting project. If it's publicly available, would you mind sharing it? I would love to understand how it works.

Ps: found your other comments, thanks.


> your work is changing the world

I realize this may have been hyperbole, but it sure isn't changing the world.


For relatively small values of changing or world, it sure is.

In the world of local models, Unsloth is one of the most significant projects there is.


Keep in mind for next time: the date you drop mail at your local post office might not be the date it is postmarked. This impacts voting, bank fees, the IRS, and more.


Look up the JOBS Act (https://www.sec.gov/spotlight/jobs-act.shtml). With a strong business plan, this could be a viable and great starting point. Good luck!


Sad news indeed :(


How about PiKVM (https://pikvm.org/)? Secure, flexible, and extensible.


A KVM system is not a BMC. Running a cluster (even a fairly small one), you probably want power control (e.g. with powerman), serial console availability (e.g. with conman), metrics (e.g. via FreeIPMI and some monitoring solution), and probably logs for alerts. Control and monitoring need to be out of band.

(Powerman, conman, and freeipmi come from Livermore for use with serious HPC systems.)


wxWidgets is a wrapper over the native GUI toolkit. It lets you write OS-independent GUI code, and because it is a wrapper you get most of the benefits of the native toolkit, like dark mode.


> most of the benefits of native toolkit like dark mode.

On which OS does dark mode work with wx? Certainly not with Windows... (which, as VZ points out, isn't wx's fault, it's just that no APIs officially exist for this - explorer.exe has dark mode, but that uses completely private, undocumented APIs).


Maybe this has something to do with the fact that Windows doesn't have dark mode for the desktop applications in the first place.


Add _nomap to your SSID to stop Google from using your access point for location services.

https://support.google.com/maps/answer/1725632?hl=en


The fact that you have to change your SSID to opt out of third parties using it is... shady at best. What happens when two competing third parties have conflicting name requirements for you to opt-out?


And how do you know they actually obey it? :/

Knowing Google it will still go somewhere.


> What happens when two competing third parties have conflicting name requirements for you to opt-out?

No worries! One or the other of them will change its mind as soon as any significant number of people start using the option.


Why is Microsoft not interested? Or are they, and it just hasn't been reported?

ARM should fit well with them, plus it would give them a way to enter the mobile space again, this time by owning the IP rather than making hardware.

