Hacker News | pierre's comments

A fast and open source spatial text parser

A new document (including PDF) parser that outperforms traditional tools such as PyPDF or MuTool.

Link to open source repo: https://github.com/run-llama/liteparse


Contributor here, happy to answer any questions!



Main issue is that tokens are not equivalent across providers / models, with huge disparities within a single provider beyond the tokenizer model:

- An image will take 10x the tokens on gpt-4o-mini vs gpt-4.

- On Gemini 2.5 Pro, output tokens are billed as tokens, except if you are using structured output: then every character is counted as a token for billing.

- ...

Having the price per token is nice, but what is really needed is to know how much a given query / answer will cost you, as not all tokens are equal.
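To make the point concrete, here is a minimal sketch of per-query cost estimation. The model names and per-million-token rates are illustrative placeholders, not real prices:

```python
# Sketch: the per-token sticker price is not the cost of a query.
# Rates below are made up for illustration.
PRICING = {
    # model: (input $/1M tokens, output $/1M tokens)
    "model-a": (0.15, 0.60),
    "model-b": (2.50, 10.00),
}

def query_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one query, given that model's per-token rates."""
    in_rate, out_rate = PRICING[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# The same logical query can also tokenize very differently per model
# (images, or per-character billing for structured output), so the
# input_tokens / output_tokens numbers themselves are model-specific.
```

The key point is that both the rates and the token counts vary per model, so only the end-to-end number is comparable.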


Yeah, I am going to add an experiment that runs every day, and its cost will be a column in the table. It will be something like "summarize this article in 200 words", with every model getting the same prompt + article.


For me, and I suspect a lot of other HN readers, a comparison/benchmark on a coding task would be more useful. Something small enough that you can affordably run it every day across a reasonable range of coding-focused models, but non-trivial enough to be representative of day-to-day AI-assisted coding.

One other idea - for people spending $20 or $200/month for AI coding tools, a monitoring service that tracks and alerts on detected pricing changes could be something worth paying for. I'd definitely subscribe at $5/month for something like that, and I'd consider paying more, possibly even talking work into paying $20 or $30 per month.


> On Gemini 2.5 Pro, output tokens are billed as tokens, except if you are using structured output: then every character is counted as a token for billing.

Can you elaborate on this? I don’t quite understand the difference.


I hadn't heard of this before either and can't find anything to support it on the pricing page.

https://ai.google.dev/gemini-api/docs/tokens


LlamaIndex | Senior/Staff Software Engineer (LlamaParse) | San Francisco, CA | Remote | Full-time | $100K – $300K + Equity | https://www.llamaindex.ai/careers

LlamaIndex is building a platform for AI agents that can find information, synthesize insights, generate reports, and take actions over the most complex enterprise data.

We are seeking an exceptional engineer to join our growing LlamaParse team. You will work at the intersection of document processing, machine learning, and software engineering to push the boundaries of what's possible in document understanding. As a key member of a focused team, you will have significant impact on our product's direction and technical architecture.

We are also hiring for a range of other roles, see our career page:

- Backend Software Engineer

- Forward Deploy Engineer

- Founding AI Engineer

- Open Source Engineer (Python)

- Founding Lead Product Manager

- Platform Engineer

- Senior Developer Relations Engineer

- Senior / Staff Backend Engineer

- Product Marketing Manager


Hi Pierre, I see that the Platform Engineer position (which probably matches me most) says it's Hybrid. I'm very interested, but I live in Ohio. I understand sometimes things get clicked on accident, and just wanted to know if there might be an issue with this listing or if it's truly hybrid and the one you posted is remote, etc. Don't want to gum up the works :)


You mention Product Manager but the role isn't mentioned on the career page.


If you want to try agentic parsing, we added support for Sonnet 3.7 and Gemini 2.0 agentic parse in LlamaParse: cloud.llamaindex.ai/parse (select advanced options / parse with agent, then a model).

However, this comes at a high cost in tokens and latency, but results in much better parse quality. Hopefully this will improve with newer models.


This is a nice UI for end users; however, it seems to be a thin wrapper on top of mutool, which is distributed under the AGPL. If you want to process PDFs locally, legally, and safely, you should use their CLI instead.
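For what it's worth, invoking the mutool CLI as a separate process is straightforward. A minimal sketch (the command builder is split out so the invocation is easy to inspect; this is not legal advice on AGPL boundaries):

```python
# Sketch: calling the AGPL-licensed mutool binary as a subprocess,
# rather than linking/wrapping its code.
import shutil
import subprocess

def mutool_text_cmd(pdf_path: str, out_path: str) -> list[str]:
    """Build the `mutool convert` invocation that extracts plain text."""
    return ["mutool", "convert", "-F", "text", "-o", out_path, pdf_path]

def extract_text(pdf_path: str, out_path: str) -> None:
    """Run mutool to dump a PDF's text, failing loudly if it is absent."""
    if shutil.which("mutool") is None:
        raise RuntimeError("mutool is not installed or not on PATH")
    subprocess.run(mutool_text_cmd(pdf_path, out_path), check=True)
```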


How did you figure that out? Couldn't it be Poppler as well?


I read the output headers and saw the Artifex (mutool / Ghostscript team) headers.


Alrighty, that's a smoking gun.


Parsing docs using VLMs is the way forward (also see the OCR2 paper released last week; people are having a lot of success parsing with fine-tuned Qwen2).

The hard part is preventing the model from ignoring parts of the page, and hallucinations (see some of the GPT-4o samples here, like the Xanax notice: https://www.llamaindex.ai/blog/introducing-llamaparse-premiu...)

However, these models will get better, and we may soon have a good PDF-to-Markdown model.


We’ve been doing exactly this by doubling-down on VLMs (https://vlm.run)

- VLMs are way better at handling layout and context where OCR systems fail miserably

- VLMs read documents like humans do, which makes dealing with special layouts like bullets, tables, charts, and footnotes much more tractable with a single approach, rather than having to special-case a whole bunch of OCR + post-processing

- VLMs are definitely more expensive, but can be specialized and distilled for accurate and cost effective inference

In general, I think vision + LLMs can be trained explicitly to “extract” information and avoid reasoning/hallucinating about the text. The reasoning can be another module altogether.


I did a ton of Googling before writing this code, but I couldn't find you guys anywhere. If I had, I'd have definitely used your stuff. You might want to think about running some small-scale Google Ads campaigns; they could be especially effective if you target people searching for LLM and OCR together. Great product, congrats!


Hey, thanks! DM me if you want to test it out (sudeep@vlm.run).

Agreed on SEO - we’re redoing our landing page and searchability. We recently rebranded, hence the lack of direct search hits for LLM / OCR.


What about combining old school OCR with GPT visual OCR?

If your old-school OCR produces text that is not present in the visual output, but is coherent (e.g. English sentences), you could take it and slot it into the missing place in the visual output.
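The merge described above could be sketched roughly like this; the "coherence" check here is a crude placeholder heuristic (mostly-alphabetic words), not a real language model:

```python
# Sketch: keep the visual-model output as the base, and recover lines
# that classical OCR found but the visual model dropped.

def looks_coherent(line: str) -> bool:
    """Crude stand-in for a coherence check: mostly alphabetic words."""
    words = line.split()
    return len(words) >= 3 and sum(w.isalpha() for w in words) / len(words) > 0.7

def merge_ocr(vlm_text: str, classic_text: str) -> str:
    """Append coherent classical-OCR lines missing from the VLM output."""
    seen = set(vlm_text.split())
    recovered = [
        line for line in classic_text.splitlines()
        if looks_coherent(line) and not (set(line.split()) <= seen)
    ]
    return vlm_text + ("\n" + "\n".join(recovered) if recovered else "")
```

A real implementation would also need to decide *where* in the visual output each recovered line belongs, which is the harder part.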


You're absolutely right. I use PDFTron (through CloudConvert) for full document OCR, but for pages with fewer than 100 characters, I switch to this API. It's a great combo – I get the solid OCR performance of SolidDocument for most content, but I can also handle tricky stuff like stats, old-fashioned text, or handwriting that regular OCR struggles with. That's why I added page numbers upfront.
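That routing rule is tiny but worth writing down; a sketch of the fallback (the threshold and engine names are just illustrative):

```python
# Sketch: route a page to the expensive visual model only when
# classical OCR yields too little text to trust.
def choose_engine(classic_ocr_text: str, min_chars: int = 100) -> str:
    """Return which engine's output to use for this page."""
    return "classic" if len(classic_ocr_text.strip()) >= min_chars else "vlm"
```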


What paper are you referring to?



Yes, you can pass an array of paths to the extract function.

