o_____________o's comments

I would pay for a simple, competent anything-to-markdown API. Something that could convert PDFs to high quality markdown with tables, etc. I'm using Document AI from Google right now and the ergonomics are awful.


If you just need to convert the files have you thought about using Zamzar (https://dev.zamzar.com/)?

We have a file conversion API that supports DOC/DOCX/ODT/PDF/TEX to Markdown conversion in one line of cURL (or your programming language of choice).

(Disclaimer: I'm the product lead for the Zamzar API).


Thanks, I'll check it out. What do you do with PDFs that lock text in images? Are you using ML/OCR? And, as mentioned, what about tables?


Currently, OCR support is limited to PDF > TXT conversion, but we're hoping to add support for other output formats at some point. Feel free to shoot me an email at chris [at] zamzar [dot] com if you'd like to chat further.


ABBYY reader is a fine human-in-the-loop solution. It can at least output from PDF to ePub, and you can imagine going to markdown from there.


https://pd3f.com/ is a good PDF converter for the use-cases I tested (academic papers).


Anything to markdown is broader than PDF to markdown.

Since PDFs are created in so many different ways, do you have some examples of, or links to, the PDFs that are awful?


There are two JavaScript alternatives, but how robust the development is remains to be seen:

https://github.com/hwchase17/langchainjs

https://github.com/cfortuner/promptable


Could you expand on what you think is the state of the art and direction we should be heading in?


> Some people are so in touch with themselves they can tell they are getting sick before they have any measurable symptoms. Other people go months or years without knowing something is seriously wrong.

And many people derive imagined diagnoses from their anxieties, then come up with confident narratives based on "researching" online. Most of us are terrible at self-diagnosis, and doubly so at determining the cause of whatever we've arrived at. There's a reason the double-blind standard was a critical innovation.


I was going to test this out on an existing repo, but it wasn't clear from the outset that this is Next.js only. I have a Create React App repo (with MUI) I'd try this on.


Yes, today the bots assume you are connecting to a Next.js 13 app. Bad assumption, I know! I am working to add a repo scanner (ChatGPT prompts!) to understand the general tech stack of a connected repo. This will be coming soon.


Why not just let the user select?



This is so stupid. His email address was at the bottom of every Dilbert strip. Was AIM doing email verification in 1999? The obvious answer is some trolls signed up for those accounts with his address.


> The OP, who is himself the frontman for the Decemberists

Wow, buried lede there. That should be in the title!


I read the article, pressed play on the song and thought "wow, the author did a great job at sounding like the Decemberists", then scrolled back up and saw the author... doh!


When my pedantic keyboard warrior gears start turning, I think about the same xkcd a sibling commenter posted.

But I've been struggling with the recent tsunami of openly anti-intellectual, alt-everything pseudoscience in the US. I recently pleaded with my sister to read The Demon-Haunted World by Sagan.

There's something important in patient advocacy of truth.


fwiw I've come to similar conclusions, though I think Discourse has higher potential. For example, if you have the money, there are a couple of options to convert your instance into a native app. I'm doing some tests on a DigitalOcean instance and it doesn't seem that resource hungry so far.

The Flarum community seems much more friendly, DIY and open. Having to use PHP in 2023 is a little depressing, but it would presumably integrate more easily into other PHP projects like WordPress.

I was hoping to use NodeBB for the pure JS stack, but it's missing some important features and the front end is greybearded out of using a modern framework because of "speed"[1]. You can use Postgres instead of Mongo, by the way:

> NodeBB Forum Software is powered by Node.js and supports either Redis, MongoDB, or a PostgreSQL database

[1]: https://community.nodebb.org/topic/14371/nodebb-reactjs/3?_=...


> For example, if you have the money, there are a couple of options to convert your instance into a native app.

Honestly, if I was going to pay for a forum, I'd probably go with (self-hosted) XenForo from the start. It has the most polished user experience of any forum software I have seen. I understand people sell custom mobile apps for it, too, though I haven't really looked.

I thought using NodeBB with PostgreSQL was not recommended for production. Looks like I was wrong. Thanks for telling me.


How do you like Discourse? I was looking at that vs Flarum over the last few days.

