Hacker News | agentcoops's favorites

Part of the problem is that most people don't own a PC or personal laptop; they use their phone and apps. None of my friends (35+) use laptops other than for work, and they openly say how much they have regressed technically. Some of these guys grew up with the internet in the early 00s and would be setting up switches for LAN parties, using torrents, Usenet, LimeWire, etc. These days they can barely open up Microsoft Word, but on Instagram/Twitter they're all over it. Sad, really. I would always reach for my laptop first before my phone, and I very rarely visit social media sites (other than Reddit) on my laptop/desktop. I use Glance (https://github.com/glanceapp/glance) to parse my RSS feeds; it's pretty good.

This is interesting because it's a case of "AI taking jobs", but not in the way people normally mean: these massive layoffs are happening not because AI is doing the work those people used to do, but because capex is sucking the operating money out of everything. The companies may be forced to replace some of the laid-off employees with AI (as far as possible), but that's an effect, not a cause.

I'm running qwen 3.5 397b on very standard hardware. Just use the unsloth quants, they're great. I get like 20t/s or something.

It's super not a publicity stunt, qwen 3.5 is the base of the best local models out there IMO.


Thinking / reasoning + multimodal + tool calling.

We made some quants at https://huggingface.co/collections/unsloth/gemma-4 for folks to run them - they work really well!

Guide for those interested: https://unsloth.ai/docs/models/gemma-4

Also note: use temperature = 1.0, top_p = 0.95, and top_k = 64. The EOS token is "<turn|>", and "<|channel>thought\n" is used for the thinking trace!
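For convenience, here are those sampling settings collected in one place (a sketch; the llama-cpp-python usage and model path in the comments are my assumptions, not from the guide):

```python
# Recommended sampling settings from the guide above.
SAMPLING = {
    "temperature": 1.0,
    "top_p": 0.95,
    "top_k": 64,
}

# Hypothetical usage with llama-cpp-python (path is a placeholder):
# from llama_cpp import Llama
# llm = Llama(model_path="gemma-4.gguf")
# out = llm("Hello", **SAMPLING)

print(SAMPLING)
```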


If you have a basic ARM MacBook, GLM-OCR is the best single model I have found for OCR with good table extraction/formatting. It's a compact 0.9b parameter model, so it'll run on systems with only 8 GB of RAM.

https://github.com/zai-org/GLM-OCR

Use mlx-vlm for inference:

https://github.com/zai-org/GLM-OCR/blob/main/examples/mlx-de...

Then you can run a single command to process your PDF:

  glmocr parse example.pdf

  Loading images: example.pdf
  Found 1 file(s)
  Starting Pipeline...
  Pipeline started!
  GLM-OCR initialized in self-hosted mode
  Using Pipeline (enable_layout=true)...

  === Parsing: example.pdf (1/1) ===
My test document contains scanned pages from a law textbook: two columns of text with a lot of footnotes. It took 60 seconds to process 5 pages on an MBP with an M4 Max chip.

After it's done, you'll have a directory output/example/ that contains .md and .json files. The .md file will contain a markdown rendition of the complete document. The .json file will contain individual labeled regions from the document along with their transcriptions. If you get all the JSON objects with

  "label": "table"
from the JSON file, you can get an HTML-formatted table from each "content" section of these objects.
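A minimal sketch of that filtering step (the flat list-of-regions layout is my assumption based on the description above; only the "label" and "content" field names come from the output described):

```python
import json

def extract_tables(json_path):
    """Return the HTML 'content' of every region labeled 'table'."""
    with open(json_path) as f:
        regions = json.load(f)  # assumed: a list of labeled region objects
    return [r["content"] for r in regions if r.get("label") == "table"]
```

Each returned string should then be the HTML-formatted table for one detected region.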

It might still be inaccurate -- I don't know how challenging your original tables are -- but it shouldn't be terribly slow. The tables it produced for me were good.

I have also built more complex workflows that use a mixture of OCR-specialized models and general-purpose VLMs like Qwen 3.5, along with software to coordinate and reconcile operations, but GLM-OCR by itself is the best first thing to try locally.


In chapter 11 of All Quiet on the Western Front, Paul and his unit find an abandoned food cache in the middle of no man's land. Instead of secreting the food away back to their lines, where they would have to share it, they decide to just cook and eat it right then and there. But a spotter plane from the Allies sees the smoke and begins shelling their position. Cue a terrifying, if hilarious, scene where the soldiers try to cook pancakes as shells explode around them. Paul, as the last to leave, takes his pancakes on a plate and dashes out, timing his escape between bursts, and just barely makes it back to the German trenches. It's a rare comic scene in an otherwise horrific and very real look at WW1.

The scene in the book is just so familiar from the lines in Ukraine these days, nearly a hundred years later. Instead of spotter planes near the dawn of aviation, we have satellites and drones (similarly quite new in the role). Instead of just shells and fuzing experts, we have FPV drones and much more sophisticated shells. Instead of buddies from the same towns all huddled together in cold muddy holes, we have deracinated units spread far and wide, lying in fear of thermal imaging. This results in a no man's land again, but a dozen kilometers wide instead of a few hundred meters, and somehow more psychologically damaging.

My point is that absent any tech that will miraculously be invented and deployed widely in the next few weeks, the Iran war, if it should become a ground one, is going to be just like Ukraine is today, which is somehow a worse version of trench warfare.

Even casual Victoria II players know that WW1 is essentially the final boss of the game. And the 'lesson' of Vicky II is essentially: do not fight WW1, it ruins everything.

To be clear: the US is choosing to fight a worse version of WW1 without even a stated (or likely even known) condition of victory. We're about to send many thousands of boys to suffer and die for not 'literally nothing', but actually literally nothing.


PSA: npm/bun/pnpm/uv now all support setting a minimum release age for packages.

I also have `ignore-scripts=true` in my ~/.npmrc. Based on the analysis, that alone would have mitigated the vulnerability. bun and pnpm do not execute lifecycle scripts by default.

Here's how to configure a global minimum release age of 7 days in each tool:

  ~/.config/uv/uv.toml
  exclude-newer = "7 days"

  ~/.npmrc
  min-release-age=7 # days
  ignore-scripts=true
  
  ~/Library/Preferences/pnpm/rc
  minimum-release-age=10080 # minutes
  
  ~/.bunfig.toml
  [install]
  minimumReleaseAge = 604800 # seconds
(Side note, it's wild that npm, bun, and pnpm have all decided to use different time units for this configuration.)
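The three values above are meant to express the same 7-day window; a quick sanity check of the unit conversions:

```python
DAYS = 7
MINUTES = DAYS * 24 * 60        # pnpm counts minutes
SECONDS = DAYS * 24 * 60 * 60   # bun counts seconds
print(MINUTES, SECONDS)  # 10080 604800
```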

If you're developing with LLM agents, you should also update your AGENTS.md/CLAUDE.md file with some guidance on how to handle failures stemming from this config as they will cause the agent to unproductively spin its wheels.


Topical. My hobby project this week (0) has been hyper-optimizing microgpt for M5's CPU cores (and comparing to MLX performance). Wonder if anything changes under the regime I've been chasing with these new chips.

0: https://entrpi.github.io/eemicrogpt/


> For intelligence activities, any handling of private information will comply with the Fourth Amendment, the National Security Act of 1947 and the Foreign Intelligence and Surveillance Act of 1978, Executive Order 12333, and applicable DoD directives requiring a defined foreign intelligence purpose. The AI System shall not be used for unconstrained monitoring of U.S. persons’ private information as consistent with these authorities. The system shall also not be used for domestic law-enforcement activities except as permitted by the Posse Comitatus Act and other applicable law.

My reading of this is that OpenAI's contract with the Pentagon only prohibits mass surveillance of US citizens to the extent that that surveillance is already prohibited by law. For example, I believe this implies that the DoW can procure data on US citizens en masse from private companies - including, e.g., granular location and financial transaction data - and apply OpenAI's tools to that data to surveil and otherwise target US citizens at scale. As I understand it, this was not the case with Anthropic's contract.

If I'm right, this is abhorrent. However, I've already jumped to a lot of incorrect conclusions in the last few days, so I'm doing my best to withhold judgment for now, and holding out hope for a plausible competing explanation.

(Disclosure, I'm a former OpenAI employee and current shareholder.)


Even a16z is walking this back now. I wrote about why the “vibe code everything” thesis doesn’t hold up in two recent pieces:

(1) https://philippdubach.com/posts/the-saaspocalypse-paradox/

(2) https://philippdubach.com/posts/the-impossible-backhand/

Acharya’s framing is different from mine (he’s talking his book on software stocks), but the conclusion is the same: the “innovation bazooka” pointed at rebuilding payroll is a bad allocation of resources. Benedict Evans called me out on LinkedIn for this take (https://philippdubach.com/posts/is-ai-really-eating-the-worl...), which I take as a sign the argument is landing.


> Amazon's core business does not make sense. Despite being so massive, their retail operation makes almost no money.

Net profit margins for retail are only around 3% across the industry.

Amazon isn't actually doing anything unusual in that regard. Retail is just a very low profit margin business whether it's physical or online.

These numbers are always confusing to those of us in the tech world, where SaaS net profit margins are typically very high.


Hi. I run "ocr" with dmenu on Linux, which triggers maim so I can make a visual selection. A push notification shows the extracted text (a nice indicator of a whiff), and it's also put on my clipboard:

  #!/usr/bin/env bash

  # requires: tesseract-ocr imagemagick maim xsel libnotify

  IMG=$(mktemp)
  trap 'rm -f "$IMG" "$IMG".*' EXIT

  # --nodrag means click twice instead of dragging
  maim -s --nodrag --quality=10 "$IMG.png"

  # desaturate and upscale; should increase detection rate
  mogrify -modulate 100,0 -resize 400% "$IMG.png"

  tesseract "$IMG.png" "$IMG" &>/dev/null
  xsel -bi < "$IMG.txt"
  notify-send "Text copied" "$(cat "$IMG.txt")"

Deepseek OCR is no longer state of the art. There are much better open source OCR models available now.

ocrarena.ai maintains a leaderboard, and a number of other open source options like dots [1] or olmOCR [2] rank higher.

[1] https://www.ocrarena.ai/compare/dots-ocr/deepseek-ocr

[2] https://www.ocrarena.ai/compare/olmocr-2/deepseek-ocr


Hi karhuton,

Thanks for your honest and thoughtful feedback.

Re: the features that you mentioned - these are definitely on my list. I thought that getting the product out there sooner was preferable to waiting longer at this stage. But your points fully resonate with me, and I’m working on releasing them shortly.

Re: pricing, this is something I gave a lot of thought to, and I came to the conclusion that instead of participating in a race to the bottom, I prefer that the paying customers really see value in my product. I would like to offer a more generous free plan and find the right niche in the design field for those paying customers.

With this in mind, here’s a 50% discount code for any plan, for this community and anyone who would like to support this project: HN50

Re: the mailing list, it’s a great idea. I’ll implement a subscription list soon for the people who are interested. In the meantime, you can send me an email at contact@vecti.com with your email, and you will be the first person to get notified of the product progress.


(original author of that article here)

This is a blast from the past! And indeed, I use Eio these days to do direct-style IO; you can see an example of a HTTP/1.1 parser done in that style, and there's nary a monad in sight! https://tangled.org/anil.recoil.org/ocaml-requests/blob/main...

OCaml's effects are similar to the delimcc hack used in the article, except that they're nicely integrated into OCaml 5 and very high performance; basically a stack switch: https://anil.recoil.org/papers/2021-pldi-retroeff.pdf


A paper on the same topic: On the Expressiveness of Softmax Attention: A Recurrent Neural Network Perspective, Gabriel Mongaras, Eric C. Larson, https://arxiv.org/abs/2507.23632

Video presentation if someone prefers it: https://www.youtube.com/watch?v=PN3nYBowSvM

Linear attention is a first-degree approximation of Softmax attention, and model performance gets better as you increase the degree of the Taylor approximation.

I'm thinking about adapting an existing model to Taylor-approximated attention. I think it should be possible with some model surgery and rehabilitation training.
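A tiny NumPy sketch of the first-degree case (my own illustration, not from the paper): replace exp(s) with its first-order Taylor expansion 1 + s in the attention weights. For small scores the two match closely, and the 1 + QK^T form is what factorizes into linear attention.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Standard softmax attention over a single head."""
    S = Q @ K.T
    W = np.exp(S - S.max(axis=-1, keepdims=True))
    W /= W.sum(axis=-1, keepdims=True)
    return W @ V

def taylor1_attention(Q, K, V):
    """First-degree Taylor approximation: exp(s) ~= 1 + s.

    This is linear attention: (1 + Q K^T) V factorizes through Q and K,
    so the T x T score matrix never needs to be materialized.
    (1 + s can go negative for strongly negative scores; fine here
    because the demo keeps scores small.)
    """
    W = 1.0 + Q @ K.T
    W = W / W.sum(axis=-1, keepdims=True)
    return W @ V

rng = np.random.default_rng(0)
Q = rng.normal(scale=0.05, size=(6, 8))
K = rng.normal(scale=0.05, size=(6, 8))
V = rng.normal(scale=0.05, size=(6, 8))
exact = softmax_attention(Q, K, V)
approx = taylor1_attention(Q, K, V)
print("max abs error:", np.abs(exact - approx).max())
```

Higher-degree approximations keep more Taylor terms at the cost of larger factorized feature maps, which is the performance/degree trade-off described above.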


I think this is a good time for a shameless plug. For the last two months or so I have been working on my own project [1] for learning more characters. I have made a tool with a powerful search function, a training mode, and other useful features, such as plots that show your progress and whether you are reaching your daily training goal, and the ability to save searches, a la Thunderbird's saved filters. It is written in Python and old-school tkinter with custom widgets for a somewhat more modern and capable feel. It is very configurable, though currently configuring it means touching a JSON file, as I have not yet bothered writing a GUI for that.

I am mostly developing this for myself, to have the perfect tool for me, but I dare say I have not seen anything comparable, and I let my 10+ years of experience learning Chinese influence my design decisions. Oh, and it is free/libre software of course (AGPL). It comes with an ever-improving vocabulary file that has tons of metadata about words (their usage, how to memorize them, etc.) under the ODbL (Open Database License).

[1]: https://codeberg.org/ZelphirKaltstahl/xiaolong-dictionary


Richard Cook #18 (and #10) strikes again!

https://how.complexsystems.fail/#18

It'd be fun to read more about how you all procedurally respond to this (but maybe this is just a fixation of mine lately). Like are you tabletopping this scenario, are teams building out runbooks for how to quickly resolve this, what's the balancing test for "this needs a functional change to how our distributed systems work" vs. "instead of layering additional complexity on, we should just have a process for quickly and maybe even speculatively restoring this part of the system to a known good state in an outage".


Emacs also has Org-mode and org-babel, which can work a lot like a Jupyter notebook, and can even talk to jupyter kernels. I do a lot in Emacs, especially now that I'm comfortable with GPTel.

I open a poorly aligned, pixelated PDF scan of a 100+ year old Latin textbook in Emacs, mark a start page and an end page, and Emacs Lisp code shells out to qpdf to create a new, smaller PDF from my page range in /tmp, then adds the resulting PDF to my LLM context. Then my code calls gptel-request with a custom prompt, and I get an async elisp callback with the OCR'd PDF now in Emacs' org-mode format, complete with italics, bold, nicely formatted tables, and all the right macrons over the vowels, which I toss into a scratch buffer. Now that the chapter from my textbook is in a markup format, I can select a word and immediately pop up a Latin-to-English dictionary entry, or select a whole sentence to hand to an LLM for a full grammatical breakdown while I'm doing my homework exercises. This 1970s-vintage text editor is also a futuristic language learning platform; it blows my mind.



Relatedly, I've been working on discovering search ranking algorithms: starting from a primitive, what can an agent do to generate search-ranking and query-understanding code that best optimizes against a ground truth?

https://softwaredoug.com/blog/2025/10/19/agentic-code-genera...


I'm a tedious broken record about this (among many other things) but if you haven't read this Richard Cook piece, I strongly recommend you stop reading this postmortem and go read Cook's piece first. It won't take you long. It's the single best piece of writing about this topic I have ever read and I think the piece of technical writing that has done the most to change my thinking:

https://how.complexsystems.fail/

You can literally check off the things from Cook's piece that apply directly here. Also: when I wrote this comment, most of the thread was about root-causing the DNS thing that happened, which I don't think is the big story behind this outage. (Cook rejects the whole idea of a "root cause", and I'm pretty sure he's dead on right about why.)


`rust-GPU` and `rust-CUDA` fall, for me, into the category of "Rust is great, let's build the X ecosystem in Rust". Meanwhile, they've been in a broken and dormant state for years. There was a leadership/dev change recently (are the creators of VectorWare the creators of Rust-CUDA, or the new leaders?), and more activity. I haven't tried them since.

If you have a Rust application or library and want to use the GPU, these approaches are comparatively smooth:

  - WGPU: Great for 3D graphics
  - Ash and other Vulkan bindings: Low-level graphics bindings
  - Cudarc: Nice API for running CUDA kernels.
I am using WGPU and Cudarc for structural biology + molecular dynamics computations, and they work well.

Rust-CUDA feels like lots of PR but not as good a toolkit as these quieter alternatives. What would be cool for them to deliver, and what I think is in their objectives: cross-API abstractions, so you could, for example, write code that runs on Vulkan Compute in addition to CUDA.

Something else that would be cool: High-level bindings to cuFFT and vkFFT. You can FFI them currently, but that's not ideal. (Not too bad to impl though, if you're familiar with FFI syntax and the `cc` crate)


This is quite a good overview, and parts reflect well how things played out in language model research. It's certainly true that language models and deep learning were not considered particularly promising in NLP, which frustrated me greatly at the time since I knew otherwise!

However, the article misses the first two LLMs entirely.

Radford cited CoVE, ELMo, and ULMFiT as the inspirations for GPT. ULMFiT (my paper with Sebastian Ruder) was the only one which actually fine-tuned the full language model for downstream tasks. https://thundergolfer.com/blog/the-first-llm

ULMFiT also pioneered the 3-stage approach of fine-tuning the language model using a causal LM objective and then fine-tuning that with a classification objective, which much later was used in GPT 3.5 instruct, and today is used pretty much everywhere.

The other major oversight in the article is that Dai and Le (2015) is missing -- that pre-dated even ULMFiT in fine-tuning a language model for downstream tasks, but they missed the key insight that a general purpose pretrained model using a large corpus was the critical first step.

It's also missing a key piece of the puzzle regarding attention and transformers: the memory networks paper recently had its 10th birthday and there's a nice writeup of its history here: https://x.com/tesatory/status/1911150652556026328?s=46

It came out about the same time as the Neural Turing Machines paper (https://arxiv.org/abs/1410.5401), covering similar territory -- both pioneered the idea of combining attention and memory in ways later incorporated into transformers.


> given that we don't have access to that data?

Actually, we do have some of that data. You can just look at the H-1B employer datahub, filter by Google (or whichever other employer you want to scrutinize), and then look at the crosstab.

https://www.uscis.gov/tools/reports-and-studies/h-1b-employe...

Looking at Google specifically, I observe two things:

1. The number of new approvals has gone down drastically since 2019 (2019: 2706, 2020: 1680, 2021: 1445, 2022: 1573, 2023: 1263, 2024: 1065, 2025: 1250). (These numbers were derived by summing up all the rows for a given year, but it's close enough to just look at the biggest number that represents HQ.)

Compared to the overall change in total employees as reported in the earnings calls (which was accelerating up through 2022 but then stagnated around 2025), we don't actually see anything noteworthy.

2. Most approvals are renewals ("Continuation Approval"), external hires who are just transferring their existing H-1B visas ("Change of Employer Approval"), or internal transfers ("Amended Approval").


I don’t know the post you’re referring to but I highly recommend How the Immune System Works by Lauren Sompayrac. It explains the interesting parts without getting bogged down in the details of every signalling pathway, but without dumbing things down too much.

> His books, many of which are annotated with margin comments,

I'm not saying that he did, but this, along with being the right age to have read How to Read a Book by Mortimer J. Adler, strongly suggests that he used that book to grasp a lot more of his books than most people can.

That book gives you a very good strategy for reading books that are beyond you normally. In the three years since I've read it I've managed to finish books that I couldn't read even when I was doing my PhD and it was my full time job to understand them.

The funny thing is that I only ran into that book when I was trying to figure out how to build knowledge graphs for complex documents using LLMs. Using multiple readings to create a summary of each chunk, then a graph of the connections between the chunks, then a glossary of all the terms and finally a critique of each chunk gave better than sota results for the documents I was working on.
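As a rough sketch of that multi-pass approach (entirely my own illustration; `complete` stands in for whatever LLM call you use, and the prompts are placeholders):

```python
def multi_pass_read(chunks, complete):
    """Multiple 'readings' of document chunks: per-chunk summaries,
    a graph of connections, a glossary, and per-chunk critiques.
    `complete(prompt) -> str` is any LLM completion function."""
    summaries = [complete(f"Summarize this chunk:\n{c}") for c in chunks]
    joined = "\n".join(f"[{i}] {s}" for i, s in enumerate(summaries))
    graph = complete(f"List connections between these chunk summaries:\n{joined}")
    glossary = complete(f"Build a glossary of all terms used in:\n{joined}")
    critiques = [complete(f"Critique this chunk:\n{c}") for c in chunks]
    return {"summaries": summaries, "graph": graph,
            "glossary": glossary, "critiques": critiques}

# Dry run with a stub in place of a real model:
result = multi_pass_read(["chunk one", "chunk two"], lambda p: p.splitlines()[0])
print(result["graph"])
```

In practice each pass would feed a real model, and the graph/glossary outputs would be parsed into structured form rather than kept as raw strings.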


Here is one path to learn Bayesian statistics starting from the basics, assuming a modern R path with tidyverse (recommended):

First learn some basic probability theory: Peter K. Dunn (2024). The theory of distributions. https://bookdown.org/pkaldunn/DistTheory

Then frequentist statistics: Chester Ismay, Albert Y. Kim, and Arturo Valdivia - https://moderndive.com/v2/ ; Mine Çetinkaya-Rundel and Johanna Hardin - https://openintrostat.github.io/ims/

Finally Bayesian: Johnson, Ott, Dogucu - https://www.bayesrulesbook.com/ This is a great book; it will teach you everything from the very basics to advanced hierarchical Bayesian modeling, all using reproducible code and stan/rstanarm.

Once you master this, the next level may be using brms; Solomon Kurz has done the full Regression and Other Stories book using tidyverse/brms. His knowledge of tidyverse and brms is impressive and demonstrated in his code. https://github.com/ASKurz/Working-through-Regression-and-oth...


This is mostly folk psychology, with some correct affinities/"functions" that neuroscience has identified and studied.

For the skinny on where cognition really is at, here's Gyuri Buzsaki's short but sweet The Brain—Cognition Behavior Problem:

https://pmc.ncbi.nlm.nih.gov/articles/PMC7415918/


I'm at a conference at the moment so can't give a lengthy answer, but I'm the maintainer of virt-v2v, one large open source OCaml project (large if you include all the dependencies) which generates actual multi-millions in annual revenue, but is often overlooked in all this discussion of the OCaml ecosystem. Glad to talk by email some time.

[BTW we currently have open positions for two developers]

