Hacker News | paldepind2's comments

Sorry if this is a basic question, but what's your workflow for feeding the papers into the LLM and getting the implementation done? The coding agents that I've used are not able to read PDFs, so I've been wondering how to do it.


This is actually a great question. I just extract the text with PyPDF, but I did a brief search for the functionality I'd like to have (convert math equations to LaTeX, extract images, reformat in markdown, extract data from charts), and it looks like there are a couple of promising Python libs like Docling and Marker. I should really improve this part of my workflow.


After looking into it for a little while, Docling and Marker work pretty well but are very slow. I haven't found anything else that extracts math suitably. It takes 10+ minutes per PDF, so I'm going to run it on a batch of these papers overnight and create my own little Gaussian splatting RAG database. It's really too bad PDF is so terrible.


I completely agree with the points in this article and have come to the same conclusion after using languages that default to unary curried functions.

> I'd also love to hear if you know any (dis)advantages of curried functions other than the ones mentioned.

I think it fundamentally boils down to the curried style being _implicit_ partial application, whereas a syntax for partial application is _explicit_. And as is often the case, being explicit is clearer. If you see something like

    let f = foobinade a b
in a curried language then you don't immediately know if `f` is the result of foobinading `a` and `b` or if `f` is `foobinade` partially applied to some of its arguments. Without currying you'd either write

    let f = foobinade(a, b)
or

    let f = foobinade(a, b, $) // (using the syntax in the blog post)
and now it's immediately explicitly clear which of the two cases we're in.

This clarity not only helps humans, it also helps compilers give better error messages. In a curried language, if a function is mistakenly applied to too few arguments then the compiler can't always immediately detect the error. For instance, if `foobinade` takes 3 arguments, then `let f = foobinade a b` doesn't give rise to any errors, whereas a compiler can immediately detect the error in `let f = foobinade(a, b)`.

A syntax for partial application offers the same practical benefits of currying without the downsides (albeit losing some of the theoretical simplicity).
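
To make the error-detection point concrete, here is a minimal TypeScript sketch. The three-argument `foobinade` and its body are hypothetical, and an arrow function stands in for the `$` hole syntax from the blog post, which TypeScript does not have.

```typescript
// Hypothetical three-argument function, written in curried style.
const foobinadeCurried = (a: number) => (b: number) => (c: number): number =>
  a + b + c;

// The same hypothetical function, uncurried.
function foobinade(a: number, b: number, c: number): number {
  return a + b + c;
}

// Curried: applying too few arguments type-checks without complaint.
// `f` is silently a function, not a number.
const f = foobinadeCurried(1)(2);

// Uncurried: the same mistake is rejected at the call site.
// const g = foobinade(1, 2); // would not type-check: too few arguments

// Explicit partial application: the arrow function plays the role of the hole.
const h = (c: number): number => foobinade(1, 2, c);

// Full application, visibly distinct from the partial one above.
const full = foobinade(1, 2, 3);
```

In the curried version any mistake only surfaces later, wherever `f` is first used as a number; in the uncurried version it surfaces on the line where it was made.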


The functional programming take is that “the result of foobinade-ing a and b” IS “foobinade applied to two of its arguments”. The application is not some syntactic pun or homonym that can refer to two different meanings; those are the same meaning.


Let us postulate two functions. One is named foobinade, and it takes three arguments. The other is named foobinadd, and it only takes two arguments. (Yes, I know, shoot anybody who actually names things that way.)

When someone writes

  f = foobinade a b
  g = foobinadd c d
there is no confusion to the compiler. The problem is the reader. Unless you have the signatures of foobinade and foobinadd memorized, you have no way to tell that f is a curried function and g is an actual result.

Whereas with explicit syntax, the parentheses say what the author thinks they're doing, and the compiler will yell at them if they get it wrong.


> Unless you have the signatures of foobinade and foobinadd memorized, you have no way to tell that f is a curried function and g is an actual result.

Yes, but the exact FP idea here is that this distinction is meaningless; that curried functions are "actual results". Or rather, you never have a result that isn't a function; `0` and `lambda: 0` (in Python syntax) are the same thing.

It does, of course, turn out that for many people this isn't a natural way of thinking about things.


> Yes, but the exact FP idea here is that this distinction is meaningless; that curried functions are "actual results".

Everyone knows that. At least everyone who would click a post titled "A case against currying." The article's author clearly knows that too.

That's not the point. The point is that this distinction is very meaningful in practice, as many functions are only meant to be used in one way. It's extremely rare that you need to (printf "%d %d" foo). The extra freedom provided by currying is useful, but it should be opt-in.

Just because two things are fundamentally equivalent, it doesn't mean it's useless to distinguish them. Mathematics is the art of giving the same name to different things; and engineering is the art of giving different names to the same thing depending on the context.


> It's extremely rare that

Not when a language embraces currying fully and then you find that it’s used all the fucking time.

It’s really as simple as that: a language makes the currying syntax easy, and programmers use it all the time; a language disallows currying or makes the currying syntax unwieldy, and programmers avoid it.


> It's extremely rare that you need to (printf "%d %d" foo)

I write stuff like `map (printf "%d %d" m) ns` all the time. I daresay I even do the map as a partial application, so double currying.


But arguably your intent would be much more clear with something like `map (printf "%d %d" m _) ns` or a lambda.

I don't think parent is saying that partial application is bad, far from it. But to a reader it is valuable information whether it's partial or full application.


Not really. When reading `iter (printf "%d %d" m) ns`, I am likely to read it in three steps:

  - `iter`: this is a side-effect on a collection
  - `(printf`: ok, this is just printing, I don't care about what is printed, let's skip to the `)`
  - `ns`: ok, this is the collection being printed
Notice that having a lambda or a partial application `_` will only add noise here.

> But to a reader it is valuable information whether it's partial or full application.

This can be valuable information in some contexts, but in a functional language, functions are values. Thus a "partial application" (in terms of closure construction) might be better read as a full application because the main type of concern in the current context is a functional type.


Fine, it's a regular type. It's still not the type I think it is. If it's an Int -> Int when I think it's an Int, that's still a problem, no matter how much Int -> Int is an "actual result".


Come on, just write

    let f :: Int = foobinade a b
And the compiler immediately tells you that you are wrong: your type annotation does not unify with compiler’s inferred type.

And if you think this is verbose, well many traditional imperative languages like C have no type deduction and you will need to provide a type for every variable anyways.
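
The annotation trick above carries over to other typed languages. A minimal TypeScript sketch, assuming a hypothetical curried `foobinade` over numbers:

```typescript
// Hypothetical curried function taking three numbers.
const foobinade = (a: number) => (b: number) => (c: number): number =>
  a * b * c;

// Annotating the binding states what the author expects. The commented-out
// line would be rejected, because a function type is not assignable to `number`.
// const wrong: number = foobinade(2)(3);

// With the correct annotation, the partial application is visible to the reader.
const f: (c: number) => number = foobinade(2)(3);
const result: number = f(4);
```

The annotation only helps readers if the author actually writes it, which is the objection raised in the replies.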


I spent the last three years on the receiving end of mass quantities of code written by people who knew what they were writing but didn't do an adequate job of communicating it to readers who didn't already know everything.

What you say is true. And it works, if you're the author and are having trouble keeping it all straight. It doesn't work if the author didn't do it and you are the reader, though.

And that's the more common case, for two reasons. First, code is read more often than it's written. Second, when you're the author, you probably already have it in your head how many parameters foobinade takes when you call it, but when you're the reader, you have to go consult the definition to find out.

But if I was willing to do it, I could go through and annotate the variables like that, and have the compiler tell me everything I got wrong. It would be tedious, but I could do it.


Doesn’t that just imply that your tooling is inadequate? In LINQPad (and, I assume, VS, though I haven't done it in a while), when you hover over a “var” declaration a tooltip tells you the actual type the compiler inferred.


If 0 and a function that always returns 0 are the same thing, does that make `lambda: lambda: 0` also the same? I suppose it must do, otherwise `0` and `lambda: 0` were not truly the same.


In a non-strict language without side-effects, having a function with no arguments does not make sense. Haskell doesn't even let you do that.

You can write a function that takes a single throw-away argument (eg 0 vs \ () -> 0) and, while the two have some slight differences at runtime, they're so close in practice that you almost never write functions taking a () argument in Haskell. (Which is very different from OCaml!)


Another way to make the point: when you write 0, which do you mean?

In a pure language like Haskell, 0-ary functions <==> constants


Yes, and that becomes more intuitive when you "un-curry" the nested lambdas into a single lambda with twice the number of arguments. The point is that the state of a constant does not depend whatsoever on the state of the (rest of the) world, however much of that state piles on.


It’s not at all clear or the same to the new reader of the code.


Sure—but that’s a property of the inferred types moreso than the mere application syntax. It can be hard to revisit or understand the type of JS or unannotated Python expressions, too—but unlike those cases, the unknown-to-the-reader type of the Haskell code will always be known on the compiler/LSP side.


and a few weeks down the line authors also turn back into new readers...


I have a lot more long term memory than that.


Well, I totally disagree with this. One of the main benefits of currying is the ability to chain function calls together. For example, in F# this is typically done with the |> operator:

    let result =
        input
            |> foobinade a b
            |> barbalyze c d
Or, if we really want to name our partial function before applying it, we can use the >> operator instead:

    let f = foobinade a b >> barbalyze c d
    let result = f input
Requiring an explicit "hole" for this defeats the purpose:

    let f = barbalyze(c, d, foobinade(a, b, $))
    let result = f(input)
Or, just as bad, you could give up on partial function application entirely and go with:

    let result = barbalyze(c, d, foobinade(a, b, input))
Either way, I hope that gives everyone the same "ick" it gives me.


You can still do this though:

  let result = (barbalyze(c, d, $) . foobinade(a, b, $)) input
Or if you prefer left-to-right:

  let result = input
    |> foobinade(a, b, $)
    |> barbalyze(c, d, $)
Maybe what isn't clear is that this hole operator would bind to the innermost function call, not the whole statement.


Even better, this method lets you pipeline into a parameter which isn't the last one:

  let result = input
    |> add_prefix_and_suffix("They said '", $, "'!")
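
For readers without a hole operator in their language, the same pipeline shape can be sketched in TypeScript with arrow functions standing in for `$`. The `pipe` helper, `shout`, and `addPrefixAndSuffix` are hypothetical names for illustration, not library functions.

```typescript
// Minimal left-to-right pipe helper for two stages (hypothetical).
function pipe<A, B, C>(input: A, f: (a: A) => B, g: (b: B) => C): C {
  return g(f(input));
}

// Hypothetical function whose "data" argument sits in the middle.
function addPrefixAndSuffix(prefix: string, body: string, suffix: string): string {
  return prefix + body + suffix;
}

// Hypothetical first pipeline stage.
function shout(s: string): string {
  return s.toUpperCase();
}

// Each arrow function marks explicitly where the piped value goes,
// including a position that is not the last parameter.
const result = pipe(
  "hello",
  (s) => shout(s),
  (s) => addPrefixAndSuffix("They said '", s, "'!"),
);
```

Here `result` is `"They said 'HELLO'!"`. The arrow functions are noisier than a dedicated `$` syntax would be, but they scope the hole to one call in the same way.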


Yeah, especially in F#, a language that is meant to interoperate with .NET libraries (most of which were not written with a "data input last" mindset). Now I'm quite surprised that F# doesn't have this feature.


Wow, this convinced me. It's so obviously the right approach when you put it this way.


This is essentially how Mathematica does it: the sugar `Foo[x,#,z]&` is semantically the same as `Function[{y}, Foo[x,y,z]]`. The `&` syntax essentially controls what hole belongs where.


For pipelines in any language, putting one function call per line often works well. Naming the variables can help readability. It also makes using a debugger easier:

  let foos = foobinate(a, b, input)
  let bars = barbakize(c, d, foos)
Other languages have method call syntax, which allows some chaining in a way that works well with autocomplete.


> Naming the variables can help readability

It can, or it can't; depending on the situation. Sometimes it just adds weight to the mental model (because now there's another variable in scope).


Sure, I like chained method calls too, for simple things. But it gets ridiculous sometimes where people write a ten-stage pipeline in a single expression and then call that "readable."


I'm with you 100%. The main thing is that sometimes a "break point" (using a variable rather than _more_ chain) can help readability. And sometimes it makes things worse. It's really a case-by-case type of thing.


> Local-first was the first kind of app. Way up into the 2000s, you'd use your local excel/word/etc, and the sync mechanism was calling your file annual_accounts_final_v3_amend_v5_final(3).xls

To be precise, these apps were not local-_first_, they were local-_only_. Local-first implies that the app first and foremost works locally, but also that it, secondly, is capable of working online and non-locally (usually with some syncing mechanism).


That sync mechanism was called save to floppy and hand it to whoever you want to share your changes with.


I never understood why people are so keen to do that in TypeScript. With that definition a `UserID` can still be silently "coerced" to a `string` everywhere. So you only get halfway there to an encapsulated type.

I think it's a much better idea to do:

    type UserID = { readonly __tag: unique symbol }
Now clients of `UserID` no longer know anything about the representation. Like with the original approach you need a bit of casting, but that can be neatly encapsulated as it would be in the original approach anyway.
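
A minimal sketch of how the casting could be encapsulated. The helper names `makeUserID` and `userIDToString` are hypothetical, and the string representation is an assumption for illustration.

```typescript
// Opaque type: clients see only a tag, not the underlying representation.
type UserID = { readonly __tag: unique symbol };

// The casts live only in these two helpers (hypothetical names).
function makeUserID(raw: string): UserID {
  return raw as unknown as UserID;
}

function userIDToString(id: UserID): string {
  return id as unknown as string;
}

const id = makeUserID("u-123");

// Unlike an intersection-style brand such as `string & { ... }`, this does
// not silently flow into a string:
// const s: string = id; // would not type-check

const roundTripped = userIDToString(id);
```

At runtime the value is still a plain string; only the type checker sees the opaque shape, so the helpers cost nothing beyond the function calls.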


Yes, absolutely. I did programming competitions back in high-school (around 10 years ago) and common folklore was that back in the days knowing dynamic programming could win you a medal, but today it was just basic expected knowledge.


There's a lot of variety in DP. Knowing about DP doesn't help much with solving DP problems. I'm sure you've seen all the fun problems on codeforces or leetcode hards. The one quote I remember from Erik Demaine's lecture on DP is where he comments, "How do we solve this with DP ? - With difficulty."


That's my impression as well.

I think it's because access to knowledge has become a lot easier, so contestants know more; plus, the competitions themselves have become more organized.

For example, last I checked, the IOI had a syllabus outlining the various algorithms contestants needed to know. During my time, it was more of a wild west.


That's how Competition Makes All Great Again. Looks like it's by God's design. ^_^


> I guess Google’s years of experience led to the conclusion that, for software development to scale, a simple type system, GC, and wicked fast compilation speed are more important than raw runtime throughput and semantic correctness.

I'm a fan of Go, but I don't think it's the product of some awesome collective Google wisdom and experience. Had it been, I think they'd have come to the conclusion that statically eliminating null pointer exceptions was a worthwhile endeavor, just to mention one thing. Instead, I think it's just the product of some people at Google making a language the way they wanted to.


And indeed they did come to that conclusion - for Dart 2.

Go is the product of like 3 Googlers' tastes. It isn't some perfect answer born out of the experience of thousands of geniuses.

I think they got a lot right - fantastic tooling, avoiding glibc, auto-formatting, tabs, even the "no functional programming so you have to write simple code" thing is definitely a valid position. But I don't think anyone can seriously argue that Go's handling of null is anything but a huge mistake.


But those people at Google were veteran researchers who wanted to make a language that could scale for Google's use cases; these things are well documented.

For example, Ken Thompson has said his job at Google was just to find things he could make better.


They also built a language that can be learned in a weekend (well, now two) and is small enough for a fresh grad hire to learn at the job.

Go has a very low barrier to entry, but also a relatively low ceiling. The proliferation of codegen tools for Go is a testament to its limited expressive power.

It doesn't mean that Go didn't hit a sweet spot. For certain tasks, it very much did.


> A quick Google search with "flutter setstate is not refreshing" reveals a struggle that you will face quite often when running Flutter. It sounds like an easy fix, but the nature of Flutter using a bunch of nested Widgets creates, naturally, lasagna code that makes it hard to reason about this.

Can you expand on this OP? I've never had problems with `setState` nor "lasagna code" in Flutter. From a quick search I mostly seem to find questions from people who are still learning Flutter and getting basic things wrong.


State management in Flutter can be done in so many ways, so I get why it could feel complex. We’ve used Hooks a lot and it simplified a ton of stuff. You can also go the whole BLOC route. Having issues with setState means you’re doing something that’s an anti-pattern.


Yeah I think the point is that you have to become a state management expert in Flutter... even if the end result is not very complex there are so many options and so many pitfalls you still have to do a ton of thinking and learning to get there.

With egui you pretty much don't have to think about it at all.


How often do you break your phone that you've saved sooo much? Mine is at least 2 years older (I got it 2 years before the Fairphone 4 was released) and I've spent $0 repairing it.


In a laptop context the advent of soldered on RAM and SSDs has made this a more significant issue though.

That 1TB I thought was enough might not be, and suddenly I need to buy a whole new machine to upgrade.


Are the SSDs often soldered? I thought that was only the RAM.


Depends on the brand/model, of course, but I think for a 13"-sized laptop, it's pretty common to solder in the storage as well.


Embarrassingly often! Things I've done specifically:

- Getting cement in the charger port

- Dropping the phone on its screen, breaking it

With some phones both of those could be full replacements.


So you haven’t even changed the battery? That would be impressive durability.


RAID is not backup, but in some circumstances it's better than a backup. If you don't have RAID and your disk dies you need to replace it ASAP and you've lost all changes since your last backup. If you have RAID you just replace the disk and suffer 0 data loss.

That being said, the reason why I'm afraid of not using RAID is data integrity. What happens when the single HDD/SSD in your system is near its end of life? Can it be trusted to fail cleanly or might it return corrupted data (which then propagates to your backup)? I don't know and I'd be happy to be convinced that it's never an issue nowadays. But I do know that with a btrfs or zfs RAID and the checksumming done by these file systems you don't have to trust the specific consumer-grade disk in some random laptop, but instead can rely on data integrity being ensured by the FS.


You should not propagate changes to your backup in a way that overwrites previous versions. Otherwise a ransomware attack will also destroy your backup. Your server should be allowed to only append the data for new versions without deleting old versions.

Also, if you're paranoid about drive behavior, run ZFS. It will detect such problems and surface them at the OS level (ref "Zebras All The Way Down" by Bryan Cantrill).


I was a bit confused by the remark that comptime is referentially transparent. I'm familiar with the term as it's used in functional programming to mean that an expression can be replaced by its value (stemming from it having no side-effects). However, from a quick search I found an old related comment by you [1] that clarified this for me.

If I understand correctly you're using the term in a different (perhaps more correct/original?) sense where it roughly means that two expressions with the same meaning/denotation can be substituted for each other without changing the meaning/denotation of the surrounding program. This property is broken by macros. A macro in Rust, for instance, can distinguish between `1 + 1` and `2`. The comptime system in Zig in contrast does not break this property as it only allows one to inspect values and not un-evaluated ASTs.

[1]: https://news.ycombinator.com/item?id=36154447


Yes, I am using the term more correctly (or at least more generally), although the way it's used in functional programming is a special case. A referentially transparent term is one whose sub-terms can be replaced by their references without changing the reference of the term as a whole. A functional programming language is simply one where all references are values or "objects" in the programming language itself.

The expression `i++` in C is not a value in C (although it is a "value" in some semantic descriptions of C), yet a C expression that contains `i++` and cannot distinguish between `i++` and any other C operation that increments i by 1, is referentially transparent, which is pretty much all C expressions except for those involving C macros.

Macros are not referentially transparent because they can distinguish between, say, a variable whose name is `foo` and is equal to 3 and a variable whose name is `bar` and is equal to 3. In other words, their outcome may differ not just by what is being referenced (3) but also by how it's referenced (`foo` or `bar`), hence they're referentially opaque.
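
A rough TypeScript illustration of the distinction, under a loud assumption: TypeScript has no macros, so `describeSyntax` fakes one by taking the unevaluated source text as a string (a real macro would receive tokens or an AST). Both function names are hypothetical.

```typescript
// A value-level function only ever sees the evaluated argument,
// so it cannot tell `1 + 1` apart from `2`.
function describeValue(x: number): string {
  return `value ${x}`;
}

// Stand-in for a macro: it receives the unevaluated source text
// and can inspect how the value was written.
function describeSyntax(source: string): string {
  return source.includes("+") ? "an addition" : "a literal";
}

// Same denotation, same outcome: referentially transparent.
const sameForValues = describeValue(1 + 1) === describeValue(2);

// Same denotation, different outcome: referentially opaque.
const sameForSyntax = describeSyntax("1 + 1") === describeSyntax("2");
```

Here `sameForValues` is `true` and `sameForSyntax` is `false`, which is the sense in which inspecting values (as Zig's comptime does) keeps the property while inspecting syntax breaks it.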


Those are equivalent, I think. If you can replace an expression by its value, any two expressions with the same value are indistinguishable (and conversely a value is an expression which is its own value).

