Hacker News | pinaraf's comments

And the latest driver available for Jetson Thor doesn't have the fixes for these two CVEs because they decided to fork their own driver...


Yeah, well, sorry, I should have been more explicit here: the issue is with PostgreSQL, not LLVM. The JIT compiler has to inject direct memory addresses, making the generated code specific to your query and process.


Interesting, because we store relocatable objects. And process symbols can be resolved by name if you really want. It might be yet another performance trade-off though.


Indeed, and right now it's the only possible way, since it remains within a single session; doing otherwise would be very hard.


Unless you count stored procs...


Honestly I thought the same as you, then I wrote this, and I now understand it's going to be really hard to do. To put it simply: there are pointers to query parts "leaking" everywhere across the execution engine. Removing them would require a significant overhaul of the execution engine, the planner, and who knows what else. Even in a single session, two compiled queries will have different compiled code because of that (both LLVM and my copyjit have to inject the addresses of various structs into the asm code).


Just going to say: I'm blown away by how simple this JIT is. Really quite a beautiful approach.


Same for me, that's why I did this after finding that research paper. With the proper compiler settings and a few small tricks you can remove some parts and already end up faster than the interpreter (because you remove some branches and a few memory accesses), and it's even possible to create "super-stencils" covering typical opcode sequences and optimizing them further. Or the opposite, "sub-stencils", in order to do some loop unrolling for instance.


Author here. Thanks for submitting my article on Hacker News. I'll do my best to answer any questions.


Is there a fundamental difference between copy and patch with C and what compilers do when they target intermediate representations? It seems to me that traditional compilation methods are also "copy and patch" but with another intermediate language than C.


I think conceptually, there is no real difference. In the end, a compiler outputting machine code uses very small stencils, like "mov _ _", which are rather simple to patch.

Practically though, it's an enormous difference, as the copy-and-patch approach reuses the years of work that went into clang/gcc: platform support, per-platform optimizations, and so on. The approach opens implementing very decent JIT compilers to a much larger pool of people ("people capable of writing C" vs "people capable of writing assembly/machine code").


The real difference is in the possible optimizations. If you consider the full scope of JIT compilation in, for instance, a web browser or the JVM, you could use copy-and-patch as a tier-0 compiler, and once really hot paths are identified, trigger a complete compiler with all the optimizer steps. Some optimizations are more complicated to implement with copy-and-patch, especially if you can't use all the tricks described in the paper (for instance, they use the ghccc calling convention to get much finer register allocation, but from the documentation I don't think that's going to make it into PostgreSQL).

But as you say, yes, this opens the door to people capable of writing C and reading assembly (unless you're perfect and never need to step through your compiled code in gdb), and it makes the job so much faster and easier... Writing several machine-code emitters is painful, and implementing the required optimization strategies for each ISA quickly becomes out of reach.


Thanks for the blog post, it's always nice to see performance improvements in Postgres! I'm curious: how much time is spent in LLVM on some real queries, and how is LLVM configured (i.e., which passes, which back-end optimizations, etc.)? In our experience [1], LLVM can be reasonably fast when tuned for compile time, with no optimizations and the -O0 back-end pipeline, but it's obviously still 10-20x slower compared to other approaches.

Also, in our experience, copy-and-patch-generated code tends to be quite slow and hard to optimize (we tried some things [2; Sec. 5], but there's still quite a gap (see Fig. 3 for a database evaluation)). Do you have some numbers on the run-time slowdown compared to LLVM? Any future plans for implementing multi-tiering, so dynamically switching from the quickly compiled code to LLVM-optimized code?

[1]: https://home.in.tum.de/~engelke/pubs/2403-cgo.pdf [2]: https://home.in.tum.de/~engelke/pubs/2403-cc.pdf


Is copy-and-patch really a new idea, or just a new name for an old idea?

When I learned programming (and interpreters particularly) around 2010, I thought it was well-known that you could memcpy chunks of executable code that your compiler produced if you were careful ... the major gotcha was that the NX bit was just starting to take off at the time (Even on Linux, most people still assumed 32-bit distros and might be surprised that their CPUs even supported 64-bit. At some point I ended up with a netbook that didn't support 64-bit code at all ...).

Unfortunately I ended up spending too much time on the rest of the code to actually look deeply enough into it to build something useful.


It is an old idea with a new name. For example, QEMU originally worked like this [1], and it already used relocations to patch in constants, before later moving to TCG for higher-quality code.

[1]: https://www.usenix.org/legacy/event/usenix05/tech/freenix/fu...


Would be a great topic for pgconf.eu in June (pgcon moved to Vancouver). Too bad the CfP is over, but there's the "unconference" part (but the topics are decided at the event, no guarantees).


Did you mean pgconf.dev in May (which has the unconference), or pgconf.eu in October (which doesn't have an unconference, but the CfP will open sometime in the - hopefully near - future)?


Yeah, I meant May. Sorry :-( Too many conferences around that time, I got confused.

That being said, submitting this into the pgconf.eu CfP is a good idea too. It's just that it seems like a nice development topic, and the pgcon unconference was always a great place to discuss this sort of stuff. There are a couple more topics in the JIT area, so having a session or two to talk about those and how to move that forward would be beneficial.


Not a question, but I love this. I’m eager to see its evolution.


Nice post, thanks! Do I read it right that using the JIT results in the worst max times? What could be the reason, in your opinion?


Two parts: first, I did the benchmark on a laptop and didn't spend enough time forcing its runtime power management into a fixed state; I'll run a real pgbench on my desktop once I implement all the required opcodes for it. Second, since JIT compilation requires a minimum amount of time (about 300µs in my tests), on such small runtimes this can quickly outweigh the benefits.


PostgreSQL has two replication modes.

Streaming: sends disk diffs over the network as raw binary; absolutely unusable if you don't have the same PostgreSQL version on the other side.

Logical: sends data diffs over the network. Can be used for replication, but also for audit or feeding different databases...
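As a sketch of the logical side, built-in logical replication (PostgreSQL 10 and later) is set up roughly like this; the publication, subscription, and table names here are made up:

```sql
-- Publisher side (requires wal_level = logical in postgresql.conf):
CREATE PUBLICATION my_pub FOR TABLE orders;

-- Subscriber side, which may run a different PostgreSQL major version:
CREATE SUBSCRIPTION my_sub
    CONNECTION 'host=pub.example.com dbname=shop'
    PUBLICATION my_pub;
```

Because the changes flow as logical row operations rather than raw block diffs, the two sides don't have to share an on-disk format.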


Static checking? Python 3.5. Asyncio? Python 3.4.

If these had been available in Python 3.2, or even better 3.0, the switch would have been far easier for corporate users who need benefits before accepting the cost of change...


3.4 gave a good virtualenv by default. That helps a lot.


I still use normal virtualenv, what am I missing by not using pyenv?


3.4 gave us pip installed by default https://docs.python.org/3/library/venv.html

Before that it would give you an empty venv, not even setuptools

Yes, you can use the Python 2 virtualenv on Py 3, but I remember there were some problems.


The usage is essentially the same, just a different command. Huge benefit though is that if you have Python 3.4+ installed you already have the virtual env and pip installed, so things are much simpler.


Which one is that? I'm not aware of it.


    python -m venv
I'm not sure what the differences from the old virtualenv are, but the biggest feature is that it works out of the box as long as you have Python 3.4 or newer. No more googling "how to install virtualenv" or "easy_install pip; pip install virtualenv" stuff.


Unless you're on Ubuntu, where they ship a non-functional version of venv and require you to "sudo apt install" python3-venv to get the working version, defeating the entire point of a simple module in the stdlib that lets you manage your Python environment as a user.

The big improvement on 16.04 is at least the error message explains what's going on.



In addition to what others wrote, you also have the pyvenv command, which is a much more convenient way of using it from the CLI (the usage is the same as virtualenv).


Well, it weighs in favor of Python 2, and considering that until recently the incentives for Python 3 were still light... It's not really about having users, but about having the possibility of switching from CPython to PyPy if you face performance issues.


PyPy is still not really a performance option for numerical computing.


Are you not satisfied with NumPy and associated tools?


Very satisfied, and that is why I can't really switch to PyPy


So much work, when there are many other solutions out there that don't have that limitation: PostgreSQL, several NoSQL databases...


While MySQL isn't the best db, and does have many flaws, it's asinine to say that other solutions are wholly better. Everything has a set of limitations, so just because something like Postgres doesn't have this specific limitation doesn't mean it doesn't have others. It also may be more worth it to work around the limitations of MySQL than to migrate all data to another database.


Exactly this.


Keep in mind that PostgreSQL wasn't as in vogue when GitHub was first built, and most or all of the NoSQL solutions you're thinking of didn't even exist. Up until two years ago they were running Rails 2.3, and that is when they moved to Rails 3 (Rails 4 had been out for a year at that point). I am quite sure GitHub is very careful about its technology choices and not quick to chase trends, given the importance of their application.


Really?

To me Postgres and MySQL have been in a similar spot for at least 10 years. MySQL with a few more users, but not massively so.


Not by my recollection, and most ways of looking at trends seem to suggest the same: (unfortunately many don't go back that far, but let you see some trends)

http://www.indeed.com/jobtrends/q-postgresql-q-mysql.html

http://db-engines.com/en/ranking_trend

http://readwrite.com/2013/09/10/postresql-hits-93-new-levels...


I guess I forget that 70% of the web is PHP and 95% of PHP is MySQL, so I tend not to get a fair picture of life.

It appears that MySQL has held a roughly 10x lead over Postgres for a long time, at least in search trends.


Literally right at this moment there's a post on the front page detailing the valid technical reasons Postgres lost Uber to MySQL.


And if you read carefully through their post, the comments, and the commentary on pgsql-hackers, you'd see that:

1) it was an engineering decision based on their particular situation and use case (as it should be)

2) that use case may be pretty specific and/or unique, and not well suited for Postgres (which is fine)

3) they don't explain what all their tradeoffs are, just the ones they're making arguments against (which makes the post much less useful than it could be)

I would not take that blog post as a general "MySQL is better than Postgres" argument. It really needed more info on what they're doing, why they're doing it that way, and what tradeoffs they were willing to make (speed vs. data integrity, etc.).


Other systems with their own sets of limitations...


Many people are tied to MySQL for other reasons; it might be too costly an investment to switch, etc...


Debian sid, what else... On desktop, laptop, servers... Tools: QtCreator/KDevelop for C++, Netbeans for Java, vim or Kate for everything else.

