Yes. The difference between provisioning a server and running 'install spatial' in a CLI is night and day.
Docker has been a big improvement (when I was first learning PostGIS, the time I spent hunting for proj directories or compiling software just to install the plugin was a major hurdle), but it's still many steps away from:
```
$ duckdb
D install spatial;
```
What do you mean by "provisioning a server"? That's a strange requirement. You can install PostGIS on a MacBook in one command, or actually on all 3 major OSes in one command: "brew install postgis", "apt-get install postgresql-postgis", and "choco install postgis-9.3". Does DuckDB not require a "server" or a "computer"? What does Docker have to do with anything? This is a very confusing train of thought.
PostGIS is included in Postgres.app, which is a single executable for Mac. DuckDB also appears to be a single file download for Mac. I’m not sure your “when I was first learning PostGIS” experience reflects the current situation.
I mean, I like duckdb, but this feels like you're pushing for it. On my system postgis comes from apt install, and it's one command to activate the "plugin". Is the night-and-day part not having to run a random sh script from the internet to install software on my system?
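For reference, the whole dance on a Debian-family system is roughly this; the exact package name varies with your PostgreSQL and PostGIS versions, and "mydb" is a placeholder:

```
$ sudo apt install postgresql-16-postgis-3    # package name varies by version
$ psql -d mydb -c "CREATE EXTENSION postgis;" # the one activation command
```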
DuckDB doesn't require a running server. I run duckdb in a terminal, query 10,000 CSV or parquet files and run SQL on them while joining to data hosted in sqlite, a separate duckdb file using its native format, or even Postgres.
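A sketch of that workflow, with made-up file and table names (the sqlite extension is installed once, the same way as spatial):

```
$ duckdb
D INSTALL sqlite; LOAD sqlite;
D ATTACH 'legacy.db' AS sq (TYPE sqlite);    -- hypothetical SQLite file
D SELECT p.id, s.label
    FROM read_parquet('events/*.parquet') p  -- globs over many Parquet files
    JOIN sq.labels s ON s.id = p.id;
```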
Author here: what's special is that you can go from 0 to spatial data incredibly quickly, in the data generalist tool you're already using. It makes the audience of people working with geospatial data much bigger.
Probably no difference for your use-case (ST_Distance). If you already have data in Postgres, you should continue using Postgis.
In my use case, I use DuckDB because of speed at scale. I have 600 GB of lat-longs in Parquet files on disk.
If I wanted to use Postgis, I would have to ingest all this data into Postgres first.
With DuckDB, I can literally drop into a Jupyter notebook and do this in under 10 seconds, with no need to ingest any data ahead of time, and the results come back in a flash:
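Something along these lines (shown in the duckdb CLI; the same SQL runs through the Python client in a notebook), with illustrative paths and column names rather than the real ones:

```
$ duckdb
D INSTALL spatial; LOAD spatial;
D SELECT count(*)
    FROM read_parquet('points/**/*.parquet')       -- 600 GB of files, no ingest step
    WHERE ST_Distance(ST_Point(lon, lat),          -- planar distance in degrees,
                      ST_Point(-122.4, 37.8)) < 1; -- fine for a rough filter
```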
I haven't yet understood this pattern (and I tried using duckdb). Unless you're only ever going to query those files once or twice in your life, importing them into postgres shouldn't take that long, and then you can do the same or more than with DuckDB.
Also as a side note, is everyone just using DuckDB in memory? Because as soon as you want some multi-session stuff I'd assume you'd use DuckDB on top of a local database, so again I don't see the point, but I'm sure I'm missing something.
> importing them into postgres shouldn't take that long, and then you can do the same or more than with DuckDB.
Usually new data is generated regularly and would require creating a separate ETL process to ingest into Postgres. With DuckDB, no ETL is needed. New Parquet files are just read off the disk.
> Also as a side note, is everyone just using DuckDB in memory?
DuckDB is generally used single-user, and yes, the in-memory use case is most common. I'm not sure about use cases where a single user requires multiple sessions, but DuckDB does have read concurrency, session isolation, etc. I believe write serialization is supported across multiple sessions.
With Parquet files, it's append-only so the "write" use-cases tend to be more limited. Generally another process generates those Parquet files. DuckDB just works with them.
This part was not obvious. In a lot of cases geodata is mostly stable and reads/searches dominate over appends, and that’s why we keep it in a DB (usually postgis, yes).
So DuckDB is optimised for a very different use case, and that’s not always obvious when it’s mentioned.
And now I'm curious whether there's a way to actually index external files (make these queries over 600GB faster) and have this index (or many indices) be persistent. I might have missed that when I looked at the docs...
Ah thanks, of course. I was thinking of dealing with millions of (Geo)JSON files adding up to terabytes, without copying/duplicating them though, mostly indexing. I used to do that with postgres foreign data wrappers and had hopes for duckdb :-). But that's a question for SO or another forum.
Author here: the beauty of DuckDB spatial is that the projections and CRS options are hidden until you need them. For 90% of geospatial data usage people don't and shouldn't need to know about projections or CRS.
Yes, there are so many great tools to handle the complexity for the capital-G Geospatial work.
I love Felt too! Sam and team have built a great platform. But lots of times a map isn't needed; an analyst just needs it as a column.
PostGIS is also excellent! But having to start up a database server to work with data doesn't lend itself to casual usage.
The beauty of DuckDB is that it's there in a moment and in reach for data generalists.
My experience has been that data generalists should stay away from geospatial analysis precisely because they lack a full appreciation of the importance of spatial references. I've seen people fail at this task in so many ways. From "I don't need a library to reproject, I'll just use a haversine function" to "I'll just do a spatial join of these address points in WGS84 to these parcels in NAD27" to "these North Korean missiles aren't a threat because according to this map using a Mercator projection, we are out of range."
DuckDB is great, but the fact that it makes it easier for data generalists to make mistakes with geospatial data is a mark against it, not in its favor.
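To make one of those failure modes concrete: the WGS84/NAD27 join only works after reprojecting one side first, e.g. in PostGIS (table and column names here are hypothetical):

```
-- parcels stored in NAD27 (EPSG:4267), address points in WGS84 (EPSG:4326);
-- transform the parcels before the join instead of comparing raw coordinates
SELECT a.address_id, p.parcel_id
FROM address_points a
JOIN parcels p
  ON ST_Contains(ST_Transform(p.geom, 4326), a.geom);
```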
I think we're mostly making the same point about complexity, ya.
To me, it's mostly a frontend problem that's stopping the spread of mapping in consumer apps. Backend geo is easy tbh; there is so much good, free tooling. Mapping frontend is hell and there is no good off-the-shelf solution I've seen: some are too low level, some too high level. I think we need an embeddable GIS-lite to hide the complexity and let app developers focus on their value add, instead of paying the tax of frontend developers fixing endless issues with maps they don't understand.
edit: to clarify, I think there's a relationship between getting mapping valued by leadership such that the geo work can even be done by analysts, and having more mapping tools exist in frontend apps such that those leaders see them and understand why geo matters. it needs to be more than just markers on the map, with broad exposure. hence my focus on frontend web. sorry if that felt disjointed
Last I checked DuckDB spatial didn’t support handling projections. It couldn’t load the CRS from a .prj file. This makes it useless for serious geospatial stuff.
I have a similar function built into my app, which takes the proposed name and description for a checklist and uses them to generate steps. Have heard many use it for executive function management: https://steplist.app
Altman appears to say AGI is far away when he's arguing he shouldn't be regulated, right around the corner when he's raising funds, and going to happen tomorrow and be mundane when he's trying to break a Microsoft contract.
Pichai inherited one of the most difficult challenges: he had to figure out how to grow revenue right as search revenue was plateauing and money was becoming expensive, all against the headwinds of Google's culture. Your comment is a good example of the headwind: I can't think of another company where a greater share of its employees actively despised the industry that paid for them all to be there.
Google's culture wasn't a problem – and maybe a good thing! – when you had a gigantic money printer working in the background, paying for everything. But it was there, ready to be a problem once times got tough.
google was leading on LLMs; Search just (correctly) didn't want to tarnish their good name with the disaster that putting matrix-multiplied text generation on it would cause.
in a sense, google solved this by trashing its own reputation, so it's not actually a "net" problem that it spews garbage on the result page now.
I don't think that's quite correct. I have my screenshot from years back, when google was "correcting" your query: I asked google what year Marilyn Monroe shot JFK and it said 1962. I'm reasonably sure this was before the transformers paper.
> Pichai inherited one of the most difficult challenges: he had to figure out how to grow revenue right as search revenue was plateauing and money was becoming expensive
see, that's just untrue. you could see the company turning all sorts of knobs to increase revenue over the last ten years - cutting down on perks, cutting down on food quality, increasing ad density, making promo harder, etc, etc - this was all happening and it was all happening at a high enough rate to let Ruth announce records almost every quarter. it was a finely oiled machine. and people grumbled, but quietly - it was fine!
doing mass layoffs absolutely changed the game - it showed that Pichai would be pushed around by random "researchers" publishing revenue guesses, it showed that the company would impose dire consequences on employees whose projects didn't pan out or were bad ideas, etc. the cost of all that is that everyone is now, correctly, more nervous and pissed off!
> I can't think of another company where a greater share of its employees actively despised the industry that paid for them all to be there.
this is absolutely true, but as you say, it was fine - people still ran the ads machine and processed the firehose of cash, and a thousand other machines besides.
Your first quote leaves off the important bit: against the headwinds of Google culture. I agree, the first half of the sentence is a common scenario among CEOs, it's the 2nd bit that makes it especially challenging.
I disagree that it was "fine" because the people who were respected and held up as prime examples of Googlers – for decades – almost always had nothing to do with the ads business. This sends a signal to the rest that the ads business doesn't really matter, it's not the important work, it's akin to the janitorial work that has to happen so the rest of us can work here. If you do that long enough you end up with people optimizing towards things that aren't aligned with the needs of the business. It was on autopilot, relatively, compared to other areas.
>> Your first quote leaves off the important bit: against the headwinds of Google culture.
I would also add all of the companies that Google bought in hopes they would ease some of the laser focus on the search/ad business; they never panned out and were subsequently killed.
I think this is one of the overlooked issues they've had. They've repeatedly tried to venture into other areas by buying smaller companies, hoping to grow revenue and create another revenue stream, and the failure rate of those acquisitions is pretty staggering.
All three of these have not made it easy for them to try and re-brand themselves as anything but a search/ad company.
Thanks! This is a near perfect use case for Ractors since we chunked all the files and there’s no need for the file processing function to share any context.