Yes. The difference between provisioning a server and running 'install spatial' in a CLI is night and day.
Docker has been a big improvement (when I was first learning PostGIS, the time I spent hunting for proj directories or compiling software just to install the plugin was a major hurdle), but it's still many steps away from:
```
$ duckdb
D install spatial;
```
What do you mean by "provisioning a server"? That's a strange requirement. You can install PostGIS on a MacBook in one command, or actually on all 3 major OSes in one command: "brew install postgis", "apt-get install postgresql-postgis", and "choco install postgis-9.3". Does DuckDB not require a "server" or a "computer"? What does Docker have to do with anything? This is a very confusing train of thought.
PostGIS is included in Postgres.app, which is a single executable for Mac. DuckDB also appears to be a single file download for Mac. I’m not sure your “when I was first learning PostGIS” experience reflects the current situation.
I mean, I like duckdb, but this feels like you're pushing for it. On my system postgis comes from apt install, and it's one command to activate the "plugin". Is the night-and-day part not having to run a random sh script from the internet to install software on my system?
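For reference, the whole dance on a Debian-family system is roughly this; the exact package name varies with your PostgreSQL and PostGIS versions, and "mydb" is a placeholder:

```
$ sudo apt install postgresql-16-postgis-3    # package name varies by version
$ psql -d mydb -c "CREATE EXTENSION postgis;" # the one activation command
```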
DuckDB doesn't require a running server. I run duckdb in a terminal, query 10,000 CSV or parquet files and run SQL on them while joining to data hosted in sqlite, a separate duckdb file using its native format, or even Postgres.
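A sketch of that workflow, with made-up file and table names (the sqlite extension is installed once, the same way as spatial):

```
$ duckdb
D INSTALL sqlite; LOAD sqlite;
D ATTACH 'legacy.db' AS sq (TYPE sqlite);    -- hypothetical SQLite file
D SELECT p.id, s.label
    FROM read_parquet('events/*.parquet') p  -- globs over many Parquet files
    JOIN sq.labels s ON s.id = p.id;
```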
Author here: what's special is that you can go from 0 to spatial data incredibly quickly, in the data generalist tool you're already using. It makes the audience of people working with geospatial data much bigger.
Probably no difference for your use-case (ST_Distance). If you already have data in Postgres, you should continue using Postgis.
In my use case, I use DuckDB because of speed at scale. I have 600 GB of lat-longs in Parquet files on disk.
If I wanted to use Postgis, I would have to ingest all this data into Postgres first.
With DuckDB, I can literally drop into a Jupyter notebook and do this in under 10 seconds, with no need to ingest any data ahead of time, and the results come back in a flash:
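Something along these lines (shown in the duckdb CLI; the same SQL runs through the Python client in a notebook), with illustrative paths and column names rather than the real ones:

```
$ duckdb
D INSTALL spatial; LOAD spatial;
D SELECT count(*)
    FROM read_parquet('points/**/*.parquet')       -- 600 GB of files, no ingest step
    WHERE ST_Distance(ST_Point(lon, lat),          -- planar distance in degrees,
                      ST_Point(-122.4, 37.8)) < 1; -- fine for a rough filter
```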
I haven't yet understood this pattern (and I tried using duckdb). Unless you're only ever going to query those files once or twice in your life, importing them into postgres shouldn't take that long, and then you can do the same or more than with DuckDB.
Also as a side note, is everyone just using DuckDB in memory? Because as soon as you want some multi-session stuff I'd assume you'd use DuckDB on top of a local database, so again I don't see the point, but I'm sure I'm missing something.
> importing them into postgres shouldn't take that long, and then you can do the same or more than with DuckDB.
Usually new data is generated regularly and would require creating a separate ETL process to ingest into Postgres. With DuckDB, no ETL is needed. New Parquet files are just read off the disk.
> Also as a side note, is everyone just using DuckDB in memory?
DuckDB is generally used single-user, and yes, the in-memory use case is most common. I'm not sure about use cases where a single user requires multiple sessions, but DuckDB does have read concurrency, session isolation, etc. I believe write serialization is supported across multiple sessions.
With Parquet files, it's append-only so the "write" use-cases tend to be more limited. Generally another process generates those Parquet files. DuckDB just works with them.
This part was not obvious. In a lot of cases geodata is mostly stable and reads/searches dominate over appends, and that’s why we keep it in a DB (usually postgis, yes).
So DuckDB is optimised for a very different use case, and that’s not always obvious when it’s mentioned.
And now I'm curious whether there's a way to actually index external files (make these queries over 600GB faster) and have this index (or many indices) be persistent. I might have missed that when I looked at the docs...
Ah thanks, of course. I was thinking of dealing with millions of (Geo)JSON files adding up to terabytes, without copying/duplicating them though, mostly indexing. I used to do that with postgres foreign data wrappers and had hopes for duckdb :-). But that's a question for SO or another forum.
Author here: the beauty of DuckDB spatial is that the projections and CRS options are hidden until you need them. For 90% of geospatial data usage people don't and shouldn't need to know about projections or CRS.
Yes, there are so many great tools to handle the complexity for the capital-G Geospatial work.
I love Felt too! Sam and team have built a great platform. But lots of times a map isn't needed; an analyst just needs it as a column.
PostGIS is also excellent! But having to start up a database server to work with data doesn't lend itself to casual usage.
The beauty of DuckDB is that it's there in a moment and in reach for data generalists.
My experience has been that data generalists should stay away from geospatial analysis precisely because they lack a full appreciation of the importance of spatial references. I've seen people fail at this task in so many ways. From "I don't need a library to reproject, I'll just use a haversine function" to "I'll just do a spatial join of these address points in WGS84 to these parcels in NAD27" to "these North Korean missiles aren't a threat because according to this map using a Mercator projection, we are out of range."
DuckDB is great, but the fact that it makes it easier for data generalists to make mistakes with geospatial data is a mark against it, not in its favor.
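To make one of those failure modes concrete: the WGS84/NAD27 join only works after reprojecting one side first, e.g. in PostGIS (table and column names here are hypothetical):

```
-- parcels stored in NAD27 (EPSG:4267), address points in WGS84 (EPSG:4326);
-- transform the parcels before the join instead of comparing raw coordinates
SELECT a.address_id, p.parcel_id
FROM address_points a
JOIN parcels p
  ON ST_Contains(ST_Transform(p.geom, 4326), a.geom);
```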
I think we're mostly making the same point about complexity, ya.
To me, it's mostly a frontend problem that's stopping the spread of mapping in consumer apps. Backend geo is easy tbh; there is so much good, free tooling. Mapping frontend is hell and there is no good off-the-shelf solution I've seen: some are too low level, some too high level. I think we need an embeddable GIS-lite to hide the complexity and let app developers focus on their value add, instead of paying the tax of frontend developers fixing endless issues with maps they don't understand.
edit: to clarify, I think there's a relationship between getting mapping valued by leadership such that the geo work can even be done by analysts, and having more mapping tools exist in frontend apps such that those leaders see them and understand why geo matters. it needs to be more than just markers on the map, with broad exposure. hence my focus on frontend web. sorry if that felt disjointed
Last I checked DuckDB spatial didn’t support handling projections. It couldn’t load the CRS from a .prj file. This makes it useless for serious geospatial stuff.
I have a similar function built into my app, which takes the proposed name and description for a checklist and uses them to generate steps. Have heard many use it for executive function management: https://steplist.app
Altman appears to say AGI is far away when he's arguing he shouldn't be regulated, right around the corner when he's raising funds, and going to happen tomorrow and be mundane when he's trying to break a Microsoft contract.
Pichai inherited one of the most difficult challenges: he had to figure out how to grow revenue right as search revenue was plateauing and money was becoming expensive, all against the headwinds of Google's culture. Your comment is a good example of the headwind: I can't think of another company where a greater share of its employees actively despised the industry that paid for them all to be there.
Google's culture wasn't a problem – and maybe a good thing! – when you had a gigantic money printer working in the background, paying for everything. But it was there, ready to be a problem once times got tough.
google was leading on LLMs; Search just (correctly) didn't want to tarnish their good name with the disaster that putting matrix-multiplied text generation on it would cause.
in a sense, google solved this by trashing its own reputation, so it's not actually a "net" problem that it spews garbage on the result page now.
I don't think that's quite correct. I have my screenshot from years back, when google was "correcting" your query: I asked google what year Marilyn Monroe shot JFK and it said 1962. I'm reasonably sure this was before the transformers paper.
> Pichai inherited one of the most difficult challenges: he had to figure out how to grow revenue right as search revenue was plateauing and money was becoming expensive
see, that's just untrue. you could see the company turning all sorts of knobs to increase revenue over the last ten years - cutting down on perks, cutting down on food quality, increasing ad density, making promo harder, etc, etc - this was all happening and it was all happening at a high enough rate to let Ruth announce records almost every quarter. it was a finely oiled machine. and people grumbled, but quietly - it was fine!
doing mass layoffs absolutely changed the game - it showed that Pichai would be pushed around by random "researchers" publishing revenue guesses, it showed that the company would impose dire consequences on employees whose projects didn't pan out or were bad ideas, etc. the cost of all that is that everyone is now, correctly, more nervous and pissed off!
> I can't think of another company where a greater share of its employees actively despised the industry that paid for them all to be there.
this is absolutely true, but as you say, it was fine - people still ran the ads machine and processed the firehose of cash, and a thousand other machines besides.
Your first quote leaves off the important bit: against the headwinds of Google culture. I agree, the first half of the sentence is a common scenario among CEOs, it's the 2nd bit that makes it especially challenging.
I disagree that it was "fine" because the people who were respected and held up as prime examples of Googlers – for decades – almost always had nothing to do with the ads business. This sends a signal to the rest that the ads business doesn't really matter, it's not the important work, it's akin to the janitorial work that has to happen so the rest of us can work here. If you do that long enough you end up with people optimizing towards things that aren't aligned with the needs of the business. It was on autopilot, relatively, compared to other areas.
>> Your first quote leaves off the important bit: against the headwinds of Google culture.
I would also add all of the companies that Google bought in hopes they would ease some of the laser focus on the search/ad business; they never panned out and were subsequently killed.
I think this is one of the overlooked issues they've had. They've repeatedly tried to venture into other areas by buying smaller companies, hoping to grow revenue and create another revenue stream, and the failure rate of those acquisitions is pretty staggering.
All three of these have not made it easy for them to try and re-brand themselves as anything but a search/ad company.
Thanks! This is a near perfect use case for Ractors since we chunked all the files and there’s no need for the file processing function to share any context.