I wrote a post about using H3, or any DGGS for that matter. Yes, it speeds things up, but you lose accuracy. If search is the primary concern it can help, but if any level of accuracy matters I would just use a better engine with GeoParquet to handle it. https://sedona.apache.org/latest/blog/2025/09/05/should-you-...
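To make the accuracy tradeoff concrete, here is a toy sketch (not H3 itself; a plain square grid standing in for any DGGS, with a made-up cell size and made-up points): snapping points to cells makes lookups cheap, but any two points in the same cell become indistinguishable.

```python
# Toy illustration of the accuracy loss from snapping points to a
# discrete grid (a stand-in for H3 or any DGGS; cell size is made up).
from math import floor

def snap(lon, lat, cell=0.01):
    """Return the center of the grid cell containing (lon, lat)."""
    return (floor(lon / cell) * cell + cell / 2,
            floor(lat / cell) * cell + cell / 2)

def dist(a, b):
    """Planar Euclidean distance in degrees (fine for a toy example)."""
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

p1 = (-73.98712, 40.73340)  # two nearby, distinct points (hypothetical)
p2 = (-73.98401, 40.73051)

true_d = dist(p1, p2)                  # > 0: the points differ
cell_d = dist(snap(*p1), snap(*p2))    # 0.0: both land in the same cell

# A cell-based distance or containment test can't tell these apart;
# that error is the price of the fast index.
print(f"true: {true_d:.5f} deg, snapped: {cell_d:.5f} deg")
```

At a finer resolution the error shrinks but never disappears, which is why exact geometry engines still matter when accuracy is the requirement.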
SpatialBench is an open benchmark suite for spatial SQL. The goal is to compare engines on price-performance, since serverless makes raw “x times faster” claims hard to interpret.
I used it to compare Databricks SQL Serverless (Medium) vs Databricks Jobs clusters with Apache Sedona 1.7 across 12 queries (from simple filters to joins, distance joins, multi-way joins, KNN) at SF100 and SF1000 (SF1000 is roughly 500GB uncompressed Parquet).
TL;DR: apart from one query, Sedona was up to ~6x better on cost per query, and it also completed more queries within the same 10-hour timeout guardrail. Some queries didn't finish or errored on either side, so there is a capability matrix in the post.
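For context on how a price-performance comparison like this works (all rates and runtimes below are hypothetical placeholders, not SpatialBench numbers): cost per query is just runtime multiplied by the hourly price of the compute, which is why a slower-but-cheaper engine can win.

```python
# Hypothetical price-performance arithmetic (rates and runtimes are
# made-up placeholders, not benchmark results).

def cost_per_query(runtime_s: float, dollars_per_hour: float) -> float:
    """Cost of one query = runtime in hours * hourly price."""
    return (runtime_s / 3600.0) * dollars_per_hour

# Engine A: faster wall clock but pricier compute.
a = cost_per_query(runtime_s=120, dollars_per_hour=30.0)  # $1.00
# Engine B: slower wall clock but a much cheaper cluster.
b = cost_per_query(runtime_s=300, dollars_per_hour=6.0)   # $0.50

# A raw "2.5x faster" claim favors A; cost per query favors B.
print(f"A: ${a:.2f}/query, B: ${b:.2f}/query, B is {a/b:.1f}x cheaper")
```

This is why "x times faster" claims on serverless are hard to interpret: you pay for a different unit than you benchmark.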
I wrote a book on PostGIS and have used it for years, and these single-node analytical tools make sense when PostGIS performance starts to break down. For many tasks PostGIS works great, but again, you are limited by the fact that your tables have to live in the DB and can only scale as far as the compute resources you have allocated.
In terms of the number of functions, PostGIS is still the leader, but having the analytical functions (spatial relationships, distances, etc.) in place in these systems is what matters. DuckDB started this trend, but this one has a spatially focused engine. You can use the two together: PostGIS for transactional processing and queries, and SedonaDB for processing and data prep.
A combination of tools makes a lot of sense here especially as the data starts to grow.
Not saying these shouldn't be used together, but even then, the added complexity pays off only in very limited scenarios. Plain SQLite can probably handle 80% of all WordPress needs.
Postgres has made gigantic leaps in recent years, both in performance and feature set. I don't think comparing the new contenders with daddy is ever fair. But then there are the DuckDB advocates who claim it pioneered spatial, which is simply not true.
Postgres is an amazing system, and it's available for free. We don't have too many of those, and not too many that age this well.
I think this is a great perspective. In my professional experience it was very common to be using multiple tools: ESRI for some things, GDAL for others, and then some hacks here and there, like most complex analytical systems. Some of it is vendor shenanigans, but some of it is specific features.
I put together a tutorial for Apache Sedona, which brings geospatial to Spark: a project that crunches real estate and satellite imagery data with scalable spatial joins.
Sedona basically makes spatial at scale way more accessible. Instead of rolling your own hacks, you get Spark's distributed compute with geospatial APIs baked in.
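The idea behind those scalable spatial joins can be sketched on a single machine (a conceptual toy only; the grid cell size and point data are made up, and a real engine like Sedona also checks neighboring cells and distributes the partitions across a cluster): partition both datasets by a coarse grid key, then compare only pairs that share a partition instead of testing every pair.

```python
# Toy, single-machine sketch of a partitioned spatial join: bucket the
# right side by a coarse grid key, then probe only matching buckets.
# (A real distributed engine also probes neighboring cells so pairs
# straddling a cell boundary aren't missed; data here is made up.)
from collections import defaultdict
from math import floor

def grid_key(x, y, cell=1.0):
    return (floor(x / cell), floor(y / cell))

def partitioned_join(left, right, radius=0.2, cell=1.0):
    """Return (l, r) pairs within `radius`, comparing only points
    whose grid cells match."""
    buckets = defaultdict(list)
    for r in right:
        buckets[grid_key(*r, cell)].append(r)
    out = []
    for l in left:
        for r in buckets[grid_key(*l, cell)]:
            if (l[0] - r[0]) ** 2 + (l[1] - r[1]) ** 2 <= radius ** 2:
                out.append((l, r))
    return out

left = [(0.5, 0.5), (2.5, 2.5)]
right = [(0.6, 0.5), (9.0, 9.0)]
print(partitioned_join(left, right))  # → [((0.5, 0.5), (0.6, 0.5))]
```

The win is that each partition's work is independent, which is exactly what makes the join easy to spread across Spark executors.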
They aren’t reliably correct, actually. The boundaries the Census publishes are called ZIP Code Tabulation Areas, which are approximations of ZIP codes and include overlaps.
ZCTA5 roughly corresponds to the area of a 5-digit ZIP code. The problem is that there are large areas of the West that have no permanent residents and no mail delivery. Plus they change over time.
Yeah, but we already have that in the Census hierarchy. Plus, you have to pay to access ZIP+4 geospatial data, and it changes sometimes as frequently as quarterly.