Hacker News | pqdbr's comments

Are you using a dedicated pg instance for vectors, or do you keep all your data (vector and non-vector) in a single pg instance?


The biggest selling point of Postgres over Qdrant or whatever is that you can put all the data in the same db and use joins and CTEs, foreign keys and other constraints, get lower latency, effectively get rid of N+1 cases, and ensure data integrity.


I generally agree that one database instance is ideal, but there are other reasons why Postgres everywhere is advantageous, even across multiple instances:

- Expertise: it's just SQL for the most part

- Ecosystem: same ORM, same connection pooler

- Portability: all major clouds have managed Postgres

I'd gladly take multiple Postgres instances even if I lose cross-database joins.


Yep. If performance becomes a concern but we still want to exploit joins etc., it's easy to set up replicas and "shard" read-only use cases across them.


Postgres supports the Foreign Data Wrapper concept from SQL/MED. If you configure this you can do joins across instances, even!

https://www.postgresql.org/docs/current/postgres-fdw.html
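For anyone who hasn't tried it, a rough sketch of what that looks like (the server name, credentials, and tables here are all made up for illustration):

```sql
-- Hypothetical setup: join a local table against one living on another
-- Postgres instance, via the postgres_fdw extension.
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

CREATE SERVER vectors_db
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'vectors.internal', dbname 'vectors', port '5432');

CREATE USER MAPPING FOR CURRENT_USER
    SERVER vectors_db
    OPTIONS (user 'app', password 'secret');

-- Expose the remote table locally...
IMPORT FOREIGN SCHEMA public LIMIT TO (document_embeddings)
    FROM SERVER vectors_db INTO public;

-- ...and join across instances as if it were one database.
SELECT d.title, e.embedding
FROM documents d
JOIN document_embeddings e ON e.document_id = d.id;
```

Worth noting the planner can push some work down to the remote side, but cross-server joins still pay a network cost, so it's not a free lunch.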


All in one of course. That’s the biggest advantage. And why postgres is great - it covers virtually all standard use cases.


Came to the comment section looking to see if it was just me. Had to read it 4 times.


I had a lot of fun like you as well, until I got my first DDoS and bot attacks. There's a reason Cloudflare has 20% of internet traffic.


Any project that starts gaining any bit of traction gets hammered with bots (the ones that try every single /wp URL even though you don't even use WordPress), frequent DDoS attacks, and so on.

I consider my server's real IP (or load balancer IP) as a secret for that reason, and Cloudflare helps exactly with that.

Everything goes through Cloudflare, where we have rate limiters, Web firewall, challenges for China / Russian inbound requests (we are very local and have zero customers outside our country), and so on.


I had just deployed. Started reverting commits like crazy.


+1 for GitX. For some reason the most recent version on GitHub doesn't work well for me anymore, so I keep an extremely old version (0.15.1964 dev) in a Dropbox folder, and it's been my daily driver for years.


I'd love to read a blog post like this about S3 Vector buckets. Does anyone have experience with it in production?


The service is still in preview, so AWS are explicitly telling people not to put it into production.

From my non-production experiments with it, the main limitation is that you can only retrieve up to 30 top_k results, which means you can't use it with a re-ranker, or at least not as effectively. For many production use cases that will be a deal breaker.


My issue with it is that it requires a lot of duplication between it and a traditional RDBMS; you can't use it alone because it doesn't offer filtering without a search vector (i.e., what some vendors call a scroll function).


I dropped cursor for the precise reason you mention: reliability.

Countless times my requests in the AI chat would just hang there for 30+ seconds before I could retry them.

When I decided to give Claude Code a try (I thought I didn't need it because I used Claude in Cursor), I couldn't believe how much faster it was, and it's been literally 100% reliable.

EDIT: given today's release, I decided to give it a go. The Composer1 model _is_ fast, but on just the second new agent I started, I got this:

> Connection failed. If the problem persists, please check your internet connection or VPN


Sounds like you have a network problem. Did you try checking the network diagnostic in settings? They default to http2 which can throw a wrench in some corporate networks.

I would be willing to bet money your issue is on your side. I am a daily user since the beginning and cannot recall when I have had issues like you describe unless it was related to my corp network.


A lot of progress is being made here on the Cursor side I encourage you to try it again.

(Cursor dev)


This is the exact reason I left Cursor for Claude Code. Night and day difference in reliability. The Windows experience might be especially bad, but it would get constantly hung or otherwise fail when trying to run commands. I also had to babysit Cursor and tell it to continue for mid sized tasks.


They've improved performance dramatically in the last few weeks, might have fixed your issues.


It's clear they've been shipping a lot of Windows updates.


It does seem significantly better on Windows. I'll give it another chance over the next couple weeks.


I use Cursor daily; my business partner uses CC. Without a doubt, CC is better. I'm just not willing to let go of the flow I spent the last year fine-tuning. I'll probably make the leap after we finish the latest release.


Great article, especially for this part:

> What can go wrong with using UUIDv7

> Using UUIDv7 is generally discouraged for security when the primary key is exposed to end users in external-facing applications or APIs. The main issue is that UUIDv7 incorporates a 48-bit Unix timestamp as its most significant part, meaning the identifier itself leaks the record's creation time.

> This leakage is primarily a privacy concern. Attackers can use the timing data as metadata for de-anonymization or account correlation, potentially revealing activity patterns or growth rates within an organization. While UUIDv7 still contains random data, relying on the primary key for security is considered a flawed approach. Experts recommend using UUIDv7 only for internal keys and exposing a separate, truly random UUIDv4 as an external identifier.
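To make the leak concrete, here's a small sketch (pure stdlib; the `uuid7()` helper is hand-rolled per RFC 9562, since older Pythons don't ship one) showing that anyone holding the ID can recover the record's creation time:

```python
# The first 48 bits of a UUIDv7 are a big-endian Unix timestamp in
# milliseconds; the rest is version/variant bits plus random data.
import os
import time
import uuid


def uuid7() -> uuid.UUID:
    """Minimal UUIDv7: 48-bit ms timestamp, then version/variant + random."""
    ts_ms = time.time_ns() // 1_000_000
    b = bytearray(ts_ms.to_bytes(6, "big") + os.urandom(10))
    b[6] = (b[6] & 0x0F) | 0x70  # set version 7
    b[8] = (b[8] & 0x3F) | 0x80  # set RFC 4122 variant
    return uuid.UUID(bytes=bytes(b))


def creation_time_ms(u: uuid.UUID) -> int:
    """Recover the embedded timestamp from any UUIDv7: this is the leak."""
    return int.from_bytes(u.bytes[:6], "big")


u = uuid7()
print(u, creation_time_ms(u))  # the ID plus its creation time in epoch ms
```

Nothing secret is needed to run `creation_time_ms` on someone else's IDs, which is the whole privacy argument.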


> Experts recommend

What experts? For what scenarios specifically? When do they consider time-of-creation to be sensitive?


Or just generate them in bulk and take them from a list?


> Experts recommend using UUIDv7 only for internal keys and exposing a separate, truly random UUIDv4 as an external identifier.

So then what's the point? How I always did things in the past was to use an auto-increment bigint as the internal primary key and a separate random UUID as the external-facing key. I think this recommendation from "experts" is pretty dumb, because you get very little benefit from UUIDv7 (beyond some portability improvements) if you're still using a separate internal key.

While I wouldn't use UUIDv7 as a secure token the way I would UUIDv4, I don't see anything wrong with using UUIDv7 for externally exposed object keys - you're still going to need permission checks anyway.
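For reference, the classic two-key layout described above looks roughly like this (table and column names are illustrative):

```sql
-- Internal key: small, sequential, join-friendly; never leaves the backend.
-- External key: random UUIDv4, safe to expose in URLs and APIs.
CREATE TABLE orders (
    id          bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    public_id   uuid NOT NULL DEFAULT gen_random_uuid() UNIQUE,
    created_at  timestamptz NOT NULL DEFAULT now()
);

-- Lookups from the outside world go through the random key:
--   SELECT * FROM orders WHERE public_id = $1;
-- while joins and foreign keys use the bigint internally.
```

The UNIQUE constraint on public_id gives you the index that external lookups need anyway.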


I asked a similar question, and yeah it seems like this is entirely for distributed systems, even then only some of them. Your basic single DB Postgres should just have a serial PK.


For distributed databases where you can't use autoincrement.

Or where, for some reason, the ID needs to be created before being inserted into the database. Like you're inserting into multiple services at once.


Many distributed databases have mechanisms to use an auto-increment, actually - often, generating large chunks at a time to hand out.
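A toy sketch of that chunk-handing scheme (often called hi/lo); here the "coordinator" is just an in-process counter standing in for a database sequence, so the names and sizes are made up:

```python
# Hi/lo style ID allocation: each node grabs a block of IDs from a
# central sequence, then hands them out locally with no further
# coordination until the block is exhausted.
import itertools

BLOCK_SIZE = 1000
_central_sequence = itertools.count(0)  # stand-in for the coordinator


class BlockAllocator:
    def __init__(self):
        self._next = 0
        self._limit = 0  # empty block: forces a fetch on first use

    def next_id(self) -> int:
        if self._next >= self._limit:
            block = next(_central_sequence)  # one "network call" per block
            self._next = block * BLOCK_SIZE
            self._limit = self._next + BLOCK_SIZE
        nid = self._next
        self._next += 1
        return nid


a, b = BlockAllocator(), BlockAllocator()
print(a.next_id(), a.next_id(), b.next_id())  # prints: 0 1 1000
```

Two nodes never collide because they hold disjoint blocks; the trade-off is gaps in the sequence when a node dies holding an unused block.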


Our “distributed database” at a fortune 90 company spans at least 10 different database products.

UUIDv4 lets us sidestep this.

Is it bad design? Probably. Is it going to happen at huge companies? Yes.


You’re not wrong, of course. It’s a natural consequence of the eschewing of DBAs, and the increasingly powerful compute available - even if someone did notice that the slowdown was due to the PK choice, they can often “fix” that by paying more money.


I wish Postgres would just allow you to look up records by the random component of the field. What are the chances of collisions with 80 bits of randomness? My guess is it's still enough.


You can certainly create that index.
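A sketch of what that could look like today, assuming a uuid column named id on a table named items: the low 8 bytes of a UUIDv7 are (almost entirely) the random part, and `uuid_send()` exposes the raw 16 bytes, so an expression index over the suffix works:

```sql
-- Index only the random tail of the UUIDv7, ignoring the timestamp half.
CREATE INDEX items_uuid_rand_idx
    ON items (substring(uuid_send(id) FROM 9 FOR 8));

-- Look up a row by the random suffix alone:
SELECT *
FROM items
WHERE substring(uuid_send(id) FROM 9 FOR 8)
    = substring(uuid_send($1::uuid) FROM 9 FOR 8);
```

62+ random bits in that suffix makes accidental collisions vanishingly unlikely at any realistic table size, though you'd still want the real PK for integrity.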


Yes, but obviously if it were automated and part of Postgres, people would use it without having to think too much, and it would remove one of the security objections to what I think is a sensible approach for most large systems, rather than a controversial one.


What would be better is letting you create a type with custom display and in/out functions, while internally using the native type, in SQL (currently this requires doing it in C).


> growth rates

I honestly don't see how.


Would like to know more about your Postgres offering: does it offer streaming replicas and streaming backups? Or just dumps stored to S3?


Yes, we offer clusters with auto failover, and replicas can be in multiple regions and even across multiple providers.

We support Postgres but also MySQL, Redis, OpenSearch, ClickHouse, and many more.

As for backups, we offer differential snapshots and regular dumps that you can send to your own S3 bucket:

https://docs.elest.io/books/databases/page/deploy-a-new-clus...

