First of all, Neo4j is very slow for vectors, so if performance is something that matters for your user experience, it definitely isn't a viable option. This is probably why Neo4j themselves have released guides on how to build the middleman software I mentioned, pairing it with Qdrant to get viable performance.
Furthermore, vectors are capped at 4k dimensions, which, although that may be enough most of the time, is a problem for some of the users we've spoken to.
Also, they don't allow pre-filtering, which is a problem for a few of the people we've spoken to, including Zep AI.
They are on the right track, but there are a lot of holes that we are hoping to fill :)
Edit: AND, it is super memory intensive. People have run into memory overflows even with extremely small datasets.
Hey, want to correct some of your statements here. :-)
Neo4j's vector index uses Lucene's HNSW implementation. So, the performance of vector search is the same as that of Lucene. It's worth noting that performance suffers when configured without sufficient memory, like all HNSW vector indexes.
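For context, creating and querying that index looks roughly like this from the Python driver (a minimal sketch; the index name, label, dimensions and connection details are placeholders you'd swap for your own):

```python
from neo4j import GraphDatabase

# Placeholder connection details; adjust to your deployment.
driver = GraphDatabase.driver("neo4j://localhost:7687", auth=("neo4j", "password"))

CREATE_INDEX = """
CREATE VECTOR INDEX doc_embeddings IF NOT EXISTS
FOR (d:Document) ON (d.embedding)
OPTIONS {indexConfig: {
  `vector.dimensions`: 1536,
  `vector.similarity_function`: 'cosine'
}}
"""

QUERY_INDEX = """
CALL db.index.vector.queryNodes('doc_embeddings', $k, $query_vector)
YIELD node, score
RETURN node.title AS title, score
"""

with driver.session() as session:
    session.run(CREATE_INDEX)
    # Dummy query vector for illustration only.
    result = session.run(QUERY_INDEX, k=10, query_vector=[0.1] * 1536)
    for record in result:
        print(record["title"], record["score"])

driver.close()
```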
>> This is probably why Neo4j themselves have released guides on how to build that middleman software I mentioned with Qdrant for viable performance.
No, this is about supporting our customers. Combining graphs and vectors in a single database is the best solution for many users - integration brings convenience, consistency, and performance. But we also recognise that customers might already have invested in a dedicated vector database, need additional vector search features we don't support, or benefit from separating graph and vector resources. Generally, integrating well with the broader data ecosystem helps people succeed.
>> Furthermore, vectors are capped at 4k dimensions
We occasionally get asked about support for 8k-dimensional vectors. But so far, whenever I've followed up with users, there doesn't seem to be a case for them. At ~32 KB per embedding (8,192 float32 values × 4 bytes), they're often not practical in production. Happy to hear about use cases I've missed.
>> Also, they don't allow pre-filtering, which is a problem for a few of the people we've spoken to, including Zep AI.
We support pre- and post-filtering. We're currently implementing metadata filtering, which may be what you're referring to.
>> AND, it is super memory intensive.
It's no more memory-intensive than other similar implementations. I get that different approaches have different hardware requirements. But in all cases, a misconfigured system will perform poorly.
Neo4j performance is horrendous unless you have huge amounts of memory. I would wager that anyone who has used Neo4j for anything related to graphrag or used its vector features knows it’s not a great solution. Anyone can verify this quite easily.
Thanks for clearing up some of these points. Really pleased the post has reached the industry leaders, and I genuinely appreciate your response :)
With regard to the pre-filtering, I was referring to filtering during the neighbor search in the HNSW. If you wanted 10 vectors, but with specific conditions, you'd have to retrieve surplus vectors and then apply the filter, hoping you were left with enough. Does that sound right? I suppose that is metadata filtering.
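To illustrate the pattern I mean, something like this toy sketch (the `index.search` call and the metadata predicate are hypothetical placeholders, not any particular database's API):

```python
def filtered_knn(index, query_vector, k, predicate, overfetch=4):
    """Post-filtering: over-fetch from the ANN index, then apply the
    metadata predicate and hope enough candidates survive."""
    candidates = index.search(query_vector, k * overfetch)  # hypothetical index API
    hits = [c for c in candidates if predicate(c.metadata)]
    return hits[:k]  # may return fewer than k results if the filter is selective
```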
I should've been more specific about the memory issue. Not trying to slate you here, just that a lot of the complaints I've read online were about memory overflows when using the vectors. But of course, a misconfigured system would definitely perform poorly :)
The fortunate thing about our vector DB, like I mentioned in the post, is that we store the HNSW on disk, so it is much less intense on your memory. Similar to what turbopuffer has done.
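As a rough illustration of the general idea (not our actual implementation), memory-mapping the vector storage lets the OS page vectors in on demand instead of holding the whole index in RAM:

```python
import numpy as np

DIM = 1024
N = 100_000

# Build phase (one-off): write raw float32 vectors to a file on disk.
vectors = np.memmap("vectors.f32", dtype=np.float32, mode="w+", shape=(N, DIM))
# ... fill `vectors` with embeddings here ...
vectors.flush()

# Query phase: reopen read-only. Only the pages actually touched while
# walking the HNSW graph get read into memory, so resident memory stays
# far below the full N * DIM * 4 bytes of the dataset.
vectors = np.memmap("vectors.f32", dtype=np.float32, mode="r", shape=(N, DIM))
candidate_id = 42
score = float(np.dot(vectors[candidate_id], vectors[candidate_id]))
```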
With regard to the graph DB, we mostly use our laptops to test it and haven't run into an issue with performance yet on datasets of any size.
Partly because they're working with a monolith that I imagine is difficult to iterate on, and it's written in Java. We've had the benefit of working on this in Rust, which lets us get really into the nitty-gritty of different optimisations.
The friend I worked on this with is putting together a technical blog on those graph optimisations, so I'll link it here when he's done.
This is definitely a problem we want to fix quickly. We're currently planning an MCP tool that can traverse the graph and decide for itself at each step where to go next, as opposed to having to generate actual written text queries.
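Very roughly, the loop would look something like this (a hypothetical sketch, not the actual tool; `get_neighbours` and the model's `choose` call are placeholder APIs):

```python
def traverse(graph, llm, start_node, question, max_hops=5):
    """Let the model pick the next edge at each hop instead of writing a full query."""
    current = start_node
    path = [current]
    for _ in range(max_hops):
        neighbours = graph.get_neighbours(current)  # hypothetical graph API
        choice = llm.choose(
            question=question,
            options=[(e.relationship, e.target) for e in neighbours],
        )  # hypothetical: model returns one option, or None to stop
        if choice is None:
            break
        current = choice.target
        path.append(current)
    return path
```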
I mentioned in another comment that you can provide a grammar with constrained decoding to force the LLM to generate tokens that comply with the grammar. This ensures that only valid syntactic constructs are produced.
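Conceptually, constrained decoding is just masking the logits of any token the grammar can't accept next. A toy greedy-decoding sketch (the model, tokenizer and grammar objects here are hypothetical; real implementations such as GBNF grammars in llama.cpp or Outlines do the same thing at the logits level):

```python
def constrained_decode(model, tokenizer, grammar, prompt, max_tokens=64):
    """At every step, forbid tokens the grammar cannot accept next, so the
    model can only ever emit syntactically valid output."""
    tokens = tokenizer.encode(prompt)                 # hypothetical tokenizer API
    for _ in range(max_tokens):
        logits = model.next_token_logits(tokens)      # hypothetical model API
        allowed = set(grammar.allowed_tokens(tokens)) # hypothetical grammar API
        for t in range(len(logits)):
            if t not in allowed:
                logits[t] = float("-inf")             # mask grammar-breaking tokens
        next_token = max(range(len(logits)), key=lambda t: logits[t])
        tokens.append(next_token)
        if grammar.is_complete(tokens):
            break
    return tokenizer.decode(tokens)
```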
General consensus is that it's really slow, though I like the concept of Surreal. Our first, and extremely bare-bones, version of the graph DB was 1-2 orders of magnitude faster than Surreal (we haven't run benchmarks against Surreal recently, but I'll put them here when we're done).
Congratulations on the launch! This is a very exciting space, and it's great to see your take on it.
Running fair benchmarks, not benchmarketing, is a significant effort, and we recently put in that effort to make things as fair and transparent as possible across a range of databases.
We'd be very interested in seeing the benchmarks you'd run and how we compare :)
You can sacrifice many things for faster performance, such as security, consistency levels or referential integrity.
I'm genuinely curious to learn what design decisions you will make as you continue building the database. There are so many options, each with its pros and cons.
If you would like to have a chat where we can exchange ideas, happy to do that :)
That is just hearsay though. I'm interested myself now and will run some proper benchmarks.