Hacker News | romange's comments

We will be able to. It's just a matter of time and priorities.


I do not know much about RDMA. Our goal is to provide a memory store that is fully compatible with the Redis/Memcached protocols so that all the existing frameworks can work as before. I am not sure how RDMA fits this goal.


Thanks for providing your feedback. As the Redis Manifesto states, our goal is to fight against complexity. antirez - you are our inspiration, and I take your manifesto to heart.

Please allow the possibility that Redis can be improved and should be improved. Otherwise other systems will eventually take its market apart.

I appreciate your comments very much. I've written about you on my blog. I am an engineer, and I disagree with some of the design decisions that were made in Redis, so I decided to do something about it :) To your points:

1. DF provides full compatibility with single-node Redis while running on all cores, compared to Redis Cluster, which cannot provide multi-key operations across slots.
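As background on the cross-slot limitation: Redis Cluster assigns every key to one of 16384 slots via CRC16 (the XMODEM variant), and multi-key commands are rejected unless all keys map to the same slot; hash tags are the documented workaround. A small illustrative sketch of those published slot-hashing rules (Python, purely for illustration; Redis itself implements this in C):

```python
def crc16(data: bytes) -> int:
    # CRC16-CCITT (XMODEM): poly 0x1021, init 0, MSB-first,
    # the variant the Redis Cluster spec uses for key slots.
    crc = 0
    for b in data:
        crc ^= b << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) & 0xFFFF if crc & 0x8000 else (crc << 1) & 0xFFFF
    return crc

def key_slot(key: str) -> int:
    # Hash tags: if the key contains a non-empty "{...}" section,
    # only that substring is hashed, letting clients pin related
    # keys to the same slot.
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    return crc16(key.encode()) % 16384
```

With hash tags, `{user1000}.following` and `{user1000}.followers` land in the same slot, so a multi-key command on them succeeds even in cluster mode; without tags, it generally fails with a CROSSSLOT error.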

2. A much stronger point - we provide a much simpler system, since you do not need to manage k processes, you do not need to *provision* k capacities that are managed independently within each process, and you do not need to monitor those processes, load/save k snapshots, etc. Our snapshotting is point-in-time across all cores.

3. Due to pooling of resources, DF is more cost efficient. It's more versatile. We have a design partner that could reduce its costs by a factor of 3, just because they could use an x2gd machine with an extra-high memory configuration.

Regarding your note about memcached - while we provide performance similar to memcached's, our product proposition is nothing like memcached's; it is much closer to Redis. Having said that, I will add a comparison to memcached. I do believe that memcached is as performant as DF, because essentially it's just an epoll loop over multiple threads.

Re your comment about snapshotting: we also push the data into the serialization sink upon write, hence we do not need to aggregate changes until the snapshot completes. The complex part is to ensure that no key is written twice, and that we ensure with versioning. I do agree that there can be extreme cases where we need to duplicate memory usage for some entries, but only for the entries in flight - those that are being processed for serialization.

Update: re versioning and memory efficiency. We use DashTable, which is more memory efficient than Redis' dict. In addition, DashTable has a concept of a bucket that is comprised of multiple slots (14 in our implementation). We maintain a single 64-bit version per bucket and serialize all the entries in the bucket at once. Naturally, this reduces the overhead of keeping versions. Overall, for small-value workloads we are 30-40% more memory efficient than Redis.
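To illustrate the per-bucket versioning idea, here is a toy Python sketch under my own naming (not Dragonfly's actual C++ code): each bucket carries a version, starting a snapshot bumps a global snapshot version, and any write to a not-yet-serialized bucket first flushes that whole bucket to the sink, so every key reaches the snapshot exactly once at its point-in-time value:

```python
class Bucket:
    def __init__(self):
        self.version = 0
        self.slots = {}          # up to 14 key->value slots in the real design

class Store:
    def __init__(self, nbuckets=4):
        self.buckets = [Bucket() for _ in range(nbuckets)]
        self.snapshot_version = 0
        self.sink = []           # stand-in for the serialization sink

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def start_snapshot(self):
        self.snapshot_version += 1

    def _serialize(self, b):
        # Serialize the whole bucket at once, at most once per snapshot.
        if b.version < self.snapshot_version:
            self.sink.extend(b.slots.items())
            b.version = self.snapshot_version

    def set(self, key, value):
        b = self._bucket(key)
        self._serialize(b)       # flush the pre-image before mutating
        b.slots[key] = value

    def finish_snapshot(self):
        for b in self.buckets:   # flush any buckets never touched by writes
            self._serialize(b)
        return dict(self.sink)
```

A write during the snapshot only forces out the one bucket it touches, which is why extra memory is needed just for entries in flight, never for the whole keyspace.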


Thanks for the nice words romange.

The complexity here can be seen in two ways: the complexity of deploying more Redis instances, or the complexity of the single instance. It's a trade-off. But I think that Redis may go fully threaded sooner or later, and perhaps your project may accelerate the process (I'm no longer involved, just speculating).

1. Your point about Cluster, I addressed it many times: the point is, sooner or later, even with multi-threading, you are going to shard among N machines. So I believe that facing this problem ASAP is better and more "linear".

2. Already addressed in "1" and my premise.

3. Yep, there are advantages in certain use cases related to cloud costs and so forth; that's why maybe Redis will end up fully threaded as well.

About memory efficiency, what I meant is that to have versioned data structures (an approach that enables user-space copy-on-write even in the case of multiple changes to a large single key, the big sorted set example), you likely need more memory to augment the data structure. Otherwise the trick is to copy the whole value, which has other issues. It's a tradeoff.


> sooner or later, even with multi-threading, you are going to shard among N machines

In a world where cloud providers offer instances with terabytes of memory and 128 vCPUs (e.g. the AWS x2iedn.32xlarge family maxes out at 4TB, the GCP m2 family maxes out at 12TB), is that really inevitable? Applications serving tens of millions of users likely won't come anywhere close to that limitation.


It's an interesting comment, phamilton. I agree with antirez and I agree with you. Full disclosure - I am an ex-Googler. I believe that the "horizontal scale" movement started at Google. You could see it in their GFS and MapReduce papers from the early 2000s. And at that time they were completely right, of course. The genius of Jeff Dean and Sanjay Ghemawat put Google years ahead of the state of the art for more than a decade. There was not even one system developed at Google that is not horizontally scalable (I am omitting acquisitions here). We used to joke in 2009 that the most expensive server we had was our Perforce server. Nowadays it's an internally developed source control system backed by Bigtable. So of course antirez is right, of course! If you need infinite scale - you must go horizontal.

But the reality is that most companies and most use-cases do not need terabytes of data. I would say that today the comfort zone for Dragonfly is up to 512GB per instance (1). So Dragonfly solves the issue for... I would say 99% of the use-cases. Only the last percentile would need horizontal scale, and their business is probably already big enough that they can afford a high-quality eng team to work with horizontal clusters.

(1) We need to improve some things (mainly around the rdb serialization format) to reach the next order of magnitude, 4TB. Nobody wants to wait for days to load a 4TB snapshot.


512GB is probably good enough for 99%. 4TB is probably good enough for 99.9%.

Context for all this: I lead tech at Remind, where we support 30M MAU with around 1 engineer per million MAU. Years ago we focused heavily on horizontal scaling, including our datastores. We stored a few hundred GB in clustered Redis, dozens of TB in DynamoDB, a half dozen Postgres clusters, etc.

The past year we've reversed course and doubled down on AWS Aurora. Every time we've moved data into Aurora our stability has improved, our devs can move faster and our costs go down.

We've got an order of magnitude headroom in Aurora and frankly our code is far from well optimized. There's so much to be gained from simplicity.


I’m in nerd heaven.


As one of the folks that currently works on Redis, I want to highlight the "Redis can be improved and should be improved". There are a lot of really good ideas put forth that are likely worth consideration in the Redis project as well. There have been a lot of conversations about renewing multi-threading, especially to address the point of simplifying management and improving resource utilization.

Glad to see you guys made a lot of progress, although a little disappointing you chose to go down the path of building yet another source available DB and not contributing to open source.


I think that at this point in time, given the state of this 14-year-old project, the status quo can only be changed from the outside.

If Chrome had not been born, you would still be using Internet Explorer with ASPX sites.


I think that is a pretty baseless claim, even more so given that the project had a complete leadership change only 2 years ago. There is a lot of interest in revamping the internals of Redis while still trying to maintain the original tenets that established Redis. There isn't much justification for saying it could only be changed from the outside when, AFAIK, you didn't try to engage with the system at all.


Believe me, I kinda tried to engage with the system. Maybe a bit different system ;)

I based my claim on my personal experience, and it's not related to Redis specifically. People are people. They become defensive when something new tries to replace old views. It usually does not work when you have no leverage and challenge the status quo within a system that has no good reason to change, i.e. lose money and margins due to increased efficiency of the underlying infrastructure. It's a classic innovator's dilemma. Imho...


I just want to say that this comment alone has put me off looking into Dragonflydb.


Sorry to hear that. But let me quote antirez's response from this topic: "...But I think that Redis may go fully threaded sooner or later, and perhaps your project may accelerate the process". Let's at least agree that if Dragonfly pushes Redis to evolve, it's a good thing.


Can someone/outsider realistically show up and start working on making redis multi threaded?


Yes, the core folks on Redis outlined a plan for moving towards multi-threading a while ago. It honestly has not made a lot of progress, because raw performance matters a lot less than is argued here. As antirez already succinctly mentioned, Redis scales comfortably to 100s of millions of QPS with cluster mode. So it's really about building a lot of custom functionality to support better vertical scaling. Which is useful, especially when vertical scaling keeps you on a single process.

The conversation happened here, https://github.com/redis/redis/issues/8340, and it's not like it's the most pressing issue for the project. It's also not as complex as what was implemented for Dragonfly, which basically has native support from the ground up for concurrent programming during command execution. It would be hard to do in C as well.


> raw performance matters a lot less than is argued here

It matters a lot where I work; we believe the lack of multithreading is holding Redis performance back.

> Redis scales comfortably to 100s of millions of QPS with cluster mode.

Is there somewhere I can read more about that? Curious about the server requirements to do that. I work with key/value clusters a little larger than that; if possible, it would be cool to use Redis for it.


> It matters a lot where I work; we believe the lack of multithreading is holding Redis performance back.

Do you run stand alone Redis or Redis cluster? Multithreading is a strategy, but it isn't the only strategy to improving raw performance.

> Is there somewhere I can read more about that? Curious about the server requirements to do that. I work with key/value clusters a little larger than that; if possible, it would be cool to use Redis for it.

Redis scales to about ~1000 nodes, with each node supporting about 200k qps; the math gets you to around 200m qps as the practical upper bound for Redis these days. Multithreading would potentially help us push clusters to the billion-RPS boundary.
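A quick sanity check of that arithmetic (the per-node and cluster-size figures are the commenter's estimates, not measured limits):

```python
nodes = 1_000                      # practical upper bound on cluster size
qps_per_node = 200_000             # rough per-node throughput estimate
cluster_qps = nodes * qps_per_node
assert cluster_qps == 200_000_000  # "around 200m qps"
```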


Do you actually have a customer that runs a cluster of 1000 nodes? No need to answer this question. I can guess the answer.


@tayo42 - we would love to engage and see how Dragonfly can fit the use-cases of your company.


Hey sure don't mind chatting. How should I reach out?


@tayo42 roman@dragonflydb.io


https://github.com/danielealbano/cachegrand :)

I am not going to get into what's better and what's not, especially because I haven't released v0.1 yet and therefore it's not usable, but I am working on cachegrand, which aims to be (also) a Redis-compatible platform.

I have done A LOT of research and development before settling on the current architecture (you can see it from the amount of commits) and I am trying to test as much as possible (almost 1000 unit tests so far, but there is still plenty to do).

If you look at the repository, please bear in mind that:

- there is no v0.1; the code available in the repo only supports the basic GET, SET and DELETE (apart from a few additional commands like HELLO, QUIT, PING)

- the code in main currently supports only storing the data on disk, which is also why the tests are failing; I am doing some general refactoring and need to bring back the in-memory storage (issue n. 88)

- there are some general performance metrics available on the repo

- don't enable verbose logging, it's currently synchronous :)

- cachegrand is able to fully run in single-thread mode so I can actually compare it to Redis (well, when it will make sense)

- only Linux; it requires at least kernel 5.8 (e.g. it's provided by Ubuntu 20.04.2 LTS, but I didn't really care too much, as it will take quite a bit longer before I get the first stable version, and by that point the kernel requirement will not be an issue anymore)

What I can say is that the project really focuses ONLY on performance, therefore it is not as memory-savvy as Redis or similar platforms, and it actually aims more to compete with Redis Enterprise long term than just Redis. On the other hand, it implements a number of things from the ground up to massively boost performance:

- cachegrand's architecture almost follows the shared-nothing principle, with the only exception being the hashtable, because it has been built around that need

- I implemented from the ground up a hashtable capable of delivering lock-free and wait-free GET operations, which uses localized spinlocks for the SET and GET operations; basically, the contention is spread across the hashtable instead of being bound to X queues

- the hashtable also supports SIMD operations (AVX, AVX2 and AVX512F); it's heavily optimized to reduce memory accesses and is able to embed short strings in the bucket to further reduce memory accesses

- cachegrand will support both in-memory and ad-hoc storage backends; the latter is going to be basically a time-series database (cachegrand is not bound to Redis functionality, the Redis command set is just a way to expose these for now)

- I implemented from scratch a fiber library able to do a context switch in just 7ns

- the network and storage backends are modular; currently they really only support io_uring, but the goal is to also add XDP + the FreeBSD network stack for networking (e.g. similar to what has been done with F-Stack and DPDK) and then io_uring with NVMe passthrough for storage (not sure if I will also add support for SPDK)

- I have also implemented an ad hoc memory allocator which wastes some memory but is able to do memory allocations and frees in O(1) (here's a nice chart: https://www.linkedin.com/posts/danielesalvatorealbano_dublin...)

- most of the code is built aiming to be zero-copy (there are a few places where copying happens right now, as I need to fix a couple of things)

Just to underline it: currently it's not possible to play with it until I merge the branch I am working on, because performance would be terrible (only on-disk storage, and currently without caching); the tests are broken for the same reason.
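The "localized locks" idea from the hashtable bullet above can be sketched as lock striping: one lock per bucket rather than one global lock, so writers to different buckets never contend. This is a toy Python sketch with my own naming (cachegrand itself is C, and its lock-free GET path is something a sketch like this does not capture):

```python
import threading

class StripedHashTable:
    def __init__(self, nbuckets=64):
        # One lock per bucket: contention is spread across the table
        # instead of serializing every operation on a single lock.
        self.locks = [threading.Lock() for _ in range(nbuckets)]
        self.buckets = [{} for _ in range(nbuckets)]

    def _index(self, key):
        return hash(key) % len(self.buckets)

    def set(self, key, value):
        i = self._index(key)
        with self.locks[i]:      # only this bucket is locked
            self.buckets[i][key] = value

    def get(self, key, default=None):
        i = self._index(key)
        with self.locks[i]:
            return self.buckets[i].get(key, default)
```

With 64 buckets, two threads writing different keys will, most of the time, take different locks and proceed fully in parallel.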


It seems to me you didn't address the main point from the parent: did you benchmark your multithreaded implementation vs a single-core Redis? Never mind the amazing advantages that having to spawn 1 process vs N brings; the question is how your software compares when Redis is used as intended.


I benchmarked DF vs single core Redis. If there is a constructive suggestion for a different benchmark that compares similar product propositions I will happily oblige and do that. i.e. what do you mean by using Redis as intended?


So two options I am curious about:

- A normal configuration partitioning a single large node into multiple instances using Redis Cluster

- A cost equivalent cluster of machines with a similar memory size running on Redis cluster


1. I think this is how Redis the company designed their enterprise solution. You can find architecture documentation on their site.

2. Based on my knowledge, it should be more or less equivalent. The reason they put it on the same machine (I am guessing here) is that shards on the same node sit behind their Redis proxy, which kinda hides the complexity of connecting to each node separately. It's like a gateway to that machine and all of its Redis processes.


Ok, you got us. We chose dragonflydb and not dragonflystore just because the former rolls off the tongue better :)

Having said that, we carefully chose to write everywhere in the docs that we are an in-memory store (and not a database).

Btw, I reserve full rights to provide full durability guarantees for DF and to claim the database title in the future.


dragonflycache sounds reasonable.


If I were to choose another language, it would be Rust. Why didn't I choose Rust?

1. I speak C++ fluently, and learning Rust would take me years.

2. There is a food chain of libraries that I am intimately familiar with in C++ and not in Rust. Take Rust's Tokio, for example. It is the de facto standard for how to build I/O backends. However, if you benchmark Tokio's mini-redis with memtier_benchmark, you will see it has much lower throughput than helio and much higher latency (at least this is what I observed a year ago). Tokio is a combination of myriad design decisions that the authors of the framework had to make to serve the mainstream of use-cases. helio is opinionated. DF is opinionated. Shared-nothing architecture is not for everyone, but if you master it, it's invincible. (And yeah, there is zero chance I could write something like helio in Rust...)
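For readers unfamiliar with the term: in a shared-nothing design, each thread exclusively owns a partition of the data, and other threads reach it by passing messages rather than by locking shared structures. A toy Python sketch of that idea (my own naming, not helio's API; real implementations use lock-free queues per core rather than `queue.Queue`):

```python
import queue
import threading

class ShardedStore:
    def __init__(self, nshards=4):
        self.queues = [queue.Queue() for _ in range(nshards)]
        self.shards = [{} for _ in range(nshards)]
        for i in range(nshards):
            threading.Thread(target=self._run, args=(i,), daemon=True).start()

    def _run(self, i):
        shard = self.shards[i]   # only thread i ever touches shard i: no locks
        while True:
            op, key, value, reply = self.queues[i].get()
            if op == "set":
                shard[key] = value
                reply.put(None)
            elif op == "get":
                reply.put(shard.get(key))

    def _dispatch(self, op, key, value=None):
        # Route the request to the shard that owns the key and wait for it.
        reply = queue.Queue(maxsize=1)
        self.queues[hash(key) % len(self.queues)].put((op, key, value, reply))
        return reply.get()

    def set(self, key, value):
        self._dispatch("set", key, value)

    def get(self, key):
        return self._dispatch("get", key)
```

Since a key is always handled by exactly one thread, the data structures themselves need no synchronization; all coordination lives in the message queues.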


Tokio is not a shared-nothing model - you’d be looking at Glommio [1] (from one of the contributors to Seastar) for that.

[1]: https://github.com/DataDog/glommio


And no, it's not faster just because of io_uring. It's also because it's multi-threaded, has an absolutely different hashtable design, uses a different memory allocator, and for many other reasons (i.e. design decisions we took along the way).


We use io_uring for everything: network and disk. Each thread maintains its own polling loop that dispatches completions for I/O events, schedules fibers, etc. Everything is done via the io_uring API. All socket writes are done via a ring buffer, etc. If you run strace on DF, you won't see almost any system calls besides io_uring_enter.


Yes, we use io_uring for networking and for disk. io_uring provides a unified interface on linux to poll for all I/O events. Re disk - we use it for storing snapshots. We will use it for writing WALs.

And we have more plans for using io_uring in DF in the future.


Does the fact that we based our core hashtable implementation on a paper from 2020 not justify it?


I think the key takeaway here is that people are allergic to that word.


Not really, it just implies that the competition is not modern, without qualification. I think asking for qualification in this case is fair if we are to conclude that Redis and Memcached have aged to the point of needing a replacement.

Modern is used here as a selling point.


The word is often tagged on anything new somebody tries to sell. Better to be specific. The problem is that most “modern” things are very old things sold as new ideas. Cause biz. So, nothing specific against this proj.


I do not even know how to do that comparison. Redis and DF share the same protocol and the same API, so it's easy to compare with memtier_benchmark.

