Hacker News: eventreduce's comments

The biggest use case for EventReduce is realtime applications. Most technologies for these, like Firebase, AWS AppSync etc., work on non-relational data. If you want to use EventReduce with relational queries, you have to make them non-relational first, for example by using materialized views. If you do not want to do that, you should not use this algorithm with its current feature set.


My guess would be that if you're at a scale where you're thinking about these sorts of things, you are also at a scale where you're running on multiple machines. How does EventReduce share writes across the cluster?


EventReduce is an algorithm, not a database wrapper. It does not care about your writes or whether your database layer is a cluster, and so it does not affect them.


Sorry I wasn't clear in my original post.

I'm thinking about the application layer. If you have an application that writes data to a table, it's typical to run multiple instances of that application to support scale and reliability requirements.

If I send a write to one instance, how does it communicate and synchronise that write with the other application instances?

I ask because this can be a tricky thing to do, especially when consensus is required: consensus algorithms such as Raft/Paxos require a number of network round trips, which introduces latency, and in some cases accounts for much of the latency in the database examples given.


EventReduce is a simple algorithm. It does not care about or affect how you handle the propagation of writes, or how you handle your events, transactions or conflicts.

See it as a simple function that can do oldResults + event = newResults, as shown in the big image at the top of the readme.
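To make that idea concrete, here is a minimal sketch of such a function. The names (`applyEvent`, `event.op`, `matchesQuery`) are illustrative assumptions, not EventReduce's real API:

```javascript
// Hypothetical sketch of the oldResults + event -> newResults idea.
// Names here (applyEvent, op, matchesQuery) are illustrative only.
function applyEvent(oldResults, event) {
  switch (event.op) {
    case 'INSERT':
      // only add the document if it matches the query
      return event.matchesQuery ? oldResults.concat(event.doc) : oldResults;
    case 'DELETE':
      return oldResults.filter(d => d.id !== event.doc.id);
    case 'UPDATE':
      return oldResults.map(d => (d.id === event.doc.id ? event.doc : d));
    default:
      return oldResults;
  }
}

const oldResults = [{ id: 1, name: 'alice' }];
const newResults = applyEvent(oldResults, {
  op: 'INSERT',
  matchesQuery: true,
  doc: { id: 2, name: 'bob' }
});
// newResults now holds both documents; oldResults is untouched
```

The point is that this is a pure function over the previous results and one event; no database access happens inside it.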


This means then that if you run multiple application servers, which most do, that you’ll need to implement a data distribution mechanism of some sort.

I must admit, with limitations like this I’m struggling to figure out the use cases for this.

Edit: so I guess this is easier using the change subscriptions you mention in other comments. That does mean many subscribers, but hopefully that’s minimal load. This has the trade-off that it’s now eventually consistent, but I suppose that’s not a problem for many high read applications.

I’m still feeling like this could be solved in a simpler way with just simple data structures and a pub/sub mechanism. Now that I think of it, we do a similar thing with Redis for one service, and a custom Python server/pipeline in another, but we’ve never felt the need for this sort of thing.

Do you have more details about specific applications/use cases, and why this is better than alternatives?


I think the best example of why this is useful is described by David Glasser in his talk about the oplog driver used in Meteor.js: https://www.youtube.com/watch?v=_dzX_LEbZyI


Thank you for the clarification.


This is a great question. EventReduce is only the algorithm that calculates your new results. The parsing of SQL is not done by it; you have to bring that yourself. This works by providing some information about the query, like sort fields and query matchers. This is described well in the JavaScript implementation [1]. Providing these functions is easy for NoSQL queries because they are better composable. For SQL queries you have to do some work before you can use EventReduce.

Also see the limitations of EventReduce in the readme.

[1] https://github.com/pubkey/event-reduce/tree/master/javascrip...
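Roughly, the query metadata looks something like the following sketch. The field names (`queryMatcher`, `sortComparator` etc.) are assumptions for illustration; check the linked repo for the exact interface:

```javascript
// Illustrative shape of the query metadata an EventReduce-style
// library needs. Field names are assumptions, not the exact API.
const queryParams = {
  primaryKey: 'id',
  skip: 0,
  limit: 10,
  // predicate: does a single document match the query?
  queryMatcher: doc => doc.age > 18,
  // deterministic comparator: defines the result order
  sortComparator: (a, b) => (a.age - b.age) || (a.id < b.id ? -1 : 1)
};

// For SQL you would have to derive these functions from the parsed
// statement yourself, e.g. WHERE age > 18 ORDER BY age, id LIMIT 10.
const docs = [
  { id: 'a', age: 30 },
  { id: 'b', age: 12 },
  { id: 'c', age: 25 }
];
const results = docs
  .filter(queryParams.queryMatcher)
  .sort(queryParams.sortComparator)
  .slice(queryParams.skip, queryParams.skip + queryParams.limit);
// results: [{ id: 'c', age: 25 }, { id: 'a', age: 30 }]
```

For NoSQL query languages these pieces fall out of the query object almost directly, which is why they compose more easily than parsed SQL.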


There is a big difference between a change stream and a realtime query. For example, MongoDB's cursor stream is a good way to observe the events that happen to a specific collection or to documents that match some criteria. But if you want the realtime results of a query that has sorting, skip, limit etc., then it is really hard to wrap the change stream into this. In fact, this is exactly what EventReduce could do for you.

For more information about the difference I recommend the video "Real-Time Databases Explained: Why Meteor, RethinkDB, Parse & Firebase Don't Scale" https://www.youtube.com/watch?v=HiQgQ88AdYo&t=1703s
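To make the difficulty concrete, here is a rough sketch (hypothetical names, not EventReduce's API) of the naive approach: every event forces a full requery, because the raw event alone does not tell you the new result set of a sorted, limited query.

```javascript
// Naive "realtime query" on top of a change stream: after every
// event the whole query is re-run, because e.g. a delete may pull a
// previously-excluded row into the LIMIT window. This full requery
// is the expensive step an incremental algorithm tries to avoid.
function naiveRealtimeQuery(allDocs, events, query) {
  let results = [];
  for (const event of events) {
    if (event.op === 'INSERT') allDocs = allDocs.concat(event.doc);
    if (event.op === 'DELETE') allDocs = allDocs.filter(d => d.id !== event.doc.id);
    results = allDocs
      .filter(query.matcher)
      .sort(query.comparator)
      .slice(0, query.limit);
  }
  return results;
}

const query = {
  matcher: d => d.score > 0,
  comparator: (a, b) => b.score - a.score, // highest score first
  limit: 2
};
const results = naiveRealtimeQuery(
  [{ id: 1, score: 5 }, { id: 2, score: 3 }, { id: 3, score: 1 }],
  [{ op: 'DELETE', doc: { id: 1 } }],
  query
);
// deleting the top row promoted id 3 into the limit-2 window
```

Note that the delete of id 1 changed the results for a document that was never in them before; that is the kind of case a plain change-stream subscription cannot answer without requerying.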


>> There is a big difference between a change stream and a realtime query. For example, MongoDB's cursor stream is a good way to observe the events that happen to a specific collection or to documents that match some criteria. But if you want the realtime results of a query that has sorting, skip, limit etc., then it is really hard to wrap the change stream into this.

Have you actually tried it from the mongo shell or any MongoDB client driver?

In the official documentation link which I shared, it is clearly mentioned that it supports the aggregation pipeline. Any operator compatible with the aggregation pipeline framework, including "$sort" and "$skip", can be used. You can also use the JOIN-like operators "$lookup" or "$graphLookup".

See this link for info https://docs.mongodb.com/manual/core/aggregation-pipeline-op...


Yes, I have used it. I actually know it really well. I also did performance comparisons with MongoDB and MongoDB's change streams and cursors. What I posted here is just an algorithm. You could now compare it to MongoDB (a product) and say it is a "more flexible solution", but I do not see the point in directly comparing them simply based on the documentation of both.


>> Yes, I have used it. I actually know it really well. I also did performance comparisons with MongoDB and MongoDB's change streams and cursors.

Can you share a link to the code, and to the data in the database you are querying against, to back up your claim?


No, and I also do not want to "claim" anything. Feel free to do your own tests.


There is a big difference between a database with an event stream and a 'realtime query' that can be created with EventReduce.


What is that difference?


I recommend the video "Real-Time Databases Explained: Why Meteor, RethinkDB, Parse & Firebase Don't Scale" https://www.youtube.com/watch?v=HiQgQ88AdYo


That does not answer what differentiates your solution. I work on streaming systems. I am aware of the spectrum of online, latency-aware data processing. But from what I can tell, in your solution the changes are coming from the database itself. Since, as I understand it, the database is still the source of all data, I don’t see why your solution is any faster than continuous queries in a database.


Yes, this is correct. The performance benefit comes from doing all of this on the CPU instead of using disk I/O. Also, the internal binary decision diagram of EventReduce is optimized to run less logic than a query would. This makes it even faster than running the query again against an in-memory database.


And the main cost of this (questionable, IMO) benefit is losing consistency: you miss any change to the DB that does not come from the calling app. You haven't mentioned this cost anywhere.


The writes are not tunneled through this algorithm somehow. You still use the database like you normally would. So the consistency is not affected.

Also this is an open source project, not something I want to sell you. Feel free to make a PR/issue with any open topics that are not mentioned in the docs.


>The writes are not tunneled somehow through this algorithm

Then I fail to understand how it works. How does EventReduce become aware of these "write events"?

>this is an open source project, not something I want to sell you

You made it open source so others can use it, right? They had better be making an informed decision about whether your solution suits their needs.


You have to provide the events yourself. See EventReduce as a simple function that can do oldResults + event = newResults.

And yes, you should always test open source stuff before you use it. There is no warranty; use it at your own risk.


OK, so you don't "tunnel writes through" EventReduce, you "tee" them to EventReduce.

Anyway, to maintain consistency, you have to limit yourself to one process of your app. No sharding, load balancing etc. This is a significant limitation, and it's not obvious. I encourage you to mention it in the README.md.


I encourage you to read the readme and check out the demo. EventReduce is not something that magically drills into your database and affects the consistency of your write accesses.

It is a simple algorithm that is implemented as a function with two inputs and one output.


> How does EventReduce become aware of these "write events"?

Some DBs expose an event stream, for example, PG:

https://www.postgresql.org/docs/current/logicaldecoding-expl...


> For the different implementations in common browser databases, we can observe an up to 12 times faster displaying of new query results after a write occurred.

Is this intended to be an optimisation on top of localStorage and so on? If so, at least you don't have to worry about multiple writers.


localStorage is not a database. Check out the demo page.


I do not think they have much in common. Lambda is used for stream processing of large amounts of data. EventReduce is used for optimizing the latency of many (repeated) queries.


No, see the FAQ in the readme:

Materialized views solve a similar problem, but in a different way with different trade-offs. When you have many users, all subscribing to different queries, you cannot create that many views, because they are all recalculated on each write access to the database. EventReduce however has better scalability, because it does not affect write performance, and the calculation is done when the fresh query results are requested, not beforehand.


Others have pointed this out already: this is wholly dependent on the RDBMS used, and Oracle offers incremental refresh on demand. I have to admit, though, that I mistakenly thought that MSSQL would as well...


1. No, I do not have a paper. I thought a lot about publishing a paper first but then decided against it, because I think that good code, tests and demos are more valuable.

2. EventReduce is mostly useful for realtime applications. I myself use it in a NoSQL database (RxDB). There you stream data and events, and a single document write is the most atomic 'transaction' you can do. If you need transactional serial writes and reads that depend on each other, you would not use EventReduce for that.

3. EventReduce is just the algorithm that merges oldResults + event. It assumes that you feed in the events in the correct order. Mostly it is meant to be used with databases that provide a change stream where you can be sure that the order is correct.

4. Sort order matters because EventReduce promises to always return the same results as a fresh query against the database would have returned. When the sort order is not deterministic, the rows returned from a query depend on how the data is stored in the database. That order cannot be predicted by EventReduce, which means it would then return a wrong result set.

PS: BDDs are awesome :)
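Point 4 is easy to demonstrate: two rows that compare equal under the query's sort can legitimately come back in either order, unless you add a tiebreaker such as the primary key. A small sketch:

```javascript
// Why sort order must be deterministic (sketch).
// Two docs with equal age: without a tiebreaker the database is
// free to return them in either order, so an incremental algorithm
// cannot predict which one a fresh query would put first.
const docs = [
  { id: 'b', age: 20 },
  { id: 'a', age: 20 }
];

const ambiguous = [...docs].sort((x, y) => x.age - y.age);
// relative order of 'a' and 'b' depends on the sort implementation
// and on how the rows happen to be stored

// adding the primary key as a tiebreaker makes the order predictable
const deterministic = [...docs].sort(
  (x, y) => (x.age - y.age) || x.id.localeCompare(y.id)
);
// deterministic: always [{ id: 'a', ... }, { id: 'b', ... }]
```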


It would probably be a good idea to write a paper at some point; it's simply easier to read a document explaining the algorithm with some pseudocode than to dig through an actual codebase with all the messy language details in between the parts that actually matter.


I understand that reading the plain source code is more painful than reading a paper.

There are many different trade-offs between a paper and the current repository with source code. For me, the biggest argument was that EventReduce is a performance optimization. So to be sure that it really works and is faster, you always need an implementation, since you cannot predict the performance from a paper.

Because I did not have time for both, I only created the repository with the implementation. Maybe a paper will be published afterwards.


The value of a paper is the peer review by experts.


So if I understand you, it's for append-only data: probably no updates, definitely no deletions? I still don't get how you don't need logical clocks to pick out the delta(s), but thanks for your prompt answer.

Edit: your example uses replaceExisting(), so that supports an update of some kind.


No, it is explicitly not for append-only data. It works with inserts, updates and deletes. I think I have trouble understanding what exactly you mean by the need for a logical clock. The algorithm is fed the old query results plus one event, and then returns the new query results. Since there is only one event at each point in time, it does not have to order or maintain them.
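Updates are the interesting case, because an update can move a document into or out of the result set. A sketch of the four cases (illustrative names, not the real API):

```javascript
// Sketch: applying an UPDATE event to cached query results.
// Four cases, depending on whether the doc was in the results
// before and whether it matches the query after the update.
function applyUpdate(oldResults, updatedDoc, matchesQuery) {
  const wasInResults = oldResults.some(d => d.id === updatedDoc.id);
  if (wasInResults && matchesQuery)          // update in place
    return oldResults.map(d => (d.id === updatedDoc.id ? updatedDoc : d));
  if (wasInResults && !matchesQuery)         // doc left the result set
    return oldResults.filter(d => d.id !== updatedDoc.id);
  if (!wasInResults && matchesQuery)         // doc entered the result set
    return oldResults.concat(updatedDoc);
  return oldResults;                         // never matched, still does not
}

// query: age > 18 -- an update drops the doc below the threshold
const matcher = d => d.age > 18;
const updated = { id: 1, age: 15 };
const newResults = applyUpdate([{ id: 1, age: 20 }], updated, matcher(updated));
// newResults: [] -- the document left the result set
```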


I had the same question as #2. Basically, it has to be the front end to any event that reads/writes the data, in strict order of occurrence?


Not exactly. To use EventReduce you must have a change stream of all writes to your data, in the correct order. You can get that by wrapping a frontend around your database.

But it is easier to use a database that already provides a change stream, like CouchDB, Postgres, MongoDB and so on.


The BDDs you are using: are they zero-suppressed decision diagrams, or was it not necessary to do these kinds of optimizations?


Yes, the BDD is minimized with the two rules (reduction and elimination). Also, the ordering of the boolean functions is optimized via plain brute force.

There was no good JavaScript implementation of BDDs, so I had to create my own: https://github.com/pubkey/binary-decision-diagram


Cool, I've checked your code and it's not zero-suppressed BDDs, although ZDDs might not have been a performance gain in your case. (Zero-suppressed BDDs are very good at representing sets of permutations/combinations etc., but as far as I understand you have all the possible permutations encoded in the BDD, not a subset of all possible permutations.)


Thanks for pointing that out. I had never heard of materialize.io, but I will dive into it. At first glance it looks like Materialize is more of a full product, while EventReduce is just something that you use on top of your existing solutions.


I do not think so. If you check the example schema, it is very hard to understand. What does ':lat' mean? Is it a string or a number? What does '!b' mean? Can it only be false?


If you read the whole example:

lat = custom validator

b = shorthand for boolean (as is s for string)

! = optional

So for me, after reading the whole example, it's very easy to understand and digest.


Yes, reading the docs explains what these keywords mean. But by just looking at the schema it is impossible to understand what it is. And if you check the definition of clean code, which is something like "intuitively understandable", then it becomes clear that this is not clean.


Ah I see where you’re coming from. Good point.


The “!” for optional is the only thing that confused me. Why not “?” for optional?


Updated to ? in 1.0.3


Nice, that makes more sense!

