Hacker News | zardosht's comments

Another thing to keep in mind is that not all systems are meant to be CP systems.


Another engineer at Tokutek here. As you can see, we are up to 2.4, and have been investigating 2.6 and geo. We prioritize and address all possible features, whether they come from MongoDB 2.6 or are our own innovations like partitioned collections, based on customer and user feedback.

Also, 2.6 is not an all-or-nothing proposition that needs to be done in one release. Features with the most demand (whether that's the new write commands or aggregation framework improvements) will be done before others. We've done this before: when we released 1.0, which was based on 2.2, we also shipped hash-based sharding with it, a 2.4 feature, because users demanded it.

As for pushing bug fixes upstream, we file bugs when we see them. Our VP of engineering was a winner in the MongoDB 2.6 bug hunt with SERVER-12878. SERVER-9848 and SERVER-14382 are among the bugs I've filed.


Thanks for the response. I read a post on the mongo-user group, and that's what I noticed: a number of features are ported as and when necessary. Don't read what I say too negatively; I'm mostly curious. My impression is that the little we (I) get exposed to regarding TokuMX specifically presents it as superior to Mongo, as a "choose us or lose out" thing. But that happens when one doesn't follow a topic closely and only sees it mentioned here and there (understandable, since Mongo has been the subject of "my start-up failed, and I blame it on Mongo; so burn Mongo" kinds of discussions).

One more question if you don't mind: MongoDB will support various storage engines from 2.8, including Tokutek's storage engine (I can't remember its name). Notwithstanding other innovations in TokuMX, would switching from mmap to Tokutek's storage engine mean that one ends up with Mongo having geo-indices and other bells and whistles, while also getting TokuMX's main feature?


Your last question is a bit loaded with a bunch of "ifs", so let's unwind it. I don't know what MongoDB will "support" as far as other engines go. But assuming we, Tokutek, release something we support that is our engine plugged into 2.8 via MongoDB's storage engine plugin, then, according to the design we heard about at MongoDB World, that product will be what you think it is: Mongo with geo and "other bells", plus TokuMX's compression and write performance.

But 2.8 is still a ways off, and the storage engine API is a very fresh development. I don't think anyone is in a position to really guarantee what it will look like or how TokuFT (https://github.com/Tokutek/ft-index/) will plug into it. I definitely cannot make any promises.

If you are interested in TokuMX + some missing features from MongoDB (sounds like geo), and don't mind discussing your needs and use cases with our sales guys, please give us feedback at http://www.tokutek.com/contact/. As I mentioned previously, user feedback drives what we do, so at the very least, you can provide some additional data points.


We didn't need GEO indexing but what Toku does offer is pretty exciting. Primary wins for us include multi-query transactions, compression, fractal tree indexes (thus overall insert and query performance), and clustering indexes.


Not for single-server performance. The database-level lock severely limits MongoDB's single-server performance. Just look up the sysbench benchmark comparing MongoDB with TokuMX (which I work on).


TokuMX, which I work on, has document level locking and compression right now.


Trying amisaserver; found it somewhere in the comments below. It claims to have MVCC.


sweet then, will give tokumx a shot as well. Thanks


TokuMX does have MVCC.


(I work for Tokutek)

Write concurrency: yes, TokuMX does not have a database-level reader/writer lock.

Index Building: yes, fractal trees can write data much more efficiently, so if index building is a problem, I bet TokuMX solves it.

Practically reducing file size: honestly, I am not sure, because thanks to our compression this has not been a general issue for our users. Our reindex command could reduce file size, but I cannot point to examples.

One of our big goals is to address storage issues MongoDB has.
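For readers unfamiliar with why fractal trees build indexes so much faster, here is a toy sketch (not TokuFT's actual code, and all names are illustrative) of the underlying idea: writes land in a cheap in-memory buffer and are merged into the expensive sorted structure in large batches, so each insert pays only an amortized share of the I/O.

```python
import bisect

class BufferedIndex:
    """Toy model of fractal-tree-style write buffering."""

    def __init__(self, buffer_limit=1024):
        self.buffer = []          # unsorted recent writes (cheap appends)
        self.sorted_keys = []     # stand-in for the on-disk sorted structure
        self.buffer_limit = buffer_limit
        self.flushes = 0          # counts "expensive" batch merges

    def insert(self, key):
        self.buffer.append(key)
        if len(self.buffer) >= self.buffer_limit:
            self._flush()

    def _flush(self):
        # one batch merge instead of one random write per key
        self.sorted_keys = sorted(self.sorted_keys + self.buffer)
        self.buffer = []
        self.flushes += 1

    def contains(self, key):
        if key in self.buffer:
            return True
        i = bisect.bisect_left(self.sorted_keys, key)
        return i < len(self.sorted_keys) and self.sorted_keys[i] == key
```

With a buffer of 1024 entries, 2500 inserts trigger only two batch merges; a B-tree-style structure would instead pay a search-and-write per key.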


ddorian, can you elaborate on what that means?


What I mean is: every node is the same, with no mongos. You just connect to one random mongod and it handles the mongos functionality.

So if you grow, you add one node, not a replica set (which could be three nodes if you have 3x replication).


Unfortunately, this would break compatibility with existing MongoDB applications more than we would probably be willing to do. However, there's no reason RethinkDB couldn't use Fractal Tree indexing instead of B-trees, given some engineering effort.


But RethinkDB doesn't have range sharding (they had it, but changed it to random(id)), and there's also no sharding on a custom field.


Roger,

I work at Tokutek (and wrote the post above). I'm sorry you ran into issues trying out TokuMX. I assure you, we are "ready", as we have users running in production.

Nevertheless, you ran into problems and that is unfortunate. If you have details, can you please share them with the tokumx-user google group? We might be able to help. I suspect the transition to using a transactional system like TokuMX where entire statements are transactional is resulting in some "gotchas", but that is just an educated guess.

-Zardosht


I mean ready in the sense that pointing code that worked flawlessly against MongoDB at TokuMX just works flawlessly too.

I uninstalled Toku and went back to MongoDB so I can't provide any further testing. (The mongorestore takes days.)

I can tell you what code was running at the time. It reads events sorted by user id and timestamp, and then discovers session boundaries in that. A new session object (in a different collection) is written out with all the events as a subdocument list. (In rarer cases an existing session object is updated.) This was happening in 8 separate processes, all in Python/pymongo. There are no statements running that affect more than one document, nor any need for transactions.
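For concreteness, the pass described above might look something like this hypothetical sketch (the gap threshold, field names, and structure are my assumptions, not the original code): walk events sorted by (user_id, timestamp), split wherever the user changes or the gap exceeds a timeout, and emit one session document with the events embedded as a list.

```python
from datetime import timedelta

SESSION_GAP = timedelta(minutes=30)  # assumed session-boundary timeout

def sessionize(events):
    """events: iterable of dicts with 'user_id' and 'ts' keys,
    sorted by (user_id, ts). Yields one session doc per boundary."""
    current = []
    for ev in events:
        if current and (ev["user_id"] != current[-1]["user_id"]
                        or ev["ts"] - current[-1]["ts"] > SESSION_GAP):
            yield {"user_id": current[0]["user_id"],
                   "start": current[0]["ts"],
                   "end": current[-1]["ts"],
                   "events": current}
            current = []
        current.append(ev)
    if current:  # flush the final open session
        yield {"user_id": current[0]["user_id"],
               "start": current[0]["ts"],
               "end": current[-1]["ts"],
               "events": current}
```

Each yielded document would then be inserted into the sessions collection; with 8 processes doing this concurrently, every such insert becomes its own implicit transaction under TokuMX.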


If you were using upserts I expect you were having problems due to the optimizer retrying all possible plans (including table scan) periodically. This is reflected in https://github.com/Tokutek/mongo/issues/796 and is fixed in 1.4.0. If you'd like to try another evaluation, get in touch with us and we can help you track down whatever problems you see.

Not all MongoDB code will use TokuMX optimally without changes. Concurrency is hard, and MongoDB encourages some patterns that are bad for any concurrent database. For example, count() for an entire collection is not, and could never be, as cheap in a concurrent database like TokuMX as it is in MongoDB.
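The reason, in a toy illustration (this is the general MVCC idea, not TokuMX internals): under multi-version concurrency control each document version carries creation/deletion timestamps, so a reader must check visibility per version rather than return a single stored total.

```python
def visible_count(versions, reader_ts):
    """versions: list of (created_ts, deleted_ts-or-None) pairs.
    Counts documents visible to a snapshot taken at reader_ts."""
    return sum(
        1
        for created, deleted in versions
        if created <= reader_ts and (deleted is None or deleted > reader_ts)
    )

# A reader whose snapshot is ts=5 does not see a row deleted at ts=4,
# nor a row created at ts=7, even though both still exist in storage.
```

A single-writer database can keep one authoritative counter; an MVCC one cannot, because different concurrent readers legitimately see different counts.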


Thanks for the offer, but the mongorestore times (against MongoDB) being over a week makes this too risky.

The code making changes was mostly inserts with a few upserts, but the latter were by _id. My hypothesis is that TokuMX adds implicit transactions, that there are some arbitrary restrictions around those transactions (e.g. how many can be outstanding at once, timeouts in lock acquisition), and that after a few hours one of those was hit. The error message was something about being unable to start a transaction.

> Not all mongodb code will optimally use tokumx without any changes

The goal wasn't to be optimal or anything like that. It was initially about space consumption (where you did really well) and verifying the same client code ran correctly. We have two setups so one would run toku and one mongodb and data processing results compared.


Ok. Well, you said you were waiting for it to be ready, and I think it is. We'll be here when you get a week free to tinker.


MongoDB 2.2 and 2.4


Just as in our TokuDB for MySQL product, we have zlib and lzma compression available.


Staying up to date is not an all-or-none proposition. We use feedback to drive direction. For example, even though we used 2.2 as a base, user feedback compelled us to include hash-based sharding, a 2.4 feature, in this release.

