More

deepu_256 · on July 16, 2011

A quick question ---

can i say use pymongo in Brubeck and you are saying the blocking nature of pymongo won't affect the async nature of Brubeck ??

j2labs · on July 16, 2011

For some reason my other account is blocked from responding so I created this one.

That is correct about pymongo. Eventlet will convert any drivers that are written entirely in Python into nonblocking driver. Gevent, an alternative to eventlet, can do the same. Brubeck supports both.

In addition to that it also makes ZeroMQ nonblocking. The combination of ZeroMQ support and pymongo, pyredis and pyriak all being available entirely as python (bson is in c tho) is what convinced me I had to write a new framework.

astrofinch · on July 17, 2011

Your account is probably blocking you from responding because of some flamewar prevention feature Paul put in a while ago that makes it difficult for commenters to have nested discussions with one another--the deeper the discussion, the longer you have to wait before making a reply to a user who just replied to you.

espeed · on July 17, 2011

Ahh...did not know that :)

deepu_256 · on July 16, 2011

wonderful.

Can you also shed light on thrift protocol ? i have struggled with the blocking nature of thrift(client).

Does eventlet convert blocking thrift client written in python to non-blocking ?

inportb · on July 16, 2011

You know more about thrift than I do. If the "blocking" code is in python, eventlet should be able to fix it. You can't do anything about the native machine code, though. It appears that the client library includes a c-based codec to improve performance, but it should not be a problem if the transport is based on python sockets.

inportb · on July 16, 2011

I use pymongo with gevent. A small hack is necessary to prevent pymongo from making a new connection for each thread that accesses mongodb (because with gevent, you can have a massive number of threads). I'd imagine the same is true with eventlet.

robspychala · on July 18, 2011

BTW, how is the performance compared to sync pymongo in your use? I'm testing with eventlet and seeing it about 10-20% slower.

https://gist.github.com/1088452

deepu_256 · on July 17, 2011

Do you mind sharing the details of that hack ??

inportb · on July 17, 2011

Sure! https://github.com/inportb/playground/blob/master/Pinako/gev...

deepu_256 · on July 16, 2011

I guess what he means is that for drivers like mysql which are blocking only there is little support in tornado to not freeze the tornado instance during the request response cycle.

inportb · on July 16, 2011

If you do use mysql, you're in luck... here's a pure python client module that can easily be monkey-patched for gevent/eventlet:

http://code.google.com/p/pymysql/

j2labs · on July 16, 2011

That is correct. Eventlet has database pooling for python drivers that are written in C.

See my response to your other comment for how eventlet handles drivers written entirely in Python. In short, it converts them.

deepu_256 · on April 1, 2011

So they are ok with making their tweets and posts on tumblr being exposed to public but they are not ok with their parents being on facebook ??

notahacker · on April 1, 2011

Difference is that Twitter accounts are pseudonymous and deniable, and your parents probably don't know what Tumblr is...

That said, it first became obvious how big Facebook would become when they calmly ignored the protests of those who didn't want to be connected with their younger siblings at high school and thought newsfeeds were intrusive.

dhs · on April 1, 2011

I can assure you that at least some parents do know what Tumblr is.

deepu_256 · on March 31, 2011

This is from my experience of trying to build a news feed (we are about to launch one for our site which gets about a million unique visitors a month).

The approach you suggested is the first approach that i tried. And frankly it worked great when i am testing the feature on my dev macbook with just a couple of 1000's of users. But when i started testing it on our server loading entire legacy data redis ended up taking more than 6GB of RAM for abt half a million users.

i told myself to just plan for 2 times our current traffic and then later on think of a better solution(best thing wud be to plan for 5-10 times your current traffic) and told my CEO that we might have to bump up RAM on the redis machine to 32 or 64 GB in the future(the easiest way but not an elegant way to solve the problem is throw more resources). Also mind that you need atleast 2 such machines to provide failover. i just hoped salvatore(a great guy BTW) would release a cluster solution for redis and will save me.

While testing the feed we came to know we needed to add more such actions from the user and every time you add a new type of action for every user you are looking at increasing the memory of Redis process by substancial amount. you will run out of memory faster than you have planned. i thought VM is the solution to all of this as user feeds typically use only latest data and the entire feed doesn't need to be in the memory. But quite frankly VM has it's own set of problems -- http://groups.google.com/group/redis-db/browse_thread/thread....

Not telling that your solution isn't good. Just wanted to warn you about some potential problems from experience of trying out your solution. Finally i am using cassandra for news feed. Working great as of now. At least much better than my experience with implementing the same with redis. The major problem with cassandra as of now is implementing pagination and counting. distributed counters are coming in 0.8 . pagination is still a major problem though. But for news feed you might not need pagination. providing a load more button as facebook and twitter does should be good enough.

Hope it helps.

mrduncan · on March 31, 2011

I'm not sure what version of Redis you were using, but depending on the data you're storing, 2.2 uses significantly less space than previous versions.

Some data from GitHub: http://technoweenie-ruby-onales.heroku.com/#34

technoweenie · on March 31, 2011

It looks like our approaches are very different. Stratocaster stores lists of integer IDs, which Redis optimizes for pretty heavily. We're taking the list of IDs and doing a multi-get from either memcache or mysql.

It looks like this lib uses Sets of the actual event data. It won't be able to take advantage of the same optimizations.

pjscott · on March 31, 2011

On the plus side, switching from storing zsets of actual event data to storing integer ids should be a pretty minor code change.

waxman · on March 31, 2011

Thanks for the feedback. I really appreciate it.

I do think space could be an issue. To mitigate against this risk I've decided to only keep ~8 pages of each feed in Redis, and compress the values (including removing attributes that won't be shown in the feed). Also, Redis 2.2 is a lot more space efficient than earlier versions.

Our primary datastore is actually MongoDB, which has some similar (NoSQL) characteristics, but I've found that storing the feeds in Redis is actually cheaper than storing the same data in Mongo. That said, I really like Mongo for its richer querying (and greater persistence guarantees), and use it for canonical storage of users, posts, etc.

deepu_256 · on March 21, 2011

This looks really useful.

In the coming weeks i will try and use it in one of the apps i am building on tornado.

Kudos to Greplin.

Just one question. I see Tornado related repos in Greplin Github account. Are you using it in one of your Tornado apps ? In that case any gotchas regarding blocking ?

rwalker · on March 21, 2011

We do use it in Tornado apps. We haven't seen any blocking issues so far as the time to write the tiny file is pretty small plus this happens on a small percentage of requests. It'd probably be worth looking in to an asynchronous model at some point though.

deepu_256 · on Feb 21, 2011

How ??

For the entire site or only some parts ?

just curious.

equark · on Feb 21, 2011

Not sure. I just gathered that much from the mailing list. My impression was they are using it for everything though.

deepu_256 · on Feb 21, 2011

Hi nechmads, Are you willing to review the business plan/provide letters of recommendation if you really like the application ??

saw that Letters of Recommendation can be from StartUp Chile Alumni. Hence asking.

deepu_256 · on Feb 20, 2011

Well, may not exactly big but Quora is using Tornado in production.

tzury · on Feb 21, 2011

Yes, but not the whole Quora anymore, Right now, they use it just for pushing chunks of javascript to browsers

[1] http://www.quora.com/Has-Quora-made-any-modifications-to-Tor...

[2] http://www.quora.com/How-does-Quora-use-Tornado

deepu_256 · on Feb 13, 2011

Wow. Hearing about this guy for the first time. Inspiring ....

deepu_256 · on Feb 5, 2011

i played with gizzard for one of our services with redis as backend. twitter engineers on #twinfra IRC channel helped me a lot on understanding gizzard and flockdb src code. Not yet using in production though.

You have any questions ?

HN For You