This is great. Wasn't too clear on exactly how this is used from the documents (e.g. do I just run 'python') - didn't seem that there were any 'imports' either.
I was considering linking to the README (the index of the readthedocs page) but I wasn't sure if that would highlight the C inlining as much as I wanted. To summarize from the readme: After `pip install`ing the package, you can run quasiquoted code by including `# coding: quasiquotes` and then running under the normal CPython interpreter.
This still doesn't make it easy to understand how it is executing. Is there a hook on import that detects the file in which it's imported, searches for the # coding language pragma line, and then actually alters the text of the source code before the Python interpreter resumes parsing the rest of that file?
Initially I thought this was something like James Powell's "astexpr" literals example (towards the end of < http://www.youtube.com/watch?v=2O7yj-Nh6AY >), which would require actually implementing a new C-API type, adding parser logic for its literals, and recompiling Python.
If you can add a comment or two to clear up my confusion, I would really appreciate it.
I think some of this might be addressed in the implementation section of the docs.
http://quasiquotes.readthedocs.org/en/latest/impl.html
To summarize: there is no import hook or AST parsing happening. Instead, this works by manipulating the way Python reads the bytes of a source file and converts them to strings. When the raw bytes are read, it looks for token patterns that match the syntax of a quasiquote. It then replaces those patterns with normal function call syntax that is valid. A cute thing that can be done (that I need to document) is that you can "translate" any Python from quasiquotes by doing something like:
'[$qq|this is the body|]'.encode('utf-8').decode('quasiquotes')
"qq._quote_expr(0,' this is the body')"
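To make the "replace token patterns with call syntax" idea concrete, here is a toy sketch of that kind of textual rewrite. The real package works on the token stream rather than with a regex, and `_quote_expr` and its arguments here are just copied from the example output above, so treat this as an illustration of the transformation, not the actual implementation:

```python
import re

# Matches expression quasiquotes of the form [$name|body|].
_QQ_EXPR = re.compile(r'\[\$(\w+)\|(.*?)\|\]', re.DOTALL)

def rewrite(source: str) -> str:
    # '[$qq|body|]'  ->  "qq._quote_expr(0,'body')"
    return _QQ_EXPR.sub(
        lambda m: "%s._quote_expr(0,%r)" % (m.group(1), m.group(2)),
        source,
    )

print(rewrite("[$qq| this is the body|]"))
# prints: qq._quote_expr(0,' this is the body')
```

The output is ordinary, valid Python, which is the whole point: by the time the parser sees the file, the quasiquote syntax is gone.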
Ahh, that makes sense now. So it's leveraging the built-in hook for custom decoding. I actually did not know that the `# coding: utf-8` declaration was handled at parse time, nor that it allowed for custom decoding functions.
I'd be very happy to help better document this stuff in the code. Searching around, there are not very many good documents describing how to create and register a custom decoder.
I have read and re-read your setup.py file a few times now, and followed the trail to the .pth file and to the locations where codecs.register is used. But it's still not clear to me how this functionality gets "installed" such that the codec machinery, when it sees your decoder's name ("quasiquotes"), is able to map it back to the search_function in your installed module's source code.
Basically, how does the codec registry become permanently aware that "quasiquotes" maps to the search function in your installed package?
The .pth imports a file which, at module scope, registers a search function with the codec system. This function checks whether the name is 'quasiquotes'; if not, it returns None, otherwise it returns a custom object that defines all the methods needed to turn bytes into string objects or go the other way. Because it is a .pth in site-packages, it gets executed at interpreter start. If you wanted to chat more about this you can email me with the email on my gh page. Not sure how long hn threads normally stay up.
Did you consider Summingbird? Seems like a lot of what you are doing might have been simplified by using Summingbird rather than building separate speed and batch layers in Hadoop / Spark.
That's a great point for our next session - Summingbird would be perfect for everyone trying to implement Lambda Architecture without having to produce separate code for streaming and batch.
Content free - there is really no information in this article beyond the fact that they say Norway's sovereign fund uses "block trades", which is really what dark pools are supposed to be all about. Dark pools and block networks have been around for at least the last 5 years, and introduce their own set of issues (not getting your orders filled, showing Goldman Sachs your order on their promise that they won't do anything with that information :-)
Well, they do point out why you don't want to be a little player on IEX:
“Trying to find liquidity without having an impact when you’re doing it is an over-arching challenge we will always have,”
Huge players don't want to have a price impact before they trade - they want the price impact to happen after. I.e., it'll happen to the little guy rather than to them.
A block trade is simply a large coherent set of orders; it's the purchase or sale of a big number of tradable instruments, which usually requires a lot of little orders to execute.
This is pure pedantry on my part, but I wouldn't want the thread to leave the impression that "block trades" are some technological feature of electronic markets. They're a fundamental problem of trading.
After building a large Hadoop simulation environment and contemplating how to use the same codebase in a streaming fashion, Summingbird seems like an exciting answer. I am concerned that I don't see much discussion around its use online and wonder what its adoption is like. Are people using it? I wonder if the usage of new terms like "monoid" scares people away.
Understood that Amazon needs to make a profit, but these are still dollars that I need to pay, that I could feasibly be using for something else (hiring great developers perhaps.)
I think the advantage of EC2 is exactly fast scaling, and an easy sell to management (no big upfront cost.) But once we started using it at scale and consistently, it is just cheaper to run our own hardware, as the blog attests.
We do all the racking and cabling using "remote hands" directed by our SAs in India, and so far for us this has not been the complex part. Getting Hadoop configured and our software running efficiently is several orders of magnitude harder, and I don't think EC2 helps here. If anything it hinders, as we are dealing with virtual hardware. There have been several posts about "lemon" EC2 instances and how you should test your instance before using it.
Using EC2 also takes away a part of the risk. If your startup fails in a year, then you don't get stuck in the end with a pile of hardware (for which you paid big bucks).
I guess the best approach would be to use EC2 in the beginning, when you don't really know how much hardware you need, and later on use EC2 only for demand spikes.
Yes - this is what we are doing. With this 4x cost differential, however, we calculated that it takes only 6-8 months for the hardware to pay for itself, so if you plan to be in business in that time frame you are better off buying.
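The break-even calculation is simple enough to sketch. All the dollar figures below are made-up placeholders; only the ~4x cost differential comes from the thread:

```python
# Back-of-envelope break-even for owning vs. renting.
hardware_capex = 52_500.0      # one-time purchase price (hypothetical)
own_monthly = 2_500.0          # power, colo, remote hands (hypothetical)
ec2_monthly = 4 * own_monthly  # the ~4x differential mentioned above

# Each month of ownership saves the difference in operating cost;
# break-even is when cumulative savings cover the upfront purchase.
breakeven_months = hardware_capex / (ec2_monthly - own_monthly)
print(breakeven_months)  # prints: 7.0
```

With these toy numbers the hardware pays for itself in 7 months, squarely in the 6-8 month range quoted, and the conclusion is sensitive mainly to the capex and the size of the monthly differential.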
Found this example which seems a little clearer: https://pypi.python.org/pypi/quasiquotes/0.2.0