Former Uber engineer/EM here: I worked on the Rider app.
The “there are only a few screens” is not true. The app works in 60+ countries, with features shipped in the app that often for a country, and - in rare cases - a city.
The app has thousands of scenarios. It speaks to good design that each user thinks the user is there to support their 5 use cases, not showing all the other use cases (that are often regional or just not relevant to the type if user - like business traveler use cases).
Uber builds and experiments with custom features all the time. An experimental screen built for London, UK would be part of the app. Multiply this by the 40-50 product teams building various features and experiments outside the core flows you are talking about (which core flows are slightly different per region as well).
I worked on payments, and this is what screens and components are in the Uber app:
- Credit cards (yes, this is only a a few screens)
- Apple Pay / Google Pay on respective platforms
- PayPal (SDK)
- Venmo (SDK)
- PayTM (15+ screens)
- Special screens for India credit cards and 2FA, EU credit cards and SCA, Brazil combo cards and custom logic
- Cash (several touch points)
- AMEX rewards and other credit card rewards (several screens)
- Uber credits & top-ups (several screens)
- UPI SDK (India)
- We used to have Campus Cards (10 screens), Airtel Money (5), Alipay (a few more), Google Wallet (a few) and I other payment methods I forget about. All with native screens. Still with me? This was just payments. The part where most people assume “oh, it’s just a credit card screen”. Or people in India assume “oh it’s just UPI and PayTM”. Or people in Mexico “oh, it’s just cash”. And so on.
Then you have other features that have their own business logic and similar depths behind the scenes when you need to make them work for 60 countries:
- Airport pickup (lots of specific rules per region)
- Scheduled rides
- Commmuter card functionality
- Product types (there are SO many of these with special UI, from disabled vehicles, vans, mass transport in a few regions etc)
- Uber for Business (LOTS of touchpoints)
- On-trip experience business logic
- Pickup special cases
- Safety toolkit (have you seen it? Very neat features!)
- Receipts
- Custom fraud features for certain regions
- Customer support flows
- Regional business logic: growth features for the like of India, Brazil and other regions.
- Uber Eats touchpoints
- Uber Family
- Jump / Lime integrations (you can get bikes / scooters through the app)
- Transit functionality (seen it?)
- A bunch of others I won’t know about.
Much of the app “bloat” has to do with how business logic and screens need to be bundled in the binary, even if they are for another region. E.g. the UPI and PayTM SDKs were part of the app, despite only being used for India. Uber Transit was in a city or two when it launched, but it also shipped worldwide.
And then you have the binary size bloat with Swift that OP takes about.
This is a wonderful idea, and I hope many people start working on it right away.
Although Massalin has never published her code, according to my memory of her thesis, Synthesis's runtime code generation was mostly extremely simple, more like linking than what we think of as "code generation" — it copied a template method into the appropriate slot in the newly-generated quaject, then overwrote specific bytes in the generated code with pointers to the relevant callout (or, in some cases, the initial value of an instance variable for that quaject). Parts of the code that did not benefit from being specialized in this way were factored into ordinary functions the quaject method would just call.
This meant that only a small amount of code was generated for each quaject, and the runtime code generation was very nearly as fast as memcpy(), which meant that it was reasonable to use it on every quaject instantiation.
Massalin also talked about applying some optimizations to the generated code, such as the constant-folding and dead-code removal John mentions, but I have the intuition that only a minority of quaject instantiations involved such more aggressive optimizations. Since she never published Synthesis, it's impossible to know for sure. (I'm not questioning her integrity or claiming that the impressive benchmarks reported in her dissertation are faked; I'm saying that we unfortunately can't see the exact mixture of interesting things you need to do to get those kickass benchmarks; so, like an insecure Intel CPU, I'm reduced to speculation.)
Later implementations inspired by Massalin's approach included Engler's VCODE (which, to my knowledge, has also never been published; Engler's PLDI paper cites Massalin in the second sentence of the abstract), which was used to implement Engler's `C, and GNU Lightning (inspired by Engler's published papers about VCODE), used in a number of modern JIT compilers.
I suspect that, by contrast, John's idea of using LLVM is inevitably going to have much higher overhead — if only from the CPU cache devastation brought about by any LLVM invocation — so will only be a win for much-longer-lived objects, where the large instantiation overhead can be amortized over a much larger number of invocations. An intermediate approach like Engler's `C might be more broadly applicable.
John suggests this early on in his "for deployment" comment, but I think that it's probably necessary for prototyping too, since the objective of the whole exercise would be to get an order-of-magnitude speedup, and the objective of the prototype would be to find out if that's a plausible result. A prototype that makes all your programs run slower due to LLVM wouldn't provide any useful evidence about that.
I asked Perry what he thought about the above, and he replied with this gem:
So you’re correct that the code generation was mostly “template
instantiation”. I think that was key to having calls like open()
function in reasonable time. I also suspect LLVM is a blunt instrument
for this work. That said, it would have been difficult for anyone but
Massalin to work with the approach in Synthesis. It was very much the
product of a person who was both intensely brilliant and completely
comfortable with writing “weird code” in the instruction set they were
working in.
So there’s then the question of how one can take the ideas from
Synthesis and make them a practical thing that ordinary programmers
could build and contribute to. And that is almost certainly going to
involve compiler tooling. As a prototype, making this work by using
LLVM is probably a good approach. Ultimately, I think that one is
going to have to do the magic at kernel build time and have something
fairly lightweight happen at runtime. But to figure out what that is,
one needs to play. And the easiest tools right now for playing
involve LLVM. If, for example, you can use LLVM successfully to
specialize a write call’s instruction path down an order of magnitude
or more, or to do similar things in the networking code, one can then
ask how to do this better.
There are, of course, a number of interesting paths towards playing
with this. I suspect that none of them end anywhere near where they
start. But the only way to see what might be possible with much better
tooling is to start, and you have to start somewhere.
BTW, I think the time is right, or even over-right, for this. Single
processor core performance is stalled out, and while in 1992 one could
just say “well, we’ll have another factor of ten performance
improvement in a few years, who needs the trouble”, that’s no longer
the case. Note that this argument also applies, to a considerable
extent, to other parts of the modern software ecosystem. When you
can’t just say “we could spend a couple of months optimizing this, but
the next generation of processors will be out by then”, things change.
Anyway, not sure if this answers your call for comments, but if you
are interested in specific areas around this, I’ve no shortage of
opinions. Many would say I have far too many opinions...
You can quote any subset or the entire thing. So long as you don’t
distort my intent I have no problem with people using my words that
way.
The “there are only a few screens” is not true. The app works in 60+ countries, with features shipped in the app that often for a country, and - in rare cases - a city.
The app has thousands of scenarios. It speaks to good design that each user thinks the user is there to support their 5 use cases, not showing all the other use cases (that are often regional or just not relevant to the type if user - like business traveler use cases).
Uber builds and experiments with custom features all the time. An experimental screen built for London, UK would be part of the app. Multiply this by the 40-50 product teams building various features and experiments outside the core flows you are talking about (which core flows are slightly different per region as well).
I worked on payments, and this is what screens and components are in the Uber app:
- Credit cards (yes, this is only a a few screens)
- Apple Pay / Google Pay on respective platforms
- PayPal (SDK)
- Venmo (SDK)
- PayTM (15+ screens)
- Special screens for India credit cards and 2FA, EU credit cards and SCA, Brazil combo cards and custom logic
- Cash (several touch points)
- AMEX rewards and other credit card rewards (several screens)
- Uber credits & top-ups (several screens)
- UPI SDK (India)
- We used to have Campus Cards (10 screens), Airtel Money (5), Alipay (a few more), Google Wallet (a few) and I other payment methods I forget about. All with native screens. Still with me? This was just payments. The part where most people assume “oh, it’s just a credit card screen”. Or people in India assume “oh it’s just UPI and PayTM”. Or people in Mexico “oh, it’s just cash”. And so on.
Then you have other features that have their own business logic and similar depths behind the scenes when you need to make them work for 60 countries: - Airport pickup (lots of specific rules per region)
- Scheduled rides
- Commmuter card functionality
- Product types (there are SO many of these with special UI, from disabled vehicles, vans, mass transport in a few regions etc)
- Uber for Business (LOTS of touchpoints)
- On-trip experience business logic
- Pickup special cases
- Safety toolkit (have you seen it? Very neat features!)
- Receipts
- Custom fraud features for certain regions
- Customer support flows
- Regional business logic: growth features for the like of India, Brazil and other regions.
- Uber Eats touchpoints
- Uber Family
- Jump / Lime integrations (you can get bikes / scooters through the app)
- Transit functionality (seen it?)
- A bunch of others I won’t know about.
Much of the app “bloat” has to do with how business logic and screens need to be bundled in the binary, even if they are for another region. E.g. the UPI and PayTM SDKs were part of the app, despite only being used for India. Uber Transit was in a city or two when it launched, but it also shipped worldwide.
And then you have the binary size bloat with Swift that OP takes about.