I've done it twice, once in 2010 and again in 2012.
At the first company, we had been using PPTP and L2TP for remote access, but our employees had tons of problems establishing connections and staying connected. Then we switched to the OpenVPN server built into our Vyatta router, along with the standard OpenVPN client. This required a ton of manual configuration on each client, but the connections were rock solid.
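To give a sense of the per-client setup involved, here is a minimal Python sketch that renders a typical OpenVPN client config file. The server hostname, port, protocol, and certificate file names are hypothetical placeholders; the real values depend on how the OpenVPN server on the Vyatta router was configured. The directives themselves (`client`, `dev tun`, `remote`, `ca`/`cert`/`key`, `remote-cert-tls`) are standard OpenVPN client options.

```python
def render_client_config(username: str, server: str, port: int = 1194,
                         proto: str = "udp") -> str:
    """Build a basic OpenVPN client config for one user.

    Hostname, port, and file names are hypothetical; each user also needs
    their own certificate and key files copied alongside this config.
    """
    lines = [
        "client",                  # act as a client, pulling settings from the server
        "dev tun",                 # routed (layer 3) tunnel
        f"proto {proto}",
        f"remote {server} {port}", # the VPN server to connect to
        "resolv-retry infinite",   # keep retrying if the hostname doesn't resolve
        "nobind",                  # don't bind to a fixed local port
        "persist-key",
        "persist-tun",
        "ca ca.crt",               # CA certificate shared by all clients
        f"cert {username}.crt",    # per-user certificate
        f"key {username}.key",     # per-user private key
        "remote-cert-tls server",  # require the server's cert to be a server cert
        "verb 3",
    ]
    return "\n".join(lines) + "\n"
```

Multiply that by every employee (plus distributing each person's certificate and key) and the manual effort adds up quickly.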
In 2012 I helped my new employer deploy OpenVPN Access Server, the commercial version. It didn't require pre-configuring the clients. Instead, our employees went to a webpage, entered their LDAP username and password, and got a download that set everything up. The automatic configuration worked perfectly on Mac OS, Windows, and iOS. (I don't recall whether anyone tried it on Linux or Android, but I never heard of any problems with those.)
OpenVPN Access Server is ridiculously underpriced: it's $10 per concurrent connection. If your usage is typical, with most users connecting only occasionally, you can support hundreds of employees for under $1000. Even if it were a lot more expensive, my experience was positive enough that I'd still recommend it.
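The math works because you license peak concurrent connections, not users: under $1000 buys up to 99 simultaneous connections. A quick sketch of the arithmetic; the 20% peak concurrency in the usage line is a hypothetical assumption, so plug in your own numbers.

```python
PRICE_PER_CONNECTION_USD = 10  # the per-concurrent-connection price quoted above


def seats_needed(employees: int, peak_percent: int) -> int:
    """Concurrent-connection licenses needed for a given peak, rounded up."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return (employees * peak_percent + 99) // 100


def license_cost_usd(employees: int, peak_percent: int) -> int:
    return seats_needed(employees, peak_percent) * PRICE_PER_CONNECTION_USD


# Hypothetical: 400 employees, at most 20% connected at the same time.
print(seats_needed(400, 20), license_cost_usd(400, 20))  # 80 800
```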
I doubt they have quite that much spare capacity in any AZ. In the past (2012), they seemed to be capacity-constrained when a whole AZ went offline. In a situation like that, some customers would move to a different AZ quickly, and others would just wait it out. Spot Instance pricing would also jump quickly, which would free up some capacity as low-budget workloads shut down.
CloudFlare and Amazon CloudFront are both origin-pull CDNs. (They pull content from your origin server as users in different locations request it.) It looks like they should work just fine with WP Rocket. What makes you think they won't?
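Origin pull is simple enough to sketch as a toy edge cache in Python: on a miss, or once a cached copy expires, the edge fetches the object from the origin; otherwise it serves its stored copy without contacting the origin. This is only an illustration. The fixed TTL is an assumption, and real CDNs take their caching rules from headers such as `Cache-Control` and from their own configuration.

```python
import time
from typing import Callable, Dict, Tuple


class OriginPullEdge:
    """Toy CDN edge node that pulls from the origin on a miss or after expiry."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes],
                 ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch_from_origin
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[float, bytes]] = {}  # path -> (expires_at, body)

    def get(self, path: str) -> bytes:
        now = self._clock()
        entry = self._cache.get(path)
        if entry is not None and now < entry[0]:
            return entry[1]  # hit: served from the edge, origin not contacted
        body = self._fetch(path)  # miss or stale: pull from the origin
        self._cache[path] = (now + self._ttl, body)
        return body
```

Because nothing has to be uploaded to the CDN ahead of time, files that a caching plugin writes to the origin simply get pulled on the first request for them.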
Amazon rebooted lots of PV guests. Presumably they colocate HVM and PV guests on the same host. If there were any HVM guests on the box, an attack would be possible. (I guess they could forcibly kick the HVM guests off, but that wouldn't be very nice.)
HVM, at least in the past, involved a lot more code that the guest DomU interacts with than fully PV guests did. This has security implications.
Now, my knowledge of HVM is a few years (or more like half a decade) out of date. For example, I don't even know how to force an HVM guest to use only PV drivers (which would solve 90% of the problem). I also know that more and more of this has moved into hardware, so what was true five years ago may not be true now. Still, I don't let untrusted users on HVM guests, for the same reason I don't let untrusted users use pygrub or load untrusted kernels directly.
HVM guests at Amazon default to PV drivers for I/O and networking, unless they use SR-IOV ("Enhanced Networking"), which doesn't use the PV network driver.
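If you want to check whether an instance is flagged for Enhanced Networking, EC2 exposes this through instance attributes. A small sketch, assuming a boto3 EC2 client is passed in (the instance ID in any call is hypothetical). Note that these attributes only say the instance is set up for SR-IOV; the guest still needs the matching driver loaded.

```python
def enhanced_networking_flags(ec2, instance_id: str) -> dict:
    """Report the EC2 attributes that mark an instance for Enhanced Networking.

    `ec2` is assumed to be a boto3 EC2 client (boto3.client("ec2")).
    """
    sriov = ec2.describe_instance_attribute(
        InstanceId=instance_id, Attribute="sriovNetSupport")
    ena = ec2.describe_instance_attribute(
        InstanceId=instance_id, Attribute="enaSupport")
    return {
        # Intel 82599 VF-based Enhanced Networking reports the value "simple".
        "sriov": sriov.get("SriovNetSupport", {}).get("Value") == "simple",
        # ENA, the newer Enhanced Networking adapter, reports a boolean.
        "ena": bool(ena.get("EnaSupport", {}).get("Value", False)),
    }
```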
PVH is actually PV on top of an HVM container, which is a bit different. You can think of it as PV sitting on top of just enough HVM bits to take advantage of the hardware extensions Intel and AMD have invested so heavily in, while still being mostly PV. This gives you the best of both worlds, including the remaining PV performance benefits for interrupts and timers that PV drivers on HVM can't use.
I was pretty unhappy with how Rackspace handled the maintenance window.
1. Rackspace's maintenance announcement was sent at 9:00 PM on Friday night (Pacific time). Seriously?! I had already left for a weekend vacation without my laptop, so I couldn't do anything to get my company prepared. Even if the patch wasn't ready until Friday night, Rackspace could have scheduled the maintenance windows and announced them to customers much earlier.
2. The maintenance windows for all three US regions were scheduled at the same time, so we couldn't just move to a different region without going to another continent.
3. Each maintenance window was 24 hours -- that's just too long. Even though our servers were only down for 10 minutes, we had to be on call and ready for 24 hours.
4. Although we have redundant servers in every region, we still couldn't guarantee that those redundant servers wouldn't be rebooted at the same time. As it turns out, we did lose both of our servers in ORD at the same time.
We push DNS into Route 53 at Runscope, and I trust our approach for the reasons you mentioned. Route 53 has a great track record for DNS uptime. If our sync system breaks, Route 53 will keep serving DNS records while we fix the sync system.
On a side note, we actually have two DNS sync systems at Runscope. One handles DNS for individual hosts (like web01.runscope.com) and the other handles public DNS (like www.runscope.com).
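Pushing records into Route 53 comes down to calls to its `ChangeResourceRecordSets` API. Below is a minimal sketch of a per-host sync in Python, assuming a boto3 Route 53 client (`boto3.client("route53")`) is passed in; the hosted zone ID and IP addresses are hypothetical, and this is not necessarily how Runscope's tool works. Using `UPSERT` makes each run idempotent, so a failed sync can simply be retried while Route 53 keeps serving the last records it received.

```python
from typing import Dict, Optional


def push_host_records(client, zone_id: str, hosts: Dict[str, str],
                      ttl: int = 300) -> Optional[dict]:
    """UPSERT one A record per host name, e.g. {"web01.runscope.com": "10.0.0.5"}.

    `client` is assumed to be a boto3 Route 53 client; any object with a
    compatible change_resource_record_sets() method works.
    """
    changes = [
        {
            "Action": "UPSERT",  # create or overwrite, so re-running is safe
            "ResourceRecordSet": {
                "Name": name,
                "Type": "A",
                "TTL": ttl,
                "ResourceRecords": [{"Value": ip}],
            },
        }
        for name, ip in sorted(hosts.items())
    ]
    if not changes:
        return None  # nothing to send; Route 53 rejects an empty change batch
    return client.change_resource_record_sets(
        HostedZoneId=zone_id,
        ChangeBatch={"Comment": "host DNS sync", "Changes": changes},
    )
```

A large fleet would need to split the changes across several requests to stay within Route 53's per-request limits.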
Our public sync daemon is tied into ZooKeeper, which keeps a list of active hosts for each of our public services. The daemon updates entries in Route 53 whenever the hosts change in ZooKeeper. We have three types of public domains, and the daemon handles all three: standard domains with load balancing, geo-routed domains for services in multiple regions around the world, and regional domains for routing to a specific region. The sync daemon has been very helpful in keeping our infrastructure nimble.
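Here is a rough sketch of how the three public domain types might map onto Route 53 record sets, plus a diff that turns a change in ZooKeeper's host list into `UPSERT`/`DELETE` changes. The mapping is an assumption for illustration, not necessarily how Runscope's daemon does it: equally weighted records for load-balanced domains, latency-based records (the `Region` field) for geo-routed ones, and a hypothetical region-prefixed name for regional domains. Reading hosts from ZooKeeper (for example via a watch) is left out; `hosts_by_region` stands in for that data.

```python
from typing import Dict, List, Tuple

TTL = 60  # hypothetical TTL for public records


def build_records(domain: str, kind: str,
                  hosts_by_region: Dict[str, List[str]]) -> List[dict]:
    """Translate active hosts (as ZooKeeper might report them) into record sets."""
    records = []
    if kind == "standard":
        # Load-balanced: one equally weighted record per host.
        for region in sorted(hosts_by_region):
            for ip in sorted(hosts_by_region[region]):
                records.append({"Name": domain, "Type": "A", "TTL": TTL,
                                "SetIdentifier": ip, "Weight": 1,
                                "ResourceRecords": [{"Value": ip}]})
    elif kind == "geo":
        # Latency-based: one record set per AWS region with live hosts.
        for region, ips in sorted(hosts_by_region.items()):
            if ips:
                records.append({"Name": domain, "Type": "A", "TTL": TTL,
                                "SetIdentifier": region, "Region": region,
                                "ResourceRecords": [{"Value": ip} for ip in sorted(ips)]})
    elif kind == "regional":
        # Regional: a separate (hypothetical) name per region.
        for region, ips in sorted(hosts_by_region.items()):
            if ips:
                records.append({"Name": f"{region}.{domain}", "Type": "A", "TTL": TTL,
                                "ResourceRecords": [{"Value": ip} for ip in sorted(ips)]})
    else:
        raise ValueError(f"unknown domain kind: {kind}")
    return records


def record_key(rrset: dict) -> Tuple[str, str, str]:
    """Identify a record set by name, type, and (for weighted/latency) set ID."""
    return (rrset["Name"], rrset["Type"], rrset.get("SetIdentifier", ""))


def diff_records(current: List[dict], desired: List[dict]) -> List[dict]:
    """Changes that move Route 53 from `current` to `desired`."""
    cur = {record_key(r): r for r in current}
    want = {record_key(r): r for r in desired}
    changes = []
    for key in sorted(want):
        if cur.get(key) != want[key]:
            changes.append({"Action": "UPSERT", "ResourceRecordSet": want[key]})
    for key in sorted(cur):
        if key not in want:
            # DELETE must match the existing record set exactly.
            changes.append({"Action": "DELETE", "ResourceRecordSet": cur[key]})
    return changes
```

The comparison assumes names are normalized the same way on both sides (Route 53 returns names with a trailing dot), and the `Region` values must be real AWS region names. When `diff_records` returns an empty list, nothing changed and the API call can be skipped.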
If this interests you, I'll be giving a talk about our cloud infrastructure in a few weeks in San Francisco, along with our founder/CEO John Sheehan. Full announcement coming soon...