Here's my "it's not stupid if it works" fix (that doesn't involve restarting Teams). Click on any message in a chat, press Ctrl-A (or Cmd-A), and congrats, select all works again. Please don't read this as a defense of Teams -- I'm just another frustrated user.
As someone who lives in one of the areas in this study that instituted a mask mandate, I think the clearest explanation for the lower case rates preceding the mandates is the general willingness of the population to mask before the dates noted on the graph. Yes, the most recent mandates weren't instituted until July 26th, but, for whatever my anecdata is worth, mask use in the more urban areas was already substantially higher than in the areas without mandates. There's also the small matter of the more populous areas having higher vaccination rates than the state's more rural areas.
To me, the story here is not the data itself; it's another example of elected officials hiding data that doesn't agree with their position. Is the headline the best interpretation of the data? Not at all. But it's one more data set showing that the areas that are most closely following relevant recommendations are seeing better outcomes than areas that aren't.
Hopefully I'm not misunderstanding your use case, but I would think you could accomplish this by publishing a durable message to a highly available, durable queue with no consumers. Set a TTL on the message (or the queue) along with a "dead letter exchange" policy, so that when the TTL expires the message is republished to the exchange fronting the queue or queues with your eventual consumers. The durability flags ensure both the message and the queue survive a restart, while the high-availability policy on the queue ensures that each node in the cluster has a copy. And I'm sure there are a few similar patterns that would achieve the same behavior.
I'm not taking a position on whether this is an acceptable level of complexity for the desired feature, of course, just pointing out how one might accomplish it if Rabbit is otherwise desirable.
As another user with nothing but negative experiences with Riak-CS in production, I thought I'd take a stab here. We had a 12-node cluster with ~10TB per node, fwiw. In no particular order:
- The restart times of the Riak process ranged from 10 minutes to 3+ hours, during which time the cluster was basically useless. Not a single suggestion from support sped up this process.
- Every single night from 08:00 to 09:00 UTC, the cluster would grind to a halt (as measured by canaries timing upload/download cycles). This continued even after we had migrated all customer data and traffic off the cluster.
- Riak-CS ships with garbage collection disabled despite it being a critical feature. I inherited a cluster that had been run for some months without gc enabled. Turning it on caused the cluster to catastrophically fail. Basho Support, over a period of close to a year, was unable to find a single solution that would get our cluster back to health. If our cluster were a house on a show like Hoarders, the garbage in it would be considered load bearing.
- We attempted to upgrade our way out of our un-garbage-collect-able mess, but the transfer crashed. Every. Single. Time.
- Even had transfers worked, all of the bloated manifests have to be copied in their entirety, so you can't gc the incoming data on the new cluster.
- Even while babying the cluster, it would become unusable at least once a month, requiring a restart of all nodes. The slowest node took 3+ hours to start, followed by another 3+ hours of transferring data. This was 6+ hours of system downtime every month.
- During these monthly episodes, we attempted to engage with support and try to debug the processes (we were a team of seasoned Erlang developers). We could attach Observer and/or use the REPL to grab stats, but not a single support resource was able or willing to engage.
- For giggles, once we had migrated all users off of the cluster, we attempted to let gc run. It never completed. Not once. We let this go on for a few months before nuking the entire cluster.
Now, I absolutely realize that we got ourselves into that mess by running the cluster without gc for an extended period. But in the grand scheme of things, this cluster wasn't storing a very large amount of data -- tens of TB spread over tens of millions of objects. Having the cluster get into a state where gc can never run and where this causes snowballing instability is unacceptable.
+1 for Ceph. We're running several ~3.5 PB clusters in production. We've not taken advantage of the new RGW features in Jewel, but it works well as an object storage solution.