More

Oculus · on Nov 1, 2014

Earlier in the summer I decided to create a phone background with these numbers so whenever I had free time I could work on memorizing them: https://twitter.com/EmilStolarsky/status/496298288325599233

Oculus · on Nov 1, 2014

I've been told by family anecdotes that Pepsi & Coke were a prized possession in the USSR. Whenever they were able to acquire a bottle, you would never throw it out immediately after finishing, but rather refill it with water as there was still some residual taste.

Oculus · on Oct 31, 2014

I recently happened upon an interview of Elon Musk on his work ethics: http://youtu.be/4Fl9LRgG3_A

The main takeaway was work as hard as you can (i.e. 80 - 100 hour weeks). Even if you're doing the same thing as the competition (assuming they work 40 hours), you'll still accomplish in 4 months what they would in a year. Hard work, tenacity, and a little luck is what separates the smart from the brilliant.

mattm · on Nov 1, 2014

It doesn't work like that though. I'm sure most developers have had the situation of not being able to solve a problem at the end of the day but the next morning you end up solving it in 10 minutes. People need rest. Without rest and recovery, your decision making process gets worse and slower.

The people I've worked for that were putting in 12 hour days always seemed to be the people that wasted the most time at work.

ACow_Adonis · on Nov 1, 2014

Yeah, except at least on that last point Musk is wrong (and i know saying that on HN is kinda like saying that jesus isn't the son of god in Rome). He's just repeating his own self congratulatory narrative.

Human's are biological creatures. Just like when we run, when you go for 30 seconds vs 3 minutes vs 3 hours vs 3 days, the relationship of what happens is not a simple linear one of input. We tire, we lose concentration, we actually get worse if we go all out for long periods than if we act in controlled and conscious burn with good amounts of rest and down time.

As another poster has said, what does happen is that the ones that put in the long hours, in my experience, are often the most incompetent, the magical thinkers, the ones most out of touch with how much they're doing, and the ones putting on the show. Of course, in this world, the ones that put on the show are the one's that often get the most rewards...

Musk is though, spot on with the first two points though. Things other people value. Tenacity.

Hard work, is of course needed, but baring some kind of ridiculous genetics which border on not existing, your productivity drops magnificently once you start going longer hours. That's primarily just for show...

Oculus · on Oct 29, 2014

I feel as though this is one step above just a web page with an email box. Aside from listing a suite of popular OSS, you haven't explained what this app does. Is it similar to Panamax in that it offers you a UI, but only for one box / Flynn or Deis which is trying to create an open source Heroku / something completely different?

Nowaker · on Oct 29, 2014

Flynn, Deis, Heroku are PaaS - they are focused on applications.

VirtKick is like DigitalOcean or Linode - they are focused on machines (and containers, that you think of as machines). You install it on your Linux desktop (for localhost hacking), or home or dedicated server (for something more).

Will try to express it a little bit better. Thanks for your feedback.

Oculus · on Oct 26, 2014

In later blog posts they discuss moving over to ext4 despite the 16TB size limit.

brianwski · on Oct 27, 2014

Brian from Backblaze here.

Correct, we are entirely on ext4 now (there might be one or two old pods running JFS left, I would need to double check).

We figured out a way to work around the 16 TByte volume limit originally, but now ext4 even supports larger volumes so that isn't a problem at all anymore.

More random history: At the time we chose JFS it looked like XFS might lose support (XFS was created by a now defunct company I used to work for called SGI). Well, the world is a strange place and XFS now has more support than JFS, despite SGI going out of business. But in the end, ext4 is the "most commonly used" linux file system and therefore is arguably the most supported with extra tools and such, and the performance of ext4 in many areas seems higher than JFS (at least for our use case) so the decision becomes easier and easier to go with ext4.

Oculus · on Oct 26, 2014

I've seen their hard drive reliability blog posts, but I never knew they were so open about their storage server designs. I think it's really cool how open they are about something that is definitely a competitive advantage.

brianwski · on Oct 26, 2014

Brian from Backblaze here.

> open they are about something that is definitely a competitive advantage.

From our perspective, our biggest competitor is apathy. Something like 90 percent of people simply don't backup their computers. So pretty much anything we can do to have people hear about Backblaze and possibly overcome their apathy and start backing up their laptops is good for us. These blog posts have worked very well for us - people enjoy the information, and by disclosing the info we have potential customers hear our name and we get a bump in subscriptions. It's a win-win, the BEST kind of win. :-)

aftbit · on Oct 26, 2014

The real competitive advantage is the software and IT principles that make managing a dumb array of disks possible.

falsestprophet · on Oct 26, 2014

Is it though? Running OpenStack Swift on these pods would get you pretty far.

http://docs.openstack.org/developer/swift/

caraboga · on Oct 26, 2014

Having inherited an openstack install with the swift implementation implemented on earlier generation of pods, these are the issues I and my cohorts ran into:

1) The object replication between the pods will be hard on the array, as the object replication is simply an rsync wrapped inside nested for loops. If you have a bunch of small files with a lot of tenants, it'll hurt.

2) While swift allows you to simply unmount a bad disk, change around the ring files and let replication do its thing, there are actual issues. First of all, out of band smart monitoring of bad sectors actually cases the disk to pre-empt some sata commands and do the smart checks first. On a heavily loaded cluster, under a smart check for a bad block count could kill the drive, and take out the sata controller along with it. We've downed storage pods before by that way. The only way we got around it was to take a pod offline once a week, run all our smart checks, then put it back.

3) To replace a drive, you have to open the machine up and power it off. As any old operator will tell you, drives that have been running for awhile do not like to be turned off. If you are to power off a machine to replace a bad drive, do realize that you might actually break more drives just from the power cycle.

4) Once you change a drive out and use the ring replication of files to rebuild your storage pods, your entire storage cluster will take a non-trivial hit.

5) Last but not least, it is almost of tantamount importance to move the proxy, account and container services away from the hardware that also host the object servers. It's probably good to note that the account and container meta data is stored in sqlite files, with fsync writes. If you add and remove a bunch of files to multiple accounts, the container service is going to get hit first, then the object service. Further more, every single transaction to every meta data/data service, including replication, is federated through the proxy servers. If you look at a swift cluster, the proxy services take up a large chunk of the processing space.

Source: Was Openstack admin for a research group in Middle Tennessee. Ran an openstack cluster with 5 gen1.5 pods for the entire swift service, then moved account/container/proxy to three 12 core 2630s with 64 gigs of ram. Cluster was for a DARPA vehicular design project, first part fielded 3000 clients, second part fielded about 1/10 of that, but with more files and bigger files (these were cad and test results respectively).

rdkls · on Oct 27, 2014

Thanks caraboga. I was contemplating this setup a year or so ago, and great to learn your insights from actual implementation. It looked to me like Supermicro JBOD enclosures would be superior (mainly hot swappability) if a bit more expensive, would you agree?

caraboga · on Oct 27, 2014

My predecessor went with Backblaze clones for the drive and mb enclosure. You'll run into issues with the backblaze way as they have two power supplies, but one is for the motherboard and the boot drives, and the other power supply is for the actual drives. Further more, as this was a backblaze pod clone, there's no ipmi on the pod motherboards. It makes certain things a bit more annoying than they have to be.

If I were to do it again, I would stay away from using enclosures that were inspired by the BackBlaze models. Supermicro enclosures are fine.

rincebrain · on Oct 27, 2014

I'd be curious to know what drives and controllers you had that pre-empted requests due to SMART queries...

caraboga · on Oct 27, 2014

I don't recall the model, but they were 3 terabyte Seagates that were bought 3 years ago. I think my predecessor employed the same controller cards as version 1 of the pods. You could tell that the smart queries pre-empted normal io as the smart query would be executed, and disk activity for certain swift object servers would just stall. The object-servers would not return requests for at least some of the drives until the smart query was finished.

Curiously enough, I've also ran smartctl against Samsung 840Pros using Highpoint RocketRaid controller cards during the same project. Sometimes this crashes the controller card.

(This was for a gluster cluster, when gluster had broken quorum support. A copy of a piece of data, when out of sync with the other copies in a replica set, would be left unchecked. This was produced after one smartctl command took out the controller card, and then the machine along with it.)

Oculus · on Oct 26, 2014

Can someone explain to me why no one has scraped all the Twitpic photos and stored them?

Cameron_D · on Oct 26, 2014

ArchiveTeam has been, first we were grabbing full pages and images and storing them, but wound up with IP bans (Not unexpected), so a couple of people went through and grabbed the first 500 million images directly from CloudFront, they're still sitting on that 55tb of data.

Following that TwitPic then removed all images from showing on their site and required signed requests to load images from CloudFront so the remaining 300m images can't be fetched yet.

Today TwitPic restored the images and such to their site so AT is stepping back, rewriting their scripts to properly grab pages/images/metadata and will start from the most recent image working backwards and properly store them/removing the earlier grabs as we replicate them.

In the end the data will probably reside in offline storage at the Internet Archive until something happens to the TwitPic site.

toomuchtodo · on Oct 26, 2014

Props to the #quitpic team for working on this.

nkozyra · on Oct 26, 2014

Twitpic was brazenly (but not openly) limiting that, ostensibly to preserve whatever remaining value is there.

The will-they-won't-they nature of this shutdown is only being dragged out in an attempt to get an infusion of cash or a small acquisition (that would still be relatively large for the remaining stakeholders).

In other words, there was probably an expectation of acquisition before, it didn't happen, money got scarce and now there's a short sale. I don't know if it will work, but if not it will be a long time before the data is opened up for posterity.

hkmurakami · on Oct 26, 2014

I'm not sure if this is logical or reasonable for me to think this way, but this whole fiasco is evocative of a terrorist threat.

I know that Twitpic "owns" the data and isn't beholden to users or Twitter, but their attitude really reminds me of a hostage situation.

jacquesm · on Oct 26, 2014

If you upload your data to a server then that data ceases to be under your control. This is the major downside of any cloud based service and you're going to have to decide before you upload/invest if that's what you want to do.

Terrorist threats hit random bystanders in violent ways, this has 0 to do with any of that, everybody involved should have known what they were getting into.

bdcravens · on Oct 26, 2014

Twitpic blocked access to those trying:

https://news.ycombinator.com/item?id=8472047

logicalman · on Oct 26, 2014

Who says no one has done that?

Oculus · on Oct 22, 2014

I don't think they mean move to a place more affordable out side of SF, but move away from SF to another part of the country.

potatolicious · on Oct 22, 2014

Where though? At the end we're really talking about arbitrage. Places with jobs are expensive to live in, places that are cheap to live in don't have jobs.

There's no way to win, except maybe temporarily exploiting an arbitrage opportunity (i.e., a place with jobs but the cost of living hasn't caught up yet). Expecting people to do this is unreliable at an individual level, and disastrous as public policy.

Ultimately the only mid- to long-term solution is to help make job centers cheaper to live in.

DamnYuppie · on Oct 22, 2014

There are many affordable and large and fast growing cities with great pay. Dallas, Houston, Boise, and SLC to name a few.

Oculus · on Oct 22, 2014

From a sideline viewers perspective, persistence in production for Docker containers is a big problem that I've yet to see a good solution for. You end up having to keep DB servers and such outside of your container pool.

vidarh · on Oct 22, 2014

You don't. That's the point of volumes. You do need to be careful to ensure you mount volumes or everything that needs to persist, but in practice that's not a very onerous limitation.

Since volumes are bind-mounted from the outside, you can put the volumes on whatever storage pool you want that you can bring up on the host (as long as it meets your apps requirements, e.g. POSIX semantics, locking etc).

E.g. at work we have a private Docker repository that runs on top of GlusterFS volumes that are mounted on thehost and then bind mounted in as a volume in the container.

I also run my postgres instances inside Docker, with the data on persistent volumes.

We could certainly use better tools to manage it, though.

Another pattern I ought to have mentioned (that I've used myself) is to set up "empty" containers whose only purpose is to act as a storage volume for another container. I don't like that as much, mostly since I've not had as much long-term experience with how the layering Docker uses would impact it.

Rafert · on Oct 22, 2014

https://github.com/clusterhq/flocker seems to be doing some interesting stuff in that direction.

cpuguy83 · on Oct 22, 2014

And we're working with the folks @ clusterhq to bring some of that into Docker.

eropple · on Oct 22, 2014

Having persistent containers strikes me as the infrastructural equivalent of a code smell. I don't use in-container storage for anything that needs to be persisted at all, and I'm not sure why you would in a modern environment. Everything can fail and fail hard, and writing meaningful (i.e., volume'd) data to disk seems like asking for trouble. Ephemeral containers just seems to fit the model of Docker's capabilities much better than maintaining state.

The exception to this would be, I guess, you could wedge in Docker containers for database server isolation or something, but my databases don't run on multi-tenant instances so there isn't a huge win to it. (I use RDS most of the time, let somebody else manage that problem.)

vidarh · on Oct 22, 2014

> Having persistent containers strikes me as the infrastructural equivalent of a code smell.

Which is why pretty much all advice regarding Docker is to use volumes, so that the persistent data is managed from outside the container, and the container itself can be discarded at will without affecting the data volume.

My preferred method is bind-mounted volumes from the host. They are not part of the containers, and the purpose is exactly to remove the need or persistent containers. A lot of the examples I gave in my article relies heavily on this.

This leaves the form of persistence up to the administrator. And of course that means you can do stupid things, or not. On my home server that means a mirrored pair of drives combined with regular snapshots to a third disk + nightly offsite backups. At work, we're increasingly using GlusterFS on top of RAID6, so we lose multiple drives per server, or whole servers before the cluster is in jeopardy (and even then we have regular offsite snapshot throughout the day + nightly backups).

If you are referring to the pattern of creating "empty" containers to act as storage volumes, then I sort-of agree with you, but mostly because of the maturity of Docker. After all nothing stops you from putting the docker storage itself on equally safeguarded storage. It's not really the risk of losing storage that makes me prefer stateless containers, but that separating state and data substantially reduces the data volume that needs to be secured (since we can spin up new stateless containers in seconds, we only really care about preventing loss of the persistent data volumes).

eropple · on Oct 22, 2014

Persistent containers is the wrong term. "On-host storage" might be a better one. I don't store anything on my compute nodes. Everything is in a HA datastore or in S3. I kind of feel like architectures that rely on storing any data you can't immediately blow away without shedding a tear are, in modern environments, dangerous, and volumes seem (to me) to lead to the kind of non-fault-tolerant statelessness that'll bite you in the end.

I respect GlusterFS, so it sounds reasonable in your own use case, but it still makes me intensely uncomfortable to have apps managing their own data. I try to build systems where each component does one thing well, and for the components that I think work well in a Docker container, data storage is then kind of out-of-scope. YMMV.

peteridah · on Oct 22, 2014

We decided not to use docker containers for our postgres DB in production; Volume mounts just don't make me sleep easy. We use an S3-backed private docker registry to store our images/repositories.

vidarh · on Oct 22, 2014

What is your issue with volumes? Bind mounts have been battle tested over many years. I e.g. have production Gluster volumes bind-mounted into LXC containers that have been running uninterrupted for 5+ years.

peteridah · on Oct 22, 2014

Nothing wrong with mounting volumes, but certainly not for production DB's, at least not at this point. I don't have much insight to share on this, besides my intuition based on several years working on HA systems. The clusterhq folks are working on this problem though, and I am following keenly.

shykes · on Oct 23, 2014

Docker volumes are just bind-mounted directories. Of all the things you should worry about when running HA systems, bind-mounts is definitely not one of them. They are essentially overhead-free and completely stable. This is completely orthogonal to Docker - after setting up the bind-mounts it gets completely out of the way of the critical path.

We have definitely used them at large scale in production for several years with no issues.

peteridah · on Oct 23, 2014

Thanks shykes for chiming in on this. We use docker heavily for our applications in production (and all other envs), and would absolutely like to extend that to our Postgres DBs. Are there any examples you can point us to with a pattern for Postgres HA clusters with docker? Thanks.

klochner · on Oct 22, 2014

With docker specifically, I've had frustrations with the user permissions on the volume between {pg container, data container, host OS}, with extra trouble when osx/boot2docker is added to the stack.

Also, docker doesn't add as much value for something like postgres that likely lives on it's own machine.

vidarh · on Oct 22, 2014

> I've had frustrations with the user permissions on the volume between {pg container, data container, host OS}

Then don't use data containers. I don't see much benefit from that either. The stuff we put on data volumes is stuff we want to manage the availability of very carefully, so I prefer more direct control.

And so when I use volumes it's always bind mounts from the host. Some of them are local disk, some of them are network filesystems.

We have some Gluster volumes that are exported from Docker containers that imports the raw storage via bind mounts from their respective hosts, for example, and then mounted on other hosts and bind-mounted into our other containers, just to make things convoluted - works great for high availability (I'm not recommending Gluster for Postgres, btw.; it "should" work with the right config, but I'd not dare without very, very extensive testing; nothing specific with Gluster, just generally terrified of databases on distributed filesystesms).

> for something like postgres that likely lives on it's own machine.

We usually colocate all our postgres instances with other stuff. There's generally a huge discrepancy between the most cost-effective amount of storage/iops, RAM and processing power if you're aiming for high density colocation, so it's far cheaper for us that way.

cpuguy83 · on Oct 22, 2014

I'd be interested to know what your concerns are with volumes.

cpuguy83 · on Oct 22, 2014

So, we're making volumes better! Please see: https://github.com/docker/docker/pull/8484 And that is only the beginning.

It may be presumptuous to say the referenced PR will be in 1.4, but that's certainly what I'm pushing for.

vidarh · on Oct 22, 2014

Any plans on making it possible to control what's backing those volumes more directly? For me, when I'm using data volumes, it's generally because I have specific requirements (e.g. want my database on my expensive SSD array's; want my high availability file storage on a Gluster volume or similar) that doesn't make it interesting to have them co-located with the container storage.

I don't want to care if the container storage is totally destroyed - I want to just re-create it from our registry on a different host. I keep threatening the devs I work with that I'll wipe the containers regularly, for a reason, and they're specifically and intentionally not backed up.

From what I hear, the idea of keeping the containers totally disposable is a key appeal of Docker for many (me included), so I would just not use any volume related functionality that co-mingles the data volumes other than in very special circumstances (e.g. lets say we had static datasets that we'd like to mix with an application in different configurations; but I don't have any actual, real-life use cases for that at the moment; basically I'd only do that if I then also could push those volumes into our registry)

cpuguy83 · on Oct 22, 2014

First, volumes are 100% separated from containers. If you remove a container, the volumes are not removed unless you explicitly told docker to (docker rm -v <container>). And even then, it won't remove the volume if other containers are using it.

That said, it is hard to re-use a volume right now if you removed the last container referencing that volume. The linked PR solves this issue.

You can mount these SSD's on the host and then use bind-mounts (docker run -v /host/path:/container/path) to get those into the container. Or you can add the devices to the containers directly ("docker run --device /dev/sdb", for example), but you'd need privilaged access to actually mount the device.

The referenced PR itself doesn't really make using things like specialized disks any easier, except you can register them with the volumes subsystem and it will track them form you, ie `docker volumes create --path /path/to/data --name my_speedy_disks`.

There was some discussion around being able to deal with devices directly with Docker instead of expecting the admin to handle mounting those devices onto the host so they can be used as volumes. Nothing finalized here. Here is where that discussion happened: https://botbot.me/freenode/docker-dev/2014-10-21/?msg=239115...

vidarh · on Oct 22, 2014

I think the "docker volumes create --path ..." solves my use case just fine. Thanks for providing details - that seems quite useful.

dugmartin · on Oct 22, 2014

I haven't used Docker in production but isn't the standard practice for persistence to mount a host directory as a data volume?

Oculus · on Oct 21, 2014

Probably the biggest feature over Confd is the ability to run arbitrary commands when a template update completes.

kelseyhightower · on Oct 21, 2014

Not to take anything away from consul-template, but that has been a feature of confd for a while:

https://github.com/kelseyhightower/confd/blob/master/docs/te...

Oculus · on Oct 21, 2014

My bad, I misspoke.

HN For You