Hacker News: cscotta's comment history

It is unfortunate that many young tech companies, in an effort to differentiate themselves, work to invent a new term and cloak it with the appearance of a movement or industry trend. It's further unfortunate when those to whom the term is being marketed recoil with laughter and amusement at its implications, forcing its purveyors to double down and attempt to reassert control over its meaning. Such a term may please analysts and folks with Twitter accounts who fashion themselves "thought leaders," but its roots and meaning are firmly in marketing rather than industry and art.

So let's talk about the marketing term "No-Ops."

In AppFog's attempt to assert a meaning over the marketing term, we're told that it means "developers can code and let a service deploy, manage and scale their code." I have yet to meet a single company facing a sizable technical challenge whose performance and availability needs could be met by a strait-jacket PaaS with an autoscale button. I'm not saying that many smaller companies don't use such services successfully – that's very much true. But let's be clear: the dream of push-button autoscaling while letting "somebody else" handle deployment, monitoring, instrumentation, and anything that may go wrong in the middle of the night is a marketing dream. As engineers, we have both a business need and an emotional need to own our availability [1]. Placing the sum total of your operations into a PaaS provider's hands, biting down hard on that marketing dream of NoOps, and throwing the pager out the window doesn't mean you have nothing to worry about. It just means that you don't care, and can do nothing about it.

But it's not enough to stop there. Instead, the author sees fit to posit his or her marketing term as the continuation of a history in the evolution of web operations, and to proclaim it a term that "traditional operations" personnel revile. If "No-Ops" is a success of any sort, it's a marketing win, not a technical one – and certainly not an operational one. If you think you can place your company's every egg into the hands of PaaS providers and never worry again, save for twiddling the auto-scale dial, you're only fooling yourself.

Who will migrate data on an oversubscribed Postgres shard to two more shards overnight, partitioned by account ID? Who will enable dual-writes, run a migration, then cut reads over as we move onto Riak? Who will notice a spike in await on the Kafka RAID, recognize the week-over-week trend pointing to your team running out of IOPS, and order and rack a set of new boxes with SSDs before it's too late? Who's watching the switches and keeping track of which racks have GigE and which have 10GigE uplinks to the next rack, to avoid oversubscribing the network?
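The dual-write cutover mentioned above is a common migration pattern; a minimal Ruby sketch (the store objects and their get/put interface are hypothetical stand-ins for the Postgres and Riak clients):

```ruby
# Minimal in-memory store standing in for the old and new datastore clients.
class MemStore
  def initialize; @data = {}; end
  def put(k, v); @data[k] = v; end
  def get(k); @data[k]; end
  def each(&blk); @data.each(&blk); end
end

# Dual-write migration: write to both stores, backfill history, then cut
# reads over once the new store is verified. Names are illustrative.
class DualWriteStore
  def initialize(legacy, replacement)
    @legacy = legacy
    @replacement = replacement
    @read_from_new = false
  end

  # Phase 1: every write lands in both stores.
  def put(key, value)
    @legacy.put(key, value)
    @replacement.put(key, value)
  end

  # Phase 2: backfill historical rows into the new store.
  def backfill!
    @legacy.each { |key, value| @replacement.put(key, value) }
  end

  # Phase 3: flip reads once the backfill is verified.
  def cut_over_reads!
    @read_from_new = true
  end

  def get(key)
    @read_from_new ? @replacement.get(key) : @legacy.get(key)
  end
end
```

The point of the pattern is that someone has to design, run, and verify each phase; no autoscale button does that for you.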

It's rare in our industry to see a promise so removed from reality. Indeed, if NoOps is a movement at all, it's powered only by the dream of not having to do one's job: ensuring that a company is able to deliver on its business value. Who among us isn't tempted by such a promise? [2]

[1] http://www.whoownsmyavailability.com/

[2] http://www.youtube.com/watch?v=fM9o8MxLX3Q


I've seen, in my decade of experience in tech, that a well-worded and keyword-packed marketing ploy is as irresistible as gravity to many. As the seductive promises and successful toy examples accrete around the neutron soup of buzzwords, the marketing mass finally reaches an influential Schwarzschild radius. Businesses that come within this radius are pulled in and obliterated, but they have no knowledge of their destruction. They've passed the event horizon and are now in a place where causality and reason have been upended. Their demise would be detectable from outside only in the faint Hawking radiation, drowned out by the background noise of self-congratulatory back-patting.

These pie-in-the-sky companies are necessary in the tech galaxy. Their primary purpose is to weed out those businesses that don't have the wisdom or tenacity to take on the challenges themselves. Their secondary purpose is to teach future tech leaders a hard lesson: there is no silver bullet.


One correction... I believe that Forrester "invented" the term NoOps and did so quite a while ago.


>> "It doesn't really fail - none of the issues described were "failures" really."

These absolutely were failures.

The author listed several instances in which the database became unavailable, the vendor-supplied client drivers refused to communicate with it, or both. Some of these scenarios included the primary database daemon crashing, secondaries failing to return from a "repairing" to an "online" state after a failure (and unable to serve operations in the cluster), and configuration servers failing to propagate shard config to the rest of the cluster -- which required taking down the entire database cluster to repair.

Each of the issues described above would result in extended application downtime (or at best highly degraded availability), the full attention of an operations team, and potential lost revenue. The data loss concern is also unnerving. In a rapidly-moving distributed system, it can be difficult to pin down and identify the root cause of data loss. However, many techniques such as implementing counters at the application level and periodically sanity-checking them against the database can at minimum indicate that data is missing or corrupted. The issues described do not appear to be related to a journal or lack thereof.
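The application-level counter technique is straightforward to sketch; names here are illustrative, and in practice the "actual" count would come from the database itself (e.g. a COUNT(*) or collection count):

```ruby
# Application-level write counter, periodically reconciled against the
# database's own count to surface silent data loss. Names are illustrative.
class WriteAuditor
  def initialize
    @expected = 0
  end

  # Call on every successful application-level write.
  def record_write!
    @expected += 1
  end

  # Compare against the count the datastore reports. A shortfall indicates
  # lost or unacknowledged writes worth investigating.
  def reconcile(actual_count)
    missing = @expected - actual_count
    missing > 0 ? { status: :data_loss_suspected, missing: missing } :
                  { status: :ok, missing: 0 }
  end
end
```

This won't tell you *which* records vanished, but it turns "we think data is missing" into a concrete, alertable signal.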

Further, the fact that the database's throughput is limited to utilizing a single core of a 16-way box due to a global write lock demonstrates that even when ample IO throughput is available, writes will be stuck contending for the global lock, while all reads are blocked. Being forced to run multiple instances of the daemon behind a sharding service on the same box to achieve any reasonable level of concurrency is embarrassing.

On the "1GB / small dataset" point, keep in mind that Mongo does not permit compaction and read/write operations to occur concurrently. In a write/update/delete-heavy scenario, what may be 1GB of live data will grow without bound on disk -- past 10GB, 16GB, 32GB, and so on -- until it is compacted. Unfortunately, compaction requires that nodes be taken out of service, which further compromises the availability of the system even for small datasets.
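The growth described can be illustrated with a toy allocator model. This is not Mongo's actual storage engine, just a sketch of why update-heavy churn with growing documents extends a datafile without bound until compaction:

```ruby
# Toy model of datafile growth under update-heavy churn (illustrative only):
# freed regions are tracked as "holes", and a rewrite that no hole can fit
# extends the file instead of reusing space.
class DataFile
  attr_reader :file_size

  def initialize
    @file_size = 0
    @holes = []   # sizes of freed regions
    @live = {}    # doc id => current size
  end

  def write(id, size)
    free(id) if @live[id]
    hole = @holes.find_index { |h| h >= size }
    if hole
      @holes.delete_at(hole)   # first fit; any remainder is wasted here
    else
      @file_size += size       # no hole fits: extend the file
    end
    @live[id] = size
  end

  def free(id)
    @holes << @live.delete(id)
  end

  def live_size
    @live.values.sum
  end
end
```

A document repeatedly updated to a larger size never fits its old hole, so the file ends up several times the live data size until a compaction pass rewrites it.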

What's unfortunate is that many of these issues aren't simply "bugs" that can be fixed with a JIRA ticket, a patch, and a couple rounds of code review -- instead, they reach to the core of the engine itself. Even with small datasets, there are very good reasons to pause and carefully consider whether or not your application and operations team can tolerate these tradeoffs.


Just to be 100% clear -- so people don't misunderstand your explanation of Mongo's compaction: Mongo does have a free-space map that it uses to attempt to fit new or resized documents into "holes" left by deleted data. However, compaction will still eventually have to be run, as the data continues to fragment and things eventually get bad.


If you're curious about Ordasity's load balancing strategies, check out Section 4: "Distribution / Coordination Strategy" in the docs: https://github.com/boundary/ordasity

The code is also fairly straightforward to follow if you'd like to take a glance in the repo as well.


Thank you -- I thought you had somehow found a way to measure the CPU load of Java processes; in fact, you leave it to the end user to measure the workload in their own units.


Hey Mitchell,

Thanks for asking -- in short, the use cases are a bit different. Finagle is a framework for building asynchronous RPC systems (e.g., services or APIs with transports over HTTP or Thrift), and Ordasity is a library for cluster membership, load balancing, and distribution.

Twitter's put together a nice toolkit atop Netty for building reliable services and describing communication between them. These services are generally stateless and might be balanced by something like HAProxy (when using an HTTP transport), or via a round-robin approach in the case of Thrift clients. A closer comparator for Finagle would be something like Scalang, our library for building hybrid Erlang / Scala distributed systems (though it's also excellent for pure-Scala systems as well): https://github.com/boundary/scalang

Ordasity is designed for describing stateful clusters in which individual nodes are responsible for claiming longer-lived work units -- think of it in terms of "a processing shard" of the system's total load. The library's primary goals are to help you:

- Describe the cluster in simple terms such that when each node comes online, it joins the cluster and is aware of all other nodes (and vice versa).

- Distribute work across the cluster by directing each node to claim an even "count" of work units or "even distribution" of the total load imposed by those work units.

- Automatically rebalance work units across nodes in the cluster as the amount / intensity of work or cluster topology changes.

- And gracefully manage maintenance or downtime scenarios by draining work units to another node.
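The two distribution policies above ("even count" vs. "even load") boil down to a target calculation on each node. A conceptual Ruby sketch of the idea -- not Ordasity's actual implementation, which is Scala and ZooKeeper-backed:

```ruby
# Conceptual model of claim targets in a work-distributing cluster
# (illustrative only; not Ordasity's code).

# Count policy: each node claims ceil(total units / node count).
def target_count(work_units, node_count)
  (work_units.size.to_f / node_count).ceil
end

# Load policy: each node claims units until it holds roughly
# (total load / node count), using per-unit load estimates.
def claim_by_load(unit_loads, node_count)
  fair_share = unit_loads.values.sum.to_f / node_count
  claimed = []
  load = 0.0
  unit_loads.sort_by { |_, l| -l }.each do |unit, l|
    break if load >= fair_share
    claimed << unit
    load += l
  end
  claimed
end
```

Rebalancing then amounts to re-running this calculation as nodes join or leave and releasing work units above the target.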

In the end, Ordasity and Finagle are both tools for building distributed systems on the JVM. However, the types of systems they're designed to describe are a bit different. Hope that helps; let me know if you'd like me to clarify something.


Let me propose a use case, and I would appreciate if you indicated if I'm totally off base.

A virtual world where actors (users) interact with each other, with the cluster providing validation and consistency between actors over a dynamic geography. Each node in the cluster would handle the realtime requests of actors capable of interacting with each other, while distant actors would live on different nodes.

(Would this work as a distributed server infrastructure for, say, a large online game?)


How does it compare to Norbert: http://sna-projects.com/norbert/ ?


Hi Roland,

Ordasity could certainly be used in building distributed JRuby applications. However, I'm not sure that pairing it with Rails would be the best use case.

Ordasity is designed for building stateful distributed services -- e.g., systems which can be described in terms of individual nodes serving a partition or shard of a cluster's total load. Rails applications tend to follow a stateless model in which any application server could serve any request.

Ordasity is more appropriate for building services in which a specific node will be serving all [requests / queries / events] for a work unit in the cluster. In our case, that's our netflow aggregation systems (which pull data together from a network of edge nodes into a single Kafka stream), and our event stream processing system (which we shard by client). In this example, we have two Ordasity clusters -- one comprised of Kafka nodes, and a second containing our event stream processing tier. Ordasity handles work distribution across both tiers, ensures that both clusters are aware of each other and able to communicate, and helps us guarantee that everything is wired up properly (i.e., each node on each tier is communicating with the proper node on another tier) when data is passed between them.

In short -- yes, you could build a distributed service in JRuby (or Clojure, Mirah, Scala, Java, etc.) with Ordasity. Just wanted to offer that note to clarify use cases.


I'd like to offer a counterpoint to the author's suggestion that "you'd be an idiot not to buy [and dispose of] a new laptop every year." I'm a software engineer and put a lot on my computer's shoulders, running an entire cluster's worth of software as part of my development environment.

I value longevity and durability in products I buy. It's nice to pick a machine and stick with it. It's a long-term companion. It's about slowing "disposable computing's" cycle of production and obsolescence. It feels good to prove that, with a few upgrades over its lifetime, a well-engineered product can be useful -- even as a primary computer -- for years to come.

My Spring 2008 MacBook Pro (http://cl.ly/2H1l2X1Q2w181Z2P1Y3P) will be three years old this Saturday, and I couldn't be happier with it. My previous machine was a 15" PowerBook G4 purchased in 2004. Both have been fantastic primary computers. An occasional upgrade and maintenance can make all the difference in extending the useful lifespan of a machine.

A few months in, I maxed out the memory to 4GB, which is still sufficient despite running our entire stack and an IDE or VM. Last summer, I replaced the 7200RPM drive with a 160GB X25-M. A few months ago, I added a second 48GB SSD drive via the ExpressCard slot to regain a bit of the storage sacrificed by choosing a faster drive. Over the three-year lifetime of the machine (so far), these upgrades cost about $625.

During that time, the manufacturer has also done a great job standing behind the laptop, replacing the keyboard/top case, one battery, and one power adapter. I'll take it in for one last servicing before the warranty runs out (to fix an unreliable Caps Lock key and clean the DVD-RW drive I never use), and may purchase one more battery at some point. Aside from this, it's in perfect condition and plenty fast enough for Java/Scala/Python/Ruby/Android development and testing.

This computer's followed me from the week I graduated college as an aspiring freelancer through three years of building a career in software engineering. It's got some life in it yet.


As much as I agree with you about 'disposable computing', he's not really advocating disposing of the computer - he's selling it to somebody else. In the process, he's allowing somebody who couldn't otherwise afford it to buy a long-lasting machine like yours.

It may even be true that he's helping the environment by allowing more people to buy longer lasting machines.


> he's allowing somebody who couldn't otherwise afford it to buy a long-lasting machine like yours.

Only very slightly. He's selling it for almost the same price he bought it for, but happens to have a student discount that makes it 15% cheaper.


Apple makes it VERY easy to migrate your entire system across machines, thus lubricating this entire process.

That, combined with resale value, makes this kit very attractive to a customer base that can instantly choose to be in your camp or the author's camp.

On the other hand, the market probably doesn't value modifications or add-ons as well, so re-selling my year-old MBP with optical replaced by SSD, 750GB 7200RPM rust platter, and 8GB RAM would probably not be "cost effective".


Lucky for you, that RAM, SSD, and additional drive in the Optibay are fairly standard and can follow you to the new system, unless Apple uses different RAM in the next rev. I'm in a similar boat, and the only thing that gnaws at me is that when I bought the 8GB of RAM it was around $600; now you can get it for close to $200. The RAM has depreciated faster than any other part of my laptop.


Hate to add to your pain, but I actually bought my 8GB (2x4GB) kit for $80 about two months ago.

Aside from an MB Air I bought for my dad (non-upgradeable RAM), I always tell folks to go minimum on RAM spec and find suitable add-on RAM, often 3-6 mo later.


A few of these options are good in principle, but are not necessarily informed by the reality of operational experience with the more common failure modes of AWS at medium to large scale (~50+ instances).

The author recommends using EBS volumes to provide for backups and snapshots. However, Amazon's EBS system is one of the more failure-prone components of the AWS infrastructure, and lies at the heart of this morning's outage [1]. Any steps you can take to reduce your dependence upon a service that is both critical to operation and failure-prone will limit the surface of your vulnerability to such outages. While the snapshotting ability of EBS is nice, waking up to a buzzing pager to find that half of the EBS volumes in your cluster have dropped out, hosing each of the striped RAID arrays you've set up to achieve reasonable IO throughput, is not. Instead, consider using the ephemeral drives of your EC2 instances, switching to a non-snapshot-based backup strategy, and replicating data to other instances and AZs to improve resilience.

The author also recommends Elastic Load Balancers to distribute load across services in multiple availability zones. Load balancing across availability zones is excellent advice in principle, but still succumbs to the problem above when EBS is unavailable: ELB instances are also backed by Amazon's EBS infrastructure. ELBs can be excellent day-to-day and provide some great monitoring and introspection. However, having a quick Chef script to spin up an Nginx or HAProxy balancer and flipping DNS could save your bacon in the event of an outage that also affects ELBs, like today's.

With each service provider incident, you learn more about your availability, dependencies, and assumptions, along with what must improve. Proportional investment following each incident should reduce the impact of subsequent provider issues. Naming and shaming providers in angry Twitter posts will not solve your problem, and it most certainly won't solve your users' problem. Owning your availability by taking concrete steps following each outage to analyze what went down and why, mitigating your exposure to these factors, and measuring your progress during the next incident will. It is exciting to see these investments pay off.

Some of these investments:

– Painfully thorough monitoring of every subsystem of every component of your infrastructure. When you get paged, it's good to know exactly what's having issues rather than checking each manually in blind suspicion.

– Threshold-based alerting.

– Keeping failover for all systems as automated, quick, and transparent as is reasonably possible.

– Spreading your systems across multiple availability zones and regions, with the ideal goal of being able to lose an entire AZ/region without a complete production outage.

– Team operational reviews and incident analysis that expose the root cause of an issue, but also spider out across your system's dependencies to preemptively identify other components which are vulnerable to the same sort of problem.
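Threshold-based alerting in particular is simple to sketch; a generic Ruby example (the metric names and bounds are illustrative, not tied to any monitoring product):

```ruby
# Minimal threshold alerter: compare each reported metric against a
# configured upper bound and return the violations for paging.
# Metric names and limits here are illustrative.
def check_thresholds(metrics, thresholds)
  metrics.filter_map do |name, value|
    limit = thresholds[name]
    { metric: name, value: value, limit: limit } if limit && value > limit
  end
end
```

In a real system the metrics hash would be fed by your collection agent, and the resulting alerts routed to a pager rather than returned.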

---

[1] See the response from AWS in the first reply here: https://forums.aws.amazon.com/thread.jspa?messageID=239106&#...


Pushing a language, its libraries, and extensions which have traditionally presumed linear execution into a threaded/parallel world is a minefield. I'd like to offer a few thoughts on some factors which might have contributed to this complexity, to reframe the question of what might constitute the "best" concurrency strategy in terms of tradeoffs, and conclude with a call for us to take up the task of reasoning about our programs in a parallel context.

We've seen an explosion of interest in non-threaded, single-process approaches to concurrency in the Ruby and Python communities in the past couple years. Much of the difficulty here lies in frameworks, libraries, and development paradigms which were not designed to be threadsafe (or to teach threadsafe programming) from the start. Developers and library authors have for years been able to operate under the assumption that Ruby code in a single process is executed serially and not in parallel. In environments which permit parallel, concurrent execution inside the boundaries of a single process, authors must return to the foundations of what they've created and scrutinize every bit, reconceptualizing their programs in terms of shared state and mutation.

Individual developers relying on these libraries must also scrutinize each and every gem they require to ensure that the code is threadsafe as well. This is not an easy task, as reasoning about state is inherently difficult, especially when the original program may not have been designed with concurrency in mind, and even more so when one was not the original author. Further, we haven't seen a significant chasm between things which "are" or "are not" threadsafe emerge in the Ruby community, and it is not common to certify one's code as "threadsafe" on release. That said, it's commendable that the Rails team has worked diligently to piece through each component of the framework and certify it clean.

I do not mean to suggest that multithreaded code written in Ruby is uncommon, unsafe, or a bad idea in general. Many applications running in environments without a GIL/GVL such as JRuby utilize this functionality effectively today. What I'm suggesting is that process-based concurrency (Mongrel/Passenger/Unicorn, et al) is often favored by developers because it eliminates a large swath of potential pitfalls (or rather, trades them for increased usage of system resources).

In light of this, we've seen developers in the Ruby and Python communities experimenting with and popularizing alternate concurrency models, not the least of which include cooperatively-scheduled fibers/co-routines and evented programming. These approaches avoid a specific subset of the challenges in concurrent multithreaded execution, while enabling limited concurrency in one process. Matt is quick to point out that these models don't achieve true concurrent parallelism, but do offer significant benefit over standard serial execution. At the same time, they impose a different sort of complexity upon the programmer -- either requiring her to reason about a request or operation in terms of events and callbacks, or to use a library to cooperatively schedule multiple IO-bound tasks within a single VM (which may hide some complexity, but introduce uncertainty by clouding what is actually happening behind the scenes).

I wish Ruby and Python the best here. I have a significant investment in both. But as long as developers must ask of libraries "Is it threadsafe?" with fear and negative presumption, traditional multithreaded concurrency might not be ideal given the investment required to achieve it safely and correctly.

More than anything, I hope for continued complex reasoning and thought on this question. We simply must move beyond reductionist statements such as "threads are hard!" to progress. I'd suggest that threaded programming in and of itself is not hard per se -- reasoning about shared state is, and the developer bears a responsibility to either eliminate, or minimize and encapsulate it properly in her code. Evented and coroutine-oriented programming brings its own challenges. Actor models are interesting and may be appropriate for some contexts. STM's pretty cool too, but it would not be proper to hang one's hopes upon something which is not likely to cure all ills.
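To make the "minimize and encapsulate" point concrete, here's a minimal Ruby sketch: shared state owned by a single object behind a Mutex. On a GIL-free runtime such as JRuby, the bare read-modify-write this replaces can interleave across threads and lose updates.

```ruby
# A counter that encapsulates its state behind a Mutex, so callers never
# touch the shared variable directly. The unsynchronized alternative
# (`@count += 1` from many threads) is a read-modify-write that can
# interleave under true parallelism.
class SafeCounter
  def initialize
    @count = 0
    @lock = Mutex.new
  end

  def increment
    @lock.synchronize { @count += 1 }
  end

  def value
    @lock.synchronize { @count }
  end
end
```

The design choice is the point: by giving the state one owner and one lock, the rest of the program never has to reason about it at all.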

There is no panacea for concurrency. There will always be challenges and tradeoffs involved in developing performant, efficient applications within the constraints of both hardware and programmer resources. We would do well to push ourselves toward a greater understanding of the challenges in developing such programs, as well as the tradeoffs involved with each approach. I think this article is a step in the right direction.


@cscotta as the author of the blog post, I can only agree with the concerns you raised and thank you for your kind words.

Concurrency is something both Ruby and Python have to take seriously pretty quickly if both communities want to mature and play a more important role in the coming years. As you pointed out, there are different ways to get some sort of concurrency and they all present challenges and tradeoffs while none will cure all ills. But at the end of the day what matters is a community of people embracing these approaches and pushing the language further.

By removing the Global Interpreter Lock, most of the alternative Ruby implementations are making a statement, and the community seems to be reacting. If we educate the community, commonly used code will become threadsafe over time and developers will learn what it means to write threadsafe code. If made easy, co-routines and non-blocking IO will be used. If implemented well and properly explained, actors will be used more often when it makes sense to do so.

My goal was to try to simplify the concurrency problem so the community as a whole can discuss the topic. I think the concurrency situation can be improved by increasing awareness, getting people motivated, and having an open discussion. There is a reason why Ruby and Python still have a GIL, and why removing it to improve concurrency would have its downsides; I want to make sure people understand that before blaming Matz for not having removed the GIL already. The fragmentation of Ruby implementations also hurts progress in MRI -- alternative implementations often have more people working on them than MRI does!

Let's hope that this article is indeed a step in the right direction.


Absolutely - thanks, Matt. This is a very well-written, thorough article, and it provides a clear, thoughtful survey of the landscape for folks who are interested in parallelizing their programs and learning more about the advantages and tradeoffs implicit in different approaches. This piece has a lot of potential to catalyze thought within the community about how best to move forward.

The explosion of interest in the subject intrigues me. The work that's been going on in terms of making mainstream Ruby libraries threadsafe is great to see, along with fibers and evented approaches. MacRuby's exposure of Grand Central is especially unique. But interest and upvotes must also be followed by action. There will be a lot of false starts. We'll probably see a handful of approaches which bloom and fade from popularity. While the "fragmentation" issue is there, I tend to think of it in terms of spirited experimentation. In any case, people really care now, and it's that drive that forces a community forward. That's good to see.

Keep it up!


"It's maddening to see an article that you think will let you make progress in a problem you're working on, but unable to access it because of the $15 fee..."

Enough progress to justify spending $15 (less than the price of dinner downtown)? I agree, many find the revenue models / cost structures of professional organizations outmoded. However, I find this rather harsh aversion to paying a few bucks for peer-reviewed, scientific content curious.

If it's of any help to you, many public universities, schools, and local libraries provide full access to ACM content, often both on-site and off. The most common method of providing access to this is via EBSCO's "Business Source Premier" database. I frequently read ACM articles from my desk via the web; a quick title search in EBSCO will pull up the title in about 20 seconds, and I can download a PDF of about any article published since 1965 to send to coworkers.

That said, if price is an issue, please check with your local library. Odds are good that your tax dollars are already paying the cost for you to read these articles from the comfort of your home or office. This isn't just true for the ACM -- even in the age of paywalls, your library's probably been quietly working to provide digital access to all of this for the past decade.


You mean, check with my local library here in Poland, right?

Don't place me in the "doesn't want to pay for content" box. I am OK with paying for content and I do pay for many things online. But there are two issues with ACM:

1. The research has been paid for with taxpayers' dollars (I wasn't the taxpayer, but still).

2. $15 is really expensive.

And of course, if it were just one article that would advance my work a lot, I'd gladly pay. But you don't know that ahead of time. And if you're building startups, you usually do a lot of wide-area research, so it isn't that one article, it's hundreds of articles that you need to skim through.

I also don't buy the argument that we need to pay so much just so that we get peer-reviewed content. JMLR (the Journal of Machine Learning Research) is a prime example that this need not be the case.


You can put me in the "doesn't want to pay for content" box when it comes to science. Science, including computer science, works best when new discoveries are spread far and wide, free of charge. Journals make their money by securing publication rights in exchange for deciding that something is important enough to publish. Once, it was difficult to publish information to a wide audience, but in the web age, journals seem like a bit of a scam to me.


The curation job still needs paying for, but I think it's pretty clear that the ACM and others have strayed from that to squeezing the long tail for as much money as they can get.


Besides, does peer review even cost them anything? The one time I was asked by an ACM journal to review a paper, there was no monetary exchange involved.


Yes, or the library of the nearest university is likely to grant you access for a small fee even if you are not a student there, and you'll also get access to their books.


Problem is, most papers aren't that good or aren't relevant to what you're doing, but it's usually hard to tell based on the abstract. When I do research on a subject, I find ~10 papers, of which I'll read (parts of) maybe one. If I were to buy each of them for $15, I'd end up paying $150 and only get a few pages of useful material out of it.


Exactly, without a strong reference from someone you trust who also knows the subject, you have no idea if the article is worth $15. And then you could probably just make a copy of their copy.


There is no way you can know whether the content of an article will actually be useful -- there goes $15 down the drain. If my local library has ACM access, yay... but it doesn't. So they get content for free from academics and sell it at a profit, while the academics get nothing for their work and have their content limited to an audience that might not be the one they actually want to reach. It could be, but they've signed away their content for nothing and have no control over who can see it -- not even they can see it. Some academics publish just because they have to, so ACM is fine for them, but a lot are not like that, and ACM is taking advantage of the situation.

The only reason I would pay for content is to give back to the author for taking the time to commit to paper the information that was useful to me. Paying to some organization that's taking advantage of people for profit... no, thank you.


> However, I find this rather harsh aversion to paying a few bucks for peer-reviewed, scientific content curious.

I already paid for it. A fraction of my taxes go to funding educational organizations so that they can produce these papers.

Even then, if the $15 actually went back to the author, I'd be OK with double-dipping, but it doesn't.


Just because an organization receives some tax-based revenue from the government doesn't mean its operations are or can be fully funded by that revenue.


I'd gladly pay the author of the paper, the ACM not so much.


If you know the article you want to read and immediacy isn't a concern you can probably look the authors up and send them an email. Many would be happy to send you a pdf.


I've frequently Googled paper names from ACM and found that the author has put them on their own site as direct downloads. So, yes, I definitely second you but.. check their site first before you e-mail them too ;-)


It depends on the article. Yes, there are some articles I would gladly pay $15 for (like Boneh/Shaw's work on collusion secure fingerprinting). However, in a lot of cases, I'm going to read a paper for its citations. If I have to pay $15 for that paper, and then pay over and over again for each additional paper that I want, the costs quickly become unsustainable.

Though, as the replies to my question (http://news.ycombinator.com/item?id=2133193) point out, many times you can get a paper that you need by contacting one of the authors, so the payment isn't as big an issue as it could be.


"If I have to pay $15 for that paper, and then pay over and over again for each additional paper that I want, the costs quickly become unsustainable."

That's a clue as to who the pricing is geared towards: staff of deep-pocketed corporations and institutions, not individual web users.


/r/scholar on Reddit is very useful for this. A polite request almost always brings a .pdf within a few hours, or a day or so at the most.

However, I find this rather harsh aversion to paying a few bucks for peer-reviewed, scientific content curious.

I don't have a harsh aversion to paying for content, unless I've already paid for it with my taxes, which is often the case with scientific papers.

I do have a harsh aversion to rent-seeking middlemen who do nothing but put up a website and collect $15 fees to let me download other peoples' work.


Mike's a great guy, and I appreciate his post, but I would respectfully disagree with the proposal to strip the standard library down to bare bones and break the removed classes out into gems (along with the post's title, which I object to but won't take issue with here).

Despite its lack of use in web projects, DRb is one of Ruby's most interesting modules. Geoffrey Grosenbach's recounting of _why's 2005 FOSCON presentation is among the highlights - see here for an example: http://www.smashingmagazine.com/2010/05/15/why-a-tale-of-a-p...
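For anyone who hasn't played with it, here's a minimal sketch of what DRb does - sharing a plain Ruby object over the network and calling its methods through a proxy. The port and class name here are arbitrary, and in real use the client would run in a separate process:

```ruby
require 'drb/drb'

# A trivial service object to expose over DRb.
class Greeter
  def hello(name)
    "Hello, #{name}!"
  end
end

# Server side: share the object on a druby:// URI.
DRb.start_service('druby://localhost:8787', Greeter.new)

# Client side: obtain a proxy and call methods on it as if it were local.
# Arguments and results are marshalled across the wire transparently.
remote = DRbObject.new_with_uri('druby://localhost:8787')
greeting = remote.hello('world')   # => "Hello, world!"
puts greeting
```

That transparency - remote objects behaving like local ones with almost no ceremony - is what made _why's demos so memorable.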

The English RDoc is eminently fixable as well, whether by a core contributor or not. In fact, the first commenter on the post highlights a quick guide to contributing to the language.

More importantly, the process for formalizing the Ruby language into an international specification is progressing nicely, first with Japan's ISC committee, and later with ISO. While the draft specification is restricted to the language itself and does not include several components of what we know as the standard library, it would be prudent for the language itself to continue to stabilize. You can find the 12/2009 draft spec here: http://ruby-std.netlab.jp/draft_spec/draft_ruby_spec-2009120...

Finally, many groups of developers have made tremendous progress on non-MRI/YARV implementations, such as JRuby, Rubinius, IronRuby, and MagLev, with others such as SAP's BlueRuby continuing to show promise (passing over 75% of RubySpec as of last year). Sweeping changes to the language and standard library impede the maturation of these implementations, placing additional burden upon their developers and sponsors, who are already doing tremendous work to advance the state of the language across multiple platforms.

We would also do well to remember that the community of English-speaking web developers using Ruby represents a small subset of the Ruby community as a whole. Though some of us might not find daily use for all of the modules in the standard library, it needn't be assumed that they should be removed and/or broken out.

Many languages experience similar growing pains. The Python standard library contains a few modules which seem out of place to most web developers, the JDK is a bit of a grab bag, and the .NET class library has a handful of oddities. Nonetheless, it's important to remember that many people use these components and they work quite well. What some may decry as cruft and stagnation, others might regard as a sign of stability and maturity.

What I appreciate most though is the civility of this discussion. Between Mike's first post, Eric's reply, and this follow-up, the conversation's professionalism and camaraderie is impressive. It's great to see programmers coming together to discuss improvements to the language and its direction in a calm, refreshing environment. I haven't written much Ruby in the past year, so it wouldn't be appropriate of me to offer an extended post as a reply, but I'm glad to see the discussion happening.


I agree. Mike is a great guy but the flamebait headlines are really not necessary.

