mayank's comments | Hacker News

I agree! But —

- BigQuery is available on GCP

- Bazel is an open source Blaze

- google source formatting (for java at least) is open source

There are probably more…


For those of us stuck on AWS it's sad not having BigQuery, but the thing that really gets me is not having Dataflow

Most of the industry still seems unaware that no-knobs data query and pipeline systems even exist. If only I had a dollar for every time I saw a PR tweaking the memory settings of some Spark job or Hive query that stopped running as the input data grew....

I'd love to see more people write their workflows using the Apache Beam API so they'll have the option to switch to a no-knobs, scalable pipeline engine in the future even if they're not using one today.


Blaze/Bazel actually sucks imo. The only good thing about it is that all of google uses that one piece of shit, making things nice and uniform and consistent.

There's a reason it isn't popular outside of elgoog.

IRL, every project tends to do a little bit or a lot of its own thing.


Bazel can be clunky, but not having some bazel equivalent can have very significant costs that are easy to get accustomed to or overlook.

Things like engineers losing time wondering why their node dependencies weren't correctly installed, or dealing with a pre-commit check that reminds them they didn't manually regenerate the generated files, or having humans write machine-friendly configuration that's not actually human-friendly because there's no easy way to introduce custom file transformations during the build.
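That last point is the sort of thing bazel's genrule covers: an arbitrary file transformation declared as an ordinary build step. A minimal sketch (the target, file, and converter tool names here are all hypothetical):

```python
# BUILD file sketch (Starlark). //tools:yaml2json is a made-up converter.
# Turns a human-friendly YAML config into the machine-friendly JSON the
# service actually loads, as a cached, dependency-tracked build step.
genrule(
    name = "service_config",
    srcs = ["service_config.yaml"],
    outs = ["service_config.json"],
    cmd = "$(location //tools:yaml2json) $(SRCS) > $@",
    tools = ["//tools:yaml2json"],
)
```

Anything depending on :service_config gets the generated file rebuilt exactly when the source or the converter changes, with no pre-commit reminder needed.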

Bazel doesn't spark joy for me and I wouldn't say I look forward to using it, but personally I would still always choose it for a codebase that's going to have multiple developers and last a long time. It's vastly easier to go with bazel from the beginning than to wish you could switch to it and realize people have already introduced a million circular dependencies and it's going to be a multi-month or multi-year process to migrate to it.


In my experience, Bazel is a net negative for most teams. Pretty much every JS engineer is familiar with npm; a tiny fraction are familiar with Bazel. Ditto with pip, cargo, etc. And it doesn't solve the hard part of the build process, which is distributed builds. Most of the user-perceptible value of Blaze is making builds fast by farming them out to a zillion machines — that's why it's called "blaze," because it's fast! — and Bazel doesn't do that for you.

And it's clunky, and you need to teach every new hire how to use it. The juice just isn't worth the squeeze. Just write an adapter in your build infra for the well known tools and be done with it. You'll get much more value putting work into something else, like code review tools, testing, dev environments, staging...


Being familiar with npm doesn't win back the time its clunkiness loses you. I see for myself and the engineers around me that we waste real time every week clearing our yarn caches, waiting for "yarn install", realizing we forgot to re-run "yarn install" after syncing so we have to re-merge and re-upload, etc.

Here's a real-world example from last week: I spent several hours doing creative hackery trying to convince a particular compiler to compile the same file using different dependencies depending on context. That would have been a few minutes of work with bazel, but when your build process consists of "run this off-the-shelf compiler" and the compiler has no native support for building two different versions of the same thing with slightly different dependencies, you're in trouble.

Teaching new hires bazel is a one-time cost, and the only thing they need to get started on their first day is "bazel build <target>" and "bazel test <target>". When you don't have bazel, every new hire spends their first day reading a gigantic wiki page explaining how to set up their dev environment (with different sections for Mac and Linux, special asides about slightly outdated OS versions, and then the wiki goes out of date and new engineers and the infra team all waste their time debugging why the instructions suddenly stopped working on slightly newer machines, etc.)
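For the record, the "same file, two dependency sets" problem above maps directly onto plain target definitions in bazel; a sketch, with all library and dependency names invented:

```python
# BUILD sketch (Starlark): one source file compiled twice with different
# dependencies, something a single-purpose compiler driver can't express.
cc_library(
    name = "client_prod",
    srcs = ["client.cc"],
    deps = ["//net:grpc_transport"],
)

cc_library(
    name = "client_fake",
    srcs = ["client.cc"],   # same file, different dependency graph
    deps = ["//net:fake_transport"],
)
```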


There are many remote build backends available for bazel, some open source and some proprietary (EngFlow and Google's RBE). Even without a build farm it's still a huge performance boost due to caching, something make and co (recommended in a sibling thread) cannot do, along with a bunch of other stuff bazel handles for you.


I've heard the "huge performance boost" argument before, and in my experience it's often kinda marginal, or even negative because the native build tools have been optimized for their specific task. (Sure, I'll give you that it's better than Make, but is it faster than the Go compiler?)

I wasn't familiar with EngFlow; it looks like a startup that raised seed funding less than a year ago. I think what you're referencing with "Google RBE" is a Google Cloud project since rebranded to "Cloud Build", which supports the underlying native tools like npm, pip, etc. without requiring you to switch to Bazel.

Bazel is better than Make for building large C/C++ projects (although it's hardly the only game in town for "better than Make"). But aside from that use case, in my experience it's not really worth the hassle. You can get most of the benefits you want without using it, and people are already going to be familiar with the tools native to the ecosystems they work in like pip, npm, etc.


> (Sure, I'll give you that it's better than Make, but is it faster than the Go compiler?)

It depends on what you're doing. If you're compiling pure Go code that's truly only Go, nothing will beat `go build`. But if you have cgo, or generated code, or generated data, or config files, or... then maybe you want something more flexible. And if you aren't building just Go, things get complicated fast. What if you have a backend and a frontend? `go test` probably isn't running the webdriver tests that compile your TypeScript somewhere in the build pipeline. Having a unified toolchain with one command (`blaze test //...`) is valuable compared to various test.sh scripts (or make layered on top of n independent build systems, or...)

And of course if you're like me and need to do things that involve reasoning or searching about code and dependencies, blaze is super necessary. "Find all of the dependencies of this file and test only those" is a question that most build tools aren't even remotely equipped to answer.

So in polyglot situations, bazel and similar prevail I think, but there's absolutely a point below which you don't care (and that point is going to be basically only hit by applications, not libraries).
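"Test everything that depends on this file" is, in bazel terms, `bazel query 'rdeps(//..., <target>)'`; conceptually it's just a reverse-graph walk. A toy Python sketch over a made-up dependency map:

```python
from collections import defaultdict, deque

def affected_targets(deps, changed):
    """Return every target that transitively depends on `changed`."""
    rdeps = defaultdict(set)          # invert the dependency edges
    for target, target_deps in deps.items():
        for d in target_deps:
            rdeps[d].add(target)
    seen, queue = set(), deque([changed])
    while queue:                      # BFS over reverse edges
        for parent in rdeps[queue.popleft()]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen

# Made-up graph: //app -> //lib -> //base, and //tool -> //base.
deps = {"//app": {"//lib"}, "//lib": {"//base"}, "//tool": {"//base"}}
print(sorted(affected_targets(deps, "//base")))  # ['//app', '//lib', '//tool']
```

A real build tool additionally knows this graph is accurate, because sandboxing prevents undeclared edges; that's what makes the query answer trustworthy.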


Depends on which Make we're talking about; commercial Make tooling like ClearMake has certainly been able to do it since the late 1990s.


I actually worked with clearmake and a bunch of other Rational tooling in one of my early gigs and don't remember much in the way of cache improvements. As soon as you're doing non-sandboxed IO, which make-based tools totally allow, it's out the window anyway.


Curious how ClearCase works, I found https://www.ibm.com/docs/en/rational-clearcase/7.1.2?topic=s.... They appear to hijack the open() and write() syscalls, so in a way they actually do have a sandbox that gives more accurate knowledge of the build graph than what the makefiles declare! Otherwise yes, makefiles themselves are very unsafe and frequently have errors such as underspecified dependencies or implicitly produced outputs, with no guardrails to prevent them.

Whether or not that sandbox blocks incorrectly declared dependencies is unclear. The last time I used ClearCase, many eons ago, it surely did not: our project had tons of classic makefile issues like not depending on included headers. Remote builds were also magnitudes slower than local builds (our network was maybe not the best), and reading the page above on how the "shopping" algorithm works, you can imagine it being fairly slow anyway. Maybe that was for the best; imagining what incorrect dependencies mixed with remote caching would produce gives me nightmares.


All object files and libraries would be cached and shared across builds using the same views (Derived object sharing).


Given how Borg, unleashed on the world as K8s, came to dominate orchestration, Bazel's lack of similar uptake outside the Google walled garden suggests it isn't solving problems for non-Google teams significantly better than existing build tools.

Look at how much pain many companies have gone through to move to K8s from existing infrastructure; there is perceived value driving that.

Bazel lacks that perception of value.


>there's no easy way to introduce custom file transformations during the build.

Every real-world build system I’ve seen provides that functionality. In particular a standard make(1) can do it just fine.


make(1) has no native support for giving each build rule its own sandboxed view of the filesystem like bazel does.

If I could have a wish to upgrade file transformation with make(1), I'd probably want a widely-available, standard, simple command to make a rule-specific virtual filesystem that overlays a configurable read-only/copy-on-write view of selected existing files or directories behind a writable rule-specific output directory.
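That overlay wish is roughly what bazel's local sandbox already does: populate a scratch directory with symlinks to the declared inputs, run the rule there, and read back the declared outputs. A rough Python sketch (the function name and layout are my own invention, and a real implementation would also want copy-on-write and network isolation):

```python
import os
import subprocess
import tempfile

def run_sandboxed(inputs, cmd, out_name):
    """Run `cmd` in a scratch dir containing only the declared inputs,
    so the rule can't read undeclared files by relative path."""
    with tempfile.TemporaryDirectory() as sandbox:
        for name, src in inputs.items():
            # Expose each declared input under its in-sandbox name.
            os.symlink(os.path.abspath(src), os.path.join(sandbox, name))
        subprocess.run(cmd, cwd=sandbox, check=True)
        with open(os.path.join(sandbox, out_name), "rb") as f:
            return f.read()
```

Usage would look like `run_sandboxed({"in.txt": "path/to/file"}, ["sh", "-c", "tr a-z A-Z < in.txt > out.txt"], "out.txt")`.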


Why would you want it to mess with sandboxes? It’s a build system. There are other mechanisms for sandboxing, no need to reinvent the wheel.

Bazel is as non-standard as it gets, essentially yet another case of Google NIH. But apart from that, how is an ad-hoc single-use pseudo-language better than reusing a standard mechanism? To me it's just bad design.


I don't particularly want to "mess with sandboxes", but I do want my builds to be relatively fast, correct, reproducible, and extendable/customizable, with bonus points for being secure (meaning a compiler shouldn't be able to tamper with parts of the output it has no business tampering with) and more bonus points for supporting distributed builds and/or distributed caching.

If someone wanted to build a new system to compete with bazel and offer those kinds of features, it's probably a safe bet the competing system would use some kind of sandboxing as well.

Even if you ignore everything else, just the security part is a big deal: supply chain attacks are an increasingly big concern for companies of all sizes. If your build system allows any script invoked during any part of build process to secretly read or modify any input or output file, hackers are going to love it.

Almost all tech companies (even the multi-billion dollars ones) that aren't doing something in the spirit of `bazel build` to generate their binaries have wide open, planet-sized security holes in their build systems where if you get one foot in the door you can pretty much do anything.


Are you calling bazel a nonstandard single use pseudo language, but make a standard tool?

That's just an argument from tradition.

And you want sandboxing because that's what gets you good caching. The value of bazel is never having to run make clean because artifacts weren't being correctly rebuilt from cache. Having no distinction between clean and incremental builds is really nice.


That is an argument from tradition - which you yourself have brought up, calling Bazel a "standard way".

>The value of bazel is never having to run make clean because artifacts aren't correctly being built from cache.

You can get that with make(1) too, check out FreeBSD's META_MODE for one example. And it didn't require reinventing the wheel.


> which you yourself have brought up, calling Bazel a "standard way".

I didn't do any such thing. My point is simply that make and bazel are similarly "nonstandard single-use pseudo-languages". In many ways bazel is superior to make as a language (it resembles other languages more closely, being a dialect of Python, and avoids the load-bearing tab issue), so I could argue bazel is in many ways less nonstandard; but make is certainly more common than bazel, so it could go either way.

> You can get that with make(1) too, check out FreeBSD's META_MODE for one example.

This suffers from the same issues plain make does (notably the whole mtime thing). See https://apenwarr.ca/log/20181113 for a much better explanation than I can give of why make's entire model of "change" is irreparably broken, and why hash (+sandbox!) based approaches (which bazel and redo and nix and cargo and nearly every other modern build tool use) are far superior.

> And it didn't require reinventing the wheel.

You call inventing a new syscall to not-even-fully fix a limitation of the tool not reinventing the wheel? I guess it's not; it's more like building a weird grand shrine around the broken wheel, which is far worse. I don't want to have to change my operating system to make make work better (and still worse than the alternatives). That's simply not a compelling argument.
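The hash-based model from the linked post fits in a few lines; a sketch, with the digest store as a plain dict:

```python
import hashlib

def needs_rebuild(path, last_digests):
    """Content-based change detection: a touched-but-otherwise-identical
    file (the case that fools make's mtime model) triggers no rebuild."""
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    if last_digests.get(path) == digest:
        return False              # same bytes: cached artifact is valid
    last_digests[path] = digest   # record the new content hash
    return True
```

Checkout churn, clock skew, and `touch` all become non-events; only real content changes invalidate the cache.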


As a googler, I agree with this. Blaze itself is not that remarkable. What makes it good is that it is the only tool you need to use to build anything, in any language, in Google's monorepo, and that build failures are (usually) aggressively corrected. Plenty of build systems these days support varying levels of hermetic builds. The important thing to copy from Google here is the discipline and consistency around using one build tool and fixing build failures, not the specific tool itself.


It isn't popular? As in "Uber and a dozen similarly sized companies and some major OSS projects use it" not popular?


Uber engineering has a few gems but for every 1 gem they've hired 10 or more greedy idiots, so maybe not a great example?

If you love fighting with your hands tied behind your back, choose Bazel.

Otherwise, be pragmatic: Learn Make, Maven, and Gradle; then you'll be well-equipped for 95-99% of cases. Thankfully pip and npm are as straightforward as it gets.


> Uber engineering has a few gems but for every 1 gem they've hired 10 or more greedy idiots, so maybe not a great example?

What an odd sprinkling of something entirely personal.

> Otherwise, be pragmatic: Learn Make, Maven, and Gradle; then you'll be well-equipped for 95-99% of cases.

There's a time and a place for Bazel. Very large monorepos like those at Pinterest and Uber, with cross dependencies, and written in multiple languages benefit a lot from the remote backend and distributed cache of built artifacts.

Even for JVM-based projects alone, Make, Maven, and Gradle don't seem entirely comparable.


Can you explain why you bother with pip and npm if php and plain js already cover 95-99% of cases?


Because I do whatever it takes to get the job done.

I also appreciate what Python and Javascript offer, there are some amazing libraries and tools tied to those ecosystems.


> The big difference now is the convergence of code written for both server and client, and compilers which help strip down and optimize what happens in the client.

I understand the spirit of your comment, but this was/is also true of Google Web Toolkit (GWT).


I'm going to have to take that on faith; their site doesn't appear to ship the JS necessary to open the nav menu. But clicking through a few links confirmed what I recall: the major difference (apart from language) is the component/templating approach. Not that one is inherently better than the other (though I personally prefer JSX), but bringing this concept to a dev environment that has thus far mostly lacked it is a good thing for users. And with more variety, there are better odds users will get that experience.


Yes, but modern DI (like Dagger for Java) can detect cyclic dependencies at compile time and break the build.
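Compile-time cycle detection is a plain depth-first search over the object graph; a toy sketch of the check a DI processor like Dagger performs (the graph contents here are invented):

```python
def find_cycle(deps):
    """Return one dependency cycle as a list of nodes, or None."""
    visiting, done = set(), set()

    def dfs(node, path):
        if node in visiting:                      # back-edge: cycle found
            return path[path.index(node):] + [node]
        if node in done:                          # already proven acyclic
            return None
        visiting.add(node)
        for dep in deps.get(node, []):
            if (cycle := dfs(dep, path + [node])):
                return cycle
        visiting.discard(node)
        done.add(node)
        return None

    for start in deps:
        if (cycle := dfs(start, [])):
            return cycle
    return None

# A -> B -> C -> A is rejected; the build would break here.
print(find_cycle({"A": ["B"], "B": ["C"], "C": ["A"]}))  # ['A', 'B', 'C', 'A']
```

Reporting the actual cycle path (not just "a cycle exists") is what makes the compile error actionable.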


Curiously, I've actually seen people use @Lazy in Spring projects to allow for circular dependencies, to deal with situations where the services or their dependencies aren't structured like a neat tree with leaves but rather as an interdependent graph with cycles.

Honestly, in practice it worked and wasn't too bad to work with, which was interesting to behold. Everyone talks about how circular dependencies are bad for a variety of reasons (endless loops when trying to print or process data come to mind), but there that system was, chugging along without a care in the world.


I know, but thinking through what you're doing and building a program for right now, rather than for a future that may never happen, will probably deliver more success. I personally like constructor-based dependency injection. It's simple, and I don't see a lot of value in obsessing over the possibility of code reuse.


I can detect a cyclic reference before I even finish writing the code, what's the point of letting the compiler figure it out?


> I can detect a cyclic reference before I even finish writing the code, what's the point of letting the compiler figure it out?

The compiler generally has better attention to detail and the ability to deal with larger object graphs than the typical human.


Sure, but how are you going to write code that uses a class with a circular dependency?


> Aurora on spot

How would that work for a database? And have you considered or tried Aurora Serverless?


I guess the same way as Aurora Serverless. In general, Aurora uses a storage layer separate from the actual instances, so you can do vertical scaling/upgrades with zero downtime (either a read replica goes down or a replica becomes the master).


We run 3 nodes 24/7 (writer + 2 readers) and have RIs for them. But during daytime hours we autoscale and run an additional 8 or 9 readers. Some of these run for just a few hours and could easily run on spot (especially with minimum duration).

Serverless couldn't provide enough capacity for us (at peak we use up to 300 vCPUs in this cluster). That was on v1; v2 might change that once it adds Postgres support.


I don't think this is cynical, it's a restatement of the "nobody gets fired for buying IBM" adage. Choosing PlanetScale at this stage in its market penetration is almost certainly going to be seen as the riskier business choice over RDS Aurora (even though it's not quite apples:apples) or GCP Spanner (which is). You can choose to spend your innovation token there, but you better be prepared for the downside when PlanetScale goes down and everything else is up.


I have my problems with this statement, mostly because IBM actually does suck. The number of products it's bought and promptly flown full-thrust into the ground is large. AWS has its problems, but 99.99999% of the time the "core" products work perfectly fine; I don't know about its multitude of side hustles, though.


I think you're missing the context that the adage "nobody gets fired for buying IBM" dates back to the 1970s at least. It's not meant to refer to IBM as it is now, but rather to IBM as it was then, an industrial behemoth with a near monopoly on business machines, some of which happened to be general purpose computers.


IBM used to be good


I love the Innovation Token blog post, but it's been a while since I've seen the OG post. Do you have a link?



An interesting aspect that I've seen missing from discussions on this is the dark growth pattern at play here.

All across the world, there are millions of languishing enterprise/educational fleets (generally Windows) administered by a single or a few "IT folks". In my experience, these people are often the only technologically aware people in the organization. This is Norton cutting them in on the grift. Install acclaimed "Norton Antivirus", which is likely already a line item on procurements, mine crypto on the sly for yourself on idle fleets at night, and give Norton a slice of the pie while you're at it.

Nobody's likely going to look twice at Norton running on a machine, we've already collectively conditioned ourselves to think of antivirus software as "slowing the machine down", and as a bonus: unless the next person in is as up to speed as you are, your grift can keep running long after you've moved on.


> There's overhead to connecting to a different cluster server each time you access a separate shard.

While I agree with your point, this concern is easily addressed with connection pools.
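A per-shard pool keeps one warm set of connections per cluster server, so routing a query to a shard you've already talked to costs a dict lookup instead of a fresh TCP-plus-auth handshake. A toy sketch (`connect_fn` stands in for a real driver's connect call):

```python
class ShardPools:
    """Lazily maintain one connection free-list per shard endpoint."""

    def __init__(self, connect_fn, pool_size=4):
        self._connect = connect_fn
        self._size = pool_size
        self._pools = {}  # endpoint -> list of idle connections

    def get(self, endpoint):
        """Check out a connection, dialing only when the pool is empty."""
        pool = self._pools.setdefault(endpoint, [])
        return pool.pop() if pool else self._connect(endpoint)

    def put(self, endpoint, conn):
        """Return a connection; discard it if the pool is already full."""
        pool = self._pools.setdefault(endpoint, [])
        if len(pool) < self._size:
            pool.append(conn)
```

A production pool would add health checks, timeouts, and thread safety, but the routing cost argument is already visible here.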


> This judgment could have had devastating consequences and turned software development into a copyright nightmare.

It’d sure have made the practice of taking someone else’s API and re-implementing the innards a lot more interesting:

https://docs.oracle.com/en-us/iaas/Content/Object/Tasks/s3co...


The WINE and ReactOS guys must be cheering!


It's a win for all open software. Torvalds and Stallman didn't ask Bell's permission before re-implementing Unix.


Why was this comment downvoted? Does GNU/Linux not largely reimplement proprietary Unix?


Yes why? Doesn't it?


It does. There are some who would argue that the Linux copyright fight is already resolved because of the SCO suits, but that's wrong because

A: some of those suits are still ongoing, and

B: the suits never alleged infringement based on the API alone, SCO was claiming that Linux copied functional code in multiprocessing modules (we don't know which functions because they demand secrecy, even though it's open source).

Not even SCO, trolls that they are, were insane enough to claim that the APIs themselves are copyrighted.


> I'm curious on why you call these languages "romance" languages.

That's what they're called: https://en.wikipedia.org/wiki/Romance_languages


Git has large file storage now: https://git-lfs.github.com/


That is not Git itself: files stored this way don't increase your repo size, because only a pointer is committed. Same with git-annex, same with committing .txt files containing a URL to a file on S3.
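Concretely, what git-lfs commits in place of the file is a small pointer like this (the oid and size here are illustrative), while the blob itself lives on the LFS server:

```
version https://git-lfs.github.com/spec/v1
oid sha256:4d7a214614ab2935c943f9e0ff69d22eadbb8f32b1258daaa5e2ca24d17e2393
size 12345
```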


I'm using git-shell to serve my repo. Can those extensions run over git-shell? Or do I need to set up a new server?


They are separate. git-lfs needs a separate server that communicates with the Git client over HTTP (not SSH), while git-annex can use a variety of backends for storing the data (which don't need to be related to the Git repository), such as S3, FTP, WebDAV, Google Drive, ... and even git-lfs.

GitLab and Gitea come with a git-lfs server, but I don't know if there is any standalone server that you could use with git-shell.


Ok, thanks. This is why I hope large file support will be integrated in Git.

It's just too much of a hassle to figure out the security implications of yet another server, and administrating it, setting up the connection between Git and the plugin, etc.

