zjaffee's comments

It's not just about how much data you have, but also the sorts of things you are running on your data. Joins and group by's scale much faster than any aggregation. Additionally, you have a unified platform where large teams can share code in a structured way for all data processing jobs. In that sense, it's similar to how companies use k8s as a way to manage the human side of software development.

I can however say that when I had a job at a major cloud provider optimizing Spark core for our customers, one of the key areas where we saw rapid improvement was simply using fewer machines: vertically scaled hardware almost always outperformed any sort of distributed system (albeit not always from a price-performance perspective).

The real value often comes from the ability to do retries, leverage leftover underutilized hardware (i.e. spot instances, or your own data center at times when scale is lower), handle hardware failures, etc., all with the ability for the full above suite of tools to work.


Other way around. Aggregation is usually faster than a join.


Disagree, though in practice it depends on the query, cardinality of the various columns across table, indices, and RDBMS implementation (so, everything).

A simple equijoin with high cardinality and indexed columns will usually be extremely fast. The same join in a 1:M might be fast, or it might result in a massive fanout. In the case of the latter, if your RDBMS uses a clustering index, and if you’ve designed your schemata to exploit this fact (e.g. a table called UserPurchase that has a PK of (user_id, purchase_id)), it can still be quite fast.

Aggregations often imply large amounts of data being retrieved, though this is not necessarily true.
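
To make the fanout point concrete, here is a minimal sketch in Python rather than SQL. The table contents and the `hash_join` helper are made up for illustration; no real RDBMS works exactly like this, but the row counts behave the same way.

```python
from collections import defaultdict

def hash_join(left, right, key):
    """Inner equijoin of two lists of dicts on `key`, using a hash index on `right`."""
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    # Every left row is paired with every matching right row.
    return [{**l, **r} for l in left for r in index.get(l[key], [])]

# Hypothetical data: 3 users, one profile row each (1:1) ...
users = [{"user_id": i} for i in range(3)]
profiles = [{"user_id": i, "name": f"user{i}"} for i in range(3)]
# ... and 1,000 purchases per user (1:M).
purchases = [{"user_id": i, "purchase_id": p} for i in range(3) for p in range(1000)]

one_to_one = hash_join(users, profiles, "user_id")
one_to_many = hash_join(users, purchases, "user_id")
print(len(one_to_one), len(one_to_many))
```

The 1:1 join returns as many rows as there are users, while the 1:M join returns one row per purchase, which is the fanout described above.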


That level of database optimization is rare in practice. As soon as a non-database person gets decision making authority there goes your data model and disk layout.

And many important datasets never make it into any kind of database like that. Very few people provide "index columns" in their CSV files. Or they use long variable length strings as their primary key.

OP pertains to that kind of data. Some stuff in text files.


How is a proper PK choice a high level of optimization?


unconvinced. any join needs some kind of seek on the secondary relation index, or a bunch of state if you're stream joining to build a temporary index of size O(n) until end of batch. on the other hand summing N numbers needs O(1) memory and if your data is column shaped it’s like one CPU instruction to process 8 rows. in “big data” context usually there’s no traditional b-tree index to join either. For jobs that process every row in the input set Mr Join is horrible for perf to the point people end up with a dedicated join job/materialized view so downstream jobs don’t have to redo the work


An aggregation is less work than a join. You are segmenting the data in basically the same way in ideal conditions for a join as you are in an aggregation. Think of an aggregation as an inner join against a table of buckets (plus updating a single value instead of keeping a number of copies around). In practice this holds, with aggregation being a linear amount faster than a join over the same data. That delta is the extra work the join needs to do to keep around a list of rows rather than a single value being updated (and in cache) repeatedly. Depending on the data this delta might be quite small. But without a very obtuse aggregation function (kurtosis, perhaps), the aggregation will be faster. It's updating a single value vs. appending to a list, with the extra memory overhead this introduces.
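
A rough sketch of that difference, in Python with made-up data. `group_sum` and `join_build` are hypothetical helpers, not anything from Spark or a real database: both bucket rows by key in the same way, but one keeps a single running value per bucket and the other keeps every row.

```python
from collections import defaultdict

def group_sum(rows):
    """Aggregation: one running value per bucket, updated in place."""
    totals = defaultdict(int)
    for key, value in rows:
        totals[key] += value
    return dict(totals)

def join_build(rows):
    """Build side of a hash join: every row is kept in a per-key list."""
    index = defaultdict(list)
    for key, value in rows:
        index[key].append(value)
    return dict(index)

# Hypothetical input: 1,000 rows spread over 4 keys.
rows = [(i % 4, i) for i in range(1000)]
totals = group_sum(rows)
index = join_build(rows)

agg_state = len(totals)                            # values held by the aggregation
join_state = sum(len(v) for v in index.values())   # values held by the join
print(agg_state, join_state)
```

The aggregation's state grows with the number of distinct keys, while the join's state grows with the number of rows, so the gap shrinks as keys become closer to unique.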


I'm saying that a smaller amount of data means more compute is required for a join. Sorry if that wasn't clear.


What an amazing set of articles. One thing that I think he's missed is the clear multi-year trends.

Over the past 5 years there have been significant changes and several clear winners. Databricks and Snowflake have really demonstrated the ability to stay resilient despite strong competition from the cloud providers themselves, often through the privatization of what was previously open source. This is especially relevant given the articles' mention of how Cloudera and Hortonworks failed to make it.

I also think the quiet execution of databases like ClickHouse has been extremely impressive, and they have filled a niche that wasn't previously filled by an obvious solution.


Montana likely passed such a law because they have a governor and a senator who both came from the tech sector in big ways (sold the same company to oracle).

That said, there are a lot of other legal hurdles that would prevent Montana from ever being significant to the tech sector, despite the fact that I'm certain many skilled people would love to live there. From being the only state to not have at-will employment, to having a completely out-of-whack tax system (ratio between income and sales tax for a state entirely dependent on tourism), to countless restrictions (often necessary because of water restrictions) on building large amounts of new housing, it just isn't happening.


AWS (along with the vast majority of B2B services in the software development industry) is good because it allows you to focus on building your product or business without needing to worry about managing servers nearly as much.

The problems here are no different than using SaaS anywhere else in a business. You can also run all your sales tracking through Excel; it's just that once you have more than a few people doing sales, that becomes a major bottleneck, the same way not having an easier-to-manage infrastructure system does.


I couldn't agree more, there was clearly a big shift when Jassy became CEO of amazon as a whole and Charlie Bell left (which is also interesting because it's not like azure is magically better now).

The improvements to core services at AWS haven't really happened at the same pace post covid as they did prior, but that could also have something to do with the overall maturity of the ecosystem.

Although it's also largely the case that other cloud providers have also realized that it's hard for them to compete against the core competency of other companies, whereas they'd still be selling the infrastructure the above services are run on.


“Avinatan has come home”: Jensen Huang hails release of Nvidia engineer after two years in Hamas captivity. Nvidia’s CEO shared the emotional news with employees after Avinatan Or, who was among those kidnapped from the Nova music festival on October 7, 2023, was freed on Monday.


And the reasons for this are increasingly clear. In a globalized world, you need large-scale organizations to compete. Smaller nations are increasingly forced to become highly specialized in a few specific industries, often where companies are sold to major firms from allied countries (large parts of europe or israel, singapore), or you end up with individual companies constituting a significant portion of the national GDP (korea).

The way in which the US is able to wield such power on the world stage, especially with the rise of China, is that we don't constantly break up every rising business.


Except this isn't about H1B; this is about the PERM process for EB2/EB3 green cards.

The truth is we should be much more open to temporary work permits, and much less open to this sort of thing for granting permanent residency. Tons of people getting employment based green cards hold jobs that could easily be filled by an American.


"You can only stay in the country if you're sponsored by an employer" creates an environment where workers have low bargaining power, decreasing the pressure for good working conditions (e.g. high pay), which – among other things – has impacts on the working conditions for locals. One might say it "affects what the market will sustain" (personally, I don't think calling everything a "market" is insightful).

From a purely economic perspective, the ideal is no borders, and total freedom of movement – but, of course, there are reasons that people don't want that: the real world doesn't run on economics. Pretty much all of these measures are compromises of some description, with non-obvious (and sometimes delayed) consequences if you start messing about with them. Most arguments involving "$CountryName jobs for $Demonym!" ignore all that, and if that leads to policy decisions, bad things happen. (That's not to say there's no way to enact protectionist employment policies, but you'd need to tweak more than just the one dial if you wanted that to work.)


From an economic perspective the ideal is no borders if there are no significant differences between countries that would create an infinite surge in mobility. It's like electrical current, if there is zero resistance and a difference in potential, any short circuit will potentially destroy the entire circuit.


The "infinite surge in mobility" phenomenon only occurs if we model countries as infinite sources / sinks of people, and assume population movement has no impact on either country. Given both of these assumptions, the predicted phenomenon wouldn't cause any problems. Of course, neither assumption holds in real life; and if you re-do your models with more sensible assumptions, the phenomenon goes away.


> Tons of people getting employment based green cards hold jobs that could easily be filled by an American.

Could be filled by an American, sure. Is the American willing to do the work? Probably not...

This is not a uniquely American problem.

In tech, I've always felt it was hard to hire Americans because it seems there's such a push for degrees in business/law etcetera as opposed to engineering.


How hard are you looking? I was looking early last year and despite hundreds of applications, got nothing but automated rejection emails, if that.

I also know many new grads looking for jobs and having a lot of trouble.

Unfortunately, their experience is telling their younger peers not to go into tech - it's full.


I'm not the first filter, there's a recruiter upstream for me. And this wasn't for new grads but senior positions.

What I'm trying to say is that all the 'good' resumes that made it through were almost exclusively for non citizens or naturalized people.


I’d qualify as a senior and like I said, hundreds of apps and not even an interview - very different from 5+ years ago, where almost 50% of apps resulted in an interview.

When you’re a hiring manager, you need to do whatever it takes to be the first filter, or at least get the permissions needed to see candidates excluded by recruiting/HR.

This is crazy and I don’t understand it but HR and recruiters do not pass along the majority of strong candidates. I have no idea why, often the resumes are indistinguishable from ones they forward on, and plenty of the candidates they forward to me are just prima facie not qualified.


You cannot compare between years like that. There are ups and downs, currently we're certainly not in an up except for special skills I guess.

5 years ago all of big tech massively overhired, they let go a lot of people later, so that's not a fair comparison.

Also, you cannot expect a hiring manager to do everything. If the company decides I shouldn't be spending my time screening candidates then that's not what I do.


It sounds like you’re saying the job market imploded in the last five years. In that case, it seems like we should halt h1b visas until it recovers.

> Also, you cannot expect a hiring manager to do everything. If the company decides I shouldn't be spending my time screening candidates then that's not what I do.

Maybe it’s different for you. I hire people I have to work with, so I am going to do whatever it takes to make sure I get good candidates. I can’t imagine a better possible use of my time.


> It sounds like you’re saying the job market imploded in the last five years. In that case, it seems like we should halt h1b visas until it recovers.

In tech, yes. In general I don't know and not all h1b's are tech

> I hire people I have to work with, so I am going to do whatever it takes to make sure I get good candidates.

Same, but that doesn't mean I'm going to do the work someone upstream from me has already done again


Americans would be more willing to do the work if the salary was higher, and the salary would be higher if the supply of workers was reduced due to not allowing cheap imported labor.


Americans aren't willing to pay the prices needed for the vast majority of things to be made in America or made by non immigrants. Immigrants will do the hard work in very bad conditions by American standards for very little money.

To me it's hilarious how on the one hand America is outraged about how all manufacturing has left the US, then after venting about that they buy a super cheap phone charger on Alibaba...

Put your money where your mouth is. If the customer had rejected overseas cheaper products then more jobs would've stayed in the US. Those salaries are a lot higher though so the products are more expensive...


It sounds like we need high tariffs to exclude products made in countries without living wages and strong worker protections from the American market, in addition to cutting off the pipeline of cheap labor to the US.


They might be living wages in those countries. You can save a lot of money by not living like the average American.

It's the standard of living that Americans expect. In order to afford that you need x amount of money. For example, if people in a different country don't need a car (let alone 2) and live in an 800 sqft home with a family of 4, what does that mean for an acceptable minimum wage?

I don't even know what you mean by cheap labor. If you mean illegal practices below minimum wage, sure. But the average farming salary for example is over 17 [1] dollars an hour. Meanwhile in China, the average manufacturing salary was 97500 yuan [2], which is ~13680 dollars a year. That's 13680/12/168 = 6.8$ an hour.

So knowing this, the basic question is: is the American consumer willing to pay more for the same product because American workers need to be paid 2.5x more? The answer is just simply no.
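
For anyone who wants to check the arithmetic, here is a quick sketch using only the figures quoted above. The 168 hours per month is the assumption already baked into the calculation (presumably 21 working days of 8 hours), and the variable names are just for illustration.

```python
CHINA_ANNUAL_USD = 13680   # quoted above: ~97,500 yuan per year
HOURS_PER_MONTH = 168      # assumption from the calculation above (presumably 21 days x 8 hours)
US_FARM_HOURLY_USD = 17    # quoted above from the Indeed figure

# Annual pay -> monthly pay -> hourly pay.
china_hourly = CHINA_ANNUAL_USD / 12 / HOURS_PER_MONTH
# How many times higher the US hourly figure is.
ratio = US_FARM_HOURLY_USD / china_hourly

print(f"{china_hourly:.2f} USD/hour, ratio {ratio:.2f}x")
```

That comes out to roughly 6.8 dollars an hour and a ratio of about 2.5, matching the numbers above. A different hours-per-month assumption would shift both.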

Can you impose tariffs to offset that difference? Sure, but the end result cannot be anything other than prices going up.

1: https://www.indeed.com/career/farm-worker/salaries
2: https://www.statista.com/statistics/743509/china-average-yea...


As someone who worked in the farming and restaurant industries, and whose family continues to work in that and construction, it’s always baffling to me to see people insist Americans just won’t do it.

But yes, undercutting the labor market with immigration policy is wrong for Americans as a whole and a big giveaway to the business class. Yes, paying Americans a higher labor rate would raise prices to their natural level (much less than you would think in most cases, particularly food) and reduce income inequality.


Tons of critical infrastructure in the US is run on IBM z/OS. It doesn't matter what operating system you use; what matters is that updates aren't automatic and everything is as air gapped as possible.


The math might not be complicated for a lot of market making stuff but the technical aspects are still very complicated.

