In early 2023, I was debugging an issue in a long-abandoned package that used OpenSSL and needed to be fixed for OpenSSL 3. The entire thing was a mess, and the parts of OpenSSL it was using are almost entirely undocumented.
Copying and pasting into a ChatGPT window gave me the lines of code to print the error message rather than failing silently. Copying the error message in then gave me a detailed explanation of the problem and the diff to fix it. I still have no idea where this knowledge came from, as I spent a decent amount of time searching and found nothing about this corner of OpenSSL.
I’m very confident it would have taken me a week to make sense of what the package was trying to do; with LLMs it was done in a couple of hours.
And to continue this thread: I think our three languages should count as one, because, at least 20 years ago, it was quite common for Portuguese, Italian and Spanish to mingle in several activities.
I’m a little confused by this submission. CASTOR is the old system, which was replaced by the CERN Tape Array around 2020: https://cta.web.cern.ch/cta/
This is mentioned on the page but it’s easy to miss.
Does the Tape Array replace CASTOR? Just from the names, it sounds like the Tape Array is the actual storage and CASTOR is an abstraction that automatically decides what's kept on disk and what's kept on tape.
The abstraction isn’t really a thing any more. It was a nice idea, but in practice it’s an operational nightmare: you never know whether data is available or for how long it will be. For reference, staging can take days during intense activity, and you don’t want to lose performance by randomly seeking around and switching between tapes.
> Conventional wisdom is that distributed consensus is not possible at this kind of performance
I'm not sure why you would think that. If you can assume the fiber is the same in both directions, you know the one-way latency of the connection is exactly half the round-trip time. Then, when you get a start signal, you shift your start time back by that latency and you're in sync.
Obviously it's not trivial in practice, but it's not a fundamentally insurmountable problem.
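As a rough illustration of the symmetric-path argument above, here is a minimal sketch using the standard NTP-style four-timestamp exchange. It assumes both directions take equally long; the class and function names are hypothetical and not taken from any particular system.

```python
from dataclasses import dataclass


@dataclass
class SyncSample:
    t0: float  # client clock: request sent
    t1: float  # server clock: request received
    t2: float  # server clock: reply sent
    t3: float  # client clock: reply received


def round_trip_delay(s: SyncSample) -> float:
    # Total time on the wire, excluding the server's processing time.
    return (s.t3 - s.t0) - (s.t2 - s.t1)


def one_way_delay(s: SyncSample) -> float:
    # Only valid under the assumption that the path is symmetric.
    return round_trip_delay(s) / 2.0


def clock_offset(s: SyncSample) -> float:
    # How far the server clock is ahead of the client clock,
    # again assuming both directions take the same time.
    return ((s.t1 - s.t0) + (s.t2 - s.t3)) / 2.0


def signal_sent_at(local_receive_time: float, s: SyncSample) -> float:
    # A "start" message left the sender one one-way delay before it arrived,
    # so shifting back by that amount aligns this node with the sender.
    return local_receive_time - one_way_delay(s)


def local_start_time(server_start: float, s: SyncSample) -> float:
    # Translate a start time given on the server clock into client-clock time.
    return server_start - clock_offset(s)
```

For example, if the server clock runs 5 units ahead and each direction takes 2 units, a sample of `SyncSample(100, 107, 107, 104)` yields a one-way delay of 2 and an offset of 5. In practice, asymmetric routing and queueing jitter break the symmetry assumption, which is where the hard part lies.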
It helps make scalping at scale difficult. They can't reliably sell seats next to people. Always having to pay people their cut to use their name means the refund mechanism is still costly to the scalper.
I’m not an accountant and what you’re saying is probably right. However, if you hire an engineer to do R&D, build systems, and take R&D tax credits, it “feels” like capex.
Yes, once you have modeled the problem correctly and know all the input parameters. This is not that: number of sessions × tokens per second (tps) × 86,400 (seconds in a day) × 30 days.
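The formula above is an upper bound on how many tokens a deployment can sell per month. A minimal sketch follows; the parameter names and the per-million-token pricing helper are illustrative assumptions, and no real deployment figures are implied.

```python
SECONDS_PER_DAY = 86_400


def max_tokens_per_month(sessions: int, tokens_per_second: float, days: int = 30) -> float:
    # Ceiling: every concurrent session generates at full speed, 24/7, all month.
    return sessions * tokens_per_second * SECONDS_PER_DAY * days


def max_monthly_revenue(
    sessions: int,
    tokens_per_second: float,
    price_per_million_tokens: float,
    days: int = 30,
) -> float:
    # Hypothetical helper: turns the token ceiling into a revenue ceiling
    # at a flat per-million-token price.
    tokens = max_tokens_per_month(sessions, tokens_per_second, days)
    return tokens / 1_000_000 * price_per_million_tokens
```

Because it assumes 100% utilization, anything real (idle time, uneven demand, batching limits) can only push actual sales below this number.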
I don't think there is enough public information to check Anthropic's claims regarding inference profitability. It depends not just on unknown technical factors but also on agreements they have with other companies.
I agree that we don't know how expensive SOTA is. But yes, my math should give you the maximum number of tokens you can sell per month, and it's not remotely profitable for most of the larger open source models (at their current pricing). I'm not sure why a 10x larger model that is more in demand would be profitable when it's only 5x the price.
It's possible you could pay off the hardware for Kimi 2.6 after maybe 2-3 years (by providing low tps / high concurrency), but by then you're out of warranty and have been running your machines full throttle 24/7 for 2-3 years.
This is why Moonshot attempted to double the price when they released 2.6, but it then got driven down by North American capital subsidies.
We should specify which subscription plan we are talking about. You seem to be talking about the Anthropic Claude Max plan. I think it's the consensus that these flat-rate subscriptions are loss leaders, as their T&C restrict how you can use the API, namely only with Claude Code et al. They are meant to hook developers into their products.
Shouldn't we compare the API pricing, where we pay per token? The whole point of local inference is that we don't have any restrictions regarding product use or time limits, so it would only be fair if we compare it to a plan that offers the same. And even that is only a first approximation, because the commercial models are usually much more capable than the open weight models.
> I could easily say that everyone who says it's profitable is making unsubstantiated claims lol.
And people who don't understand the difference between capex and opex are making uneducated claims. It's not basic math.
Running an inference data center is a mix of variable and fixed costs. The fixed costs currently run to billions of dollars for pretty much any investment in this space. Many of those fixed costs have (currently) unknown refresh cycles. So, unless you have access to the financial books of these companies, whether inference is profitable is currently just speculation.
You got numbers? Because it seems perfectly possible to me. OpenAI and Anthropic’s marginal cost for inference is certainly far less than their API pricing.
Everything there is extremely speculative, and I don't see anything that contradicts the idea that inference itself could be profitable at massive scale. See https://youtu.be/xmkSf5IS-zw for example.
Whether the companies as a whole are destined to be profitable, or are worth their valuations, is a very different question. The only people who can truly answer that have time machines.
Because I’ve looked at what it would cost my company to self-host a SOTA-sized model. For us it wasn’t worth it, because the hardware is all bought up by frontier labs and we can’t get any supply. But if we could, at the prices they’re paying, it would pay for itself in 10-ish months. I further assume they have economies of scale on top of what I was estimating.
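The payback estimates in this thread (2-3 years for open-weight pricing, roughly 10 months at frontier prices) boil down to dividing the upfront hardware cost by the monthly margin, then checking that against how long the hardware lasts. A minimal sketch, with hypothetical function names and inputs; the refresh-cycle check reflects the point that hardware lifetimes are uncertain.

```python
import math


def payback_months(hardware_cost: float, monthly_revenue: float, monthly_opex: float) -> float:
    """Months until cumulative margin covers the upfront hardware spend.

    Returns math.inf if the margin is zero or negative (it never pays back).
    """
    margin = monthly_revenue - monthly_opex
    if margin <= 0:
        return math.inf
    return hardware_cost / margin


def pays_off_within(
    hardware_cost: float,
    monthly_revenue: float,
    monthly_opex: float,
    lifetime_months: float,
) -> bool:
    # Hardware is only a win if it pays back before it has to be replaced
    # (end of warranty, wear from running flat out, or a refresh cycle).
    return payback_months(hardware_cost, monthly_revenue, monthly_opex) <= lifetime_months
```

The inputs are placeholders: real figures would need actual hardware quotes, power and staffing costs, and realistic utilization rather than the theoretical maximum.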
To some degree I think there's a hope that it becomes like a gym membership. If everybody used their membership, the gym would be too crowded. It's all of those memberships that people feel like they need to have but don't use where the extra profit comes in.
As long as the power users are paying per token, everything is good.
Really? This is what we expect from this amazing world changing technology? People will sign up for it and not use it? Good business plan, how can I invest? /s