There's a misconception in the question that is important to address first: when an LLM is running inference it isn't querying its training data at all, it's just using a function that we created previously (the "model") to predict the next word in a block of text. That's it. When considering plain inference (no web search or document lookup), the decisions that determine a model's speed and capabilities come before the inference step, during the creation of the model.
Building an LLM model consists of defining its "architecture" (an enormous mathematical function that defines the model's shape) and then using a lot of trial and error to guess which "parameters" (constants that we plug in to the function, like 'm' and 'b' in y=mx+b) will be most likely to produce text that resembles the training data.
So, to your question: LLMs tend to perform better the more parameters they have, so larger models will tend to beat smaller models. Larger models also require a lot of processing power and/or time per inferred token, so we do tend to see that better models take more processing power. But this is because larger models tend to be better, not because throwing more compute at an existing model helps it produce better results.
"Every moment in business happens only once." -- Peter Thiel
The success of the old Facebook was very much rooted in a particular time and place. The leading edge of the Millennial generation (which was raised to be both more social and more trusting than previous generations) was in college. The Internet was new, and people were figuring out what it was for. The economy was humming along, and recovering from the dot-com bust, and people had few concerns that basic needs like housing and food would be taken care of. A website where the point was to throw sheep at each other was a nice idle diversion for that time surplus.
It also helped that Facebook was started in one of the highest social-status dorms, in the highest social-status college, among the highest social-status demographic (college students), in a generation that was tightly socially connected. I'd remarked to a coworker, when Google was just starting Google+, that we were replicating all the technology in Facebook (and many good ideas not in Facebook that had been pioneered by LiveJournal), but we were missing the particular social moment in time that Zuckerberg capitalized on.
In short, yes, consumer preferences have changed. Consumer preferences are always changing. The new Facebook is messaging, or hanging out in person, or maybe TikTok. Replicating the old Facebook won't give back the moment in time that led to its ascendance.
I agree that 64-bit is likely to be a very long-running standard. But, given that on 64-bit, there's no agreement on sizeof(long) - and it's impossible to change at this point because it would be a massive breaking ABI change for any of the relevant platforms - the only sensible standardization approach for C now is to deprecate short/int/long altogether and always use int16_t/int32_t/int64_t.
It helps to look at the history of C's integer type names and their context. Their origin lies not with C, but with ALGOL-68 - as the name implies, a language standardized in 1968, although the process began shortly after the ALGOL-60 Report was published. That is, it hails from that very point in history when even 8-bit bytes weren't yet standard, nor even the most widespread - indeed, even the notion of storing numbers in binary wasn't a given (lots of machines still used BCD as their native encoding!). ALGOL-60 had only a single integer type, but the ALGOL-68 designers wanted to come up with a facility that could be adapted in a straightforward way to all those varied architectures. So they came up with a scheme they called "sizety", whereby conformant code could append any number of SHORT or LONG modifiers to INT and REAL. Implementations could then use as many distinct sequences as they needed to express all of their native types, and beyond that, adding more SHORT/LONG would simply be a no-op on that platform. K&R C (1978) adopted a simplified version of this, limiting it to a single "short" or "long" modifier.
Obviously, this arrangement makes sense in a world where platforms vary that widely, and where the very notion of "portable code" beyond basic numeric algorithms is still in its infancy. Much less so 40 years later, though, so the only reason we still use this naming scheme is backwards compatibility. Why use it for new code, then, when we've had explicitly sized integer types since C99?
Makes me wonder if we're on the crux of a shift back to client-based software. Historically changes in the relative cost of computing components have driven most of the shifts in the computing industry. Cheap teletypes & peripherals fueled the shift from batch-processing mainframes to timesharing minicomputers. Cheap CPUs & RAM fueled the shift from minicomputers to microcomputers. Cheap and fast networking fueled the shift from desktop software to the cloud. Will cheap SSDs & TPU/GPUs fuel a shift back toward thicker clients?
There are a bunch of supporting social trends toward this as well. Renewed emphasis on privacy. Big Tech canceling beloved products, bricking devices, and generally enshittifying everything - a lot of people want locally-controlled software that isn't going to get worse at the next update. Ever-rising prices which make people want to lock in a price for the device and not deal with increasing rents for computing power.
“Have you looked at a modern airplane? Have you followed from year to year the evolution of its lines? Have you ever thought, not only about the airplane but about whatever man builds, that all of man's industrial efforts, all his computations and calculations, all the nights spent over working draughts and blueprints, invariably culminate in the production of a thing whose sole and guiding principle is the ultimate principle of simplicity?
“It is as if there were a natural law which ordained that to achieve this end, to refine the curve of a piece of furniture, or a ship's keel, or the fuselage of an airplane, until gradually it partakes of the elementary purity of the curve of a human breast or shoulder, there must be the experimentation of several generations of craftsmen. It seems that perfection is attained not when there is nothing more to add, but when there is nothing more to remove.”
This is the reason I dislike automatically squashing branches on rebase. Squashing discourages thoughtful and meaningful commit messages. What is the point of writing a meaningful commit message for some specific change when it is all just going to be smashed together into a single commit on merge? I feel like rebasing should be something a dev does intentionally to clean things up, not a default pattern on merge.
As someone who has contributed to Git since before GitHub existed and who maintains legacy code, I simply cannot disagree more. I use `git blame`, `git log`, and `git show` in the terminal all the time. It's trivial to follow the history of a file. It takes me seconds to use `git log -G` to find when something was added or removed.
Nothing pains me more than to track down the commit and then find a commit message of the form "bleh" or "add a thing" when the developer could have spent 60 seconds writing down why they did it.
Nothing gives me more joy than to find a commit message (often my own) that explains in detail why something was done. A single good commit message can save me hours or days of work.
Let me also just say, and this is a bit of a shot: GitHub contributes to the problem of bad commit messages. If I'm lucky, folks have put some amount of detail in the PR description, but sadly that's not close at hand to the commit log. It's another tool I have to open. Usually though, the PR is just a link to Jira, so that's another degree of indirection I need to follow. Then the Jira is a link to a Slack conversation. And the Slack conversation probably links to a Google doc.
As an industry, we're _terrible_ at documentation. But folks like Jeff King are fighting the good fight. At the end of the day, I don't think the problem is with the technology. I think it's a people problem. Folks perceive writing documentation as extra work, so they don't. There's no immediate value to it. The payoff comes days, weeks, or months later.
Please, write good commit messages. Just spend a minute saying why you did something so that every commit isn't a damn Chesterton's fence exercise. Put it in the commit message where I can easily find it. Your future self and I thank you.
Edit to add: I didn't address your argument, that commit messages are too hard to find.
First, I don't find this to be true. I rarely have trouble following the history of a line of code, a function, or a file.
Second, commit messages have value at the time they are written even if they are never seen again. I find that writing a good commit message helps ensure that I've written in code what I've intended to (I often view the diff while writing the commit message) and they have value to the people reviewing my code.
For great commit messages, just browse the git history of the Linux kernel where this is the standard.
The first line always mentions the subsystem affected by the change, followed by a one-line imperative-mood summary of the change. Subsequently, three questions are answered in as much detail as possible:
1. What is the current behaviour?
2. What led to this change?
3. What is the new behaviour after applying this change?
Example:
"Currently, code does X. When running test case T, unexpected behaviour U was observed. This is because of reason R. Fix this by doing F."
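Fleshed out, a kernel-style message following that template might look like the below (the subsystem, function, and bug are entirely made up for illustration):

```
mm/slab: fix off-by-one in cache size rounding

Currently, cache object sizes are rounded with (size / align) * align,
which truncates instead of rounding up. When running the slab stress
tests, a 33-byte allocation with 32-byte alignment was observed to
receive a 32-byte object, corrupting the adjacent object. This is
because the truncated size is passed directly to the allocator.

Fix this by rounding up with ALIGN(size, align) before computing the
number of objects per slab.
```

Note how the first line alone is a usable `git log --oneline` summary, and the body answers all three questions without requiring the reader to chase a ticket link.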
Yes, but it also reflects the fact that it is no longer a separate hardware line.
In the beginning (1978) was the IBM System/38, which had a custom CISC CPU architecture with 48-bit addressing (called IMPI), vaguely resembling the 360/370 mainframe instruction set, but incompatible with it, and having some rather high-level abilities like task switching in microcode (similar to hardware task switching on the 386). The System/38 had some very advanced features: single level storage, capabilities and programs compiled to byte code (which the OS then converted to the IMPI physical instruction set). However, IBM also had its System/36 "midrange" line (basically minicomputers but IBM preferred to call their business-oriented minicomputers "midrange"), which was incompatible and more of a traditional system architecture. So in 1988 IBM "unified" them by releasing the AS/400, which was basically a version 2.0 of the System/38, keeping the same basic architecture but adding a System/36 emulation subsystem so it could run most System/36 applications.
Separately, IBM had its RISC Unix RS/6000 line, which spawned POWER and PowerPC. And then in 1995, IBM came out with a new version of the AS/400 based on PowerPC instead of the proprietary IMPI CISC. Because applications were compiled to bytecode, most could be ported to RISC seamlessly: the new OS version simply translated the bytecode to PowerPC instructions instead of IMPI instructions. At the same time, much of the core of the OS was rewritten in C++ (having previously been in a proprietary PL/I dialect).
But still, although RS/6000 and AS/400 now used the same CPU architecture, they were still physically different hardware. Originally, the AS/400 used its own PowerPC chips with additional instructions the RS/6000 ones lacked. Even after they unified the two lines on the same CPU models, they still had different firmware.
In 2000, there was a marketing-driven decision ("eServer") to rebrand RS/6000 to pSeries and AS/400 to iSeries. This was part of an attempt to present IBM's four distinct server platforms (mainframe, AS/400, RS/6000 and PC) as some kind of cohesive strategy (mainframe became zSeries and PC servers became xSeries).
Then, in 2008, the iSeries (formerly AS/400) and pSeries (formerly RS/6000) hardware lines were merged completely, to become IBM Power Systems. Now there is no physical difference between the hardware; it is just a matter of which OS you install on it. The IBM i (originally OS/400 and later i5/OS) operating system uses certain firmware features which AIX doesn't use – but all IBM Power Systems have that code in their firmware; AIX and Linux just don't call those functions. (There are now low-end Linux-only machines which refuse to run AIX or IBM i, although possibly that's just a flag in the firmware license as opposed to distinct code.)
> The thing is that economy does not make sense without people. Economy is a way to allocate human work and resources, and provide incentives for humans to collaborate, factoring in the available resource limits.
I disagree with the underlying presumption. We've been using animal labour since at least the domestication of wolves, and mechanical work since at least the ancient Greeks invented water mills. Even with regard to humans and incentives, slave labour (regardless of the name they want to give it) is still part of official US prison policy.
Economics is a way to allocate resources towards production, it isn't limited to just human labour as a resource to be allocated.
And it's capitalism specifically which is trying to equate(/combine?) the economy with incentives, not economics as a whole.
> Now if AGI make people's work redundant, and makes economy grow 100-10000x times... what does that measure mean at all?
From the point of view of a serf in 1700, the industrial revolution(s) did this.
Most of the population worked on farms back then; now it's something close to 1% of the population. And we've gone from a constant threat of famine and starvation to such things almost never affecting developed nations, so 100x productivity output per worker is a decent approximation even in terms of just what the world of that era knew.
Same deal, at least if this goes well. What's your idea of supreme luxury? Super yacht? Mansion? Both at the same time, each with their own swimming pool and staff of cleaners and cooks, plus a helicopter to get between them? With a fully automated economy, all 8 billion of us can have that — plus other things beyond that, things as far beyond our current expectations as Google Translate's augmented reality mode is from the expectations of a completely illiterate literal peasant in 1700.
> Can produce lots of stuff not needed or affordable by anybody?
Note that while society does now have an obesity problem, we're not literally drowning in 100 times as much food as we can eat; instead, we became satisfied and the economy shifted, so that a large fraction of the population gained luxuries and time undreamed of to even the richest kings and emperors of 1700.
So "no" to "not needed".
I'm not sure what you mean by "or affordable" in this case? Who/what is setting the price of whatever it is you're imagining in this case, and why would they task an AI to make something at a price that nobody can pay?
> So we just hand out welfare tickets to take care of the consumption of the ferocious production, a kind of paperclip-maximizer is doing? I suggest reading the novel Autofac, it might turn out prophetic.
Could end up like that. Plenty of possible failure modes with AI. That's part of the whole AI alignment and AI safety topics.
But mainly, UBI is the other side of the equation: to take care of human needs in the world where we add zero economic value because AI is just better at everything.
"My OS makes it easier to do X" is generally an illusion because you learned what arcane nonsense your OS requires in order to do that ten years ago and now it takes two seconds, but the other OS requires you to learn some different arcane nonsense which would also take two seconds if you already knew what it was.
Back in 2009, Matthew Garrett measured a blinking cursor as costing 2W of power consumption. Presumably it'd be less of a big deal now, but it does show that UI updates on a low frequency can cost watts.
I think you're missing the connecting lines here: with the Steam Deck, Valve made significant investments into Wine compatibility and Proton development and all of the dependencies needed for its product, which are also applicable to the desktop. That convinced a lot of people who were using Windows just for gaming to make the switch, and they all browse the web. I'd argue those switchers are most of the new users we've seen coming to Linux in the last few years - and I'd attribute all of it to Valve.
Open source devs often don't like writing GUIs or documentation, which VB-like environments rely on very heavily.
Commercial devs meanwhile will work on those things, but generally want to host the resulting app on their own cloud for a monthly fee. That's lockin which is scary to people and so outside of platforms where there's no choice (e.g. Apple) they prefer to have a less productive dev environment but more vendor options.
30 years ago devs were much less sensitive to lockin concerns. Open source barely existed so the question was merely which vendor would you choose to get locked in to, not whether you'd do it at all. And fewer people had been burned by projects going off the rails or being abandoned. The VB/Delphi era ended when VB6 suffered execution-by-product-manager and Borland renamed itself to Inprise whilst generally losing the plot (wasting resources on Linux, etc).
Open source stuff tends to have much worse usability, but there are no product managers, no corporate strategies, and in the unlikely event that the project decides to radically change direction it can always be forked and collectively maintained for a while. That concern outweighs developer experience.
Also the ecosystem is just way more fragmented these days. In the 90s everyone coded for Windows unless you were doing mainframe stuff, and on Windows there was C++, Delphi and VB. That was pretty much it and they could all interop due to Microsoft's investment in COM. These days you have JS, Python, Ruby, Java, Kotlin, Swift, C#, Rust, Go ... and they barely talk to each other.
One thing I think gets forgotten here, and it actually is in the Worse is Better talk, though people tend to miss it.
These Smalltalk and Lisp systems failed at another aspect: reuse. The biggest change in software engineering over the past two decades, compared to previous generations, is the amount of reuse. It is tremendous.
It is hard to talk of cause and effect here, but mostly this is due to the Internet. At this point, the vast majority of code running on any proprietary system is... open-source infrastructural packages.
This conditions a lot of the current ecosystem. You can only reuse code on systems where said code runs well. As such, Linux "stability" combined with x86 won, just as C and friends won because of the tooling that made the code "portable".
Yes, I know: it is far from magically portable, but it is far more portable than a full-machine living image in the Smalltalk or Lisp style.
As such, these "living images" are fundamentally an evolutionary dead end. They are amazing, but they cannot easily move to different machines, and sharing parts of them is hard because the parts are difficult to separate from the rest of the living organism.
On top of this, many of the elements needed to make this kind of machine work require deep expertise. As the piece shows, the Newton was a pale copy of the goal because they had neither that knowledge in house nor the time (or money) to create it.
Same thing all over the stack. A good, efficient logger needs deep expertise. Same for a good localization library. Same for a good set of graphics servers. Same for audio servers. Same for an HTTP parser or a network library. A good regexp engine is knowledge held by probably fewer than 10 people in the world.
Once you realise that, you realise that at this scale, reuse is the only realistic way forward for software as ubiquitous as it is today. And that is how we got the current FOSS ecosystem: not because the code is better, but because anything else would need too many licences to be manageable without breaking the bank on lawyers.
Same thing for Worse is Better. It works because it provides extension points and can adapt - something the Lisp and Smalltalk machines fundamentally failed to provide. And that is something Richard Gabriel focuses on far more than the whole New Jersey schtick in his talk.
That kind of makes logical sense, though. Physical hidden chests are, after all, how almost all real life shops implement inventories. Those are typically inaccessible to players, too.
(There is a part of the shopping experience where the player grabs items and puts them in their own chest/basket prior to purchase. This works thanks to the security scheme of law-enforcement NPCs dragging your ass to a jail you can't save-scum your way out of, should you steal something. But this is too complex to implement in a game, unless you're making the next GTA.)
Hell, even the bunnies and spectral radio cats make sense, to a degree. This reminds me of the ol' Flash games or Klik&Play/The Games Factory-made games. In all of them, you'd find yourself placing support objects on the scene but outside the screen boundary. I used to laugh at it, but eventually realized it kind of makes sense, if you think of the game as a theatre play - there's lots going on at the edges of the stage, just beyond what the audience can see.
Or think back to RAD tools from Borland (Delphi, C++ Builder) - they had a notion of abstract objects like "Timer" as invisible UI controls that could be placed in the window you're designing. On the one hand, this makes no sense - an abstract timer doesn't have "position" or "size", not at runtime. On the other hand, it was intuitive and convenient at design time.
I was on the flight and took the picture referenced as "A passenger took this photo in flight, showing turbine fragment exit holes in the upper surface of the wing. (ATSB)"
Forced myself onto another A380 flight shortly after so I wouldn't lose faith in its engineering safety.
I went from Netscape to IE to Firefox to Chrome and back to Firefox. Sure they're different, but it's not jarring. It's like switching to another car. You can just hop in and drive away. Then you gradually adjust the seat just how you like it and install your favorite air freshener in a natural progression, and so on.
> news from Uber study that people with low battery could pay more for taxi, which was the nail to the coffin.
Yet people still claim "I have nothing to hide" when it comes to concerns about privacy. When your battery level can be weaponised against you by multi-billion dollar companies, everything can be.
I have one of these. My Dad died last August (just before his 102nd birthday) and he had bought himself a Pro Display XDR about 6 months before that. My brother is a PC person and I am a Mac person, so I got the Pro Display.
It is an amazing display and I love it, but I would never buy one for myself. It is obviously fine for programming, but for me it really stands out as something for consuming entertainment, even though I only get 4K content. It is capable of 6K with the right computer and has 10-bit color depth. When my Dad first bought it, I used it to play Apple Arcade games on my iPad Pro - that was fairly spectacular.
EDIT: my Dad had a Blackmagic video camera that I think had 8K resolution, and so he had a lot of fun with his setup.
We're teaching Git wrong. Most of the common confusion is due to people learning from the porcelain down to the plumbing, when it should be the other way around. If you limit your mental model to the plumbing, there's generally only one outcome that you want, but there are a dozen ways to get there from the porcelain. You can choose whichever one you prefer. But if you start from one of those dozen ways, they could each lead to a different outcome than you expected.
I'm forever grateful for one of my early internships, where a guy from GitHub visited the office and gave us a one day workshop on Git. He started from the internals and explained how Git models your codebase. (He's also the one who introduced me to the idea of plumbing vs. porcelain.) Then once we had a common language, teaching the porcelain was a matter of starting from the plumbing and working upwards, rather than the other way around.
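For the plumbing-first approach, it only takes a few commands to see Git's object model directly. A minimal sketch in a throwaway repository, assuming git is installed:

```shell
# Everything in Git's object database is one of four types:
# blob, tree, commit, or tag.
tmp=$(mktemp -d) && cd "$tmp" && git init -q

echo 'hello' > greeting.txt

# Plumbing: hash-object -w stores the file's contents as a blob
# and prints the object id the porcelain would have computed.
blob=$(git hash-object -w greeting.txt)

git cat-file -t "$blob"   # prints: blob
git cat-file -p "$blob"   # prints: hello

# Porcelain like `git add` and `git commit` just builds blobs,
# trees, and commits out of these same primitives.
```

Once you've seen that a commit is just an object pointing at a tree of blobs, commands like rebase and cherry-pick stop being magic and become "make new commit objects from old ones".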
Another invaluable resource in learning Git is this interactive tutorial [0], which renders a tree diagram of start state and desired end state and makes you write the commands (for which there are often many options!) to get to that end state. This reinforces the idea that the best way of planning Git commands is to first visualize the end state you want, and then reason about how to get there.
Also: RTFM! Not just once. Go back to it. You'll learn something new every time. The docs [1] are really good.
> With Linux it can feel like I'm fighting with the system at times.
All your points are valid, and it's been a meme for a while now that the year of linux on the desktop is always next year.
HOWEVER, consider the following:
When you're on linux you're fighting the shortcomings of the operating system (and learning stuff as you go), whereas when you're on Windows/MacOS you're fighting companies actively trying to screw you over (and over and over again, always in new ways).
The question now becomes: what fight are you willing to fight?
About 15 years ago I received the advice, "If you want to be hacker, stop using Windows and start using Linux." Today, I am well aware that this is not the only path to enlightenment, but I count it as some of the best professional advice that I have ever received. This is colored by the direction my career has taken me: the technologies I use are open source, and that fits much better into the Linux box than others.
It's also not about writing code. Sure, when using Linux exclusively, sometimes you might have to hack together a little script to make your computer do what you want, but that's really not necessary, especially in the year of our Lord 2023. It's about tooling. So many younger devs I meet still have an irrational fear of the command line, an inability to use built-in documentation (like manpages), and, again, a fear of trying (because web browsers exist). Worst of all is these younger devs' lack of understanding of Unix permissions. We all know the guy who just pastes `chmod -R 777 .` or something from StackExchange. Since most of our production software still lives on Linux, knowing the proper way to configure these environments is valuable (though unfortunately undervalued, in my opinion, since improper configuration can still "work fine").
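To make the permissions point concrete, here is the usual alternative to the blanket `chmod -R 777 .`, demonstrated on a throwaway directory (the layout is invented for illustration): directories get the execute bit so they can be traversed; plain files don't.

```shell
dir=$(mktemp -d)                       # throwaway tree to demonstrate on
mkdir -p "$dir/static"
echo '<html></html>' > "$dir/static/index.html"

# 755 (rwxr-xr-x) for directories: execute on a directory means "traverse".
find "$dir" -type d -exec chmod 755 {} +
# 644 (rw-r--r--) for regular files: no execute bit handed out blindly.
find "$dir" -type f -exec chmod 644 {} +

stat -c '%a' "$dir/static"             # prints: 755
stat -c '%a' "$dir/static/index.html"  # prints: 644
```

(`stat -c` is the GNU coreutils form; BSD/macOS `stat` uses `-f` instead.) The point isn't these exact octal values but the habit of asking what access each class of file actually needs.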
Using Linux full-time for years will make you more than comfortable. And yes, it probably will take years. You may come to prefer the terminal to most of the GUI wrappers provided in desktop Linux distros. You won't even notice when you come out the other side. You'll realize that everything only seemed like it was 43 commands away because you only knew 2 commands to begin with. Typing the most common commands will be second nature and take less time than moving your hand to the mouse. Anything that does need to be reasoned out and typed slowly you will learn to embed in a script, with comments so you can remember how it works, and that will save you even more time.
Most importantly, in the end you will have the confidence in your abilities to write long, condescending comments on Hacker News. I kid of course, but you will no longer fear tooling (though you may grow weary of it - looking at you nodeJS), and I truly believe that's a more important and difficult skill than reading and writing code of all sorts.
No, people are getting mad because Youtube has been letting people use it with ad-blockers for 18 years (in practice, at least) and has now hardened its stance considerably. In other words, they got to the dominant position by being very lenient, and now that they are on top they are tightening the screws.
This means that people who watched Youtube for up to 18 years with ad blockers are now forced to decide whether they want to pay money, watch ads, or stop using the service.
None of those choices is better, except possibly morally, than the now-unavailable but previously-very-available choice of getting Youtube for free with no ads.
If Google showed text ads for a few seconds before every search and used technical means to prevent people from bypassing that process, there would be a similar outcry -- even though everybody knows that it costs Google money to provide search results.
You have a bunch of barbers in a city. Well, I open up a business where I cut hair. BUT I do a shitty job, so you kinda need to come back maybe 3 times for a decent haircut. Wow, it sucks. Yeah. BUT...
I also put in a giant screen with ads, and an advertiser that counts people watching the ads. And then I say... ITS FREE.
Woah okay. At first not much happens, but eventually the other barbers start to lose money; that's because they can't compete with my free haircut business. I quickly expand and now have shops all over town, and the competition is dying.
Now some people realized they can take a quick nap while getting a haircut so they don't see the ads. And suddenly I kick out anyone who takes a nap. Those people can no longer get haircuts because everyone else closed down because they couldn't compete with my free haircuts. And as those people complain, everyone says "EVERYONE WANTS A FREE HAIRCUT BUT NOT WILLING TO PAY FOR IT!"
Edit: I also start selling packages of 20 ad-free haircuts a month. Most people think this package is a bit insane given that they don't get 20 haircuts in a month. Or every month. And they still have to listen to the barber push their "cancer cures" and other nonsense they keep pushing every time they get a haircut.
If a business opens up in my neighborhood and offers free pizza with no strings attached I'll gladly eat that pizza.
If they change their mind and then start offering free pizza but only if I take a stack of advertisements I will take the pizza and throw the ads in the trash.
If they then start having people follow me home and harassing me for not looking at every ad, I will take the pizza and tell the harasser to fuck off and throw the ads in the trash in front of them. If they refuse to serve free pizza after that I'll just go pay for higher quality pizza elsewhere.
What I won't do is pay the harasser. It's even crazier when the harasser isn't even making the pizza, just delivering it.
Note that the deaths for nuclear energy include only radiation-related deaths, whereas the accidents for solar and wind include "mundane" accidents like workers falling from heights. Nuclear power plants have their own non-radioactive accidents, though, like this one:
(5 workers killed at Mihama nuclear power plant due to accidental steam release)
I have not seen any comparison that tries to sum up non-radioactive accidents for nuclear power and incorporate them in the deaths-per-TWh rate for nuclear power. The number will still be lower than anything based on combustion, but at these very low numbers it could make a meaningful difference in the relative rates.
You can radiate it away, but it would be really hard compared to just dumping it to atmosphere / water bodies. Especially as computers need quite low temperatures.
Having files double as a launcher for some program was a mistake, the result of which is that we now have to sit through those awful trainings about not clicking untrusted files.
Just open the program first.
People should fear untrusted programs, but should be at ease viewing data through programs that they trust.