Just do what I did: ignore your orthodontist and wear them only at night, smashing them into your mouth like some kind of idiotic brute. 4 years later, great teeth!
Your teeth are literally being pushed to a new position. So if you don't use the braces for the "minimum" duration, your teeth may not be where they need to be for the second set of braces to start working. Your second set will then have to push harder, and that might damage your teeth/gums. Think of it like trying to do the splits. If you have never done them before and you try them on your first attempt, you're going to tear whatever you have down there. But if you keep at it, then day by day you will notice progress. The braces are like that. Each one eases you into the next stage, if that makes sense.
That's an assertion, not a thought experiment. You can't logically reach the conclusion ("It won't") by thinking about it. But it doesn't sound so grand if you say "The assertion I use constantly to explain this".
It still can't learn. It would need to create content, experiment with it, make observations, then re-train its model on those observations, and repeat that indefinitely at full speed. That won't work on a timescale useful to a human. Reinforcement learning, on the other hand, can do that on a human timescale. But you can't make money quickly from it. So we're hyper-tweaking LLMs to make them more useful faster, in the hopes that that will make us more money. Which it does. But it doesn't get you AGI.
That's not learning, though. That's just taking new information and stacking it on top of the trained model. And that new information consumes space in the context window. So sure, it can "learn" a limited number of things, but once you wipe context, that new information is gone. You can keep loading that "memory" back in, but before too long you'll have too little context left to do anything useful.
That kind of capability is not going to lead to AGI, not even close.
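To make the context-cost point concrete, here's a toy sketch (the window size and "tokenizer" are made up purely for illustration): every "memory" reloaded into the prompt shrinks what's left for the actual task.

    CONTEXT_WINDOW = 8_000  # hypothetical window size, purely for illustration

    def tokens(text: str) -> int:
        return len(text.split())  # crude stand-in for a real tokenizer

    def room_for_work(memory_notes: list[str]) -> int:
        # Every reloaded "memory" competes with the actual task for the budget.
        used = sum(tokens(note) for note in memory_notes)
        return max(CONTEXT_WINDOW - used, 0)

    notes = [f"fact learned in session {i}, plus details" for i in range(1000)]
    print(room_for_work(notes))  # 1000 notes x 7 tokens each -> 1000 tokens left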
1. It's still memory, of a sort, which is learning, of a sort.
2. It's a very short hop from "I have a stack of documents" to "I have some LoRA weights." You can already see that happening.
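To make that hop concrete, here's a minimal sketch of the LoRA idea in PyTorch (my own illustration, not any particular library's API): the pretrained weight stays frozen, and "learning" means training only two small low-rank matrices whose product nudges the base behavior.

    import torch
    import torch.nn as nn

    # Minimal LoRA-style layer (an illustration, not any library's real API).
    class LoRALinear(nn.Module):
        def __init__(self, in_dim, out_dim, rank=8, alpha=16.0):
            super().__init__()
            self.base = nn.Linear(in_dim, out_dim, bias=False)
            self.base.weight.requires_grad_(False)             # frozen pretrained weight
            self.A = nn.Parameter(torch.randn(rank, in_dim) * 0.01)
            self.B = nn.Parameter(torch.zeros(out_dim, rank))  # update starts as a no-op
            self.scale = alpha / rank

        def forward(self, x):
            # base(x) plus the scaled low-rank update (B @ A) applied to x
            return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale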
Also keep in mind that the models are already trained to be able to remember things by putting them in files as part of the post-training they do. The idea that it needs to remember or recall something is already part of the weights, not something bolted on after the fact.
>but before too long you'll have too little context left to do anything useful.
One of the biggest boosts in LLM utility and knowledge was hooking them up to search engines. Giving them the ability to query a gigantic bank of information already has made them much more useful. The idea that it can't similarly maintain its own set of information is shortsighted in my opinion.
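A toy sketch of what "maintaining its own set of information" looks like (the embedding function here is a random stand-in, purely for shape; a real system would call an embedding model): notes live outside the context window, and only the closest matches get pulled back into the prompt at query time.

    import numpy as np

    def embed(text: str) -> np.ndarray:
        # Random stand-in embedding -- replace with a real embedding model.
        rng = np.random.default_rng(abs(hash(text)) % (2**32))
        v = rng.standard_normal(64)
        return v / np.linalg.norm(v)

    notes = ["user prefers metric units",
             "project deadline is Friday",
             "API key lives in the vault"]
    index = np.stack([embed(n) for n in notes])

    def recall(query: str, k: int = 2) -> list[str]:
        scores = index @ embed(query)        # cosine similarity (unit vectors)
        return [notes[i] for i in np.argsort(scores)[::-1][:k]]

    print(recall("when is this due?"))       # top-k notes go back into the prompt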
It's simply a fact that LLMs cannot learn. RAG is not learning; it's a hack. Go listen to any AI researcher interviewed on this subject: they all say the same thing, because it's a fundamental part of the design.
I disagree. Human memory is literally changing the weights in your neural network. Like, exactly the same.
So in the machine learning world, it would need to be continuous re-training (I think it's called fine-tuning now?). Context is not "like human memory". It's more like writing yourself a post-it note that you put in a binder and hand over to a new person to continue the task at a later date.
It's just words that you write to the next person, who in LLM world happens to be a copy of the same you that started; no learning happens.
It might guide you, yes, but that's a different story.
A human can't keep 100k tokens active in their mind at the same time. We just need a place to store them and tools to query it. You could have exabytes of memories that the AI could use.
Wtf? Once it was AI. Then the models started passing the Turing test and calling themselves AI, so we started using AGI to say "truly intelligent machines". Now, as per the definition you quoted, apparently even GPT-3 is AGI, so we now have to use "ASI" to mean "intelligent, but artificial"?
I think I'll just keep using AI and then explain to anyone who uses that term that there is no "I" in today's LLMs, and they shouldn't use this term for some years at least. And that when they can, we will have a big problem.
LLMs are artificial intelligence illusion engines; they only "reason" as far as there's an already-made answer in their dataset that they can retrieve and eventually tweak (when things go best). Take them where there's no training data, give them the new axioms to solve your specific problem, and watch them fail, with incorrect gibberish provided as a confident answer. Humans of any level of intelligence wouldn't behave like that.
Part of the issue there is that the data quantity prior to 1905 is a small drop in the bucket compared to the internet era even though the logical rigor is up to par.
Yet the humans of the time, a small number of the smartest ones, did it, and on much less training data than we throw at LLMs today.
If LLMs have shown us anything it is that AGI or super-human AI isn't on some line, where you either reach it or don't. It's a much higher dimensional concept. LLMs are still, at their core, language models, the term is no lie. Humans have language models in their brains, too. We even know what happens if they end up disconnected from the rest of the brain because there are some unfortunate people who have experienced that for various reasons. There's a few things that can happen, the most interesting of which is when they emit grammatically-correct sentences with no meaning in them. Like, "My green carpet is eating on the corner."
If we consider LLMs as a hypertrophied language model, they are blatantly, grotesquely superhuman on that dimension. LLMs are way better at emitting not just grammatically-correct content but content with facts in it, related to other facts.
On the other hand, a human language model doesn't require the entire freaking Internet to be poured through it, multiple times (!), in order to start functioning. It works on multiple orders of magnitude less input.
The "is this AGI" argument is going to continue swirling in circles for the forseeable future because "is this AGI" is not on a line. In some dimensions, current LLMs are astonishingly superhuman. Find me a polyglot who is truly fluent in 20 languages and I'll show you someone who isn't also conversant with PhD-level topics in a dozen fields. And yet at the same time, they are clearly sub-human in that we do hugely more with our input data then they do, and they have certain characteristic holes in their cognition that are stubbornly refusing to go away, and I don't expect they will.
I expect there to be some sort of AI breakthrough at some point that will allow them to both fix some of those cognitive holes, and also, train with vastly less data. No idea what it is, no idea when it will be, but really, is the proposition "LLMs will not be the final manifestation of AI capability for all time" really all that bizarre a claim? I will go out on a limb and say I suspect it's either only one more step the size of "Attention is All You Need", or at most two. It's just hard to know when they'll occur.
A 16 year old has been training for almost 16 years to drive a car. I would argue the opposite: Waymos / specific AIs need far less data than humans. Humans can generalize their training, but they definitely need a LOT of training!
When humans, or dogs or cats for that matter, react to novel situations they encounter, when they appear to generalize or synthesize prior diverse experience into a novel reaction, that new experience and new reaction feeds directly back into their mental model and alters it on the fly. It doesn't just tack on a new memory. New experience and new information back-propagates constantly, adjusting the weights and meanings of prior memories. This is a more multi-dimensional alteration than simply re-training a model to come up with a new right answer... it also exposes to the human mental model all the potential flaws in all the previous answers which may have been sufficiently correct before.
This is why, for example, a 30 year old can lose control of a car on an icy road and then suddenly, in the span of half a second before crashing, remember a time they intentionally drifted a car on the street when they were 16 and reflect on how stupid they were. In the human or animal mental model, all events are recalled by other things, and all are constantly adapting, even adapting past things.
The tokens we take in and process are not words, nor spatial artifacts. We read a whole model as a token, and our output is a vector of weighted models that we somewhat trust and somewhat discard. Meeting a new person, you will compare all their apparent models to the ones you know: Facial models, audio models, language models, political models. You ingest their vector of models as tokens and attempt to compare them to your own existing ones, while updating yours at the same time. Only once our thoughts have arranged those competing models we hold in some kind of hierarchy do we poll those models for which ones are appropriate to synthesize words or actions from.
I meant visual patterns, too. You're thinking about what I said on too granular a level. JEPA is visual, based ultimately on pixels. The tokens may be digested from pixels until they're as large as whole recognizable objects, but the tokens are not whole mental models themselves.
Here's an example of humans evaluating competing mental models as tokens: You see a car, it's white, it's got some blood stains on the door, and it's traveling towards a red light at 90 miles an hour in a 30 mph residential zone, while you're about to make a left turn. A human foot is dangling from the trunk.
You refer to several mental models you have about high speed chases, drug cartels in the area, murders, etc. You compare these models to determine the next action the car might take.
What were the tokens in this scenario? The color of the car, the pixels of blood, the speed, the traffic pattern? Or whole models of understanding behavior where you had to choose between a normal driver's behavior and that of someone with a dead body fleeing a crime scene?
They were practicing object recognition, movement tracking and prediction, self-localisation, visual odometry fused with proprioception and the vestibular system, and movement control for 16 years before they ever sat behind a steering wheel, though.
That's an exaggeration. Nobody is trained to read STOP signs for 16 years, a few months tops. And Waymo doesn't need to coordinate a four-limbed, 20-digited, one-headed body to operate a car.
Well, I also think that there is a lot we process in the background and learn beforehand in order to learn how to drive, and then drive. I think the fairest test would be to figure out the absolute lowest age at which kids could perform well on the street behind a steering wheel.
I am not making the point that it is; I am rather expanding on a possible perspective in which 16 years of training produce a human driver.
That being said, you don't really need training to understand a STOP sign by the time you're required to; it's pretty damn clear, being one of the simpler signs.
But you do get a lot of "cultural training" so to speak.
A 4 year old is currently more capable than LLMs (I'm not making this up, ask Yann LeCun). You're going to need it to reach at least "adult" level to be general intelligence.
It seems more like people haven't decided on what the goalpost is. If AGI is just another human, that's pretty underwhelming. That's why people are imagining something that surpasses humans by leaps and bounds in terms of reasoning, leading to wondrous new discoveries.
The 1905 thought experiment actually cuts both ways. Did humans "invent" the airplane? We watched birds fly for thousands of years — that's training data. The Wright brothers didn't conjure flight from pure reasoning, they synthesized patterns from nature, prior failed attempts, and physics they'd absorbed. Show me any human invention and I'll show you the training data behind it.
Take the wheel. Even that wasn't invented from nothing — rolling logs, round stones, the shape of the sun. The "invention" was recognizing a pattern already present in the physical world and abstracting it. Still training data, just physical and sensory rather than textual.
And that's actually the most honest critique of current LLMs — not that they're architecturally incapable, but that they're missing a data modality. Humans have embodied training data. You don't just read about gravity, you've felt it your whole life. You don't just know fire is hot, you've been near one. That physical grounding gives human cognition a richness that pure text can't fully capture — yet.
Einstein is the same story. He stood on Faraday, Maxwell, Lorentz, and Riemann. General Relativity was an extraordinary synthesis — not a creation from void. If that's the bar for "real" intelligence, most humans don't clear it either.
The uncomfortable truth is that human cognition and LLMs aren't categorically different. Everything you've ever "thought" comes from what you've seen, heard, and experienced. That's training data. The brain is a pattern-recognition and synthesis machine, and the attention mechanism in transformers is arguably our best computational model of how associative reasoning actually works.
So the question isn't whether LLMs can invent from nothing — nothing does that, not even us.
Are there still gaps? Sure. Data quality, training methods, physical grounding — these are real problems. But they're engineering problems, not fundamental walls. And we're already moving in that direction — robots learning from physical interaction, multimodal models connecting vision and language, reinforcement learning from real-world feedback.
The brain didn't get smart because it has some magic ingredient. It got smart because it had millions of years of rich, embodied, high-stakes training data. We're just earlier in that journey with AI. The foundation is already there — AGI isn't a question of if anymore, it's a question of execution.
The whole point is that LLMs, especially the attention mechanism in transformers, have already paved the road to AGI. The main gap is the training data and its quality. Humans have generations of distilled knowledge — books, language, culture passed down over centuries. And on top of that we have the physical world — we watched birds fly, saw apples drop, touched hot things. Maybe we should train the base model with physical-world data first, and then fine-tune with the distilled knowledge.
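For reference, the attention mechanism being leaned on here fits in a few lines. This is a sketch of standard scaled dot-product attention, softmax(QK^T / sqrt(d))V, with toy random inputs:

    import numpy as np

    # Each token's output is a weighted blend of the values it most
    # strongly "associates" with.
    def attention(Q, K, V):
        d = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d)                     # pairwise association strengths
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)                # softmax over keys
        return w @ V                                      # associative lookup

    Q = K = V = np.random.randn(5, 16)                    # 5 tokens, 16 dims
    print(attention(Q, K, V).shape)                       # (5, 16)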
Human life includes a lot of adversarial training (lying relatives) and training in temporal logic, which would seem to be a somewhat different domain than purely linguistic computation (e.g. staying up late and feeling bad; working hard at a task for months and getting better at it; feeling physical skills, even editing Go with emacs, move from the conscious layer into the cerebellar layer). I think attention is a poor man's OODA loop. Cognitive science is learning that a primary function of the brain is predicting what will be going on with the body in the immediate future and prepping for it; that's not a thing that LLMs are architecturally positioned to do. Maybe swarms of agents could get there (although in my mind those are more a way to deal with LLMs' poor performance on large contexts of instructions, as opposed to large contexts of data, than a way to have contending systems fighting to make a decision for the overall entity), but they still lack both the real-time computational aspect and the perennially tricky problem of other people telling them partially correct information.
There's plenty of training data, for a human. The LLM architecture is not as efficient as the brain; perhaps we can overcome that with enough twitter posts from PhDs, and enough YouTubes of people answering "why" to their four year olds and college lectures, but that's kind of an experimental question.
Starting a network out in a constrained body and having it learn how to control that, with a social context of parents and siblings, would be an interesting experiment, especially if you could give it an inherent temporality and a good similar-content-addressable persistent memory. Perhaps a bit of a terrifying experiment, but I guess the protocol for this would be air-gapped, not internet-connected with a credit card.
I created an open source tool to help apps stay near the Dunbar Number: https://highlyprobable.io/articles/ten-cubed. I think the concept of social networks is interesting, but the ultimate unbounded result is a disaster.
I've always been confused by this First Amendment argument with regards to TikTok: they're an organization that has been tied directly with an adversarial foreign state. How is this a rational take? Using this logic, Russia should be allowed to foment unrest through fake hate groups on Facebook (which they've done).
People should familiarize themselves with Gresham's Law: bad actors will always beat good actors if bad actors suffer no penalty. If bad actors leverage the rights and freedoms of a democracy to perform attacks without repercussion, we're toast.
>I've always been confused by this First Amendment argument with regards to TikTok: they're an organization that has been tied directly with an adversarial foreign state.
The First Amendment doesn't contain exceptions for adversarial foreign states, it's that simple. If it's acceptable to foment disinformation, hatred and conspiracy natively, and all free speech advocates will say that it is, then the same speech coming from foreign adversaries must also be acceptable.
And let's be clear - the premise that TikTok is some kind of nefarious CCP mind control platform is entirely speculative, and based primarily on Sinophobia. Elon Musk is driving far right wing white supremacist and anti-vaxx content all the time, but people are losing their shit about something TikTok isn't even doing.
>People should familiarize themselves with Gresham's Law: bad actors will always beat good actors if bad actors suffer no penalty.
This is true, but as far as the First Amendment is concerned, it's the job of society to penalize bad actors in the marketplace of ideas, not the government. Which is still further than many free speech advocates are willing to accept, but in practice means that it's up to TikTok (and every other platform) to decide what speech to carry, and what speech not to carry, and under whatever terms they choose, within legal limits of course.
The first time I fired up TikTok I was subjected to video of an older, "surgically-enhanced" woman in a Trump bikini (intention obvious), and someone putting a bumper sticker about killing pedophiles on their car (intention less obvious: a dog whistle for QAnon).
Maybe it's not a nefarious CCP mind control platform. Maybe it is even doing this sort of thing totally blindly based on "engagement". But there's definitely propaganda being served up by default.
Does anyone know if this new model handles silence better? I was trying to use whisper for transcribing bursts of talking amid large spans of silence, but the frequency of hallucinations was too high.
Sure, but that should be considered an accuracy problem. Telling a system to do its best to extract words from background sounds, and then getting words from it, is a different type of problem.
-------
I can't reply to the below, but you have to consider the difference in the signal-to-noise ratio to see why it should be considered a different problem.
If I told a binary image classifier to classify a clear image of a cat as either a "cat" or a "dog", and it said "dog", then that would be an accuracy problem.
If I gave the same classifier an image of a black cat standing in a very dark room, where even a human would have trouble identifying it, and it said "dog", that's not an accuracy problem so much as a signal-to-noise problem.
It seems like you're making the assumption that all of the issues you describe have the same root cause. I don't think that's a sound assumption... tehe.
"Silence" is a problematic term. For me, that word encompasses: squeaky chairs, typing on a loud keyboard, moving objects around on my table, etc. In a perfect world, Whisper —like a human— can easily distinguish a human voice from the din of my office, and only try and transcribe my voice.
Does anyone have solutions for clearing out "silence" from an audio file that works off something a bit more accurate than just "<= decibel x"?
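For what it's worth, a voice activity detector tends to beat a raw decibel cutoff. Here's a sketch with the webrtcvad package (the library itself requires 16-bit mono PCM at 8/16/32/48 kHz and frames of 10/20/30 ms; everything else here is my assumption):

    import wave

    import webrtcvad  # pip install webrtcvad

    vad = webrtcvad.Vad(3)  # 0 = most permissive, 3 = most aggressive

    def speech_frames(path: str, frame_ms: int = 30):
        # Yields only the frames the VAD classifies as speech, which copes
        # better with keyboards and squeaky chairs than a volume threshold.
        with wave.open(path, "rb") as wf:
            assert wf.getnchannels() == 1 and wf.getsampwidth() == 2
            rate = wf.getframerate()
            frame_bytes = int(rate * frame_ms / 1000) * 2  # 2 bytes per sample
            audio = wf.readframes(wf.getnframes())
        for i in range(0, len(audio) - frame_bytes + 1, frame_bytes):
            frame = audio[i:i + frame_bytes]
            if vad.is_speech(frame, rate):
                yield frame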
I just wrote a script with Hazel to automatically transcribe my voice notes to txt. It handles punctuation extremely well. What a wonderful contribution!
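The transcription step can be as small as this (a sketch using the openai-whisper Python API; the "base" model choice and the file handling are my assumptions, and a Hazel rule would just pass in the matched file path):

    import sys

    import whisper  # pip install openai-whisper

    model = whisper.load_model("base")  # larger models trade speed for accuracy

    def transcribe_to_txt(audio_path: str) -> None:
        result = model.transcribe(audio_path)
        out_path = audio_path.rsplit(".", 1)[0] + ".txt"
        with open(out_path, "w") as f:
            f.write(result["text"].strip())

    if __name__ == "__main__":
        transcribe_to_txt(sys.argv[1])  # e.g. the file path a Hazel rule passes in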
Simply put: your state exists on each client, and the mechanism to reconcile that state across clients is not mission-critical. Almost all apps today are built to get their state from a centralized, remote location. That's easy to build, but it's fragile and unforgiving of a lost network connection.
CRDTs and local-first ideals mean putting the client in charge of its state, which leads to all these positive side effects: virtually instant UX interactions; privacy (you're not necessarily sending data to a central server, and even if you are, it could just be opaque, encrypted blobs that are proxied); network-agnostic syncing (email, bluetooth, internet, wifi, etc.); and no fear of a service going out of business and taking your data with it.
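To make the reconciliation idea concrete, here's a minimal grow-only counter CRDT (my own sketch, not any particular library): merge is an element-wise max, which is commutative, associative, and idempotent, so replicas converge no matter the order or duplication of syncs.

    class GCounter:
        def __init__(self, replica_id: str):
            self.replica_id = replica_id
            self.counts: dict[str, int] = {}  # replica id -> local count

        def increment(self) -> None:
            self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + 1

        def value(self) -> int:
            return sum(self.counts.values())

        def merge(self, other: "GCounter") -> None:
            # Element-wise max: safe to apply in any order, any number of times.
            for rid, c in other.counts.items():
                self.counts[rid] = max(self.counts.get(rid, 0), c)

    a, b = GCounter("laptop"), GCounter("phone")
    a.increment(); b.increment(); b.increment()
    a.merge(b); b.merge(a)
    assert a.value() == b.value() == 3  # converged, regardless of sync order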
Still curious if there's a side of this idea that's about local-first software in a more regional or niche demographic sense. Like Instagram but just for Oakland, or GrubHub but just for Austin. I'm sure there are infinite problems associated with that sort of idea, but still curious if there's some movement/push for that as well as this state-related stuff.
There’s geographical locality e.g. apps like ChatRadar[0] that show information from nearby.
And then there’s data-locality e.g. is the data stored on your device or does it all ultimately live on a server somewhere that’s the authoritative source of truth.
I don't think this analogy works. It's more akin to police opening a folder and seeing paper evidence, but having no idea who put the paper there or when it was last opened/modified, and being unable to determine if the evidence is legitimate.
For me, this story isn't about fear that police could leverage the bugs to manipulate a case. It's about the constant fear that laymen rely on unverified "experts" to put people behind bars for years.
Since the bug allows for arbitrary code execution, it's more akin to the officer reading the piece of paper and by doing so, he becomes the subject of some sort of curse that completely controls his actions.
Yes, but you can’t join (“intersectionality”) your campaign against ad tech companies with a campaign against the police if you’re this busy being intellectually honest.
I'm glad to see this article is eliciting in others the same reaction I had when reading it. I'm a huge fan of Linear and use it for our company, but this article just has so much hand-waving going on it's staggering. Where is a single example? They claim that user stories are outdated, inefficient, not valuable and yet show no alternative to the seriously difficult problem of engineering tasks missing context and purpose, especially after a certain amount of time has passed. I can't count the number of times I've fired off a one-line task thinking "This is fine. I'll remember what I need to do when I get to it" only to, after a few weeks, go "WTF?".
The part about writing your own tasks is also strange. So my coworker finds an issue and instead of just writing the task out with context, explanation and direction, he has to... explain it to me in some medium... then I go do it? Really?..
I feel like this article was written as some kind of SEO "let's just get people here looking at our product" kind of thing. I can't believe this was written by someone who actually writes software.