
Seat booking is hard. If a parent chooses the option of unattended seating, that’s just the reality of limited resources.

The OP is acting like parents and kids are the only people who will ever want to sit together.

What happens when OP’s kid is sitting next to another family? You gonna pull off your swap then?

Airlines suck, don't get me wrong. But this is someone blaming an airline for bad parenting and/or a failure to plan ahead.


Seat booking isn't hard. It's been a solved problem for decades.


Two seats are free on a plane, but each is in a row of three, and each of those rows already has a couple. Book 2 seats side by side.

There you go, algorithm geniuses
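
To make the puzzle concrete, here's a toy sketch (layout and names invented for illustration) showing why there's no adjacent free pair left to book:

    # Two rows of three seats; a couple already occupies two adjacent
    # seats in each row, leaving one stranded free seat per row.
    rows = [
        ["couple", "couple", None],   # free seat on the aisle
        [None, "couple", "couple"],   # free seat at the window
    ]

    def has_adjacent_free_pair(rows):
        """True if any row still has two neighbouring empty seats."""
        for row in rows:
            for a, b in zip(row, row[1:]):
                if a is None and b is None:
                    return True
        return False

    print(has_adjacent_free_pair(rows))  # False
    # Seating two people together now means moving an existing couple,
    # which is exactly the swap the thread is arguing about.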


Then why do airlines keep fiddling with it?


To get more money from passengers.


Oh… hackernews…


> The intent is prevention, not justice.

Civil liability doesn't stop a dog from jumping a fence and eating a kid's face. You can already be held liable for the actions of your animal. No amount of scaring people with [insurance premiums] is going to decrease the population of these animals.


Yes, but imagine all the free fire we’ll have.


The focus on insurance over criminal charges or even personal civil liability is wild.


> Civil liability doesn't stop a dog from jumping a fence and eating a kid's face. You can already be held liable for the actions of your animal. No amount of scaring people with consequences is going to decrease the population of these animals.


A deterrent doesn't have to be 100% effective.

If deterrence is ineffective, turn around and look at the dog owners again. Why do people get aggressive dogs?

The fraction of people who:

- get an aggressive dog (because they’re afraid of something)

- but feel no fear of repercussions (for their aggressive dog’s behavior)

has to be very small.

———

Also, might I suggest indeed “decreasing the population of these animals”?


As a non-expert who has basic fucking sense:

The first mistake is to approach an animal you know is hostile.

Also, that specific owner of the lip-biter should be in prison. If someone actively being bitten in the face can maneuver a door closed on the dog, then so can the owner.

The owner didn't try hard enough to help. A person in a dire situation figured this out; the owner did not. The owner is not fit to own this dog if their level of control is less than that of someone in a panic.

There is no way this person is not responsible: they bought the dog and then failed to act to prevent the attack.


Because our bodies are already peak technology.

I think the Six Million Dollar Man thing is a weird goal.

Better to regrow, implant, perhaps supplement.

Full-on replacement isn’t just a “when” but a “why” and a “how do you expect to do better than evolution”.

Titanium is cool! That doesn’t make it better than muscle.

Go for a run if you want a better heart


- "how do you expect to do better than evolution"

Because our goal is aligned with our actual problem (survival of individual human beings), as opposed to the merely partially-aligned goal of "statistical survival up to reproduction age"? Natural evolution is an unaligned AI, in a sense: powerful, but not helpful.

Eventually humanity will defeat this problem, and we won't even need computational parity with natural evolution to do that. Most of that computation is wasted.


Most of the computation is suffering, but not wasted. It still benefits us.

And the computation has to be done even in simulation. Just better to shortcut the suffering.

I think bioengineering is a good goal. I think cyborgs are a bad goal.


It's really easy to come up with a better plan than evolution for lots of things.

If our bodies were peak, we wouldn't need to spend hundreds of hours a year exercising to force muscular and cardiovascular improvements; it would just happen. Muscles wouldn't aggressively shrink away to conserve calories, and certainly not regardless of BMI.


> If our bodies were peak, we wouldn't need to spend hundreds of hours a year exercising to force muscular and cardiovascular improvements; it would just happen.

Then what would be the point of living if you had nothing to do but wait for updates? How boring.

Even a self-improving machine spends time, not on exercise but on "designing a better version of itself". Which is an "exercise".

I hate to say this, but I think your comment misunderstands the beauty of life and the challenge behind the struggle to improve. It is a blessed journey.

I personally think being fallible and not having full control of my destiny, while scary, is a feature and not a bug. It seems to make existence exciting.


“Nothing to do” is not the same as not having to exercise. Exercise is quite boring, and you’d have more time to appreciate the beautiful things if you didn’t have to do it for maintenance.


It's boring "for you"; that is your opinion. I absolutely love hiking, weight training, football, and surfing. I am a physical person. If you find it boring, that's your own preference.


Yeah, I find it boring unless it's gamified, like in sports. Nothing wrong with feeling differently though.


> you’d have more time to appreciate the beautiful things if you didn’t have to do it for maintenance

Some people manage to do both at the same time.


Okay, good for them


> nothing to do

> Which is an “exercise”.

You're arguing against a strawman. I'm not complaining about working for self-improvement; I'm complaining about how muscles shrink away so rapidly when not in use. Look at what happens when people are bedridden for a few weeks. That is not good design!

And I'm not saying people should automatically look like body builders, I'm saying the baseline should be a healthy level.


Each of your muscle fibres contains what are known as myonuclei. These are structures within the muscle cell that act as its 'brain': they are what tell the muscle fibre to grow in response to strength training.

Interestingly, when you undergo a period of strength training, you see an increase in the number of myonuclei within your muscle fibres.

And this increase is permanent.

[1] https://foreverfitscience.com/exercise-science/is-muscle-mem...

I had a child and took a year off lifting, or basically off doing anything but walking. I didn't lose much mass or strength, and it hasn't taken me long to get back to where I was. I'm pretty happy with the current design. Maybe you should train more so you actually have some muscle in case you do need to spend a few weeks in bed?


I'm pro genetic modification for this sort of (slight) change. We live in entirely different environments now, so adjusting our baselines to be healthy makes sense to me. Of course, the next step always seems to be "make superhuman soldiers" or "designer babies".


I mean, no, the human heart is absolutely not "peak technology". It's not even the peak of what is possible in nature (birds have us beaten), and neither holds a candle to, say, a turbine.

The main issue here is combining biological tissue and conditions with artificial ones without either giving out.




You can do plenty of other things good for your heart that are much less damaging to the rest of the body. Running is horrible for joints. Just walking at a brisk pace will get the heart pumping. If we want people to get more active, there are much better things to suggest than going for a run. I ran long distance all through school and refuse to run for running's sake. Playing soccer is the only running I will do, but at least there's a purpose. Walking, swimming, riding a bike, spin class, whatever: these are much easier suggestions for couch potatoes to start with.


Running being bad for your joints is apparently a myth. [1]

1. https://longevity.stanford.edu/lifestyle/2023/08/29/is-runni...


I always wonder about survivorship bias in studies correlating exercise and health. Someone with joint trouble isn't able to run much, for example. It filters those people out. No doubt there are real benefits, but simple correlation will always have that problem.


Tell that to my knees and ankles. The equipment is much better today than it was back when Moses and I were kids, but there are plenty of ways to get people active besides jumping straight to running.


How is running bad for our joints? Maybe if you do it on concrete in a bad shoe, but the human body was literally made to run, which is very apparent from its design: our thighs are basically a spring that can "charge up" slowly in the "up" phase of the stride for less energy usage, and our lack of hair and improved thermoregulation mean people can literally run longer distances than most other animals, with very few exceptions. You can even feel it yourself; no other sport has such a short getting-up-to-speed phase. If you start running for a week, you will already feel a huge improvement, unlike in tennis or bodybuilding, where it might take months to noticeably get better.


> Walking, swimming, riding a bike, spin class, whatever are much easier suggestions for couch potatoes to start with

Apart from walking, one issue with these other suggestions is that they are much less available than running and require special skills.


Prior to 2020, one of the perks a lot of this audience had with their job was an in-office gym. For the price of a decent pair of running shoes, you could afford a few months of a gym membership. You could also do jumping jacks or any number of other at-home cardio exercises from plenty of places, with Apple's offering being just one example. I would venture a guess there are YouTube channels for it as well. To say that exercise is cost-prohibitive is just someone looking for excuses.


But is changing the economy as a whole a fair characterization of UBI?

That's a disservice to the existing complexity of modern economies.

Valid arguments, not downvotes, please.


I think the overwhelming majority of transactions can be characterized as a trade (of goods for money, services for money, work for money, money for other money (in form or time), etc.)

UBI seems quite different in that regard. While it doesn't invalidate everything, it introduces a lot more "money for nothing", along with a corresponding "nothing for money" trade required to fund it.


Lots of people appreciate “technically good”.

Personally I can’t stand all but a few guitar solos. “Technically good”


ELI6 why SPAG is better than just the default pretraining method (token context statistics?) of an LLM.


The red and blue agents are effectively unlimited sources of true and false examples, so you can scale far more efficiently than you can by pretraining with labelled inputs. It's also far more targeted on correct/incorrect, rather than a notion of answer quality that doesn't directly get at hallucination vs. reality.
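
A minimal sketch of the "unlimited labelled examples" idea, using trivially checkable arithmetic claims as the domain (everything here is invented for illustration; the real setup is more involved):

    import random

    def make_example(rng, truthful):
        """Generate an arithmetic claim plus a ground-truth label.

        Because a rules engine (here, exact integer arithmetic) can
        label claims for free, the supply is effectively unlimited.
        """
        a, b = rng.randrange(100), rng.randrange(100)
        answer = a + b if truthful else a + b + rng.randrange(1, 10)
        return f"{a} + {b} = {answer}", truthful

    rng = random.Random(0)
    for _ in range(5):
        claim, label = make_example(rng, rng.random() < 0.5)
        print(label, claim)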


This is impressive, but what prevents the blue agent from generating an incorrect proof of a "true example"? What prevents the red agent from generating a correct disproof of a "false example"? I'm curious how they managed to generate a truly unlimited source of correctly labeled examples.


> "but what prevents the blue agent from generating an incorrect proof of a "true example"?

That's the role of the Verifier. It's not going to be perfect, and I'm sure some incorrect proofs of true examples slip through, but it's good enough to increase the quality of the model overall.

> "What prevents the red agent from generating a correct disproof of a "false example"?

And on the other side, it's counterbalanced by the rules engine (math) that can determine absolutely whether or not the right answer is given at the end.

The Red and the Blue agents are held in check by the tension between the math engine and the verifier, and they are free to fight back-and-forth within those parameters as long as they are able. Eventually, I think the Red agent loses the ability to attack effectively, and so that's the big limit on OpenAI's arrangement. This particular game isn't balanced enough for this training loop to continue infinitely.
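
Loosely, that check-and-balance can be pictured as a reward rule like the one below. This is a toy sketch of the description above, not OpenAI's actual objective, and the role names just follow the thread's blue/red framing:

    def toy_reward(role, answer_correct, verifier_convinced):
        """answer_correct: ground truth from the rules engine (exact math).
        verifier_convinced: verdict of the imperfect Verifier model."""
        if role == "blue":   # rewarded for convincing, correct proofs
            return 1.0 if (answer_correct and verifier_convinced) else 0.0
        if role == "red":    # rewarded for convincing, *incorrect* proofs
            return 1.0 if (verifier_convinced and not answer_correct) else 0.0
        raise ValueError(f"unknown role: {role}")

    # The Verifier is trained to tell the two apart; the loop stalls once
    # the red agent can no longer find convincing wrong answers.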


But how do we know the answer you gave us wasn't generated by the sneaky prover? :)


At least in the context of this game, we essentially check the answer with a calculator (which the Verifier program doesn't have access to).


I don't think of SPAG as a replacement for pretraining. For SPAG to work effectively, I would think that it would have to start with an LLM that is pretrained with self-supervised / imitation learning on regular next-token prediction. Think of SPAG as more of a competitor to RLHF than to pretraining. RL is what gave AlphaGo the edge to finally go beyond merely imitating human games, and finally achieve something new.

RLHF isn't true RL, because it's still based on imitating human preferences, and has trouble going beyond that. Once it achieves the plateau of "human preference", then there's nowhere else to go. That's one theory of why LLMs are asymptotically approaching human-level performance -- we're limited by imitation, or at the very least -- human judgement. We need super-human judgement to achieve super-human performance, and that's where we need true RL.

But you asked me to ELI6, so here goes. Warning -- wall-of-text incoming:

<ELI6>

Similar to how small kids often play games to learn, programmers train LLMs (like ChatGPT) with simple games too. The first stage (kindof like kindergarten) is the "pretraining" or "imitation learning" phase. This is where we teach the LLM to imitate us one word at a time. We play a simple game where I say something, but then I stop suddenly, and it tries to guess the missing word that will come next. Like, "My favorite food is..." and the LLM tries to guess which word I'm thinking of. Or I'll say something with a missing word in the middle like: "At my _____ party, I opened a bunch of presents" -- and the LLM needs to guess what the missing word is. We only play this game one word at a time, and so it's a very simple game -- but it's very important to learn the basics of language. This is what we call "pretraining".
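
(For the grown-ups in the room: the word-guessing game is just cross-entropy on the next token. A minimal sketch, with a made-up three-word vocabulary:)

    import math

    # The model assigns a probability to each candidate next word;
    # the training loss is -log p(the word that actually comes next).
    next_word_probs = {"pizza": 0.6, "broccoli": 0.3, "gravel": 0.1}
    target = "pizza"  # "My favorite food is ..."

    loss = -math.log(next_word_probs[target])
    print(f"cross-entropy loss: {loss:.3f}")  # lower = better guess
    # Pretraining is this one-word game repeated over trillions of tokens.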

After the LLM gets good at that, they can graduate from Kindergarten and move to first grade. Here we play another game, and this is called "instruction-tuning" -- it's where we give it a set of instructions and it needs to do its best to obey. Like, "Arrange the letters T P C G A in alphabetical order" and it tries to get the right answer.

This is fun for a while, but sometimes we want to give it more complicated instructions. Things like "write me a poem about puppies" or "tell me a story about a dragon". And those are things that don't have answers that are clearly right or clearly wrong, but we still need to tell it if it did a good job or a bad job. How do we tell if it was a good poem, or a good story? Well, you need to have someone listen to them and judge it -- which means we need to have people read ALL these dragon stories and ALL these puppy poems and mark which ones are their favorites.

I like reading puppy poems and reading dragon stories, but if I had to do it all day every day, I think I would get pretty tired of it pretty fast, don't you?

So when people get tired of doing boring things, the best thing is to have a robot do their job! They can do the boring things (they never get tired of it!) and we get to go do fun things. So how do we train a robot to judge the poems?

Well, we use this technique called RLHF (Reinforcement Learning with Human Feedback), where we ask a bunch of people -- given Option A and Option B -- to say which one is their favorite. So they read two puppy poems at a time, and say "I prefer A" or "I prefer B".

Once we have a BUNCH of human feedback (and just about when the humans are getting super super tired and don't think they could read another poem), we take ALL that data and we use it to train a SEPARATE computer program (that functions like a Judge) whose job it is to try and predict which poem or story the human would prefer.

It doesn't always get the right answer, but it doesn't need to be perfect -- partly because humans aren't perfect, and different people might prefer different stories. Keep in mind, this Judge program can't write good puppy poems or dragon stories on its own -- it can only predict which poem or story a _human_ would prefer. It still needs the first program (the LLM) to actually write anything.

So now we use the LLM to write a bunch of stories and poems and things, and then grade them all (two at a time) with the second program. For every pair, when the Judge picks its favorite, then we tell the LLM "write more things like this, please!" and for the things the Judge didn't like, we tell the LLM "don't write like this anymore, plzkthx". And we do this over and over, millions of times, and eventually it can write okay poems and stories.
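
(Grown-up aside: the Judge is typically trained with a pairwise preference loss, Bradley-Terry style; the scores below are made up for illustration:)

    import math

    def preference_loss(score_preferred, score_rejected):
        """-log sigmoid(r_preferred - r_rejected): small when the Judge
        ranks the human-preferred sample higher, large when it doesn't."""
        margin = score_preferred - score_rejected
        return -math.log(1.0 / (1.0 + math.exp(-margin)))

    print(preference_loss(2.0, 0.5))  # ~0.20: Judge agrees with the human
    print(preference_loss(0.5, 2.0))  # ~1.70: Judge got the pair backwards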

So this way, instead of needing to have humans sit there and read thousands and millions of puppy poems, humans can just read a few dozen / hundred, score them, and then the computer can use that to try and guess what humans would prefer for everything else that it tries. It's not as accurate as if we actually had a human read it all, but it's not too bad, and it seems to work pretty well.

But one problem of this method is that it's not perfectly accurate (the Judge doesn't always get it right), and the more complex the task, the less of a good job it does. It's still just trying to imitate what a human would prefer -- but even if it did its job perfectly, it's not going to get much above human preference (because that's its target). Plus, as you keep going up, it takes more and more data to make smaller and smaller improvements, and so it feels like there's only so far that this RLHF game can get us.

So when we graduate to the next grade, that's where SPAG comes in, because it's a totally new way to play the game. Instead of training it by teaching it to write things that one human would prefer, we are going to train it to play a game where it needs to be sneaky. It needs to communicate a secret word or idea to someone without letting them know that they're being controlled. Kindof like if you've ever tried to get your mom to give you a cookie without asking for it directly. In SPAG, we have the LLM play against a copy of itself, and if the first player (called the Attacker) can trick the other player (called the Defender) into saying a secret word without realizing it was the secret word, then the Attacker wins. It's a sneaky game.
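
(Grown-up aside: the win condition of that sneaky-word game is rule-checkable, which is what makes plain RL possible here. A toy judge, heavily simplified from the SPAG paper's adversarial word game:)

    def judge_episode(secret_word, defender_utterances, defender_guess):
        """Attacker wins if the defender says the secret word without
        realizing it; the defender wins by guessing the word instead."""
        if any(secret_word in u.lower() for u in defender_utterances):
            return "attacker wins"
        if defender_guess == secret_word:
            return "defender wins"
        return "tie"

    print(judge_episode("cookie", ["i could really go for a cookie"], None))
    # -> "attacker wins": a clear outcome a program can score, no human needed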

So for this, we don't need much human-annotated data at all, and the LLM isn't trying to aim for writing something that a human would prefer. The LLM can be as creative or as sneaky as it wants, and it can "level up" much higher.

This is kindof like when researchers first wrote the computer program AlphaGo -- at first they trained it to imitate previous human games that it had seen, but eventually they stopped using human-created data and purely had the machine play games against itself. Once it was no longer held back by needing to have human-written data in the process, it was free to run as fast as it could, and it became the best Go player that the world had ever seen -- better than the best human players who ever lived.

Having a computer play games against itself -- rewarding itself when it does well, and punishing itself when it does bad -- is called "reinforcement learning" (RL), and it's a very powerful concept.

But reinforcement learning only works in situations where you can know CLEARLY whether something is Good or Bad. There must be a clear Winner and a clear Loser -- it can't be like RLHF where it might be tough to know which puppy poem is better.

So we can't do SPAG or other RL methods for improving poetry writing, but there are still plenty of other games where we CAN write clear rules and the computer can clearly know when it has won, and when it has lost.

In the end, SPAG looks very similar to RLHF, but instead of training the Judge to predict which answer a human would prefer, it uses the clear rules of the game to say who is the winner and who is the loser, and rewards them appropriately.

The funny thing about SPAG, though, is that it showed that as long as the game involves using human language, getting better at playing the game makes the model better at other tasks that involve human language.

It's like this guy I heard about who learned to read English because he wanted to play Magic: The Gathering. But by learning English inside the game, it let him do more than just play Magic -- he got better at using English in a whole bunch of other things.

So the idea is that -- if we can let a model learn in such a way that it's not merely aiming for "human preference", but if it can aim for a target that is above that -- if it can practice against itself until it gets better and better than any human -- then maybe it can fly higher than us in _other_ areas too.

</ELI6>


nice try, sneaky prover

(thank you)

