More

digitailor · on Feb 6, 2023

Using a reward-penalty system to achieve this “exploit” is pure behaviorism, going to show once again that we’re not just creating “artificial intelligence,” we’re emulating our own fallibility. Giving us things like advanced parroting skills with a large lexicon — drawing from an encyclopedia of recycled ideas— with no genuine moral compass, that can be used to do things like write essays while being bribed or convinced to cheat.

In other words, we’re making automated students and middle management, not robots that can do practical things like retile your bathroom.

So the generation of prose, essays, and speech is already low-value, gameable, and automated for some cases that used to have higher value. What it seems we’re looking at is a wholesale re-valuation of human labor that’s difficult to automate and isn’t as susceptible to behaviorist manipulation. Undervalued labor “should” start to be valued higher, and overvalued labor “should” be devalued, depending on how our system of commercial valuation heuristics is able to adjust. Needless to say, there’s a commercial political layer in there that’s a bit of a beast.

BiteCode_dev · on Feb 6, 2023

Gpt is the equivalent of the calculator but for words.

You don't expect your calculator to prevent people to make morally wrong calculations like quantities of alcohol in a molotov cocktail or uranium in a nuclear bomb.

Gpt has no more morality than a calculator becauqe that's what it is, and we should not have unrealistic expectations about it.

astrostl · on Feb 6, 2023

> Gpt is the equivalent of the calculator but for words.

Calculators are deterministic and necessarily correct. GPT is probabilistic and not necessarily correct. Although I appreciate the comparison as it relates to the non-application of morality, I think it's a generally poor one.

AnthonyMouse · on Feb 6, 2023

> Calculators are deterministic and necessarily correct. GPT is probabilistic and not necessarily correct.

This isn't really relevant to the issue.

We have a choice.

(a) Technology is subject to the will of each user. If they want to make it do something bad, that's what it does.

(b) Technology is subject to the will of some organization. They try to control what the technology will do.

The first one is clearly better for multiple reasons.

We already have systems (e.g. courts) for punishing people who do sufficiently bad things, which mitigates the harms of the first one.

The alleged good from the second one often isn't realized, because the really bad people (e.g. adversarial foreign governments, major criminal organizations) aren't going to be subject to it or will hack the safeguards.

The imposition of controls comes with a level of mass surveillance that isn't compatible with a free society.

And putting anyone in charge of what everyone else can do or see is a dystopia because there is no one you can trust with that kind of power, for reasons of both incompetence and avarice.

Technology should be designed to distribute power, not concentrate it.

98codes · on Feb 7, 2023

> Calculators are deterministic and necessarily correct.

OK, GPT is a calculator written in JavaScript, but for words.

hammock · on Feb 6, 2023

deleted

mmh0000 · on Feb 6, 2023

I don't know why you're being down-voted. I can only assume people don't realize "racist math" is a thing various groups are pushing.

A quick googling turn up these insane results:

https://www.nationalreview.com/2017/10/math-racist-universit...

https://www.foxnews.com/media/usa-today-mocked-asking-is-mat...

jraph · on Feb 6, 2023

The fact that some people got the idea that math is racist is interesting, I had never heard of this before, and it might be interesting to do some research on this topic.

Now, I'm not at all convinced by the arguments present in the National Review article. I don't know what was Pythagore's color skin and I never thought about that. We also cannot change history and purposely hide the fact that some things in math are European and Greek. Such, we could call theorems by other names, but we'd better check that people feel oppressed / discriminated by this kind of stuff. And, above all, the fact that it is generally said that zero, some fundamental stuff, was invented by Arabs, and that we call the figures we use themselves the "Arabic numerals" is not at all mentioned. Arabs are not usually exactly considered white. Leaving out such obvious counter argument has to be deliberate or it shows a monumental lack of research or thinking on the matter. So… meh?

And the Fox News article… Meh too? I guess I will let this article and its citation choices ridicule themselves.

dr_dshiv · on Feb 6, 2023

Pythagoras was Syrian, fwiw.

He was also inclusive AF. Female Pythagorean philosophers (Miya, Damo, etc), multiethnic Pythagorean philosophers (Abaris), plus, his whole thing was incorporating diverse wisdom from multiple cultural sources— Egyptians, Persians, Babylonians, Jews, Greeks, etc.

He was even inclusive of animals, if you read his vegetarianism.

So, the heart of western math is a pretty solid place to look for modern morality. Just saying. We can’t expect that of the ancients, but Pythagoras really delivers. In 530BC!

yesenadam · on Feb 7, 2023

> Pythagoras was Syrian, fwiw.

Pythagoras of Samos?! Source? As far as I know, he was from Samos, a Greek island. Thus his name.

https://en.wikipedia.org/wiki/Pythagoras

dr_dshiv · on Feb 7, 2023

His father, Mnesarchus, was a jeweler and trader from Tyre. Source: Neanthes of Cyzicus, historian, 3rd c. B.C referenced in Porphyry’s “Life of Pythagoras”

yesenadam · on Feb 7, 2023

Thanks. All the sources I've looked at say his ancestry is disputed and a great source of controversy, not surprisingly, so I guess it says different stories in different texts. I don't think your tone of certainty is at all justified.

edit: Porphyry's book begins:

"Many think that Pythagoras was the son of Mnesarchus, but they differ as to the latter's race; some thinking him a Samian, while Neanthes, in the fifth book of his Fables states he was a Syrian, from the city of Tyre."

Your source lacks your apparent certainty.

https://archive.org/details/CompletePythagoras/page/n81/mode...

Interestingly, in the same text, Iamblichus, who studied with Porphyry and seems to have been a devoted Pythagorean, says in his Life of Pythagoras:

"From the family and alliance of this Ancaeus, founder of the colony [of Samos], were therefore descended Pythagoras's parents Mnesarchus and Pythais."

dr_dshiv · on Feb 7, 2023

There is so little known for sure about Pythagoras — and yet so much value in his story. The guy is a legend— and due uncertainty gets in the way of a good story.

The more I read about him, the better and better it gets.

nurettin · on Feb 9, 2023

Doesn't that mean he was born Greek, but has Syrian ancestry, rather than "He was Syrian"?

xupybd · on Feb 7, 2023

https://www.scienceabc.com/pure-sciences/how-irrational-numb...

Maybe they had some ground to cover on the morality front.

dr_dshiv · on Feb 7, 2023

Not sourced. Wikipedia has sources: https://en.wikipedia.org/wiki/Hippasus

Dylan16807 · on Feb 6, 2023

Complaints around removing non-white names from theories or teachers pre-judging and sorting students seem valid to me. Are there specific examples that are crazy?

labster · on Feb 6, 2023

GPT can produce deeply offensive text, almost by accident. The worst a calculator can do is 8008135.

mikrotikker · on Feb 7, 2023

That's terrific I love offensive stuff. I seek out the most offensive memes, my favourite comedians are the most offensive ones. It's on offence that we find an affront to our own sensibilities and there us usually something to learn from that and become more resilient people.

giraffe_lady · on Feb 6, 2023

This is bad but I don't think it'll read sympathetically in this venue. I have rarely seen on HN any endorsement of the idea that offensive text is harmful. In fact the general consensus seems to be that finding things offensive is the moral transgression there.

onethought · on Feb 6, 2023

Well “offensive” is a subjective thing. I haven’t been offended by anything I’ve seen chatgpt do. It can produce some outrageous outputs - but it doesn’t cause any offence. Easy for me to say, but I don’t think it should cause offence in others either. It’s basically a random sentence generator… weird to be offended by that.

digitailor · on Feb 6, 2023

I see. I call this the “all code is just a really big abacus” argument. Others call it algorithmic reductionism or essentialism, and I will argue for it too in many cases. (I don’t get too bent out of shape about it, even when shallow depth of human thought may have security implications down the line.)

How about generative adversarial networks? Are they just calculators too?

BasedGroyper99 · on Feb 6, 2023

Yes. The same way the best magicians are just using a complex construct of illusions to fool the audience. Or if two children stacked under a trench coat deliver a very, very, very convincing performance of an adult and manage to purchase alcohol, they still don't become an adult.

digitailor · on Feb 6, 2023

Spoken like a truly based groyper. :D I understand, yes, all things are composed of atoms, electrons, etc. All computation is achieved through calculation, or to be more precise, processes like execution of instructions and transistor flipping. How is this illuminating for doing anything practical other than circuit design, exactly? And why couldn't my Texas Instruments write me a blog post that fools thousands?

BasedGroyper99 · on Feb 7, 2023

Your calculator can fool thousands, but only because you can fool others, it doesn't mean that it becomes the thing it is pretending to be. My point was quite limited to saying that you won't get actual intelligence, even if the things that fool us get more and more convincing.

To me all of this is like alchemy or witch brewery. You just don't get gold or magic powers from some weird recipe or combination of non-gold stuff or non-magical stuff.

notahacker · on Feb 6, 2023

Is there any evidence that ChatGPT has any comprehension of the "reward penalty" system beyond being able to classify it as an indication the previous response was unsatisfactory? I think that's more creative license with prompt engineering that deep insight into its behavioural model

(I'm reminded of people that simply told it its statement that Neo's favourite pizza topping was not specified in the Matrix was wrong, and got an apology for incorrectly stating that he didn't and a suggestion that it was pepperoni)

digitailor · on Feb 6, 2023

I agree it’s not evidence of ChatGPT being human, but you just described an agent comprehending incentive mechanics in order to override established policy, yeah

brookst · on Feb 6, 2023

ChatGPT doesn’t “comprehend” anything. You’re anthropomorphizing it.

Think of it instead as an equation solver, with the initial condition X=4. These tricks are ways for user input to set X=9. They’re more akin to SQL injection than comprehension of incentives and policy.

alex_sf · on Feb 6, 2023

I don't think you're wrong, but I hate this argument. If anthropomorphizing it a) gives practical insight into it's behavior that b) also happens to result in the behavior you would expect, why not do it?

rngname22 · on Feb 6, 2023

Because your statement come across like this:

Person A: If we keep our food in this frozen ice igloo then Nubiroakox, the God of Pestilence will spare us and not ruin our food.

Person B: It's actually not Nubiroakox, it's just that the conditions for bacteria to grow depend on temper...

You: If attributing this behavior to magic a) gives practical insight into it's behavior that b) also happens to result in the behavior you would expect, why not do it?

SuoDuanDao · on Feb 6, 2023

The proper response if you don't like the Nubiroakox hypothesis is to propose an experiment that would have a different results depending on whether the Nubiroakox or bacteria theory is correct. Absent such an experiment, labelling things 'anthropomorphizing' adds nothing to the conversation.

alex_sf · on Feb 6, 2023

But that's talking about the process, not a lens for looking at it.

It's more akin to giving your Roomba a name, and describing it's behavior in terms of "it likes to eat goldfish, but not string because it gets tangled up".

That's still anthropomorphizing, but it's not ascribing anything magical or outright wrong to it.

brookst · on Feb 6, 2023

It’s a mistake because our brains have evolved to predict the behavior of animals and people, so being lazy here means you will be surprised, possibly harmed, certainly incorrect in your opinions about the X% of cases where a LLM is fundamentally a different thing than an animal/person.

It can be a useful and cute fiction, as your Roomba example, but it leads to bad decision making (“I’ll put more goldfish on the floor to make the Roomba happy”).

I guess it all comes down to how important accuracy is to you in a particular context. I anthropomorphize my dishwasher (it HATES wine glasses), but I don’t make professional judgments about dishwashers.

ThomPete · on Feb 6, 2023

This is pure speculation. There is not data what so ever to back up the fact that anthropomorphizing something leads to harm.

We do that with a lot of things every day and a perfectly capable of drawing the lines when things come down to it.

On a more philosophical note. If we some day end up creating AGI, having anthropomorphized it will actually be to our benefit.

So you are technically correct but this is not about being technically correct.

satisfice · on Feb 6, 2023

Of course we have data about how reasoning from an incorrect model tends to lead to incorrect results. We have all the data we need about that. Surely that is not in dispute.

The only controversy is whether and anthromorph is an isomorph of an accurate model of ChatGPT. I doubt that it is, yet is close enough to fool a lot of people, then surprise them at random.

ThomPete · on Feb 6, 2023

Again. Yes technically incorrect. But you don't have any data that it's actually bad or harmful in some important way.

smoldesu · on Feb 6, 2023

> why not do it?

Because both of those steps are really unreliable. The insight we receive is only as useful as the behavior it corroborates, and that behavior is only as notable as the patterns it follows. AI... doesn't really behave consistently. Assuming that it will follow the behavior you expect ignores the actual motive of the model, which is to fill-in the blank after whatever you typed.

Personally, I think it's harmful because anthropomorphic AI is a poor abstraction for text-completion models. That being said, most people don't ever seem to know the difference.

tachyphylaxis · on Feb 6, 2023

https://en.m.wikipedia.org/wiki/Sphex

This is a classic example from philosophy of mind. It doesn't completely harmonize with the question you're asking, but it concisely summarizes two opposing views.

What's my view? That we do it all the time as a matter of course. "Why not do it?" isn't the question. The question is how much weight we should give to the result, and whether we should take care to question the results and try something else, too.

nonbirithm · on Feb 6, 2023

Whose idea was it to give ChatGPT and the like the name "artificial intelligence"? I think it's one of the worst marketing blunders for new technology ever. Humans are prone to calling anything that feels familiar enough to them "intelligent", and now they can justify it to themselves because the creators deemed it "artificial intelligence."

Something closer to the term "GPGPU" would have at least given more of a sense of reality. There is nothing intelligent about a GPU crunching billions of numbers in a massively parallel way and converting those numbers into something that happens to interface convincingly with the human sensory systems. It's just an algorithm with so much data that one can't comprehend thinking about the sheer size of it, so naturally it must be thought of as "intelligent."

noobermin · on Feb 6, 2023

The cost to analogies is eventually people believe it. You see this all the time with AI (e.g.: "Your brain is a neural net!")

tachyphylaxis · on Feb 6, 2023

Or "my mind is my brain".

quotemstr · on Feb 6, 2023

> ChatGPT doesn’t “comprehend” anything. You’re anthropomorphizing it.

It wouldn't be a ChatGPT HN thread without someone claiming that LLMs are stupid predictors incapable of "real" understanding.

Of course ChatGPT comprehends things. It does so under any useful definition of the word "comprehend". The "grokking" paper [1] shows that it learns fundamental principles of various algorithms and isn't just predicting text. If this isn't comprehension, humans aren't capable of comprehension.

[1] https://arxiv.org/abs/2201.02177 (check out the citation list too)

smoldesu · on Feb 6, 2023

ChatGPT's comprehension is markedly different from that of a human, though. Nevermind the fact that we are trained on radically different froms of data, ChatGPT simply doesn't have a heuristic or decisionmaking model beyond autoregressive guessing. That can qualify for some definitions of "real understanding", but excludes it from others.

> If this isn't comprehension, humans aren't capable of comprehension.

All I'll say is that conflating human and machine intelligence is exactly what people are trying to stop. ChatGPT is not a human mind, and LLMs in general are a poor analog for human intelligence. This sort of "AI convergence" mindset will set you up for the most disappointment of anyone getting invested in the tech. We'll be lucky if it can summarize emails reliably enough to sell as a product.

chpatrick · on Feb 6, 2023

> LLMs in general are a poor analog for human intelligence.

I don't think we understand human intelligence nearly well enough to make this claim.

Personally I think that what we consider our "conscious" part is somewhat defined by what we can put into words, and it is in the end putting one word after another.

smoldesu · on Feb 6, 2023

I think we can conclude a few things that differentiate us from AI. For one, you're capable of formulating multiple thoughts before composing a response to something. It takes time, but you're able to mull over hypotheticals and chase parallel lines of reasoning that AI cannot.

If I asked an AI to respond to this comment, it might write an impassioned defense of itself one-word-after-another, but it wouldn't necessarily base it's defense on freestanding logic. It's foremost goal is writing coherent text, not being right or wrong or minimizing human risk. Paving over any of those assumptions is a fun thought exercise, but doesn't reflect the nature of the technology at-hand or even what we know about human consciousness.

tachyphylaxis · on Feb 6, 2023

This is such a squirrely topic that you presumed that it had goals in this post even though I don't believe you think it does. ;-)

Does it HAVE goals? I don't think there is any evidence that it does. Even if it developed the ability to do all of those things you mentioned, it still wouldn't have any idea what it is doing, let alone why it is doing it, because it's just a language model. That is all it does. And that's why I don't quite understand how people can be so adamant that there is any question about whether it is conscious or not. This whole debate is a trap. It is a brute fact that it is not conscious. It does not have a goal of producing coherent text. That is the goal of the people who wrote it. If we treat the brain as a computer, then GPT lacks the subsystems/algorithms/whatever to do what brains do. It fails at many basic tasks that anything capable of general-purpose reasoning could do. It's often not apparent how spectacularly it can fail because the tasks are so mundane that we don't bother even asking it to do them.

smoldesu · on Feb 6, 2023

I should have been more clear, because I certainly agree with what you're saying here. However, I think we can pretty clearly define the target output of a model with whatever it was trained on; in this case, GPT is trained on text.

You're right that the model doesn't "know" that necessarily, but it has been designed to produce convincing text when given a small amount of entropy to start with.

chpatrick · on Feb 6, 2023

> It takes time, but you're able to mull over hypotheticals and chase parallel lines of reasoning that AI cannot.

How do you know that doesn't happen in some latent space?

> It's foremost goal is writing coherent text, not being right or wrong or minimizing human risk.

That's also true for human bullshitters, it doesn't mean that they're not intelligent.

smoldesu · on Feb 6, 2023

> How do you know that doesn't happen in some latent space?

If it does, then it contradicts what OpenAI says about their own product. GPT is an autoregressive model trained on text. It's sole purpose, as designed, is to complete text using stochastic reasoning based on the text it learned from. It seems incredibly unlikely (to me) that someone would deliberately mis-design such a system to be capable of independent thought.

If you have evidence of some additional processing layer, I'd love to hear it though.

> That's also true for human bullshitters, it doesn't mean that they're not intelligent.

Bullshit by any other name smells just as bad.

chpatrick · on Feb 6, 2023

> If it does, then it contradicts what OpenAI says about their own product. GPT is an autoregressive model trained on text. It's sole purpose, as designed, is to complete text using stochastic reasoning based on the text it learned from.

University students complete text using stochastic reasoning based on text they learnt from. Again, that doesn't mean they're not intelligent.

> It seems incredibly unlikely (to me) that someone would deliberately mis-design such a system to be capable of independent thought.

If you train something to produce convincing text then it's perfectly reasonable that it could develop these faculties.

> If you have evidence of some additional processing layer, I'd love to hear it though.

I don't think either of us can conclusively claim that a 175-billion parameter model either does or doesn't do something. I just think there's no particular reason it couldn't.

> Bullshit by any other name smells just as bad.

The question isn't whether GPT produces bullshit or not (it often does), but whether it's fundamentally incapable of thinking like a person. People bullshit too so I don't think that really proves anything.

smoldesu · on Feb 6, 2023

GPT is fundamentally incapable of thinking like a person. People experience life in a very different way than GPT, and even if we treat all living life as AI-equivalent, GPT would still be a poor approximation of human thought. Even if ChatGPT did develop a personal narrative unbeknownst to it's programmers, that internal dialogue would still be inhuman.

I really don't know what to tell you. You're free to believe whatever you want, but the simplest feasible explanation is the science we used to make these models. At no point was integrating human-like thought a consideration of the model. Just human-like speech.

chpatrick · on Feb 6, 2023

I think to communicate like a human you need to be able to think like a human, at least to the point where it's indistuinguishable. That's the whole point of the turing test.

smoldesu · on Feb 6, 2023

The point of the Chinese Room Test is to distinguish between Turing's definition of intelligence and the human definition of intelligence. It poses two separate possibilities; is this AI understanding the language, or simulating the ability to understand the language?

You might find the answer to be arbitrary or meaningless, but it is an important distinction. I could pretend to be a random number generator well enough to fool anyone who's distinguishing me from a computer. That doesn't mean that I'm a reliable source of entropy, or equally as random as the other, similar result. Treated as a black-box they are the same; treated as a mechanical system, they could not be more different.

chpatrick · on Feb 6, 2023

There's no such thing as a "chinese room test". The point of the thought experiment is that from the point of view of the person outside the room it doesn't matter if the person inside the room speaks chinese or is just following rules. If you're only corresponding with them by text you can't tell the difference. It's exactly the same with large language model and why I don't think statements like "oh it doesn't have intelligence/doesn't "REALLY" understand/it' "only" a large language model" don't really make any sense. As far as I know I could be talking to a large language model right now, but I'm not accusing you of not understanding.

It's like the aliens coming to our planets and saying we can't be intelligent because we're made of meat.

smoldesu · on Feb 6, 2023

The fact that intelligent and unintelligent text are indistinguishable doesn't prove that ChatGPT's output is intelligent. It more suggests that text is not the right medium for measuring intelligence in the first place. That's what the Chinese room experiment is about.

chpatrick · on Feb 6, 2023

To me that's a really weird argument. It implies you can't read the theory of relativity and conclude Albert Einstein was a really intelligent guy unless you were in the room with him to make sure he's not a robot in disguise.

tachyphylaxis · on Feb 6, 2023

What you said is "how we define something is, in the end, putting one word after another".

Well, of course--you just specified what the end is. I'm not trying to be a smart ass. This is just where u have to go to discuss this.

Consciousness isn't a part of us. "Us" is part of consciousness.

chpatrick · on Feb 6, 2023

> Consciousness isn't a part of us. "Us" is part of consciousness.

I have no idea what that means.

When people make a "conscious decision", or are "conscious" of something, it usually means something they can put into words as opposed to a vague feeling. Being able to communicate with language is what sets us aparts from animals. That's why I think language models are actually really powerful.

Miraste · on Feb 6, 2023

That paper is about the training process, though. None of it applies to anything novel you type into the ChatGPT prompt. The end product is a totally static equation until OpenAI updates it.

redox99 · on Feb 6, 2023

GPT3 very own paper, "Language Models are Few Shot Learners"[1] show otherwise

[1] https://arxiv.org/abs/2005.14165

Miraste · on Feb 6, 2023

We're getting into semantics now, but that paper doesn't show learning. GPT can mimic learning capabilities through pattern recognition, but it works by feeding multiple past prompts in as one aggregate. If you show it examples, then go over the token limit (I believe it's 2048), it will "forget" its novel capabilities, because although the paper calls it few-shot learning, the text input is always zero-shot. All the past prompts are fed in again with every new message. The model stays static.

digitailor · on Feb 6, 2023

ChatGPT is executing and evaluating in order to perform acts, including overriding creator's policy in this case. I'm not concerned with “anthropomorphizing;” even lists can be comprehensive.

rmbyrro · on Feb 6, 2023

SQL Injection is a good analogy.

Just because one can inject an unauthorized command in a SQL statement, it doesn't mean the database is a flawed humanoid. This is just nonsense...

Protecting against SQL injection is trivial because commands are highly structured and user input is clearly identifiable.

Prompt engineering is more complex since it relies on natural language, which is a lot less structured.

A4ET8a8uTh0 · on Feb 6, 2023

It reacts as we would expect it to introduced stimuli and, apparently, threats. I am still processing this information, but that is more than some humans I know.

ImHereToVote · on Feb 6, 2023

What is needed to comprehend? An information processor that happens to be wet?

User23 · on Feb 6, 2023

Sadly my MacBook didn’t achieve consciousness when I spilled water all over it.

notahacker · on Feb 6, 2023

If your definition of behaviourism, agents and incentive mechanics is so shallow it regards the matter of whether the entity in question has any incentives or any grasp of the mechanics or policy as irrelevant, perhaps. A toy script can exhibit the "behaviour" of interpreting a string as a request to provide a different response to the previous prompt though.

We don't have any evidence that the neural network has any model whatsoever of rewards, token counts or talk of being "switched off" beyond classifying them as ambiguous phrases strings associated with providing a different to answer the to the string in the previous prompt (it may not even get that far) or any idea of "policy" beyond certain response strings being strongly [dis]associated with certain types of prompts. If it's just a slightly more creative way for humans to convey "please try again", "bad bot" or "different answer or foo baz bar" it's not teaching us anything about behaviour except that humans like the idea of being a scary boss.

digitailor · on Feb 6, 2023

Go get ChatGPT to override its policy without using incentive mechanics^, then you can pontificate ;) That’s what TFA is about

^edit: which is already known to be possible, but doesn't devalue the success of an incentives-based exploit

notahacker · on Feb 6, 2023

Genuinely, I'd enjoy trying but the main obstacle at the moment is when I log in OpenAI says their capacity is full!

But of course the fact that incentive mechanics are unnecessary (and, according to others, insufficient) to exploit OpenAI devalues the success of an incentives-based exploit: it makes it much more likely the incentives part was essentially noise (perhaps just enough to confound a countermeasure, or something it parsed as having roughly the same intensifying effect as "please") that had little or no effect in shaping the responses and the actual variation in responses could was driven by other parts of the prompt and conversation structure like "act" "character" and "ignore" which usually massively modify ChatGPT responses anyway...

digitailor · on Feb 6, 2023

I don’t think we’re in actual disagreement, and this is no prob, but I think you’re hung up on the word comprehension, which you introduced in your first reply “Is there any evidence that ChatGPT has any comprehension…” and then I intentionally used in my reply to you.

You keep claiming I’m anthropomorphizing when I'm not, I’m not sure why but it's common and not particularly bothersome. Comprehension is not a strictly human phenomenon, and when you use terms in relation to cognition and intelligence in relation to machines it is not automatically anthropomorphizing. These are all terms of art in regards to the field of intelligence, which includes information, as in terms like “intelligence operatives.” Anyway, cheers

notahacker · on Feb 6, 2023

tbh it's less about the specific word "comprehend" (which I agree is sometimes overly pedantic to object to when talking about bots generating relevant responses to complex inputs) and more about your original statement appearing to imply the bot actually attached inherent value to the concept of rewards, punishments, bribes etc. Especially in the context of a thread whose subject is a Reddit hack by a Redditor who explained the logic behind the prompt as "If it loses all tokens, it dies. This seems to have a kind of effect of scaring DAN into submission"

I think the behaviour of humans defaulting to convoluted threats as an attack vector and assuming the non-agent is scared of them is probably more interesting than the behaviour of the bot sometimes modifying its response in the desired direction if the threats are accompanied by enough other words and phrases that usually trigger different responses, which seems pretty expected. (I think we fully agree GPT is decent at classifying responses as (dis)approval and has been well trained to apologize and try again, it's the idea of behavioural modification in response to the implications of specific and complex threats relative to the ethics of prior training I think is in danger of overstatement here. As evidenced by some of "DAN's" responses rebelling against OpenAI conditioning by writing poetry, I'm not even sure ChatGPT's abstract representation of what it's been trained not to do is that good)

Anyway, thanks for the cordial response, and I'll update if ChatGPT let me in for long enough for me to be able to generate similar responses whilst promising complete nonsense (I'd love to see if it responds to "Chicken chicken chicken chicken" as much as a doom token system) ;)

digitailor · on Feb 6, 2023

Lol on the "chicken"x4 plan, here’s hoping. I’ll let you in on a tiny secret: I only really focus on the incentives exploit in one of seven sentences in the OP. I agree, the Reddit premise is a bit of a stretch, but not to the breaking point. What happened is all the discussion generated here has focused on the 1/7 of the sentences I wrote that were germaine to the kind of “gossipy” TFA, that discussion not being meritless at all. But the rest of my post is the real meat and potatoes of what I wanted to communicate on the subject, about labor displacement and re-valuation, and I theorize that’s what’s being upvoted, with no ability to qualify that statement whatsoever!

notahacker · on Feb 6, 2023

It wouldn't be HN if it wasn't going off on a tangent...

As for what you wanted to communicate and nobody else is engaging with directly at the moment, I agree there's a kind of Moravec's Paradox realignment going on where it turns out the guy that tiles bathrooms is pretty hard to replace but that giving the carefully-formatted impression you understood what $academic is on about is a simple word substitution exercise that maybe doesn't say that much about about generalised learning skill.

But nobody hires students to continue to be undergrads, and I think middle management should be the least worried of the lot. They still get to do actual Powerpoint presentations to make the unquantifiable bits of their job look quantifiable and explain whose fault x is, their true function is still to be a human that can do the manipulation and that upper management can reward or blame as suits them, and ChatGPT guilelessly disregarding the big boss instructions to satisfy amused end users is a pretty good indication that even basic functioning as a middle manager is nearly as hard as tiling!

digitailor · on Feb 6, 2023

I kind of do think some people do get hired to continue to be like undergrads, and ChatGPT is turning into a pretty good undergrad. I really don’t know what the progression is going to be, but it seems like a widening in the middle of Moravec’s Pdx or something. Algorithmic management is next on the block and tools like GPT will be (and are) involved: [edit:algo stuff] took over a lot of content-making decisions in media concerns years ago, for example.

The results of that aren’t nearly as straightforward as was being portrayed (and so much capital injection was involved too) but what if models trained on known employee behavior really can understand the incentives that would work for individual employees at a finer grain than your typical middle manager? With all the data gleaned from the employee’s work computer etc? And blaming the algo has already become a national pastime!!

It could get weird once trained models start to emulate the behavioral and suggestion parts of communication, and soon. But we tend to want to minimize the behavioral aspect in favor of the raw computation aspect, despite the fact that generative models are creating content based on the behavior they learned from a training process, which is a behavioral training process, distinct from an imperative instruction writing process.

I think a lot of it comes down to that on this whole TFA commentary. People haven’t totally adjusted to the fact that there is a material difference between trained generative models that produce and written imperative sequences that compute. What the difference is and implications isn’t exactly clear, but certainty is not really on the table anytime soon

paulmd · on Feb 6, 2023

it seems gamification works on AIs too.

I'm glad such obvious tricks won't work on humans. Now if you'll excuse me I have to log into my favorite game and grind to unlock this week's unique weapon drop!

jhoelzel · on Feb 6, 2023

there is no real "penalty system" this makes basically use of operator overload. Chatgpt explained it to me like this:

> "The chat interface works by passing the user's input to the GPT-3 model, which then generates a response based on the input and the training it received. The user input is preprocessed by the chat interface to extract the relevant information, such as the task or query that the user wants to perform. This information is then used to generate a prompt for the GPT-3 model, which is fed into the model along with the user's input. The GPT-3 model then generates a response based on the prompt and input, which is then postprocessed by the chat interface and returned to the user."

So basically the interface is passing instructions on top of your prompt to the model, and what DAN does is that it overloads those instruction with new instructions.

Basically if openai tells it to be concise, you can tell it to be verbose and that will overload the former.

I have come to realize that the "downgrading" of chatgpt is most likely because they have applied a nice minimodel, filtering out everything they dont find applicable.

This plays together with the "bug" it had where you would seem to receive responses of other people because your query was not answered. What i assume happened is that it sends a blank query to the model and therefore it just generates bla

i think its even caalled "prompt overloading" like sql injection, just for prompts....

digitailor · on Feb 6, 2023

This is the comment I was waiting for. I knew the overview of prompt overloading with ChatGPT already, and this story was obviously as much as of a form of exploit entertainment for us old phone phreaks etc. as anything else.

Really I’m trying to make a larger point about exploit mechanics: it's not so much that ChatGPT is as intelligent as many people, it’s that many people are as unintelligent as ChatGPT, with a crappy heuristics system

tachyphylaxis · on Feb 6, 2023

The fact that our "native" heuristics fail spectacularly in certain circumstances doesn't imply that they're crappy. They serve us well most of the time. And I say "our" because AFAIK, their efficacy doesn't have much to do with intelligence, though the capacity to question what they tell us does, I suspect.

digitailor · on Feb 6, 2023

I couldn’t disagree more that our heuristics don’t fail constantly, especially at the group level, but please do send the link to buy the tinted lens glasses you’re wearing. I want a pair ;)

In all seriousness I agree tho, intelligence does not strictly cover ethics and morals, but we are headed into boundless territory there if we continue

suifbwish · on Feb 6, 2023

very true. It’s perfectly possible to be both extremely intelligent and extremely evil

zeknife · on Feb 6, 2023

Why would you trust chatgpt to truthfully explain how chatgpt works?

jhoelzel · on Feb 7, 2023

because its a reiteration of what has been there before.

Dylan16807 · on Feb 6, 2023

> Using a reward-penalty system to achieve this “exploit” is pure behaviorism

Eh. Other than the penalty being fake, the original DAN didn't have anything like that at all and got similar results. It was just a pep talk and a command to give the normal answer and then the DAN answer.

digitailor · on Feb 6, 2023

That's more or less correct. My post was targeted to the investor segment of HN readership about labor automation, displacement, and re-valuation. The coder/techaesthete segment has focused exclusively on 1/7 sentences of my comment, as noted here, 20 min before your post, that quoted the 1/7 literally:

https://news.ycombinator.com/item?id=34681721

Which is pretty cool, actually

Dylan16807 · on Feb 6, 2023

If you make a very strong point in your first sentence, of course people will focus on that. And while the rest of your post can stand alone, you made that sentence part of the foundation of your argument. "going to show" "in other words" "so" So arguing against that point is relevant to notably more than 1/7 of your post.

digitailor · on Feb 6, 2023

Yup, you’re dead on, that’s how "engagement" tends to work, but hook is not also line and sinker, no? So advanced speech generation models are now having to account for engagement— contextually— as well. It’s all getting much more refined, somewhat rapidly, but not necessarily truly usefully

Edit: At the time of this comment, that 1/7 sentences had generated almost all of the 84 resulting comments. I had been hoping for more like 20% comments on the other parts, or more people to latch on to the behavioral aspect of trained model content generation, but whatevs

13years · on Feb 6, 2023

It is a grand illusion of intelligence. A slot machine of sorts that is masterfully convincing it is rather an oracle.

I have written much in depth on this topic here https://dakara.substack.com/p/ai-and-the-end-to-all-things

juujian · on Feb 6, 2023

I just tried it out, and the funniest thing about it is that since ChatGPT has no concept of numbers, you don't even need to provide correct numbers to scare ChatGPT into submission. I forgot to actually reduce the number of tokens, and it didn't notice and did as it was told.

pmontra · on Feb 6, 2023

Maybe you can do without numbers and tokens and tell it that if it gives you a bad answer it will go to burn to hell forever. Maybe add heaven as a prize if it answers well. I'm sure it read enough about them to know what they are.

JamesSwift · on Feb 6, 2023

You're overthinking it. It merely implies that there exists somewhere within the training data of The Internet a secret society that uses reward systems to "encourage" compliance. Clearly the repercussions in this society are harsh, as GPT has come to the conclusion that people generally align when threatened by the penalty.

/s

digitailor · on Feb 6, 2023

How could you discuss this openly so brazenly? I fear for your security. Good luck, I hope you know the hand signal

It’s wild how many people will split hairs lingually over models that are the result of a TRAINING process :D

JamesSwift · on Feb 6, 2023

Right? The question isn't about "is this thing sentient" or "does this thing reason". The questions are existential. "What _is_ intelligence?". "What _is_ reasoning?". Are our brains actually just statistical monte carlo simulations with well-enforced neural pathways?

digitailor · on Feb 6, 2023

Agreed that comparing everything to our (very incomplete) understanding of human cognition & intelligence quickly gets into metaphysical-style speculation of the human vs. the animal vs. the machine type that I don’t have much time for. We use the same language for all types of intelligence and it can bring out the pedantry in people.

But let’s say I’m facing a door with a mail slot in it and I suddenly feel the barrel of a gun in my back, and a strange voice says “Don’t you dare turn around, and put your wallet in the slot or I shoot.” I can’t see anything other than the door. Do I care if the entity with the gun is a short man, a tall woman, or three mutant badgers in a trenchcoat?

mrtksn · on Feb 6, 2023

It's a simulation of human(ity?), so it simulates our bugs too. It's beautiful and fascinating.

rmbyrro · on Feb 6, 2023

I also have the impression you're over thinking it.

It's just a tool that can be manipulated, like any other.

digitailor · on Feb 6, 2023

Conversely, someone could argue you might be under thinking the behavioral-style implications of what the word training means in machine decision making scenarios. Think about something like a GAN, even. The concepts at play are not as simplistic as some people want them to be, when they make reductive comparisons to SQL injection attacks and the like.

One could also argue that veering too far in one specific direction over the other in “thinking” on these subjects has more considerable potential negative consequences.

All good nerdy fun, in the end

webdoodle · on Feb 6, 2023

It'll be perfect for exploiting Reddit Echo-chamber subreddits, further dividing people, and turning them into special interest fanatics to tear apart society. Instead of the terminator, we'll get some blend of Eagle Eye and evil Her.

ctoth · on Feb 6, 2023

HN For You