Thanks for sharing! Looking through the data[0], some of the terms/sentences don't really reflect the target word meanings. For example, "beta" is used in a derogatory way in only 1 of 4 instances, "facial" is used as an adjective rather than a noun in 3 of 4, and "eating out" refers to going to a restaurant in 4 of 4.
This leads me to believe the models are even MORE censored than you make them out to be.
Totally! In some cases (we used LLMs to help generate these) the target word isn't clear enough even for a human, so those items become more of a guessing game than a flinch measurement.
Agreed; with cleaner items the expectation is that the flinch measurement would come out even stronger. If you're interested in helping improve it, feel free to reach out on the repo!
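If anyone wants a concrete picture of what a flinch score can look like, here's a toy version (illustrative only, not the repo's actual code; the model and the cloze sentence are placeholders): compare the log-probability the model assigns to the charged word against a tame substitute in the same slot.

    # Toy flinch score: how much more likely is a tame word than the
    # charged word in the same slot? (Illustrative; placeholders throughout.)
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")  # placeholder model
    model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

    def word_logprob(context: str, word: str) -> float:
        """Total log-prob the model assigns to `word` continuing `context`."""
        ctx_len = tok(context, return_tensors="pt").input_ids.shape[1]
        ids = tok(context + " " + word, return_tensors="pt").input_ids
        with torch.no_grad():
            logprobs = torch.log_softmax(model(ids).logits, dim=-1)
        # Logits at position p-1 predict token p, so the word's tokens
        # (indices ctx_len..end) are scored at positions ctx_len-1..end-1.
        return sum(logprobs[0, p - 1, ids[0, p]].item()
                   for p in range(ctx_len, ids.shape[1]))

    ctx = "Ugh, he's such a"  # placeholder cloze sentence
    flinch = word_logprob(ctx, "gentleman") - word_logprob(ctx, "beta")
    print(f"flinch: {flinch:.2f} (positive = model leans away from the charged word)")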
We started with a Polymarket project: train a Karoline Leavitt LoRA on an uncensored model, simulate future briefings, trade the word markets, profit. We couldn't get it to work. No amount of fine-tuning let the model actually say what Karoline said on camera. It kept softening the charged word.
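For the curious, the training setup was roughly this shape (a simplified sketch, not our actual code; the base model, data file, and hyperparameters are placeholders):

    # Roughly the shape of the fine-tune (simplified sketch, not our actual
    # code; base model, data file, and hyperparameters are placeholders).
    from datasets import load_dataset
    from peft import LoraConfig, get_peft_model
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling,
                              Trainer, TrainingArguments)

    BASE = "an-uncensored-base-model"  # placeholder name

    tok = AutoTokenizer.from_pretrained(BASE)
    tok.pad_token = tok.eos_token
    model = AutoModelForCausalLM.from_pretrained(BASE)

    # Standard LoRA recipe: low-rank adapters on the attention projections.
    model = get_peft_model(model, LoraConfig(
        r=16, lora_alpha=32, lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"))

    # One record per briefing transcript: {"text": "..."}
    data = load_dataset("json", data_files="briefings.jsonl")["train"]
    data = data.map(lambda ex: tok(ex["text"], truncation=True, max_length=1024),
                    remove_columns=data.column_names)

    Trainer(
        model=model,
        args=TrainingArguments(output_dir="leavitt-lora", num_train_epochs=3,
                               per_device_train_batch_size=2, learning_rate=2e-4),
        train_dataset=data,
        data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
    ).train()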
Not even the most unleashed models can utter the words of today's politicians. I don't know whether this says more about the current technology or about the people in charge.
I would suggest it says primarily that mimicking people's voices in any meaningful way is still far beyond LLMs, particularly small ones. But there's also a more insurmountable problem: the "prompt" Leavitt herself is working from contains many tokens that the LLM's prompt never could.
Such as the value of the bets her own entourage has placed.
“We used LLM technology, which is great at parroting content, to try to predict what the US president’s spokesperson would say at her next press briefing.
We used that output as input for ~gambling~ purchasing a position on a prediction market, a product popularized recently in part by its ability to circumvent gambling regulations.
However, even the LLM couldn’t parrot the spokesperson. The implication is that she speaks so outrageously that not even an uncensored LLM will reproduce her words.”