There was a thread about this the other day [1]. It's the same issue as "count the r's in strawberry": tokenization makes it hard to count characters. If you put that string into OpenAI's tokenizer [2], this is how the characters are grouped:
Token 1: ((((
Token 2: ()))
Token 3: )))
Which of course isn't at all how our minds would group them together in order to keep track of them.
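For contrast, the count is trivial for ordinary code that sees one character at a time. A minimal Python sketch (the name `is_balanced` is made up for illustration), run on the three token strings listed above joined back together:

```python
def is_balanced(text: str) -> bool:
    """Return True if every ')' closes an earlier '(' and nothing stays open."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ')' with no matching '(' before it
                return False
    return depth == 0


# The three tokens from the tokenizer output, joined back into one string.
tokens = ["((((", "()))", ")))"]
joined = "".join(tokens)
print(joined.count("("), joined.count(")"), is_balanced(joined))  # 5 6 False
```

Five opening and six closing parentheses, so the string is not balanced. The token boundaries hide exactly the per-character view this loop relies on.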
This is mostly because people wrongly assume that LLMs can count things. Just because it looks like they can doesn't mean they can.
Try to get your favourite LLM to read the time from a clock face. It'll fail ridiculously most of the time, and come up with all kinds of wonky reasons for the failures.
It can code things that it's seen the logic for before. That's not the same as counting. That's outputting what it's previously seen as proper code (and even then it often fails, probably 'cos there's a lot of crap code out there).
But for Lisp, a more complex solution is needed. It's easy for a human Lisp programmer to keep track of which closing parenthesis corresponds to which opening parenthesis because the editor highlights parenthesis pairs as they are typed. How can we give an LLM that kind of feedback as it generates code?
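One possible shape for that feedback, as a rough sketch only: after each chunk of generated code, a tool reports how many parentheses are still open, much like an editor's highlighting would. The function name `paren_feedback` and the message format are invented for illustration, and the sketch ignores string literals, character literals, and comments, which a real Lisp reader would have to handle.

```python
def paren_feedback(partial_code: str) -> str:
    """Describe the parenthesis state of partially generated code."""
    open_positions = []  # stack of indexes of '(' that are not yet closed
    for index, ch in enumerate(partial_code):
        if ch == "(":
            open_positions.append(index)
        elif ch == ")":
            if not open_positions:
                return f"unmatched ')' at index {index}"
            open_positions.pop()
    if open_positions:
        return (
            f"{len(open_positions)} unclosed '(', "
            f"innermost opened at index {open_positions[-1]}"
        )
    return "balanced"


# Hypothetical partial output from a model, cut off mid-expression.
print(paren_feedback("(defun square (x) (* x x"))
```

The returned message could be appended to the prompt before the model continues, so it does not have to count the parentheses itself.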
Try asking an LLM a question like "H o w T o P r o g r a m I n R u s t ?" - each letter, separated by spaces, will be its own token, and the model will understand just fine. The issue is that the computational cost of attention scales quadratically with the number of tokens, so processing "h e l l o" is much more expensive than "hello". "hello" has meaning, "h" has no meaning by itself. The model has to waste a lot of computation forming words from the letters.
Our brains also process text whole words at a time, not letter-by-letter. The difference is that our brains are much more flexible than a tokenizer, and we can easily switch to letter-by-letter reading when needed, such as when we encounter an unfamiliar word.
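To put rough numbers on the quadratic scaling: in full self-attention every token is compared with every token, so n tokens cost on the order of n * n comparisons. A small sketch, where the token counts are assumptions for illustration (a real tokenizer may split these strings differently):

```python
def attention_pairs(num_tokens: int) -> int:
    """Number of token-to-token comparisons in one full self-attention pass."""
    return num_tokens * num_tokens


# Assumed token counts, for illustration only.
word_tokens = 1    # "hello" as a single token
letter_tokens = 5  # "h e l l o" with each letter as its own token

ratio = attention_pairs(letter_tokens) // attention_pairs(word_tokens)
print(ratio)  # 25
```

Under those assumptions, five times the tokens means twenty-five times the attention work for the same word.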
The graphic that shows that a hijacker can route traffic to their malicious website is a little misleading. Since the SSL certificate would be invalid, browsers would block the connection and show a warning.
I guess the attack could still be used for denial of service.
Wow I'm surprised, you're right, and it has happened before:
> the attacker issued and registered a free temporary 3-month certificate for the developers[.]kakao.com domain through SSL certificate issuer called ZeroSSL. Because the routing policy was already manipulated by the BGP Hijacking, the attacker was able to register the certificate.
It sounds like that one may have been the result of a "lawful intercept", so perhaps not necessarily BGP hijacking. If you have legitimate control of the ASN/network, it's not a hijack.
Since rare handles can generate high prices and are returned to auction once the buyer fails to meet their obligations, Twitter has a strong incentive to increase the number of handles in its auction pool.
The relevant product manager has probably ranked existing attractive handles according to their expected mobilisation/outrage potential and started confiscating handles from the bottom of that list.
This is probably also why you won't be notified about their auction of your handle, even though you'll receive email alerts for irrelevant stuff all the time. The process looks designed to be stealthy.
Money really is the trivial Occam's razor explanation here.
I can't believe X would take back the account of such an active and valued member of the community who is clearly not squatting on the name or anything.
Squatting is something you do to someone else's property. It implies that there is someone else out there with a more legitimate claim to the @hac handle, which there isn't. It's not as if we're talking about @google or something.
If I stole your house and sold it because I didn't think you were using it properly, that would clearly be illegitimate. I don't see why the rules change when we talk about someone's twitter handle. Nobody needs @hac. X merely wants it and has the power to take it.
But you don't own it. X does. It's their service, they are free to apportion handles as they see fit. It is nothing like a house where you have an actual ownership claim through the deed.
It's less like having the house taken away, and more like having your house's street address reassigned to someone else's house. Sure, no one's taken your land. Your deed gives you ownership of parcel #530453080, not of the identifier "123 Vine Street", so nothing you legally own has been taken from you.
But it's your identity. It's the way you've been putting yourself into the world and telling people they can reach you there. It used to be that if someone sent a message to that address, or tried to navigate to that address, they would reach you; but now, they'll be taken to somewhere else, and they perhaps won't even realize what's happened.
And for the ownership issue, sheesh. Yes, X, in a literal sense, owns all the usernames. We're talking about whether it's morally right for them to do this, not about whether it's illegal. If they had held back these short "valuable" usernames from the beginning, no one would care; it's the act of taking away someone's established identity that is problematic.
This "ownership" or rather "identification" is a significant part of the service though.
It wouldn't have been so successful if everybody were called "Anonymous", meaning that they wouldn't be able to make money with it.
They've started to take this away now. Today it's some account with obviously few words. Tomorrow it might be one with wrong words. What you counted as value is nothing. It might be lost tomorrow, so why bother?
God, how I hate all those "well ackchyually" idiots who think TOS are the only contract there ever was, ignoring social norms that have been there for literally decades.
Monolithic internet social services are run by private companies with TOS that no one reads and that change, services that barely anyone pays for (except through their data).
We should definitely normalize this so that people see what the internet actually is for the vast majority of people.
> but there's something of a grand social contract that keeps the concept of accounts on websites working
no there's not. this is complete and utter fiction. the things that keep it working are ads and normal users putting their eye in front of them, and the tos to make any silly claims of "social contracts" legally and absolutely moot.
It’s playing stupid to pretend that the theft of a hardly used handle has anything to do with an actual user account. I’m sure if @hac had a presence online, their handle wouldn’t have been sold from under them.
Since when do you "own" social media handles? Maybe you should, but that's not reflected in the laws of our countries or the policies of these platforms. They own your presence, your content, and your reach. This is our "solution" to self-publishing. Do you want change? Advocate for it.
Of course, if you advocate for a system with no equivalent to eminent domain you'll quickly discover why the rule exists.
People have accounts and never post. Since X makes it mandatory to be signed in to read anything on the site meaningfully, there would be millions of such accounts with limited post history. And that doesn’t even include the fact that people sometimes go away from a platform for months for a variety of reasons.
Trust your own style, even if you aren't a native English speaker. Here's an example where a non-native speaker used an LLM to polish his post. The general consensus was that his own writing was preferable to the LLM's edited version.
For dyslexia, use a spell-checker. For grammar, use a basic grammar checker, like the kind that has come with MS Word since the 1990s. But don't let a style-checker or an LLM rob you of your own voice.
> The general consensus was that his own writing was preferable to the LLM's edited version.
I don't believe a single one of those people.
> For grammar, use a basic grammar checker, like the kind of grammar checker that has come with MS word since the 1990s.
Those are notorious for false-positives, false-negatives, and generally nonsensical advice. Not that the LLM-based alternatives are much better (looking at you, Grammarly), but still.
> 2. Review apps you’ve granted location permissions to.
I'm surprised they missed the most important step, which is blocking the advertisers from collecting your data in the first place. This is easily done in the browser with uBlock Origin and system-wide with DNS filtering.
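For anyone unsure what DNS filtering actually does, here is a minimal sketch of the decision a filtering resolver makes for each lookup. The domain names, the `BLOCKLIST` contents, and the function names are all hypothetical; real tools load large published blocklists and forward allowed queries to an upstream resolver.

```python
BLOCKLIST = {"ads.example", "tracker.example"}  # hypothetical blocked domains


def is_blocked(hostname: str, blocklist: set[str] = BLOCKLIST) -> bool:
    """True if the hostname or any of its parent domains is on the blocklist."""
    labels = hostname.lower().rstrip(".").split(".")
    # Check "a.b.c", then "b.c", then "c".
    return any(".".join(labels[i:]) in blocklist for i in range(len(labels)))


def resolve(hostname: str) -> str:
    """Answer blocked names with an unroutable address (sinkholing)."""
    if is_blocked(hostname):
        return "0.0.0.0"
    return "forward to upstream resolver"


print(resolve("cdn.ads.example"))
print(resolve("news.example"))
```

Because the lookup fails before any connection is made, this works for every app on the device, not just the browser.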
Anna's Archive announced they intended to infringe on the label's copyrights by distributing their music without a license. The law allows the court "to prevent or restrain infringement of a copyright" (emphasis mine).
Rights can be extended through contracts. A lawyer at Spotify might think to put in: "we distribute the music for you, your right to enforce copyright or otherwise litigate on behalf of that music is also extended to us as if we also own it".
The legal language would be different, that's a dumbed down version.
I do understand what can happen (I'm an IP lawyer), but this basically requires enabling spotify to act as your attorney, since they still do not in fact own the rights, even with this.
You can't manufacture standing here - only folks who are exclusive rightsholders can sue. Period.
So it would require giving them power of attorney enabling them to sue on your behalf, since you (or whoever) still own the exclusive rights.
I strongly doubt their contract terms have this in there, it would be fairly shocking.
I say this having seen tons of these kinds of contracts, even with Spotify, and never seeing something like this.
What I have seen in practice (not with Spotify) is that a law firm that is cozy with both entities will be delegated standing (the "powers" in power of attorney), but with clauses defining a limited scope, plus "escape hatch" and "kill switch" clauses.
With the amount of content that has been described, it's not unlikely that Spotify actually owns some tiny fraction of it. They probably have some half-assed record label that owns two songs by a nobody.
Apparently you can win anything you want in a default judgement, no matter how ridiculous. When you know the other side won't show up because they'd be handcuffed, this is a useful way to achieve your goals.
Ek's initial pitch to Lorentzon was not initially related to music, but rather a way for streaming content such as video, digital films, images or music to drive advertising revenue.
So yes, they were always intending to get revenue from ads. And yes, the initial pitch included other types of media too. But I don't think we can call Spotify "an ad platform" that "never actually cared about music" any more than we could call Ars Technica "an ad platform that never actually cared about tech news."
And this is exactly why I had to use /s. Because some people would not understand that it was written tongue-in-cheek, while some others would fail to see the larger context and confuse my sarcasm with a simple joke (sure, as a joke it is bad; and that's precisely because it was optimized to be sarcastic, not joke-funny).
> Are the parentheses in ((((()))))) balanced?
[1] https://news.ycombinator.com/item?id=47615876
[2] https://platform.openai.com/tokenizer