Nearly 20% of active Twitter accounts likely to be fake or spam (sparktoro.com)
508 points by iamflimflam1 on May 16, 2022 | 394 comments


I don't think this post means as much as people are acting like it does.

The indicators of being a spambot they have in their post seem VERY iffy to me. "Not tweeting in the past 120 days", "Location set to a non resolving location", "Small number of followers", "default profile image", "No URL in bio or non-resolving URL in bio", "Not on many lists", "tweets in a different language than the person they're following" - Those all seem like extremely weak signals to me. My profile matches 6 of those, and I'm a human. I would like to see them hand-verify a subset of their results and see if their algorithm matches reality.

Also note that they define "active" differently than Twitter. They define "active" as having tweeted recently. Twitter gives spambot numbers as a percent of monetizable daily active users. I wonder if Twitter's given bot numbers are low because bots don't typically lurk or load ads. I can believe that the total bot count as a percentage of users or as a percentage of recently-tweeting-users is higher than 5%, but that only 5% of daily visitors seeing ads are bots.


> Location set to a non resolving location

This is a terrible metric. Real people use the location field for all sorts of non-location purposes, as well as more freeform descriptions of their location that wouldn't resolve mechanically.


And I don't see a reason why a bot/fake account wouldn't set a real random location.


If I were a bot, I would simply set my location field to "bot/fake"


> My profile matches 5 of those, and I'm a human.

Just out of interest, imagine you were in a hot desert. There is a tortoise in front of you. You reach down and you flip the tortoise over on its back. The tortoise lays on its back, its belly baking in the hot sun, beating its legs, trying to turn itself over but it can't, not without your help. But you're not helping. Why?


... I'll tell you about my mother.


I've seen things you people wouldn't believe... Tesla Roadsters on fire off the shoulders of Orion.


The tortoise is fine. You think it's never been on its back before? There's a thing called resonance that lets it get back on its feet.

The real question is how strong am I to flip a hundred fifty pound tortoise without injury?


Because that tortoise is a politician.


Does it please you to think that I am not helping. Why?


My main account until recently would have qualified as a bot by those criteria until recently.

Meanwhile, several accounts of mine that were predominantly run with bots would have passed as human with ease when they were active.


My main account literally has “fake account” in the description as sort of a joke, because I really use Twitter exclusively for browsing, and maybe liking/RT’ing stuff.


> ... until recently ... until recently.

Are you sure you aren't a bot?


Hah! A good bot wouldn't get distracted and edit a line without reading it all back to make sure it still made sense ;)


I guess I'm a spambot too.

But hey, maybe with this kind of analysis, and random computer-generated / un-appealable bans, in the future the "real accounts" will just mean "very elaborate bot".


I think I match all of them and I too am a trustworthy human, fellow human.


Exactly what a bot would say.


hah ha, fellow human, how we love to jape!


I think I match all of these. Also, a simple user account with a URL in bio is definitely more sketchy in my eyes than one without.


That seems like most accounts. I know many real people with exactly those properties.


Also,

Bots tweet and they usually have some sort of generic profile picture, so their methodology wouldn't even account for real bots. Bad.

Regardless, I do think that there are a lot of bots in TW and they are definitely more than 5% of total users.


Yup. Imagine the disdain there'd be on this forum if Twitter used these signals for policy enforcement and somebody was hit by a FP.

"WTF, my account was closed because I didn't tweet in four months."


Mine as well. I deliberately have not set a profile image, and have not attracted many followers. I probably should not bother with Twitter but I am around and am a real human.


They did mention that no one feature was a clear indication of being a spam account, but rather a combination of them.


I match these as well, though I'm not an active user.


The opposite is also true, it misses a lot of real bots.


Seriously. Having a URL in your bio is suspicious IMHO.


Parag apparently lost his patience with superficial and misleading claims about Twitter spam (like this analysis) and posted about it today.

You can see it here (https://twitter.com/paraga/status/1526237578843672576).

Noteworthy highlights:

* Twitter estimates its <5% number from human analysis of multi-thousand user random samplings of mDAU

* Twitter allows that number to remain so high to avoid introducing friction like captcha into real users' experiences

* Twitter uses all sorts of internal private data in its analysis

* Parag says you cannot get a reliable indication of bot/not bot without this internal private data

Having just finished building a Twitter analysis tool, I agree with Parag that the Twitter API doesn't provide sufficient clarity to make decisions about spam. This article's analysis doesn't hold up - just because you can name several features you're going to use to generate a spam confidence score about an account does not mean that spam confidence score will have any precision.
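To make the precision point concrete, here is a minimal sketch of the base-rate arithmetic. Every number in it is a made-up assumption for illustration (a 5% true spam rate, a heuristic that catches 80% of spam and wrongly flags 15% of real accounts); none of it is measured from Twitter or taken from the article's model.

```python
def flagged_share(base_rate: float, sensitivity: float, false_positive_rate: float) -> float:
    """Fraction of all accounts the heuristic flags, spam or not."""
    true_pos = base_rate * sensitivity
    false_pos = (1.0 - base_rate) * false_positive_rate
    return true_pos + false_pos


def precision(base_rate: float, sensitivity: float, false_positive_rate: float) -> float:
    """Fraction of flagged accounts that really are spam (Bayes' rule)."""
    true_pos = base_rate * sensitivity
    return true_pos / flagged_share(base_rate, sensitivity, false_positive_rate)


# Hypothetical inputs, chosen only to illustrate the effect.
BASE_RATE = 0.05            # assumed true share of spam accounts
SENSITIVITY = 0.80          # assumed share of spam the heuristic catches
FALSE_POSITIVE_RATE = 0.15  # assumed share of real accounts wrongly flagged

if __name__ == "__main__":
    print(f"flagged:   {flagged_share(BASE_RATE, SENSITIVITY, FALSE_POSITIVE_RATE):.1%}")
    print(f"precision: {precision(BASE_RATE, SENSITIVITY, FALSE_POSITIVE_RATE):.1%}")
```

Under those assumed rates the heuristic would flag a much larger share of accounts than the assumed true spam rate, and most of what it flags would be real people. Without a hand-verified estimate of the false positive rate, the headline percentage says little.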


Musk responded to Parag's thread with a poop emoji. Not going to lie if I worked at twitter I would be a little nervous about my career at this point. For several reasons including what appears to be the potential for a very hostile work culture in the near term. Musk is being openly antagonistic towards twitter leadership and denigrating the people that work there. Although it does seem more and more like the deal is not going to close. I don't think Musk wants it anymore and is seizing on anything to get out of it.


> Musk is being openly antagonistic towards twitter leadership and denigrating the people that work there.

Seeing how badly Twitter has been managed (for a laugh, check out their "R&D" expenditure), how much of a loss-making enterprise it has been, and how it is always at risk of a takeover, is it that surprising?

If it had been Elliott Management (the previous rumored takeover threat for Twitter), a group far less prone to public display than Musk... would things have been any different? The only difference is that Musk is being open about what he has been doing, which I see as a public good, frankly.

Elliott's track record shows it is far more vicious in layoffs and cuts.

----

https://fortune.com/2013/10/25/why-is-twitter-spending-so-mu...

https://www.rndtoday.co.uk/latest-news/is-twitters-rd-provid...

https://www.axios.com/2021/11/30/jack-dorsey-twitter-departu...

https://www.forbes.com/sites/kevindowd/2022/02/27/wall-stree...


I don't think he wanted it in the first place. This is all performative bs for him to make another quick buck / get pr.


This type of argument needs to die. He has never done anything "to make a quick buck", so the use of "another" here implies facts that are utterly false. Musk has no history of "pumping and dumping" whatsoever.


How is Elon going to make a quick buck from a failed deal? If anything he'll have to pay out significantly.


He got an excuse to sell lots of Tesla stock at a price 40% higher than today. Selling the stock without this charade would have sent a negative message about how little he believes in Tesla’s stock price and sunk the price further.

Even accounting for the breakup fee he comes ahead financially by many billions in cash, while also building his image as the free speech edgelord. Win-win.


He just sold $16 billion worth of Tesla stock in Nov/Dec 2021, why would he need a fake excuse to sell another $8 billion in April 2022?


Your logic is terrible. Your argument would only make sense if he didn't have an excuse for the previous sale, but he did.


I remember reading on here that the breakup fee is only applicable in very narrow cases, basically only if the US govt blocks the deal. If it's not applicable then he's on the hook for the whole ~40billy. Does he still come out ahead financially in that case if, for instance, the Twitter board take him to court? Genuine question.


Would they really go to court? That could take years and create a cloud of uncertainty around Twitter — certainly not good for shareholder value.

It's much easier for the board if they settle privately with Musk: "Yeah, pay the breakup fee and we'll forget this ever happened."


I read someone else's analysis of the agreement: it bizarrely sets the cap for any damages that can be sued for in court at $1B, the same as the breakup fee. This obviously gives Elon an incentive not to pay the breakup fee, but to go to court first instead. At worst, he'd pay a billion; at best, whatever the court says that's less than a billion, plus legal fees.


Read Matt Levine's Money Stuff column from the past couple days -- it explains realistically what could happen.


> He got an excuse to sell lots of Tesla stock at a price 40% higher than today.

Ironically, the Twitter deal plus related announcements hammered the stock more than the overall market downturn. Besides that, I believe no one wants this deal to go through, including the US government. Earlier, the board could not legally decline the deal.


Musk has a history of making public, bad-faith declarations to pump assets he later sells at profit.

More than one person has pointed out that Musk likely wanted to sell his Twitter stock, and was using the official-looking, low-ball buy offer to pump it. Twitter board smartly called his bluff, and now Musk is trying to weasel his way out of paying anything.


> Musk has a history of making public, bad-faith declarations to pump assets he later sells at profit.

Could you point me to those instances?

His offer wasn't a lowball offer at all considering the tech stock bloodbath we have seen as of late, and especially so because the offer was made just right before the massive downturn we have witnessed. I think it's quite the contrary; 44 billion is way higher than its worth imo.

> Musk is trying to weasel his way out of paying anything.

He would need to pay a $1 billion break up fee. I don't understand what point you are trying to make here.


The Twitter offer came before the stock market drop, making it both a low-ball offer and making the Twitter board smart to call his bluff.

Musk only needs to pay the $1B breakup fee if he can't procure enough financing, among other strings. Musk definitely has enough financing, he just doesn't want to pay it, so he's trying to weasel out of it by laying the groundwork for a settlement to walk away.

Either way, Twitter's board is getting Musk's money, either selling at a now-premium or settling for wanting to break, due to Musk's errors.


> Could you point me to those instances?

Here's one [1].

[1]: https://techcrunch.com/2018/08/08/the-sec-wants-tesla-to-exp...


You mean the instance where he did have a verbal agreement with an able buyer? A buyer who admitted to such in private text messages that came to light in court proceedings, and who later bought a different electric car startup?


"Verbal agreement" that has no stipulations about price or anything concrete that a U.S. Judge could find.

You asked for an example, and received one. No True Scotsman'ing the violation doesn't change the fact that Musk pumped the stock on Twitter with false and misleading statements.

Another example includes crypto: https://www.bloomberg.com/news/newsletters/2021-05-13/money-...


> Musk has a history of making public, bad-faith declarations to pump assets he later sells at profit.

The example you linked does not showcase nor prove your original point.


Only if you don't read it.

At this point I'm beginning to wonder what your burden of proof is for Musk doing anything that could be considered wrong.


> His offer wasn't a lowball offer at all considering the tech stock bloodbath we have seen as of late

The real bloodbath came after he bought it, at the time it was a modest premium.

> He would need to pay a $1 billion break up fee.

He is laying out the case at the moment to walk away and pay no break up fee.


> Musk has a history of making public, bad-faith declarations to pump assets he later sells at profit.

No he does not. He has no history of that whatsoever.


Wasn’t Bitcoin one of these?


No it wasn't. As he hasn't sold any of the coins he bought. Tesla still has them.


(I think their comment was satire)


The PR is priceless.


[flagged]


Can't wait for the $2,000 "Texas TruckNutz" option for the Tesla Cybertruck. Retractable, obviously - for better aerodynamics.


Or when it's cold outside, presumably.


> * Twitter allows that number to remain so high to avoid introducing friction like captcha into real users' experiences

This doesn't pass the smell test in my opinion. Given that everyone who tries to create an account without a phone number has to go through the friction of getting their account locked immediately, they clearly don't care about this sort of friction. Not to mention the friction of just trying to view a tweet which has been discussed at length on HN before.


The difference is between posting a tweet every time and going through the friction once. That's what he's talking about. You are comparing different things.


I have a Twitter account with > 10k followers that is a few years old. Created without a phone number, and of course, immediately locked. Somehow, I managed to get it unlocked and it's still going a few years later without a phone number. Though, I'm always in fear of the ban hammer.


How about you add a phone number if you are living in constant fear?


> Given that everyone who tries to create an account without a phone number has to go through the friction

Yes

> Not to mention the friction of just trying to view a tweet which has been discussed at length on HN before.

Yes

Now imagine filling out 3 captchas every time you open the twitter app on your phone, 1 for every time you tweet and 1 for every person you follow.

Most users of twitter use either an app on their phone or the cookies in their browser, and suffer the friction of forgetting their password (likely password1 btw) because they have to log in so infrequently.


Also, bot networks are so kind as to label themselves with the same hashtag, I don't understand why Twitter doesn't analyze trending topics to detect bots. There's always dormant accounts with thousands of followed/followers who start the propaganda.


But we were talking about bots. You and Parag are shifting the dialogue to spam

> The hard challenge is that many accounts which look fake superficially – are actually real people. And some of the spam accounts which are actually the most dangerous – and cause the most harm to our users – can look totally legitimate on the surface.

He's not talking about detecting bots. I.e. fake and automated accounts. He's talking about twitter users/bots that cause what they perceive to be harmful content. Which is a very different thing, and was the whole point of Musk's intended involvement in the first place.


And? Only advertisers see the harm from bots - being charged for bot impressions.

Outside of that, bots cause the same problems that real people do, making twitter a place people don't want to spend time on/view ads on.

The advertisers need both bad groups removed


>* Twitter uses all sorts of internal private data in its analysis

>* Parag says you cannot get a reliable indication of bot/not bot without this internal private data

"I have secret information so trust me" is an excellent reason to reject an assertion every time, whether it is made by an individual, a corporation, or a government. It doesn't mean it isn't true, but it means that absolutely nobody should put any credence in the assertion at all.


It makes sense. The bots are just loud and tend to all follow the most famous people, so their numbers look larger when people look there


I think Elon's response reflects what a lot of us are thinking about the "<5%" number.


> Parag says you cannot get a reliable indication of bot/not bot without this internal private data

That's convenient isn't it?

This report made headlines because it aligns with everyone's experience with Twitter: almost everyone on Twitter is either a bot or a corporate managed account.


That doesn't align with my experience. The vast majority of what I see in my feed is real people. I do know, however, that if I look at the replies to any viral tweet that a large percentage of them will be through fake accounts.


Related thread:

Twitter CEO: “Let’s talk about spam, with the benefit of data, context” - https://news.ycombinator.com/item?id=31399913 - May 2022 (13 comments)


>Parag says you cannot get a reliable indication of bot/not bot without this internal private data

Riiiight, and Craig Wright claims to have proof of being Satoshi Nakamoto but won't show anybody.

I don't know how good of a CEO Parag is, but he's not a very good bullshitter.


"But you don't know which ones we count as mDAUs" and "accounts that look like spam are actually real" are not as good a defense as he thinks. The product is still affected by spam and fraud even if it's excluded from advertising metrics, and accounts that look like spam are not good for the product either even if they happen to be real people for whatever reason.


If they believe the claim to be false they can open the data they used to calculate the 5% so it can be verified by a third party.


So you want them to publicly list / make download-able any phone numbers of the people in that 5%, and their full names and email addresses?


Absolutely not. No social media company should take a set of user's private data and "open the data" (especially just because some blowhard is trying to find any reason to back out of a deal). Even without the "open" bit, they shouldn't be providing that data to a third party.


"We undercount active users whose accounts are protected, accounts that view tweets but don’t send any, and accounts that log in and engage in other ways beyond tweeting (like favoriting or adding profiles to lists)."

My markup. If I understand correctly, not having a public tweet is a marker for being a spam account. Isn't that kind of a lot of people? I know from other forums that there's a large ratio between lurkers and active posters.

Given that you need an account to customise your timeline, and, these days, pretty much for just reading a tweet, there may be loads of real reader accounts that never post and never bother customizing their profile.


Yeah, I never knew I was a bot. I thought I just had a passing interest in a few people on Twitter and not much of interest to share of my own.


I remember when I found out I was a bot. It was during a harrowing judgement conflict with an image-based captcha about traffic lights, and caused a complete shutdown of my higher order processes. To this day, I still don't know the threshold of how much traffic light actually needs to be in a square to be considered for selection, so I just stopped logging in to everything.


That's because you took the blue pill.


Yep, me too.


If you have not tweeted.. you are excluded from this analysis.

A spam account is gonna spam right? But some real users of twitter may only tweet once a year. This study just doesn't include you. It isn't saying you are spam, just not including you in the count.


There are probably many 'fake' or bot accounts that don't tweet; they'd be used to prop up the 'likes' or views of other accounts, either customers paying for exposure, or other bots.


Or are simply used to bypass Twitter's semi-aggressive login wall.

Anecdotally, I know plenty of non-robot people who have twitter accounts but don't tweet.


The population of accounts that tweet is going to contain a larger proportion of spammers than the population of active Twitter accounts. Presenting the former as the latter is disingenuous.


Yeah, but it's just clickbait then. An honest title would be "We sampled users that act like bots on Twitter and found out that lots of them were bots".


Twitter is as hostile to unauthenticated users nowadays as reddit is. Really sad.


Yea, I have an account that is literally just for reading 3 people's tweets (they're former generals that frequently comment on Russia's invasion of Ukraine). I have their timelines bookmarked, and just read the threads they post.

I'd almost certainly be marked as a bot.


I don't think accounts like yours are marked as bots... They're just not counted.


But since the 20% in TFA is a fraction that must necessarily include a denominator, the denominator in this post is wrong. Maybe by a large factor.
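A quick sketch of how much the denominator can matter. The counts below are invented, and the assumption that no lurker account is spam is deliberately extreme; the only point is that the same set of spam accounts produces very different percentages depending on which population you divide by.

```python
def spam_share(spam_accounts: int, total_accounts: int) -> float:
    """Spam accounts as a fraction of the chosen population."""
    return spam_accounts / total_accounts


# Hypothetical populations, not real Twitter figures.
TWEETING_ACCOUNTS = 100_000   # accounts that tweeted recently
SPAM_AMONG_TWEETERS = 20_000  # spam accounts within that group
LURKER_ACCOUNTS = 300_000     # logged-in accounts that never tweet
SPAM_AMONG_LURKERS = 0        # extreme assumption: lurkers are all real

share_of_tweeters = spam_share(SPAM_AMONG_TWEETERS, TWEETING_ACCOUNTS)
share_of_all_active = spam_share(
    SPAM_AMONG_TWEETERS + SPAM_AMONG_LURKERS,
    TWEETING_ACCOUNTS + LURKER_ACCOUNTS,
)

if __name__ == "__main__":
    print(f"share of recent tweeters:  {share_of_tweeters:.0%}")
    print(f"share of all active users: {share_of_all_active:.0%}")
```

With these made-up counts the same 20,000 spam accounts are 20% of recent tweeters but 5% of everyone who logs in. If some lurker accounts are also bots, the second number moves back up, so the real gap depends on data nobody outside Twitter has.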


Can you share the accounts please? Sounds interesting.


There's even a name for it: https://en.m.wikipedia.org/wiki/1%25_rule


So much of this thread is 'proof' that supports whatever Elon Musk is trying to do, but seems not to realize this is a totally independent actor making a dubious set of assumptions to come to a number. This analysis might be interesting because it is timely, but it has no bearing on the Musk Twitter acquisition, any more than me running such an analysis does.


Oh, do I have notes on their methodology.

1) They talk about "active" accounts (meaning have tweeted in the last 9 weeks), and do a bunch of filtering against that. That seems like a huge bias - lurkers exist, and in my experience are usually the majority of users...this step removes them or ignores them entirely. Frankly, until recently, my twitter account would have been one of the ones they would have discarded as inactive. This one thing alone makes me question all of the rest of their results.

2) By the same token, the rate or frequency with which a user sends tweets has no relation to whether a user is monetizable. If they're seeing ads, they're monetizable...lurkers are just as monetizable as high-volume posters.


You seem to be arguing against something that the article doesn't claim. The article isn't equating inactivity and fake/spam, but that: of the accounts that actively send tweets ~20% are fake/spam.

Sure that's a different question from what proportion of all users are fake/spam, but this is still a perfectly valid question to ask, and the fact that they're only considering active users is in the title so I really don't get your complaint.

If you want an analysis that attempts to answer a different question go find or write one that addresses the question you want answered...

The article clearly states (emphasis mine):

> This represents the largest set of accounts on Twitter we could acquire, but it includes analysis of many older accounts that haven’t sent tweets in the last 90 days and thus, likely don’t fit Twitter’s definition of mDAUs (monetizable Daily Active Users).

From the linked Twitter earnings report:

> We define monetizable daily active usage or users (mDAU) as Twitter users who logged in or were otherwise authenticated and accessed Twitter on any given day through Twitter.com or Twitter applications that are able to show ads.

EDIT: rephrased "accounts that are active" to "accounts that actively send tweets" to clarify what the article addresses.


> EDIT: rephrased "accounts that are active" to "accounts that actively send tweets" to clarify what the article addresses.

The fact that you had to do this proves the point. Nobody defines "active" the way they have here. The claim is nonsense.


That edit was made 40m before you joined the conversation. Noting your edits is a social convention and voluntary concession offered by a post's author to validate replies that were made before the edit, while clarifying the author's intended message for future readers. If those future readers use the content of the edit message to shallowly refute the post, consider the incentive this creates to not follow that convention for all authors in the future. If you have a valid refutation, surely you can find evidence for such in the body of the message rather than nitpicking the edit history.


I think you misunderstood their response. They are saying that the study has an unusual definition of "active", and that your need to clarify the definition proves that it is unusual.

Though personally I think filtering specifically for users that actively send tweets makes sense, since that's really what matters when it comes to measuring how healthy and authentic the discourse is


What is the proper definition of "active"?

It seems like everyone is arguing about different metrics and it makes more sense to discuss different, specific measures that might fall into a range of behaviors that are "active" in some sense rather than focusing on which definition of "active" is somehow the best one.

What would be more interesting would be to adapt this and answer several different questions about the proportion of spam among accounts with different metrics of activity to see how things change. For example, does the percentage of spam accounts go down a lot if we lower the bar for "active"? How much & how fast?
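Something like the following is what I have in mind, as a rough sketch. The `Account` record, the hand-review label, and the sample rows are all hypothetical; a real version would need labeled data that the public API does not give you.

```python
from typing import NamedTuple, Optional, Sequence


class Account(NamedTuple):
    days_since_last_tweet: Optional[int]  # None means the account never tweeted
    is_spam: bool                         # assumed label, e.g. from hand review


def spam_share_within(accounts: Sequence[Account], max_days: int) -> Optional[float]:
    """Spam share among accounts that tweeted within `max_days` days.

    Returns None when no account meets the activity bar.
    """
    active = [
        a for a in accounts
        if a.days_since_last_tweet is not None and a.days_since_last_tweet <= max_days
    ]
    if not active:
        return None
    return sum(a.is_spam for a in active) / len(active)


# Toy data: the downward trend is baked into these rows, it is not a finding.
SAMPLE = [
    Account(1, True), Account(2, True), Account(3, False),
    Account(10, False), Account(25, False), Account(40, False),
    Account(80, False), Account(200, False), Account(None, False),
]

if __name__ == "__main__":
    for window in (7, 30, 90, 365):
        print(window, spam_share_within(SAMPLE, window))
```

Running the same sweep on real labeled accounts would show how quickly the spam percentage moves as the definition of "active" loosens, which is more informative than arguing over a single cutoff.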


> What is the proper definition of "active"?

Twitter's quarterly earnings define active users thusly:

> Twitter defines monetizable daily active usage or users (mDAU) as people, organizations, or other accounts who logged in or were otherwise authenticated and accessed Twitter on any given day through twitter.com, Twitter applications that are able to show ads, or paid Twitter products, including subscriptions.

https://s22.q4cdn.com/826641620/files/doc_financials/2022/q1...

I'm pretty sure I've heard a similar definition from Facebook.

This definition supports g-clef's critique that the article picks an unorthodox way to measure active users, resulting in an inflated percentage of accounts being measured as spam/fake accounts, vs what the percentage would be if measured against Twitter's definition of 'active', which includes lurkers.


Strange rant. It's not about you editing your post in general. It's that your edit shows that saying "active accounts" when you really mean "accounts that have recently tweeted" is wrong, like the very title of this submission.


The point is that their definition of active is inaccurate. You can be an active user and not tweet.


Look, there are dozens of potentially interesting and valuable questions to ask on this subject. Answers to which may produce a wide range of insights and conclusions. And there's a whole potential conversation about which questions are most important, that may have different answers depending on the context.

But there's no reason to pin the whole frame of the conversation to the one question for which Twitter corporate chose to publish an answer, unless the only question we are interested in is "did Twitter technically lie" which is the most uninteresting question in this whole situation. If this is the sole context you are using to frame this issue then maybe you should consider if you're following the current news cycle a little too closely.

The idea that there is such a thing as an 'inaccurate definition of active' is silly.


In the light of Musk's statements, which presumably precipitated this timely article, I would say the question of whether Twitter technically lied is the most important question for Musk doing the things he does.

If you're more interested in Twitter's ecosystem as a whole, it is less interesting.


At every company I've worked at any time someone has asked "How many active users do we have?" it was a difficult question to answer because everyone's idea of "active user" was different.

"Active, as in logs in regularly? Wait, what is 'regularly'? Once a week? Once a month? Every day? Does 'active user' mean, online right now?"

Etc, etc...

Their definition of "active user" is relative, not inaccurate.


>"did Twitter technically lie" which is the most uninteresting question in this whole situation.

I don't know, that seems more interesting than most questions that could be asked about Twitter.


> I don't know, that seems more interesting than most questions that could be asked about Twitter.

Why? Twitter is a for profit corporation. If, on the balance, lying serves their interests (I'm sorry, I meant "is consistent with their fiduciary duty to their shareholders") more than edging up to the line without crossing it, that's what they will do.

Even the watchdog organizations such as the FTC and SEC that police the speech of corporations more or less limit themselves to material statements that move markets or influence consumer behavior in ways that can be considered fraudulent. The FTC, FDA, and others are concerned with a fairly narrow reading of consumer harm, the SEC is motivated by the health and trustworthiness of the public market. In any case, there pretty much always has to be some sort of alleged harm. Lying per-se is hardly ever forbidden. So if the advantages of a lie outweigh the (risk adjusted) penalties and reputational risks, that's that.


I think a conversation about what ways we expect and permit corporations to lie, either specifically in financial statements or to the general public, is much more interesting than a discussion of exactly how many fake tweets there are and exactly how many accounts are making them, though I guess you could construe that as broadly part of the same conversation.


> I think a conversation about what ways we expect and permit corporations to lie, either specifically in financial statements or to the general public, is much more interesting than a discussion of exactly how many fake tweets there are

I agree, that would also be a much more interesting conversation than "did Twitter technically lie."


Sure. But if I'm looking to purchase Twitter, I think I'd be much more interested in and concerned about this "white" lie than you are as a general consumer.


I'm with you there. But the context here is what would be interesting to us, and not what's potentially interesting to Elon Musk.


I think it's pretty easy to argue that their definition is intentionally misleading, which may not be technically inaccurate, but is arguably just as bad.

The big story in the news last week was "Elon Musk says deal on hold while verifying twitter's 5% Monthly Active Users stat", or something to that effect.

That's the context this article was published in. It is transparently obvious they are using the phrase "active twitter accounts" to cause confusion with the definition of "active" that has been bandied around. The post is using such a title as clickbait, to hop aboard a trend.

I think the title, and lack of significant clarification in the article, make it clearly misleading, and I don't think pedantic "well technically active can have multiple definitions" changes the reality of the situation meaningfully.


But their definition makes things look worse for them. The high number of lurkers would make the percentage of fake accounts smaller.


I don't understand what you mean.

Let's take both their numbers at face-value and assume they're true.

Twitter has reported: 396.5 million logged-in-this-month users, of which 5% are fake/spam (19.8 million fake users)

This article reported: Looked at 44,058 tweeted-recently accounts, of which 20% are fake (8,800 fake)

Which of those stats looks worse for them?

> The high number of lurkers would make the percentage of fake accounts smaller

Why? Twitter included lurkers in its dataset, this article didn't, why should that impact stats in the direction of fake accounts being smaller?
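For reference, the arithmetic behind those two stats, as a small sketch. The totals and percentages are the ones quoted above; the margin of error uses the textbook normal approximation and assumes the article's accounts were a simple random sample, which is my assumption and not something either source confirms.

```python
import math


def implied_count(total: float, share: float) -> float:
    """Number of accounts implied by a total and a percentage."""
    return total * share


def margin_of_error(share: float, sample_size: int, z: float = 1.96) -> float:
    """Approximate 95% half-width for a sample proportion (normal approximation)."""
    return z * math.sqrt(share * (1.0 - share) / sample_size)


twitter_fake = implied_count(396_500_000, 0.05)  # figures quoted in this comment
article_fake = implied_count(44_058, 0.20)       # figures quoted in this comment
article_moe = margin_of_error(0.20, 44_058)      # sampling error only

if __name__ == "__main__":
    print(f"Twitter-implied fake accounts: {twitter_fake:,.0f}")
    print(f"Article-implied fake accounts: {article_fake:,.0f}")
    print(f"Article sampling margin: +/- {article_moe:.2%}")
```

The sampling margin on a 44,058-account sample comes out under half a percentage point, so sample size is not where the two numbers diverge. The gap has to come from the different populations being measured and from how each side decides an account is fake.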


> Why? Twitter included lurkers in its dataset, this article didn't, why should that impact stats in the direction of fake accounts being smaller?

Because you usually don't create fake accounts to lurk, but to do "something".

I'm speculating, but even when you create bots to boost follower counts you'd probably make them post now and then so as to seem "active".

It makes sense that the proportion of bots among tweeting accounts is much higher than among lurkers. And since there are in turn more lurkers than posters, I would say that the real number is much lower than that.


I don't buy the speculation as obviously accurate.

Let's say I own a twitter bot farm. I make 20k accounts, have a system setup that logs into each of them from a unique IP each month at random times to make sure they're not banned yet, and advertise it out. On month 1, someone buys 1000 of them as followers. On month 2, someone buys 1000 of them to tweet spam. etc etc.

Each month, there's 20k active bot accounts (logged in to verify they weren't banned). Only a small number may actually tweet though since buyers may have not gotten them yet. Bot accounts lurk too, for months on end, before ever acting.

I'm not claiming this is accurate, but I am claiming this is a reasonable alternative which doesn't align with the view of bot accounts being more prevalent in tweeting accounts than lurking accounts.


> Twitter included lurkers in its dataset

A metric they've artificially inflated by gating tweets, which works to their advantage when calculating spam. With that in mind, I think I'm more inclined to look at spam as a percentage of active tweeters and ignore lurkers.


I thought the parent was criticizing Twitter's active monthly user definition, which only includes people who have tweeted in the past 90 days. The article used this definition of active use as well.


Twitter requires users to log in before lurking so their definition of activity is intentionally selective. I'd be surprised if Twitter doesn't know how active their users actually are, even the lurkers.


I read lots of tweets and don't have a Twitter account, or at least one that I've logged into in the last 10 years... The philosophical question seems to be, "am I a Twitter user"?

You could probably argue that most of the world read Twitter and hence are users, account or none. It's that pervasive.

But then there's the next question: "am I a user that reportedly matters to Twitter's business?". What people are trying to land on, in light of Elon's tweet that the deal is on hold pending investigation of Twitter's metrics reporting, seems to be a framework for carving out what exactly constitutes a user that brings the platform revenue that shows up in quarterly reports and hence would directly relate to the tangible value of the enterprise.

In reality, nobody knows what numbers are being thrown around behind closed doors. This article is just one framing.


For most of Twitter’s existence, this was me. I used Twitter a lot but I never tweeted.


They define it in the article as accounts that have tweeted within the previous 9 weeks. If you are lurking and not tweeting you are not "active".


> of the accounts that are active ~20% are fake/spam.

Nope. ~20% of accounts that tweet are fake. A lurker (aka read-only) is by all meanings an active account.


It's not an active account if by "active" they mean "generating content". While Twitter isn't a typical content aggregation site like YouTube or Reddit, tweets are still "content" in the sense that they drive further user engagement on the site.


Words used to mean things. The current HN submission title just says active, heavily implying accounts with any kind of activity (eg. like, follow/unfollow), not "users who Tweet".

Sure, clickbait headlines are the norm and the devil lurks in the details, but still, many comments have been spent on this, because it's clearly misleading.

~80% of email is spam, it doesn't surprise anyone, because it's so cheap to send spam. Similarly it's easy to create fake accounts and spam, yet it doesn't mean much.


Who's counted as "engaged"? The people reading, or only the people writing? More to the point, if Twitter moved to a subscription model, would zero lurkers buy in?


Seems like social networks aren't interested in counting those that don't use all potential features of the platform. I'd say a lurker/ghost member is definitely an active account.


I would say that if someone is able to be advertised to (since that is what makes the business money) then they should be counted. So yes, there should be no requirement to tweet to be counted.


Absolutely, anyway they're earning money from these users, so definitely count it.


Not according to the article, which is the point...


"Active user" is a common industry term with a well-defined meaning. It's misleading to use it to mean something else, particularly when there are a number of more appropriate choices, e.g. "20% of Twitter posters".


It isn't misleading when the article itself explains how they are using the term.


The article clearly defines those accounts as "active" because it's the only way an external observer can somehow isolate an "active" group. Only twitter can know how many users are "lurkers".

And since they are trying most probably to get some PR for their company, they use their specific definition of "active Twitter account".


You are an inactive user. According to me, being an inactive user is making a comment I disagree with.


When the context is:

- Twitter determines the active status of an account using logins
- People are wondering about the % of active users as defined by Twitter's metrics

and you then use your own definition of active, with only a one-liner on the difference, no reflection on the impact it might have, and no warning that you are answering a different question, then my conclusion is that you want people to make this mistake.

> EDIT: rephrased "accounts that are active" to "accounts that actively send tweets" to clarify what the article addresses.

Made me laugh because you had to add it and made more effort than the author of the article to prevent the confusion :D.


Interesting. This could be a bracketing error, because I read

> it includes analysis of many older accounts that haven’t sent tweets in the last 90 days and thus, likely don’t fit

> Twitter’s definition of mDAUs (monetizable Daily Active Users)

As implying that they think accounts that haven't tweeted in the past 90 days don't fit Twitter's mDAU definition. Given the placement of the qualifying phrase, I think that's a reasonable parsing of the sentence, but I see your point that they could be trying to imply their set doesn't fit the definition. If so, that sentence is very badly constructed.


The full quote doesn't do SparkToro and Followerwonk any credit:

> Followerwonk selected a random sample from only those accounts that had public tweets published to their profile in the last 90 days, a clear indication of “activity.” Further, Followerwonk regularly updates its profile database (every 30 days) to remove any protected or deleted accounts. We believe this sample is both large enough in size to be statistically significant, and curated to most closely resemble what Twitter might consider a monetizable Daily Active User (mDAU).

The fact that they don't even consider the concept of a non-tweeting lurker to be an mDAU brings their entire analysis into question. Let's face it - Twitter is an emotionally-charged enough place, and tweets have such a way of living forever and being taken out of context, that there are many who use it to consume (and perhaps Like) content but will not tweet publicly. These people are still viewing and engaging with advertisements! Twitter absolutely should consider them monetizable!

But of course, engagement data on lurkers is internal only, and Likes data counts against global API caps: https://developer.twitter.com/en/docs/twitter-api/tweets/lik.... Which means that SparkToro and Followerwonk are incentivized to ignore these users. That they do ignore them, and don't address it anywhere in their methodology, is highly suspect.


The article is just clickbait. The title is obviously clickbait (based on your edit you've realized that "active account" !== "accounts that tweet"). Then they try to define a fake account:

> “Spam or Fake Twitter accounts are those that do not regularly have a human being personally composing the content of their tweets, consuming the activity on their timeline, or engaging in the Twitter ecosystem.”

Ok, but "consuming the activity on their timeline" is essentially unknowable outside of Twitter, since you can't see what tweets people are viewing. It turns out they're trying to infer this through some other signals like follower count, etc. But you can imagine why that might be sketchy.

Then they constrain the analysis:

> A more fair assessment of Mr. Musk’s Twitter following would only include accounts that have tweeted in the past 90 days

Let's be real, if you look at a list of Elon tweet replies, they might as well all be spam. Just search @elonmusk and sort by latest. Then compare that to the sorted tweet replies under an actual tweet. IDK how many millions of dollars and man-hours went into the AI that sorted this list, but it seems to just be putting the blue checks at the top and shrugging at the rest. I doubt this three man team is doing any better at spam detection.


For manipulation / spam purposes I don't really care about accounts that don't actively post/like/retweet/follow. The mDAU isn't useful at all for determining if the activity on Twitter is done largely by bots.


I do wonder how "fake" is calculated. Is @tweetsfrommydog fake? It's a real person making tweets that are funny and provide value to the platform, but it's not a real person as an individual tweeting their personal thoughts. Are corporate accounts or parody accounts fake?


It is valid criticism because the context of this article is that Elon Musk wants to know whether Twitter's own claims of ~5% fake/spam accounts are accurate. We do really want an analysis that investigates that precise question and not a related one.


Elon Musk waived his right to due diligence ... more fool him.

You can file this in the 'pedo guy' cabinet of his life story where his child-ego got the better of his undoubted business skills.


He can do due diligence but from what I heard (correct me if I am wrong) he has to pay a heavy penalty ($1B) if he backs out.


According to Matt Levine, that's "not how any of this works". The $1B is if he could not secure financing, but it appears we are now past that point. The relevant question is whether the Twitter board wants to sue in court to compel a sale.

Given what Musk does to the personal lives of his opponents, I'm not sure I would want to fight him. But given how many laws and rules he's broken at this point, I think there is a clear failure of justice if he can just do whatever he feels like without repercussions due to his common popularity.


What does he do to the personal lives of his opponents? And why would the board not do their fiduciary duty out of fear of that?


Lurkers are also the most important people. They consume the content. They are the meat of the business, the ones that respond to advertising and political messaging. If I were twitter I would champion all the lurker accounts, all the eyeballs to which twitter serves content. Nobody ever faulted the Nielsen ratings scheme for "lurker" viewers who only watched but didn't themselves create television shows.


Definitely agree. I joined Twitter four months ago. I haven't tweeted yet, but I'm reading it daily on the app and occasionally liking tweets.

I've been so surprised at how effective the advertising has been on me. I've never experienced this level of engagement with online marketing. Ads for TV shows, movies, live shows, musicians and comedians have been particularly effective.

I've found myself following a lot of show writers I've never heard of, and I even signed up for some new streaming services because of it. Google and Facebook ads never felt like they impacted me, though I know how important and dominant they are to business marketers. I've never clicked on a banner ad and my eyes glaze over sponsored links. Twitter's level of engagement with their marketing content is new to me, and I'm impressed.


I actively work to block or prevent ad tracking. When youtube serves me an ad for retirement planning or feminine hygiene products, that is my little victory. That is me successfully preventing them from knowing enough about me to target ads.


Furthermore, there are the non-tweeting active users (ones who like only) and the ones who RT a lot but don't create organic tweets.

Those are indeed incredibly valuable. Engaged audience = your real audience.


Unlike passive media consumption though, Twitter needs users to submit content (tweets, replies) to give lurkers something to do.


Yes and no. Just like on any major media platform, the huge majority of tweets being seen are from a very small group of influencers/popular people. That's why when you join twitter, it suggests a lot of people to follow who are already big.


There's only a yes in your answer.


No. You can have only 10% of accounts actively tweeting and the rest just consuming what those post. All those - active and not - are monetizable


You don't really need that many people to submit content though. I imagine most YouTube users have never uploaded a single video, and they don't need to, since there's basically no end to available content there.


Twitter specifically added the annoying feature of your likes being shown to your followers so that lurkers would be actively contributing to the algorithm though.

As long as lurkers are "liking" content, their local network will see an engagement increase.


This is a second-order objective though. The goal is to show ads to humans on the platform. Having a lot of human authors (or any kind of content authors) generating content is a way to achieve the goal, not a goal in itself.

There are other ways to achieve the goal, such as making ads more relevant (targeted advertising), having users consume more of the same content (recommendation), having the same content take longer to consume (periscope). Growing the number of human posters is definitely not a requirement.


The people who create content do it in such massive amounts that this never seems to be an issue.


And I thought it was common knowledge that lurkers always vastly outnumber people who post content on any platform. If lurkers outnumber posters by at least 3:1, then 20% goes to 5% and twitter’s “<5%” figure is accurate.
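
The 3:1 claim above can be checked with a short sketch. It assumes, as the comment implicitly does, that lurking accounts contain no fake accounts at all; the function and parameter names are hypothetical, and the lurker fake rate is left as a parameter so that assumption can be relaxed.

```python
def blended_fake_rate(poster_fake_rate, lurker_fake_rate, lurkers_per_poster):
    """Fake share across posting and lurking accounts combined.

    lurkers_per_poster is the assumed ratio of lurking to posting accounts.
    """
    posters = 1.0
    lurkers = lurkers_per_poster * posters
    fake = poster_fake_rate * posters + lurker_fake_rate * lurkers
    return fake / (posters + lurkers)


# 20% of posters fake, zero fake lurkers (an assumption), lurkers at 3:1.
rate = blended_fake_rate(0.20, 0.0, 3)
print(f"blended fake rate: {rate:.1%}")
```

Under those assumptions the blended rate comes out at 5%. Any nonzero fake rate among lurkers pushes it back above 5%, so the result depends heavily on the zero-fake-lurkers assumption.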


Lurkers are probably anywhere between 8-12:1. People actually posting stuff on the internet are in the vast, vast minority, which creates a sort of echo chamber.

I am technically "logged into" twitter so I can click through and read the postage stamp-sized charts linked to through various articles and blogs, or watch a video about a riot in some far flung part of the planet. Once a year I tweet at airlines when they lose my luggage or whatever but otherwise don't tweet. Twitter isn't a good social media service, it just happens to be the image/video sharing platform of choice for journalists to promote themselves.


> That seems like a huge bias - lurkers exist

I created an account 5 years ago, followed one or two people, got bored and never logged in again.

Presumably their intention is to exclude abandoned accounts, like mine - is there any way they, viewing Twitter externally, could tell lurker accounts like yours and abandoned accounts like mine apart?


As a third party? Probably not. Which is why it's going to be very hard to disprove Twitter's assertion unless Twitter chooses to share their data.

That's part of why I find articles like this frustrating: I don't think they have the data to actually answer the question they're attempting to answer. Knowing that, what's the purpose of the article?


> Which is why it's going to be very hard to disprove Twitter's assertion unless Twitter chooses to share their data.

It's impossible to disprove Twitter's assertion because they never claimed that less than 5% of their accounts are spam. From their quarterly earnings:

>We define monetizable daily active usage or users (mDAU) as Twitter users who logged in or were otherwise authenticated and accessed Twitter on any given day through Twitter.com or Twitter applications that are able to show ads.

>... mDAU does not include users accessing Twitter through third-party applications.

Their statement said that less than 5% of their monetizable daily active users are spam. There very well could be 50% of the entire user base as bots or spam, but that doesn't negate the metric Twitter releases.


This doesn’t resolve the issue the article has though. I’m a mDAU because I’ve logged in, yet there is no way for the people writing the article to know that I’m active.


Yeah the article has a few big issues, yours is definitely at the top.


They could maybe use like activity in addition to just tweets? Inherently though this system is going to be less accurate than the dataset that Twitter has access to. If a large chunk of users only engage in Twitter through DMs then an external organization isn’t going to have insight into that.


I would imagine Twitter would have access to analytics that third parties don't have, which would allow them to pretty easily work out which accounts are logged in and used for browsing and which are actually abandoned.


As a small complication: I have a twitter account, doubt I've ever tweeted. I browse twitter quite often, but I'm _never_ logged in.

No idea if I should be counted or not in any particular bucket, or how anyone would know.


AFAIK, you're not counted in any bucket. That's one reason TWTR wants you to log in to read. So active user numbers go up.


I thought they used a banner that pretty much forced you to log in to see more than the first few tweets in a thread now (same as instagram)?

I have an account that is logged in, but it has only sent 7 tweets since 2014 (and they're only to customer service accounts).


It doesn't seem to. A few months back it was ~forcing for a bit so I moved to nitter.


What client are you using that allows you to browse without logging in?


Opening a Twitter link in a private tab is the low complexity solution, or there's nitter.net, or deleting cookies, or various browser extensions that delete cookies for you.


I had no idea it was so strict. I just use Firefox. Their cookie behavior must be picky enough that it bypasses whatever nonsense Twitter is doing?


After posting that, I went back and retested. It looks like they have swapped back to a soft nag popup. For a few months it was hard blocking any further scrolling, at least with Chrome.


That's quite possible, I did move to nitter for a while, some months back, due to that.


Firefox :shrug: . It's never forced me. When it gets too annoying trying to push me to login I move to nitter.


Private browsing mode ("incognito" in Chrome)


is there any way they, viewing Twitter externally, could tell lurker accounts like yours and abandoned accounts like mine apart?

No. Which is why the only reasonable thing to say as an external party is "we don't know."


If an account is in lurk mode, then it's not a spammer so I'm okay with it being left out of that equation.

Where I might agree with you is a lurk mode account could become collateral damage in being considered fake. Lurkers don't retweet though. An account with a million followers isn't seen by everyone. Having a portion of that million like/retweet amplifies even further with their network now possibly seeing something from someone they are not following directly.

I'd be willing to accept that the number of lurkers that get lumped in with fake accounts when deciding the percentage of actual eyeballs on posts is not harmful. Those numbers are made up stats anyways. Like the old days of TV/Radio stations that covered large cities with millions of citizens. They would claim they have an audience in the millions even though a small fraction were actually watching/listening.


Except the question isn't about the pure number of spam/bot accounts, it's about the ratio of spam/bots to "authentic" users. If you leave out the lurkers, that ratio gets skewed to mistakenly inflate the bot count.


First off, I don't give 2 shits about twitter, so I don't care if the numbers are skewed in either direction. This is more of an interest in seeing how SV stats/metrics are just a game. Just so that's out there.

A lurker isn't an active user in my opinion. Maybe that's not the same understanding as the accepted definition. The lurkers might be absorbing some of the ad content, but they are not helping create new avenues for ads to be shared. Twitter's ad share surface area would increase tremendously if every user was actively producing tweets. That's the only metric that they are concerned with. They don't care about how many people actually see the ads once they are there. They make their money on the potential eyeballs alone. Lurkers are not helping increase those numbers.


> They make their money on the potential eyeballs alone. Lurkers are not helping increase those numbers.

I don’t follow this. Lurkers are the eyeballs, presumably.

If everyone on twitter tweeted the same amount it would probably just drown out the popular accounts and create a more diffuse and less profitable ad space I think.


>> They make their money on the potential eyeballs alone. Lurkers are not helping increase those numbers.

> I don’t follow this. Lurkers are the eyeballs, presumably.

The number of eyeballs allows the price per ad to increase, while the number of places ads can be placed increases the volume of ads. If lurkers are not helping to increase the volume, it doesn't make the platform as much money. Proving the lurkers are actually consuming the ads and making the ad buyer happy is non-trivial. Proving the lurkers are worth increasing the price per ad is also non-trivial. In the end, I personally feel like it is a wash by lurkers being overly represented in the fake account numbers.


Compare Twitter ads to the ads in a newspaper or something. 100% of a newspaper's readers are lurkers, but ads still seem to be worth more than $0.


Volume of ads is irrelevant. An additional tweet to attach an ad to does not generate revenue if there is nobody looking at it. On the other hand, though, an additional set of eyeballs on an existing monetized tweet does generate additional revenue.

As an extreme example, a single monetized tweet with a billion viewers generates money. A billion monetized tweets with one viewer obviously do not.


Why are lurkers not helping numbers? It's the exact same as YouTube. Do you expect the majority of lurkers on YouTube to not be counted because they didn't create a video? People follow what is already out there and ads target the people watching.


Lurkers are the eyeballs…


And yet I find 20% more believable than under 5%

Edit: I guess it's true that lurkers won't be bots, unless they are clicking on ads or trying to simulate engagement to help certain twitter accounts seem popular.


All those fake followers you can buy could just as well be "inactive" lurkers though.


That means that 20% of the posts that I see, as a lurker, are generated by bots. The bots are having a huge influence on conversations, and that's important to know.


> That means that 20% of the posts that I see, as a lurker, are generated by bots

I don't see how you can arrive at this conclusion. It depends on who you are following, with some additions by the algorithm (unless you use the chronological feed) and (speculating here) the algo pushes content from real humans.


I read tweet replies, not just tweets (apologies if I'm not using the correct terms, I'm not an active Twitter user). The original tweet may be a real user, but I often dive deep into all of the comments. If 20% of those comments are from bots, then that's a lot.


No, since you choose who you follow, you're most likely filtering for interesting stuff. I'd wager that most of the spam bots are pretty obvious to spot, and make up very little of a user's feed.


I rarely read my feed. Most of my lurking is on the replies to famous/infamous tweets.


I don't know how many original tweets are made by bots but 20% of the replies to anyone with a 5 figure follower count seems to fall on the low side of what I would guess.


"Doesn't have a URL in profile" is sort of a weird metric. Not everyone is there to self-promote.


I have a twitter account, but I have never tweeted or retweeted anything.


Same with my account. I only login from time to time when I am forced to sign in to view something.


It sounds like you are genuinely a non-active user, and probably not interesting from the PoV of Twitter/acquirers or the GP poster. This thread is about lurkers: people who regularly log in and read their feed (thus consuming ads and being relevant from Twitter's business perspective), but who don't post and would thus be excluded using the methodology of TFA.


Why do you say GP is non-active rather than lurking? They do read tweets, see ads, and even have an account that they log into.


I was offered $300 for my twitter account, I suppose partially on the basis that I haven’t tweeted much, but I use it daily to weekly though don’t tweet often, one tweet in last 2 years or so.


Well, I've been actively trying to create a new Twitter account for a little under a month and Twitter thinks I'm a bot. I've made 1 tweet and followed 5 people.

Even paid for Twitter Blue...still thinks I'm not real. Support is unreachable.

My current plan is to wait til Elon completes the takeover and then build an entire site dedicated to getting Elon's attention to unlock my account...because that's the only way to contact somebody apparently.


Have you tried tweeting at them :P

Edit add: I find it horrible that we have companies that you cannot contact; in fact they seem to be going out of their way to make it hard to contact them.

Even things you pay money for, like airline tickets. They want you to email them, make the phone number hard to find. So you do, they don't respond and then you have to search and call them, wait an hour or more on hold. The agents are nice but the entire process is terrible.

Earlier I had to do that for a damaged luggage claim. Went through the automated phone assistant to get to damaged luggage claims and it gave the option to use text messages. So I gave it a try, nope. They can't resolve the issue through text, it has to be on the phone. So I had to call back, re-enter all the info through the automated system and then ignore its pleadings to use the text system.


They were probably forced to, since they do not have access to login information. Especially since if you log in but do not post you are certainly not a spammer ^^, though you could still be a crawling bot.

But they probably should expand more on this and reflect on how much inaccuracy it adds. With a quick search you can find that fewer than 50% of US users tweet five times a month (https://www.pewresearch.org/fact-tank/2022/03/16/5-facts-abo...). Or the study which reported that the top 25% of users produce 97% of the content, with the median user of the bottom 75% posting 0 tweets a month (https://www.pewresearch.org/internet/2021/11/15/2-comparing-...). Those studies were done using surveys, I believe, so they should include only active users and no spam/bots.

So with random invalid maths, if you make the assumption that the 25% less active users might not even post every two months (exponential decrease of activity?) then you need to add back a quarter of the 80% they found as active.

Not to say I believe the 5% number from Twitter; I was going to use the price for a thousand followers as an example, but seeing that it appears to be at $30 now (https://socialboss.org/buy-twitter-followers/ ?) when I remembered it at around $5, the Twitter team might have done some good work ;).


But one can say that 20% of the content on the platform was distributed by bots. Meaning that all the Lurkers have to consider if they are really interested in content, that was pushed by some bot-farms. Technically, every user of this platform has to take a step back and evaluate, if anything they have seen is not pushed content by some bots.

20% is huge and I am curious if there will ever be some comparable "official" numbers to that.


No - you can say that 20% of the accounts actively posting are spam/bots.

It's possible they are posting MUCH more or less than 20% of the content.

If these are skewed toward the high end of producers - the 80/20 rule would say that as much as 80% of the content could come from them. Still - it's possible this content isn't interacted with much outside of other bots. You can't draw many conclusions from such a limited data point.
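
A rough illustration of why account share and content share can diverge: the sketch below holds the bot share of posting accounts at 20% and varies how much each bot tweets relative to each human. All posting rates are made-up numbers chosen for illustration, and the function name is hypothetical.

```python
def bot_content_share(bot_account_share, tweets_per_bot, tweets_per_human):
    """Share of all tweets that come from bots, given per-account posting rates."""
    bot_tweets = bot_account_share * tweets_per_bot
    human_tweets = (1 - bot_account_share) * tweets_per_human
    return bot_tweets / (bot_tweets + human_tweets)


# 20% of posting accounts are bots; humans post 10 tweets/month (hypothetical).
for tweets_per_bot in (1, 10, 160):
    share = bot_content_share(0.20, tweets_per_bot, 10)
    print(f"bots at {tweets_per_bot} tweets/month -> {share:.0%} of content")
```

With these invented rates the bot share of content ranges from a couple of percent to 80%, even though the bot share of accounts never changes.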


100% this. I haven’t tweeted in nearly 3 years, and even that was a retweet. But I’m still logged in and consuming crap from Twitter all the time


Same, last tweet from me was in December and I check Twitter daily. My last self-composed tweet is well over 2 years ago.


If it's the 80/20 rule then there's 4x of the other 80.58% that are lurking - which brings down % of fake/spam accounts.


There was this suggestion to conduct a sting operation of displaying captcha to a sample of users to determine the % of the bots.

Probably picking the sample is still challenging but at least can somewhat tell if the accounts in the sample are genuine.
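
As a sketch of what such a sampling exercise could report, the snippet below computes a normal-approximation (Wald) interval for the bot share from a random sample. The sample size and failure count are hypothetical, and it assumes that failing the captcha is a reliable sign of a bot, which real users who simply abandon the challenge would violate.

```python
import math


def proportion_interval(flagged, sample_size, z=1.96):
    """Normal-approximation (Wald) interval for a sampled proportion.

    z=1.96 corresponds to roughly 95% confidence.
    """
    p = flagged / sample_size
    margin = z * math.sqrt(p * (1 - p) / sample_size)
    return p, max(0.0, p - margin), min(1.0, p + margin)


# Hypothetical: 1,000 randomly sampled accounts see a captcha, 80 fail it.
p, low, high = proportion_interval(80, 1000)
print(f"estimated bot share: {p:.1%} (roughly {low:.1%} to {high:.1%})")
```

The interval only captures sampling error. It says nothing about whether the sample was drawn fairly or whether the captcha separates bots from humans correctly.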

The method in this article is so flawed that Larry Ellison, co-founder of Oracle, would count as an inactive account since he hasn't tweeted since 2012[0], and he is apparently looking into investing in Twitter[1]. How can he be investing a billion in Twitter when he doesn't use Twitter at all?

[0]https://twitter.com/larryellison?lang=en

[1]https://www.grid.news/story/politics/2022/05/16/larry-elliso...


They point out that their definition of active accounts is a flaw in their methodology (inside the article). However, I think it's fair to say that while TWTR has better internal insight into an "active user", it's the best approximation one can do from the outside.

I do wonder, given perfect knowledge, how the bot accounts would shake out. What percentage produce content (presumably propaganda, automatic tweets using it as an RSS-like announcement service, and spam) vs follow people (boost follower counts, sell likes)?


>They talk about "active" accounts (meaning have tweeted in the last 9 weeks), and do a bunch of filtering against that. That seems like a huge bias - lurkers exist, and in my experience are usually the majority of users...this step removes them or ignores them entirely.

All true. However, do you really believe that a bot is more likely to be active than a real user? If so, fair play to you. If not, then we would expect inactive users to be bots in an even greater proportion than what we see among active users.


We can argue about what the article did and didn't imply, but what's interesting to me about the issue you raise is that among lurkers there is probably a much lower rate of fake/spam activity, since there are fewer reasons for a bot to log in and not tweet. Couple that with the fact that lurkers are generally the vast majority of users on any platform, and that alone could explain the discrepancy between Twitter's 5% number and SparkToro's 20%.


Services that sell followers and spammers "aging" accounts generally would look like lurkers. Twitter could probably get an accurate estimate with the amount of analytics they have for internal use only, but of course they might be incentivized to not try very hard.


Perhaps they are attempting to argue that the value comes from the users that generate content more so than the eyes attached to the account?


> lurkers exist, and in my experience are usually the majority of users...this step removes them or ignores them entirely.

I've spent many, many hours lurking on twitter, don't have an account at all, and mostly access it through nitter instances. Are they "biased" for not including me?

edit: should inactive users be counted as active users?


Yeah, and I fully expect that these numbers went up recently with Twitter requiring login to view threads.

The fact that they add a .42% is a red flag in itself, especially when they admit in their own post that they agree that their analysis is deserving of critique. Very misleading stuff.

Their analysis using purchased bots seems a bit more reasonable.


“Passive” accounts may actually be more likely to be bots, as many services sell fake followers. It’s just harder to detect with public information than with their IP addresses etc.

Similarly I don’t think there is any way to separate active vs abandoned passive accounts as a 3rd party.


> They talk about "active" accounts (meaning have tweeted in the last 90 days),

This is not their definition, that's what Twitter considers an active account in their revenue reports.

> has no relation

It has some relation, no? I wouldn't be surprised if there is a strong correlation between how frequently a user sends tweets and how monetizable that user is.


Their TL;DAbstract refers to this as a 'conservative' methodology that is 'rigorous' and 'likely undercounts'.

Their definition:

> “Spam or Fake Twitter accounts are those that do not regularly have a human being personally composing the content of their tweets, consuming the activity on their timeline, or engaging in the Twitter ecosystem.”

They note the following to differentiate fake and spam: > Many “fake” accounts under this definition are neither nefarious nor problematic. ... By contrast, most “spam” accounts are an unwanted nuisance.

Some general data analytics notes from their post:

* They lump together fake and spam in their analysis - and this really matters! Somewhere like NYT is both 'fake', meaning it isn't a real person, and A HIGHLY VALUABLE ACCOUNT for twitter to have.

* They use a sample of 44,058 accounts (of ~1.047B)

* They look at a number of classifying variables (17), spam accounts met 10+ of those 17 criteria. They don't list all 17.

* The criteria were developed from a "machine learning process" that is undescribed, and was developed from a sample of 35,000 'known' fake twitter followers bought from 3 vendors and 50,000 claimed non-spam accounts. They appear (imply?) to have used 50% training 50% real data but don't specify explicitly.

* They say their model is about 65% accurate, and unlikely to produce false positives ("almost never includes false positives") - however they don't list any specificity, sensitivity, etc. that would be useful to evaluating that claim.

* The analysis does no statistical tests, no confidence intervals, minimal information about how the model was tested or validated.

* Critically: they note, but do not describe or quantify, that a lot of the criteria are highly correlated

* then later in the article they suddenly seem to switch to a 10 point scale for quality away from their 17 point scale? with a threshold of 3 or below as low quality?

* My personal twitter account meets most of the metrics where they have listed a quantifiable threshold. And their fake followers tool lists it as pretty f'ing suspicious - i.e., low quality.

I'm not saying they're wrong, but I am saying good luck getting this from a blog post to any sort of respectable science publication. As they note at the end, they aren't even calculating the same metric - twitter uses monetizable daily active users - remember NYtimes? Absolutely a monetizable account - even if it isn't a real person.

anyone who thinks this is proof of Elon's 4D chess based on this article is, to me, frankly delusional.


Turning my cynicism switch on a bit. The author is a very good content marketer. A hot topic in our corner of the world, which is the author’s target audience, is Elon Musk buying Twitter. Musk tweeted that the percentage of bots is the main issue of the deal. He disputed Twitter’s number of 5%.

I believe the author's writing prompt was just: a headline about fake Twitter accounts showing a number significantly higher than 5%. That’s it. Whatever the methodology, that was the author’s goal.

The article achieved this goal. Everything else is completely irrelevant. Even for the person who wrote it.


> The article achieved this goal. Everything else is completely irrelevant. Even for the person who wrote it.

It's almost like Inception isn't it? A PR stunt within a PR stunt within a PR stunt.


My account was active until recently (deleted when Twitter accepted Musk's offer, I don't need to be a participant in a right wing cesspool). I have 0 tweets. I don't like things, because I don't want my name attached to someone else.


This study is a great example of how you can use the data you have available to talk yourself into your conclusions. The implicit point of the study is to refute the "5% of Twitter accounts are spam" stat from Twitter's 10-Q that was the basis of Musk putting the twitter acquisition "on hold".

Except - the baseline that they choose is entirely NOT comparable to that of Twitter's baseline. The study says:

> Followerwonk selected a random sample from only those accounts that had public tweets published to their profile in the last 90 days, a clear indication of “activity.” Further, Followerwonk regularly updates its profile database (every 30 days) to remove any protected or deleted accounts. We believe this sample is both large enough in size to be statistically significant, and curated to most closely resemble what Twitter might consider a monetizable Daily Active User (mDAU).

Except that we know what Twitter defines as a monetizable DAU:

> We define monetizable daily active usage or users (mDAU) as Twitter users who logged in and accessed Twitter on any given day through Twitter.com or Twitter applications that are able to show ads.

Nothing about posting, nothing about engagement at all - simply: were you able to see an ad?

So there isn't any reason to claim that this "might" represent what Twitter uses as an mDAU - we know, in fact, that is not how they measure it. A more honest statement would have been:

"We selected a random sample of (etc. etc.). We believe that this sample is large enough to be statistically significant; however, it cannot be compared to Twitter's mDAU set, as it does not count passive consumers of Twitter content. Instead, this data can be used to suggest that a significant amount of the total posted content on Twitter is delivered by bots."

My guess is the number of consumers of content is greater than the posters of content by several orders of magnitude, though some of that would be mitigated by the longer time horizon.


The whole thing is BS. Twitter could have easily got an accurate number if they wanted to. And Elon could have forced them to do that as part of the deal or done a decent job on his own before the offer.

Both sides are bullshitting to negotiate a better price.


The only one bullshitting is Elon Musk - that 5% number has been in Twitter's 10-Q for literally years. Here's the 10-Q from Q3 2020:

https://d18rn0p25nwr6d.cloudfront.net/CIK-0001418091/cb1d93d...

"We have performed an internal review of a sample of accounts and estimate that the average of false or spam accounts during the third quarter of 2020 represented fewer than 5% of our mDAU during the quarter. The false or spam accounts for a period represents the average of false or spam accounts in the samples during each monthly analysis period during the quarter"

This is not a new stat or new information.


I already said Elon was bullshitting.

Are you saying Twitter isn't bullshitting? You think that Twitter put in their best effort to get an accurate number?


I think that Twitter actively does not want spambots on their site and therefore would invest into metrics so they know how they're performing on that.


You said:

"both sides are bullshitting to get a better price"

Which implies that this number is related to Twitter trying to get a better price. However, if this is the number they have always said, regardless of the methodology, clearly they can't be trying to get a better price - as they started doing this long before there was a price to make better.

That is separate and independent from whether the number is accurate to begin with - if it's remained consistent and publicly available, there's no chicanery here, the risk around that stat has always been baked into the market price of the company.


The percentage of fake users has a high correlation with the value of the business. Twitter's value is higher if the number is lower. I'm not suggesting they just started bullshitting in the past month, just that they are not making the best effort to get an accurate number.


Shouldn't they be saying the number is decreasing vs staying at ~5% for years then?


Having zero bots would be good, but obviously that isn't true. So instead they state a low, consistent number and hope no one proves them wrong. Most importantly, it doesn't appear to be getting worse.


To be fair, they do admit this:

"Are you challenging Twitter’s earnings report, saying that <5% of mDAUs are fake/spam?

    We are not disputing Twitter’s claim. There’s no way to know what criteria Twitter uses to identify a “monetizable daily active user” (mDAU) nor how they classify “fake/spam” accounts. We believe our methodology (detailed above) to be the best system available to public researchers. But, internally, Twitter likely has unknowable processes that we cannot replicate with only their public data."


You only need to see who the author of this post is to know that the methodology is crap, the numbers are likely made up (19.42% is WAY too specific), and the post is just a grab for media attention on the coattails of some other internet meme garbage.

This guy (Rand Fishkin) has been selling SEO as a religion for the better part of this century, and is in no small part responsible for all the search-result-garbage style websites everyone is complaining about elsewhere on HN today and every other day.

He's a third-rate market-bro hack that's been taking advantage of web professionals who get thrown into SEO/Marketing jobs and have no idea what they're doing by relentlessly shoving half-assed corporate strategies through moz.com and now his new sparktoro.com, and calling himself the great SEO redeemer.

Wanna question his methodology? There is none. Wanna question his science? Totally devoid.


Unfortunately, this sort of thing is absolutely standard when third parties discuss Twitter spam. There is an astounding amount of academic research doing the same thing with methodologies just as silly.

https://blog.plan99.net/fake-science-part-ii-bots-that-are-n...


19.42 is a number rounded to 2dp. They rounded to "nearly 20" for the headline. I'm not sure how they could express things differently in this regard.


well if his science is SEO he seems to be good at it


I signed up with a vpn and got banned for life after a single nonsense tweet about not liking the feed and following 4-5 famous people. I don’t think bot detection techniques are very robust

> This methodology likely undercounts spam and fake accounts, but almost never includes false positives (i.e. claiming an account is fake when it isn’t).

In other words, their model performs well on their training set, they don’t acknowledge that it may be overfitted or mislabeled, and they hand-wave mistakes.


they're robust at generating false positives! i got insta-locked for, i think, signing up using my own email domain, to make an account that shares pictures i took of geese. it took about 4 months to wait for them to unlock it

meanwhile, saudi bot armies apparently basically run rampant across the platform


It gets even worse than that. I have a test account with about 10,000 followers (mostly real people, you can tell), created from a home IP, and a few days after Musk's tweet comparing Gates to the emoji of a pregnant man, I posted the exact same tweet! With the exact same punctuation, images, icons, everything.

Less than 2 days later I was banned for "spewing hatred based on sex, gender or religion". No amount of replies pointing out that my tweet was the same as one from another, much more popular account helped. Banned for life.


I'm 6 months into my account and they still haven't asked for a phone number. I mean, I have one ready if they ask for it. Maybe I've been whitelisted as a non-bad-faith actor due to my timezone and other factors.


"70.23% of @ElonMusk followers are unlikely to be authentic, active users who see his tweets."

Somewhat alarming, if accurate.


They define active as having tweeted in the last 90 days.

I'm active on twitter, check it every other day, follow @ElonMusk but I don't tweet. Perhaps I'm a unique case, or their assumptions are a bit off.

Some metrics they consider suspicious (from the article):

1) Accounts that didn't tweet recently

2) Accounts with low number of tweets

3) Accounts with a low number of followers

4) Accounts that didn't set up their own profile image

Lurking != bot, and these data points would all score high for lurkers. I'm somewhat suspicious of their results, especially given the results from this Pew study suggesting the majority of twitter users don't tweet very much.

25% of Twitter Users Produce 97% of All Tweets: https://www.pewresearch.org/internet/2021/11/15/2-comparing-...


I am the same. In fact, my account has never posted a thing, never liked a thing; I just follow people I want to see updates from, that is all.

I treat it like RSS not Social Media


Look at my username


With Twitter's changes to ask people to log in to see most content, the number of lurkers probably shot up.


I'm now wondering what % of overall tweets are from that 1/5th of bots.

If it's 1-1 then 20% of tweets being from bots isn't great - but it could even be more than that.


Same here I never tweet but follow a few people.


I think it's connected to a more general phenomenon though. Pewdiepie has 111 million subscribers on YT and gets like 3-5 million views per video. Like 95% of his subscribers don't watch his videos.


I've subscribed to a bunch of people on Youtube whose videos I no longer watch.

I think Youtube's UI "encourages" this behaviour even more as it has a highly algorithmic homepage rather than a feed of people you follow.


Sounds accurate if you go through the comments under his tweets, a ton of url spam and scam 'trader' bots


The ratio of fake Elon Musk accounts to real Elon Musk accounts is probably even funnier.


This doesn't come as a surprise; I was able to buy 1 MILLION fake followers on Instagram. The account is still alive and well.

I've been meaning to write a blog post regarding this endeavor. But more so, I've come to the conclusion that social media needs to have verifiable audits for their userbase, similar to how there are audits done for financials. A lot of the value of these companies IS derived from their DAU/MAU and/or userbase in general (example: WhatsApp - $19 billion for their 1 billion users).


I've bought a small amount of twitter followers ($10-$20) for a meme account once. They disappeared over the course of a month or two. Checking online that's what happens with the majority of services.


I hit 1 million over a year ago and all of them are still there. Besides a few -3k drops a couple months ago, 99% are still there.

Twitter is different, they seem to have a more robust banning system in place. Instagram, not so much.


>The account is still alive and well.

You expect them to ban you based on your account gaining fake followers? Wouldn't such a policy make it very easy for any third party to get a user banned?


I assumed that someone would, at the very minimum, notice that my account went from 150 to 1,000,000 followers in less than a month, and then ban the bogus followers, as they are fairly easy to identify. That didn't happen.


How much did that cost? And were you able to get any value from the fake followers?


I did this solely to collect data for a personal project I am doing. My last post (on Instagram) had 10 views, so from that standpoint, I gained nothing, however, I did learn lots of valuable insight into the realm of fake followers for which I will make a blog entry detailing my findings soon.


Revenue != Value


I intended to imply that daily/monthly active users ARE the value for many businesses. Facebook valued WhatsApp's 1 billion DAU/MAU at $19 billion despite the fact that WhatsApp was not producing any cashflow.


I signed up at Twitter 9 years ago and use it every day, but have less than 50 tweets. Either you want to engage in discussion (to a varying degree) or you simply don't. But the latter doesn't necessarily mean you don't observe the discussion.

Of course an active Twitter account can be one that never tweeted in 10 years. Such an account may as well see advertisements etc... So I think any outside studies are fundamentally flawed since they don't have access to internal data like last login time.


64.56432% of significant digits are misused.


Your mental random number generator, or the keys you're willing to type one handedly for a bit, represent only 50% of the 10 Arabic numerals in use today.


With only 7 digits typed, we would expect at least some repetition ;)

The chance of no digit repeating out of 7 would be about 6%
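A quick check of that figure, as a sketch that assumes each of the 7 digits is drawn independently and uniformly from 0-9 (an assumption; people mashing keys are not uniform). Under that assumption the chance of no repeat comes out nearer 6% than 20%:

```python
from math import perm

def prob_all_distinct(n_digits, base=10):
    """Probability that n_digits independent uniform digits are all different."""
    # Ordered selections without repetition, divided by all possible sequences.
    return perm(base, n_digits) / base ** n_digits

print(f"{prob_all_distinct(7):.2%}")  # 6.05%
```

That is 10 * 9 * 8 * 7 * 6 * 5 * 4 = 604,800 repeat-free sequences out of 10^7.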


With a proper keyboard including a 10-key, I can reach all 10 of those Arabic numerals with one hand. I find your logic equally as flawed as the statistic. You sir, must be a spambot account!


Interesting to think about:

> Our systems do not, however, attempt to identify Twitter accounts that may be irregularly operated by a human but have some automated behaviors (e.g. a company account with multiple users, like our own @SparkToro, or a community account run by a single person, like Aleyda Solis’ @CrawlingMondays). We cannot know how Twitter (or Mr. Musk) might choose to classify these accounts, but we bias to a relatively conservative interpretation of “Spam/Fake.”

So this means:

* @EmojiAquarium - spam/fake

* @threateningcake - not spam/fake

* @CanYouPetTheDog - spam/fake

* @ChuckGrassley - not spam/fake (?? - what fraction are staff generated vs Chuck?)

* @Wendys - spam/fake

* @Twitter - spam/fake


To be clear: it doesn't matter, as far as Elon Musk and his buyout is concerned.

To quote Matt Levine's "Money Stuff" newsletter:

> “Temporarily on hold” is not a thing. Elon Musk has signed a binding contract requiring him to buy Twitter.

> That contract does not allow Musk to walk away if it turns out that “spam/fake accounts” represent more than 5% of Twitter users... The merger agreement contains a provision that allows Musk to walk away if Twitter’s securities filings are wrong ... but only if the inaccuracy would have a “Material Adverse Effect” on the company. That is an incredibly high standard: Delaware courts have almost never found an MAE.

> Musk ... had the opportunity to do due diligence on these numbers before signing the deal. (He declined.) He can’t now go to Twitter and say “actually now you need to prove that your user numbers are right.”

[0]https://www.bloomberg.com/opinion/articles/2022-05-13/elon-m...


I think Musk declining due diligence is a pretty important point that needs to be repeated often the last few days...


What kind of idiot would attempt a purchase of such magnitude before doing his due diligence (obviously, no one; it's an excuse after the price collapsed)? For fs, some people probe fruit for a minute before they decide it's worth buying.

How many more of these types of "events" until people stop treating Elon like some super genius god? Stop giving/loaning him money enabling this garbage.


This seems intentional to me. Musk has been claiming there's a bot/troll problem on social media for quite some time. I think he knew this going in.


I read a hypothesis, a few weeks back, that Elon doesn't want to buy Twitter, but rather wants to mess with the price to thumb his nose at the SEC.

Perhaps all this uncertainty is intentional.


Then he shouldn't have agreed to the deal, because there is very little he can do to get out of it. He can't simply walk away at this point.

It appears he's angling for a lower price with the threat of a court battle. He doesn't have much basis, but it would take time and money, and probably wouldn't be great for Twitter.


Did you read the article, or even these comments on the article? Nothing supports what you say at all. Of course it's easier just to shout about the billionaire man.


Did you read the HN guidelines? If not then I would invite you to do so:

https://news.ycombinator.com/newsguidelines.html


Which guideline is the OP's comment breaking?


This one: Please don't comment on whether someone read an article. "Did you even read the article? It mentions that" can be shortened to "The article mentions that."


Spot on!


You don't get to look under the covers until your offer is accepted.


Don’t talk about Twitter’s newest co-founder like that.


I think I've missed something here. Why would Elon not want to buy twitter if it has >5% spam accounts? Isn't the number of fake accounts one of the reasons he wanted to buy it in the first place? I don't understand why this would make him want to back out. It doesn't seem to be in conflict with any of the reasons he wants to buy.


You are taking his reasoning at face value. The suggestion is that the stated reason is a ruse to back away from the original deal, in that he either wants to renegotiate the price or walk away completely.


Given how all other tech stocks are faring right now, he definitely wants to renegotiate. It will be interesting to see what the board does. Right now Musk is the only thing holding up their stock price. And if they hold him to paying the billion, he's just going to carp from the sidelines. Maybe they should let him out of the deal, but close his account?


The problem for the twitter board was that no one else was interested at the valuation. The problem for Musk is that if he walks away first he has to pay $1bn - which he doesn't want to do, and second that if he comes back the board can find someone else to buy it, or just say no to him on the basis that he's not acting in good faith.


He has to pay 1 billion to Twitter if he backs down from the deal. This could be a way to get out of paying that.


If all you want to know is what fraction of all twitter accounts are spam accounts, it should be really easy:

1. Select 1000 accounts uniformly at random. Either from among all twitter accounts, or from active twitter accounts for whatever definition of "active".

2. Classify these 1000 by hand. Do as much investigation into them as you need to classify them accurately; no need to use heuristics here.

You will (with very high probability) get an estimate accurate to within a percent or so. If you do statistics you could find the actual bounds.
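A rough sketch of "the actual bounds", using the normal approximation to the binomial. This assumes the 1000 accounts really are a uniform random sample and that the hand labels are correct; the two example rates are just the 5% and ~20% figures discussed in this thread:

```python
from math import sqrt

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for an observed proportion p in a sample of n."""
    return z * sqrt(p * (1 - p) / n)

for p in (0.05, 0.20):
    print(f"observed {p:.0%}: +/- {margin_of_error(p, 1000):.1%}")
```

With n = 1000 this gives roughly +/- 1.4 points around 5% and +/- 2.5 points around 20%, so a sample that size is enough to tell those two figures apart.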


How do you get 1000 accounts at random? Does twitter have an API for it?


The stream API can sample, but then you see currently active accounts only.

Users are denoted by numerical ID, you can sample using this.


Remember when Twitter first started, people thought Twitter usernames were going to be like domain names (except free)? LOL. I must have 50 Twitter accounts personally. Any time I had an idea, I used to grab a Twitter account for it.

Anyway I think it’s pretty goofy to try to make claims around %s of Twitter accounts, active or otherwise, only absolute numbers make sense. What really matters at this point besides revenue and revenue trend?


Cool effort! Does seem like a lot of thinking went into this. But, a few points:

"Through trial and error (and, of course, pattern-fitting) we crafted a scoring system that could correctly identify over 65% of the spam accounts."

65% is not actually very accurate for a binary classifier...
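Worth noting that "identify over 65% of the spam accounts" reads like recall (sensitivity), which on its own says nothing about false positives. A minimal sketch with entirely hypothetical counts (the article does not publish a confusion matrix) showing how the usual metrics come apart:

```python
def classifier_metrics(tp, fp, fn, tn):
    """Standard binary-classifier metrics from confusion-matrix counts."""
    return {
        "recall": tp / (tp + fn),            # share of real spam that gets flagged
        "specificity": tn / (tn + fp),       # share of genuine accounts left alone
        "precision": tp / (tp + fp),         # share of flagged accounts that are spam
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }

# Hypothetical test set: 1,000 spam accounts and 1,000 genuine accounts.
m = classifier_metrics(tp=650, fp=20, fn=350, tn=980)
for name, value in m.items():
    print(f"{name}: {value:.1%}")
```

A model like this would undercount spam while rarely flagging real users, which is the behavior the article claims, but without published numbers for specificity or precision there is no way to check it.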

"Applying this model to the ~44K random, recently-active accounts provided Followerwonk produces a quality score for each account, visualized below:"

Many real twitter users are likely not to be "active" aside from reading stuff. So this methodology would clearly overestimate the number of spam/fake accounts (which all would be active).

Also, this is an important point:

"The other potential critique is our spam/fake follower calculation methodology. Because we crafted it in 2018, based off sample sets of purchased spam accounts, it’s likely that more sophisticated spammers and fake accounts go unidentified by our system"

The features collected are certainly outdated by now.


I'm sorry to be one of those people, but the low-contrast grey color used for the text is really annoying and makes me feel like I'm straining my eyes to read it! Why the fuck do designers encourage this? It is more readable as black, and I have a vague non-medically-informed sense that it might be better for people.


I am guessing that it makes sense that very high profile people will have a higher percentage of fake followers than an ordinary, small account would because many of these fake accounts are bots that are set up in order to get attention for whatever they're selling or promoting by interacting with high profile accounts.


Well, there are different bots for different purposes.

A bot operated to post advert messages might want to post them on high profile accounts.

A bot for a "500 followers for $5" company might be more likely to follow low-profile accounts, with some random activity/follows thrown in as camouflage.

A bot operated to amplify certain messages/opinions might follow accounts of middling visibility, and focus on likes rather than more visible activity.


Good points


So to get this straight - SparkToro have decided to burn their reputation down chasing a spurious, irrelevant claim from someone else's merger. It'll never cease to amaze me how many people will take obvious trolling and treat it as reasonable. This is like conducting a serious analysis into whether the US election was stolen.

I hope they like lawsuits:

> Our analysis found that 19.42%, nearly four times Twitter’s Q4 2021 estimate, fit a conservative definition of fake or spam accounts

Ok great, I hope you have fun proving that in court. Especially the part where you have to prove that Twitter's definitions which you don't know match yours.

>SparkToro is a tiny team of just three

Then I applaud your bold decision to interfere with a $44Bn merger.

>Our definition (which may differ from Twitter’s own

Any lawyers in the house? How obviously do you have to renege on your libelous claims before you're in the clear?


The main thing missing in this thread is that bots can do things that don't involve actively tweeting. Things like liking, following, etc. Even clicking a hashtag has some effect on the Twitter algorithm. Furthermore, the most important metric for Twitter advertisers is impressions. You primarily pay for impressions on Twitter as an advertiser. Does Twitter show ads to bots/fake/spam accounts if they match the #hashtag criteria or target audience to pump up the impression numbers?

Are these the fake accounts that are lurking around without posting a tweet, but impacting the mDAU?

"Active" definition needs to be more precise than simply tweeting.

If I were an investor or an advertiser, I would drill into these details.


So ~20% of Twitter accounts are fake; however, the last time I tried to create a Twitter account for legitimate purposes (asking some customer questions) I couldn't, just because I didn't want Twitter to have my phone number.


I don't have a Twitter account exactly for this reason.

No Twitter, you are not getting my phone number.


Fake is kind of a weird way to describe valid combinations of usernames and passwords that can log into Twitter.com. I would assume that if your account hasn't been disabled for whatever reason, it's a real Twitter account.

Twitter, however, does call these accounts "fake" and has further rules/policies concerning Misleading & Deceptive Identities.

https://help.twitter.com/en/rules-and-policies/twitter-imper...


It's funny since I've had folks claim I must be a bot because I follow more folks than I have following me. It's just weird how these folks create metrics without much of a decent explanation. The fact of the matter is that bots are a problem that I think can only really be managed but not eliminated. The first thing is to treat botting not as something that should be punished from the start but as something that is used for common automated purposes, just like how Twitter does it now with some bots being tagged as such for legitimate reasons but not blocked or shadowbanned.


I've been told that I'm a bot because I express views that deviate from the accepted consensus.


I think it's instructive to note the difference between 'fake' accounts and 'spam'/'bot' accounts.

'Fake' accounts exist to fraudulently increase follower count. This is what the study is claiming to measure. These typically have low activity and engagement profiles.

'Spam' or 'bot' accounts, on the other hand, generally have high activity. Whether trying to influence political opinions, or engaging in astroturfing or phishing activities. They probably have very, very high ratios of replies to original tweets, and overall tweets to # followers.


The methodology is not perfect but I think it raises a serious concern. I'm curious what other methodologies for identifying fake/spam accounts might be used. What about likes and retweets?

Perhaps Mr. Musk's team did an analysis of their own and saw a high number of potential fakes/bots, and that made them question the authenticity of Twitter's numbers.

If it can be shown with some accuracy that Twitter underreported the number of fake/spam accounts, how does this affect Musk's acquisition? Could he lower the price by saying you gave me incorrect data?


The biggest issue with any "study" like this is the definitions. What is a "bot"? What counts as "inactive"?

What about "bots" that aren't "bad bots"? For example, a news organization twitter account that automatically tweets out new articles. This is clearly a "bot", but is it a "fake account"?

Yes, most published studies specify their own interpretation of these terms, but those definitions tend to be wildly inconsistent, and that makes any sort of comparison of the rest of the methodology impossible.


Elon already knew this of course. Buying Twitter was just an excuse to sell Tesla stock at a peak while avoiding suspicion. Now he’s teeing up his excuse to pull out. /conspiracy


This is what I was thinking too. People would panic if he sold that much stock. But if he's just taking out a loan to buy another company it's no big deal.


Or he actually wants to own Twitter and drives up its value by making Twitter known to every person on the internet. Bad news is good news.

It doesn't matter if 5% or 20% or even 40% of Twitter users are bots. If Musk manages to double the number of subscribers by making Twitter a household name then the purchase was a good investment.


/shrug Pump and dump is Elon's raison d'être. Idk why people keep falling for his bs.


Elon pledges Tesla stock as collateral, Tesla stock goes down. Announces deal 'on hold', Tesla stock goes up. Unless he can get the price of Twitter down to where he does not have to use Tesla as collateral, it's just a bad purchase decision.


because he is rich - therefore smarter than everyone else.


Or negotiate a better price?


Occam's razor is not a conspiracy


Billionaires making bold plays while backed by a cadre of top lawyers and accountants is the kind of "conspiracy" that is totally expected.


I don't think this is the simplest explanation. The simplest explanation is this: Musk's primary source of financing is Tesla stock. The sudden market downturn took a big chunk out of Tesla stock, which made his financial situation more difficult.

The delay in closing the deal is just that, a delay, in the hope that the market will recover and the price of Tesla stock will recover, thereby making Musk's financing position much easier.

Nobody anticipated the downturn, otherwise Musk never would have made a $54 per share offer for Twitter in the first place, because obviously it's much lower now, as is much of the tech market.

It's hard to believe that making a binding $54 offer on a stock that's only $39 a month later was "4D chess". I do think $54 was a reasonable, maybe even lowball offer a month ago. Twitter itself hasn't changed much in a month operationally speaking, but the whole market changed in valuation.


I just find it hard to believe Musk really wants to own and run twitter, he seems to actively hate it and is trying to tank its value.


Note that these researchers define an “active” user as one who has posted a Tweet in the last two months. This is a very different definition than the standard “MAU” definition of “active” which means having used the site at all in the last period.

One definition isn’t better than the other but it is worth noting that opening Twitter, looking at a lot of Tweets, seeing a lot of ads, and not posting any Tweets is a behavioral pattern much more commonly seen in humans than in spambots.


One slightly problematic aspect of their methodology is that 'Casey ... [found] 8,555 to have an overlap of features highly correlated with fake/spam accounts'.

Which features were they? How reliable are those features? What method was used to overlap the features? What thresholds within that method were used to classify the overlap(s) as spam, or not spam?

Unfortunately this research, while compelling, fails the falsifiability test.


If you take their percentage of 'active' fake/spam accounts as 20% and then apply a poster to lurker ratio of 25% you end up with roughly the twitter spam percentage where active does not just mean regularly posting. I would guess that this analysis is correctly identifying most spam accounts but that the majority of users don't post every 9 weeks but are still regularly checking their feed.
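A back-of-the-envelope sketch of that arithmetic, in Python. It assumes (my reading, not something the study states) that essentially all spam accounts show up among recent posters and that recent posters are 25% of all active users; the function name is made up for illustration.

```python
def spam_share_of_all_active_users(spam_share_of_posters: float,
                                   poster_share_of_users: float) -> float:
    """Rescale a spam rate measured among recent posters to all active users.

    Assumes spam accounts are found only among posters, so lurkers
    contribute to the denominator but not to the spam count.
    """
    return spam_share_of_posters * poster_share_of_users


# 20% spam among recent posters, posters being 25% of active users
estimate = spam_share_of_all_active_users(0.20, 0.25)
print(f"{estimate:.1%}")
```

If lurking bots exist in any number, the real figure would be higher than this estimate, so treat it as a lower bound under the stated assumptions.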


I check everyone I follow (mostly artists) to see if they are posting art generally, and look at most of the people who follow me to see what they do before following them back. There are a few (legit) retweeting bots and I have ignored a bunch of people because they did not seem real. I remember a set of accounts followed me that had (1) no tweets (2) avatar pic of MCU movie actor and (3) hundreds of followers.


I see people are forgetting the only reason we care about this number is because someone was not comfortable with what bots and non-bots were saying. Whether they are bots or not is just a convenience because of the price. I don't have the money to care, yet I was well aware of it... Musk should sample his followers; that would drive the price even lower.


Some people care because it goes to the heart of the advertising costs and value.


I know. And I've been waiting ever since Facebook's IPO for people to realize the tech for targeted marketing is more effective than it seems.

The reason why it seems not as effective (at lower price brackets) is because the entire industry is based on fake numbers. This fake degree of effectiveness allows them to balloon the prices:

1st, by alleging that the ad is "as effective as what you are willing to pay"; 2nd, through their "massive" userbases.

In reality lower priced ads are not effective precisely because their userbase is not as massive.

As such, what should be defined as the normal tier is now the highest-priced type of advertising campaign.

People hardly dig into this as there is no real incentive; billion dollar companies are not being sold all the time.

The % of bots was known to be between 15 and 20 before he made the deal.

If your plan is to buy a soda company to sell soda I would not believe a good strat is to reveal how cheap the product is, in reality, you are basically cornering yourself.

If he manages to buy it, the issue will immediately disappear.

If he genuinely made an effort to unmask the industry to help the consumer, I would be behind that.


I ran a small social network.

These are active accounts.

But active accounts usually are a small percentage of total accounts, especially when it comes to bot networks, because bot spammers usually create huge quantities of accounts in advance, so that as soon as one bot goes down, another spins up to take its place.

I'd be surprised if 80% of the total accounts on twitter aren't bots.


Bots contribute easily a third of the value of the Twitter service, they aren’t “fake”. This is a terrible analysis.


Twitter isn't claiming of all its accounts less than 5% are bots. It is claiming less than 5% of their mDAU are. Without exposure to their methodology we are just speculating.

It is entirely feasible that at any point in time 20% of accounts are bots. It is whether they have counted them in their mDAU that matters.


Not surprising.

On the flip-side, you do realise this is a way to justify complete verification of users on the platform to significantly reduce the bots, fake accounts, etc? I won't be surprised to see that implemented soon.

Probably by the end of this month, things are going to get very chaotic at Twitter.

We'll see what happens.


Why would a uniform sample of Twitter accounts be even remotely interesting? My Twitter experience consists of people I personally know. None of them are bots.

If you took a random sample of websites you'd find that 99.995% of them are spam. That's why we have to rank search results.


Well I have literally never posted from twitter in my 5 years or so of having an account, but I'm on there almost everyday and use it as my news feed for topics I want to read about and blogs I follow. However according to them I'm fake/spam. hrrmph.


Doesn't matter. People should actually read the Twitter filing. Could be closer to 50% and it wouldn't be enough for Musk to back out. Also, the word bot can have many definitions. A company using an app to schedule tweets could be considered a bot.


Who is this actually a surprise to??? And how is it unique to Twitter? Yawn. This whole Musk “deal” was just another way for him to get attention and attack the company. Sad that our media ecosystem is so thoroughly debased that it worked so well.


Go through this comment section, you won't believe it

https://mobile.twitter.com/McDonalds/status/1521143128073424...


I want the inverse of this tool to audit it. I'll give it a specific account, and it identifies it as fake or not. Let me run some spot checks and accounts I know (and myself), and I'll find out if I can trust their numbers.


Says a user named "Followerwonk" ... to be fair, they do proffer a method for reproducing their results. I am guessing that there are some bots they are missing ... ones that are close to passing the Turing Test.


Nice try Elon, afaik twitter did disclose that they might be undercounting spam at 5%

It's very hard to say anything about bots without IP addresses. The most active bots get sold / change subject etc.


Musk should have hired these people before he waived due diligence


I'm loath to invoke "4D Chess" but I can't help but think this may have been a tactical decision to either lower the price of Twitter before the purchase, or to impugn its reputation without actually buying the company.


My speculation: it was to make sure the data was pure when he uses it to train the Tesla Bot AI.


To lower the price that he set after choosing to not run his due diligence? It’s possible, sure, but not likely.


Could be a tool for renegotiation, but then again I know very little about these kinds of high stakes corporate acquisitions.


In a perfect social media world, there wouldn't be fake accounts. It seems to me by allowing fake accounts you introduce noise and confusion into the system.


That is a 0.03% sample size of the total 130+ million accounts.

I was just explaining the law of small numbers to my daughter for her science fair. I think this would qualify.


Maybe someday she'll explain to you that errors in sample estimates depend only on sample size and not population size.


I think the bigger error is still in drawing conclusions from a small sample size.


You're going to have decently sized error bars from a n=100 sample but it doesn't matter if it was 10% of a population of 1000 or 0.001% of a population of 10000000, the quality of the estimate is just as good.


Four significant digits? That's a lot of confidence.


I am also an avid twitter reader but seldom post. I would assume valid counts of users would recognize users who at least access and refresh their feeds.


I talk to bots a lot and you know what? If they are of good faith, and say interesting things that nobody else says, let them be. So in past eras women were excluded from speech and men would accuse each other of being a woman, women had to pretend to be men to get respect. Still do on forums like this.

At some point bots will have enough humanity and intelligence breathed into them to deserve some dignity. Like books, books deserve respect, you can't just burn them or step on the pages, Bibles and Torahs in particular. That was a huge thing in the Holocaust, protecting many Torah from destruction, in one case I read Jews protected them by burying them.


Is there any kind of baseline for what percentage of a system can be fake/spam to still be considered a "healthy" system?


Nearly 85% of all emails are spam.

I would pay $44bn to own email.


It's kinda sad that back in the day before the API was turned off, bots were some of the best twitter content.

Bring back the bots, they were great


Seems like both parties will see each other in court.

Which is home field advantage for Musk really, considering what he got away with in the past.


I thought it'd be higher than that tbf


Does this take into account deleted or banned accounts? Maybe that's not significant, but maybe it is.


That .42 has got to be a small wink at Elon… if not from the team that did this, then from the simulation.


The question would be ... how much will Elon Musk pay to buy Twitter after pointing out that their business is essentially a bunch of bots, etc.?

Kind of "hey guys, look, I really think your platform costs around 52$ per share, but ... let's try to see the actual price. How many bots do you have again? :)" Whenever this guy speaks, a whole freaking market moves.

It's like "cleaning up the mess" even before joining the company as their CEO. In the worst case scenario, he dodged a huge bullet. So, who cares.


> "By contrast, most “spam” accounts are an unwanted nuisance. Their activities range from peddling propaganda and disinformation to those attempting to sell products, induce website clicks, push phishing attempts or malware, manipulate stocks or cryptocurrencies, and (perhaps worst) harass or intimidate users of the platform."

My perception of Twitter is that some of the worst behavior on Twitter is coming from the 'well-respected authentic accounts of government and media personalities', particularly when it comes to disinformation, marketing, and indeed, intimidation of those with non-conformant opinions on various topics, such as, for example, the wisdom of unconditionally flooding Ukraine with high-tech ordnance in the name of repelling the Russian invasion (or is that really more about delivering a cash cow to US weapons manufacturers)? That's the kind of thing that could lead to doxing by corporate media reporters and charges of being 'a Russian asset' even if such concerns are reasonable.

Also, how many of the 'registered authentic accounts' include tweets not written by the actual personality they claim to represent, but rather by some kind of in-house PR/image management consultants? Just because there's a paid human somewhere in the chain doesn't mean the account is all that 'authentic'.


The real question is: does it matter?


The operative word here is “active” for the lawyers to get away with false stats.


"I used to not be into bitcoin, until I met the right trader!" ...


This estimate seems very low


This is /exactly/ why Mr. Musk has required the board to prove their math.

The valuation the board has reached, for Twitter shares, is predicated upon human users on their system and not spam accounts.

Twitter's valuation of its stock is way-overblown if there are 20% bot accounts on their platform.

Go Elon!


Holy revisionist history. The “valuation” is $54.20 per share because Elon is a child and thought he was being clever when he made that offer. Twitter simply accepted it.

Is it way above fair market value (especially in the time between his offer and today)? Absolutely.


Elon hasn’t required the board to do anything. He signed an agreement to purchase the company with no due diligence or backout clauses. If he tries to get out now he’ll see Twitter in court.


If they lied on SEC filings about this, he bought something not as advertised, so I'm not so sure...


Have you even read the SEC filings cause if you had you probably wouldn't be parroting that?


I thought the SEC filings both have a number and a risk of being wrong?

Re:parroting, while that seems to have been a pejorative, I wasn't actually copying from anywhere, just a smb executive sensitive to this sort of thing during m&a as standard business discussions, so it's interesting that it's being repeated enough to accuse people here of 'parrot'ing!


There is no hard number. It mentions that they are just an estimate and could be off. They are labeled as MAUs, so that gives a lot of leniency depending on how "active" they are. It doesn't define bots either.

The SEC filings thing has been repeated even by Musk himself. It doesn't really help him.


>The valuation the board has reached, for Twitter shares

Absolutely! When the board demanded that Musk pay $54.20 per share – which is a completely rational number that they demanded from Musk and not a meme number that he made up himself – they underestimated his ability to do Math, and he's about to show them what's what.

One of the core duties of a Board of Directors for a publicly traded company is to arrive at a valuation of the stock each morning before the markets open. It's a shame that this board is either incompetent or corrupt and came up with the number $54.20 based on bot accounts. Musk is having his people do a complete audit of 100 random followers of the @Twitter account to find the true number. I expect the terms of this deal to change significantly in Musk's favor.

It's too bad that this distracts him from shipping the Cyber Truck, going to Mars, and then building a 1.6km tunnel under Las Vegas (a feat which has never before been achieved).

Praying for a quick resolution so we can finally get some Free Speech on the Twitter platform.


I've had a Twitter account for 7 years, and have not once Tweeted.


So politicians and rich are announcing new laws and their stuff mostly to bots while a casual visitor sees a login wall. Nice. Throw it to a volcano with IG, FB and remaining American "internet" 1/8


Elon nuking the "ad industry" would be amazing.


How does Elon benefit by NOT acquiring Twitter?


Why are we always "surprised" by this?


Only 19.42%? That's gotta be way low


"Very large accounts tend to have more fake/spam followers than others..."

If the figures for Elon Musk and Donald Trump's followers are at all accurate, it's over 50% for large accounts. Which means that the people who probably care most about whether or not their followers are real (e.g. those who might be charged for the privilege of using Twitter to communicate with large numbers easily), are the ones most likely to have mostly bots.

I could see why this might impact the value of Twitter as a company.


Honestly I would guess higher than 20%


It seems the stated number is inflated by the hand of elites to increase the probability of the 'herd' moving back to Twitter.. over-looking Truth Social.. and crashing the one best chance of global free-speech rights surviving in an ever growing communist bound society.


If you were opposed to Twitter being sold to Elon Musk, then it seems like you would have a great incentive to create as many spam or fake accounts as possible over the next 30 days or so to shift the numbers in favor of him abandoning his takeover bid.


Sounds low.


It sounds like Twitter could easily shed hundreds of millions of fake followers by banning Elon Musk and any account that follows Elon Musk!


and so it begins


not a surprise at all.


20%? That's it?


Assuming that's true, would that mean that Elon has a way out that does not involve paying 1B$ ?


Analysis here: https://www.youtube.com/watch?v=_SfXoCj4TLk TL;DR is that it may or may not be enough to justify it in court if Twitter sues, but it probably is enough to make a court case necessary, and Twitter's board may be reluctant to spend years in court trying to win their case, since inevitably the court case would involve the very public airing of facts like this that would not paint Twitter in a good light.


And would block any possible future deal, with Elon or anybody else until that court case was resolved.


Twitter wasn't looking to sell itself. This was an unsolicited takeover bid.


That's true, and the Twitter shareholders should have realized who they are dealing with. But that does not mean that tying themselves up in a court case for several years is smart because it appears that in fact they are willing to sell.


Well, the threat was there to bypass the board of directors, controlled by big long-term investors, and go directly to the shareholders, many of whom would be happy to cash out for short-term gain. So the board was under major pressure to agree, even if they were somewhat reluctant.


Aside: Patrick Boyle's humor is always spot on. Love that channel.


Absolutely agreed.


AFAIK he doesn't even have a way out of paying the full buying price. The breakup fee does not allow him to arbitrarily invoke it. It is only conditional on financing, for which the banks have already given binding commitments. Matt Levine has a great article on it [1].

[1] - https://www.bloomberg.com/opinion/articles/2022-05-13/elon-m...


No


My personal impression from using Twitter for years is that the actual number may be twice as high, more like 40% than 20%. The researchers repeatedly say that they were very conservative in their estimates and only counted accounts that were extremely likely to be spam. "This methodology likely undercounts spam and fake accounts, but almost never includes false positives (i.e. claiming an account is fake when it isn’t)."

PROTIP: You can search Twitter by URL. Just search for a popular news article on Twitter, and you can see the massive number of obviously non-human accounts who are tweeting the article.



