SAN FRANCISCO — Spam and fake accounts are more prevalent on Twitter than on comparable social platforms, according to a data analysis firm hired by Elon Musk as part of his legal battle with Twitter.
The claim by Cyabra — in the company's first public interview since conducting a study commissioned by Musk that found spam and bot accounts make up an estimated 11% of Twitter's total user base — highlights the headache that may await Musk now that he appears ready to complete his $44 billion acquisition of the social media platform.
Cyabra CEO Dan Brahmy told CNN last week that in addition to studying Twitter, his company has done similar assessments of the company's rivals — although he declined to name any specific platforms — and that the fake-account problem appears to be comparatively worse for Twitter than its peers. Using a machine-learning algorithm that analyzes "hundreds" of parameters, he said, Cyabra provides estimates of whether certain online accounts and content are authentic.
"We have a number for all the social media platforms, because that's our job," he said, adding that Cyabra's estimate for Twitter is "definitely not in the low end" relative to its competitors, although the company had more access to data about Twitter when it conducted its analysis for Musk.
Bots on Twitter have been central to the dispute over Musk's initial attempt to get out of the acquisition deal. Less than three months after signing the deal, and waiving due diligence in the process, Musk moved to terminate the agreement, citing claims that Twitter had misstated the number of bots on its platform, despite having previously said that he wanted to buy the company to address its bot problem.
'Spam harms the experience'
Twitter has for years said that bots make up less than 5% of its monetizable daily active users. In a series of tweets in May, Twitter CEO Parag Agrawal acknowledged that "spam harms the experience for real people on Twitter," and added that, "as such, we are strongly incentivized to detect and remove as much spam as we possibly can, every single day."
Twitter sued Musk to complete the deal, accusing him of using bots as a pretext to exit the deal after getting buyer's remorse, and the deal was set to go to trial later this month. But last week, Musk told Twitter he is now ready to move ahead with the deal at the original price, and the judge overseeing the case gave the two sides until Oct. 28 to close the deal or face trial.
The dispute over the true number of fake accounts on Twitter — and how to accurately calculate it — has also shone a spotlight on the small army of data scientists, artificial intelligence experts and misinformation researchers both within and outside tech companies who spend their days hunting for patterns or anomalies that can uncover whether a given account, or a network of accounts, represents a living, breathing human being or, potentially, automated accounts or bad actors aiming to spread covert influence, false information or spam.
With Musk and Twitter now expected to close the acquisition deal, the public may never get a close look at the evidence each side gathered on the issue or a court ruling on Musk's claims that bots are more pervasive on the platform than it has let on. But Cyabra is coming forward now, Brahmy said, after the company's work for Musk's team was revealed in court filings and hearings in the acquisition dispute. The Israel-based company has roughly 40 employees and counts the U.S. State Department and various consumer brands as customers of its digital authenticity measurement tools, he said. Among its investors is the Peter Thiel-backed Founders Fund.
The bot challenge
Many practitioners in the field caution that estimating spam or fake accounts can be an extremely subjective exercise, and that anyone claiming to have a definite number likely doesn't grasp the complexity of the issue. Even the creator of Botometer, another service Musk used to estimate the number of bots on Twitter, has emphasized the challenges behind defining the term "bot" and cautioned that context, intent and even the way an account is managed can complicate matters. There are also good bots, designed to share information or entertaining content, that are allowed on Twitter, many of which transparently self-identify as automated on the platform.
In the interview last week prior to the revived deal talks, Brahmy acknowledged there are clear limits to his company's use of machine learning algorithms to determine whether accounts may be authentic, which is why his firm's report about Twitter expresses a maximum confidence level of about 80%. The company's algorithms would likely flag Brahmy's own grandmother as a suspicious account holder, he said, given her propensity to whiplash randomly from topic to topic and to post at unusual hours with numerous grammatical mistakes.
Twitter declined to comment for this story.
Twitter lawyers in a hearing earlier this month said that neither Cyabra's analysis nor an analysis Musk commissioned from another firm, which estimated with 90% confidence that bots make up 5.3% of the platform's user base, supports Musk's claim in his deal termination letter that the amount of spam is "wildly higher than the Twitter estimate" of less than 5% of its monetizable daily active users.
Cyabra's analysis
Cyabra first made waves in May, after Musk's deal was first announced. Using publicly available data, the company claimed at the time that spam and bot accounts represented 13.7% of accounts on Twitter. Soon thereafter, the company was hired by Musk's team to perform a second analysis, this time based on the "firehose" of data that Twitter had provided directly to the Tesla CEO, according to Brahmy.
That second analysis, which has since been referenced in court filings, concluded with roughly 80% confidence that spam and bot accounts represent 11% of Twitter's total user base. The finding offers a look at the hurdle Musk may face if he takes over Twitter after previously saying he wanted to "defeat the spam bots or die trying."
"The request was precise: Spam and bot accounts, tell us the number, tell us the methodologies and tell us the confidence level," Brahmy said. Brahmy said Cyabra did not work with the other data analysis firms used by Musk's team, and he declined to discuss whether or how much Cyabra was paid to perform the commissioned analysis.
The finding can't be compared directly to Twitter's own estimate that fake and spam accounts make up less than 5% of monetizable daily active users: Brahmy said Cyabra was unable to estimate the prevalence of bots as a percentage of active users rather than total Twitter users. He declined to elaborate on why, but it may be because Twitter's firehose of data used to conduct the analysis does not show which accounts are or aren't considered daily active users, and shows only accounts that actively tweet rather than those that, for example, only use the platform to read tweets.
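To see why the two figures are not interchangeable, consider a toy illustration in which the same set of flagged accounts is measured against two different bases. The ten accounts below, and the `flagged` and `mdau` labels attached to them, are invented for illustration and do not reflect data from Twitter or Cyabra.

```python
# Hypothetical toy data: each tuple is (flagged_as_bot, counts_as_mdau).
# None of these values come from Twitter or Cyabra.
accounts = (
    [(False, True)] * 5      # genuine accounts that count as daily active
    + [(False, False)] * 3   # genuine accounts that are not daily active
    + [(True, False)] * 1    # flagged account outside the daily-active base
    + [(True, True)] * 1     # flagged account inside the daily-active base
)


def prevalence(rows):
    """Share of the given accounts that are flagged as bots."""
    rows = list(rows)
    return sum(1 for flagged, _ in rows if flagged) / len(rows)


# Base 1: every account in the sample.
share_of_all = prevalence(accounts)

# Base 2: only accounts that count as monetizable daily active users.
share_of_mdau = prevalence(row for row in accounts if row[1])

print(f"Share of all accounts flagged: {share_of_all:.1%}")
print(f"Share of mDAU flagged: {share_of_mdau:.1%}")
```

In this made-up sample the two percentages differ even though the flagged accounts are identical, which is the reason a share of total users cannot simply be set against a share of daily active users.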
For its part, Twitter's disclosures about spam and fake accounts note that its calculation relies on "significant judgment" and the company has not made claims about how the prevalence of such accounts on its platform compares to competitors.
Factors built into Cyabra's algorithm used to measure how likely an account is to be fake or spam include the members in that account's social network and where they're located; what the account talks about, how frequently and at what hours; whether the account uses one or multiple languages and how fluently it uses those languages; and the type of engagement the account's content tends to generate, among many others.
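Cyabra has not published its model, so the following is only a rough sketch of how factors of this kind could be combined into a single likelihood score. It is not the company's method: the field names, the hand-picked weights and the logistic formula are all assumptions made for illustration, and a real system would learn its parameters from labeled data.

```python
import math
from dataclasses import dataclass


@dataclass
class AccountSignals:
    """Hypothetical stand-ins for the kinds of factors described above."""
    network_geo_spread: float  # 0..1, how scattered the account's connections are
    posts_per_day: float       # average posting frequency
    odd_hour_share: float      # 0..1, share of posts made at unusual hours
    language_count: int        # number of languages the account posts in
    fluency: float             # 0..1, higher means more fluent writing
    engagement_rate: float     # 0..1, share of posts that draw replies or likes


# Illustrative weights chosen by hand, not learned from any data.
BIAS = -3.0
W_GEO, W_POSTS, W_HOURS, W_LANG, W_FLUENCY, W_ENGAGE = 1.5, 0.02, 2.0, 0.3, 1.5, 1.0


def fake_likelihood(a: AccountSignals) -> float:
    """Combine the signals into a score between 0 and 1 (higher = more suspicious)."""
    z = (
        BIAS
        + W_GEO * a.network_geo_spread
        + W_POSTS * min(a.posts_per_day, 100.0)  # cap so one signal cannot dominate
        + W_HOURS * a.odd_hour_share
        + W_LANG * max(a.language_count - 1, 0)
        + W_FLUENCY * (1.0 - a.fluency)
        + W_ENGAGE * (1.0 - a.engagement_rate)
    )
    return 1.0 / (1.0 + math.exp(-z))  # logistic squashing


# Two made-up accounts for comparison.
ordinary = AccountSignals(0.2, 3, 0.1, 1, 0.9, 0.4)
suspicious = AccountSignals(0.9, 80, 0.7, 3, 0.3, 0.02)

print(f"ordinary:   {fake_likelihood(ordinary):.2f}")
print(f"suspicious: {fake_likelihood(suspicious):.2f}")
```

A score like this is a probability-style estimate rather than a verdict: a real person who posts erratically, at odd hours and with frequent errors could still land on the suspicious end of the scale.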