X is the latest social media site letting 3rd parties use your data to train AI models | CBC News

Elon Musk’s X was already using your data to train its own artificial intelligence. Soon, it’ll let other companies do the same.

Starting Nov. 15, the social media site formerly known as Twitter will share user data — including posts, likes, bookmarks and reposts — with third-party platforms that may use the information to train AI models.

The company updated its privacy policy on Wednesday to detail the changes. When the policy takes effect, users are automatically opted in until they opt out.

The updated policy reads: “If you do not opt out, in some instances the recipients of the information may use it for their own independent purposes in addition to those stated in X’s Privacy Policy, including, for example, to train their artificial intelligence models, whether generative or otherwise.”

As user data becomes an increasingly valuable resource, social media platforms are sitting on a goldmine, and selling that information to artificial intelligence companies is a lucrative business.

“This is the latest arms race. Everyone is working towards AI supremacy,” said Ritesh Kotak, a cybersecurity and technology analyst based in Toronto.

“The more data sets you have, the more people [that] data is collected from, the more accurate your model is going to be.”

Why sites such as Reddit are selling data to AI firms

The Reddit logo is seen in this illustration taken on Nov. 7, 2022. (Dado Ruvic/Illustration/Reuters)

The change comes just a few months after X quietly shifted its privacy policy, giving itself permission to train the company’s Grok chatbot on user data.

But that led to an investigation by the European Union’s privacy regulator, which ended with X agreeing to stop collecting user data from that region for the purpose of training Grok.

LinkedIn has also given itself permission to train its artificial intelligence models on user data, and Meta used public Instagram and Facebook posts to train its own AI virtual assistant.

Like X, other social platforms have reportedly signed content licensing deals with AI giants, bringing in a new stream of revenue amid tough competition for advertising dollars, noted Ajay Shrestha, a computer science professor at Vancouver Island University.

“The traditional processes that they have used [to] generate revenue, through advertising or through subscription methods, are not working well,” said Shrestha.

The deals include:

Reddit reportedly closed one such agreement with Google this year, with Reuters reporting that the deal is worth $60 million US per year.
Stack Overflow, an online community for developers, started charging AI companies for scraping its data to train their bots last year.
Tumblr and WordPress reportedly struck a deal with generative AI companies Midjourney and OpenAI to sell user data to train their AI tools.

Some news publishers and stock image companies have made similar deals — Shutterstock’s licensing business generated more than $100 million US last year, for example. Many others have sued AI giants for scraping their content without permission, or warned them against doing so.

And what’s in it for the big tech companies? Social media posts are a valuable form of data because they can convey emotion, reflecting how people actually speak and think, according to Kotak.

“Social media posts may pose very little quality content from a technical perspective or from what’s going on in the world, but [they are] rich in sentimental analysis,” he said.

Can you opt out?

As of Friday, X didn’t appear to have updated its settings with an option to opt out of the change in advance of the Nov. 15 start date. CBC News has reached out to the company.

“As a user, you may just not want your posts or personal information being used to train algorithms that the rest of the world is going to be able to leverage,” said Kotak.

“These platforms literally making it by default that your data is going to be used to train these algorithms means that you no longer have a choice in the matter. Unless you go in and you prohibit that from happening.”

Normally, users can opt out of such changes by going into settings, then privacy and safety, and, under the data sharing and personalization heading, toggling the “data sharing with business partners” option.

But opt-outs aren’t always cut and dried, Kotak said, noting that an AI model can’t necessarily unlearn the data it’s been fed if a user opts out after the training has started.

“There’s no way of reversing that and having any of the data that you’ve already put out essentially being taken out of the learning model as well,” he said.

“If you’re not paying for the product, you are the product. And in this case, the data is the product.”
