August 30, 2023

Global privacy regulators say data protection laws apply to data scraping


Remember when ChatGPT hit the scene last year, and it suddenly felt like AI was the only issue anyone cared about? Everyone and their mother was posting about it. You couldn’t throw a quarter without hitting a breakout session on AI at the IAPP’s most recent Global Privacy Summit. The problem: The U.S. (and everyone else) lacks a law regulating AI. As usual, the tech outpaces the rules.

Even without a law specifically regulating AI, privacy regulators say data protection laws apply to some of the practices employed in AI systems.

The Global Privacy Assembly recently issued a statement warning social media platforms to protect users’ public posts from scraping, as TechCrunch reports. The group reminded companies that, in most jurisdictions, personal information publicly available on the internet “is subject to data protection and privacy laws,” adding that websites hosting such information have “data protection obligations with respect to third-party scraping from their sites,” even when that information is publicly accessible. In fact, mass scraping of that information “can constitute a reportable data breach in many jurisdictions,” the GPA said.

Signatories include the U.K. ICO, Canada’s OPC, Hong Kong’s PCPD, Switzerland, Norway, New Zealand, Argentina, and Mexico, among others.

Here’s why we’re talking about this: AI systems are trained on vast amounts of data. To get the bulk they need, these systems scrape publicly available data from across the internet. Obviously, depending on the site, some of the data hoovered up is going to contain personal information. Think about an AI model training on data from LinkedIn, for example.

Not only does scraping personal data pose a risk in itself, but there are also financial gains to be made by selling that data to third parties. In some cases, scraped data could be used for targeted advertising, or for more nefarious ends like profiling and surveilling people, or political intelligence gathering.

The GPA said that while no single safeguard will protect users from scraping, there is a combination of steps companies can take to mitigate the risk.

They are:

  • Designating someone within your organization to implement controls against scraping.
  • Monitoring how “quickly and aggressively” a new account starts looking for other users.
  • Implementing steps to detect scrapers by identifying patterns in “bot” activity. For example, if an IP address is accessing your platform from multiple locations within a short timespan, that should raise red flags.
  • Taking legal action when data scraping is suspected or confirmed.
  • Notifying users and privacy regulators in jurisdictions where data scraping constitutes a breach.
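The bot-pattern detection step above can be sketched in code. This is a minimal illustration, not anything from the GPA statement: the log format, function names, and thresholds are all hypothetical, and real platforms would use far more sophisticated signals.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def flag_suspected_scrapers(events, max_lookups_per_hour=100, max_regions_per_ip=3):
    """Flag accounts and IPs matching the bot patterns described above.

    `events` is a list of hypothetical access-log records:
    (account_id, ip_address, geo_region, timestamp).

    - An account performing an unusually high number of profile lookups
      within an hour is flagged ("quick and aggressive" lookup activity).
    - An IP address seen accessing the platform from several geographic
      regions is flagged (a simplification of the "multiple locations
      within a short timespan" pattern).
    Thresholds are illustrative, not recommendations.
    """
    lookups = defaultdict(list)   # account_id -> timestamps in the last hour
    regions = defaultdict(set)    # ip_address -> regions seen
    flagged_accounts, flagged_ips = set(), set()

    for account_id, ip, region, ts in sorted(events, key=lambda e: e[3]):
        # Keep a sliding one-hour window of this account's lookups.
        window_start = ts - timedelta(hours=1)
        lookups[account_id] = [t for t in lookups[account_id] if t >= window_start]
        lookups[account_id].append(ts)
        if len(lookups[account_id]) > max_lookups_per_hour:
            flagged_accounts.add(account_id)

        # Track how many distinct regions each IP has appeared from.
        regions[ip].add(region)
        if len(regions[ip]) > max_regions_per_ip:
            flagged_ips.add(ip)

    return flagged_accounts, flagged_ips
```

In practice the output of a heuristic like this would feed a review queue rather than trigger automatic blocking, since legitimate users on VPNs or mobile networks can trip location-based rules.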

Even before ChatGPT hit the scene, you may recall a high-profile example of data scraping enforcement in hiQ v. LinkedIn. It took six years to settle the case, but here’s what happened: hiQ, which is now defunct, used to depend on public LinkedIn profiles. LinkedIn found out hiQ was scraping its data, sent a cease-and-desist, and made technical changes to its site preventing hiQ from continuing to scrape. hiQ sued LinkedIn, claiming the changes amounted to unfair competition.

In the end, LinkedIn pulled out a win when the courts decided it could nail hiQ for breach of contract based on LinkedIn’s user agreement, as well as for violating the Computer Fraud and Abuse Act, among other claims. The two companies agreed to a settlement.

The point is: Companies are already fighting these battles in the wild. And ChatGPT and similar AI tools only exacerbate the problem. Based on the Global Privacy Assembly’s warnings, it wouldn’t surprise many to see enforcement actions based on jurisdictional privacy laws hitting newspapers near you in the near term.

For more, check out the GPA’s statement here.