Privacy commissioners say OpenAI broke the rules by scraping Canadian data to develop and train ChatGPT.
Commissioners from four of Canada’s privacy watchdogs have found that OpenAI violated Canadian privacy laws while developing and training its early models of ChatGPT.
At a news conference on Wednesday, Philippe Dufresne, Canada’s privacy commissioner, was joined by his provincial counterparts from British Columbia, Alberta, and Québec to announce the findings of a joint investigation into the tech giant. The investigation examined how OpenAI sourced training data for its early GPT-3.5 and GPT-4 models, which included scraped content from publicly accessible internet sources like social media and blog posts, licensed third-party sources like media outlets and stock image vendors, and user interactions with ChatGPT.
Leveraging “extensive written representations” from OpenAI’s legal counsel, interviews with OpenAI employees, internal testing on ChatGPT by the Office of the Privacy Commissioner (OPC), and publicly accessible sources like studies published by OpenAI and other AI experts, regulators focused on whether OpenAI had followed the principles of federal and provincial privacy legislation, like consent, transparency, and data accuracy, when collecting data.
Launched in 2023 on the heels of a complaint alleging OpenAI had collected, used, and disclosed personal information without consent, the investigation began well before OpenAI came under scrutiny in Canada following a deadly mass shooting in Tumbler Ridge, BC. Families of the victims of that shooting are taking OpenAI to court; the company had banned the shooter’s account for “disturbing content,” yet did not tip off law enforcement about any potential dangers.
Following the Tumbler Ridge shooting, Canada’s innovation minister, Evan Solomon, spoke with OpenAI CEO Sam Altman, saying the tech mogul expressed “horror and responsibility” regarding the shooting. After their conversation, OpenAI agreed to strengthen its “law enforcement referral criteria” and include Canadian mental health and law experts in its safety office—where the company assesses threats and whether or not to inform police.
No consent to use personal data
At Wednesday’s press conference, Dufresne noted that all four regulators found OpenAI had violated various federal and provincial privacy laws, including the federal Personal Information Protection and Electronic Documents Act (PIPEDA), and its provincial counterparts in Alberta, BC, and Québec.
PIPEDA regulates how businesses collect, use, or disclose personal information during commercial activity. It operates on several “fair information principles” that include obtaining consent for data collection, among other stipulations. Parallel provincial legislation, like Alberta’s and BC’s Personal Information Protection Acts (PIPA) and Québec’s Law 25, mandates similar requirements.
Among their key findings, regulators concluded that OpenAI gathered “vast amounts of personal information” for use in training data. That data could potentially include sensitive information and details like health conditions, political views, or information about children.
The regulators also found the tech company did not obtain valid consent for the collection of personal information—a key plank under PIPEDA and other Canadian privacy legislation—and that there was not adequate transparency, with many users unaware their data was collected and used to train OpenAI’s chatbot.
“Our investigation determined that the manner in which OpenAI initially collected personal information from publicly accessible websites and licensed third-party sources to train the models was overbroad and therefore inappropriate,” an overview of the investigation says. “We came to this determination considering the scale, nature, and varying levels of sensitivity of the personal information collected and used from those sources.”
The privacy watchdogs also found that OpenAI had not provided individuals with “an easily accessible and effective mechanism to access, correct, and delete their personal information,” and that it released ChatGPT without having fully addressed known privacy risks and without data-deletion rules.
OpenAI commits to changes
Dufresne said that throughout the investigation, OpenAI engaged in good faith and took measures to address the regulators’ concerns. As a result, the federal privacy office considers the investigation to be “conditionally resolved.” Québec’s Commission d’accès à l’information du Québec has labelled the investigation as conditionally resolved on several points, but unresolved on the issue of consent. British Columbia and Alberta’s findings label the investigation as unresolved under provincial PIPA requirements. Both provincial regulators noted OpenAI’s efforts to improve compliance.
OpenAI has committed to several measures to address the regulators’ concerns, including implementing a filtering tool to detect and mask personal information like names and phone numbers in publicly accessible datasets, enhancing its correction and deletion protocols, and implementing a formal retention policy governing personal information.
The company has also committed to several time-sensitive conditions, linked to the publication of the watchdogs’ report. They include:
Within three months, adding a notice to the signed-out web version of ChatGPT that tells users their chats may be reviewed and used to train models, and advising them not to share sensitive information.
Within six months, making it easier to understand and use the data exports that it provides to users who request their personal information. The company will also better explain the avenues available to users who want to challenge the completeness, accuracy, or nature of the information provided.
Within six months, confirming to the privacy commissioners’ offices that it has implemented strong protections for future datasets that are retired and used only as historical references, ensuring they are not used for active model development, and regularly reviewing whether these datasets should still be kept.
Within six months, testing protective measures for the minor family members of public figures, who are themselves not public figures, to ensure that the models refuse requests for their name or date of birth.
The company will also provide quarterly reports to the Office of the Privacy Commissioner and provincial partners until these commitments have been met.
It is unclear at this time what the tech company must do to resolve Alberta and British Columbia’s complaints.
BetaKit reached out to OpenAI for comment on the report’s findings, but it did not respond to our request by press time.
Canada’s privacy laws must change
While much of the announcement focused on OpenAI, regulators also stressed that Canadian privacy laws need significant changes to recognize the realities of a rapidly changing technological landscape.
Canada’s privacy legislation hasn’t been meaningfully updated in more than 40 years; Ottawa announced this spring that it has launched a review of the Privacy Act with the intent of modernizing it. Canadians are also awaiting the launch of the country’s AI strategy, which was initially slated for late 2025.
“This investigation also further reinforces the need to modernize Canada’s privacy laws for the digital age,” Dufresne said. “While current laws apply to AI, updated laws would help further support the safe deployment of new technologies to protect Canadians’ fundamental right to privacy.”
Specifically, commissioners cited the challenges that AI, and the internet broadly, pose in meeting consent requirements as currently legislated. Michael Harvey, the BC privacy commissioner, said he has written to BC’s minister of citizen services to encourage modernization of its legislation.
“We’re left at an impasse: on one hand, AI applications have potentially transformative benefits, but in certain cases, such as the one before us, applications are developed without adequate privacy,” he said. “On the other hand, those privacy laws were written for a different era and are strained to the brink. Both companies and the law have to change.”
Alberta commissioner Diane McLeod echoed those sentiments, saying that legislation needed to confront the realities of the digital age.
“The methods companies are using—scraping data from publicly accessible websites—could never be carried out in ways that would meet the consent requirements of [Alberta’s] PIPA,” she said. “My office has advocated for some time that changes be made to PIPA to allow for tech and innovation but still provide privacy safeguards.
“Consent-based protections, for example, may no longer be feasible in an age where technology companies have easy access to so much information about individuals on the internet. Other options must be found,” she added.

In a statement issued Wednesday afternoon, Solomon mirrored the regulators’ comments, saying the report’s findings underscored “the importance of protecting Canadians’ personal information in the age of AI.” He added that modernizing Canada’s privacy laws “remains a priority” for the federal government.
BetaKit’s Prairies reporting is funded in part by YEGAF, a not-for-profit dedicated to amplifying business stories in Alberta.
Feature image courtesy TechCrunch. Licensed under Creative Commons Attribution 2.0 Generic (CC BY 2.0).