Alberta’s privacy commissioner says OpenAI failed to meet provincial consent requirements when it developed some ChatGPT models using personal information scraped from publicly accessible websites.
The finding comes from a joint investigation by the privacy authorities of Alberta, British Columbia and Quebec and their federal counterpart into whether OpenAI’s handling of personal information through ChatGPT complied with federal and provincial private-sector privacy laws in Canada.
The investigation focused on GPT-3.5 and GPT-4, the models that powered the chatbot when the investigation began in 2023.
Alberta’s Office of the Information and Privacy Commissioner said both Alberta and B.C. privacy regulators found OpenAI’s models were based on data scraped from publicly accessible websites, for which OpenAI “has not obtained, and cannot obtain, consent” under Alberta and B.C. private-sector privacy laws.
Scraped data and consent
The joint overview says more than 99 per cent of the information used by OpenAI to pre-train the models came from crawling, or scraping, publicly accessible sources, with the rest coming from third-party licensed datasets.
Federal, Alberta and B.C. privacy laws contain consent exceptions for some publicly available information, but the joint overview says the legal definition of “publicly available” is different from a common understanding of information that can be accessed online.
OpenAI took the position that it could rely on implied consent to collect and use information from those sources to train its models.
The federal, Alberta and B.C. offices rejected that position, finding OpenAI failed to obtain valid consent, implied or otherwise, for collecting and using personal information from publicly accessible sources for model-training purposes.
“From the Alberta perspective, I want to note first that it is unfortunate and disappointing that technology companies have moved ahead so quickly with new developments and innovations, without first ensuring that they are adhering to privacy legislation,” said Alberta Information and Privacy Commissioner Diane McLeod.
“Our investigation found that OpenAI did not appear to turn its mind adequately to privacy compliance in its development and deployment of ChatGPT, which is very troubling. The first ChatGPT model was launched in 2022, nearly two decades after privacy law in Canada applied to the private sector.”
Investigation began in 2023
The joint investigation was triggered by an April 2023 complaint alleging collection, use and disclosure of personal information without consent.
It was announced the following month and examined whether OpenAI’s handling of personal information in Canada through ChatGPT complied with federal and provincial private-sector privacy laws.
The joint overview says sources of training data included social media and discussion forums, which can contain personal information about children, sensitive information such as political views or health conditions, and potentially inaccurate information such as opinions or false statements about others.
The privacy offices found OpenAI’s initial collection of personal information from publicly accessible websites and licensed third-party sources to train the models was overbroad and inappropriate, the joint overview says.
It says the offices also found OpenAI’s mitigation measures at the time were insufficient to limit the collection, use and disclosure of personal information to what was necessary and proportional for model training.
Alberta law
McLeod said the conclusions reached by each privacy office varied because each regulator investigated compliance with its own legislation.
“An important aspect of this investigation is that each of the four regulators investigated compliance with the specific legislation that they oversee,” McLeod said.
“As a result, the conclusions reached by each office varied due to the differences in the laws that they enforce. In the case of our office, we were investigating whether the development of ChatGPT was compliant with Alberta’s Personal Information Protection Act or PIPA, which governs private organizations such as corporations, unincorporated associations, professional regulatory organizations, trade unions and partnerships.”
OpenAI response
The joint overview says OpenAI generally disagreed with the findings, asserting it was compliant with the laws in most respects through a combination of existing practices and communications.
It says OpenAI nonetheless engaged with the privacy offices and said it had implemented or committed to measures including new filtering tools, formal retention policies, data-export improvements, additional privacy notices and more plain-language information about the sources used to train its models.
The Alberta and B.C. offices said the measures are not sufficient to meet the foundational consent requirement under their laws.
Despite that finding, Alberta and B.C. joined the federal and Quebec privacy authorities in making joint recommendations and monitoring implementation of OpenAI’s commitments.
“The privacy laws we currently have in Canada were drafted and enacted during a time when today’s incredible advancements in technologies, such as AI, would have strained believability,” McLeod said.
“Legislators now face the challenge of modernizing privacy laws in ways that will enable AI companies to continue to develop these innovative technologies, but only in a manner that safeguards privacy, reduces potential harms to citizens, and requires accountability and transparency. My hope is that OpenAI has learned from this investigation and that other technology companies that are developing and deploying AI or other novel technologies also learn from this report that privacy must be a top priority and cannot be an afterthought.”