Canadian Privacy Regulators Publish Findings and Guidance on OpenAI Privacy Compliance

Following a multi-year joint investigation, federal and provincial privacy regulators recently published their findings with respect to OpenAI’s collection and use of personal information to train the early models underlying its artificial intelligence (AI) chatbot, ChatGPT. The regulators identified contraventions of Canadian privacy legislation and made recommendations for the privacy-protective development and deployment of future generative AI technology in Canada.

The regulators’ investigation findings were followed shortly by the publication of the Office of the Privacy Commissioner of Canada’s (OPC) Annual Report, “Championing Privacy in the Age of AI”, which highlighted the impact of AI on the OPC’s increasingly active enforcement posture and on the surge in complaints filed by individuals under the Personal Information Protection and Electronic Documents Act (PIPEDA). On the same date, the federal government announced its anticipated national AI strategy, “AI for All”. While the strategy was not accompanied by any immediate legislative or regulatory proposals, it commits to modernizing privacy and online safety laws, which can be expected to address personal information handling, children’s privacy, and the protection of vulnerable groups from online violence and algorithmic bias.

Within this context, the regulators’ findings and recommendations in the OpenAI investigation carry significant implications for organizations developing or deploying AI platforms in Canada and signal the considerable regulatory (and broader) interest in achieving a balance between technological innovation and protection of personal privacy. In particular, the regulators identified the following practices as non-compliant, meaning that similar conduct may attract regulatory scrutiny moving forward:

  • Collecting vast amounts of data through web scraping without adequate safeguards to remove personal information;
  • Failing to obtain valid consent from affected individuals and failing to provide users with accessible and comprehensible mechanisms to access, correct, and delete their personal information;
  • Being insufficiently transparent regarding how personal information is used and obtained; and
  • Providing insufficient notices about potential inaccuracies in AI-generated responses and how personal information may be collected.

Background

ChatGPT, which allows users to submit prompts and receive AI-generated responses using OpenAI’s large language models (LLMs), was launched on November 30, 2022. According to OpenAI, the data used to train its LLMs is collected from four primary sources: (1) publicly accessible Internet sources, including via OpenAI’s own and third-party web crawling bots; (2) information licensed from third parties, including media outlets and specialized knowledge providers; (3) user interactions with ChatGPT, including any personal information the user may include; and (4) conversations generated by human AI trainers. OpenAI acknowledged that its training datasets contain personal information, though it characterized this as incidental to its broader data collection goals.

In May 2023, the OPC, together with the privacy regulators of British Columbia, Alberta and Québec (collectively, the “Regulators”), launched an investigation into OpenAI following a PIPEDA complaint that the company had collected, used, and disclosed personal information without consent.

The Regulators published their findings on May 6, 2026, examining how OpenAI sourced training data and developed its GPT-3.5 and GPT-4 models, and whether those practices complied with the federal PIPEDA and provincial privacy laws in B.C., Alberta, and Québec. The Regulators found they had jurisdiction to investigate OpenAI because PIPEDA applies outside of Canada where a “real and substantial connection” to Canada exists, which the Regulators found in this case given that, among other things, OpenAI offers its services in Canada and collects personal information from users in Canada.

Key Findings

The Regulators concluded that OpenAI’s development and deployment of ChatGPT contravened federal and provincial privacy legislation for various reasons. The Regulators also noted instances where OpenAI had taken subsequent steps to mitigate harm to privacy interests. The discussion below focuses on the OPC’s findings with respect to PIPEDA, but in some cases the findings of the provincial regulators with respect to provincial statutes differed, and these should be specifically reviewed by affected organizations and individuals.

Overcollection: Although OpenAI’s purposes for developing ChatGPT (including providing benefits such as assistance with everyday tasks, conducting scientific research, and inspiring creativity) were appropriate collection purposes, the manner, scale, and nature of OpenAI’s collection and use of personal information were found to be overbroad and, thus, not necessary and proportional to these purposes. In particular, OpenAI gathered “vast amounts” of personal information from publicly scraped content, licensed datasets, and user interactions without adequate safeguards to prevent that personal information from being used to train its LLMs. It also did not screen out websites likely to contain sensitive information, such as social media websites or websites aimed at children.

OpenAI Mitigation: The OPC noted OpenAI’s disclosure that it had recently developed, and is now using, a tool that can detect and mask identifying information about private individuals in publicly accessible Internet data and licensed datasets used to pre-train OpenAI’s models. The tool can also redact personal identifiers from users’ interactions with ChatGPT. The OPC accepted that this new tool can significantly reduce the risk that the personal information of private individuals, and sensitive information more specifically, will be included in the datasets used to train OpenAI’s future models, and will also reduce the risk of such information being disclosed in model outputs.

Lack of valid consent: The Regulators found that OpenAI did not obtain valid consent for its collection, use, and disclosure of personal information. After noting that consent is a “core requirement” of Canadian privacy laws, the Regulators clarified that OpenAI’s stated reliance on “implied consent” for scraping data from publicly available websites was inadequate, particularly given individuals’ lack of familiarity with generative AI at the time their information was posted online, and studies showing that most individuals either do not read, or have difficulty understanding, websites’ terms of service and privacy policies. The Regulators also determined that OpenAI’s inclusion of a notice informing users that their prompts may be used to train the LLMs – which appeared on a single occasion the first time the user entered a prompt – was insufficient. The Regulators concluded that express consent should have been obtained.

OpenAI Mitigation: The Regulators noted that OpenAI’s new tool for masking personal information significantly reduced the risk that personal information of private individuals would be included in training datasets moving forward, and also acknowledged how rapidly the context, and the general public’s familiarity with generative AI, have evolved. Recognizing the Supreme Court of Canada’s guidance that statutes should be capable of “evolv[ing] with technology”, the OPC accepted that, where the risk to privacy is significantly and meaningfully mitigated in the ways OpenAI has committed to, OpenAI may rely on implied consent going forward.

Lack of openness and transparency: The Regulators found that OpenAI failed to meet openness and transparency statutory requirements and the expectations outlined in the Generative AI Principles published by various Canadian privacy authorities in December 2023. This finding was despite OpenAI maintaining a Privacy Policy and Terms of Use, providing contextual notices and prompts during registration, and managing a Help Center and Research index. While these communications were readily accessible and “generally written in plain language”, there was initially no French version and key information was incomplete, unclear, or missing altogether. For example, OpenAI failed to adequately inform individuals that content posted on blogs, discussion forums, or social media could be collected for AI model training purposes.

OpenAI Mitigation: The Regulators recommended that OpenAI publish a comprehensive and sufficiently detailed overview of the main categories of content it uses to pretrain and fine-tune its models in a form that is generally understandable. Following further discussions with the Regulators, OpenAI committed to expanding its “How ChatGPT and our foundation models are developed” article to include more plain-language explanation about the sources of information used to train its models.

Factual inaccuracies: The Regulators found that OpenAI failed to adequately inform users about potential inaccuracies in ChatGPT responses (including in responses to questions about specific individuals). Furthermore, sources were either not cited at all (GPT-3.5) or cited inconsistently (GPT-4). These deficiencies were especially concerning given the well-documented tendency of LLMs to produce plausible-sounding but factually incorrect information (i.e., “hallucinations”).

OpenAI Mitigation: OpenAI committed to various accuracy improvements, including a web search function allowing users to verify sources and the publication of a blog post regarding potential accuracy limitations. OpenAI also demonstrated that its GPT-5 model produces considerably fewer hallucinations.

Access, correction and deletion: OpenAI did not establish appropriate retention and disposal policies for personal information, nor did it provide users with an accessible and effective mechanism to access, correct, and delete personal data (the latter of which OpenAI maintains is technically complex). The Regulators found that the access request process was not sufficiently clear or user-friendly, and data exports provided to users lacked accessibility.

OpenAI Mitigation: OpenAI indicated that it has improved the auto-response email that users receive when they submit an access request to OpenAI by email and has also made its data exports more user-friendly. The OPC considered this to be an adequate response in light of the other mitigation measures taken by OpenAI, the pragmatic and flexible approach required in interpreting PIPEDA, and the need to balance the privacy rights of individuals against businesses’ need to use personal information for appropriate purposes.

Lack of accountability: Finally, the Regulators determined that OpenAI launched ChatGPT without adequately addressing known privacy risks. These deficiencies reflected a failure to discharge its accountability obligations, thereby exposing individuals to risks of harm, including privacy breaches, inaccurate information, and discrimination based on the information collected about them.

Overall: While OpenAI generally disagreed with the Regulators’ findings, asserting that it was compliant with the applicable privacy statutes in most respects, it nonetheless engaged extensively with the Regulators. It has implemented various mitigation measures, including those discussed above, and agreed to provide quarterly compliance reports to the Regulators.

Recommendations

The Regulators made several recommendations to allow for the development and deployment of AI in a sufficiently privacy-protected manner. While these recommendations were directed to OpenAI, they should be referenced by any company engaged in the collection and use of personal information for LLM training purposes.

  • Develop a plan for limiting the personal information used to train models;
  • Ensure users are adequately informed of the consequences of disclosing sensitive information when interacting with ChatGPT;
  • Develop a plan for implementation of measures to ensure (i) valid consent is obtained from individuals whose personal information is collected, used, and disclosed; (ii) those individuals are clearly informed and able to access and correct their personal information; and (iii) users are aware of potential inaccuracies in ChatGPT responses;
  • Provide the public with plain language, comprehensive, and accessible information regarding training sources, model functionality, and existing limitations on model explainability;
  • Ensure the “Export Data” tool for user requests of personal information provides that data in an accessible and user-friendly format;
  • Develop a formal retention and deletion policy for the personal information being collected; and
  • Implement accountability measures, including updated governance models, policies, practices, and employee training.

Takeaways for Businesses that Develop or Use AI Models

The Regulators’ joint investigation signals that Canadian privacy regulators are closely examining the development and implementation of AI systems, with a view to ensuring Canadians can safely benefit from these technologies. As regulatory scrutiny continues to increase in tandem with advancements in AI and the implementation of the federal government’s AI for All strategy, businesses must prioritize the protection of personal information and implement appropriate safeguards in the training, deployment, and ongoing evolution of their AI platforms. Moreover, this takeaway is not limited to businesses that are themselves developing AI platforms; it extends to businesses integrating AI models, to the extent they may be disclosing their employees’ or customers’ personal information.

For further information, please contact the authors or any member of our Privacy and Data Protection or Technology teams.


This update is for information purposes only. It is not to be relied on as legal advice. Should you require legal advice, we would be pleased to discuss the matters raised in this update in the context of your particular circumstances.