Understanding copyright and privacy issues related to data scraping
Generative AI technology is here to stay, which is why it’s important for organizations to understand how they can properly safeguard information online to avoid unlawful data scraping.

What is Data Scraping?
The evolution of AI has made data scraping more prevalent than ever before. Data scraping involves using software to compile and extract data from online sources and then using that data for other purposes, including generating responses in AI chatbots or providing information online in other formats.
Data scraping software is used widely, but the Office of the Privacy Commissioner of Canada recently warned organizations about this technology. It reminded businesses that they must work to protect the data they collect and make available. In the coming months, Canadian Courts are expected to rule on data scraping in two pending cases, and those decisions are likely to lead to further warnings for organizations responsible for protecting data online.
Office of the Privacy Commissioner of Canada Joint Statement
On October 28, 2024, the Office of the Privacy Commissioner of Canada (OPC) issued its “Concluding Joint Statement on Data Scraping and the Protection of Privacy.” It emphasized the importance of organizations protecting data from unlawful scraping and establishing safeguards to protect their information and comply with privacy legislation.
The Joint Statement focused on the data scraping of personal information and noted some key considerations for organizations, including:
- Designating specific roles within the organization to implement controls for data protection;
- Using tools to limit website access frequency;
- Monitoring and detecting unusually aggressive accounts looking for other users’ information;
- Identifying patterns in “bot” activity;
- Taking legal action where data scraping is detected; and
- Notifying affected individuals and privacy regulators where data scraping has led to a data breach.
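Two of the technical considerations above, limiting access frequency and identifying patterns in bot activity, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not something the Joint Statement prescribes: the class name `SlidingWindowLimiter`, its methods, and the thresholds are hypothetical, and a production website would more likely rely on its web server, content delivery network, or a dedicated bot-management service.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Hypothetical sketch: allow at most `max_requests` per client
    within any `window_seconds` interval, and count refusals."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # client_id -> timestamps of requests that were allowed
        self._hits = defaultdict(deque)
        # client_id -> number of requests that were refused
        self._denied = defaultdict(int)

    def allow(self, client_id: str, now: float) -> bool:
        """Return True if the request at time `now` (in seconds) may proceed."""
        hits = self._hits[client_id]
        # Discard timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) < self.max_requests:
            hits.append(now)
            return True
        self._denied[client_id] += 1
        return False

    def suspected_bots(self, min_denials: int) -> list[str]:
        """Clients refused at least `min_denials` times, for human review."""
        return sorted(c for c, n in self._denied.items() if n >= min_denials)
```

In this sketch, a site would call `allow(client_id, time.monotonic())` on each incoming request and periodically review `suspected_bots(...)`. How a client is identified (account, IP address, or API key) and what happens to a flagged client are policy decisions the code leaves open.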
The Joint Statement also emphasizes the importance of enabling website users to make informed decisions about how they use the platform and what information they share, which includes organizations being transparent about when they permit lawful data scraping. Further, the Joint Statement reminds organizations that even if personal information is publicly accessible, it is still subject to data protection and privacy legislation.
Canadian Courts expected to comment on data scraping
Websites that hold original content and graphics can be protected by copyright legislation, and as a result, data scraping may be a direct violation of the copyright holder’s rights. Canadian Courts have previously ruled that data scraping is not permissible, particularly where the scraping is a result of bypassing and circumventing protective technological measures used on the website containing the data. In particular, data scraping that leads to the unauthorized copying, downloading and distributing of third-party content without the consent of the third party that holds the copyright is unlawful.
Despite this guidance, the advancement of AI has led many organizations to continue to scrape data available online and Canadian Courts are expected to comment on the legality of these practices in two upcoming decisions.
The first claim was filed on November 4, 2024 with the Supreme Court of British Columbia by the Canadian Legal Information Institute (“CanLII”). The claim was against 1345750 BC Ltd. et al, including a company called Caseway AI Legal Ltd. (the “Defendants”). CanLII allows users to access its database of court decisions, legislation, and secondary sources for free, subject to users agreeing to its terms of use, which prohibit bulk or systematic downloading of CanLII works.
The claim alleges the Defendants violated CanLII’s terms of use by downloading content and data scraping CanLII’s website without a licence or permission. The Defendants, namely Caseway, utilize an AI platform to compile Canadian court decisions which are accessible to users who pay a monthly subscription fee. The Supreme Court of British Columbia is set to decide whether Caseway is acting unlawfully, and particularly, whether there has been a copyright infringement and resulting unjust enrichment relating to the downloading and reproducing of CanLII’s website content.
The second claim was filed shortly after the first, on November 29, 2024. This claim was brought by numerous Canadian news companies in the Ontario Superior Court of Justice. It alleges that OpenAI, Inc. and its related companies scraped copyrighted news material from news websites to train their ChatGPT service and to earn a profit. The Plaintiffs further allege OpenAI circumvented their technical measures and violated their terms and conditions, which state that the information available on their news websites cannot be used for a commercial purpose or reproduced without express authorization.
In response to the claim, OpenAI has released public statements arguing that it is fair for it to use publicly available information to train its AI services. A subset of the Plaintiffs responded to the public statement, indicating they want to protect Canadian journalism and emphasizing the importance of the fact-checked and reliable news they provide the public. The Ontario Superior Court of Justice will rule on whether there has been a copyright infringement. This will provide valuable information to other organizations that share information online, and it may help establish the ways they can protect their copyrighted content.
With the evolution of AI services, the Court’s comments on large-scale data scraping will be important and of interest to organizations.
Companies remain responsible for information provided by AI chatbots
There has also been some guidance for organizations whose AI tools draw on information from their own websites. In British Columbia, the Civil Resolution Tribunal found Air Canada liable for misinformation that its AI chatbot provided to one of the airline’s customers.
In that case, the customer wanted to purchase a plane ticket to fly to Ontario to be with family after his grandmother passed away. He started a conversation with Air Canada’s AI chatbot, which advised him that a discounted bereavement fare was available if he submitted his ticket within 90 days of travel. That response was inaccurate: under Air Canada’s policy, bereavement rates had to be arranged before travel was completed and were not retroactive. Air Canada refused to provide the discount when the customer attempted to submit his ticket after his travel.
Although Air Canada argued the correct information could have been found elsewhere on their website, the Tribunal decided the chatbot was part of Air Canada’s website, and Air Canada was responsible for ensuring all the information on their website was accurate. This included website pages and information provided through the AI chatbot. There was also no obligation for the customer to search multiple pages on the website to find the correct information.
The Tribunal rejected Air Canada’s argument that the AI chatbot was a separate legal entity or was somehow distinct from Air Canada. It found that Air Canada had failed to take reasonable steps to ensure its AI chatbot was providing accurate information and, as a result, had negligently misrepresented information to the customer.
Generative AI technology is here to stay, and it is important for organizations to understand how they can properly safeguard information online to avoid unlawful data scraping. Further, organizations need to understand the limits of AI tools, including the continued responsibility that organizations have over content on their website. This responsibility holds true, regardless of whether the material is contained on a static website page or generated by an AI chatbot that gathered information from the company’s website.
Note: This article is of a general nature only and is not exhaustive of all possible legal rights or remedies. In addition, laws may change over time and should be interpreted only in the context of particular circumstances such that these materials are not intended to be relied upon or taken as legal advice or opinion. Readers should consult a legal professional for specific advice in any particular situation.