IPR Issues In Machine Learning Datasets, Challenges, And Emerging Approaches: A Comparative Analysis Between The US, The EU, And India
- IJLLR Journal
- Mar 8
Stephin Sinu Oommen, LLM (Intellectual Property and Trade Law), School of Law, CHRIST (Deemed to be University), Bengaluru
ABSTRACT
Rapid advances in machine learning (ML) technologies have created complex intellectual property (IP) dilemmas concerning the ownership, use, and legal protection of the datasets used to train artificial intelligence (AI) systems. This paper addresses the intricate copyright and database protection issues surrounding ML datasets, including data scraping, copyrightability, and licensing regimes. The legal treatment of datasets varies across jurisdictions. This produces inconsistent layers of legal protection that together make it harder to provide appropriate incentives for innovation while protecting IP rights. The paper compares how existing legal regimes apply to ML datasets in three major jurisdictions: the United States, the European Union, and India. It draws on selected cases from these jurisdictions, the relevant EU instruments (the Database Directive and the Directive on Copyright in the Digital Single Market (CDSM)), India's Copyright Act, 1957, and the US fair use doctrine, particularly as it applies to text and data mining (TDM). On that basis, the paper identifies emerging approaches such as open datasets, ethical data sourcing, and possible avenues toward greater international legal harmonization.
The paper highlights the current legal gap, as illustrated by recent landmark cases: ANI v. OpenAI in India, Authors Guild v. OpenAI in the United States, and Kneschke v. LAION in Germany. It offers guidance toward a coherent legal framework that clarifies dataset ownership, usage rights, and licensing obligations in machine learning.
Keywords: Intellectual Property Rights, Artificial Intelligence Regulation, Machine Learning Datasets, Copyright Law
