The Real Value of Human Knowledge, Experience and Data

TLDR First

Redefining data ownership and individual sovereignty is crucial in addressing privacy concerns and ensuring control over personal data.
AI-generated inventions raise questions about IP ownership, since AI models have the capacity to generate outputs pseudo-autonomously.
Data providers contribute valuable knowledge and resources to AI models and should be included in ownership and economic considerations.
The Andersen v. Stability AI Ltd. lawsuit highlights the need to address IP ownership and infringement issues in generative AI.
Data collection methods and users’ responsibility in AI model misuse are key considerations in determining liability for infringement.
Blockchain technology can establish transparency and accountability in AI-generated IP by recording origin, ownership, and collaborations and enabling royalty distribution.

Introduction

US Congress. Always on the cutting edge of technology and innovation. In their recent Senate Judiciary hearing on the impact of AI on innovation and US competitiveness, they discussed a range of topics related to AI and intellectual property law and policy. However, they missed a crucial aspect of the conversation: the value of data providers to successful generative AI models and their inclusion in both AI-generated IP ownership and economics. Because, you know, the data just magically appears out of thin air, right? Who needs human knowledge and expertise when you have a fancy AI model? It’s not like the quality of the data used to train these models is critical to their success or anything.

The advent of generative AI is revolutionizing various industries, from art and music to research and education. However, this technological leap has brought forth numerous legal and ethical challenges regarding intellectual property ownership. There is an urgent demand for clearer regulation surrounding intellectual property in the age of generative AI. By examining existing laws and drawing from legal precedents, a comprehensive framework can be established to address the complexities of AI-generated IP.

Invention

The concept of invention in the age of artificial intelligence encompasses a range of dimensions and complexities. While AI models are powerful tools that require input and guidance from humans, they also possess the ability to contribute to the synthesis and generation of novel ideas and artworks. Specifically, we are witnessing a shift from invention originating within the human mind to silicon-based ideation. This dynamic raises important questions about the ownership and allocation of intellectual property (IP) rights for AI-generated creations.

Traditionally, determining IP ownership is relatively straightforward. The person(s) who provides the creative input or makes significant contributions to the development of an invention or artwork can typically be considered the owner of the resulting IP. However, when it comes to AI models, the issue becomes more intricate. AI models are not merely passive tools; they possess the capacity to autonomously generate outputs based on their training and programming. To go further, generative AI is not actually “creation” in a pure sense, but transformation based on input parameters. Therefore, it becomes essential to establish clear guidelines and frameworks to determine the ownership and rights of both human creators and AI models in the creative process.

In the United States, existing patent and copyright law offers some guidance for individuals seeking protection for AI-generated works. However, despite these legal provisions, there remains a level of regulatory uncertainty surrounding the ownership and protection of AI-generated inventions. This uncertainty highlights the need for the development of a comprehensive framework that takes into account the contributions of all stakeholders involved in the creation of AI-generated IP.

The quality and source of training data play a critical role in determining the effectiveness and ethical implications of AI models. Many AI models, particularly those involved in generative tasks such as creating images or text, are trained on a combination of open data and proprietary datasets. This raises significant concerns regarding the rights of data providers and the potential infringement of copyright- and trademark-protected works.

Data providers play a vital role in the development and training of AI models. They contribute valuable knowledge, expertise, and resources that serve as the foundation for these models. As AI models process and transform the input data and human contributions into something new, it becomes crucial to include data providers in the ownership and subsequent economic considerations surrounding AI-generated IP. Recognizing and rewarding data providers for their contributions to AI development is necessary to ensure fairness and encourage further innovation. Additionally, safeguarding their rights within the context of AI-generated IP is essential to protect against potential exploitation and infringement.

Expanding on the understanding of ownership and rights within AI-generated IP, it is evident that a collaborative approach is needed. By recognizing the contributions of all stakeholders, including human input, AI model developers, and data providers, we can establish a more equitable framework for ownership of the resulting intellectual property and protect the rights of those involved in the creative process. This framework should encompass clear guidelines that address the complexities arising from AI-generated inventions, providing clarity and legal certainty for all parties involved.

Examples from the Art World

To shed light on the complicated nature of IP ownership and infringement in the context of generative AI, it’s worth taking a moment to examine Andersen v. Stability AI Ltd., a class action lawsuit that alleges companies like Stability AI and Midjourney directly reference protected IP within their training datasets, and, as a result, all works created by these models should be classified as derivative. While the lawsuit can be considered a noble attempt to restore sovereignty to individual creators, the filing exemplifies the depth of technical understanding needed in order to pursue such an action.

As we know, AI models are trained on massive amounts of data. What many people don’t understand is that this data may not be directly queried in every instance, meaning the argument that ‘all works created using such a model are derivative’ likely overreaches the protections afforded by the Copyright Act. Focusing too heavily on whether these AI models are creating reproductions or derivative works may ultimately be the wrong way to consider the issue at hand.

Can an individual use text-to-image and latent diffusion models to create works that reference proprietary IP? Yes.
Does this mean that the AI model itself has been trained on proprietary and potentially copyright protected IP? Very likely.
Whose responsibility is it to ensure that safeguards are in place to limit infringement and how does this ultimately translate to liability in cases where derivative works clearly reference protected IP? To answer this question, we have to examine the method by which data is collected in the first place and the user’s responsibility when using AI tools.

First, let’s consider data collection. If AI model developers were required to manually collect data in order to train their AI models, the resulting inefficiency could lead to material costs that hinder the speed of innovation. Regardless, it is clear that the use of bots to crawl data from the internet is problematic and requires attention. This becomes abundantly clear when considering the complaint made by Getty Images, alleging that Stability AI chose to “ignore viable licensing options and long-standing legal protections in pursuit of their stand-alone commercial interests”. Unfortunately, Stability AI is only one of potentially thousands of companies relying on these data collection methods.

While Getty Images has grounded their complaint in solid case law, the company may not actually provide a clear technical solution catered to the emerging needs of AI models. Just as the government is working to clearly define a nuanced approach to AI regulation, companies that are responsible for licensing and protection of IP will need to expand their offerings and consider tailored solutions that allow them to actively participate in the AI economy, without the need to seek damages in court.

Next, let’s consider the AI model user. Companies like Stability AI and Midjourney may elect to argue that they have zero control over end user input, and thus, they shouldn’t be held liable for the use of their product to create derivative works that infringe on protected IP. While they certainly have no control over the end user’s intent, they have already demonstrated the ability to apply safety checks that filter and prevent prompts that violate the platform’s terms and conditions. In particular, they have effectively mitigated the risks associated with the generation of harmful images that depict violence and other NSFW content. Their clear competency in this regard shows that they certainly possess the skill set and resources to provide a product that operates within compliance. Their failure to extend their terms of use to protect against derivative works and operationalize this compliance with a technical solution is discouraging.

And if you’re wondering how this might play out in court, it may be worth taking a close look at the case that ultimately ended in the collapse of Napster.

Under the assumption that AI model developers take extreme care in attempting to prevent the misappropriation of proprietary invention, liability may ultimately fall on the end user for their misuse of the tool provided. A quick search on Midjourney’s Discord and you will find all the evidence you need to prove that users directly and knowingly use this tool to create derivative works. Some users are just having some fun with zero intent of commercializing these works, but others are using the tool specifically for the purpose of product design. As AI models are more frequently relied upon by companies, there is a major risk that misappropriation and infringement could result in significant legal costs and damages. Nobody wants this to happen.

In the traditional art world, the legal precedents are clear: artists can be held responsible for infringing upon protected intellectual property. These cases require the plaintiff to demonstrate that the defendants had access to and reasonable knowledge of the original works, and that they directly referenced them in creating their new artworks. With generative AI, the risk of infringement becomes even more tangible, as the source data sets used may contain proprietary media obtained from the internet. Both the creators of the AI model and the users of this technology can potentially face liability for misappropriation, leading to significant penalties. Until the framework is more clearly defined, users should proceed with abundant caution when considering the use of generative AI to create commercial output.

Embracing Transparent and Permissioned Relationships

Rather than waiting for these issues to play out in court or for Congress to provide guidance, the AI industry should embrace technologies that establish a clear, transparent, and permissioned relationship between generative AI and intellectual property. Despite the regulatory challenges faced by the crypto industry throughout the past decade, blockchain technologies may provide the exact solutions needed to create a fair AI landscape.

Blockchain technology offers a decentralized and transparent method for recording transactions and establishing asset ownership. By integrating blockchain into the AI ecosystem, it becomes possible to track and verify the origin and ownership of AI-generated works in granular detail. Blockchain can provide a tamper-resistant record of data sources, algorithms, and collaborations, ensuring transparency and accountability in the creation process. This technology has the potential to revolutionize the way IP is managed and protected, and, implemented with thorough attention to detail, could mitigate the risk of unauthorized data usage and infringement.
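
To make the idea of a tamper-resistant record concrete, here is a minimal sketch of a hash-chained provenance log. It is not a blockchain: there is no network, consensus, or decentralization, only the core property that each entry commits to the one before it, so a later edit to any entry becomes detectable. The class name ProvenanceLog, its fields, and the sample contributors are all hypothetical and chosen purely for illustration.

```python
import hashlib
import json


def entry_hash(body: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the entry body."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ProvenanceLog:
    """Hypothetical append-only log; every entry stores the previous entry's hash."""

    GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

    def __init__(self):
        self.entries = []

    def append(self, contributor: str, artifact: str, role: str) -> dict:
        prev = self.entries[-1]["hash"] if self.entries else self.GENESIS
        body = {
            "contributor": contributor,
            "artifact": artifact,
            "role": role,
            "prev": prev,
        }
        entry = dict(body, hash=entry_hash(body))
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every hash and check that each entry links to the one before it."""
        prev = self.GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev"] != prev or entry_hash(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True


# Illustrative usage with made-up contributors and artifacts.
log = ProvenanceLog()
log.append("alice", "dataset-v1", "data provider")
log.append("bob", "model-v1", "model developer")
print(log.verify())  # True: the chain is intact

log.entries[0]["contributor"] = "mallory"  # rewrite history
print(log.verify())  # False: the stored hash no longer matches
```

A real deployment would anchor these hashes on a shared ledger so that no single party could quietly rebuild the whole chain, which is the part this sketch leaves out.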

Imagine for a moment a scenario where AI model ownership is structured like a co-op, where everyone contributes to the success of the whole and sweat equity is rewarded based on any profits earned. Where governance is a responsibility for all shareholders. In this reality, the “cap table” for AI models would include training data providers, data annotators, data scientists, machine learning engineers, quality assurance testers… virtually everyone that contributes to the product in some degree. With blockchain-based technologies, not only can a thorough record of individual contributions be memorialized, but smart contracts can be used to ensure that payments are instantly and transparently distributed to all stakeholders. Auditable ownership. Auditable finance. Inclusive economics.
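
The payout logic behind such a “cap table” is simple enough to sketch in a few lines. The example below is plain Python rather than an actual smart contract, and the function name split_royalty, the stakeholder groups, and the share weights are assumptions for illustration only. It splits a payment pro rata in whole cents and hands out any leftover cents by largest remainder, so the payouts always add up to exactly the amount received.

```python
from typing import Dict


def split_royalty(amount_cents: int, shares: Dict[str, int]) -> Dict[str, int]:
    """Split amount_cents across stakeholders in proportion to integer share weights."""
    if amount_cents < 0:
        raise ValueError("amount_cents must be non-negative")
    if any(weight < 0 for weight in shares.values()):
        raise ValueError("share weights must be non-negative")
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("share weights must sum to a positive number")

    # Floor of each stakeholder's exact pro rata entitlement.
    payouts = {name: amount_cents * weight // total for name, weight in shares.items()}

    # Cents lost to rounding down; always fewer than the number of stakeholders.
    leftover = amount_cents - sum(payouts.values())

    # Give one extra cent each to those with the largest fractional remainders.
    by_remainder = sorted(
        shares,
        key=lambda name: (amount_cents * shares[name]) % total,
        reverse=True,
    )
    for name in by_remainder[:leftover]:
        payouts[name] += 1
    return payouts


# Hypothetical cap table: the weights are made up for the example.
cap_table = {"data_providers": 40, "annotators": 15, "engineers": 30, "qa_testers": 15}
print(split_royalty(10_000, cap_table))
```

Working in integer cents avoids floating-point rounding drift, which matters when the same calculation has to be reproducible and auditable by every stakeholder.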

This is not some far off dream; the technology exists today to empower data providers, giving them more control and ownership over their data assets and IP. Through the use of smart contracts, data providers can establish clear terms and conditions for the use of their data in AI models. They can specify the rights, royalties, and usage limitations associated with their contributions. Blockchain enables a more equitable and inclusive data economy, where data providers can be fairly compensated for their valuable contributions.
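
As a rough illustration of what machine-readable terms might look like, the sketch below models a data license as a small immutable record that can answer two questions: is this use allowed on this date, and what royalty is owed on a given amount of revenue? The DataLicense class, its field names, and the sample values are hypothetical; an on-chain version would encode the same rules in a contract language rather than Python.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class DataLicense:
    """Hypothetical terms a data provider attaches to a dataset."""

    provider: str
    allowed_uses: FrozenSet[str]    # e.g. {"training", "evaluation"}
    royalty_bps: int                # royalty in basis points (100 bps = 1%)
    expires: Optional[date] = None  # None means the license does not expire

    def permits(self, use: str, on: date) -> bool:
        """True if the use is listed and the license has not expired on that date."""
        if self.expires is not None and on > self.expires:
            return False
        return use in self.allowed_uses

    def royalty_due(self, revenue_cents: int) -> int:
        """Royalty owed on revenue_cents, rounded down to a whole cent."""
        return revenue_cents * self.royalty_bps // 10_000


# Illustrative terms: training only, 2.5% royalty, valid through the end of 2030.
terms = DataLicense(
    provider="alice",
    allowed_uses=frozenset({"training"}),
    royalty_bps=250,
    expires=date(2030, 12, 31),
)
print(terms.permits("training", date(2025, 6, 1)))  # True
print(terms.permits("resale", date(2025, 6, 1)))    # False: use not listed
print(terms.royalty_due(100_000))                   # 2500 cents on 100000 cents of revenue
```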

Another advantage of blockchain is its ability to provide secure encryption and compute-to-data solutions. As AI proliferation accelerates, the importance of privacy will only escalate. Instead of sharing raw data, which may contain sensitive or proprietary information, blockchain can enable compute-to-data models, allowing AI algorithms to be trained directly on encrypted data without exposing the underlying information to the developers or other centralized organizations. Each time a model is trained on this encrypted data, the event can be recorded on-chain for inclusion within the royalty entitlement structure. This ensures privacy, security and inclusivity while still enabling the development of powerful AI models by accessing first-party data in

By leveraging blockchain technology, the AI community can establish trust and collaboration amongst all stakeholders. Smart contracts can facilitate agreements and transactions between AI model creators, data providers, and users, ensuring that all parties’ rights are protected programmatically. On-chain payments and blockchain-based royalty systems ensure that clear ownership of AI-generated content is respected and that downstream royalty payments for contributors are swiftly awarded. This incentivizes collaboration, streamlines payment distribution, and creates a sustainable AI ecosystem. Blockchain’s transparency and immutability provide a foundation of trust that will foster cooperation and innovation in the AI ecosystem, leading to more rapid innovations. More importantly, the adoption of these technologies on a proactive basis presents an opportunity to set the tone for regulatory action, rather than waiting for oversight to dictate how the industry should behave. I strongly believe that these innovations will become a part of best practices for the AI industry.

Redefining Data Ownership and Individual Sovereignty

Until now, the attention given to data privacy and provenance has been subpar, particularly when considering the practices of social media and advertising companies. To address this issue, a fundamental shift in our understanding of data ownership and individual sovereignty is crucial. By recognizing data as a valuable asset and granting data providers agency over their data and intellectual property, we can restore individual sovereignty and property rights, correcting the errors made in the Web2 era.

Social media platforms and advertising companies are notorious for amassing vast amounts of user data without truly transparent disclosure or users’ awareness. This includes the gathering of personal preferences, online behaviors, and even offline activities through data brokerage. Such practices can encroach upon individuals’ privacy rights and expose them to potential abuses of their personal information. The collected data is frequently exploited to create detailed user profiles, enabling targeted advertising and personalized content delivery. While this can enhance user experience, it also raises ethical concerns. The extensive profiling and micro-targeting capabilities employed by these companies can contribute to filter bubbles, reinforce echo chambers, and manipulate individuals’ behaviors, thereby undermining individual autonomy and potentially leading to broader societal implications.

Users often have limited control over the collection, storage, and use of their personal data. Consent mechanisms are often hidden within lengthy terms and conditions, making it challenging for individuals to make informed decisions regarding their data. Moreover, once personal data is shared, it becomes difficult to track and control its dissemination, further eroding individual sovereignty. Instances of data breaches and misuse have exposed the vulnerabilities associated with current data practices. Large-scale leaks of personal data have compromised individuals’ privacy, exposing them to identity theft, fraud, and other malicious activities. Furthermore, unauthorized use and exploitation of personal data by third parties have become pressing concerns, necessitating improved data privacy measures.

We do not need to carry these inherent flaws with us any longer. AI has the potential to be the greatest advancement in the last 100 years, but it can only be good if we reflect carefully on our past and take this opportunity to ensure that we remain mission-driven. Our goal should be to provide the greatest good for the greatest number of people and, though every decision that matters comes with trade-offs, this doesn’t need to come at the cost of the individual’s rights. The insufficient attention to data privacy, exemplified by the practices of social media and advertising companies, highlights the urgent need to redefine data ownership and prioritize individual sovereignty. Efforts must be directed towards establishing robust data privacy regulations, promoting transparent data practices, and empowering individuals with control over their personal information. With many open-source blockchain technologies available, there is no reasonable excuse for corporations or developers to challenge feasibility. We have the opportunity to actively foster an ethical, equitable, and inclusive AI landscape while safeguarding individuals’ rights in the digital age.

The Road Ahead: Navigating the Challenges

While the integration of blockchain technology for the purpose of safeguarding individual data sovereignty, and ultimately AI ownership, offers promising solutions, there are still challenges to be addressed. The intersection of AI and intellectual property requires careful consideration of legal, ethical, and practical aspects. Clear and comprehensive legal frameworks will eventually need to be established to govern the ownership, usage, and protection of AI-generated intellectual property. Policymakers must collaborate with experts from various fields to create legislation that strikes a balance between boosting innovation and safeguarding the rights of creators and data providers. These frameworks should address issues such as attribution, fair use, licensing, and the responsibilities of AI model providers and users.

Ethical considerations are paramount when it comes to AI and intellectual property. Standards and guidelines should be developed to ensure responsible AI development and usage. These guidelines should address issues like consent, privacy, bias, and the responsible handling of sensitive data. Ethical frameworks should prioritize the fair treatment of data providers, protection against discriminatory AI models, and the prevention of AI-generated content that may infringe upon existing IP rights. With a clear vision, robust frameworks, and ongoing dialogue, we can shape a future where AI and intellectual property coexist harmoniously, driving progress and benefiting society as a whole.

Now, this writing may have been a bit longer than I intended, but hopefully it provides some insight into my current perspective of the issues at hand. The key takeaway here: a better system is possible now. There is no excuse for corporate interests to continue to trump the people’s rights. Many of the solutions presented here exist and are being actively refined within the blockchain space. While at times it may feel like the crypto industry is being backed into a corner, especially by US regulators, the potential benefits of the tech should not be ignored. As we continue to mature as a sector and community, we must unite to advocate for accountability and transparency. We must demand that the rights of real people are put first.