Ask A Data Ethicist: What Data Is OK To Use To Prompt ChatGPT?

Millions of people are using ChatGPT on a regular basis to assist them in both personal and professional capacities. This month’s question centers on the data used to prompt ChatGPT.

Our reader, who shared that they are an ESL speaker, would like to know if it’s ethical to have ChatGPT create summaries of information (specifically, course outlines) to assist them. The summarized information would be used for personal purposes only and would not be shared online with other people. If we step this out to a more general situation, we might ask …

What data is OK to use to prompt ChatGPT?

Two Small Caveats
Before getting to the question, it’s important to point out the power dynamics of languages. There is an enormous amount of privilege that corresponds with speaking English. So much technology and scientific work is English-centric and that can put people who do not speak English as their first language at a comparative disadvantage. The context of wanting to use a tool to help “level the playing field” needs to factor into ethical deliberations. Here is one ESL speaker’s experience in using ChatGPT in ways they have found beneficial.

The other thing I’ll mention is that I am not a lawyer and this column is not legal advice. The information provided in this answer is strictly for educational purposes only. The issue of copyright and generative AI is an evolving area and anyone with specific questions should seek advice from a legal professional.

Copyright Material and ChatGPT Prompts
Generally speaking, you should not use copyrighted material in your prompt without appropriate permission, because doing so may infringe on the owner's copyright. This is very clear in ChatGPT’s terms of use, which state:

“You may not use our Services for any illegal, harmful, or abusive activity. For example, you may not…use our services in a way that infringes, misappropriates or violates anyone’s rights.”

For example, if you were to cut and paste a portion of a copyrighted work from an article published online and ask ChatGPT to summarize or reword it, you would be infringing on copyright and you’d be offside with ChatGPT’s terms of use. Likewise, if you asked ChatGPT a question and provided a paragraph from a copyrighted work as background context in your prompt, you’d be potentially infringing on copyright.

This is important to know because in some settings, we can use copyrighted material as part of fair use, with appropriate attributions. Academia is built on this provision! However, using copyrighted material in your PROMPT is not necessarily the same thing. As researcher Anita Toh notes:

“The landscape of copyright laws and fair use in relation to generative AI tools is still evolving. While previously researchers could rely on the fair use doctrine for the use of copyrighted material in their research work, the availability of generative AI tools now introduces an additional layer of complexity. This is particularly pertinent when the AI itself might store or use data to refine its own algorithms, which could potentially be considered a violation of the non-commercial use clause in the fair use doctrine.” (SRHE)

I can quote Anita Toh in this article without violating copyright because of fair use and proper attribution. However, I cannot take that quote and use it in ChatGPT as part of a prompt because that would potentially be a violation of her copyright, as she points out.

If you want to stay onside, legally and ethically, do not ask ChatGPT to summarize copyrighted work that you do not have appropriate permission to use.

Personal or Confidential Data
Another kind of data that you shouldn’t use to prompt ChatGPT is personal data, unless you have appropriate consents, notifications, and agreements in place. This is covered in ChatGPT’s terms of use as well. Personal data covers a range of information that could identify a person. It might include names, addresses, and other identifying information.

In addition, confidential data should not be used in a prompt. This can be data that you have a professional or personal duty not to divulge. It might be information from your employer or a client, or it might simply be information shared by a friend or acquaintance that they have asked you to not share with anyone. Don’t use that information in a prompt!

Remember that ChatGPT prompts involve data sharing and that data has the potential to wind up in places outside of your control.

But, Isn’t ChatGPT Built on Copyrighted Training Data?
Whether fair use rules apply in the context of training data is a multimillion-dollar question. It’s the focus of lawsuits by authors like Sarah Silverman and publications like the New York Times. This isn’t something that you as an end user will be able to control, and the legal questions will need time to be decided. However, for now, it is clear from the terms posted by OpenAI that, as a user, you are in violation of their terms if you infringe or violate someone else’s copyright in your prompt.

What’s OK to Use in a Prompt?
You should phrase questions or instructions for a prompt in your own words. You could also use material in your prompt for which you own the copyright or have appropriate contractual permissions in place to use. For example, I could take a script I’ve written and ask ChatGPT to rewrite it. In the case of an ESL speaker, you could use your own work or wording in the prompt and ask ChatGPT to assist you in rewriting or rephrasing it. This assumes you are on board with sharing your data with OpenAI per their terms of use.

In terms of the specific question about using course outlines as input into ChatGPT to have them rephrased for better understanding, it might be prudent to ask the course instructor if that is OK. They could grant permission or offer other solutions. If this were my course, knowing what goes into a typical course outline, I would likely not have a problem with this use case. If the result was a better understanding for a student of the course outline material, and they were using this tool for personal purposes, I would not have an issue with that at all.

Copyright and Generative AI Outputs
Some people are seeking to copyright the material that is generated by AI, presumably because they might wish to commercialize or monetize it. Currently, this isn’t possible, as illustrated by the Thaler case. This might change as the landscape of copyright and generative AI continues to evolve.

Ethically, we might think about how much unique input we contributed to a work, and whether or not that warrants our copyright of that work. Currently, generative AI makers and users seem to want to “have their cake and eat it too.” There is reluctance to acknowledge the vast repository of copyrighted data used to create this type of AI and active resistance to financially compensating creators for the use of their data. Instead, companies making generative AI are claiming fair use provisions for the training data. To then turn around and want to allow copyright for material coming out of the system feels particularly unfair, given the origins of the training data.

Send Me Your Questions!
I would love to hear about your data dilemmas or AI ethics questions and quandaries. You can send me a note at hello@ethicallyalignedai.com or connect with me on LinkedIn. I will keep all inquiries confidential and remove any potentially sensitive information – so please feel free to keep things high level and anonymous as well.