This is a quick overview of the issues.
Bias - Training data comes mostly from the Global North and centers Western perspectives. Be aware of this when, for example, you ask for pictures of doctors and every result shows a white man.
Copyright - Is the AI giving you copyrighted information without citing it? How would you know?
Informed Consent - Similar to copyright, but different. Some information is shared on the condition that it stays with the person it was given to.
Environment - Huge data centers take a lot of power, and they have to be kept cool.
Low-Paid Labor - Who trains the AI? Who labels AI responses? Low-paid labor.
Misinformation/Hallucinations - AI gives a statistically plausible answer, which is not always the correct one, and you have to know enough about the subject to spot the errors. If the data it trained on contains misinformation, the AI will confidently give you an incorrect answer. It can also produce wrong information on its own, in what are called "hallucinations." Claude has said that reindeer herding was a popular occupation for indigenous Mexican men. Why? Because Claude had no way of knowing that a source on popular occupations for one group did not apply to another. ChatGPT has said that Halley's Comet came by in the 1930s, which is impossible because Halley's comes by every 72-80 years and had a famous appearance in 1910. A 1930s appearance was a statistically plausible answer, but it was not the correct one.
Training data for GPT-3 appears to be:
Common Crawl (snapshots of the public web, captured and stored in a bucket) - 60% of the training mix
WebText2 dataset (web pages linked from Reddit posts with three or more karma points!) - 22% of the training mix
Books1 and Books2 (internet-based book collections) - 8% each of the training mix
English-language Wikipedia - 3% of the training mix
Source: Brown, Tom B., Benjamin Mann, Nick Ryder, et al. "Language Models are Few-Shot Learners." Provided by Trudi Radtke.
Privacy - What happens if I put in private information? (Don't.) Where is the tool's funding coming from, and how does that affect what happens to what you type in?
Tool Transparency - There is none.
Currency - If your topic depends on the very latest information, check what year the training data ends.
The audience includes business owners, but the issues apply to everyone.
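
The Halley's Comet example above shows how a little arithmetic can catch a plausible-sounding wrong answer. Here is a minimal sketch of that check in Python. It uses only the numbers given above (a 1910 appearance and a 72-80 year cycle). The function names are made up for illustration and do not come from any AI tool.

```python
# Numbers taken from the Halley's Comet example above.
LAST_KNOWN_APPEARANCE = 1910
MIN_PERIOD_YEARS = 72
MAX_PERIOD_YEARS = 80


def next_return_window(last_seen, min_period, max_period):
    """Return the (earliest, latest) year the comet could next appear."""
    return last_seen + min_period, last_seen + max_period


def claim_is_plausible(claimed_start, claimed_end, window):
    """True if the claimed range of years overlaps the expected window."""
    earliest, latest = window
    return claimed_start <= latest and claimed_end >= earliest


window = next_return_window(
    LAST_KNOWN_APPEARANCE, MIN_PERIOD_YEARS, MAX_PERIOD_YEARS
)
print(window)                                   # (1982, 1990)
print(claim_is_plausible(1930, 1939, window))   # False
```

The 1930s fall well outside the 1982-1990 window, so the claim fails. The check only works because we already knew two facts about the comet, which is the point: you need some subject knowledge to catch the error.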