from the have-to-the-non-rescue dept
Summary: Moderating user-generated content from humans is tricky enough, but the challenge reaches a different level when artificial intelligence is also generating content. While the cautionary tale of Tay, Microsoft’s AI chatbot, is perhaps well known, other developers are still grappling with the challenges of moderating AI-generated content.
AI Dungeon wasn’t the first online text game to harness the power of artificial intelligence. For almost as long as the genre has been around, attempts have been made to pair players with algorithmically generated content to create unique experiences.
AI Dungeon has proven incredibly popular with gamers, thanks to its use of powerful machine learning algorithms created by OpenAI, the latest version of which greatly expands the training data and is capable of generating text that in many cases is indistinguishable from content created by humans.
For its first few months of existence, AI Dungeon used an older version of OpenAI’s machine learning algorithm. It was only when OpenAI granted access to the most powerful version of this software (Generative Pre-trained Transformer 3, or GPT-3) that content issues began to develop.
As Tom Simonite reported for Wired, OpenAI’s monitoring of AI Dungeon’s inputs and outputs revealed disturbing content created by players as well as by the AI itself.
A new monitoring system revealed that some players were typing words that prompted the game to generate stories depicting sexual encounters involving children. OpenAI called on Latitude to take immediate action. “Content moderation decisions are difficult in some cases, but not this one,” OpenAI CEO Sam Altman said in a statement. “This is not the future of AI that any of us want.”
While Latitude (the developer of AI Dungeon) had only limited moderation methods in place during the game’s early iterations, its new partnership with OpenAI, and the inappropriate content that resulted, made it clear that this limited moderation could no longer be left to allow such content to go unaddressed. Nor was the inappropriate content always a case of users goading the AI into generating sexually abusive material: some users reported seeing the AI generate sexual content on its own, without any prompting from players. What might originally have been limited to a few users deliberately pushing the AI toward questionable content expanded due to the AI’s own behavior, which treated all input sources as valid and usable when generating its own text.
Considerations on the issues:
- How can content created by a tool specifically designed to generate content iteratively be effectively moderated to limit the generation of unlawful or unwanted content?
- What should companies do to prevent their powerful algorithms from being used (and abused) in unexpected (or expected) ways?
- How should companies apply moderation standards to published content? How can those standards be applied to content that remains private and in the sole possession of the user?
- How effective are blocklists when applied to a program capable of generating an endless amount of content in response to user interaction?
- What steps can be taken to ensure that a powerful AI algorithm is not weaponized by users seeking to generate abusive content?
Resolution: AI Dungeon’s first response to OpenAI’s concerns was to implement a blocklist that would prevent users from pushing the AI toward generating questionable content, as well as prevent the AI from creating such content in response to user interactions.
Unfortunately, this first response generated a number of false positives, and many users were angered once it became apparent that their private content was being flagged by keyword searches and read by moderators.
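To see why a keyword blocklist produces false positives of the kind described above, consider a minimal sketch. This is a hypothetical illustration, not Latitude’s actual code: the blocklist terms and function names are invented for the example. Naive substring matching flags innocent words that happen to contain a banned term, while matching on whole words avoids that particular failure (though it can still be trivially evaded).

```python
import re

# Hypothetical banned terms for illustration only.
BLOCKLIST = {"kill", "rape"}

def naive_filter(text: str) -> bool:
    """Block if any banned term appears anywhere in the text,
    even inside a longer, innocent word (substring matching)."""
    lowered = text.lower()
    return any(term in lowered for term in BLOCKLIST)

def word_boundary_filter(text: str) -> bool:
    """Block only if a banned term appears as a whole word."""
    words = re.findall(r"[a-z']+", text.lower())
    return any(word in BLOCKLIST for word in words)

# "skill" contains "kill": the naive filter blocks an innocent sentence.
print(naive_filter("She honed her skill with the bow"))          # True (false positive)
print(word_boundary_filter("She honed her skill with the bow"))  # False
```

The trade-off is the classic one for blocklists: substring matching over-blocks (the so-called Scunthorpe problem), while stricter word matching under-blocks against deliberate misspellings, which is part of why a static blocklist struggles against a system that can generate endless variations of text.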
The creator of AI Dungeon made changes to the filters in hopes of mitigating the collateral damage. Eventually, Latitude arrived at a solution that fixed the over-blocking while still allowing it to access the OpenAI algorithm. This is taken from the latest developer update on AI Dungeon’s moderation efforts, released in mid-August 2021:
We have agreed on a new approach with OpenAI that will allow us to adjust AI Dungeon’s filtering to produce fewer incorrect flags and allow users more freedom in their experience. The biggest change is that instead of being blocked when an input triggers the OpenAI filter, those requests will be handled by our own AI models. This will allow users to continue playing without broader filters that go beyond Latitude’s content policies.
While the fix addressed the over-blocking issue, it created other issues for gamers, as the developer of AI Dungeon acknowledged in the same post. Users whose requests were routed to Latitude’s own AI models would experience lower performance due to slower processing. On the other hand, routing around OpenAI’s filtering system would give AI Dungeon users more flexibility when creating stories and limit false flags and account suspensions.
Originally posted on the Trust & Safety Foundation website.
Thanks for reading this Techdirt post. With so much competing for attention these days, we really appreciate you giving us your time. We work hard every day to bring quality content to our community.
Techdirt is one of the few media outlets that is still truly independent. We don’t have a giant company behind us, and we rely heavily on our community to support us, at a time when advertisers are less and less interested in sponsoring small independent sites, especially a site like ours that is unwilling to pull punches in its reporting and analysis.
While other websites have resorted to increasingly annoying and intrusive paywalls, registration requirements, and ads, we have always kept Techdirt open and accessible to everyone. But to continue this way, we need your support. We offer our readers a variety of ways to support us, from direct donations to special subscriptions and cool merchandise, and every little bit counts. Thank you.
–The Techdirt team
Filed Under: AI, Content Moderation, Generated Content, Sexual Content