Writer Fuel: AI Taught to Be Malicious Couldn’t Be Retrained to Behave Again

Artificial intelligence (AI) systems that were trained to be secretly malicious resisted state-of-the-art safety methods designed to “purge” them of dishonesty, a disturbing new study found.

Researchers programmed various large language models (LLMs) — generative AI systems similar to ChatGPT — to behave maliciously. Then, they tried to remove this behavior by applying several safety training techniques designed to root out deception and ill intent.

They found that, regardless of the training technique or the size of the model, the LLMs continued to misbehave. One technique even backfired, teaching the AI to recognize the trigger for its malicious actions and thus to hide its unsafe behavior during training, the scientists said in their paper, published Jan. 17 to the preprint database arXiv.

“Writer Fuel” is a series of cool real-world stories that might inspire your little writer heart. Check out our Writer Fuel page on the LimFic blog for more inspiration.

Full Story From Live Science

Check This Out

Word Count: Information not available

Summary: More than two thousand years ago, the healer Lochlann Doran was the first Fae to leave the Realm after the Sundering of the Fae and human worlds. After centuries of wandering the human world seeking his SoulShare, he has spent all his magick and lost all his hope. Garrett Templar is the star pole dancer at Purgatory, the hottest gay nightclub in Washington, D.C. If his past hadn’t taught him the futility of hope, his present surely would: he has been HIV-positive since age 18, and his illness has suddenly and inexplicably mutated into drug-impervious AIDS. A SoulShare bond with Garrett may give Lochlann back his magick, his gift of healing. But it also might kill him. And if he survives the return of his magick, the Marfach and its host are waiting to use the dancer as bait in a deadly trap. Only an impossible love can save them both. And everyone knows Fae don’t love….
