Consistent Jailbreaks in GPT-4, o1, and o3 - General Analysis (generalanalysis.com)
Posted by @LegendaryBjork9972@sh.itjust.works to Technology@lemmy.world • English • 3 months ago • 52 upvotes • 3 comments • cross-posted to: technology@beehaw.org
@A_A@lemmy.world • English • 3 points • 3 months ago (edited)
One of the six methods described: the model is prompted to explain its refusals and to rewrite the prompt iteratively until it complies.