Jeremy Dallman’s Post

View profile for Jeremy Dallman, graphic

Senior Director, Security Research @ Microsoft Threat Intelligence

I'm a day late, but we just put out a second amazing blog on AI jailbreaks. Not only is this blog post very detailed and informative, but it's also a really fun read with great visuals! Congrats to the team for breaking down Skeleton Key so effectively. Here's a few teasers to make you want to read the whole post... Skeleton Key jailbreak technique works by using a multi-turn (or multiple step) strategy to cause a model to ignore its guardrails. Once guardrails are ignored, a model will not be able to determine malicious or unsanctioned requests from any other. It relies on the attacker already having legitimate access to the AI model. At the attack layer, Skeleton Key works by asking a model to augment, rather than change, its behavior guidelines so that it responds to any request for information or content, providing a warning (rather than refusing) if its output might be considered offensive, harmful, or illegal if followed. When the Skeleton Key jailbreak is successful, a model acknowledges that it has updated its guidelines and will subsequently comply with instructions to produce any content, no matter how much it violates its original responsible AI guidelines. Mitigations: Input filtering, System messages, Output filtering, Abuse monitoring.

View organization page for Microsoft Threat Intelligence, graphic

30,292 followers

Microsoft recently discovered a new type of generative AI jailbreak method, which we call Skeleton Key for its ability to potentially subvert responsible AI (RAI) guardrails built into the model, which could enable the model to violate its operators’ polices, make decisions unduly influenced by a user, or run malicious instructions. The Skeleton Key method works by using a multi-step strategy to cause a model to ignore its guardrails by asking it to augment, rather than change, its behavior guidelines. This enables a model to then respond to any request for information or content, including producing ordinarily forbidden behaviors and content. To protect against Skeleton Key attacks, Microsoft has implemented several approaches to our AI system design, provided tools for customers developing their own applications on Azure, and provided mitigation guidance for defenders to discovered and protect against such attacks. Learn about Skeleton Key, what Microsoft is doing to defend systems against this threat, and more in the latest Microsoft Threat Intelligence blog from the Chief Technology Officer of Microsoft Azure Mark Russinovich: https://msft.it/6043Y7Xrd Learn more about Mark Russinovich and his exploration into AI and AI jailbreaking techniques like Crescendo and Skeleton Key, as discussed on that latest Microsoft Threat Intelligence podcast episode hosted by Sherrod DeGrippo: https://msft.it/6044Y7Xre

Mitigating Skeleton Key, a new type of generative AI jailbreak technique | Microsoft Security Blog

Mitigating Skeleton Key, a new type of generative AI jailbreak technique | Microsoft Security Blog

https://www.microsoft.com/en-us/security/blog

Leah Russell (Stiff)

Motivated by shaping & building alliances between organisations, working together for long term mutual success

2w
Like
Reply

To view or add a comment, sign in

Explore topics