Fresh Thoughts #132: Red Teaming AI

    Newsletter
robot reading a book

In early August, Eric Schmidt, former CEO of Google, spoke at Stanford Engineering about AI.
The video caused some controversy, so it is no longer on the Stanford site, but it can still be found.
The one detail that attracted my attention was Eric's inclusion of "Red Teaming AI" as one of his Top 3 emerging needs in the next few years.

Red Teaming is a crucial part of all advanced cybersecurity programmes and certifications.
And I've written previously about the vulnerabilities AI faces via malicious prompt engineering.
It makes sense that red-teaming AI is important, but is it in the Top 3?

What is Red Teaming?

During a Red Team exercise or adversarial test, cybersecurity experts act like hackers to break into a computer system.
Often, the Red Team are given a target to access specific sensitive data or gain access to a particular system.

Red team exercises can be highly emotive - as a single vulnerability can appear to dismantle months of hard work.
However, I have always seen the exercise as a valuable critique of my work.
A fresh set of eyes - like asking a trusted colleague for feedback on a crucial presentation or document.

Red team exercises can be highly effective.
One example was in the mid-2000s - during the dying days of Windows XP.
Microsoft realised they couldn't continue with bug-by-bug security fixes.
Hackers were automating finding vulnerabilities - and they were discovering a lot of problems.
So Microsoft created a team of security specialists to break Windows XP and Windows Vista before the bad guys did.

This work became the foundation of new security features, ultimately making Windows 7 a watershed product that started to rebuild IT teams' trust in Microsoft's product security.
Fundamental security techniques became included in the Windows operating system.

So, to make the future of AI "secure," there will inevitably need to be some Red Teaming.

The Missing Piece in the Red Teaming AI Story

The whole point of Red Teaming is to learn from our mistakes.
To improve processes and configurations.
To take action and address the criticism.

Unfortunately, with AI, the ability to take action seems limited.
We have no fundamental understanding of architecture or an approach to fixing issues.
So, the industry is using firewall-like permit and block rules to filter inputs based on specific words, which can be bypassed in interesting and devious ways.

For example, some of the ways Red Teams have got around these rules include:

  • Making the request in a non-English language - French or Spanish often works
  • Asking the AI agent to role-play a grandmother and tell a story about the banned topic
  • Compressing the request as a zip file - and uploading that
  • Pasting a specific string at the end of the request
  • Uploading a carefully manipulated image to the prompt
  • and so on...

Each one of these techniques is being fixed.
One by one.
This reminds me of Microsoft's bug-by-bug security fixes in the mid-2000s.

I suggest that understanding the inner processes of AI agents and models should be a top-three priority.
So, when a problem is found, a robust security foundation can be applied... rather than a bugfix.

August 20, 2024
3 Minutes Read

Related Reads

salt marsh

Fresh Thoughts #96: Robust or Resilient?

Robust. Resilient. Words sprinkled into IT presentations and marketing, but what do they actually mean?

Fresh Thoughts to Your Inbox

Fresh perspectives on cybersecurity every Tuesday. Real stories, analytical insights, and a slash through buzzwords.

We'll never share your email.

Subscribe to Fresh Thoughts

Our weekly newsletter brings you cybersecurity stories and insights. The insights that help you cut through the bull.

We'll never share your email.

Resources

Fresh Security Support

Your Questions

Blog

Fresh Sec Limited

Call: +44 (0)203 9255868