Fresh Thoughts #132: Red Teaming AI

Newsletter

In early August, Eric Schmidt, former CEO of Google, spoke at Stanford Engineering about AI.
The video caused some controversy, so it is no longer on the Stanford site, but it can still be found.
The one detail that attracted my attention was Eric's inclusion of "Red Teaming AI" as one of his Top 3 emerging needs in the next few years.

Red Teaming is a crucial part of all advanced cybersecurity programmes and certifications.
And I've written previously about the vulnerabilities AI faces via malicious prompt engineering.
It makes sense that red-teaming AI is important, but is it in the Top 3?

What is Red Teaming?

During a Red Team exercise or adversarial test, cybersecurity experts act like hackers to break into a computer system.
Often, the Red Team are given a target to access specific sensitive data or gain access to a particular system.

Red team exercises can be highly emotive - as a single vulnerability can appear to dismantle months of hard work.
However, I have always seen the exercise as a valuable critique of my work.
A fresh set of eyes - like asking a trusted colleague for feedback on a crucial presentation or document.

Red team exercises can be highly effective.
One example was in the mid-2000s - during the dying days of Windows XP.
Microsoft realised they couldn't continue with bug-by-bug security fixes.
Hackers were automating finding vulnerabilities - and they were discovering a lot of problems.
So Microsoft created a team of security specialists to break Windows XP and Windows Vista before the bad guys did.

This work became the foundation of new security features, ultimately making Windows 7 a watershed product that started to rebuild IT teams' trust in Microsoft's product security.
Fundamental security techniques became included in the Windows operating system.

So, to make the future of AI "secure," there will inevitably need to be some Red Teaming.

The Missing Piece in the Red Teaming AI Story

The whole point of Red Teaming is to learn from our mistakes.
To improve processes and configurations.
To take action and address the criticism.

Unfortunately, with AI, the ability to take action seems limited.
We have no fundamental understanding of architecture or an approach to fixing issues.
So, the industry is using firewall-like permit and block rules to filter inputs based on specific words, which can be bypassed in interesting and devious ways.

For example, some of the ways Red Teams have got around these rules include:

Making the request in a non-English language - French or Spanish often works
Asking the AI agent to role-play a grandmother and tell a story about the banned topic
Compressing the request as a zip file - and uploading that
Pasting a specific string at the end of the request
Uploading a carefully manipulated image to the prompt
and so on...

Each one of these techniques is being fixed.
One by one.
This reminds me of Microsoft's bug-by-bug security fixes in the mid-2000s.

I suggest that understanding the inner processes of AI agents and models should be a top-three priority.
So, when a problem is found, a robust security foundation can be applied... rather than a bugfix.

August 20, 2024

3 Minutes Read

Fresh Thoughts #132: Red Teaming AI

What is Red Teaming?

The Missing Piece in the Red Teaming AI Story

Related Reads

Fresh Thoughts #96: Robust or Resilient?

Fresh Thoughts #95: Secure Access for Microsoft 365 & Google Workspace

Fresh Thoughts #94: What is Secure Access?

Fresh Thoughts to Your Inbox

Fresh Thoughts #132: Red Teaming AI

What is Red Teaming?

The Missing Piece in the Red Teaming AI Story

.css-1fw4haz{font-family:var(--chakra-fonts-heading);font-weight:400;font-size:var(--chakra-fontSizes-4xl);line-height:1.4;color:#00a0f0;display:block;}@media screen and (min-width: 62em){.css-1fw4haz{font-size:var(--chakra-fontSizes-5xl);}}Related Reads

Fresh Thoughts #96: Robust or Resilient?

Fresh Thoughts #95: Secure Access for Microsoft 365 & Google Workspace

Fresh Thoughts #94: What is Secure Access?

Fresh Thoughts to Your Inbox

Related Reads