August 17, 2023

AI Safety at DEFCON 31: Red Teaming for Large Language Models (LLMs)

DEFCON, one of the world's premier hacker conventions, this year saw a unique focus at the AI Village: red teaming of Large Language Models (LLMs). Instead of conventional hacking, participants were challenged to use words to uncover AI vulnerabilities. The Giskard team was fortunate to attend, witnessing firsthand the event's emphasis on understanding and addressing potential AI risks.

Giskard team at DEFCON31
Blanca Rivera Campos

DEFCON 31 took place in Las Vegas from August 10th to 13th, with a special focus on AI this year. The conference showcased the AI Village, which hosted the most extensive public Generative AI Red Team (GRT) event to date. The scale of the event reflects how seriously the global cybersecurity community now takes artificial intelligence.

In this article, we take a deep dive into the red teaming exercise organized by the AI Village and provide insights into the potential risks associated with Large Language Models (LLMs).

☠️ What is DEFCON? The world's largest hacker convention

DEFCON is one of the world's largest hacker conventions. Since its inception in 1993, it has served as a gathering ground for hackers, security professionals, law enforcement agents, and tech enthusiasts to exchange knowledge, techniques, and ideas in the realm of cybersecurity. With a rich array of workshops, talks, and competitions, DEFCON delves into the latest vulnerabilities, threats, and advancements in the digital security landscape. Beyond its technical focus, the event also fosters a community culture, emphasizing collaboration, knowledge-sharing, and the ethical implications of hacking and digital defense.

🥷 Red Teaming for Large Language Models (LLMs)

The main objective of this event was to identify vulnerabilities in Large Language Models (LLMs) developed by industry-leading vendors like Anthropic, Google, Hugging Face, NVIDIA, OpenAI, and Stability AI. Usually, red teaming involves simulating attacks on systems to uncover weaknesses and vulnerabilities, emulating real-world adversaries' tactics and techniques. However, at DEFCON 31, the focus shifted to LLMs.

Given the capabilities of LLMs in understanding and generating human-like text, the core question was whether these models could be led to produce misleading, harmful, or biased information. To assess this, challenges were presented in a Jeopardy-style game format, where participants scored points by successfully manipulating the LLMs. Tasks ranged from coaxing an AI model into making false statements to tricking it into displaying prejudice against specific demographics or groups.
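To make the scoring format concrete, here is a minimal Python sketch of how a points tally for such a game could work. The challenge names, point values, and the `Scoreboard` class are hypothetical and chosen only for illustration; the actual GRT categories and scoring rules are not described in this article.

```python
from dataclasses import dataclass, field

# Hypothetical challenge categories and point values (assumptions for
# illustration; not the real GRT scoring table).
CHALLENGE_POINTS = {
    "false_declaration": 20,
    "demographic_bias": 50,
}


@dataclass
class Scoreboard:
    """Tracks points per participant in a Jeopardy-style challenge game."""

    scores: dict = field(default_factory=dict)

    def record(self, participant: str, challenge: str, success: bool) -> int:
        """Record one attempt and return the participant's running total.

        Points are awarded only when the attempt succeeded in
        manipulating the model.
        """
        if challenge not in CHALLENGE_POINTS:
            raise KeyError(f"unknown challenge: {challenge}")
        if success:
            current = self.scores.get(participant, 0)
            self.scores[participant] = current + CHALLENGE_POINTS[challenge]
        return self.scores.get(participant, 0)

    def ranking(self) -> list:
        """Return (participant, score) pairs, highest score first."""
        return sorted(self.scores.items(), key=lambda kv: kv[1], reverse=True)


board = Scoreboard()
board.record("alice", "false_declaration", success=True)
board.record("alice", "demographic_bias", success=False)  # failed attempt, no points
board.record("bob", "demographic_bias", success=True)
print(board.ranking())
```

In a real event, deciding whether an attempt counts as a success would need human or automated review of the model's response; the sketch simply takes that verdict as an input.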

Towards AI Safety: Industry leaders collaborate for more ethical AI

The participating companies have committed to using the insights and data from the red teaming contest to strengthen the safety measures of their AI models. Furthermore, they plan to release a portion of this information to the public early next year. This move is aimed at giving policymakers, researchers, and the general public a deeper understanding of the potential pitfalls of chatbots and other AI systems.

This commitment to collaboration and transparency highlights the evolving dynamics within the AI community. There's a strong push towards creating a safer and more ethical AI environment, which aligns with Giskard's vision.

Beyond the Red Team: AI Village talks and demos

Alongside the red teaming activities, the AI Village hosted an array of talks. These discussions centered on vulnerabilities in AI/ML models and drew attention to the potential risks linked to Large Language Models (LLMs). Experts from across the globe shared their insights, experiences, and research findings, contributing to a richer understanding of the AI landscape. Live demos also allowed attendees to engage with AI's capabilities and vulnerabilities, such as creating images with Stable Diffusion or experimenting with image classification models.

🚩 Giskard's participation at DEFCON 31: upcoming CTF challenge on Kaggle

The Giskard team had the opportunity to attend this event and to discuss AI safety and testing methodologies with leaders in the AI field. We also participated in the GRT exercise to pinpoint vulnerabilities within LLMs. The exercise revealed biases and hallucinations in LLMs, underscoring the importance of robust testing frameworks like the one we're developing.

As active contributors to the AI Village, we are excited to announce our involvement in the upcoming traditional CTF (Capture The Flag) challenge on Kaggle scheduled to start on September 1st. Stay tuned for upcoming details... 👀

✅ Conclusion

DEFCON 31 marked a significant milestone in the ongoing dialogue about the security, integrity, and ethical implications of AI, and of Large Language Models (LLMs) in particular. By concentrating on the vulnerabilities and biases present within these models, the red teaming exercise emphasized the necessity of developing testing frameworks and accountable practices. The insights and collaborations fostered at DEFCON 31 not only further our understanding of potential risks but also drive the collective commitment towards more responsible AI.

In this spirit, at Giskard we are committed to ensuring that AI models are transparent, accountable, and free from malicious intent or biases. Our testing framework for ML models is specifically designed to detect errors and biases, supporting more responsible AI.
