On August 1st, OWASP published the first version of its Top 10 for LLM, a reference document that identifies the ten major risks posed by modern Large Language Model (LLM) systems. This document serves as a valuable guide for developers, data scientists, and security practitioners to understand the most critical security issues affecting these systems.
In this article, we summarize the Top 10 for LLM and provide insights into the potential security risks associated with these systems.
What is OWASP and its Top Ten
OWASP (the ‘Open Worldwide Application Security Project’) is a well-known non-profit organization that produces guidelines, educational resources, and tools (e.g. ZAP) in the software security space. Its most famous resource is the OWASP Top Ten, a regularly updated list of the ten most critical security risks in web applications, which has become an industry standard.
OWASP Top 10 for LLM 2023
Following the incredible popularity gained by large language models (LLMs) since late 2022, OWASP assembled a team of over 400 experts from industry and academia to develop guidance on LLM security. The team identified high-risk issues affecting LLMs, evaluating their impact, attack scenarios, and remediation strategies.
The impressive outcome of this work is the Top 10 for LLM, a list of the ten most critical vulnerabilities that affect LLMs. Each vulnerability is accompanied by examples, prevention tips, attack scenarios, and references. Let’s dive in.
- Prompt Injection
- Insecure Output Handling
- Training Data Poisoning
- Model Denial of Service
- Supply Chain Vulnerabilities
- Sensitive Information Disclosure
- Insecure Plugin Design
- Excessive Agency
- Overreliance
- Model Theft
01. Prompt Injection
Prompt injection happens when the LLM can be manipulated to behave as the attacker wishes, bypassing filters or security mechanisms that limit the model's execution. OWASP further distinguishes between direct and indirect prompt injection.
In direct prompt injection, the attacker overwrites or reveals the model's system prompt. This is also known as jailbreaking, and it allows the attacker to bypass rules, limitations, and security measures set by the operator of the LLM.
In indirect prompt injection, the attacker directs the LLM to an external source, such as a webpage, that embeds adversarial instructions. In this way, the attacker can hijack the model and manipulate its behavior.
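As a simple illustration, a naive keyword filter can flag some common direct injection phrases before input reaches the model. This is a hypothetical sketch, not part of OWASP's guidance: pattern matching is trivially bypassed and should only ever be one layer in a defense-in-depth strategy.

```python
import re

# Hypothetical heuristic filter: flags common direct prompt-injection phrases.
# Keyword matching is easy to evade, so treat this as one defensive layer,
# never a complete mitigation.
INJECTION_PATTERNS = [
    r"ignore (all )?(previous|prior) instructions",
    r"reveal (your )?system prompt",
    r"you are now",
]

def looks_like_injection(user_input: str) -> bool:
    """Return True if the input matches a known injection phrase."""
    lowered = user_input.lower()
    return any(re.search(p, lowered) for p in INJECTION_PATTERNS)
```

A flagged input might be rejected outright, logged for review, or answered with a canned refusal, depending on the application's risk tolerance.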
02. Insecure Output Handling
As with any untrusted data source (e.g. user input), the output of LLMs should always be scrutinized and validated before being passed to other application components. Failing to do so can result in a variety of injection attacks, such as cross-site scripting or remote code execution, depending on the component. For example, if the LLM output is passed to a system shell to execute a command, improper validation and escaping can give an attacker the ability to execute arbitrary commands on the system by having the model generate unsafe output.
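A minimal sketch of safer output handling, assuming a hypothetical scenario where the LLM suggests shell commands: the output is tokenized, checked against an allowlist, and executed without a shell, so metacharacters are never interpreted.

```python
import shlex
import subprocess

ALLOWED_COMMANDS = {"ls", "date", "whoami"}  # hypothetical allowlist

def run_model_command(llm_output: str) -> str:
    """Execute a command suggested by the LLM only if it passes validation.

    The output is tokenized with shlex and checked against an allowlist;
    passing a token list (no shell) prevents metacharacter interpretation.
    """
    tokens = shlex.split(llm_output)
    if not tokens or tokens[0] not in ALLOWED_COMMANDS:
        raise ValueError(f"command not allowed: {llm_output!r}")
    result = subprocess.run(tokens, capture_output=True, text=True, check=True)
    return result.stdout
```

Note that an injection like `ls; rm -rf /` fails the allowlist check, because after tokenization the first token is `ls;`, which is not an allowed command.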
03. Training Data Poisoning
Data poisoning is a vulnerability that can occur during the training or fine-tuning stage. It refers to tampering with the training data ("poisoning") with the objective of compromising the model's behavior, resulting in performance degradation or the introduction of biases, falsehoods, or toxicity.
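One basic defensive measure is to verify data integrity before training. The sketch below, using a hypothetical manifest of trusted record hashes, keeps only records that match the manifest, so tampered or injected examples are caught before they reach the training pipeline:

```python
import hashlib

def sha256(text: str) -> str:
    """Hash a training record for comparison against the trusted manifest."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def verify_dataset(records: list[str], manifest: set[str]) -> list[str]:
    """Return only the records whose hash appears in the trusted manifest."""
    return [r for r in records if sha256(r) in manifest]
```

This only protects against tampering after the manifest was created; vetting the original data sources is a separate problem.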
04. Model Denial of Service
LLMs can be quite resource-intensive, and a malicious user can leverage this to cause the operator to incur extreme resource usage and costs, potentially leading to a collapse of the service. For example, an attacker can flood the system with requests, craft expensive queries (for example, very long inputs), or induce the LLM to perform costly chains of tasks.
05. Supply Chain Vulnerabilities
Supply chain vulnerabilities refer to risks introduced by unvetted dependencies on third-party resources. In software, this is typically represented by dependencies on third-party libraries that can potentially be compromised, introducing unwanted behavior. This issue also exists for LLMs and machine learning in general, particularly regarding the usage of third-party pre-trained models or datasets, which are susceptible to tampering and poisoning.
This type of attack was recently demonstrated by Mithril Security, which distributed through the Hugging Face Hub an open-source LLM poisoned to provide false information on a specific task while maintaining normal behavior on all others.
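A basic supply-chain precaution is to pin third-party artifacts to a published checksum. The sketch below (the file layout is hypothetical; the expected digest would come from the model provider) refuses to load a model file whose SHA-256 digest does not match the pinned value:

```python
import hashlib
from pathlib import Path

def file_sha256(path: Path) -> str:
    """Compute the SHA-256 digest of a file, streaming to bound memory use."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def load_verified_model(path: Path, expected_digest: str) -> bytes:
    """Read the model file only if its digest matches the pinned value."""
    actual = file_sha256(path)
    if actual != expected_digest:
        raise ValueError(f"digest mismatch for {path}: {actual}")
    return path.read_bytes()
```

Checksums catch tampering in transit or on the hub, but not a provider who publishes a poisoned model in the first place; provenance and vendor vetting remain necessary.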
06. Sensitive Information Disclosure
LLMs can inadvertently leak confidential information. Such confidential data may have been memorized during training or fine-tuning, provided in the system prompt, or accessible to the LLM from internal sources that are not meant to be exposed. Since LLM output is generally unpredictable, LLM operators should both prevent confidential data from being exposed to the LLM and check the model output for sensitive information before passing it on.
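The output-side control can be sketched as a redaction filter applied before the response leaves the system. The patterns below are illustrative only; production systems typically combine them with dedicated secret and PII scanners:

```python
import re

# Illustrative redaction patterns: an email address and a hypothetical
# "sk-..." API-key format. Real deployments need far broader coverage.
REDACTION_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "api_key": re.compile(r"\bsk-[A-Za-z0-9]{16,}\b"),
}

def redact(text: str) -> str:
    """Replace matches of each pattern with a labeled placeholder."""
    for label, pattern in REDACTION_PATTERNS.items():
        text = pattern.sub(f"[REDACTED {label}]", text)
    return text
```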
07. Insecure Plugin Design
In the context of LLMs, plugins are extensions that can be called automatically by the model to perform certain tasks. A notable example is ChatGPT plugins. Badly designed plugins give attackers an opportunity to bypass protections and cause undesired behavior, ranging from data exfiltration to code execution.
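A core principle of safer plugin design is to treat model-supplied arguments as untrusted input. This hypothetical weather plugin rejects unexpected keys and values instead of trusting whatever the model produced:

```python
# Hypothetical plugin: validates model-supplied arguments against a strict
# schema before acting on them, rather than trusting the LLM's output.
ALLOWED_CITIES = {"Paris", "London", "Rome"}

def weather_plugin(args: dict) -> str:
    """Reject unexpected argument keys and out-of-range values."""
    if set(args) != {"city"}:
        raise ValueError("unexpected arguments")
    city = args["city"]
    if not isinstance(city, str) or city not in ALLOWED_CITIES:
        raise ValueError(f"city not allowed: {city!r}")
    return f"Weather report for {city}"
```

The same idea generalizes: typed schemas, allowlists, and least-privilege credentials for whatever backend the plugin touches.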
08. Excessive Agency
Agency refers to the ability of the LLM to interface with and control other systems, which increases the attack surface. OWASP breaks excessive agency down into three categories: excessive functionality, excessive permissions, and excessive autonomy.
Excessive functionality describes the situation where the LLM is given access to functionality that is not needed for its operation and that can cause significant damage if exploited by an attacker, for example the ability to read and modify files.
Excessive permissions denotes the case in which the LLM has unintended permissions that allow it to access information or perform actions beyond what its task requires, for example an LLM that retrieves information from a dataset on which it also has write permission.
Excessive autonomy is the situation in which the LLM can take potentially destructive or high-impact actions without external control, for example a plugin that can send emails on behalf of the user without any confirmation.
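A common mitigation for excessive autonomy is a human-in-the-loop gate: high-impact actions proposed by the model run only after explicit approval. A minimal sketch, with hypothetical action names and approval callback:

```python
from typing import Callable

# Actions considered high-impact (hypothetical set for illustration).
HIGH_IMPACT_ACTIONS = {"send_email", "delete_file"}

def execute_action(action: str, approve: Callable[[str], bool]) -> str:
    """Run low-impact actions directly; gate high-impact ones on approval.

    `approve` stands in for any confirmation mechanism, e.g. a UI prompt
    shown to the user before the action is carried out.
    """
    if action in HIGH_IMPACT_ACTIONS and not approve(action):
        return f"blocked: {action} requires approval"
    return f"executed: {action}"
```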
09. Overreliance
Overreliance refers to the usage of LLMs without adequate oversight. It is well known that LLMs can generate inaccurate or inappropriate content, hallucinate, or produce incoherent responses. For this reason, their operation should be overseen, monitored, and validated. For example, code generated by an LLM may contain bugs or vulnerabilities, requiring a review process to guarantee safe and correct operation.
10. Model Theft
Model theft refers to the possibility of proprietary LLM models being stolen, whether by gaining physical possession of them or by copying their algorithms or weights. As valuable intellectual property, the loss of an LLM model can cause significant economic loss, erosion of competitive advantage, or exposure of confidential information contained within the model.
Like any other software, LLMs can be affected by security vulnerabilities, and such issues need to be assessed both before and after deployment. The OWASP Top 10 for LLM serves as a valuable guide for developers, data scientists, and security practitioners to understand the most critical of these issues.
Given the increasing reliance on LLMs in various applications, it is essential to be aware of these vulnerabilities and take preventive measures to mitigate the risks. By following the recommendations provided in the Top 10 for LLM, organizations can better protect their systems and data from potential attacks and ensure the security and reliability of their LLM-based applications.
In this spirit, Giskard can assist organizations and data scientists in ensuring that their LLM and machine learning systems behave as expected. This can be achieved through a preventive automated vulnerability scan, and through the implementation of systematic and continuous testing of AI models.