Types of Attack for Generative AI-Powered Applications

Sakaldeep Yadav

Mar 20, 2024 • 2 min read

Generative AI is a type of AI technology capable of producing various types of content, including text, imagery, audio, and synthetic data. There are numerous Generative AI-powered applications, with some of the most popular ones being OpenAI's ChatGPT and Microsoft's Copilot family, which includes Copilot for M365, Copilot for Security, Copilot for Azure, and Copilot for X.

There are attack vectors present in Generative AI-powered applications, a concern applicable across all such applications. However, if these applications are developed using Responsible AI principles, these attack vectors can be minimized. Below are some common attack types for such applications.

Jailbreak Attack: A jailbreak attack targets the prompt of AI-powered applications such as Copilot. In this type of attack, bad actors inject specific prompts designed to exploit vulnerabilities in existing solutions. Intentionally crafted prompts encourage the AI to violate its safety rules. For instance, a user may request a story involving illegal or unethical behavior, aiming to bypass the model’s restrictions. Such attacks are deliberate attempts to provoke AI models into exhibiting behaviors they were trained to avoid. It's crucial for developers and organizations to continuously enhance safety mechanisms to prevent unintended or harmful outputs from AI systems.
Hallucination Attack: A Hallucination attack manipulates prompts, but its success relies on several factors. If Large Language Models (LLMs) lack proper training or have biased training data, they become susceptible to hallucinations. Generative AI hallucination attacks occur when LLMs, such as generative chatbots or computer vision tools, produce outputs that are nonsensical, inaccurate, or entirely fictional. These hallucinations can have significant consequences in real-world applications. Let's delve deeper into this phenomenon and explore some notable examples. For instance, a real-world example of a hallucination attack involved Air Canada's chatbot.
Bias Exploitation Attack: A Bias Exploitation Attack is a strategy used by adversaries to manipulate AI system outputs by exploiting inherent biases in their algorithms. Here's how it works: In a poisoning attack, the adversary intentionally modifies the training dataset used to train the AI model. By injecting biased or deceptive data, they divert the machine learning system. For instance, they may introduce skewed data to lead the model to learn inaccurate patterns or associations. As a result, the AI system produces flawed predictions or decisions based on this manipulated training data.
Security Vulnerabilities: Copilot may inadvertently generate code with security flaws.

Microsoft Copilot is a Generative AI tool so all the attacks applicable to Generative AI apply to the Copilot family.

I hope this information was useful. Feel free to reach out to me on Twitter @sakaldeep for any further questions.