A Crash Course into Attacking AI [2/4]: What can AI attacks do?

By: Tania Sadhani
Published on: 04/08/2024

In the second installment of our series on attacking AI, we examine the goals of AI attacks through the 3D Model: Deceive, Disrupt, and Disclose. Deceive techniques manipulate an AI system into producing incorrect outputs, exploiting the complexity and limited explainability of modern models. Disrupt techniques interfere with a system's normal operation, often through data poisoning, degrading performance or forcing harmful outputs. Disclose techniques extract sensitive information, exploiting a model's tendency to overfit in order to recover training data or proprietary model details.

Key case studies include malware crafted to bypass AI-based detection, Microsoft's chatbot Tay being manipulated into posting offensive language, and researchers extracting training images from deployed models. Although AI is being integrated rapidly into production systems, traditional cybersecurity methods fall short against these attacks because of AI's stochastic nature, so specialized AI security measures are needed. The article closes with a call for greater awareness and interdisciplinary approaches to strengthen AI security practice.
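For readers who want a concrete picture of what a Deceive-style attack can look like, below is a minimal sketch of the fast gradient sign method (FGSM), one well-known way to craft adversarial inputs that flip a classifier's prediction. This is not a technique from the article itself but an illustrative example: it assumes a PyTorch image classifier `model` that maps pixel values in [0, 1] to class logits, and the `epsilon` step size of 0.03 is an arbitrary choice.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, image, label, epsilon=0.03):
    """Craft an adversarial example with the fast gradient sign method.

    The input is nudged by +/- epsilon per pixel in the direction that
    increases the loss, so the unmodified model misclassifies an image
    that still looks unchanged to a human.
    """
    # Track gradients with respect to the input, not the model weights.
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    # Step each pixel by the sign of its gradient, then keep pixels valid.
    adv_image = image + epsilon * image.grad.sign()
    return adv_image.clamp(0, 1).detach()
```

Given a batch of correctly classified images, `fgsm_attack(model, images, labels)` typically yields inputs on which `model(adv).argmax(1)` no longer matches the true labels, which is exactly the "incorrect outputs from a seemingly unchanged input" behavior the Deceive category describes.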

AI Security