Every time a new technology emerges, some people will inevitably attempt to use it maliciously – language models are no exception. Recently, we have seen numerous examples of attacks on large language models (LLMs). These attacks elicit restricted behavior from language models, such as producing hate speech, creating misinformation, leaking private information, or misleading the model into performing a different task than intended.
In this hands-on workshop, we will examine differences in navigating natural and algorithmic adversarial attacks, concentrating on prompt injections and jailbreaks. We first explore a few examples of how such attacks are generated via state-of-the-art cipher and language suffix approaches. We then focus on adaptive strategies for detecting these attacks in LLM-based applications using LangKit, our open-source package for feature extraction for LLM and NLP applications, with practical examples and limitation considerations. Namely, semantic similarity against a set of known attacks and LLM-based proactive detection techniques.two approaches for detecting the attacks with LangKit.
By the end of this workshop, attendees will understand:
What LLM prompt injections and jailbreaks are and measures to mitigate those attacks
How to use semantic similarity techniques to verify incoming prompts against a set of known jailbreak and prompt injection attacks
How to use LLM-based proactive detection techniques to preemptively detect prompt injection attacks
Felipe Adachi, Applied Scientist, WhyLabs
Felipe is a Sr. Data Scientist at WhyLabs. He is a core contributor to whylogs, an open-source data logging library, and focuses on writing technical content and expanding the whylogs library in order to make AI more accessible, robust, and responsible. Previously, Felipe was an AI Researcher at WEG, where he researched and deployed Natural Language Processing approaches to extract knowledge from textual information about electric machinery. He also received his Master’s in Electronic Systems Engineering from Universidade Federal de Santa Catarina with research focused on developing and deploying fault detection strategies based on machine learning for unmanned underwater vehicles.
LinkedIn Profile: https://br.linkedin.com/in/felipe-adachi-1450b457/en
Bernease Herman, Sr. Data Scientist, WhyLabs
Bernease is a Sr. Data Scientist at WhyLabs. At WhyLabs, she is building model and data monitoring solutions using approximate statistics techniques. Earlier in her career, Bernease built ML-driven solutions for inventory planning at Amazon and conducted quantitative research at Morgan Stanley. Her ongoing academic research focuses on evaluation metrics for machine learning and LLMs. Bernease serves as faculty for the University of Washington Master’s Program in Data Science program and as chair of the Rigorous Evaluation for AI Systems (REAIS) workshop series. She has published work in top machine learning conferences and workshops such as NeurIPS, ICLR, and FAccT. She is a PhD student at the University of Washington and holds a Bachelor’s degree in mathematics and statistics from the University of Michigan.