Navigating LLM Threats: Detecting Prompt Injections and Jailbreaks

giovanni.lignarolo · December 26, 2023, 4:08pm

Every time a new technology emerges, some people will inevitably attempt to use it maliciously – language models are no exception. Recently, we have seen numerous examples of attacks on large language models (LLMs). These attacks elicit restricted behavior from language models, such as producing hate speech, creating misinformation, leaking private information, or misleading the model into performing a different task than intended.

In this hands-on workshop, we will examine differences in navigating natural and algorithmic adversarial attacks, concentrating on prompt injections and jailbreaks. We first explore a few examples of how such attacks are generated via state-of-the-art cipher and language suffix approaches. We then focus on adaptive strategies for detecting these attacks in LLM-based applications using LangKit, our open-source package for feature extraction for LLM and NLP applications, with practical examples and limitation considerations. Namely, semantic similarity against a set of known attacks and LLM-based proactive detection techniques.two approaches for detecting the attacks with LangKit.

By the end of this workshop, attendees will understand:

What LLM prompt injections and jailbreaks are and measures to mitigate those attacks
How to use semantic similarity techniques to verify incoming prompts against a set of known jailbreak and prompt injection attacks
How to use LLM-based proactive detection techniques to preemptively detect prompt injection attacks

Speakers
Felipe Adachi, Applied Scientist, WhyLabs

Felipe is a Sr. Data Scientist at WhyLabs. He is a core contributor to whylogs, an open-source data logging library, and focuses on writing technical content and expanding the whylogs library in order to make AI more accessible, robust, and responsible. Previously, Felipe was an AI Researcher at WEG, where he researched and deployed Natural Language Processing approaches to extract knowledge from textual information about electric machinery. He also received his Master’s in Electronic Systems Engineering from Universidade Federal de Santa Catarina with research focused on developing and deploying fault detection strategies based on machine learning for unmanned underwater vehicles.

LinkedIn Profile: https://br.linkedin.com/in/felipe-adachi-1450b457/en

Bernease Herman, Sr. Data Scientist, WhyLabs

Bernease is a Sr. Data Scientist at WhyLabs. At WhyLabs, she is building model and data monitoring solutions using approximate statistics techniques. Earlier in her career, Bernease built ML-driven solutions for inventory planning at Amazon and conducted quantitative research at Morgan Stanley. Her ongoing academic research focuses on evaluation metrics for machine learning and LLMs. Bernease serves as faculty for the University of Washington Master’s Program in Data Science program and as chair of the Rigorous Evaluation for AI Systems (REAIS) workshop series. She has published work in top machine learning conferences and workshops such as NeurIPS, ICLR, and FAccT. She is a PhD student at the University of Washington and holds a Bachelor’s degree in mathematics and statistics from the University of Michigan.

Deepti_Prasad · December 26, 2023, 5:50pm

Hello Gio,

Interesting!! looking forward to listen her ideas.

Regards
DP

Gotham · January 8, 2024, 8:57pm

I was not able to find a zoom link or something such that I should use to join this talk. Where might I find one?

SponeBob · January 9, 2024, 2:41pm

well, actually I’m new in the community, but I think you should use the link above when the meet start.

giovanni.lignarolo · January 9, 2024, 3:27pm

hello @Gotham , @SponeBob
Reserve a spot here.

SponeBob · January 9, 2024, 3:46pm

Thanks a bunch. @giovanni.lignarolo

raya22 · January 10, 2024, 1:28am

I’m a beginner of this topic. Anybody can help with this error, I get this when I try to run the second example.

Deepti_Prasad · January 10, 2024, 6:47am

Kindly make a new post mentioning your issue and choosing specific specialisation, course, week and assignment name with the error you mentioned.

Posting your query comment on this thread will be misleading as you won’t get mentor response better.

Regards
DP

lukmanaj · January 10, 2024, 2:38pm

Hi @Deepti_Prasad ,
@raya22’s error is from the notebook shared with registered participants during this event yesterday, not from a specific course.
Here is the new topic.

Deepti_Prasad · January 10, 2024, 2:43pm

Hi lukmanaj,

One can choose short course if it was a new course or AI discussion as created by the learner now.

I am not sure of the solution but seems like all the objectives are not in correct place, giving a value error of error 404.

raya22 · January 10, 2024, 4:14pm

All the error saying that the model: text-davinci-003 has been deprecated. I’ve tried to another model, the error still there, does it have something wrong with my account?

Deepti_Prasad · January 10, 2024, 4:57pm

No then this could be technical issue. I am not sure. Let me tag someone from the QA. Team

@chris.favila could this be issue looked upon? I wasn’t sure who would address this course issue.

Regards
DP

Topic		Replies	Views
🔴 New short course built in collaboration with Giskard: Red Teaming LLM Applications News and Announcements short-course , dl-ai-learning-platform	3	292	April 8, 2024
Any luck with preventing prompt injection based on setup outlined in the course? ChatGPT Prompt Engineering for Developers	2	122	June 2, 2023
Prompt based task training Generative AI with Large Language Models week-module-2	2	517	July 11, 2023
[AI Projects] LLMs Exploits Events	3	1031	March 27, 2024
Prompt injections in Guidelines ChatGPT Prompt Engineering for Developers	6	275	May 7, 2023

Navigating LLM Threats: Detecting Prompt Injections and Jailbreaks

Related topics