Navigating LLM Threats: Detecting Prompt Injections and Jailbreaks

Every time a new technology emerges, some people will inevitably attempt to use it maliciously – language models are no exception. Recently, we have seen numerous examples of attacks on large language models (LLMs). These attacks elicit restricted behavior from language models, such as producing hate speech, creating misinformation, leaking private information, or misleading the model into performing a different task than intended.

In this hands-on workshop, we will examine differences in navigating natural and algorithmic adversarial attacks, concentrating on prompt injections and jailbreaks. We first explore a few examples of how such attacks are generated via state-of-the-art cipher and language suffix approaches. We then focus on adaptive strategies for detecting these attacks in LLM-based applications using LangKit, our open-source package for feature extraction for LLM and NLP applications, with practical examples and limitation considerations. Namely, semantic similarity against a set of known attacks and LLM-based proactive detection techniques.two approaches for detecting the attacks with LangKit.

By the end of this workshop, attendees will understand:

What LLM prompt injections and jailbreaks are and measures to mitigate those attacks
How to use semantic similarity techniques to verify incoming prompts against a set of known jailbreak and prompt injection attacks
How to use LLM-based proactive detection techniques to preemptively detect prompt injection attacks

Speakers
Felipe Adachi, Applied Scientist, WhyLabs

Felipe is a Sr. Data Scientist at WhyLabs. He is a core contributor to whylogs, an open-source data logging library, and focuses on writing technical content and expanding the whylogs library in order to make AI more accessible, robust, and responsible. Previously, Felipe was an AI Researcher at WEG, where he researched and deployed Natural Language Processing approaches to extract knowledge from textual information about electric machinery. He also received his Master’s in Electronic Systems Engineering from Universidade Federal de Santa Catarina with research focused on developing and deploying fault detection strategies based on machine learning for unmanned underwater vehicles.

LinkedIn Profile: https://br.linkedin.com/in/felipe-adachi-1450b457/en

Bernease Herman, Sr. Data Scientist, WhyLabs

Bernease is a Sr. Data Scientist at WhyLabs. At WhyLabs, she is building model and data monitoring solutions using approximate statistics techniques. Earlier in her career, Bernease built ML-driven solutions for inventory planning at Amazon and conducted quantitative research at Morgan Stanley. Her ongoing academic research focuses on evaluation metrics for machine learning and LLMs. Bernease serves as faculty for the University of Washington Master’s Program in Data Science program and as chair of the Rigorous Evaluation for AI Systems (REAIS) workshop series. She has published work in top machine learning conferences and workshops such as NeurIPS, ICLR, and FAccT. She is a PhD student at the University of Washington and holds a Bachelor’s degree in mathematics and statistics from the University of Michigan.

2 Likes

Hello Gio,

Interesting!! looking forward to listen her ideas.

Regards
DP

3 Likes

I was not able to find a zoom link or something such that I should use to join this talk. Where might I find one?

2 Likes

well, actually I’m new in the community, but I think you should use the link above when the meet start.

1 Like

hello @Gotham , @SponeBob
Reserve a spot here.

1 Like

Thanks a bunch. @giovanni.lignarolo

1 Like

I’m a beginner of this topic. Anybody can help with this error, I get this when I try to run the second example.

1 Like

Kindly make a new post mentioning your issue and choosing specific specialisation, course, week and assignment name with the error you mentioned.

Posting your query comment on this thread will be misleading as you won’t get mentor response better.

Regards
DP

Hi @Deepti_Prasad ,
@raya22’s error is from the notebook shared with registered participants during this event yesterday, not from a specific course.
Here is the new topic.

1 Like

Hi lukmanaj,

One can choose short course if it was a new course or AI discussion as created by the learner now.

I am not sure of the solution but seems like all the objectives are not in correct place, giving a value error of error 404.

1 Like

All the error saying that the model: text-davinci-003 has been deprecated. I’ve tried to another model, the error still there, does it have something wrong with my account?

1 Like

No then this could be technical issue. I am not sure. Let me tag someone from the QA. Team

@chris.favila could this be issue looked upon? I wasn’t sure who would address this course issue.

Regards
DP