The landscape of LLM guardrails: intervention levels and techniques
An introduction to LLM guardrails
The capacity of the latest Large Language Models (LLMs) to process and produce highly coherent, human-like text opens up a wide range of potential applications, such as content creation for blogs and marketing, customer service chatbots, education and e-learning, medical assistance, and legal support. Using LLM-based chatbots also carries risks. Many incidents have recently been reported, including chatbots agreeing to sell a car for $1, guaranteeing airplane ticket discounts, using offensive language, providing false information, and assisting with unethical user requests such as how to build Molotov cocktails. Guardrails that ensure safety and reliability become even more critical once LLM-based applications are put into production for public use.