Skip to main content
Roger Lam

Guardrails for LLM Chatbots (March 2024)

LLMs are great at generating text. Trained on the corpus of the internet, fine-tuned to respond in a human-centric manner. LLMs from OpenAI, Google, Anthropic, and more are battling to convince people to use their models over others. Open source models from Meta and Mistral give an alternative to people that want to host their own models too.

The use case for chatbots is compelling. As the Turing Test and Westworld proposed, "If you can't tell, does it matter?". Klarna said it handles 2/3rds of customer support tasks with chatbots with similar satisfaction ratings to human agents.

Throwing an LLM into your app also doesn't seem like the best approach. A Chevy dealership got pwned to promote its competitor. Air Canada was ordered to be responsible for hallucinated responses around it's bereavement policy.

Call it automation or 2023's word of the year, enshittification, companies are looking for ways to cut costs and increase efficiency. Gartner predicts that chatbots will be the primary customer service channel for roughly a quarter of organizations by 2027.

Whether it's a chatbot, a recommendation system, or a content generator, the guardrails are the same. Left as an afterthought and you'll be in the news for the wrong reasons.

Below is a proposal I did for prototyping guardrails around LLM chatbots.

Overview

Gartner predicts that chatbots will be the primary customer service channel for roughly a quarter of organizations by 2027[1].

While LLMs can produce higher quality conversational interactions, the inherent generative nature remains a risk.

Our goal is to create a reliable and robust LLM-powered chatbot with the latest safety and guardrail tools and methodologies. Additionally, we want to maintain a high level of transparency and modularity to monitor, debug, and improve the system over time.

Core Components

Base model

We lean on foundation models with higher levels of safety and alignment[2]. Our implementation will also support multiple models for our own domain-specific benchmarking and the ability to fallback if there are external provider outages.

Framework

Nvidia’s NeMo-Guardrails is an open-source toolkit for adding programmable guardrails to LLM-based conversational systems. The built-in guardrails allow us to quickly get started with checks - with, for example, jailbreak detection and input/output moderation. While NeMo relies on using LLM calls to steer dialog and perform most of the guardrail checks, it can also integrate other endpoints, APIs, and approaches.

External guardrails to add
Other considerations

Evaluation

When building and deploying LLM-based chatbots, one aspect that shouldn’t be overlooked is evaluation. How do the different layers of guardrails impact the reliability of our chatbot? What can we test ahead of time? When we find a gap or vulnerability, how confident are we in fixing it and not breaking something else?

NeMo provides an evaluation tool and red-teaming interface to help streamline evaluation.

We also need to evaluate the chatbot based on performance - including accuracy, fluency, coherence and relevance. This will include automated and human evaluation. Creating the test set and scenarios, continuously evaluating, monitoring and improving is necessary to uphold the highest standards for customer satisfaction.

Every domain vertical will require its own test set. Each customer will require a further specialization. While it presents additional challenges, it’s also an opportunity to provide the most robust and reliable system to our customers.

Monitoring

On the chat level, all conversations will be accessible through an admin panel. Future iterations can include customer service representatives joining the conversation as needed.

Internally, all LLM calls will be instrumented and monitored. We know some major LLM providers have inconsistent response times and chaining multiple guardrails can lead to long latency. This will be a trade-off we will continually monitor.

Timeline

For the prototype, we’re aiming for a 6-8 week release, running in 2 week sprints.

Roughly, we want to spend 50% on building the chatbot and 50% on evaluation and monitoring.

Out of Scope