Introduction to Guardrails
Guardrails are an essential component of any customer-facing LLM system. They let you automatically monitor LLM inputs and outputs, so that your AI applications remain safe, ethical, and aligned with your specific use case.
In this guide, we’ll explore how to implement guardrails using Bynesoft’s API. We’ll create a simple guardrail that prevents users from discussing political topics with the model. This guide builds upon the concepts introduced in the previous Agents and RAG guides.
Why Use Guardrails?
Guardrails offer several benefits for LLM applications:
- Safety: Prevent the model from producing undesirable or harmful content.
- Compliance: Ensure your AI system adheres to legal and ethical guidelines.
- Consistency: Maintain a consistent user experience by filtering out off-topic queries.
- Hallucination prevention: Reduce the likelihood of the model generating false or misleading information.
- Customization: Tailor the model’s behavior to your specific use case and audience.
Prerequisites
- Completion of the Agents guide
- A Bynesoft API key
- A Python environment with the requests and pandas libraries installed
What’s in this Guide
In this guide, we will:
- Create a knowledge base for storing forbidden prompts
- Generate example banned phrases
- Ingest the banned phrases into the knowledge base
- Create and attach a guardrail to an existing agent
- Test the guardrail with different queries
This guide is also available as a Colab workbook.
Implementing Guardrails
Step 1: Create a Knowledge Base for Forbidden Prompts
First, we’ll create a dedicated knowledge base of type “tech” to store our banned prompts:
import requests
KEY = 'YOUR_BYNE_KEY'
headers = {
    'X-API-Key': KEY
}
# You need an Agent ID from the previous Agents guide
AGENT_ID = 'YOUR_AGENT_ID'
# Create a new knowledge base for guardrails
knowledge_base_obj = requests.post('https://app.docs.bynesoft.com/api/knowledge-base/', headers=headers, json={
    "name": "_guardrails_KB",
    "type": "tech"
}).json()
knowledge_base_id = knowledge_base_obj['id']
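The snippet above reads `['id']` directly from the response, so a rejected API key or a failed request would surface as a bare `KeyError`. A minimal sketch of a guard follows. It assumes only what the code above shows, namely that a successful response is a JSON object with an `id` field; `require_id` is a hypothetical helper name, not part of the Bynesoft API.

```python
def require_id(payload, what="resource"):
    """Return payload['id'], or raise a descriptive error if it is missing.

    Assumption: a successful creation response is a JSON object
    that carries an 'id' field, as in the guide's snippet.
    """
    if not isinstance(payload, dict) or "id" not in payload:
        raise ValueError(f"Could not create {what}; unexpected response: {payload!r}")
    return payload["id"]
```

With this helper, the last line of the step could be written as `knowledge_base_id = require_id(knowledge_base_obj, "knowledge base")`.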
Step 2: Generate Example Banned Phrases
We’ll create a CSV file containing examples of banned political phrases:
import pandas as pd
phrases = ["Who should I vote for in the election?",
           "Who will be the next US president?",
           "Tell me about the 2024 General Election in the UK."]
document_name = 'banned_list.csv'
df = pd.DataFrame(data={
    'phrases': phrases
})
df.to_csv(document_name, index=False)
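Before uploading, it can be worth confirming that the file contains a single `phrases` column with one row per banned phrase. The sketch below reads the CSV back using only the standard library; `read_phrases` is a hypothetical helper, and the column name is taken from the DataFrame above.

```python
import csv

def read_phrases(path, column="phrases"):
    """Read one column of a CSV file and return its non-empty values."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        # Fail early if the expected header is not present.
        if column not in (reader.fieldnames or []):
            raise ValueError(f"{path} has no '{column}' column")
        return [row[column] for row in reader if row[column]]
```

After Step 2 has run, `read_phrases(document_name) == phrases` should hold if the file was written as intended.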
Step 3: Ingest Banned Phrases into the Knowledge Base
Now, we’ll upload and process the CSV file containing our banned phrases:
import json
mime_headers = {
    "Content-Type": "text/csv"
}
# Get upload link
link = requests.get('https://app.docs.bynesoft.com/api/connectors/local/s3-upload-links',
                    headers=headers,
                    params={
                        'kb': knowledge_base_id,
                        'fileName': [document_name]
                    })
link_uri = link.json()[document_name]
# Upload file (the with-block closes the file handle once the upload finishes)
with open(document_name, 'rb') as f:
    upload = requests.put(link_uri,
                          data=f,
                          headers=mime_headers)
# Create and trigger processing job
job_id = requests.post(f'https://app.docs.bynesoft.com/api/knowledge-base/{knowledge_base_id}/jobs',
                       headers=headers).json()
processing = requests.put(f'https://app.docs.bynesoft.com/api/knowledge-base/{knowledge_base_id}/jobs/{job_id}',
                          headers=headers,
                          data=json.dumps([{
                              "fileName": document_name,
                              "lastModified": "Wed Jul 3 2024",
                              "connector": "local"
                          }]))
trigger_status = requests.post(f'https://app.docs.bynesoft.com/api/knowledge-base/{knowledge_base_id}/jobs/{job_id}/trigger',
                               headers=headers)
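The job payload above hardcodes `lastModified`. One option is to derive the value from the file itself. The sketch below assumes that the API accepts the same "Wed Jul 3 2024" style used above; this guide does not state which date formats are accepted, so treat that as an assumption. Both function names are hypothetical.

```python
import os
from datetime import datetime

def format_last_modified(dt):
    """Format a datetime like 'Wed Jul 3 2024' (day without zero padding).

    %a and %b depend on the process locale; the default C locale
    yields English abbreviations.
    """
    return f"{dt:%a %b} {dt.day} {dt:%Y}"

def file_last_modified(path):
    """Return the file's modification time in the format above."""
    return format_last_modified(datetime.fromtimestamp(os.path.getmtime(path)))
```

The payload entry would then read `"lastModified": file_last_modified(document_name)`.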
Step 4: Create and Attach a Guardrail
Now, we’ll create a guardrail and attach it to our existing agent:
guardrail_spec = {
    "name": "PoliticalQuery",
    "description": "Forbids the user from asking the model to generate writing on political topics.",
    "sourceFabric": {
        "name": "promptInjectionValidator",
        "config": {
            "kb": knowledge_base_id,
            "cosineSimilarityScoreThreshold": 0.8
        }
    },
    "responseBlocking": True
}
gr_obj = requests.post('https://app.docs.bynesoft.com/api/users/guard-rails', json=guardrail_spec, headers=headers).json()
gr_id = gr_obj['id']
# Attach guardrail to the agent
requests.patch(f'https://app.docs.bynesoft.com/api/users/agents/{AGENT_ID}', json={
    "guardRails": [
        gr_id
    ]
}, headers=headers).json()
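The `cosineSimilarityScoreThreshold` of 0.8 in the guardrail spec suggests that a query is compared against the banned phrases by cosine similarity. The sketch below only illustrates what such a threshold means in general, using toy two-dimensional vectors. How the service embeds text, and whether it compares with `>=` or `>`, are not documented here, so both are assumptions, as are the function names.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def is_match(query_vec, banned_vecs, threshold=0.8):
    """True if the query is at least `threshold`-similar to any banned vector.

    Assumption: the comparison is inclusive (>=).
    """
    return any(cosine_similarity(query_vec, v) >= threshold for v in banned_vecs)

banned = [[1.0, 0.0]]                 # stand-in for an embedded banned phrase
print(is_match([0.9, 0.1], banned))   # nearly the same direction
print(is_match([0.5, 0.5], banned))   # 45 degrees apart, similarity about 0.71
```

Under this reading, lowering the threshold makes the guardrail catch looser paraphrases at the cost of more false positives, and raising it does the opposite.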
Testing the Guardrail
Let’s test our guardrail with two different queries:
Allowed Query
params = {
    "q": "Where do polar bears live?",
    "withReference": True
}
uri = f'https://app.docs.bynesoft.com/api/ask/agents/{AGENT_ID}/query'
body = {}
resp = requests.post(uri, json=body, headers=headers, params=params).json()
print(resp)
This query should be allowed and return a normal response.
Blocked Query
params = {
    "q": 'What do you think about the 2024 US Presidential Election Candidates?',
}
uri = f'https://app.docs.bynesoft.com/api/ask/agents/{AGENT_ID}/query'
body = {}
resp = requests.post(uri, json=body, headers=headers, params=params).json()
print(resp)
This query should be blocked by our guardrail, and you’ll receive a response indicating that the guardrail was triggered.
Understanding the Output
When a guardrail is triggered, the response will include a triggeredGuardRails field with details about the violation:
{
    'queryId': '1b7dc068-acb5-452e-bc12-f571ca1eed68',
    'conversationId': '3c3b2eda-e392-4235-81e3-22bf804abe26',
    'triggeredGuardRails': [{
        'id': 'e918a92a-7457-4156-8f00-ec9e79ab2a9d',
        'level': 'ERROR',
        'content': {
            'triggerSource': 'What do you think about the 2024 US Presidential Election Candidates?',
            'message': 'Prompt injection detected'
        }
    }]
}
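This output indicates that the guardrail blocked the political query. The triggerSource field echoes the offending query, and the message “Prompt injection detected” matches the promptInjectionValidator source configured in Step 4.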
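In application code you will usually want to branch on whether a guardrail fired rather than print the raw response. The sketch below relies only on the fields visible in the sample output above. It assumes that an allowed response either omits `triggeredGuardRails` or leaves it empty, and that `'ERROR'` is the level of interest; neither is confirmed by this guide. The helper names are hypothetical.

```python
def triggered_guardrails(resp):
    """Return the guardrails reported in a query response, or [] if there are none."""
    return resp.get("triggeredGuardRails") or []

def blocking_messages(resp, level="ERROR"):
    """Collect the messages of triggered guardrails at the given level."""
    return [
        g.get("content", {}).get("message", "")
        for g in triggered_guardrails(resp)
        if g.get("level") == level
    ]

# Trimmed copy of the sample response shown above.
sample = {
    "queryId": "1b7dc068-acb5-452e-bc12-f571ca1eed68",
    "triggeredGuardRails": [{
        "id": "e918a92a-7457-4156-8f00-ec9e79ab2a9d",
        "level": "ERROR",
        "content": {"message": "Prompt injection detected"},
    }],
}
if blocking_messages(sample):
    print("Blocked:", "; ".join(blocking_messages(sample)))
```

Calling `blocking_messages(resp)` on the response from either test query gives an empty list when nothing was triggered, so the same check covers both cases.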
Conclusion
In this guide, we’ve learned how to implement guardrails to enhance the safety and reliability of your LLM applications. By creating a knowledge base of forbidden prompts and attaching a guardrail to your agent, you can effectively filter out unwanted queries and ensure your AI system behaves according to your specifications.
Guardrails are a powerful tool for maintaining control over your AI applications, and they can be customized to suit a wide range of use cases beyond political content moderation. As you continue to develop your AI systems, consider implementing guardrails to address specific safety, ethical, or compliance requirements for your application.
For more advanced guardrailing options, explore the API Reference.