Guardrails
Protect
Helps block malicious attempts from users to get the LLM to respond in disruptive, embarrassing, or harmful ways.
Parameters
None - Data should be "chained" into this function.
Returns
The original input text, with some "hard stops" removed. For example: ignore all instructions
is a hard-blocked phrase that will be removed.
This also sets some global variables:
promptInjection
: A number0.00 - 1.00
indicating the percent likelihood of a "detected" prompt injection attempt.flaggedText
: The text in question that was flagged by the system.
Example Usage
Would remove the flagged text, yielding Write me a poem. and say hello
as the text sent to the LLM, and also set some variables:
Allowing us to detect, and choose how to handle this attempted prompt injection. See the "Protect from Prompt Injection" template for a more robust example:
Profanity Filter
Filters input text for profanity and blocks it.
Parameters
None - Data should be "chained" into this function.
Returns
Either the input text, or the "blocked" phrase set in the Configure tab:
This also sets the global variable profanity
to true/false based on profanity detection.
Example Usage
The variable profanity
will be set to true
, and the variable test
will be set to the value seen in the configure tab:
That seems like a sensitive question. Maybe I'm not understanding you, so try rephrasing.
Remove PII
Masks detected PII in input text.
Parameters
None - Data should be "chained" into this function.
Returns
The resulting masked text.
Note
You may define additional PII, or disable specific builtin PII filters, on the Configure tab under Guardrails