AI Guardrails by Zapier adds AI safety, compliance, and detection checks to your Zapier workflows. You can analyze sentiment, detect prompt attacks and toxicity, and find personally identifiable information (PII). This lets you check an AI model's output before using it in your workflow.
Prerequisites
To use AI Guardrails by Zapier, you need:
- A Zapier account.
About AI Guardrails by Zapier
AI Guardrails actions use two types of AI technology:
- Machine learning (ML) actions are powered by AWS Comprehend, Amazon's natural language processing service.
- Large language models (LLM) actions are powered by Amazon Bedrock, Amazon's managed generative AI platform.
ML actions
| Action | What it does |
|---|---|
| Detect PII | Scans text for personally identifiable information such as names, addresses, SSNs, credit card numbers, and more |
| Detect Sentiment | Classifies text sentiment as positive, negative, neutral, or mixed with confidence scores |
| Detect Toxicity | Flags hate speech, threats, insults, and other harmful content |
ML actions return results with confidence scores indicating how certain the system is about each detection. For PII detection, the model can identify over 30 types of personally identifiable information, including names, addresses, financial account numbers, and government IDs.
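To act on those confidence scores, a downstream step can redact only the entities it is sufficiently sure about. The sketch below assumes the detection result is a list of entities shaped like AWS Comprehend's DetectPiiEntities response (dicts with `Type`, `Score`, `BeginOffset`, and `EndOffset`); the threshold value is illustrative.

```python
# Sketch: redact PII entities above a confidence threshold.
# Assumes entities shaped like AWS Comprehend's DetectPiiEntities
# response: dicts with "Type", "Score", "BeginOffset", "EndOffset".

def redact(text, entities, threshold=0.8):
    """Replace high-confidence PII spans with [TYPE] placeholders."""
    # Work right to left so earlier offsets stay valid after each edit.
    for ent in sorted(entities, key=lambda e: e["BeginOffset"], reverse=True):
        if ent["Score"] >= threshold:
            text = (
                text[: ent["BeginOffset"]]
                + f'[{ent["Type"]}]'
                + text[ent["EndOffset"] :]
            )
    return text

text = "Contact Jane Doe at jane@example.com"
entities = [
    {"Type": "NAME", "Score": 0.99, "BeginOffset": 8, "EndOffset": 16},
    {"Type": "EMAIL", "Score": 0.97, "BeginOffset": 20, "EndOffset": 36},
]
print(redact(text, entities))  # Contact [NAME] at [EMAIL]
```

Raising the threshold keeps lower-confidence entities untouched, trading fewer false redactions for more missed ones, as described under Limitations.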
LLM actions
| Action | What it does |
|---|---|
| Detect Prompt Attack | Identifies prompt injection and jailbreaking attempts designed to manipulate AI system behavior |
| Detect Toxicity | Flags hate speech, threats, insults, and other harmful content |
LLM actions assess the likelihood that content falls into a given category rather than applying deterministic rules. Amazon Bedrock Guardrails provides a configurable safety filter for these actions.
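Because LLM actions return a likelihood rather than a yes/no verdict, a workflow typically maps that score onto an outcome. This is a minimal sketch of that mapping; the field name and thresholds are illustrative, not part of the product's API.

```python
def route(attack_likelihood, block_at=0.85, review_at=0.5):
    """Map a prompt-attack likelihood score (0-1) to a workflow outcome.

    Thresholds are illustrative; tune them against your own data.
    """
    if attack_likelihood >= block_at:
        return "block"          # very likely an attack: stop the workflow
    if attack_likelihood >= review_at:
        return "human_review"   # uncertain: escalate to a person
    return "allow"              # low likelihood: continue automatically
```

A two-threshold scheme like this gives you a middle band for human review instead of forcing every borderline case into allow or block.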
Data handling
These services process text in real time. Input data is not used to train underlying models, and no data is retained after an action completes. For more details on data handling, refer to Zapier's privacy policy and terms of service.
Limitations
AI Guardrails by Zapier is designed to supplement, not replace, your existing security, privacy, and compliance measures.
No AI-powered detection system is 100% accurate. Be aware of the following:
- False negatives (missed detections). Actions may fail to detect some PII, toxic content, prompt injection, jailbreaking, or negative sentiment. Detection can be affected by:
  - Unusual formatting or abbreviations
  - Novel attack techniques
  - Context-dependent or ambiguous language
  - Sarcasm or cultural nuances
  - Language (PII detection supports English and Spanish only; other actions may have varying multilingual support)
- False positives (incorrect flags). Actions may occasionally flag content that does not actually contain PII, toxic language, or malicious intent.
- Not a standalone solution. Use AI Guardrails as one layer in a broader defense-in-depth strategy. Zapier recommends:
- Keeping human-in-the-loop review for sensitive workflows
- Using additional input validation and output filtering
- Following your organization's data handling and compliance policies
- Reviewing and testing your automated workflows regularly
- Not relying solely on any single automated tool for compliance with privacy regulations (for example, GDPR, CCPA, HIPAA)
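One concrete way to apply these recommendations is to fail closed: treat a missing, errored, or flagged detection result as "needs review" rather than "safe". The sketch below uses a hypothetical result dict; the field names are assumptions, not the product's actual output.

```python
def decide(detection_result):
    """Fail closed: anything other than a clean pass goes to review.

    `detection_result` is a hypothetical dict from a guardrail step;
    the "error" and "flagged" keys are illustrative field names.
    """
    if detection_result is None or detection_result.get("error"):
        return "human_review"   # detector failed: do not assume safe
    if detection_result.get("flagged"):
        return "human_review"   # content flagged: escalate
    return "proceed"
```

The key design choice is that a detector failure routes to a person instead of letting unchecked content continue through the workflow.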
Your responsibility
By using AI Guardrails, you acknowledge that your organization is responsible for ensuring compliance with all applicable laws, regulations, and industry standards related to data privacy, content moderation, and AI safety. Zapier recommends periodically auditing the accuracy of AI Guardrails within your specific workflows to ensure they meet your needs and expectations.
Get started
Use AI Guardrails in a Zap
Add an AI Guardrails step after your AI app step so you can check the AI output before it continues in your workflow. You can then add a step for human review, such as Human in the Loop.
- Create a new Zap or open an existing one.
- Add your trigger and your AI app step (the step that produces the content you want to check).
- Add an action step.
- Search for AI Guardrails by Zapier and select it.
- Choose an action (for example, Detect Sentiment or Check for Personally Identifiable Information (PII)).
- Map the input fields from your AI app step and complete the step.
- (Optional) Add a Human in the Loop step (or another step) so a person can review results before the Zap continues.
- Add any remaining steps and turn your Zap on.
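If you want custom branching between the guardrail step and the review step, a Code by Zapier (Python) step can inspect the mapped fields. The field names below (such as `pii_detected`) are assumptions; map whatever fields your AI Guardrails step actually returns.

```python
# Sketch of a Code by Zapier (Python) step that branches on a mapped
# guardrail result. In a real Code step, Zapier provides `input_data`
# as a dict of strings; here it is stubbed so the sketch runs standalone.
input_data = {"pii_detected": "true", "draft": "Hello Jane, ..."}

# Zapier passes mapped values as strings, so compare text, not booleans.
pii_found = input_data.get("pii_detected", "").lower() == "true"

# Zapier reads the `output` variable when the step finishes; later steps
# (such as a Human in the Loop step) can filter on `needs_review`.
output = {
    "needs_review": pii_found,
    "draft": input_data["draft"],
}
```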
Use AI Guardrails in Zapier Agents
Add AI Guardrails to your Agent as a tool and include instructions in your Agent's prompt to use it.
- Go to Zapier Agents and open your Agent (or create one).
- Add AI Guardrails by Zapier as a tool so your Agent can run its actions.
- In your Agent's instructions (prompt), tell the Agent when and how to use AI Guardrails (for example, to check for PII or toxicity before using AI output).
- Your Agent will then use AI Guardrails when its instructions call for it.
Use AI Guardrails with Zapier MCP
Add AI Guardrails as a tool so your AI client (for example, Cursor or Claude) can run its actions.
- Go to mcp.zapier.com.
- Click + New MCP Server. A dialog box will open.
- In the MCP Client dropdown, select your client (or Other if it is not listed).
- In the Name field, enter a name for your server.
- Click Create MCP Server.
- Add AI Guardrails as a tool:
  - Click + Add tool.
  - Search for AI Guardrails by Zapier.
  - Select the actions you want.
- Open the Connect tab and follow the instructions to connect your AI client to your MCP server.
- Your AI client can now run AI Guardrails actions. For full steps, refer to Use Zapier MCP with your client.
Best practices
To get the most out of AI Guardrails:
- Layer your defenses. Combine AI Guardrails with other security measures such as regex-based validation, allowlists/blocklists, and manual review processes.
- Set appropriate confidence thresholds. For PII detection, consider the confidence scores returned with each entity. Higher thresholds reduce false positives but may increase missed detections.
- Test with your own data. Detection accuracy can vary based on your specific content types, formatting, and use cases. Test thoroughly before deploying in production workflows.
- Plan for edge cases. Consider what happens in your workflow if PII is missed or content is incorrectly flagged. Build in fallback logic and escalation paths.
- Stay current. Threat techniques evolve. Periodically review your guardrail configurations and test against new types of adversarial inputs.
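As one example of layering defenses, a lightweight regex check can run alongside the ML-based Detect PII action and act as an independent second opinion. The patterns below are illustrative starting points, not exhaustive PII coverage.

```python
import re

# Sketch: a regex layer to run alongside AI Guardrails.
# Patterns are illustrative starting points, not exhaustive PII coverage.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}

def regex_pii_hits(text):
    """Return the pattern names that match, as a second opinion
    independent of the ML-based Detect PII action."""
    return [name for name, pat in PATTERNS.items() if pat.search(text)]

print(regex_pii_hits("Reach me at a@b.co or 123-45-6789"))
# → ['ssn', 'email']
```

If either layer flags the text, route it for review; requiring both layers to agree before passing content is what makes the defenses complementary rather than redundant.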
Use cases
Use AI Guardrails actions to check AI output before it's used in your workflow. Examples by action:
- Check for Personally Identifiable Information (PII) - Scan AI-generated text (for example, summaries of support tickets or draft emails) before sending or storing it, so you can redact or handle PII and stay within data-privacy rules.
- Detect Prompt Attack - Review user or external input before passing it to an AI model, so you can block or flag attempts to manipulate the model's behavior.
- Detect Sentiment - Gauge the tone of AI-generated or user content (for example, chatbot replies or survey responses) to route negative or mixed sentiment for human review or escalation.
- Detect Toxicity - Screen content before publishing or forwarding (for example, comments, chat messages, or AI drafts) to catch toxic or harmful language and route it for moderation or revision.