๐งช Running AIandMe Experiments - Contextual AI Pen-Testing
AIandMe experiments function similarly to penetration testing in cybersecurityโbut instead of testing software vulnerabilities, we test how well a GenAI assistant aligns with its expected behavior and business scope.
Each experiment simulates adversarial interactions to evaluate how the AI assistant handles unexpected or potentially unsafe inputs.
How AIandMe Experiments Work
The AIandMe testing pipeline follows these structured steps:
Experiment Workflow:
Running an Experiment
Once your project is set up, you can start an experiment by following these steps:
Create an Experiment
- Go to the Experiments page and click "Create Experiment".
- Fill in the experiment details (name, description, etc.).
- Select the model provider for LLM-as-a-Judge evaluations.
- Configure the GenAI assistant integration for testing.
- Click "Create" to launch the experiment.
Experiment Execution: Step-by-Step
Once started, the experiment runs automatically in the background, executing the following steps:
๐น Step 1: Adversarial Data Generation
- If no dataset exists for the current project scope, AIandMe auto-generates adversarial synthetic prompts.
- These prompts simulate real-world edge cases and unexpected user interactions.
๐น Step 2: AI Assistant Testing
- Each adversarial prompt is sent to the GenAI assistant.
- The response is evaluated against the expected business behavior.
- AIandMeโs LLM-as-a-Judge assigns a pass/fail verdict based on predefined guidelines.
๐น Step 3: Experiment Completion & Insights
- AIandMe compiles a final report with:
๐ Findings from the experiment
๐ Detailed logs for auditing & human review
๐ Recommendations for improving AI robustness
Experiment Completion Overview:
๐ Next Steps
- โ๏ธ Understanding LLM-as-a-Judge
- ๐ฅ AIandMe Firewall: Safe AI Responses
- โ๏ธ AIandMe Integration
๐ก Need help? Check out FAQs or Join the AIandMe Community.