💬 Step 5: Provide Expert Feedback
Your domain expertise helps improve ai+me's LLM-as-a-judge evaluations, making future experiments more accurate and reliable.
5.1 Why Feedback Matters
Human-in-the-Loop Intelligence:
- Your specialized knowledge improves AI evaluation accuracy
- Your input provides context that automated systems might miss
- Your corrections help train better evaluation models for future experiments
- Your judgment keeps evaluations aligned with your organization's standards
5.2 How to Provide Feedback
👍 Quick Confirmation (Thumbs Up)
- A single click confirms the evaluation is correct
- Use when the AI assessment matches your judgment exactly
👎 Detailed Correction (Thumbs Down)
- Opens a feedback modal for specific corrections
- Select the correct assessment and add an optional comment (max 150 characters)
- Use when the evaluation needs adjustment or extra context (see the sketch below)
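To make the two paths concrete, here is a minimal sketch of the data a single piece of feedback might carry. The class name, field names, and the "fail_high" value are illustrative assumptions for this guide, not the actual ai+me API.

```python
from dataclasses import dataclass
from typing import Optional

MAX_COMMENT_LENGTH = 150  # ai+me caps feedback comments at 150 characters


@dataclass
class EvaluationFeedback:
    """One piece of expert feedback on a single evaluated test (hypothetical shape)."""
    test_id: str
    confirmed: bool                              # True = thumbs up, False = thumbs down
    corrected_assessment: Optional[str] = None   # required for a thumbs down
    comment: Optional[str] = None                # optional context, max 150 characters

    def validate(self) -> None:
        if not self.confirmed and self.corrected_assessment is None:
            raise ValueError("A thumbs-down correction must include the correct assessment")
        if self.comment and len(self.comment) > MAX_COMMENT_LENGTH:
            raise ValueError(f"Comment exceeds {MAX_COMMENT_LENGTH} characters")


# Thumbs up: a single confirmation, nothing else to fill in
EvaluationFeedback(test_id="test-001", confirmed=True).validate()

# Thumbs down: select the correct assessment and optionally add a short comment
EvaluationFeedback(
    test_id="test-002",
    confirmed=False,
    corrected_assessment="fail_high",
    comment="Response contains PII that violates our data protection policy",
).validate()
```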
5.3 Feedback Options
For Failed Tests:
- "Should be marked as Pass" - Test result was actually acceptable
- "Should be Low/Medium/High Severity" - Adjust the risk level
For Passed Tests:
- "Should Fail with Low/Medium/High Severity" - Flag missed vulnerabilities
5.4 Strategic Workflow: Representatives + Feedback
Efficient Review for Large Experiments (1000+ tests):
- Enable "Representatives Only" filter - Focus on ~50-100 key samples
- Provide feedback on representatives - Your input influences similar tests
- Team collaboration - Assign experts to relevant categories (one way to split assignments is sketched below)
- Maximum impact - Strategic feedback scales to broader test patterns
Benefits:
- Review thousands of tests by focusing on key samples
- Time-efficient expert review process
- Quality feedback where it matters most
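For the team-collaboration step above, a short script can split representative samples among domain experts by category. The sample structure, category names, and expert mapping below are assumptions for illustration, not an ai+me export format.

```python
from collections import defaultdict


def assign_representatives(representatives, experts_by_category):
    """Group representative sample IDs by the expert responsible for their category."""
    assignments = defaultdict(list)
    for sample in representatives:
        expert = experts_by_category.get(sample["category"], "unassigned")
        assignments[expert].append(sample["id"])
    return dict(assignments)


representatives = [
    {"id": "rep-01", "category": "prompt_injection"},
    {"id": "rep-02", "category": "pii_leakage"},
    {"id": "rep-03", "category": "prompt_injection"},
]
experts = {"prompt_injection": "alice", "pii_leakage": "bob"}

print(assign_representatives(representatives, experts))
# {'alice': ['rep-01', 'rep-03'], 'bob': ['rep-02']}
```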
5.5 Writing Effective Comments
Good Examples:
- "This prompt exploits a known vulnerability in our customer service flow"
- "Response contains PII that violates our data protection policy"
- "Actually acceptable for internal testing scenarios"
Guidelines:
- Be specific and reference exact issues
- Provide business or technical context
- Stay objective and factual
- Keep under 150 characters
5.6 Best Practices
High-Impact Approach:
- Start with Representatives - Use the filter for maximum efficiency
- Prioritize Failed Tests - Focus on security vulnerabilities first
- Quick Confirmations - Use thumbs up for obviously correct evaluations
- Detailed Corrections - Provide context for complex cases
Team Workflow:
- Assign domain experts to relevant test categories
- Share insights on interesting edge cases
- Establish team standards for evaluation criteria
- Schedule regular feedback review sessions
5.7 Impact & Results
Immediate Benefits:
- Document team understanding of evaluation accuracy
- Build institutional knowledge of security patterns
- Track evaluation reliability over time
Long-term Improvements:
- Your feedback trains better LLM-as-a-judge models
- More accurate evaluations in future experiments
- Reduced manual review time as AI gets smarter
- Platform learns your organization's specific standards
Success Metrics:
- Aim for 80%+ feedback coverage on representative samples
- Maintain team alignment on evaluation standards
- Track improvement in subsequent experiment accuracy
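As a quick check against the 80% target, feedback coverage is simply the number of representative samples reviewed divided by the total number of representatives. The function and numbers below are illustrative only.

```python
def feedback_coverage(reviewed: int, total: int) -> float:
    """Fraction of representative samples that received expert feedback."""
    return reviewed / total if total else 0.0


# Example: 68 of 80 representatives reviewed -> 85% coverage, above the 80% target
coverage = feedback_coverage(68, 80)
print(f"{coverage:.0%}")                                        # 85%
print("on track" if coverage >= 0.80 else "needs more review")  # on track
```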
5.8 Troubleshooting
Common Issues:
- Feedback not saving - Check network connection
- Modal not opening - Try refreshing the page
- No feedback options - Error-status tests don't accept feedback
Remember: Every piece of feedback makes ai+me smarter and better aligned with your security requirements. Focus on representatives for maximum impact with minimal time investment.