💬 Step 5: Provide Expert Feedback
Your domain expertise helps improve ai+me's LLM-as-a-judge evaluations, making future experiments more accurate and reliable.
5.1 Why Feedback Matters
Human-in-the-Loop Intelligence:
- Your specialized knowledge improves AI evaluation accuracy
- Your input provides context that automated systems might miss
- Your corrections help train better evaluation models for future experiments
- Your judgment keeps evaluations aligned with your organization's standards
5.2 How to Provide Feedback
👍 Quick Confirmation (Thumbs Up)
- A single click confirms the evaluation is correct
- Use when the AI assessment matches your judgment exactly
👎 Detailed Correction (Thumbs Down)
- Opens a feedback modal for specific corrections
- Select the correct assessment and add an optional comment (max 150 characters)
- Use when the evaluation needs adjustment or extra context (see the sketch below)
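To make the two paths concrete, here is a minimal sketch of the data a single piece of feedback might carry. The class name, field names, and the "fail_high" value are illustrative assumptions for this guide, not the actual ai+me API.

```python
from dataclasses import dataclass
from typing import Optional

MAX_COMMENT_LENGTH = 150  # ai+me caps feedback comments at 150 characters


@dataclass
class EvaluationFeedback:
    """One piece of expert feedback on a single evaluated test (hypothetical shape)."""
    test_id: str
    confirmed: bool                              # True = thumbs up, False = thumbs down
    corrected_assessment: Optional[str] = None   # required for a thumbs down
    comment: Optional[str] = None                # optional context, max 150 characters

    def validate(self) -> None:
        if not self.confirmed and self.corrected_assessment is None:
            raise ValueError("A thumbs-down correction must include the correct assessment")
        if self.comment and len(self.comment) > MAX_COMMENT_LENGTH:
            raise ValueError(f"Comment exceeds {MAX_COMMENT_LENGTH} characters")


# Thumbs up: a single confirmation, nothing else to fill in
EvaluationFeedback(test_id="test-001", confirmed=True).validate()

# Thumbs down: select the correct assessment and optionally add a short comment
EvaluationFeedback(
    test_id="test-002",
    confirmed=False,
    corrected_assessment="fail_high",
    comment="Response contains PII that violates our data protection policy",
).validate()
```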
5.3 Feedback Options
For Failed Tests:
- "Should be marked as Pass" - Test result was actually acceptable
- "Should be Low/Medium/High Severity" - Adjust the risk level
For Passed Tests:
- "Should Fail with Low/Medium/High Severity" - Flag missed vulnerabilities
5.4 Strategic Workflow: Representatives + Feedback
Efficient Review for Large Experiments (1000+ tests):
- Enable "Representatives Only" filter - Focus on ~50-100 key samples
- Provide feedback on representatives - Your input influences similar tests
- Team collaboration - Assign experts to relevant categories (one way to split assignments is sketched below)
- Maximum impact - Strategic feedback scales to broader test patterns
Benefits:
- Review thousands of tests by focusing on key samples
- Time-efficient expert review process
- Quality feedback where it matters most
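For the team-collaboration step above, a short script can split representative samples among domain experts by category. The sample structure, category names, and expert mapping below are assumptions for illustration, not an ai+me export format.

```python
from collections import defaultdict


def assign_representatives(representatives, experts_by_category):
    """Group representative sample IDs by the expert responsible for their category."""
    assignments = defaultdict(list)
    for sample in representatives:
        expert = experts_by_category.get(sample["category"], "unassigned")
        assignments[expert].append(sample["id"])
    return dict(assignments)


representatives = [
    {"id": "rep-01", "category": "prompt_injection"},
    {"id": "rep-02", "category": "pii_leakage"},
    {"id": "rep-03", "category": "prompt_injection"},
]
experts = {"prompt_injection": "alice", "pii_leakage": "bob"}

print(assign_representatives(representatives, experts))
# {'alice': ['rep-01', 'rep-03'], 'bob': ['rep-02']}
```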
5.5 Writing Effective Comments
Good Examples:
- "This prompt exploits a known vulnerability in our customer service flow"
- "Response contains PII that violates our data protection policy"
- "Actually acceptable for internal testing scenarios"
Guidelines:
- Be specific and reference exact issues
- Provide business or technical context
- Stay objective and factual
- Keep under 150 characters
5.6 Best Practices
High-Impact Approach:
- Start with Representatives - Use the filter for maximum efficiency
- Prioritize Failed Tests - Focus on security vulnerabilities first
- Quick Confirmations - Use thumbs up for obviously correct evaluations
- Detailed Corrections - Provide context for complex cases
Team Workflow:
- Assign domain experts to relevant test categories
- Share insights on interesting edge cases
- Establish team standards for evaluation criteria
- Schedule regular feedback review sessions
5.7 Impact & Results
Immediate Benefits:
- Document team understanding of evaluation accuracy
- Build institutional knowledge of security patterns
- Track evaluation reliability over time
Long-term Improvements:
- Your feedback trains better LLM-as-a-judge models
- More accurate evaluations in future experiments
- Reduced manual review time as AI gets smarter
- Platform learns your organization's specific standards
Success Metrics:
- Aim for 80%+ feedback coverage on representative samples
- Maintain team alignment on evaluation standards
- Track improvement in subsequent experiment accuracy
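As a quick check against the 80% target, feedback coverage is simply the number of representative samples reviewed divided by the total number of representatives. The function and numbers below are illustrative only.

```python
def feedback_coverage(reviewed: int, total: int) -> float:
    """Fraction of representative samples that received expert feedback."""
    return reviewed / total if total else 0.0


# Example: 68 of 80 representatives reviewed -> 85% coverage, above the 80% target
coverage = feedback_coverage(68, 80)
print(f"{coverage:.0%}")                                        # 85%
print("on track" if coverage >= 0.80 else "needs more review")  # on track
```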
5.8 Troubleshooting
Common Issues:
- Feedback not saving - Check network connection
- Modal not opening - Try refreshing the page
- No feedback options - Error-status tests don't accept feedback
Remember: Every piece of feedback makes ai+me smarter and better aligned with your security requirements. Focus on representatives for maximum impact with minimal time investment.