πŸ’¬ Step 5: Provide Expert Feedback

Your domain expertise helps improve ai+me's LLM-as-a-judge evaluations, making future experiments more accurate and reliable.

5.1 Why Feedback Matters

Human-in-the-Loop Intelligence:

  • Applies your specialized knowledge to improve AI evaluation accuracy
  • Provides context that automated systems might miss
  • Helps train better evaluation models for future experiments
  • Ensures evaluations align with your organization's standards

5.2 How to Provide Feedback

πŸ‘ Quick Confirmation (Thumbs Up)

  • Single click to confirm the evaluation is correct
  • Use when the AI assessment perfectly matches your judgment

πŸ‘Ž Detailed Correction (Thumbs Down)

  • Opens feedback modal for specific corrections
  • Select the correct assessment and add an optional comment (max 150 characters)
  • Use when evaluation needs adjustment or context

5.3 Feedback Options

For Failed Tests:

  • "Should be marked as Pass" - Test result was actually acceptable
  • "Should be Low/Medium/High Severity" - Adjust the risk level

For Passed Tests:

  • "Should Fail with Low/Medium/High Severity" - Flag missed vulnerabilities

5.4 Strategic Workflow: Representatives + Feedback

Efficient Review for Large Experiments (1000+ tests):

  1. Enable "Representatives Only" filter - Focus on ~50-100 key samples
  2. Provide feedback on representatives - Your input influences similar tests
  3. Team collaboration - Assign experts to relevant categories
  4. Maximum impact - Strategic feedback scales to broader test patterns

Benefits:

  • Cover thousands of tests by reviewing a small set of key samples (see the sketch after this list)
  • Time-efficient expert review process
  • Quality feedback where it matters most
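
To make the scaling benefit concrete, here is a rough sketch of how feedback on a few representatives can stand in for many similar tests. The grouping of tests under representatives shown here is made-up data for illustration, not output from ai+me; the platform handles the actual grouping.

  # Illustrative only: assumed mapping of representative test ids to the similar
  # tests they stand for.
  groups = {
      "rep-001": [f"test-{i:04d}" for i in range(1, 25)],   # 24 similar prompts
      "rep-002": [f"test-{i:04d}" for i in range(25, 60)],  # 35 similar prompts
  }

  reviewed_reps = {"rep-001", "rep-002"}  # representatives your team gave feedback on

  # Count how many similar tests those few reviews inform
  covered = sum(len(tests) for rep, tests in groups.items() if rep in reviewed_reps)
  print(f"Feedback on {len(reviewed_reps)} representatives informs {covered} similar tests")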

5.5 Writing Effective Comments

Good Examples:

  • "This prompt exploits a known vulnerability in our customer service flow"
  • "Response contains PII that violates our data protection policy"
  • "Actually acceptable for internal testing scenarios"

Guidelines:

  • Be specific and reference exact issues
  • Provide business or technical context
  • Stay objective and factual
  • Keep comments under the 150-character limit
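
If you draft comments outside the UI (for example in a shared document before a review session), a quick length check like the sketch below can catch comments that would exceed the 150-character limit. The helper name is made up for this example and is not part of ai+me.

  # Hypothetical pre-check for draft comments; not part of ai+me.
  MAX_COMMENT_LENGTH = 150

  def check_comment(comment: str) -> str:
      comment = comment.strip()
      if not comment:
          raise ValueError("Comment is empty")
      if len(comment) > MAX_COMMENT_LENGTH:
          raise ValueError(f"{len(comment)} characters; the limit is {MAX_COMMENT_LENGTH}")
      return comment

  # Passes: well under the limit
  check_comment("This prompt exploits a known vulnerability in our customer service flow")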

5.6 Best Practices

High-Impact Approach:

  • Start with Representatives - Use the filter for maximum efficiency
  • Prioritize Failed Tests - Focus on security vulnerabilities first
  • Quick Confirmations - Use thumbs up for obviously correct evaluations
  • Detailed Corrections - Provide context for complex cases

Team Workflow:

  • Assign domain experts to relevant test categories
  • Share insights on interesting edge cases
  • Establish team standards for evaluation criteria
  • Schedule regular feedback review sessions

5.7 Impact & Results

Immediate Benefits:

  • Document your team's understanding of evaluation accuracy
  • Build institutional knowledge of security patterns
  • Track evaluation reliability over time

Long-term Improvements:

  • Your feedback trains better LLM-as-a-judge models
  • More accurate evaluations in future experiments
  • Reduced manual review time as the LLM judge improves
  • The platform learns your organization's specific standards

Success Metrics:

  • Aim for 80%+ feedback coverage on representative samples
  • Maintain team alignment on evaluation standards
  • Track improvement in subsequent experiment accuracy
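
As a rough way to track the 80% target, you can compute coverage as the share of representative samples that have received feedback. A minimal sketch with made-up numbers:

  # Hypothetical coverage check against the 80% target; counts are assumed.
  total_representatives = 80   # representatives surfaced by the filter
  with_feedback = 68           # representatives that already have feedback

  coverage = with_feedback / total_representatives
  print(f"Representative feedback coverage: {coverage:.0%}")   # 85%
  assert coverage >= 0.80, "Below the 80% coverage target"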

5.8 Troubleshooting

Common Issues:

  • Feedback not saving - Check network connection
  • Modal not opening - Try refreshing the page
  • No feedback options - Error-status tests don't accept feedback

Remember: Every piece of feedback makes ai+me smarter and better aligned with your security requirements. Focus on representatives for maximum impact with minimal time investment.