← All docs

Evaluation template

Report findings from your test with a standard format.

RAGfly — Evaluation Report Template

Integrator: [name / organization]
Date: [YYYY-MM-DD]
Environment tested: production (api.ragfly.ai)
Tenant(s) used: A / B / both


Executive summary

[2–3 sentences: what was tested, overall result, most important finding]


Test battery checklist

Battery Status Notes
1.1 JWT login ✅ / ❌ / ⏭ skipped
1.2 GET /auth/me ✅ / ❌ / ⏭
1.3 Create API Key ✅ / ❌ / ⏭
2.1 List documents ✅ / ❌ / ⏭
2.2 Semantic search Tenant A ✅ / ❌ / ⏭
2.2 Semantic search Tenant B ✅ / ❌ / ⏭
3. Multi-tenant isolation ✅ / ❌ / ⏭
4. Queue and workspaces ✅ / ❌ / ⏭
5. MCP (Claude Code / Cursor) ✅ / ❌ / ⏭
6. Codex (MCP) ✅ / ❌ / ⏭
7. CLI ✅ / ❌ / ⏭
8. Python SDK direct ✅ / ❌ / ⏭

Findings

Finding 1 — [brief title]

Severity: Critical / High / Medium / Low / Info
Battery: [guide section number]
Tenant: A / B / both

Request:

METHOD /endpoint
Header: Authorization: Bearer [REDACTED]
Body: { ... }

Response received:

{ ... }

Expected behavior per the guide: [description]

Reproduction: [minimum steps to reproduce]


Finding 2 — [brief title]

[repeat structure]


RAG questions — results

Fill in with questions relevant to the evaluator's corpus and their expected answers.

Question asked System response Expected ✅/❌
[question 1] [expected answer]
[question 2] [expected answer]
[question with no evidence] Should indicate no information found

Isolation test (if two accounts are available)

Question (with account B over corpus A) Result ✅/❌
[question specific to corpus A] 0 results

Codex / MCP configuration

Client used: [Claude Code / Cursor / Cline / Codex / other]
MCP transport: SSE / streamable_http / not applicable
Setup issues: [if any, describe]


Additional observations

[Friction points, documentation suggestions, open questions]


Evaluator environment

Attribute Value
OS [macOS / Linux / Windows]
Python [version]
MCP client / tool [name and version]
Observed average latency [ms approx.]