Before you start
You must create an agent in the Agent builder.
Configure evaluation
Configure an evaluation with test cases that define prompts and expected responses.
1
Start configuration
- In CDF, navigate to Atlas AI > Evaluate agents.
- Select + Create new evaluation.
2
Enter evaluation details
- Enter a Name for your evaluation.
- Add a Description to explain the purpose of the evaluation.
3
Define test cases
Add test cases to define prompts and the expected responses.
- In the Test cases section, enter a Prompt.
- Enter the Expected response. Atlas AI compares your expected response to the agent’s response to test performance.
- Optionally, choose an agent and select Generate answer to have that agent generate the expected response.
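It can help to draft test cases outside the UI before entering them. The following is a minimal sketch of one way to do that in plain Python. It does not call any Atlas AI or CDF API. The `EvalCase` class, the `validate` helper, and the sample prompts are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalCase:
    """One prompt/expected-response pair (hypothetical local structure)."""
    prompt: str
    expected_response: str


def validate(cases: list[EvalCase]) -> list[str]:
    """Return a list of problems found; an empty list means the draft is clean."""
    problems = []
    seen = set()
    for i, case in enumerate(cases, start=1):
        if not case.prompt.strip():
            problems.append(f"case {i}: empty prompt")
        if not case.expected_response.strip():
            problems.append(f"case {i}: empty expected response")
        # Treat prompts that differ only in case or padding as duplicates.
        key = case.prompt.strip().lower()
        if key and key in seen:
            problems.append(f"case {i}: duplicate prompt")
        seen.add(key)
    return problems


# Sample draft with two deliberate mistakes (illustrative data only).
draft = [
    EvalCase("Which pumps are on line 3?", "A list of the pump assets on line 3."),
    EvalCase("Which pumps are on line 3?", "Same prompt entered twice."),
    EvalCase("Summarize open work orders.", ""),
]

for problem in validate(draft):
    print(problem)
```

Checking a draft this way catches empty fields and repeated prompts before you enter the test cases in the Test cases section.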
Run and monitor evaluation
Run an evaluation to test your agent. Monitor real-time progress as the agent processes test cases.
1
Navigate to evaluation
- Go to Evaluation overview.
- Select Run evaluation.
2
Select your agent
- Use the search or filter to find your agent by name, status, or latest version.
- Select the agent from the list.
- Select Run with selected agent to run the evaluation.
3
Confirm code tool access
If Atlas AI prompts you to allow code tool access for your agent, select Confirm and run to continue the evaluation, or select Cancel to return to the list of agents.
4
Monitor evaluation progress
When you run the evaluation, Atlas AI sends your prompts to the agent and records the responses in real time. This can take several minutes depending on how many test cases you have.
You can select Cancel run to stop the evaluation at any time.
View and analyze results
Review evaluation results to identify where agents need improvement and track performance across multiple runs.
1
View completed evaluations
On the Evaluation overview tab, review the list of completed evaluations.
2
Review test case status
Select an evaluation to view the status of your test cases.
3
Compare responses
Select View details to compare the agent responses to your expected responses for each test case.
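To track performance across multiple runs, you could keep your own notes on test case status and summarize them. The sketch below assumes you record the results by hand as tuples. The run labels, the prompts, and the `"passed"`/`"failed"` status strings are assumptions for illustration and may not match what Atlas AI displays. The `pass_rate_by_run` function is hypothetical.

```python
from collections import defaultdict

# Hypothetical records noted by hand from the results view:
# (run label, prompt, status). The status strings are assumptions.
results = [
    ("run-1", "Which pumps are on line 3?", "passed"),
    ("run-1", "Summarize open work orders.", "failed"),
    ("run-2", "Which pumps are on line 3?", "passed"),
    ("run-2", "Summarize open work orders.", "passed"),
]


def pass_rate_by_run(records):
    """Return the fraction of test cases marked 'passed' for each run."""
    totals = defaultdict(int)
    passed = defaultdict(int)
    for run, _prompt, status in records:
        totals[run] += 1
        if status == "passed":
            passed[run] += 1
    return {run: passed[run] / totals[run] for run in totals}


rates = pass_rate_by_run(results)
for run, rate in rates.items():
    print(f"{run}: {rate:.0%} of test cases passed")
```

A per-run pass rate gives a quick view of whether changes to the agent improved or regressed its responses between runs.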
Learn more
- Build and publish agents - Learn how to create and configure agents
- Agent tools - Reference for available agent tools