Configure evaluations to verify that your agents provide the results that you expect. Create test cases with prompts and expected responses, run evaluations to generate agent responses, then review results to identify where your agents can improve.

Before you start

You must create an agent in the Agent builder.

Configure evaluation

Configure an evaluation with test cases that define prompts and expected responses.
Step 1: Start configuration

  1. In CDF, navigate to Atlas AI > Evaluate agents.
  2. Select + Create new evaluation.
Step 2: Enter evaluation details

  1. Enter a Name for your evaluation.
Use descriptive names, like “Pump maintenance agent - v1”, to help you identify your evaluations.
  2. Add a Description to explain the purpose of the evaluation.
Step 3: Define test cases

Add test cases to define prompts and the expected responses.
  1. In the Test cases section, enter a Prompt.
  2. Enter the Expected response. Atlas AI compares your expected response to the agent’s response to test performance.
  3. Optionally, choose an agent and select Generate answer to use the agent to generate the expected response.
Be specific about what details, like operational status, metrics, or alert information, you want agents to include in their responses.
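Conceptually, each test case pairs a prompt with the response you expect the agent to give. A minimal sketch of that structure, with hypothetical field names rather than the actual Atlas AI schema:

```python
# Hypothetical test-case structure; the field names and example
# content are illustrative only, not the Atlas AI data model.
test_cases = [
    {
        "prompt": "What is the operational status of pump P-101?",
        "expected_response": (
            "Pump P-101 is running normally, with discharge pressure "
            "in the expected range and no active alerts."
        ),
    },
    {
        "prompt": "List any open maintenance alerts for compressor C-201.",
        "expected_response": (
            "Compressor C-201 has one open alert: high vibration "
            "detected on the drive-end bearing."
        ),
    },
]
```

Notice how each expected response names the specific details (status, metrics, alerts) the agent should include, which makes the comparison meaningful.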

Run and monitor evaluation

Run an evaluation to test your agent. Monitor real-time progress as the agent processes test cases.
  1. Go to Evaluation overview > Run evaluation.
  2. Use the search or filter to find your agent by name, status, or latest version.
  3. Select the agent from the list.
  4. Select Run with selected agent to run the evaluation.
  5. If Atlas AI prompts you to allow code tool access for your agent, select Confirm and run to continue, or Cancel to return to the list of agents.
When you run the evaluation, Atlas AI sends your prompts to the agent and records the responses in real time. This can take several minutes depending on how many test cases you have. You can select Cancel run to stop the evaluation at any time.
Do not close the browser tab while the evaluation is running. Evaluations run in the browser, and closing or navigating away from the tab will stop the evaluation.
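The run described above can be sketched as a simple loop: each prompt goes to the agent and the response is recorded alongside the expected answer. In this sketch, `ask_agent` is a hypothetical stand-in for the agent call that Atlas AI performs for you in the browser:

```python
def run_evaluation(test_cases, ask_agent):
    """Send each test-case prompt to the agent and record the response.

    `ask_agent` is a hypothetical callable standing in for the agent
    query that Atlas AI executes during a run; each test case is a
    dict with "prompt" and "expected_response" keys.
    """
    results = []
    for case in test_cases:
        results.append({
            "prompt": case["prompt"],
            "expected": case["expected_response"],
            # Record the live agent response next to the expectation
            # so the two can be compared afterwards.
            "actual": ask_agent(case["prompt"]),
        })
    return results
```

Because each prompt is sent and answered in turn, total run time grows with the number of test cases, which is why larger evaluations can take several minutes.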

View and analyze results

Review evaluation results to identify where agents need improvement and track performance across multiple runs.
  1. On the Evaluation overview tab, review the list of completed evaluations.
  2. Select an evaluation to view the status of your test cases.
  3. Select View details to compare the agent responses to your expected responses for each test case.
Use evaluation insights to improve your agent by refining instructions in the Agent builder, updating tools or data access, or editing test cases.
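As a rough illustration of what comparing an agent response to an expected response can look like (the source does not describe Atlas AI's actual scoring method), a simple term-overlap check:

```python
def term_overlap(expected, actual):
    """Fraction of expected-response terms that appear in the agent's
    response. A crude, hypothetical proxy for response comparison,
    not the metric Atlas AI uses.
    """
    expected_terms = set(expected.lower().split())
    if not expected_terms:
        return 0.0
    actual_terms = set(actual.lower().split())
    return len(expected_terms & actual_terms) / len(expected_terms)
```

A low score on a test case points you at the specific detail the agent omitted, which is the kind of insight that guides instruction or tool changes in the Agent builder.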
