Software Architect - Agentic Evals at Datagrid AI in San Mateo, California

Posted in Other 6 days ago.

Type: full-time

Job Description:

Fully remote, except for occasional in-person meetings in San Francisco to collaborate.

Bay Area residency required.

We believe that everyone deserves their own personal army of AI helpers with deep access to company data to automate any task. Datagrid ingests business data continuously from 100+ sources, makes it all available to AI, and eliminates grunt-work such as categorizing 10k support tickets in minutes.

We are a Series-A startup headquartered in San Francisco, but operate as a distributed company. We offer competitive salaries and health benefits, along with equity and respect for work/life balance.

Join our tight-knit team that ships fast and pushes the boundaries of AI! In the last few months, our agents learned to use Microsoft Teams, write SQL queries, and automate tasks on complex schedules like "MWF at half past 9". Our Agents live where people work (Slack, Microsoft Teams, etc.) and automatically take useful actions like producing safety reports from worksite photos.

Responsibilities

Datagrid Agents operate where our customers work: across Teams, Slack, and even SMS. Agents make multistep plans, leverage vectorized data from 100+ sources, use tools like Docusign, and manipulate the Datagrid app. We cannot possibly test all of this manually.

Your job will be to:

Work closely with an ex-Googler who built Gemini evals to create a harness for evaluating Agent performance, make that harness available both for local development and in CI/CD pipelines, and set up alerting for when Agents misbehave.
Influence and contribute to the extension of Datagrid's Agentic capabilities.
Choose the best open/closed source components to build out the testing infra.
Integrate publicly available benchmarks such as RAGBench into the testing system.
Grant subject matter experts the ability to add to the test library using customer queries, manually authored cases, and synthetically generated questions.
Expose evaluation performance so the company can track improvement over time.

Desired Experience

Proven track record of building test harnesses for Chat Agents from 0 to 1.
10+ years of B2B software engineering experience.
Ability to write effective LLM prompts without assistance.
Proficiency with Node.js and server-side frameworks such as NestJS or Next.js.
Familiarity with JavaScript frameworks such as React and AngularJS.
Experience with databases such as Weaviate and BigQuery.
Experience working with GCP or similar cloud providers.

Salary Range: $200k - $240k

Equity

100% covered medical, dental and vision

401k

All candidates for this role will be asked the following interview question: "Work with me to design a system to evaluate the Agent's performance at SQL queries." We don't expect you to have the perfect answer, but we will evaluate you on your ability to clearly explain your thinking.
