Full-Time

Customer Support Engineer

Customer Support

Posted on 10/4/2025

Galileo

51-200 employees

Data-centric NLP ML quality and labeling

Compensation Overview

$150k - $180k/yr

Burlingame, CA, USA

Hybrid

Two in-office days per week required in Burlingame, CA.

Category
Customer Experience & Support (4)
Required Skills
LLM
Datadog
Python
Grafana
Machine Learning
TypeScript
Zendesk
JIRA
C/C++
DevOps
Requirements
  • Experience: 3+ years of experience in technical support, customer support engineering, or DevOps roles, preferably at B2B SaaS or developer tools companies.
  • Technical Troubleshooting: Strong debugging and problem-solving skills with experience investigating complex technical issues, analyzing logs, reviewing code, and using monitoring tools.
  • Passion & Curiosity: A passion for unblocking customers who are building with Galileo, and the curiosity to constantly tinker with different products and technologies in the GenAI space. Being a self-learner is absolutely critical in the fast-moving space of agentic AI.
  • Self-Directed: Ability to work independently, prioritize effectively, and manage multiple concurrent issues in a fast-paced startup environment.
  • GenAI Application Understanding: Familiarity with GenAI applications, LLM-based systems, and common challenges in building AI applications (RAG systems, agentic workflows, evaluation patterns preferred).
  • Programming Knowledge: Working knowledge of Python or TypeScript, with the ability to read and understand customer code, write code samples, and debug integration issues.
  • Platform & Tools Expertise: Experience with observability tools (Grafana, Datadog, or similar), logging systems, and debugging production applications.
  • Communication Skills: Excellent written and verbal communication skills, with the ability to explain technical concepts clearly and maintain professional, empathetic customer interactions.
  • Customer-Centric: Passionate about delivering exceptional customer experiences, with patience and dedication to seeing issues through to resolution.
  • Process-Oriented: Ability to document issues, maintain organized ticket management, and contribute to building scalable support processes.
  • Support Platform Experience: Familiarity with customer support platforms (Pylon, Zendesk, Intercom, or similar) and ticketing system best practices.
  • Issue Tracking Tools: Experience with project management and issue tracking platforms (Shortcut, JIRA, Linear, or similar) for creating and managing escalations.
Responsibilities
  • First-Line Support: Serve as the primary point of contact for inbound customer inquiries and tickets, providing timely and professional responses across all support channels while evangelizing Galileo’s capabilities.
  • Technical Triage & Investigation: Assess severity and priority of incoming issues, conduct initial investigation including isolating problems, identifying affected product areas, and reproducing issues to perform root-cause analysis.
  • Technical Troubleshooting: Review product telemetry (Grafana dashboards), API requests/responses, logs, and customer application environments and code to diagnose complex technical issues related to GenAI evaluation and observability workflows.
  • Issue Resolution: Determine when issues can be resolved within Support or Customer Success teams, providing solutions through workarounds, code samples, product documentation, or configuration guidance.
  • Process Innovation: Drive continuous improvements in support processes, including establishing and maintaining knowledge base content, implementing AI-driven support efficiencies, and optimizing support workflows.
  • Product Feedback Loop: Analyze customer issue patterns and provide regular, actionable feedback to Product teams to drive improvements in documentation, platform usability, and feature development.
  • Escalation Management: For issues requiring deeper investigation, create detailed tickets with reproduction steps and technical context, then route appropriately to Product and Engineering teams.
  • Customer Communication: Maintain ongoing, transparent communication with customers throughout the investigation and resolution process via the Pylon ticketing system, Slack, and direct calls as needed.
  • Cross-Functional Coordination: Partner with Sales, Product and Engineering teams on issue resolution, facilitating additional data collection from customers and coordinating fix deployment and validation.
  • Stakeholder Updates: Keep Customer Success team informed on ongoing issue status, ensuring seamless coordination and customer relationship management.
Desired Qualifications
  • Previous experience at a developer-focused or AI/ML platform company
  • Hands-on experience building or deploying LLM-based applications using frameworks like LangChain, LangGraph, or similar
  • Experience with API troubleshooting and integration debugging
  • Background in DevOps, Site Reliability Engineering (SRE), or platform engineering
  • Familiarity with cloud platforms (AWS, GCP, Azure) and containerization technologies
  • Experience building or maintaining technical documentation and knowledge bases
  • Track record of implementing support process improvements and automation

Rungalileo.io is a platform for machine learning teams to improve models and cut annotation costs. It uses data-centric NLP techniques to quickly find and fix data issues that hurt model performance and provides a collaborative data bench to manage and track models from raw data to production. It also detects when a model goes down in production and identifies the exact data it failed on. Unlike others, it integrates with existing tools in minutes and prioritizes actionability, security, and privacy. It lets teams choose which data to label, automatically detect mis-annotated data, and bulk label all in one place. The company earns revenue by charging a subscription fee for its services.

Company Size

51-200

Company Stage

Series B

Total Funding

$68.1M

Headquarters

San Francisco, California

Founded

2021

Simplify Jobs

Simplify's Take

What believers are saying

  • Cisco acquires Galileo, closing July 2026, boosts Splunk AI observability.
  • NVIDIA NeMo integration on March 18, 2025, accelerates GenAI data flywheel.
  • $45M Series B funding expands generative AI evaluation platforms.

What critics are saying

  • Cisco acquisition fails by July 2026 from antitrust scrutiny over Splunk dominance.
  • Open-sourcing Agent Control lets LangChain, Glean fork technology in 6-12 months.
  • Honeycomb, Datadog capture 30% Splunk pipeline in 12-18 months.

What makes Galileo unique

  • Galileo integrates in minutes with OpenAI, Anthropic, Azure OpenAI, AWS Bedrock.
  • Galileo Luna EFMs launched June 6, 2024, transform enterprise GenAI evaluations.
  • Galileo open-sources Agent Control for scalable AI agent governance.

Benefits

Health Insurance

Dental Insurance

Vision Insurance

Disability Insurance

Parental Leave

Flexible Work Hours

401(k) Retirement Plan

401(k) Company Match

Growth & Insights and Company News

Headcount

6 month growth

-1%

1 year growth

1%

2 year growth

3%
Dolphin Publications
Apr 10th, 2026
Cisco acquires Galileo to strengthen Splunk's AI observability capabilities

Cisco is acquiring Galileo, an AI observability specialist, to strengthen Splunk's position in the AI monitoring market. The deal is expected to close in July 2026. Galileo provides tools to evaluate AI output quality, detect errors before they reach users, and improve AI agent behaviour in production. The platform monitors hallucinations, bias, security risks and cost metrics across the entire agent development lifecycle, offering real-time observability for multi-agent systems. The acquisition will integrate Galileo into Splunk Observability Cloud, expanding existing AI agent monitoring capabilities. Galileo offers over 20 evaluation metrics including hallucination detection and supports major AI platforms like OpenAI, Anthropic, Azure OpenAI and AWS Bedrock. Cisco and Galileo previously collaborated on Cisco's AGNTCY initiative. Both companies will operate independently until the deal closes.

SiliconANGLE Media
Apr 10th, 2026
Cisco buys Galileo to strengthen Splunk’s agentic monitoring capabilities

The Associated Press
Mar 11th, 2026
Galileo open sources Agent Control plane for enterprise AI agent governance at scale

Galileo has released Agent Control, an open source control plane enabling organisations to govern AI agents at scale. The platform allows users to write policies once and deploy them across all AI agents, addressing a critical barrier to enterprise AI adoption. CrewAI, Glean, Cisco AI Defense and Strands Agents will be the first to integrate with Agent Control. The platform provides centralised policy management, runtime mitigation for real-time updates, and supports guardrail evaluators from any vendor. Distributed under the Apache 2.0 licence, Agent Control addresses enterprise concerns around trust and governance that have prevented agents from reaching production. Use cases include preventing hallucinations, blocking data leaks, reducing token costs and enforcing brand standards. The platform is backed by Battery Ventures, Scale Venture Partners, Databricks Ventures and ServiceNow.

VentureBeat
Mar 28th, 2025
New Approach To Agent Reliability, AgentSpec, Forces Agents To Follow Rules

AI agents have a safety and reliability problem. Agents would allow enterprises to automate more steps in their workflows, but they can take unintended actions while executing a task, are not very flexible, and are difficult to control.
Organizations have already sounded the alarm about unreliable agents, worried that once deployed, agents might forget to follow instructions. OpenAI even admitted that ensuring agent reliability would involve working with outside developers, so it opened up its Agents SDK to help solve this issue. But researchers from the Singapore Management University (SMU) have developed a new approach to solving agent reliability. AgentSpec is a domain-specific framework that lets users “define structured rules that incorporate triggers, predicates and enforcement mechanisms.” The researchers said AgentSpec will make agents work only within the parameters that users want.
Guiding LLM-based agents with a new approach
AgentSpec is not a new LLM but rather an approach to guide LLM-based AI agents. The researchers believe AgentSpec can be used not only for agents in enterprise settings but also for self-driving applications. The first AgentSpec tests integrated on LangChain frameworks, but the researchers said they designed it to be framework-agnostic, meaning it can also run on the AutoGen and Apollo ecosystems.
Experiments using AgentSpec showed it prevented “over 90% of unsafe code executions, ensures full compliance in autonomous driving law-violation scenarios, eliminates hazardous actions in embodied agent tasks, and operates with millisecond-level overhead.” LLM-generated AgentSpec rules, which used OpenAI’s o1, also had a strong performance and enforced 87% of risky code and prevented “law-breaking in 5 out of 8 scenarios.”
Current methods are a little lacking
AgentSpec is not the only method to help developers bring more control and reliability to agents

PR Newswire
Mar 20th, 2025
Galileo Announces Integration With NVIDIA NeMo For Rapid GenAI Development

Platform Powers End-to-End Continuous Improvement of Agentic Applications
SAN FRANCISCO, March 18, 2025 /PRNewswire/ -- Galileo, the AI Evaluation company, today announced an integration with NVIDIA NeMo™, enabling customers to continuously improve their custom generative AI models. Now, customers can evaluate models comprehensively across the development lifecycle, curating feedback into datasets that power additional fine-tuning. As a result, customers ship GenAI apps that are more reliable, trusted, and cost-effective.
Data Flywheel for AI
The majority of enterprises are developing GenAI applications, including agents and RAG-based chatbots, but it can be challenging to ship and scale these applications due to the non-deterministic outputs of Large Language Models (LLMs). There's even more complexity when AI teams wish to test new LLMs, which are constantly evolving in capability and price point. The solution is to build an AI data flywheel, enabling continuous testing and refinement, collecting data about user interactions for subsequent improvement. When AI teams use data to improve outcomes (whether by fine-tuning, prompt engineering, or in-context learning), they gain a competitive advantage.
Galileo and NVIDIA accelerate a data flywheel by collecting and curating better data about the interactions of an AI application

INACTIVE