Full-Time

Open Source Engineer

Ruby

Posted on 10/31/2025

Braintrust

51-200 employees

Enterprise AI development platform with evaluations

No salary listed

Seattle, WA, USA + 2 more

More locations: San Francisco, CA, USA | New York, NY, USA

In Person

Category
Software Engineering (1)
Requirements
  • Have deep expertise in Ruby and understand what it takes to build fast, idiomatic, and reliable libraries in that ecosystem.
  • Are proficient with the tooling needed to build robust SDKs, such as testing frameworks, profilers, CI/CD pipelines, and packaging systems.
Responsibilities
  • Build elegant, idiomatic, and resilient SDKs that power Braintrust’s LLM evaluation and AI observability platform.
  • Ensure our libraries are easy to use, efficient to run, and a delight to work with, prioritizing developer experience and performance.
  • Integrate with major AI providers, frameworks, and platforms our customers rely on, such as OpenAI, Anthropic, and Gemini.
  • Build tools and automation to improve testing and profiling and to simplify release workflows.
  • Collaborate closely with backend, platform, and product teams to ensure a cohesive and polished developer experience.
  • Be a great community ambassador: talk with our users, understand their issues, help them get their fixes merged, and show deep empathy for them.
Desired Qualifications
  • Customer-obsessed and passionate about solving real-world problems for developers.
  • Deep expertise in OpenTelemetry or related tracing and observability tools (bonus).
  • Experience contributing to or maintaining production-grade open-source libraries, ideally used across multiple environments (desirable).
  • Familiarity with LLM/Artificial Intelligence ecosystem tools and building SDKs in that space (bonus).
  • Experience leading SDK adoption at an organization (bonus).


Company Size

51-200

Company Stage

Series B

Total Funding

$121.1M

Headquarters

San Francisco, California

Founded

2023

Simplify Jobs

Simplify's Take

What believers are saying

  • AI observability market expanding as enterprises embed agents into mission-critical workflows.
  • Series B funding of $80M in February 2026 enables geographic expansion and feature development.
  • Continuous evaluation frameworks becoming standard practice, similar to CI/CD adoption in software.

What critics are saying

  • May 2026 AWS breach exposed customer API keys, forcing all clients to rotate credentials.
  • Arize AI's $131M funding and established ML observability platform captures large enterprises.
  • Langfuse acquisition by ClickHouse at $15B valuation commoditizes proprietary observability infrastructure.

What makes Braintrust unique

  • Brainstore database queries AI traces 80% faster than traditional databases.
  • Adopted by eight leading enterprises: Notion, Replit, Cloudflare, Ramp, Dropbox, Vercel, Navan, BILL.
  • Integrated eval, tracing, and prompt playground in single platform for teams.


Benefits

Health Insurance

Dental Insurance

Vision Insurance

Unlimited Paid Time Off

Competitive salary and equity

AI Stipend

Growth & Insights and Company News

Headcount

6 month growth

0%

1 year growth

9%

2 year growth

15%
Ministry of Information, Orientation and Strategy, Bayelsa State
Feb 20th, 2026
Braintrust Raises $80M Series B to Power AI Observability

Published on Feb 20, 2026

Key takeaways

  • Braintrust Data Inc. secured $80 million in Series B funding, led by ICONIQ Capital, with participation from Andreessen Horowitz, Greylock, Basecase Capital, and Elad Gil. The round values the company at $800 million, reflecting strong investor confidence in AI infrastructure platforms.
  • As enterprises integrate AI agents and large language models into mission-critical workflows, structured evaluation frameworks have become essential. Braintrust delivers infrastructure that measures model performance, identifies hallucinations, detects data drift, and flags regressions before they affect end users.
  • The platform is already embedded within leading AI-driven enterprises such as Notion, Replit, Cloudflare, Ramp, Dropbox, Vercel, Navan, and BILL. This adoption indicates increasing demand for continuous AI observability and production-grade monitoring tools.
  • The newly raised capital will be allocated toward expanding engineering capabilities, strengthening go-to-market operations, establishing additional office locations, launching enhanced observability features, and entering new geographic markets.

Quick recap

San Francisco-based Braintrust Data Inc. has officially announced the close of an $80 million Series B funding round, led by ICONIQ Capital at an $800 million post-money valuation. The round included returning backers Andreessen Horowitz, Greylock, Elad Gil, and Basecase Capital. The announcement was made via the company's official X (formerly Twitter) account, with CEO Ankur Goyal signaling that Braintrust is "building the infrastructure that helps teams measure, evaluate, and improve their AI products".

Inside Braintrust's AI observability platform

Braintrust has built an AI-native observability and evaluation platform designed specifically for monitoring the quality of AI models and their outputs in production, a fundamentally different challenge than traditional system-health monitoring. The platform integrates several critical workflows:

  • Exhaustive Tracing: Automatically captures every step of an AI model or agent's reasoning process, including prompts, tool calls, retrieved context, and metadata on latency and cost.
  • Automated Evaluation: Uses built-in scorers and an LLM-as-a-judge approach to evaluate model outputs for accuracy, relevance, and safety. Teams can run both offline experiments during development and online scoring on live production traffic.
  • Prompt Playground: A visual interface to test and version-control prompt changes against real production data before deployment.
  • AI-Powered Assistant: Analyzes millions of traces to suggest better prompts, create new datasets, and identify patterns that cause specific hallucination types.

Critically, all of this runs on Brainstore, Braintrust's purpose-built database, which is reportedly 80% faster at querying complex AI traces than alternatives. This performance advantage is essential as enterprise AI deployments scale to millions of daily interactions.

What leadership is saying

Matt Jacobson of ICONIQ noted that companies with enduring impact typically demonstrate strong and consistent customer focus. He stated that Ankur and the Braintrust team have embedded this principle into their product strategy from the outset, aligning development closely with evolving user requirements.

Competitive landscape

The competitive intensity in AI observability is increasing at a measured but decisive pace. In February 2025, Arize AI secured $70 million in Series C funding to expand its large language model evaluation and monitoring capabilities. The round was positioned as one of the largest investments in the AI observability segment, reflecting growing enterprise demand for structured performance tracking and risk management across AI systems.

At the same time, Langfuse, widely adopted within the developer community, was acquired by ClickHouse in January 2026 as part of a $400 million Series D financing at a $15 billion valuation. The transaction highlights how observability is moving beyond a developer-focused capability and becoming a core component of enterprise-grade AI infrastructure, supporting governance, reliability, and scalable deployment.

Strategic analysis

Braintrust leads in developer experience and UI-driven evaluation workflows, making it the strongest choice for product and engineering teams that want an integrated, non-code-heavy approach to AI observability. Arize AI, with $131M in total funding and deep roots in traditional ML observability, holds the edge for large enterprises with complex, multi-model production environments. Langfuse, now backed by ClickHouse's $15 billion infrastructure, offers the most compelling option for teams that prioritize open-source flexibility and self-hosting.

Bayelsa Watch's takeaway

I think this is a big deal: an $800 million valuation at the Series B stage for an observability-focused company indicates a structural shift in the AI ecosystem. The industry is moving beyond rapid model deployment toward ensuring models operate reliably, consistently, and within defined performance standards. Across the AI infrastructure landscape, capital is increasingly being allocated to accountability rather than experimentation. While many startups previously raised funding based on model capability claims, Braintrust's positioning centers on evaluation, transparency, and measurable outcomes.
Pramod Pawar is the Founder of Bayelsa Watch and a digital entrepreneur behind multiple technology-focused ventures. With 10+ years of experience in SEO and content strategy, he is known for converting complex research into clear statistics and practical insights. He holds a Bachelor of Engineering in Information Technology from Shivaji University, and his work is centered on AI, machine learning, big data analytics, and other emerging technologies. His coverage frequently focuses on fast-moving areas such as AR, VR, robotics, cybersecurity, and next-generation digital platforms, where trends are best understood through data. He places a strong focus on accuracy, source checking, and simple explanations that support both general readers and business decision makers. Outside of work, he enjoys cricket and reading across multiple genres, which helps keep new ideas and continuous learning part of his writing process.

SiliconANGLE Media
Feb 17th, 2026
Braintrust raises $80M to monitor AI models and agents in production

Braintrust Data has raised $80 million in a Series B round led by Iconiq, valuing the AI observability startup at $800 million. Andreessen Horowitz, Greylock, basecase capital and Elad Gil participated in the funding. The company provides an AI-focused observability platform that monitors model quality, hallucinations and drift, addressing limitations of traditional monitoring tools. Its platform includes exhaustive tracing, automated evaluation using LLM-as-a-judge and a testing playground, all running on Brainstore, a purpose-built database that's 80% faster at querying AI traces. Braintrust counts Notion, Replit, Cloudflare, Ramp and Dropbox among its customers. The funding will support team expansion, new observability tools and geographical expansion. Co-founder Ankur Goyal emphasised close customer collaboration as key to the company's approach.

VentureBeat
Nov 14th, 2024
How Custom Evals Get Consistent Results From LLM Applications

Advances in large language models (LLMs) have lowered the barriers to creating machine learning applications. With simple instructions and prompt engineering techniques, you can get an LLM to perform tasks that would have otherwise required training custom machine learning models. This is especially useful for companies that don’t have in-house machine learning talent and infrastructure, or product managers and software engineers who want to create their own AI-powered products.

However, the benefits of easy-to-use models are not without tradeoffs. Without a systematic approach to keeping track of the performance of LLMs in their applications, enterprises can end up getting mixed and unstable results.

Public benchmarks vs custom evals

The current popular way to evaluate LLMs is to measure their performance on general benchmarks such as MMLU, MATH and GPQA

Tech Company News
Oct 11th, 2024
Braintrust Raises $36M for AI Accuracy

Braintrust, an AI evaluation and monitoring startup, raised $36 million in Series A funding led by Andreessen Horowitz, with participation from Datadog and Databricks Ventures. The funding aims to enhance AI accuracy for clients like Notion and Zapier by enabling continuous experimentation and real-time monitoring. This investment will help Braintrust expand its customer base, including companies like Airtable, Instacart, and Stripe, and develop tools to address AI inaccuracies.

Braintrust
Oct 10th, 2024
Announcing our $36M Series A - Blog - Braintrust

We’re thrilled to announce that we've raised $36 million to advance the future of AI software engineering, bringing our total funding to $45 million.

INACTIVE