Full-Time

Senior DevOps Engineer

Posted on 8/23/2025

DataRobot

501-1,000 employees

Enterprise AI platform automating ML lifecycle

No salary listed

Remote in India

Remote

Category
DevOps & Infrastructure
Required Skills
Kubernetes
Microsoft Azure
Python
Operating Systems
AWS
Go
DevOps
Google Cloud Platform
Requirements
  • 7+ years of proven experience delivering high-quality infrastructure solutions in a collaborative environment, including coding standards, code reviews, source control management, build processes, testing, and operations
  • 3+ years of experience building infrastructure solutions in at least one major cloud provider (AWS, Azure, or GCP)
  • Expert proficiency in Kubernetes. Experience in building and running software systems on Kubernetes clusters in production
  • Expert proficiency in Kubernetes architecture and operations, including resource management, scheduling, auto-scaling, and cluster networking
  • Hands-on experience with infrastructure provisioning and configuration using Infrastructure as Code (IaC) principles
  • Hands-on experience developing a wide variety of software and automation scripts in Python/Golang
  • Experience designing and operating diverse CI/CD pipelines with Harness.io or similar platforms such as GitHub Actions, GitLab CI, Jenkins X, or Argo CD
  • You have a deep understanding of core computer science — including operating systems, distributed systems, networking, and concurrent programming
  • You have experience and insight into designing, implementing, and supporting highly scalable cloud services from the ground up
  • You have an aptitude to deal with ambiguity, and enthusiasm to help tackle difficult issues
  • You can work effectively asynchronously and face-to-face in a multicultural team in multiple timezones around the world
  • You have excellent critical thinking skills and can objectively evaluate multiple solutions with different tradeoffs
Responsibilities
  • Develop a fully-featured Kubernetes platform built around industry standards to improve developer experiences and enable self-service capabilities
  • Work closely with internal application teams to improve their Kubernetes onboarding experience
  • Work with app teams to understand their potential challenges and help them choose the best way to architect their systems on Kubernetes
  • Design and implement new platform features to meet business and internal team goals
  • Monitor and maintain the performance and reliability of the existing Kubernetes platform clusters, and identify and troubleshoot any issues that may arise
  • Closely follow trends in the Kubernetes community and take advantage of new technologies as they emerge
  • Work with customers and stakeholders to understand their needs and build the right products and solutions
  • Take an active part in the strategy and roadmap definition and prioritization
  • Seek, give, and receive feedback in a constructive manner, including but not limited to code reviews
  • Take part in escalated on-call engineering support for services owned by the team
Desired Qualifications
  • Open-source contributions
  • Experience building Kubernetes operators
  • Experience in building Infrastructure Platforms
  • Expertise in developing a wide variety of software with Python/Golang

DataRobot provides an enterprise AI platform that automates the end-to-end machine learning lifecycle, from data preparation to model deployment and management. It uses Automated Machine Learning to try many algorithms in parallel, then handles deployment, monitoring, and governance, with options for both code-first and no-code workflows and support for generative AI features. The platform runs in the cloud or on-premises and offers MLOps, model safety, and governance tools to manage models in production. Its goal is to democratize AI by making advanced machine learning accessible to a broad range of users while ensuring governance and safety at scale.

Company Size

501-1,000

Company Stage

Series G

Total Funding

$1.1B

Headquarters

Boston, Massachusetts

Founded

2012

Simplify Jobs

Simplify's Take

What believers are saying

  • The Nebius partnership (March 2026) delivers AI agents on NVIDIA GPU infrastructure with predictable costs.
  • Fortune 50 clients like BCG and U.S. Army drive subscription revenue.
  • Multi-cloud deployment on AWS, Azure, and Google Cloud supports hybrid environments.

What critics are saying

  • SAP embeds native AI in S/4HANA, displacing DataRobot by 2028.
  • NVIDIA NeMo Guardrails commoditizes DataRobot governance within 18 months.
  • Hyperscalers' SageMaker Agents erode multi-cloud advantage in 12 months.

What makes DataRobot unique

  • DataRobot has built AutoML and Automated Time Series products since its founding in 2012.
  • Agent Workforce Platform operationalizes AI agents with NVIDIA integration.
  • SAP-endorsed AI suites automate finance and supply chain for S/4HANA.

Benefits

Health Insurance

Dental Insurance

Vision Insurance

Unlimited Paid Time Off

Paid Holidays

Paid Parental Leave

Global Employee Assistance Program (EAP)

Growth & Insights and Company News

Headcount

6 month growth

0%

1 year growth

-2%

2 year growth

0%
DataRobot
Mar 18th, 2026
DataRobot and Nebius partner to bring enterprise AI agents to production at scale on NVIDIA AI infrastructure.

March 18, 2026 - BOSTON - DataRobot and Nebius today announced a strategic partnership that pairs the DataRobot Agent Workforce Platform, co-engineered with NVIDIA, with purpose-built AI cloud infrastructure for the modern enterprise. This solution delivers a fully optimized and validated AI factory, combining Nebius's dedicated NVIDIA GPU-based infrastructure with the DataRobot platform to build, deploy, monitor, and govern agents seamlessly - enabling enterprises to take agents to production within days versus months while bypassing the operational complexity of traditional hyperscalers.

The partnership addresses a real operational gap. Running AI agents requires sustained inference performance, runtime governance, and infrastructure that behaves predictably as demand grows. Due to performance and cost variability, enterprises are increasingly evaluating infrastructure designed specifically for AI rather than retrofitted general-purpose clouds. Together, DataRobot and Nebius are addressing this structural shift by delivering a validated, AI-optimized stack that pairs the DataRobot Agent Workforce Platform with Nebius's purpose-built AI cloud and NVIDIA's GPUs and open software, unlocking scalable, high-performance agentic AI beyond conventional cloud constraints.

"When agents run continuously, performance variability and unpredictable costs create operational risk. By partnering with DataRobot and building on NVIDIA's AI foundation, we're providing a validated deployment environment on Nebius AI Cloud that delivers consistent latency, predictable pricing, and the performance required for sustained agent workloads," said Laurelle Roseman, VP of Global Partnerships, Nebius.

The initiative allows DataRobot to deploy NVIDIA GPU-backed inference workloads - such as custom models and agent execution services - directly on Nebius managed Kubernetes, integrating agent intelligence and governance with high-performance NVIDIA infrastructure purpose-built for AI.

"As we expand our use of AI, access to scalable infrastructure and the ability to operationalize models efficiently are critical. DataRobot and Nebius are helping simplify how we deploy and manage AI solutions in production, which is an important step as we continue building more AI-driven capabilities across the business," said Vijay Raghavendra, GEICO's Chief Product and Technology Officer.

By eliminating noisy-neighbor overhead found in legacy clouds, this joint solution offers bare-metal-like performance with the low latency and predictable throughput required for always-on agents. Powered by NVIDIA Hopper and NVIDIA Blackwell-generation data-center GPUs and secured by NVIDIA NeMo Guardrails, the platform provides a clear path for AI to operate as a dependable, production-grade system within the enterprise. This co-designed stack delivers enterprise-grade capabilities without the lock-in, cost volatility, architectural limits, or premium costs of traditional providers - allowing organizations to scale their agent workforce with absolute confidence.

"Agentic AI is only valuable if it works every time, not just in demos. This collaboration addresses the industry's biggest blind spot: operationalizing agents safely and with predictable costs at scale. Together with Nebius and NVIDIA, we're delivering a validated AI factory that enterprises can trust in production - not tied to a single cloud, and not compromised by infrastructure uncertainty," said Debanjan Saha, CEO of DataRobot.

"Enterprises are rapidly evolving from isolated AI projects to always-on agentic systems that can be trusted in production. By combining the DataRobot Agent Workforce Platform with Nebius' dedicated AI cloud and NVIDIA AI infrastructure, organizations can deploy a validated AI factory that delivers low-latency performance, predictable costs, and the governance required to safely scale an agent workforce across the business," said John Fanelli, vice president, AI Software, NVIDIA.

About DataRobot
DataRobot empowers AI teams to deliver the agentic workforce of the future. Its platform enables organizations to create and scale AI agents that integrate directly with business processes - driving efficiency, transforming operations, and delivering real results. With built-in governance and safeguards, DataRobot helps enterprises deploy AI securely and confidently. For more information, visit its website and connect with DataRobot on LinkedIn.

About Nebius
Nebius, the AI cloud company, is building the full-stack platform for developers and companies to take charge of their AI future - from data and model training to production deployment. Founded on deep in-house technological expertise and operating at scale with a rapidly expanding global footprint, Nebius serves startups and enterprises building AI products, agents, and services worldwide. Nebius is listed on Nasdaq (NASDAQ: NBIS) and headquartered in Amsterdam. For more information, please visit www.nebius.com.

Dolphin Publications
Mar 16th, 2026
Okta launches platform to secure AI agents

Okta for AI Agents is a platform that treats AI agents as full-fledged, non-human identities. It provides organizations with tools to discover agents, manage access, and immediately revoke access tokens. Okta for AI Agents is designed to help organizations answer three fundamental questions: where are my agents, what can they connect to, and what are they allowed to do?

The impetus is a growing security problem. Only 22 percent of organizations treat AI agents as independent, identity-bearing entities, while 88 percent have already dealt with suspected or confirmed security incidents involving AI agents. But the problem extends beyond known agents. Ninety percent of AI usage occurs through unauthorized personal accounts, with an average of 223 shadow AI incidents per month. Okta addresses this with Shadow AI Agent Discovery, a feature that automatically detects when employees link AI agents to corporate applications.

Three pillars for secure AI agents
The platform is built on three pillars. For registration and visibility, Okta is expanding its Okta Integration Network with dedicated support for platforms such as Boomi, DataRobot, and Google Vertex AI. Currently, that network already includes 8,200 integrations. Agents are registered as non-human identities in the Universal Directory, with a lifecycle spanning from onboarding to decommissioning.

The second pillar is access management. An Agent Gateway serves as a central control plane for all connections between agents and resources: MCP connections, tools, APIs, and databases. Agent credentials are automatically rotated via a secure vault, ensuring they never appear in plain text or logs.

The third pillar is the ability to revoke access immediately. Through Universal Logout, Okta can deactivate all access tokens if an agent deviates from its intended mission. All activity, including tool calls and authorization decisions, is forwarded to the organization's SIEM.

Technology AI Insights
Jul 31st, 2025
DataRobot Launches Agent Workforce Platform to Operationalize AI Agents at Scale

RTInsights
Jul 12th, 2025
Real-time Analytics News for the Week Ending July 12

DataRobot's syftr, integrated with Cerebras' AI inference performance, delivers a toolchain for production-grade agentic apps.

Business Wire
Jun 2nd, 2025
DataRobot is a Leader in the 2025 Gartner(R) Magic Quadrant(TM) for Data Science and Machine Learning Platforms - Again

INACTIVE