Full-Time

Principal AI Site Reliability Engineer

Oracle

10,001+ employees

Enterprise software and cloud computing provider

Compensation Overview

$86.4k - $199.5k/yr

+ Bonus + Equity

No H-1B Sponsorship

Remote in USA

US Citizenship Required

Category
DevOps & Infrastructure
Required Skills
Power BI
Kubernetes
Microsoft Azure
Python
Grafana
Java
ETL
Docker
Tableau
AWS
Go
Prometheus
Terraform
DevOps
Requirements
  • U.S. citizenship is required because the role involves obtaining and maintaining a U.S. government security clearance after hire.
  • Ten years of software engineering experience, including eight years in cloud infrastructure, Site Reliability Engineering, or DevOps.
  • Proven ownership of production system reliability in cloud environments.
  • Experience building and operating high-availability, fault-tolerant systems.
  • Strong understanding of distributed systems, performance monitoring, and resiliency patterns.
  • Experience with incident response, root-cause analysis, and production troubleshooting.
  • Proficiency with Terraform, Docker, and Kubernetes.
  • Experience with the observability tools Prometheus and Grafana.
  • Programming proficiency in Python, Java, or Go.
  • Experience with data warehousing platforms (including Vertica) and ETL frameworks; understanding of columnar storage systems.
  • Experience supporting or integrating Business Intelligence tools (Tableau, Power BI, Oracle Analytics).
  • Familiarity with multi-cloud environments (Oracle Cloud Infrastructure, Amazon Web Services, Microsoft Azure) and hybrid or cross-cloud architectures.
  • Knowledge of cloud infrastructure design, deployment, and resource optimization.
Responsibilities
  • Work with the Site Reliability Engineering team to take shared ownership of services and platform components. Develop a strong understanding of end-to-end system architecture, dependencies, and production behavior.
  • Design, build, and operate reliable, scalable, and secure infrastructure supporting large-scale analytics workloads.
  • Improve system reliability through automation, monitoring, and performance optimization.
  • Contribute to AI-assisted approaches for operations, including enhancing observability and alerting, supporting automated incident detection and remediation, and exploring intelligent automation for infrastructure lifecycle management.
  • Partner with development teams to enhance service architecture, scalability, and operability.
  • Participate in on-call rotations and act as an escalation point for complex production issues.
  • Perform root cause analysis and implement long-term fixes to prevent recurrence.
  • Apply knowledge of distributed systems to troubleshoot issues and optimize system performance.
  • Drive continuous improvement in DevOps/SRE practices, including CI/CD, Infrastructure as Code, and automation at scale.
  • Implement and optimize infrastructure for Oracle HDI Analytics Platform; ensure uptime, reliability, and scalability.
  • Build GenAI-powered or agent-based solutions for observability, anomaly detection, incident triage, remediation, and infrastructure provisioning and lifecycle management.
  • Build tools and frameworks that enable self-service and autonomous operations.
  • Build and optimize scalable data pipelines using Vertica and ETL frameworks.
  • Apply DevOps/SRE practices to automate deployments and operations; enhance observability using Prometheus/Grafana and AI-driven insights.
  • Support multi-cloud initiatives across OCI, AWS, and Azure; optimize cost, performance, and compliance across environments.
  • Implement preventative and automated remediation solutions.
  • Collaborate with engineers to execute technical roadmaps; contribute to code reviews and infrastructure improvements.
Desired Qualifications
  • Experience in healthcare or regulated environments (HIPAA, compliance frameworks).
  • Familiarity with Oracle HDI or large-scale analytics platforms.
  • Experience working in environments requiring security clearance.
  • Experience building self-healing or autonomous infrastructure systems.
  • Demonstrated experience applying GenAI / LLMs / agentic frameworks to infrastructure or operations.
  • Experience building or integrating AI-powered automation for DevOps/SRE workflows.
  • Familiarity with LangChain, AutoGPT, or custom AI agents.

Oracle provides enterprise software and cloud services, including database management, middleware, applications, and developer tools. Its core product is the Oracle Database, a relational database that stores and retrieves data using SQL; customers can run it on premises or in the Oracle Cloud, and they can use accompanying tools for data analytics, security, and integration. Oracle also offers a broad suite of applications (such as ERP, HR, and health IT) and a cloud platform to build, deploy, and manage software. The company differentiates itself through its long history and a large, integrated portfolio that connects database technology with applications, middleware, hardware, and cloud services, plus a track record of major acquisitions that broaden its scope. Its goal is to help organizations store, manage, and analyze data at scale, automate business processes, and run software across unified cloud and on-premises environments.

Company Size

10,001+

Company Stage

IPO

Headquarters

Austin, Texas

Founded

1977

Simplify's Take

What believers are saying

  • Cloud infrastructure revenue grew 84% YoY; enterprise demand independent of OpenAI.
  • NAND flash capacity sold out; 70-75% price increases benefit data centre expansion.
  • IHH Healthcare consolidates 190 facilities onto Oracle platform; signals strong adoption.

What critics are saying

  • Cerner VA contract failures cause $33B overruns; contract termination erodes $10B revenue.
  • OpenAI partnership falters on missed targets; $300B deal faces renegotiation or cancellation.
  • $124.7B debt and negative free cash flow strain finances if growth slows.

What makes Oracle unique

  • Oracle's $553B backlog provides unmatched revenue visibility for AI infrastructure services.
  • Embedded AI agents in Fusion Cloud workflows automate enterprise processes at scale.
  • Healthcare board expertise positions Oracle to capture regulated industry digital transformation.

Benefits

401(k) Savings and Investment Plan with company match

Paid time off: Flexible Vacation is provided to all eligible employees assigned to a salaried (non-overtime eligible) position. Accrued Vacation is provided to all other employees eligible for vacation benefits. For employees working at least 35 hours per week, the vacation accrual rate is 13 days annually for the first three years of employment and 18 days annually for subsequent years of employment. Vacation accrual is prorated for employees working between 20 and 34 hours per week. Employees working fewer than 20 hours per week are not eligible for vacation.

11 paid holidays

Paid sick leave: 72 hours of paid sick leave upon date of hire, refreshed each calendar year. Unused balance carries over each year up to a maximum of 112 hours.

Paid parental leave

Adoption assistance

Employee Stock Purchase Plan

Financial planning and group legal

Voluntary benefits including auto, homeowner and pet insurance

Company News

PR Newswire
Apr 14th, 2026
Oracle extends agentic AI platform to corporate banking with automated agents

Oracle Financial Services has launched new agentic AI capabilities for corporate banking, introducing pre-built AI agents for treasury, trade finance, credit and lending operations. The platform aims to automate mission-critical processes and accelerate decision-making for financial institutions and corporate banks. Key innovations include AI agents that extract and validate data from lengthy loan contracts, pull financial metrics from statements, monitor external news sources for risk signals, and generate credit memo narratives. For trade finance, agents validate bank guarantee applications and create supply chain finance programmes from sales contracts. The enhancements are part of Oracle's next-generation banking platform, which embeds AI directly into customer engagements and business processes whilst maintaining human oversight. Oracle plans to release hundreds of additional corporate and retail banking agents within the next 12 months.

PR Newswire
Apr 14th, 2026
Oracle enhances Primavera Unifier with AI-driven workflows and compliance features for capital projects

Oracle has launched AI-enabled capabilities in its Primavera Unifier platform to help project and asset management teams improve compliance and efficiency in capital projects. The new features include AI-driven workflow summarisation, enhanced data integration, and improved safety monitoring. The updates enable teams to prioritise critical activities through AI-powered business process summaries that provide structured chronologies of project progress, including participant details, timestamps and decisions. Enhanced Oracle Integration capabilities allow real-time data automation across enterprise systems with audit trails. Additional features include no-code process design tools, dashboards for performance tracking, and safety monitoring through Oracle Construction and Engineering Advisor. The platform aims to support high-compliance projects in regulated environments such as federal programmes and aerospace.

Yahoo Finance
Apr 14th, 2026
SMRT pilots Oracle AI platform to enhance rail maintenance for 2M daily passengers

SMRT, Singapore's leading public transportation provider, is piloting an AI-enabled rail maintenance platform called JARVIS using Oracle Cloud Infrastructure Enterprise AI and Oracle Autonomous AI Database. The platform aims to improve safety and reliability for the rail network, which handles over two million passenger journeys daily. Developed by STRIDES Technologies, SMRT's engineering arm, JARVIS unifies data from multiple standalone systems and applies AI for predictive maintenance and faster fault resolution. The platform uses machine learning through a generative AI chatbot interface, enabling maintenance teams to access real-time analytics and engineering insights via natural language. Built on Oracle Autonomous AI Database, JARVIS consolidates maintenance and condition monitoring data, including train performance, sensor readings and asset lifecycle information. The Oracle AI Customer Excellence Center supported the platform's development and validation.

Tech in Asia
Apr 14th, 2026
Oracle, Adobe rally as AI peace hopes lift battered software sector down 23% YTD

Software stocks rallied on hopes for a US-China trade deal, with Oracle and Adobe leading gains. However, the sector remains under pressure this year amid fears that AI tools from OpenAI and Anthropic could enable customers to build software faster and potentially displace vendors. The iShares Expanded Tech-Software Sector ETF is down over 23% year-to-date, with average sales multiples falling from 9x to 6x. A record $25 billion in software-sector leveraged loans now trade at distressed levels, raising concerns about private credit markets where the sector is a major borrower. Some firms are monetising AI successfully — ServiceNow's Now Assist product reached $600 million in annual contract value in Q4 2025. Yet deteriorating valuations could trigger a credit crisis through "shadow defaults" and forced fund withdrawals, with potential spillover to banks increasingly exposed to private credit.

Bloomberg L.P.
Apr 13th, 2026
Oracle secures 2.8GW Bloom Energy fuel-cell power for AI data centres

Oracle has agreed to purchase up to 2.8 gigawatts of fuel-cell power from Bloom Energy to supply data centres for artificial intelligence work. An initial 1.2 gigawatts of capacity has been contracted and will be deployed this year and in 2027 at Oracle projects in the US. A gigawatt provides enough electricity to supply approximately 750,000 US households simultaneously. The deal represents a significant commitment to powering AI infrastructure through fuel-cell technology as demand for data centre capacity continues to grow.