Full-Time

Research and Development Service Engineer

Orchestration Platforms

Posted on 9/27/2025

Microsoft

10,001+ employees

Develops software, OS, and cloud services

Compensation Overview

$100.6k - $199k/yr

Company Historically Provides H1B Sponsorship

Redmond, WA, USA + 2 more

More locations: Hillsboro, OR, USA | Mountain View, CA, USA

In Person

Category
AI & Machine Learning (2)
Required Skills
Kubernetes
Data Science
Machine Learning
Computer Networking
Requirements
  • Bachelor's Degree in Computer Science, Information Technology, Electrical Engineering, Data Science, Cybersecurity, or related field AND 2+ years technical experience in software engineering, network engineering, service engineering, systems engineering, or industrial controls OR equivalent experience.
  • 2+ years of experience managing and supporting Graphics Processing Unit (GPU) clusters (e.g., NVIDIA, AMD) with an emphasis on uptime, availability, and proactive maintenance.
  • 1+ year(s) of experience with Kubernetes and Volcano or other advanced workload scheduling frameworks for large-scale Artificial Intelligence/Machine Learning (AI/ML) jobs.
  • 1+ year(s) of experience troubleshooting across hardware, software, networking, and system logs, with a proven ability to resolve complex issues.
Responsibilities
  • Develop and operate orchestration platforms using Kubernetes and Volcano, ensuring seamless job submission, scheduling, and monitoring for large-scale AI workloads.
  • Contribute to service design by recommending optimal configurations with awareness of cost, security, resiliency, and scalability.
  • Refine cluster and scheduler configurations to improve availability, reliability, observability, and performance.
  • Collaborate closely with Edge, Networking, Reliability, and Automation teams to integrate orchestration with hybrid and distributed systems.
  • Take part in design reviews, share learnings, and help define measurable health and performance metrics for orchestration services.
  • Stay informed of evolving orchestration technologies, adopting new solutions and proactively seeking opportunities to improve.
Desired Qualifications
  • Bachelor's Degree in Computer Science, Information Technology, Electrical Engineering, Data Science, Cybersecurity, or related field AND 5+ years technical experience working with large-scale cloud, GPU, or distributed systems; OR Master's Degree in Computer Science, Information Technology, Electrical Engineering, Data Science, Cybersecurity, or related field AND 2+ years technical experience in software engineering, network engineering, service engineering, systems engineering, or industrial controls; OR equivalent experience.
  • Proficiency with observability and monitoring platforms (e.g., Prometheus, Grafana) for tracking system health, performance, and service-level indicators.
  • Familiarity with CUDA programming, GPU optimization, or large-scale model training environments.
  • Experience designing or optimizing orchestration platforms for hybrid or distributed systems spanning edge and cloud.
  • Demonstrated ability to improve cost efficiency, reliability, and scalability of services through infrastructure or platform design.
  • Prior experience in log analysis, automation, and proactive issue detection to minimize downtime and maximize service health.
  • Ability to collaborate across distributed global teams, communicate effectively, and meet uptime and service-level objectives.

Microsoft develops software, devices, and cloud services. Windows is an operating system that runs on personal computers, Office provides productivity apps, and Azure offers cloud computing and developer tools. The company differentiates itself with a large, integrated ecosystem of software, devices, and services, plus long-standing partnerships with PC makers and a broad enterprise footprint. Its goal is to put a computer on every desk and in every home, and to extend that reach through cloud services, professional networking (LinkedIn), and gaming.

Company Size

10,001+

Company Stage

IPO

Headquarters

Redmond, Washington

Founded

1975

Simplify Jobs

Simplify's Take

What believers are saying

  • Azure revenue surges 40% in Q3 FY26 from AI infrastructure demand.
  • Microsoft 365 Copilot hits 20 million seats, boosting $37B AI run rate.
  • Publisher Marketplace licenses content for AI search, enhancing ecosystem revenue.

What critics are saying

  • Generative AI erodes $70B Office per-seat licensing model within 12 months.
  • Kenya $1B data center stalls permanently over government payment disputes.
  • Azure gross margins compress structurally to 68% from AI capex in 18 months.

What makes Microsoft unique

  • Microsoft pioneered Altair BASIC in 1975, launching personal computing software.
  • MS-DOS deal with IBM in 1980 established PC operating system dominance.
  • GitHub Copilot Enterprise serves 140,000 organizations, driving developer AI adoption.

Benefits

Health Insurance

Dental Insurance

Vision Insurance

401(k) Company Match

Professional Development Budget

Conference Attendance Budget

Flexible Work Hours

Remote Work Options

Growth & Insights and Company News

Headcount

6 month growth

0%

1 year growth

-1%

2 year growth

0%
Tech in Asia
Apr 14th, 2026
Microsoft adds 30,000 Nvidia chips to Norway site after $6.2B commitment

Microsoft has secured a deal with neocloud provider Nscale to expand its Norway data centre site with 30,000 Nvidia chips. The agreement adds to Microsoft's earlier $6.2 billion commitment to the location, whilst OpenAI did not finalise a capacity agreement there. The move is part of Microsoft's roughly $60 billion spending wave on specialised neocloud providers that rent AI computing infrastructure. CEO Satya Nadella has identified power availability and data centre construction speed as the company's biggest bottlenecks, rather than chip supply. The deal reflects how cheap electricity and clear regulations increasingly shape AI data centre locations. Nscale, a UK-based startup that emerged from crypto-mining firm Arkon Energy in 2024, raised $2 billion at a $14.6 billion valuation in March 2026.

Bloomberg L.P.
Apr 14th, 2026
Microsoft takes over $6.2B Stargate data centre from OpenAI in Norway

Microsoft has agreed to rent data centre capacity at a Norwegian site originally intended for OpenAI as part of its Stargate initiative. The company will rent 30,000 additional Nvidia Vera Rubin chips from neocloud provider Nscale at a campus inside the Arctic Circle in Narvik, Norway. The deal builds on Microsoft's prior $6.2 billion commitment at the same location. Nscale announced the agreement in a statement, marking a shift in the facility's intended purpose from OpenAI to Microsoft operations.

Yahoo Finance
Apr 14th, 2026
Microsoft stock down 23% despite Azure growing 39% and $625B revenue backlog

Microsoft shares have fallen 23.14% year-to-date to $370.87, despite strong Q2 FY2026 results showing non-GAAP EPS of $4.14, a 7.57% beat. Revenue reached $81.27 billion, up 16.72% year-over-year, with Azure growing 39%. The company's commercial remaining performance obligation surged 110% to $625 billion in contracted future revenue, providing multi-year visibility. Microsoft's OpenAI partnership includes a $250 billion incremental Azure services commitment, whilst the company holds a 27% stake valued at approximately $135 billion. Despite the decline, 95% of covering analysts remain bullish, with a consensus price target of $587.31. Analysts cite the year-to-date drop as creating an entry point for investors confident in Azure's AI growth trajectory.

The Register
Apr 14th, 2026
Microsoft hikes UK Surface prices by up to $280 as RAM shortage hits consumers

Microsoft has quietly raised UK prices for its Surface devices by £170 to £220, citing rising memory and component costs. The 13-inch Surface Laptop now starts at £1,099, up from £899 in February, whilst the 15-inch model has increased from £1,349 to £1,519. In the US, some configurations have jumped from $999 to $1,499, according to Windows Central. Microsoft acknowledged the increases are due to rising memory and component costs, as chip manufacturers prioritise high-bandwidth memory production, leaving DRAM and NAND supplies constrained. The changes were not announced; Microsoft simply updated pricing on its website and removed cheaper configurations. The memory shortage is affecting manufacturers across the board, from Chromebooks to Raspberry Pi devices, with geopolitical tensions and freight costs further driving prices higher.

YouTube
Apr 13th, 2026
Microsoft Plots New OpenClaw-Inspired Copilot Features

Microsoft Reporter Aaron Holmes reveals how Satya Nadella is pushing for autonomous "always-on" agents within Copilot to rival OpenClaw. He explains the internal reorganization designed to prioritize these background AI tools for enterprise users. Read more: https://www.theinformation.com/articles/microsoft-plots-new-copilot-features-inspired-openclaw

INACTIVE