Full-Time

SRE Lead

Hitachi Digital Services

1,001-5,000 employees

Data & Analytics
Energy
Enterprise Software
AI & Machine Learning
Healthcare

Senior, Expert

No H1B Sponsorship

Dallas, TX, USA

Hybrid role requiring on-site presence three days a week; local candidates preferred.

Category
DevOps & Infrastructure
Site Reliability Engineering
Required Skills
PowerShell
Kubernetes
Microsoft Azure
Python
Java
AWS
Terraform
Development Operations (DevOps)
Google Cloud Platform
Requirements
  • Proven experience with SRE principles and practices in managing on-premises and cloud applications.
  • Knowledge of generative AI applications and related technologies.
  • Strong leadership skills, with the ability to drive team performance and continuous improvement.
  • Analytical skills for resolving complex technical issues, ensuring system reliability, and minimizing downtime.
  • Excellent communication and collaboration skills to work effectively with cross-functional teams.
  • Expertise in SRE principles: anomaly detection, root cause analysis, and predictive maintenance.
  • Proficiency in defining SLIs, SLOs, and error budgets.
  • Experience leading an operations team in application production environments.
  • Knowledge of programming and scripting languages (e.g., Java, Python, PowerShell).
  • Hands-on experience with Kubernetes and OpenTelemetry.
  • Understanding of generative AI, large language models (LLMs), and responsible AI.
  • Familiarity with DevOps methodologies, tools, and automation (e.g., CI/CD pipelines, Terraform, Helm).
  • Experience with public and private cloud platforms (e.g., AWS, Azure, GCP).
Responsibilities
  • Leading a team of platform, application, and incident SREs to manage and resolve complex production issues.
  • Improving application performance, availability, and reliability.
  • Implementing observability solutions for proactive issue identification and optimization.
  • Managing processes for incidents, changes, releases, and deployments.
  • Developing automation tools (infrastructure as code, alerts as code, dashboards as code) to enhance efficiency.
  • Conducting proofs of concept (POCs) to implement tools supporting generative AI platforms.
  • Analyzing trends in incidents, problems, and alerts to drive operational improvements.
  • Documenting standard operating procedures (SOPs), critical systems information, and best practices for current and future use.
  • Providing technical guidance and mentorship to junior SRE team members.
  • Staying updated on advancements in generative AI technologies and responsible AI practices.
Hitachi Digital Services

Company Stage

N/A

Total Funding

N/A

Headquarters

Dallas, Texas

Founded

1910

Simplify's Take

What believers are saying

  • Rising demand for digital transformation in manufacturing boosts client base.
  • Global smart city projects increase demand for Hitachi's digital solutions.
  • Remote work trends drive need for digital collaboration tools.

What critics are saying

  • Emerging digital service providers increase market competition.
  • Talent shortages in AI and data analytics may delay projects.
  • Geopolitical tensions could impact international operations and collaborations.

What makes Hitachi Digital Services unique

  • Hitachi Digital Services partners with Wenco to enhance IT consulting solutions.
  • Strong focus on smart city initiatives with integrated urban management systems.
  • Specializes in cybersecurity and cloud integration services for diverse industries.
