Full-Time

SRE Lead

Hitachi Digital Services

1,001-5,000 employees

AI & Machine Learning
Data & Analytics
Energy
Healthcare
Enterprise Software
Automotive & Transportation
Consulting

Senior

No H1B Sponsorship

Dallas, TX, USA

Hybrid role requiring on-site presence three days a week; local candidates preferred.

Category
DevOps & Infrastructure
Site Reliability Engineering
Required Skills
PowerShell
Kubernetes
Microsoft Azure
Python
Java
AWS
Terraform
Development Operations (DevOps)
Google Cloud Platform
Requirements
  • Proven experience with SRE principles and practices in managing on-premises and cloud applications.
  • Knowledge of generative AI applications and related technologies.
  • Strong leadership skills, with the ability to drive team performance and continuous improvement.
  • Analytical skills for resolving complex technical issues, ensuring system reliability, and minimizing downtime.
  • Excellent communication and collaboration skills to work effectively with cross-functional teams.
  • Expertise in SRE principles: anomaly detection, root cause analysis, and predictive maintenance.
  • Proficiency in defining SLIs, SLOs, and error budgets.
  • Experience leading an operations team in application production environments.
  • Knowledge of programming and scripting languages (e.g., Java, Python, PowerShell).
  • Hands-on experience with Kubernetes and OpenTelemetry.
  • Understanding of generative AI, large language models (LLMs), and responsible AI.
  • Familiarity with DevOps methodologies, tools, and automation (e.g., CI/CD pipelines, Terraform, Helm).
  • Experience with public/private cloud platforms (e.g., AWS, Azure, GCP).
Responsibilities
  • Leading a team of platform, application, and incident SREs to manage and resolve complex production issues.
  • Improving application performance, availability, and reliability.
  • Implementing observability solutions for proactive issue identification and optimization.
  • Managing processes for incidents, changes, releases, and deployments.
  • Developing automation tools (infrastructure as code, alerts as code, dashboards as code) to enhance efficiency.
  • Conducting POCs to implement tools supporting generative AI platforms.
  • Analyzing trends in incidents, problems, and alerts to drive operational improvements.
  • Documenting SOPs, critical systems information, and best practices for current and future use.
  • Providing technical guidance and mentorship to junior SRE team members.
  • Staying updated on advancements in generative AI technologies and responsible AI practices.
Hitachi Digital Services

Headquarters

Dallas, Texas

Simplify's Take

What believers are saying

  • The partnership with Wenco L.L.C. could open new avenues in the mining sector, enhancing Hitachi's service offerings and market reach.

What critics are saying

  • The lack of detailed company information and recent news makes it challenging to assess the company's current strategic direction and market position.

What makes Hitachi Digital Services unique

  • Hitachi Digital Services leverages its parent company's extensive industrial and technological expertise, providing a unique blend of IT consulting and management solutions that few competitors can match.
