Our Company
We’re Hitachi Digital Services, a global digital solutions and transformation business with a bold vision of our world’s potential. We’re people-centric and here to power good. Every day, we future-proof urban spaces, conserve natural resources, protect rainforests, and save lives. This is a world where innovation, technology, and deep expertise come together to take our company and customers from what’s now to what’s next. We make it happen through the power of acceleration.
Imagine the sheer breadth of talent it takes to bring a better tomorrow closer to today. We don’t expect you to ‘fit’ every requirement – your life experience, character, perspective, and passion for achieving great things in the world are equally important to us.
The team
At Hitachi Digital Services, our team is driven by a shared passion for innovation, collaboration, and creating transformative solutions that impact the world. As part of our Dallas-based SRE team, you will join a diverse, inclusive, and supportive environment that values continuous learning, cutting-edge technologies, and empowering individuals to lead impactful change. Together, we engineer reliability and performance for critical systems, ensuring our solutions are robust, efficient, and scalable.
This hybrid role requires you to be on-site three days a week. We prefer local candidates as relocation assistance is not available.
The role
As an SRE Lead, you will play a pivotal role in ensuring the availability, reliability, and performance of our cloud-based and on-premises platforms. You’ll lead a talented team of engineers to troubleshoot, optimize, and drive operational excellence while championing automation and SRE best practices. In this role, you’ll define and manage incident processes, lead generative AI platform initiatives, and mentor team members to align with the highest standards of operational excellence. You will also have the unique opportunity to drive innovation in generative AI applications, working with cutting-edge technologies to shape the future of AI-driven systems.
This position is ideal for individuals who thrive in a dynamic environment and are eager to lead with creativity, problem-solving skills, and a commitment to continuous improvement.
What You’ll Be Doing
- Leading a team of platform, application, and incident SREs to manage and resolve complex production issues.
- Improving application performance, availability, and reliability.
- Implementing observability solutions for proactive issue identification and optimization.
- Managing processes for incidents, changes, releases, and deployments.
- Developing automation tools (IaC, alerts as code, dashboards as code) to enhance efficiency.
- Conducting POCs to implement tools supporting generative AI platforms.
- Analyzing trends in incidents, problems, and alerts to drive operational improvements.
- Documenting SOPs, critical systems information, and best practices for current and future use.
- Providing technical guidance and mentorship to junior SRE team members.
- Staying updated on advancements in generative AI technologies and responsible AI practices.
What you’ll bring
- Proven experience with SRE principles and practices in managing on-premises and cloud applications.
- Knowledge of generative AI applications and related technologies.
- Strong leadership skills, with the ability to drive team performance and continuous improvement.
- Analytical skills for resolving complex technical issues, ensuring system reliability, and minimizing downtime.
- Excellent communication and collaboration skills to work effectively with cross-functional teams.
Mandatory Skills
- Expertise in SRE principles: anomaly detection, root cause analysis, and predictive maintenance.
- Proficiency in defining SLIs, SLOs, and error budgets.
- Experience leading an operations team in application production environments.
- Knowledge of programming and scripting languages (e.g., Java, Python, PowerShell).
- Hands-on experience with Kubernetes and OpenTelemetry.
- Understanding of generative AI, large language models (LLMs), and responsible AI.
- Familiarity with DevOps methodologies, tools, and automation (e.g., CI/CD pipelines, Terraform, Helm).
- Experience with public/private cloud platforms (e.g., AWS, Azure, GCP).
Preferred Skills
- Knowledge of fine-tuning models, prompt engineering, retrieval-augmented generation (RAG), and cost optimization techniques.
About us
We’re a global team of innovators. Together, we harness engineering excellence and passion to co-create meaningful solutions to complex challenges. We turn organizations into data-driven leaders that can make a positive impact on their industries and society. If you believe that innovation can bring a better tomorrow closer to today, this is the place for you.
Championing diversity, equity, and inclusion
Diversity, equity, and inclusion (DEI) are integral to our culture and identity. Diverse thinking, a commitment to allyship, and a culture of empowerment help us achieve powerful results. We want you to be you, with all the ideas, lived experience, and fresh perspective that brings. We support your uniqueness and encourage people from all backgrounds to apply and realize their full potential as part of our team.
How we look after you
We help take care of your today and tomorrow with industry-leading benefits, support, and services that look after your holistic health and wellbeing. We’re also champions of life balance and offer flexible arrangements that work for you (role and location dependent). We’re always looking for new ways of working that bring out our best, which leads to unexpected ideas. So here, you’ll experience a sense of belonging, and discover autonomy, freedom, and ownership as you work alongside talented people you enjoy sharing knowledge with.
We’re proud to say we’re an equal opportunity employer and welcome all applicants for employment without regard to race, colour, religion, sex, sexual orientation, gender identity, national origin, veteran status, age, disability status, or any other protected characteristic. Should you need reasonable accommodations during the recruitment process, please let us know so that we can do our best to set you up for success.