Site Reliability Engineer
DevSecOps
Posted on 8/22/2023
INACTIVE
Western Digital

10,001+ employees

Disk drive & data storage manufacturer
Company Overview
Western Digital’s mission is to push the boundaries of innovation and make possible what was once thought impossible. The company has been at the forefront of innovation since the first hard drives and continues to work on advancements in 3D NAND alongside its reliable data solutions business.
Data & Analytics
Hardware

Company Stage

IPO

Total Funding

$930.6M

Founded

2014

Headquarters

San Jose, California

Growth & Insights
Headcount

6 month growth

0%

1 year growth

1%

2 year growth

7%
Locations
Milpitas, CA, USA • Remote in USA
Experience Level
Entry
Junior
Mid
Senior
Expert
Desired Skills
AWS
Development Operations (DevOps)
Docker
Google Cloud Platform
Jenkins
Git
Microsoft Azure
Splunk
Terraform
Kubernetes
Python
Ansible
Communications
Prometheus
Categories
DevOps & Infrastructure
Requirements
  • Candidates must possess 4 to 6 years of hands-on experience with DevOps tools and SRE practices
  • Must possess administration experience with DevOps tools such as Artifactory, Jenkins, Zuul, Git/Gerrit, WANdisco SVN, Black Duck, CodeScene, Spinnaker, and SAST/DAST tools
  • Must possess a very good understanding of infrastructure at the server, VMware, storage, and networking levels
  • Exceptional analytical, problem solving, and troubleshooting skills to manage complex process and technology issues
  • Extensive experience in Ansible automation (researching, writing, maintaining, and optimizing roles, playbooks, and modules)
  • Proficiency in containerization technologies such as Docker and Kubernetes
  • Expertise in shell scripting, Python, and infrastructure-as-code tools such as Terraform
  • Experience developing and customising CI/CD pipelines and onboarding applications with varying requirements
  • Extensive experience enhancing monitoring and building metrics dashboards with tools such as Icinga, Splunk, Prometheus, and Grafana
  • Excellent communication and collaboration skills
  • Automation-first mindset
  • Focus on embedding a strong security posture in systems
  • Working experience with HAProxy, load balancers, LDAP/SSO integration, and security endpoint configurations
  • Strong documentation skills and the ability to work at a fast pace amid rapid change
Responsibilities
  • Observability and Monitoring: Design, implement, and continuously improve monitoring and observability solutions to ensure effective and real-time visibility into system performance
  • Best Practices: Advocate for and implement best practices in SRE, DevOps, and automation, with a focus on enhancing platform stability and performance
  • Automation: Lead automation efforts to streamline processes, reduce manual tasks, and improve operational efficiency
  • Architecting and Designing: Contribute to the architecture and design of systems and applications, aligning them with reliability and scalability goals
  • Technical Accountability: Provide technical ownership in the SRE team, fostering a collaborative and growth-oriented environment
  • Accountability: Take ownership of system reliability, meet Service Level Objectives (SLOs), and ensure customer satisfaction
  • Collaboration: Work closely with Engineering teams to understand customer requirements and collaborate on solutions
  • Adaptability: Stay updated with emerging technologies and adapt quickly to evolving requirements and challenges
  • Upskilling: Continuously upskill in newer technologies and share knowledge within the team
  • Team Player: Collaborate effectively with team members and contribute to a positive team culture
  • Professional Behaviour: Demonstrate professionalism, integrity, and a commitment to the highest ethical standards
  • Documentation: Maintain thorough and well-organized documentation of systems and processes
Desired Qualifications
  • Knowledge of cloud computing platforms (e.g., AWS, Azure, GCP) is a plus