Senior Site Reliability Engineer
Veeva Systems

5,001-10,000 employees

Cloud computing services for pharmaceutical companies.
Company Overview
Veeva's mission is to help R&D, quality, and regulatory teams eliminate inefficiencies and bring high-quality, safe, sustainable products to market without compromising quality. The company builds cloud-based tools for pharmaceutical research.

Company Stage

N/A

Total Funding

$224M

Founded

2007

Headquarters

Pleasanton, California

Growth & Insights
Headcount

6 month growth

3%

1 year growth

12%

2 year growth

45%
Locations
Cambridge, MA, USA
Experience Level
Entry
Junior
Mid
Senior
Expert
Desired Skills
Bash
Kubernetes
Microsoft Azure
Python
Git
Ruby
SQL
AWS
JIRA
Terraform
Ansible
Confluence
Development Operations (DevOps)
Categories
DevOps & Infrastructure
Software Engineering
Requirements
  • Bachelor’s degree in Computer Science, Information Technology, or related field (or equivalent work experience)
  • 3+ years of work experience as a DevOps engineer or SRE
  • Independent learner, curious to learn new technologies
  • Experience with AWS and container orchestration tools (e.g., Kubernetes)
  • Familiarity with infrastructure as code tools (e.g., Terraform, Ansible) and version control systems (e.g., Git)
  • GitLab system administration experience
  • Solid scripting skills; experience with Shell, Bash, Ansible, Python, Go, Ruby, etc.
  • Excellent problem-solving skills and the ability to troubleshoot complex issues under pressure
  • 3+ years of experience in relational databases with a mastery of SQL
  • Demonstrated history of incident management and leadership ability
  • Hands-on operational experience in a high-volume or critical production service environment
  • Effective communication skills across all levels — whether talking to individual contributors or executives
  • Experience with disaster recovery planning and implementation
  • Experience with performance tuning of databases and distributed storage systems
  • Ability to handle periodic on-call duty
  • Fluent in English – both written and verbal
  • Experience with security best practices
Responsibilities
  • Take responsibility for managing production and pre-production environments, security, change management, deployment, architecture, and tools
  • Perform root cause analysis for complex failures and offer modern solutions and tools
  • Analyze performance and ensure the applications (GitLab, Jira, Confluence, TestRail, Mattermost), hosted in AWS, meet the scalability and reliability needs of our internal teams
  • Work closely with Infrastructure, DevOps, Security, and product teams to stabilize, secure, and scale applications for continued growth
  • Automate deployment, monitoring, and incident response processes to enhance system reliability and performance
  • Continuously monitor system health, proactively identify issues, and implement solutions to ensure optimal performance
  • Identify and troubleshoot performance bottlenecks and reliability issues across the stack
  • Implement best practices for cloud-based infrastructure, ensuring security, scalability, and cost efficiency
  • Lead the effort to triage and mitigate incidents, and perform periodic on-call duty when issues are escalated
  • Communicate effectively with engineering and infrastructure teams, and describe problems succinctly with sufficient detail
  • Engage in real-time communication during outages with both technical and non-technical audiences
Desired Qualifications
  • Experience with serverless computing and serverless architectures (e.g., AWS Lambda, Azure Functions)