Site Reliability Engineer II
Posted on 3/10/2023
INACTIVE
Locations
United States
Experience Level
Entry
Junior
Mid
Senior
Expert
Desired Skills
Node.js
AWS
Data Structures & Algorithms
Docker
Elasticsearch
Google Cloud Platform
JavaScript
Jenkins
JIRA
Git
Java
Linux/Unix
Management
Microsoft Azure
MongoDB
MySQL
Postgres
Puppet
REST APIs
Terraform
Kubernetes
Python
Ansible
Requirements
  • Expertise with Terraform and/or Ansible
  • Knowledge of JavaScript, Go, or other programming languages
  • 5+ years of professional Linux systems and software management experience
  • Expertise with Infrastructure-as-Code including Ansible and Terraform
  • Knowledge of programming languages including Go, Node.js, and Java
  • Experience with managing infrastructure within Azure, GCP and AWS
  • Expertise with monitoring and alerting systems including Prometheus, Grafana
  • Strong scripting skills for systems and data-driven solutions
  • JIRA experience for project/task management
  • Extensive experience in troubleshooting large-scale distributed systems
  • Strong background working in AWS, GCP, Azure and general Linux environments
  • Comprehensive background in monitoring, alerting, and auto-remediation systems
  • Proven examples of standardizing security controls across large-scale systems
  • Comfort working within project/task management platforms
  • Cloud platforms including: Azure, GCP and AWS
  • Infrastructure coding languages: Terraform, CloudFormation, Ansible, Puppet
  • CI/CD: experience working with and supporting build and deploy pipelines and tools: Jenkins, GitHub Actions, Rundeck
  • Datastore management and query skills: Postgres, MySQL, MongoDB, Elasticsearch, Solr
  • Container orchestration platforms: Docker, Kubernetes, EKS, AKS
  • Familiarity with coding languages including: Go, Node.js, Java, Python
  • Monitoring/alerting tools: Prometheus, Grafana, VividCortex, Runscope, CloudWatch, Monitor, VictorOps
  • OS and container hardening: STIG, CIS, SELinux, iptables, FIPS 140-2
  • JSON data structures and database schemas
  • API query languages: REST, GQL
Responsibilities
  • Deploy and maintain a resilient, secure, and efficient SaaS application platform to meet established SLAs
  • Automate monitoring, management, and incident response to achieve an auto-remediation system
  • Monitor site stability and performance and troubleshoot site issues
  • Scale infrastructure to meet rapidly increasing demand
  • Manage cross-functional requirements working with Engineering, Product, Services, and other departments
  • Collaborate with developers to bring new features and services into production
  • Independently design and develop tools to aid in operations and automation as well as work jointly with other team members to deliver innovative solutions to complex business and technical challenges
  • Provide deployment and operations support for multi-tiered distributed software applications
  • Estimate engineering effort, plan implementation, and roll out system changes that meet requirements for functionality, performance, scalability, reliability, and adherence to development goals and principles
  • Collaborate in a fast-paced environment with multiple teams (software development, release management, build and release, etc.)
  • Collaborate in a fast-paced environment with multiple teams in a dynamic entrepreneurial organization
  • Defining how the desired behavior of large-scale systems can be achieved
  • Measuring and achieving reliability through engineering and operations work
  • Monitoring and alert development, documentation and management with the goal of creating an auto-remediation system
  • Adapting security controls that are not typically native to GA releases to the product
  • Developing automation methods to extend standard deployment pipelines for bespoke implementations
  • Patching, policy enforcement, and audit of production systems
  • Driving the Disaster Recovery process
Desired Qualifications
  • Bachelor's degree in Computer Science or related field
  • Have worked in regulated or public-sector environments, developing and assessing cloud-based solutions
  • Worked with, developed, or supported continuous integration/continuous deployment systems
  • Have concrete examples ready to present of auto-remediation systems you have created
Veritone

501-1,000 employees

Enterprise AI solutions platform
Company Overview
Veritone's mission is to democratize artificial intelligence and build a safer, more vibrant, transparent, and empowered society. The company is determined to invent new ways to enhance creativity and productivity like never before by investing in the unrealized potential of AI to unlock the future that once existed only in dreams.
Benefits
  • Competitive salary
  • Flexible PTO
  • Remote-first environment
  • Recognition programs
  • Stock options
  • Mindfulness resources
  • 401(k) matching
  • Medical, dental, & vision coverage
Company Core Values
  • Visionary
  • Electrifying
  • Resilient
  • Integrity