To get the best candidate experience, please consider applying for a maximum of 3 roles within 12 months to ensure you are not duplicating efforts.
About Futureforce University Recruiting
Our Futureforce University Recruiting program is dedicated to attracting, retaining and cultivating talent. Our interns and new graduates work on real projects that affect how our business runs, giving them the opportunity to make a tangible impact on the future of our company. With offices all over the world, our recruits have the chance to collaborate and connect with fellow employees on a global scale. We offer job shadowing, mentorship programs, talent development courses, and much more.
Software Engineering
About Salesforce
We’re Salesforce, the Customer Company, inspiring the future of business with AI + Data + CRM. Leading with our core values, we help companies across every industry blaze new trails and connect with customers in a whole new way. And we empower you to be a Trailblazer, too, driving your performance and career growth, charting new paths, and improving the state of the world. If you believe in business as the greatest platform for change and in companies doing well and doing good – you’ve come to the right place.
Associate Site Reliability Engineer (GovCloud) (New Grad)
PLEASE NOTE: Qualification for this job is contingent upon acceptable results from a background investigation as well as your obtaining and maintaining the specific level U.S. government background investigation required for this role.
Location:
Burlington, MA or Herndon, VA
Start Date:
August 7, August 21, September 5, September 18, October 2, October 16
Salesforce is seeking an engineering candidate to join the Site Reliability organization. Working closely with counterparts in the Infrastructure and R&D organizations, this organization provides a team of engineers monitoring cloud service availability and ready to swiftly repair any service-impacting issues. Seven days a week, 24 hours a day, the Site Reliability team keeps the Salesforce cloud and our customers protected. As a member of the Site Reliability team, you will be responsible for the primary task of detecting and resolving incidents within minutes. This objective is met by monitoring the services, reacting to problems, and proactively addressing issues before they affect performance or availability.
The team contributes to the customer and Salesforce by securing data through monitoring, automation, self-healing and resiliency initiatives, destructive testing, and game day exercises. The incumbent in this role would demonstrate a strong focus on tactical operations, as well as large-scale production engineering and orchestration.
Team Description:
GovCloud Incident Response (GIR) maintains the current infrastructure through day-to-day alert response, smart hands, and incident management, including retrospectives and follow-up on long-term remediations.
The GovCloud Observability and Analytics Team (GOAT) is responsible for detecting system- and service-related issues within stipulated SLAs. The team investigates and adopts next-generation observability software and performs data analysis so that issues can be detected before customers encounter them.
The Government Service Performance, Availability and Resilience (GovSPAR) team is responsible for ensuring the overall availability of systems and services within the GovCloud environments, maintaining systems integrity through patching and vulnerability management, and triaging and diagnosing issues during production incidents.
Role Description:
Keep the customer-facing services available at top performance by maintaining the constant health of the supporting systems.
Incident Management - act in key support roles during major incidents (e.g., Sev0, Sev1) and participate in the technical review of each incident for problem management
Problem Management - populate and participate in RCAs and hand them off to the Global Solutions team
Ensure that work carried out by the Site Reliability team stays in sync with the company’s internal compliance policy and directives
Bring a passion for solving technical issues and customer concerns alongside other technical staff as the need arises
Work with and lead other members of the team in staying on top of key industry innovation and technology, and assist in team development and growth
Operate in a fast-paced environment, solve sophisticated issues quickly, and successfully balance multiple priorities
Work to automate detection and resolution of recurring issues in the production environment
Help create and improve current processes to reduce operations and engineering toil
Minimum Requirements:
Must be a U.S. citizen (U.S. born or naturalized) who does not hold dual citizenship. You agree to complete a Minimum Background Investigation (MBI) for a Moderate Public Trust position with the U.S. federal government or other clearances as deemed appropriate for the role.
Bachelor’s or Master’s Degree with 1-5 years of prior experience in a relevant field (must have graduated within the past year)
Systems engineering experience in an enterprise-scale internet service engineering or support role
Expertise in TCP/IP related technologies (networking protocols, network programming, etc.)
Expertise in command-line enterprise support of UNIX variants (Linux/Solaris/BSD), with strong Linux/UNIX knowledge and significant exposure to Red Hat Enterprise Linux and Solaris
Strong understanding of monitoring security systems and administration
Strong communication skills (written and oral)
Past experience in Incident Management and good understanding of ITIL service operations
Willingness to work in a 24/7 team managing large data centers
Availability for shift work and on-call duty if required
Experience provisioning, operating, and running AWS/C2S based infrastructure and systems
Understanding of and experience with writing scripts in Python, Go, or other languages
Preferred Qualifications:
Prior Chef/Puppet or automated deployment experience
Prior Jenkins/Bamboo/Spinnaker pipeline execution experience
Experience in supporting and maintaining monitoring and alerting systems
Experience in supporting and maintaining Java applications
Hands-on experience configuring and running AWS (Amazon Web Services), using the CLI/SDKs
Certifications in Linux+, Red Hat, and AWS
Experience in supporting and leading Kubernetes based applications and services
Familiarity with Agile processes and DevOps
Experience taking part in blameless retrospectives, learning from incidents, and conducting post-incident investigations, including incident analysis as well as performance evaluations of responders
Working knowledge of and interest in resilience engineering, including concepts such as Safety-II: looking at how things go right instead of how things go wrong, being proactive instead of reactive, and investigating complex sociotechnical systems
If you require assistance applying for open positions due to a disability, please submit a request via this Accommodations Request Form.
At Salesforce we believe that the business of business is to improve the state of our world. Each of us has a responsibility to drive Equality in our communities and workplaces. We are committed to creating a workforce that reflects society through inclusive programs and initiatives such as equal pay, employee resource groups, inclusive benefits, and more. Learn more about Equality at www.equality.com and explore our company benefits at www.salesforcebenefits.com.
Salesforce is an Equal Employment Opportunity and Affirmative Action Employer. Qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender perception or identity, national origin, age, marital status, protected veteran status, or disability status. Salesforce does not accept unsolicited headhunter and agency resumes. Salesforce will not pay any third-party agency or company that does not have a signed agreement with Salesforce.
Salesforce welcomes all.