Full-Time

Cloud Senior Site Reliability Engineer

Confirmed live in the last 24 hours

Bank of America

10,001+ employees

Provides banking and financial services globally

Compensation Overview

$149.8k - $188.9k/yr

+ Discretionary Incentive

Senior, Expert

Plano, TX, USA + 7 more

More locations: Richmond, VA, USA | Charlotte, NC, USA | New York, NY, USA | Jacksonville, FL, USA | Kennesaw, GA, USA | Chandler, AZ, USA | Lawrence Township, NJ, USA

Job is primarily located in Jersey City, NJ. No remote work option mentioned.

Category
DevOps & Infrastructure
Site Reliability Engineering
Required Skills
Microsoft Azure
Python
Grafana
Git
Java
.NET
AWS
Prometheus
Jenkins
Splunk
Linux/Unix
Databricks
Google Cloud Platform
Requirements
  • 10+ years of combined experience in SRE, software development, or infrastructure engineering.
  • 7+ years of hands-on experience building and maintaining cloud platforms on a major cloud service provider.
  • Strong experience in implementing, monitoring, and maintaining a highly scalable and resilient Data Services platform on major CSPs like AWS, Azure, or GCP.
  • Strong experience with monitoring tools such as Grafana, Prometheus, Splunk, or Dynatrace, as well as cloud native tools like CloudWatch & CloudTrail, Azure Monitor and Log Analytics.
  • Proficiency in implementing, monitoring, and maintaining a Databricks, RDS, or OpenAI platform.
  • Proficient in at least one programming language such as Python, Java/Spring Boot, or .NET; 5+ years of applied experience in Python/Java.
  • Proficiency in implementing CI/CD pipelines with tools such as Git and Jenkins, and familiarity with a GitOps model.
  • Advanced knowledge of networking (firewalls, DNS, Load Balancing, Proxies, etc.).
  • Advanced understanding of Linux & Windows operating systems including shell scripting.
  • Excellent interpersonal, organizational and communication (written, verbal, and presentation) skills are a must.
  • Proven ability to work independently with minimal supervision and as part of a global team with direct responsibilities and an ability to juggle competing priorities and adapt to changes in project scope.
Responsibilities
  • Designs solutions to visualize key production support metrics enabling Operational Readiness and Site Reliability Engineer teams to identify scenarios requiring intervention.
  • Develops software solutions and/or improved processes to address work identified as ‘toil’ by collaborating with key partners to identify, track and remediate processes to free time allocated to reliability.
  • Partners with Development and Infrastructure teams to create error budget policies prioritizing reliability stories that fall below Service Level Objective (SLO) thresholds and suggests code optimizations, additional instrumentation and/or logging structures to gain service reliability visibility.
  • Identifies and plans for capacity bottlenecks, vulnerabilities, and opportunities for reliability improvement, such as low-level error rates and 'noise', and reduces manual support effort and/or improves system reliability.
  • Assesses monitoring for new changes with development partners and works with monitoring tools team to monitor dashboards and enhance application and system monitoring designs.
  • Engages as a subject matter expert in incident triage efforts, failure scenario modelling and works with the Problem Manager to diagnose root causes for complex/high impact incident/problem management investigations.
  • Collaborates with Development and Infrastructure teams to understand technical solutions and develop Service Level Indicators and SLOs to measure/improve the reliability of the services they support.
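
The SLO and error budget duties above can be made concrete with a small calculation. The sketch below is illustrative only and does not describe Bank of America's tooling: the function name `error_budget_remaining`, the request-based availability SLI, and the 99.9% target are all assumptions chosen for the example.

```python
def error_budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Return the unspent fraction of the error budget for one SLO window.

    Hypothetical helper: assumes a request-based availability SLI, where the
    SLI is the share of requests that succeeded during the window.
    1.0 means the budget is untouched, 0.0 means it is exhausted, and a
    negative value means the SLO has been breached.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0 <= failed_requests <= total_requests:
        raise ValueError("failed_requests must be between 0 and total_requests")

    # The error budget is the number of failures the SLO tolerates.
    allowed_failures = (1.0 - slo_target) * total_requests
    return 1.0 - failed_requests / allowed_failures


# Example: a 99.9% SLO over 1,000,000 requests tolerates about 1,000 failures.
remaining = error_budget_remaining(0.999, 1_000_000, 250)
print(f"Error budget remaining: {remaining:.1%}")
```

An error budget policy of the kind described above would typically key off this value, for example by prioritizing reliability stories over feature work once the remaining budget drops below an agreed threshold.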
Desired Qualifications
  • Strong experience working with a complex IAM infrastructure, including Active Directory, Azure AD Connect, Azure AD, and SSO solutions such as PingIdentity or Okta.
  • Proficiency in creating automation using Python, Terraform, or Ansible.
  • Proficiency in implementing, monitoring, and maintaining a Databricks, CosmosDB, or OpenAI platform.
  • Experience in implementing, monitoring, and maintaining a highly scalable and resilient enterprise platform on Microsoft Azure using native services related to compute, storage, networking, security, and observability.
  • Experience with containerization technologies such as EC2, EKS, Fargate, OpenShift, or Kubernetes.
  • Understanding of cost management, inventory management, and the FinOps model.

Bank of America provides a wide range of financial services, including banking, investing, asset management, and risk management products. It caters to individuals, small and medium-sized businesses, and large corporations, serving around 56 million consumer and small business accounts in the U.S. The company's services include personal banking, credit cards, loans, and investment options. Bank of America stands out from its competitors by being a leading wealth management firm and a major player in corporate and investment banking. Its goal is to help clients achieve their financial objectives through comprehensive financial solutions.

Company Size

10,001+

Company Stage

IPO

Headquarters

Charlotte, North Carolina

Founded

1904

Simplify's Take

What believers are saying

  • Increased demand for digital banking services boosts Bank of America's online platforms.
  • Growing interest in sustainable finance aligns with Bank of America's ESG initiatives.
  • Fintech partnerships enhance Bank of America's technological capabilities and customer experience.

What critics are saying

  • Increased competition in credit facilities may pressure Bank of America's interest rates.
  • Involvement in large syndicated loans exposes Bank of America to higher credit risk.
  • Shift towards capital raising through share sales may reduce demand for traditional loans.

What makes Bank of America unique

  • Bank of America is a global leader in corporate and investment banking.
  • It serves approximately 56 million U.S. consumer and small business relationships.
  • The bank is a leader in wealth management and financial services.

Benefits

Health Insurance

Dental Insurance

Vision Insurance

Life Insurance

Disability Insurance

Paid Vacation

Paid Sick Leave

Flexible Work Hours

Remote Work Options

Professional Development Budget

Conference Attendance Budget

Company News

Cision
Mar 14th, 2025
Loomis Signs a Five-Year Credit Facility of EUR 415 Million

The Asset
Feb 27th, 2025
Citigroup leads US$850 million debt syndicate for Danaos

New York-listed Danaos Corporation has signed a syndicated loan amounting to US$850 million for the financing of 14 containerships being built in Chinese yards.

GlobeNewswire
Feb 26th, 2025
FirstService Increases Credit Facility to US$1.75 Billion

TORONTO, Feb. 26, 2025 (GLOBE NEWSWIRE) -- FirstService Corporation (TSX: FSV; NASDAQ: FSV) (“FirstService”) announced today that it has expanded and...

GlobeNewswire
Feb 18th, 2025
FreightCar America, Inc. Announces New $35 Million Asset-Based Lending Credit Facility

Expanded credit facility enhances borrowing capacity and reduces cost of capital. Further enhances financial flexibility and ability to support growth...

Business Wire
Feb 15th, 2025
Sarepta Therapeutics Announces Inaugural $600 Million Senior Secured Revolving Credit Facility

Sarepta Therapeutics, Inc. (NASDAQ:SRPT), the leader in precision genetic medicine for rare diseases, announced today that it has closed on a $600 mil