Staff Software Engineer
ML Training
Pinterest

5,001-10,000 employees

Visual discovery engine for finding ideas
Company Overview
Pinterest is a visual discovery platform with a dataset of over 200 billion ideas, making it a rich source of inspiration across many areas of life. The company's culture is rooted in creativity and inclusivity, and its platform lets more than 430 million users worldwide explore, plan, and act on their ideas. By combining technology and creativity, Pinterest has positioned itself as a leader in the social media industry, offering a personalized user experience that distinguishes it from competitors.
AI & Machine Learning
Consumer Goods
Data & Analytics

Company Stage

N/A

Total Funding

$2.9B

Founded

2010

Headquarters

San Francisco, California

Growth & Insights
Headcount

6 month growth

8%

1 year growth

15%

2 year growth

46%
Locations
Remote in USA
Experience Level
Entry
Junior
Mid
Senior
Expert
Desired Skills
Kubernetes
Python
Apache Flink
Jupyter
Pytorch
Apache Spark
Java
Yarn
Data Analysis
Categories
AI & Machine Learning
Software Engineering
Requirements
  • 7+ years of experience in software engineering and machine learning, with a focus on building and maintaining ML infrastructure or batch compute infrastructure such as YARN, Kubernetes, or Mesos
  • Technical leadership experience, devising multi-quarter technical strategies and driving them to success
  • Strong understanding of high-performance computing and/or parallel computing
  • Ability to drive cross-team projects, and to understand our internal customers (ML practitioners and data scientists), their common usage patterns, and their pain points
  • Strong experience with Python and/or other programming languages such as C++ and Java
Responsibilities
  • Implement cost-effective and scalable solutions that allow ML engineers to scale their ML training and inference workloads on compute platforms such as Kubernetes
  • Lead and contribute to key projects, such as rolling out GPU sharing via MIG (Multi-Instance GPU) and MPS (Multi-Process Service), intelligent resource management, capacity planning, and fault-tolerant training
  • Lead the technical strategy and set the multi-year roadmap for ML Training Infrastructure, spanning ML compute and ML developer frameworks such as PyTorch, Ray, and Jupyter
  • Collaborate with internal clients, ML engineers, and data scientists to address their concerns regarding ML development velocity and enable the successful implementation of customer use cases
  • Forge strong partnerships with tech leaders in the Data and Infra organizations to develop a comprehensive technical roadmap that spans multiple teams
  • Mentor engineers within the team and demonstrate technical leadership
Desired Qualifications
  • Experience with GPU programming, containerization, and orchestration technologies is a plus
  • Experience working with cloud data processing technologies (Apache Spark, Ray, Dask, Flink, etc.) and ML frameworks such as PyTorch is a plus