Full-Time

Principal Performance Modeling Engineer

Groq

201-500 employees

Develops real-time AI inference hardware and software

Hardware
AI & Machine Learning

Compensation Overview

$240.2k - $420.4k Annually

+ Equity + Benefits

Senior, Expert

Remote in USA

Category
Applied Machine Learning
AI Research
AI & Machine Learning
Required Skills
Python
Data Structures & Algorithms
Requirements
  • Computer science, mathematics, ECE or equivalent background and/or experience in this domain
  • Strong fundamentals in computer architecture; deep knowledge of and hands-on experience with domain-specific AI architectures is highly preferred
  • In-depth understanding of the latest AI/ML algorithms and their hardware implications
  • Ability to analyze complex hardware designs and distill them into simple, abstracted timing models
  • Experience modeling AI/ML workloads and building the tools needed for performance optimization; experience modeling LLM performance is beneficial but not required
  • Proficient in programming languages such as C/C++ and Python
  • Experience with cycle-accurate simulators for benchmarking analysis
  • Experience with developing ASIC microarchitecture design is a plus
  • Experience reading and simulating RTL (SystemVerilog) designs is a plus
Responsibilities
  • Develop and maintain performance models for multiple generations of Groq hardware on the latest AI/ML workloads (LLMs, CNNs, LSTMs, etc.)
  • Analyze AI/ML algorithms to understand their compute, networking, and memory requirements, and map them effectively onto the underlying hardware architecture
  • Lead a matrixed team to enable SW/HW co-optimization across chip, system and software teams
  • Identify performance bottlenecks and help drive next generation chip architecture through a solid understanding of Groq's software and hardware
  • Work with silicon and system integration engineers to evaluate the costs & benefits of new technologies on Groq systems
  • Provide what-if scenarios and continuous guidance directly to the CEO and senior leadership
  • Develop the Design Space Exploration (DSE) tool for performance analysis and exploration of both chip and system across various workloads
  • Define custom hardware solutions for high profile customers
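The compute/memory analysis described in the responsibilities above is often approached with a roofline-style analytical model: compare a workload's arithmetic intensity against the hardware's compute and bandwidth limits to predict attainable throughput. The sketch below is a generic illustration of that technique, not Groq's methodology; all hardware numbers in it are hypothetical placeholders.

```python
# Minimal roofline-style analytical performance model (illustrative only).
# Attainable throughput is capped by either peak compute or by
# memory bandwidth times arithmetic intensity, whichever is lower.

def attainable_tflops(arithmetic_intensity, peak_tflops, mem_bw_tbps):
    """Roofline bound: min(peak compute, bandwidth * FLOPs-per-byte)."""
    return min(peak_tflops, mem_bw_tbps * arithmetic_intensity)

# Example workload: an FP16 GEMM of shape (M, N, K).
M, N, K = 4096, 4096, 4096
flops = 2 * M * N * K                      # one multiply + one add per MAC
bytes_moved = 2 * (M * K + K * N + M * N)  # 2 bytes/element, one pass each
intensity = flops / bytes_moved            # FLOPs per byte

peak = 200.0  # hypothetical peak TFLOP/s of the accelerator
bw = 0.5      # hypothetical memory bandwidth in TB/s

tflops = attainable_tflops(intensity, peak, bw)
runtime_ms = flops / (tflops * 1e12) * 1e3
print(f"intensity={intensity:.0f} FLOP/B, "
      f"attainable={tflops:.1f} TFLOP/s, runtime={runtime_ms:.2f} ms")
```

For this shape the intensity is high enough that the model predicts the GEMM is compute-bound; shrinking K lowers the intensity until the memory-bandwidth roof takes over, which is the kind of bottleneck analysis the role calls for.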

Groq delivers real-time AI inference through its Language Processing Unit™ and deterministic Tensor Streaming architecture, enabling ultra-low-latency inference at scale. This specialized approach not only improves performance but also simplifies the developer experience, accelerating production timelines and improving return on investment. Groq's commitment to domestically based supply chains also supports sustainability and reliability in its operations, making it a compelling workplace for driving forward-thinking AI technology.

Company Stage

Series C

Total Funding

$408.6M

Headquarters

Mountain View, California

Founded

2016

Growth & Insights
Headcount

6 month growth

25%

1 year growth

42%

2 year growth

13%

Simplify's Take

What believers are saying

  • Groq's recent $300 million Series D funding round, led by BlackRock, values the company at $2.5 billion, indicating strong investor confidence and financial stability.
  • The launch of public demos on platforms like Hugging Face Spaces allows users to interact with Groq's models, potentially increasing user engagement and adoption.
  • Groq's rapid query response times, significantly faster than competitors like Nvidia, position it as a leader in AI inference speed.

What critics are saying

  • The competitive landscape with established players like Nvidia poses a significant challenge to Groq's market penetration.
  • High expectations from investors following substantial funding rounds could pressure Groq to deliver rapid and consistent innovation.

What makes Groq unique

  • Groq's open-source Llama AI models outperform proprietary models from tech giants like OpenAI and Google in specialized tasks, showcasing their superior tool use capabilities.
  • Groq's processors, known as LPUs, are claimed to be 10x faster and 1/10 the price of current market options, providing a significant cost-performance advantage.
  • The company's participation in the National AI Research Resource (NAIRR) Pilot highlights its commitment to responsible AI innovation and real-time AI inference.