Join a cutting-edge, well-funded hardware startup in Silicon Valley as a Multi-Chiplet Fabric Performance Engineer on the Data Parallel Accelerator team. Our mission is to reimagine silicon and create RISC-V based accelerated computing platforms that will transform the industry. Chiplets are key to maximizing compute on each socket, and the cross-chiplet fabric plays a critical role in ensuring that each compute element has a low-latency, high-bandwidth path to system memory. You will have the opportunity to work with some of the most talented and passionate engineers in the world to create designs that push the envelope on performance, energy efficiency, and scalability. We offer a fun, creative, and flexible work environment, with a shared vision of building products that change the world.
Requirements
- Knowledge of on-chip interconnection networks (NoCs), including routing, arbitration, deadlock avoidance, and flow control methods.
- Experience in developing and working with interconnect-only as well as full-system simulators.
- Cycle-accurate performance modeling experience in C/C++.
- Experience with analytical modeling of fabric components, with a focus on latency and bandwidth.
- Ability to work well in a team and be productive under aggressive schedules.
- Proficiency in C or C++, Python, and SystemVerilog.
- Experience with high-level simulators for power estimation is a plus.
- Excellent problem-solving, written and verbal communication, and organizational skills; highly self-motivated.
Responsibilities
- Work closely with fabric architects and micro-architects to define the multi-chiplet interconnection solution for the Data Parallel Accelerator.
- Represent key design details in a Python-based analytical model for peak bandwidth estimation under different traffic scenarios.
- Represent architectural and micro-architectural details in cycle-accurate performance models, and set up performance and design space exploration studies using those models.
- Bring up and debug the performance of multi-chiplet configurations, including full-system compute and on-chip fabrics.
- Correlate model performance with RTL, explore high-performance strategies, and validate that the RTL design meets its performance targets.
- Develop performance verification tests to ensure the quality of the model and design.
Education and Experience
Bachelor’s degree plus 2 years of relevant industry experience.
Master’s or Ph.D. with internship experience.