Full-Time

Staff Software Engineer

GPU Infrastructure, HPC

Cohere

501-1,000 employees

API-based NLP tools and LLMs

No salary listed

Canada, United States

Hybrid

Remote-flexible; candidates must reside in Canada or the United States; offices in Toronto, New York, San Francisco, London, and Paris.

Category
Software Engineering
Required Skills
Kubernetes
Python
Tensorflow
Pytorch
Infrastructure as Code (IaC)
Go
Observability
Linux/Unix
Requirements
  • Deep expertise in ML/HPC infrastructure: Experience with GPU/TPU clusters, distributed training frameworks (JAX, PyTorch, TensorFlow), and high-performance computing (HPC) environments.
  • Kubernetes at scale: Proven ability to deploy, manage, and troubleshoot cloud-native Kubernetes clusters for AI workloads.
  • Strong programming skills: Proficiency in Python (for ML tooling) and Go (for systems engineering), with a preference for building on and contributing to open-source projects rather than reinventing solutions.
  • Low-level systems knowledge: Familiarity with Linux internals, RDMA networking, and performance optimization for ML workloads.
  • Research collaboration experience: A track record of working closely with AI researchers or ML engineers to solve infrastructure challenges.
  • Self-directed problem-solving: The ability to identify bottlenecks, propose solutions, and drive impact in a fast-paced environment.
Responsibilities
  • Build and scale ML-optimized HPC infrastructure: Deploy and manage Kubernetes-based GPU/TPU superclusters across multiple clouds, ensuring high throughput and low-latency performance for AI workloads.
  • Optimize for AI/ML training: Collaborate with cloud providers to fine-tune infrastructure for cost efficiency, reliability, and performance, leveraging technologies like RDMA, NCCL, and high-speed interconnects.
  • Troubleshoot and resolve complex issues: Proactively identify and resolve infrastructure bottlenecks, performance degradation, and system failures to ensure minimal disruption to AI/ML workflows.
  • Enable researchers with self-service tools: Design intuitive interfaces and workflows that allow researchers to monitor, debug, and optimize their training jobs independently.
  • Drive innovation in ML infrastructure: Work closely with AI researchers to understand emerging needs (e.g., JAX, PyTorch, distributed training) and translate them into robust, scalable infrastructure solutions.
  • Champion best practices: Advocate for observability, automation, and infrastructure-as-code (IaC) across the organization, ensuring systems are maintainable and resilient.
  • Mentorship and collaboration: Share expertise through code reviews, documentation, and cross-team collaboration, fostering a culture of knowledge transfer and engineering excellence.

Cohere provides access to advanced Natural Language Processing (NLP) tools and Large Language Models (LLMs) through a simple API. It serves businesses that want to improve content generation, summarization, and semantic search across multiple languages. The product works by offering API access to pre-trained models that perform tasks like text classification, sentiment analysis, and semantic search; users can customize and integrate these models into their applications, enabling scalable and affordable AI-powered solutions. Cohere differentiates itself with a developer-friendly API, multilingual support, and easy customization to help organizations build smarter and faster AI solutions. The company’s goal is to make powerful generative AI tools accessible to a wide range of customers and use cases, letting them deploy AI features quickly without managing complex models themselves.

Company Size

501-1,000

Company Stage

Series E

Total Funding

$2.1B

Headquarters

Toronto, Canada

Founded

2019

Simplify Jobs

Simplify's Take

What believers are saying

  • Cohere tripled revenue past $150M in 2025 with clients RBC, BCE, and SAP.
  • Schwarz Group invests $600M in Series E amid Aleph Alpha merger valuing $20B.
  • Cohere partnered with Ensemble for first RCM-native LLM in healthcare.

What critics are saying

  • The Aleph Alpha merger could cause cultural clashes, with clients defecting to Oracle within 12 months.
  • Open-sourcing Tiny Aya could cannibalize premium API revenue, shifting usage to HuggingFace within 6 months.
  • US scrutiny of the merger could lead to revocation of FedRAMP High authorization, wiping out 25% of ARR within 12 months.

What makes Cohere unique

  • Cohere achieved FedRAMP High authorization in under 90 days via Second Front.
  • Cohere launched North agentic AI platform for enterprise workplace productivity.
  • Cohere offers zero data retention and opt-out training for enterprise customers.

Benefits

Health Insurance

Dental Insurance

100% Parental Leave top-up

Weekly lunch stipend

Remote Work Options

6 weeks of vacation

Growth & Insights and Company News

Headcount

6-month growth

-3%

1-year growth

-3%

2-year growth

0%

The Associated Press
Mar 31st, 2026
Ensemble and Cohere build first RCM-native LLM for healthcare revenue cycle management

Ensemble, a US revenue cycle management services provider, has partnered with enterprise AI company Cohere to build the healthcare industry's first revenue cycle management-native large language model. The companies are creating a custom model informed by Ensemble's operational expertise and data, designed to handle complex healthcare financial operations more accurately than general-purpose LLMs. The model will be embedded into AI agents managing processes from patient intake to account resolution. Unlike standard approaches that rely on prompt engineering, this system is fine-tuned on real RCM tasks and trained using synthetic datasets in a HIPAA-compliant environment, without using identifiable patient data. The solution aims to enhance existing electronic health record systems by providing better context and guidance for navigating payer requirements whilst reducing administrative burden for healthcare providers.

TechCrunch
Feb 17th, 2026
Cohere launches Tiny Aya, open multilingual AI models supporting 70+ languages on laptops

Cohere has launched Tiny Aya, a family of open-weight multilingual AI models supporting over 70 languages that can run on everyday devices without internet connectivity. The models were unveiled at the India AI Summit by the company's research arm, Cohere Labs. The base model contains 3.35 billion parameters and includes regional variants: TinyAya-Global for broad language support, TinyAya-Earth for African languages, TinyAya-Fire for South Asian languages, and TinyAya-Water for Asia Pacific, West Asia and Europe. South Asian language support includes Bengali, Hindi, Punjabi, Tamil and Telugu. Trained on 64 Nvidia H100 GPUs, a relatively modest amount of compute, the models enable offline applications like translation, which are particularly useful in linguistically diverse countries like India. The models are available on HuggingFace, Kaggle and the Cohere Platform.

The Associated Press
Feb 10th, 2026
SAP and Cohere launch sovereign AI solutions in Canada for public sector and regulated industries

SAP and Cohere are expanding their partnership to deliver sovereign AI solutions globally, beginning in Canada. SAP Canada plans to integrate Cohere's agentic platform, North, into its Enterprise Resource Planning Sovereign Cloud environment, creating a complete Sovereign AI Layer for public sector and regulated industries. The integration embeds Cohere's large language models into SAP's Canadian-operated sovereign cloud infrastructure, allowing organisations to deploy advanced AI whilst maintaining data residency and operational control. This addresses the challenge of innovating with AI without compromising security or data sovereignty. A recent SAP AI report found that whilst 71% of organisations rely on data for investment decisions, 75% report incomplete data as a significant challenge. The partnership aims to overcome data fragmentation by embedding AI directly into core SAP applications.

Stockwatch
Dec 30th, 2025
Cohere triples revenue past $150M, lands RBC and BCE as clients

Toronto-based Cohere raised $600 million in 2025, achieving a $7 billion valuation, as the generative AI company secured contracts with major clients including RBC, Bell, Dell, Thales, SAP and LG for its office automation software. The company, which hired researcher Joëlle Pineau as chief AI officer, entered 2025 with approximately $50 million in annualised revenues and exited the year at more than triple that level. Chief executive Aidan Gomez expects dramatic growth to continue in 2026. Cohere has joined an elite group of 77 Canadian technology companies surpassing $100 million in annual revenue, a key threshold for sector maturity. The company also expanded internationally, opening offices worldwide during its breakthrough year.

Microsoft
Oct 6th, 2025
Cohere Raises $500M, Valued at $6.8B

AI startup Cohere Inc. has secured $500 million in new funding, valuing the company at $6.8 billion. This funding round is part of Cohere's strategy to compete with larger tech firms.