Full-Time

Research Engineer

Search/IR

Firecrawl

Enterprise web scraping and extraction API

Compensation Overview

$180k - $290k/yr

+ Equity

San Francisco, CA, USA

Remote

Category
Software Engineering
Required Skills
Elasticsearch
Reinforcement Learning
Requirements
  • At least three years of experience building search and information retrieval systems at scale.
  • Experience building search indexes at massive scale, serving real traffic with real latency requirements, including handling billions of documents, sharding strategies, index compaction, and schema evolution.
  • Hands-on experience with ranking, relevance, and query understanding, including knowledge of BM25, learned ranking, and embedding-based retrieval, and the judgment to know when to use each.
  • Ownership of the full search stack, from ingestion through processing, indexing, ranking, and query understanding to serving; able to reason about dependencies and architecture across layers.
  • Experience solving freshness, deduplication, and incremental indexing problems at scale, building systems that update continuously without full rebuilds and debugging related correctness issues.
  • Self-directed experimenter who generates hypotheses, designs experiments, and ships code without needing a detailed roadmap or sprint planning.
  • Visa requirement: US citizenship or a visa is required for the San Francisco location; the remote option (Americas) lists no visa requirement.
Responsibilities
  • Build and operate search indexes at massive scale: design, build, and maintain the indexing infrastructure that powers Firecrawl's core product, handling billions of documents with attention to latency and storage.
  • Own the full stack, from ingestion through processing, indexing, ranking, and query understanding to serving.
  • Solve ranking, relevance, and query understanding to surface the right content for the right queries; develop and iterate on ranking models and relevance scoring.
  • Tackle freshness, deduplication, and incremental indexing by building systems that keep the index fresh without full re-crawls or rebuilds.
  • Run experiments and ship results to production; design experiments, measure results, and deploy winners without external direction.
  • Collaborate closely with the RL-focused Research Engineer and the engineering team to connect search/IR improvements with model training and the product roadmap.
Desired Qualifications
  • Backgrounds that tend to do well: search engineers from large-scale index environments (web search, e-commerce, document search); IR researchers who shipped production work; infrastructure engineers who built real-time indexing pipelines; engineers from Elasticsearch, Algolia, Vespa, or similar teams who moved beyond tuning knobs to building the engine.

Firecrawl.cloud offers a web scraping and crawling service for enterprises and large-scale projects. It provides an API with endpoints for scraping, crawling, and data extraction, including features like caching and scheduled syncs to maintain fresh, clean data. The product targets LLM engineers by delivering fast, reliable data processing suitable for feeding large language models. Revenue comes from subscription plans and scrape credits used per API request, with a Scale plan designed to scrape millions of pages. Firecrawl integrates with existing tools and workflows to fit into customers' tech stacks, emphasizes transparency and collaboration, and aims to deliver robust, continuous data updates for diverse clients ranging from startups to large enterprises.

Headquarters

San Francisco, California

Founded

2022

Simplify's Take

What believers are saying

  • AI infrastructure demand accelerates as enterprises build RAG pipelines and autonomous agents.
  • Recursive crawl and dynamic content handling capture value from JavaScript-heavy modern web.
  • Self-hosted option enables enterprise adoption with strict data privacy and compliance requirements.

What critics are saying

  • DMCA litigation and copyright claims from website owners over unauthorized content extraction.
  • Major cloud providers launch native scraping APIs, directly competing with superior infrastructure.
  • EU Digital Services Act enforcement threatens European operations with up to 6% revenue fines.

What makes Firecrawl unique

  • Unified API handles search, scrape, crawl, and browser interaction in single calls.
  • Dual deployment model: managed cloud ($16/month) and open-source self-hosted options.
  • LLM-optimized output formats including clean markdown and structured JSON for AI agents.

Benefits

Health Insurance

Dental Insurance

Vision Insurance

Parental Leave

Unlimited Paid Time Off

Wellness Program

Pet Insurance

Sabbatical Leave