In this role, you'll contribute across the stack: developing ingest pipelines, building scalable REST APIs, and creating tools that make data easier to explore and understand. The platform supports large-scale data ingestion, complex queries, and interactive analysis. While your primary focus will be the data-pipeline layer, you'll collaborate closely with other sub-teams to ensure end-to-end functionality and performance. We're looking for someone excited to work across the system and to improve team processes and tooling, especially to speed the integration of new data sources.
Responsibilities:
Lead the design and implementation of data-processing workflows
Manage all aspects of the data-processing lifecycle (collection, discovery, analysis, cleaning, modeling, transformation, enrichment, validation)
Develop and maintain data models and JSON Schemas to ensure integrity and consistency (a brief validation sketch follows this list)
Collaborate with analysts and engineers to meet data requirements
Manage and optimize data storage/retrieval in Elasticsearch and Dgraph (plus MongoDB and Redis)
Orchestrate dataflow using Apache NiFi
Mentor teammates on best practices for data processing and software engineering
Use AI platforms to support hybrid automated/manual approaches to data transformation, code generation, and schema management
Work with analysts, product owners, and engineers to ensure solutions meet operational needs
Propose and implement process improvements for faster delivery of new data sources
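To give candidates a concrete feel for the schema work above, here is a minimal sketch of JSON Schema validation in Python. It assumes the third-party jsonschema package; the schema, field names, and record are hypothetical illustrations, not project artifacts.

```python
# Hypothetical example: validating an ingested record against a JSON Schema.
from jsonschema import Draft202012Validator

RECORD_SCHEMA = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "object",
    "required": ["id", "source", "ingested_at"],
    "properties": {
        "id": {"type": "string"},
        "source": {"type": "string"},
        "ingested_at": {"type": "string", "format": "date-time"},
        "tags": {"type": "array", "items": {"type": "string"}},
    },
    "additionalProperties": False,
}

validator = Draft202012Validator(RECORD_SCHEMA)
record = {"id": "abc-123", "source": "feed-7", "ingested_at": "2024-01-01T00:00:00Z"}

# Report every violation rather than stopping at the first one,
# which is the usual pattern when triaging a new data source.
for err in validator.iter_errors(record):
    print(f"{list(err.path)}: {err.message}")
```

In practice, a validator like this would sit at the boundary of the ingest pipeline so malformed records are quarantined before they reach downstream stores.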
Qualifications:
Strong data-wrangling and dataflow background (discovery, mining, cleaning, exploration, enrichment, validation)
Proficiency in JSON and JSON Schemas (or similar)
Solid data-modeling experience
Experience with NoSQL databases (Elasticsearch, MongoDB, Redis, graph DBs); a brief indexing sketch appears at the end of this posting
Familiarity with dataflow tools such as Apache NiFi
Extensive experience in Python or Java (both preferred)
Experience using generative AI for code and data transformation
Proficiency with Git for version control and Maven for build automation
Comfortable in a Linux development environment
Familiarity with Atlassian tools (Jira, Confluence)
Strong communication and teamwork skills
Experience with a wide range of enterprise data formats
Knowledge of Kafka or RabbitMQ
Proficiency in Java/Spring (Boot, MVC/REST, Security, Data)
AWS (EC2, S3, Lambda) experience
API design for data services
Frontend experience (modern JS + Vue.js or similar)
CI/CD (e.g., Jenkins), automated testing (JUnit)
Docker, Kubernetes, and other containerization technologies
DevOps tools (Packer, Terraform, Ansible)
12+ years of relevant experience and a B.S. in a technical discipline (four additional years of experience may substitute for the degree)
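For the storage/retrieval side of the role, a small sketch using the official Elasticsearch Python client follows; the cluster address, index name, and document fields are hypothetical placeholders, not details of the actual platform.

```python
# Hypothetical example: indexing a record and fetching it back by ID.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed local development cluster

doc = {"id": "abc-123", "source": "feed-7", "summary": "example record"}

es.index(index="records", id=doc["id"], document=doc)
es.indices.refresh(index="records")  # make the document searchable immediately
print(es.get(index="records", id="abc-123")["_source"])
```

The same pattern extends to bulk indexing via the client's helpers module when ingesting at scale.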