Full-Time

Senior Data Engineer

GENEX

501-1,000 employees

Cooperative cattle genetics and reproductive services

No salary listed

Madison, WI, USA

In Person

Category
Data & Analytics
Required Skills
Microsoft Azure
Agile
Python
Apache Spark
SQL
CloudFormation
AWS
Terraform
Databricks
Requirements
  • Bachelor’s degree in Computer Science, Information Systems, Bioinformatics, Computational Biology, or a related field; a Master’s degree is an asset.
  • 7+ years of experience in data integration and reporting, with experience designing and operating cloud-based data platforms.
  • Extensive experience with Databricks, including Python, Spark, and Delta Lake.
  • Strong proficiency with relational databases (e.g., SQL Server, RDS), including T-SQL, stored procedures, and functions.
  • Experience with data warehousing concepts and best practices.
  • Experience with Microsoft Azure cloud platform; exposure to Microsoft Fabric is desirable.
  • Hands-on experience working with biological, genomic, or other omics datasets in a bioinformatics or life sciences setting (e.g., sequence data, SNP arrays, GWAS outputs, phenotypic traits).
  • Familiarity with common bioinformatics tools, data formats (e.g., FASTQ, VCF, PLINK), and workflows is highly desirable.
  • Strong analytical and problem-solving skills, with the ability to reason about complex data and scientific requirements.
  • Excellent communication and interpersonal skills.
  • Ability to work independently and as part of a cross-functional team across IT, science, and business.
  • Experience with Agile methodologies.
  • Demonstrated background in bioinformatics or computational biology, preferably supporting genetics, breeding, or life science research in an applied or commercial context.
  • Must be legally authorized to work in the United States.
Responsibilities
  • Design, develop, and maintain robust and efficient ETL/ELT pipelines and processes on Databricks for both operational and bioinformatics datasets (e.g., genomic markers, phenotypic data, laboratory outputs).
  • Ingest, transform, and harmonize structured and semi-structured biological data from lab systems, LIMS, sequencing platforms, and external partners into the enterprise data platform.
  • Troubleshoot and resolve Databricks pipeline errors and performance issues.
  • Optimize data flow performance and minimize data latency across scientific and business use cases.
  • Implement data quality checks, validations, and reconciliation processes within ETL workflows, including domain-specific checks for genomic and phenotypic data.
  • Develop and maintain Databricks pipelines, notebooks, and datasets using Python, Spark, and SQL.
  • Optimize Databricks jobs for performance and cost-effectiveness, including large-scale bioinformatics and analytics workloads.
  • Integrate Databricks with other data sources and systems, including lab instruments, genomic databases, and on-prem or cloud data stores.
  • Participate in the design and implementation of data lake architectures that support both traditional analytics and bioinformatics pipelines.
  • Participate in the design and implementation of data warehousing solutions to support reporting, analytics, and scientific modeling.
  • Model and curate subject areas for genetics, reproduction, and bioinformatics (e.g., animals, pedigrees, genotypes, breeding values, trials).
  • Support data quality initiatives and implement data cleansing procedures across business and scientific domains.
  • Collaborate with business users, scientists, geneticists, and bioinformaticians to understand data requirements for department-driven reporting and analytics needs.
  • Maintain and extend the existing library of complex dashboards and visualizations to surface genetic, reproductive, and operational insights.
  • Enable self-service analytics for R&D and product teams by exposing well-governed, documented data products.
  • Troubleshoot and resolve report issues, including performance bottlenecks and data inconsistencies.
  • Apply strong programming skills in Python, SQL, and Spark to build scalable data and bioinformatics workflows.
  • Use CI/CD and IaC tools (Terraform, ARM, CloudFormation) to automate deployment of data platform components and analytics environments.
  • Design and implement Databricks platform architecture on Azure and AWS infrastructure, including environments that support large-scale scientific computation.
  • Contribute to cloud security, governance, and cost optimization practices for data and bioinformatics workloads.
  • Partner with geneticists, biostatisticians, and bioinformaticians to translate scientific requirements into scalable data and platform architectures.
  • Support or orchestrate bioinformatics pipelines (e.g., variant processing, quality control, annotation, genotype imputation, genomic evaluation) using cloud and Databricks capabilities.
  • Ensure that data models, pipelines, and storage structures meet the needs of downstream analytics, predictive models, and genetic evaluations.
  • Advocate for best practices in managing sensitive biological and genetic data, including data governance, access control, and compliance with relevant standards and regulations.
  • Thrive in an entrepreneurial, self-starting, and fast-paced environment, working both independently and with our highly skilled teams.
  • Collaborate effectively with business users, data analysts, scientists, and other IT teams.
  • Communicate technical information clearly and concisely, both verbally and in writing, to technical and nontechnical stakeholders.
  • Document all development work, data models, and procedures thoroughly, including bioinformatics and scientific data flows.
  • Keep abreast of the latest advancements in data integration, cloud platforms, bioinformatics tooling, and data engineering technologies.
  • Continuously improve skills and knowledge through training and self-learning in both data engineering and bioinformatics domains.
Desired Qualifications
  • Exposure to Microsoft Fabric is desirable.
  • Familiarity with common bioinformatics tools, data formats (FASTQ, VCF, PLINK), and workflows is highly desirable.
  • Demonstrated background in bioinformatics or computational biology, preferably supporting genetics, breeding, or life science research in an applied or commercial context.

GENEX is a cooperative that provides cattle genetics and reproductive solutions to farmers and ranchers. It sells semen and embryos from genetically superior sires and offers related services such as reproductive consulting, artificial insemination training, and herd management products, creating a full suite of genetic and reproductive options for beef and dairy herds. Its offering pairs top-quality genetics, delivered through semen and embryo sales, with hands-on services that support breeding programs and herd performance. Unlike many suppliers, GENEX is owned and governed by its farmer-members, which aligns its activities with member needs and allows a portion of profits to be returned to members. Its goal is to improve herd efficiency and profitability by focusing on economically important traits such as milk production, feed efficiency, and disease resistance, while expanding access to these genetics globally.

Company Size

501-1,000

Company Stage

N/A

Total Funding

N/A

Headquarters

Shawano, Wisconsin

Founded

1996

Simplify's Take

What believers are saying

  • GenChoice™ sexed semen maximizes heifer calves amid high replacement costs.
  • GENEX Beef App simplifies breeding with up-to-date genetic data.
  • Guaranteed buyers in FeedWise™ ensure profitability for terminal calves.

What critics are saying

  • ABS Global's genomic testing erodes 50–70% market share in 12–24 months.
  • STgenetics undercuts FeedWise™ with cheaper hybrid vigor in 6–12 months.
  • Semex's index drives defection with 15% higher milk yield in 18–36 months.

What makes GENEX unique

  • Cooperative owned by farmer-members aligns efforts with herd improvement needs.
  • ICC™ index from 2014 targets problem-free, high-profit dairy animals.
  • FeedWise™ program uses Leachman Stabilizer™ for superior beef-on-dairy crosses.

Benefits

Medical, Dental, Vision Insurance

Health Savings Account/Flexible Spending Account

401(k) Company Match

Company paid Life Insurance

Short- and Long-Term Disability

EAP Program

Company News

Genex Canada
Jan 21st, 2021
Genex Cooperative, Inc. and Cooperative Resources International partnered with Nedap Livestock Management on May 1st '21.

“GENEX chose to partner with Nedap because they offer the most accurate, complete and reliable activity monitoring solution in the market,” says May.