C the Signs

C the Signs is a cutting-edge cancer prediction system that uses artificial intelligence to enhance early detection and survival rates. The company is dedicated to reducing healthcare disparities by accelerating early cancer detection, improving patien...

Industry: Professional Services
Company size: 51-250
Founded: 2017

Description

  • Collaborate with data scientists and machine learning engineers to define data requirements for LLM and machine learning model fine-tuning.
  • Design, build, and maintain scalable data pipelines to ingest, process, and store large, diverse healthcare datasets.
  • Implement robust data cleaning and transformation processes to ensure dataset quality.
  • Implement and maintain data validation and monitoring to ensure the integrity, accuracy, and consistency of training datasets.
  • Develop and optimize data structures and schemas for efficient access and utilization by LLMs and machine learning models.
  • Identify and acquire new data sources while ensuring compliance with relevant healthcare regulations (e.g., HIPAA).
  • Monitor data pipeline performance, troubleshoot production issues, and implement performance and reliability optimizations.
  • Document data engineering processes, data models, and data dictionaries for team use and governance.
  • Stay up to date with advancements in data engineering, big data technologies, and machine learning to inform pipeline and data design.

Requirements

  • Bachelor's degree in Computer Science, Engineering, or a related field.
  • Proven experience as a Data Engineer with a focus on big data technologies.
  • Proficiency in programming with Python, Scala, or Java.
  • Extensive experience with data warehousing, ETL processes, and data modeling.
  • Experience with major cloud providers (AWS, GCP, or Azure) and their data storage and processing services.
  • Hands-on experience with big data frameworks such as Apache Spark for distributed processing.
  • Experience with healthcare data and familiarity with healthcare data standards such as FHIR and HL7 (preferred).
  • Familiarity with machine learning concepts and LLM fine-tuning processes (preferred).
  • Experience with data orchestration tools such as Apache Airflow (preferred).
  • Work authorization: must be a US citizen, Green Card holder, or currently in the US with a valid H‑1B visa.

Benefits

  • Competitive salary and benefits package.
  • Flexible working arrangements with remote or hybrid options.
  • Opportunity to work on AI technology that directly impacts patient outcomes and health equity.
  • A place on a mission-driven team that combines innovation with healthcare impact.
  • Continuous learning opportunities with access to the latest tools and advancements in AI and healthcare.
