C the Signs

C the Signs is a cutting-edge cancer prediction system that uses artificial intelligence to enhance early detection and survival rates. The company is dedicated to reducing healthcare disparities by accelerating early cancer detection, improving patien...

Industry: Professional Services
Company size: 51-250 employees
Founded: 2017

Description

  • Collaborate with data scientists and machine learning engineers to define data requirements for LLM and machine learning model fine-tuning.
  • Design, build, and maintain scalable data pipelines to ingest, process, and store large, diverse healthcare datasets.
  • Implement robust data cleaning and transformation processes to ensure dataset quality and integrity.
  • Implement and maintain data validation and monitoring to ensure the accuracy and consistency of training datasets.
  • Develop and optimize data structures and schemas for efficient access and utilization by LLMs and machine learning models.
  • Identify and acquire new data sources and ensure compliance with relevant healthcare regulations (e.g., HIPAA).
  • Monitor data pipeline performance, troubleshoot production issues, and implement performance and reliability optimizations.
  • Document data engineering processes, data models, and data dictionaries for team use and governance.
  • Stay up to date with advancements in data engineering, big data technologies, and machine learning to inform pipeline and data design.

Requirements

  • Bachelor's degree in Computer Science, Engineering, or a related field.
  • Proven experience as a Data Engineer with a focus on big data technologies.
  • Proficiency in programming with Python, Scala, or Java.
  • Extensive experience with data warehousing, ETL processes, and data modeling.
  • Experience with major cloud providers (AWS, GCP, or Azure) and their data storage and processing services.
  • Hands-on experience with big data frameworks such as Apache Spark for distributed processing.
  • Experience with healthcare data and familiarity with healthcare data standards such as FHIR and HL7 (preferred).
  • Familiarity with machine learning concepts and LLM fine-tuning processes (preferred).
  • Experience with data orchestration tools such as Apache Airflow (preferred).
  • Work authorization: must be a US citizen, Green Card holder, or currently in the US with a valid H‑1B visa.

Benefits

  • Competitive salary and benefits package.
  • Flexible working arrangements with remote or hybrid options.
  • Opportunity to work on AI technology that directly impacts patient outcomes and health equity.
  • Membership in a mission-driven team that combines innovation with healthcare impact.
  • Continuous learning opportunities with access to the latest tools and advancements in AI and healthcare.

Apply directly on the company website
