C the Signs

C the Signs is a cutting-edge cancer prediction system that uses artificial intelligence to enhance early detection and survival rates. The company is dedicated to reducing healthcare disparities by accelerating early cancer detection, improving patien...

Industry: Professional Services
Company size: 51-250 employees
Founded: 2017

Description

  • Collaborate with data scientists and machine learning engineers to define data requirements for LLM and machine learning model fine-tuning.
  • Design, build, and maintain scalable data pipelines to ingest, process, and store large, diverse healthcare datasets.
  • Implement robust data cleaning, validation, and transformation processes to ensure dataset quality and integrity.
  • Implement and maintain data validation and monitoring to ensure the integrity, accuracy, and consistency of training datasets.
  • Develop and optimize data structures and schemas for efficient access and utilization by LLMs and machine learning models.
  • Identify and acquire new data sources and ensure compliance with relevant healthcare regulations (e.g., HIPAA).
  • Monitor data pipeline performance, troubleshoot production issues, and implement performance and reliability optimizations.
  • Document data engineering processes, data models, and data dictionaries for team use and governance.
  • Stay up to date with advancements in data engineering, big data technologies, and machine learning to inform pipeline and data design.

Requirements

  • Bachelor's degree in Computer Science, Engineering, or a related field.
  • Proven experience as a Data Engineer with a focus on big data technologies.
  • Proficiency in programming with Python, Scala, or Java.
  • Extensive experience with data warehousing, ETL processes, and data modeling.
  • Experience with major cloud providers (AWS, GCP, or Azure) and their data storage and processing services.
  • Hands-on experience with big data frameworks such as Apache Spark for distributed processing.
  • Experience with healthcare data and familiarity with healthcare data standards such as FHIR and HL7 (preferred).
  • Familiarity with machine learning concepts and LLM fine-tuning processes (preferred).
  • Experience with data orchestration tools such as Apache Airflow (preferred).
  • Work authorization: must be a US citizen, Green Card holder, or currently in the US with a valid H‑1B visa.

Benefits

  • Competitive salary and benefits package.
  • Flexible working arrangements with remote or hybrid options.
  • Opportunity to work on AI technology that directly impacts patient outcomes and health equity.
  • Membership on a mission-driven team combining innovation with healthcare impact.
  • Continuous learning opportunities with access to the latest tools and advancements in AI and healthcare.
