C the Signs

C the Signs is a cutting-edge cancer prediction system that uses artificial intelligence to improve early detection and survival rates. The company is dedicated to reducing healthcare disparities by accelerating early cancer detection, improving patien...

Professional Services
51-250
Founded 2017

Description

  • Collaborate with data scientists and machine learning engineers to define data requirements for LLM and machine learning model fine-tuning.
  • Design, build, and maintain scalable data pipelines to ingest, process, and store large, diverse healthcare datasets.
  • Implement robust data cleaning and transformation processes to ensure dataset quality.
  • Implement and maintain data validation and monitoring to ensure the integrity, accuracy, and consistency of training datasets.
  • Develop and optimize data structures and schemas for efficient access by LLMs and machine learning models.
  • Identify and acquire new data sources and ensure compliance with relevant healthcare regulations (e.g., HIPAA).
  • Monitor data pipeline performance, troubleshoot production issues, and implement performance and reliability optimizations.
  • Document data engineering processes, data models, and data dictionaries for team use and governance.
  • Stay up-to-date with advancements in data engineering, big data technologies, and machine learning to inform pipeline and data design.

Requirements

  • Bachelor's degree in Computer Science, Engineering, or a related field.
  • Proven experience as a Data Engineer with a focus on big data technologies.
  • Proficiency in programming with Python, Scala, or Java.
  • Extensive experience with data warehousing, ETL processes, and data modeling.
  • Experience with major cloud providers (AWS, GCP, or Azure) and their data storage and processing services.
  • Hands-on experience with big data frameworks such as Apache Spark for distributed processing.
  • Experience with healthcare data and familiarity with healthcare data standards such as FHIR and HL7 (preferred).
  • Familiarity with machine learning concepts and LLM fine-tuning processes (preferred).
  • Experience with data orchestration tools such as Apache Airflow (preferred).
  • Work authorization: must be a US citizen, Green Card holder, or currently in the US with a valid H‑1B visa.

Benefits

  • Competitive salary and benefits package.
  • Flexible working arrangements with remote or hybrid options.
  • Opportunity to work on AI technology that directly impacts patient outcomes and health equity.
  • Membership on a mission-driven team combining innovation with healthcare impact.
  • Continuous learning opportunities with access to the latest tools and advancements in AI and healthcare.

Interested in this position?

Apply directly on the company website

Similar Roles

Data Engineer

Soros Fund Management
Capital Markets

Soros Fund Management is hiring an experienced Data Engineer to build and modernize data systems that support trading, risk, research, and accounting operations across the firm.

Apache Spark, Databricks, dbt, Docker, FastAPI, Kubernetes, PostgreSQL, Python, Snowflake, SQL, SQL Server

Data Engineer

Egen.ai
IT Services

Egen is hiring a remote Data Engineer to design and support large-scale batch and streaming data pipelines that turn business needs into secure, accurate, and accessible data solutions.

Agile, Apache Airflow, Apache Spark, dbt, GCP, PostgreSQL, Salesforce

Senior Data Engineer

Age of Learning
251-1K
Internet Software & Services

Age of Learning is seeking a Senior Data Engineer to lead its Data and Analytics platform and ensure trusted data systems that support education products and cross-functional decision-making.

dbt, Python, Snowflake, SQL

Head of Market Data

Galaxy
251-1K
Capital Markets

Galaxy is hiring a Head of Market Data to lead the acquisition, governance, and optimization of market data supporting trading, research, risk, and related systems across digital and traditional assets.

Generative AI