PointClickCare

PointClickCare provides a leading cloud-based healthcare software platform that enables long-term and post-acute care providers to effectively manage the complete lifecycle of resident care while enhancing operational efficiency and improving resident ...

Health Care Providers & Services

Health Care

1K-5K (2750)

Founded 2000

$232M raised

59 open positions

Senior Research Data Engineer (US)

3 weeks, 5 days ago

United States

Full-time

Senior

Data Engineer

Data Science and Analytics

Apache Airflow, Apache Spark, AWS, Azure, CI/CD, Dagster, Databricks, dbt, Feature Engineering, Generative AI, Git, HIPAA, LLM, Machine Learning, MLflow, Prefect, Python, PyTorch, SQL

Description

Own the gold data layer by transforming silver tables into curated, semantically rich, documented gold datasets for AI development.
Reverse-engineer data semantics by working with product engineers, clinical experts, and workflow experts to understand how data is created and represented.
Bridge researcher needs with data design by translating AI applied research requirements into reusable gold data products and documentation.
Curate datasets across modalities, including structured tables, unstructured content, features, labels, and chunked/tagged data for different AI use cases.
Build reusable silver-to-gold data pipelines in Databricks/Spark as scheduled and observable workloads.
Automate data quality, filtering, synthesis, labeling, and weak supervision workflows for AI data preparation.
Maintain reproducible dataset snapshots, lineage, and semantic definitions for downstream AI R&D reuse.
Collaborate with AI researchers, data platform, product, clinical, and workflow teams throughout the R&D lifecycle.
Support model development, evaluation, experimentation, and ongoing operational maintenance across classical ML, generative AI, RAG, and agentic approaches.

Requirements

5+ years building production data systems, including at least 2 years supporting ML or AI workloads.
Advanced Python, SQL, and PySpark/Databricks experience for working with large, messy data.
Expert-level SQL and the ability to read complex stored procedures and reverse-engineer business logic from queries.
Strong Databricks ecosystem experience, including Delta Lake, Unity Catalog, Spark/PySpark tuning, and MLflow.
Working knowledge of AI concepts such as embeddings, tokenization, feature engineering, point-in-time correctness, train/validation/test splits, and data drift.
Experience transforming unstructured data such as text, PDFs, transcripts, and logs into model-ready forms.
Familiarity with AI-friendly storage and formats such as Parquet and Hugging Face datasets, plus partitioning, sharding, and caching concepts.
Experience with data quality and synthesis techniques such as programmatic labeling, weak supervision, MinHash/LSH, and LLM-generated synthetic data.
Experience with pipeline orchestration and dataset versioning tools such as Airflow, Databricks Workflows, Dagster, Prefect, and Unity Catalog.
Experience handling regulated or sensitive data under controlled access, including HIPAA or equivalent, and familiarity with de-identification concepts.
Git-based version control and CI/CD experience for data and code.
Strong written documentation skills and the ability to elicit requirements from technical and non-technical experts.
Bachelor’s degree in computer science, data science, engineering, statistics, or a related field, or equivalent practical experience.
Preferred: Hands-on EHR data experience in skilled nursing, long-term care, post-acute care, or senior living.
Preferred: Working knowledge of clinical terminologies and data standards such as ICD-10, SNOMED CT, LOINC, HL7v2, FHIR, and CCDA.
Preferred: dbt experience for transformation and testing.
Preferred: Familiarity with training-side ML frameworks such as PyTorch to debug data-side bottlenecks.
Preferred: Experience supporting LLM or foundation-model training or fine-tuning data pipelines.
Preferred: Clinical NLP, OCR, document parsing, or ASR/transcript pipeline experience.
Preferred: Experience with data lineage and catalog tools.
Preferred: Prior experience embedded inside an AI or ML research team.
Preferred: Master’s degree in a relevant quantitative or computer science field.

Benefits

Benefits starting from day 1.
Retirement plan matching.
Flexible paid time off.
Wellness support programs and resources.
Parental and caregiver leaves.
Fertility and adoption support.
Continuous development support program.
Employee assistance program.
Allyship and inclusion communities.
Employee recognition and more.

Similar Roles

Data Engineer IV (6249)

Dan.com - a GoDaddy brand Internet Software & Services

itD is hiring a Data Engineer IV to build and optimize AWS-based data infrastructure and pipelines that support AI infrastructure and inference initiatives in a remote role.

United States Contract Lead Data Engineer

AWS

14 hours, 49 minutes ago

Senior Data Engineer (GCP)

Xebia 1K-5K Internet Software & Services

Xebia is hiring a data engineer to support a BigQuery migration by building and maintaining scalable data pipelines for production reporting, analytics, and business intelligence solutions.

Romania Bulgaria Poland Europe Senior Data Engineer

Apache Airflow, CI/CD, Databricks, dbt, Git, Python, Snowflake, SQL

14 hours, 49 minutes ago

Staff Data Engineer (Coupang Pay)

Coupang 1K-5K Internet Software & Services

As a Staff Data Engineer on the Coupang Pay fintech data platform team, you will build and advance data pipelines and analytics infrastructure based on real-time payment and order data.

South Korea Full-time Lead Data Engineer

Apache Spark, AWS, ClickHouse, Flink, Hive, Java, Presto, Python, Scala, SQL, Tableau

15 hours, 4 minutes ago

Data Migration Engineer (SQL Server / TSQL)

Cresteo 51-200 Information Technology & Services

Cresteo is hiring a Data Migration Engineer to own end-to-end SQL Server data migrations from legacy schemas into reconciled, production-ready data for US-based clients and international teams.

Latin America Full-time Senior Database Administrator Data Engineer

JSON, SQL Server

15 hours, 4 minutes ago
