Plain Concepts

Plain Concepts: a multinational software company delivering solutions in web and app development, AI, mixed reality, big data, blockchain, IoT, and cloud. Recognized by Microsoft and industry leaders for its expertise and innovation.

Internet Software & Services
251-1,000 employees
Founded 2006

Description

  • Design and implement end-to-end data pipelines using Databricks Jobs, Workflows, and Delta Live Tables.
  • Build and maintain scalable ETL and ELT processes using Apache Spark with PySpark or Scala.
  • Develop and optimize Delta Lake data models, including schema design, partitioning, Z-ordering, and compaction.
  • Manage and tune Databricks clusters for performance and cost efficiency.
  • Implement CI/CD pipelines for Databricks deployments using tools such as Databricks Repos, Terraform, and Azure DevOps or GitHub Actions.
  • Process structured and semi-structured data such as JSON, Parquet, and Avro at scale.
  • Ensure data quality and reliability through validation, unit and integration testing, and monitoring.
  • Implement data governance practices, including access controls, lineage tracking, auditing, and Unity Catalog.
  • Troubleshoot Spark and Databricks performance issues such as job failures, skew, shuffle bottlenecks, and memory pressure.
  • Collaborate with data scientists, analysts, backend teams, and data consumers to define SLAs, data contracts, and service interfaces.

Requirements

  • Strong experience with Databricks in production environments, not just notebooks.
  • Deep understanding of Apache Spark internals, including execution plans, the Catalyst optimizer, and the Tungsten engine.
  • Proficiency in PySpark, with Scala preferred as an alternative.
  • Solid knowledge of Delta Lake, including ACID transactions, time travel, OPTIMIZE, VACUUM, and compaction.
  • Experience with distributed data processing and large-scale datasets at TB+ scale.
  • Familiarity with orchestration tools such as Databricks Workflows, Airflow, or similar.
  • Experience with version control and CI/CD pipelines.
  • Knowledge of cloud platforms such as AWS, Azure, or GCP, including IAM and storage services.
  • Strong SQL skills and understanding of data warehousing concepts.
  • Experience with data modeling techniques such as star schema and medallion architecture.
  • Experience with streaming pipelines such as Structured Streaming and Auto Loader, preferred.
  • Knowledge of ML workflows on Databricks, including MLflow and feature stores, preferred.
  • Infrastructure-as-Code experience with Terraform, ARM, or CloudFormation, preferred.
  • Exposure to Unity Catalog and data governance frameworks, preferred.
  • Experience with cost optimization strategies in Databricks environments, preferred.
  • Familiarity with DBT or similar transformation tools, preferred.
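To illustrate the Delta Lake maintenance the requirements mention (OPTIMIZE, Z-ordering, VACUUM), here is a minimal sketch of a helper that builds those statements; the table and column names are whatever the caller supplies, and on Databricks each statement would be run with `spark.sql(...)`.

```python
def delta_maintenance_sql(table: str, zorder_cols: list[str],
                          retain_hours: int = 168) -> list[str]:
    """Build Delta Lake maintenance statements for one table.

    OPTIMIZE compacts small files, ZORDER BY co-locates data on the
    given columns, and VACUUM removes files older than the retention
    window (Delta's default is 168 hours, i.e. 7 days).
    """
    cols = ", ".join(zorder_cols)
    return [
        f"OPTIMIZE {table} ZORDER BY ({cols})",
        f"VACUUM {table} RETAIN {retain_hours} HOURS",
    ]

# On Databricks (hypothetical table and column names):
#   for stmt in delta_maintenance_sql("silver.events", ["event_id"]):
#       spark.sql(stmt)
```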

Interested in this position?

Apply directly on the company website

Similar Roles

Data Engineer, Azure - Remote, Latin America

Bluelight Consulting · 11-50 employees · Internet Software & Services

Bluelight is hiring a remote Data Engineer, Azure in Latin America to build and maintain data pipelines and warehousing solutions for client projects in a fast-growing software consultancy.

Agile · Apache Spark · Azure · Git · Machine Learning · Power BI · Python · REST API · SQL · Tableau

ML / AI Data Engineer (Contract)

Tech Holding · 51-250 employees · Internet Software & Services

Tech Holding is seeking a Senior ML / Data Pipeline Engineer to build and optimize scalable production pipelines for large-scale video and multimodal data processing across distributed cloud environments.

Apache Airflow · Apache Spark · AWS · Azure · Docker · GCP · Kafka · Kubernetes · Machine Learning · NLP · Python · Scala
