Location: Remote
Contract
Detailed JD:
Role Summary
We are seeking an experienced Data Engineer to design, build, and optimize scalable, high-performance data pipelines using Databricks. The role involves end-to-end ownership of data ingestion, transformation, orchestration, and optimization across cloud-based data platforms.
Key Responsibilities
Data Engineering & Pipeline Development
* Design, develop, and maintain batch and streaming data pipelines using Databricks (PySpark) and Snowflake.
* Implement data transformation logic in Python and SQL for high-volume datasets at scale.
* Create and manage complex workflows using Apache Airflow.
* Implement scheduling, dependency management, retries, alerts, and failure handling.
* Integrate Airflow with Databricks jobs, Snowflake tasks, and cloud services (see the sketch after this list).
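To illustrate the orchestration responsibilities above, here is a minimal Airflow sketch of a DAG that triggers a Databricks job with scheduling, retries, and failure alerting. The DAG name, job ID, connection ID, and callback are hypothetical placeholders, not part of this role's actual stack; it assumes Airflow 2.4+ with the apache-airflow-providers-databricks package installed.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.providers.databricks.operators.databricks import DatabricksRunNowOperator

# Hypothetical alerting hook; a real pipeline might page on-call or post to chat.
def notify_on_failure(context):
    print(f"Task {context['task_instance'].task_id} failed; sending alert.")

default_args = {
    "owner": "data-engineering",
    "retries": 3,                        # retry transient failures automatically
    "retry_delay": timedelta(minutes=5), # back off between attempts
    "on_failure_callback": notify_on_failure,
}

with DAG(
    dag_id="daily_ingest_pipeline",      # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                   # cron expressions also work here
    catchup=False,
    default_args=default_args,
):
    # Trigger a pre-defined Databricks job; job_id and conn_id are placeholders.
    DatabricksRunNowOperator(
        task_id="run_databricks_transform",
        databricks_conn_id="databricks_default",
        job_id=12345,
    )
```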
Databricks & Lakehouse Architecture
* Work on Databricks Lakehouse architecture including Bronze / Silver / Gold (Medallion) layers.
* Optimize Spark jobs using partitioning, caching, broadcast joins, and performance tuning (see the sketch after this list).
* Manage Databricks jobs, clusters, notebooks, and workspace configurations.
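As a concrete sketch of the tuning techniques above (table names and columns are hypothetical), the PySpark snippet below reads a Bronze table, broadcasts a small dimension to avoid shuffling the large side of the join, caches the intermediate result, and writes a date-partitioned Silver table.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("bronze_to_silver").getOrCreate()

# Hypothetical Bronze (raw) fact table and a small dimension table.
events = spark.read.table("bronze.events")
dim_customers = spark.read.table("bronze.dim_customers")

# Broadcast the small dimension so the join avoids a full shuffle.
enriched = events.join(F.broadcast(dim_customers), on="customer_id", how="left")

# Cache when the result feeds multiple downstream writes or aggregations.
enriched.cache()

# Partition the Silver table by event date so downstream reads can prune files.
(
    enriched.withColumn("event_date", F.to_date("event_ts"))
    .write.mode("overwrite")
    .partitionBy("event_date")
    .saveAsTable("silver.events_enriched")
)
```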
Snowflake Development
* Design and optimize Snowflake schemas, tables, views, and warehouses.
* Implement Snowflake SQL transformations, performance tuning, and cost optimization.
* Work with Snowflake features such as Time Travel, Cloning, Tasks, and Streams where applicable (see the sketch after this list).
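For flavor, here is a short sketch of those features issued through the Snowflake Python connector. Connection parameters, table names, and the schedule are placeholders; the embedded SQL uses standard Snowflake syntax for Time Travel, zero-copy cloning, and Tasks.

```python
import snowflake.connector

# Placeholder credentials; real values would come from a secrets manager.
conn = snowflake.connector.connect(
    account="my_account", user="etl_user", password="***",
    warehouse="ETL_WH", database="ANALYTICS", schema="PUBLIC",
)
cur = conn.cursor()

# Time Travel: query the table as it existed one hour ago.
cur.execute("SELECT COUNT(*) FROM orders AT(OFFSET => -3600)")
print(cur.fetchone())

# Zero-copy clone: an instant, storage-efficient copy for testing or recovery.
cur.execute("CREATE TABLE orders_backup CLONE orders")

# A Task that runs a scheduled transformation (often paired with a Stream).
cur.execute("""
    CREATE OR REPLACE TASK refresh_daily_orders
      WAREHOUSE = ETL_WH
      SCHEDULE = 'USING CRON 0 6 * * * UTC'
    AS
      INSERT INTO daily_orders SELECT * FROM orders_stream
""")

cur.close()
conn.close()
```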
Data Quality, Governance & Security
* Implement data quality checks, validation frameworks, and reconciliation logic (see the sketch after this list).
* Ensure adherence to data governance, security, and compliance requirements.
* Collaborate with governance teams on metadata, lineage, and access controls.
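A minimal sketch of the kind of validation logic meant above: row-count reconciliation and a null check on a key column, written in PySpark with hypothetical table and column names.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dq_checks").getOrCreate()

source = spark.read.table("silver.events_enriched")  # hypothetical tables
target = spark.read.table("gold.daily_revenue")

# Reconciliation: the target should be non-empty after the load.
src_count, tgt_count = source.count(), target.count()
assert tgt_count > 0, "Target table is empty"

# Validation: key columns must not contain nulls.
null_keys = source.filter(F.col("customer_id").isNull()).count()
assert null_keys == 0, f"{null_keys} rows with null customer_id"

print(f"DQ checks passed: source={src_count}, target={tgt_count}")
```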
CI/CD & Operations
* Implement CI/CD pipelines for data code using Git-based version control systems (a minimal test sketch follows this list).
* Support production deployments, monitoring, and incident resolution.
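In practice, CI/CD for data code means version-controlled transformations exercised by automated tests on every commit. Below is a minimal pytest sketch such a pipeline might run before deployment; the transformation function and column names are hypothetical.

```python
import pytest
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Hypothetical transformation under test: derives event_date from event_ts.
def add_event_date(df):
    return df.withColumn("event_date", F.to_date("event_ts"))

@pytest.fixture(scope="session")
def spark():
    # A local Spark session keeps the test self-contained on a CI runner.
    return SparkSession.builder.master("local[1]").appName("ci-tests").getOrCreate()

def test_add_event_date(spark):
    df = spark.createDataFrame([("2024-01-01 10:00:00",)], ["event_ts"])
    out = add_event_date(df)
    assert out.first()["event_date"].isoformat() == "2024-01-01"
```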