Job Title: Data Engineer with Cloud Data Integration & Transformation
Location: Remote (with some travel to NC; client will pay for travel)
Duration: 12 months
Note:
- Candidates must use their own laptop (minimum 16 GB RAM).
- Some travel to NC will be required; the client will pay for travel expenses.
About the Role:
We are seeking a hands-on Data Engineer to develop and maintain scalable data pipelines and transformation routines within a modern Azure + Databricks environment. The role focuses on ingesting, cleansing, standardizing, matching, merging, and enriching complex legacy datasets into a governed data lakehouse architecture.
The ideal candidate brings deep experience with Spark (PySpark), Delta Lake, Azure Data Factory, and data wrangling techniques — and is comfortable working in a structured, code-managed, team-based delivery environment.
Key Responsibilities:
Pipeline Development & Maintenance:
- Build and maintain reusable data pipelines using Databricks, PySpark, and SQL.
- Implement full and incremental loads from sources including VSAM, Db2 (LUW and z/OS), SQL Server, and flat files.
- Use Delta Lake on ADLS Gen2 to support ACID transactions, scalable upserts/merges, and time travel (see the sketch after this list).
- Leverage Azure Data Factory for orchestration and triggering of Delta Live Tables and Databricks Jobs as part of nightly pipeline execution.
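As an illustration of the incremental upsert/merge responsibility above, a minimal PySpark sketch might look like the following. The table names and the `customer_id` key are hypothetical, and the cluster is assumed to have Delta Lake available (as on Databricks).

```python
from pyspark.sql import SparkSession
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()  # provided automatically in a Databricks notebook

# Hypothetical incremental batch already landed in the bronze layer.
updates_df = spark.read.table("bronze.customer_increment")

# Upsert into the silver table on a hypothetical business key, relying on
# Delta Lake's ACID MERGE semantics on ADLS Gen2.
target = DeltaTable.forName(spark, "silver.customer")
(
    target.alias("t")
    .merge(updates_df.alias("s"), "t.customer_id = s.customer_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)
```

Time travel then remains available against the same table via Delta's `VERSION AS OF` / `TIMESTAMP AS OF` queries.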
Data Cleansing & Transformation:
- Apply cleansing logic for deduplication, parsing, standardization, and enrichment based on business rule definitions.
- Use the Spark-Cobol (Cobrix) library to parse EBCDIC/COBOL-formatted VSAM files into structured DataFrames (see the sketch after this list).
- Maintain the bronze → silver → gold layer structure and ensure data quality during transformations.
- Support classification and mapping logic in collaboration with analysts and architects.
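A minimal sketch of the VSAM parsing step, assuming the Cobrix `spark-cobol` package is attached to the cluster; the copybook path, data path, and target table are placeholders, and real files often need additional options (for example a record-format setting for variable-length records).

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # ambient session on Databricks

# Parse an EBCDIC/COBOL-formatted VSAM extract into a DataFrame using the
# Cobrix "cobol" data source (za.co.absa.cobrix:spark-cobol on the cluster).
raw_df = (
    spark.read.format("cobol")
    .option("copybook", "/mnt/landing/copybooks/customer.cpy")  # hypothetical copybook path
    .option("encoding", "ebcdic")
    .load("/mnt/landing/vsam/customer_extract.dat")             # hypothetical data path
)

# Land the parsed records in the bronze layer as Delta for downstream cleansing.
raw_df.write.format("delta").mode("append").saveAsTable("bronze.customer_raw")
```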
Observability, Testing & Validation:
- Integrate robust logging and exception handling to enable observability and pipeline traceability.
- Monitor job performance and cost with Azure Monitor and Log Analytics.
- Support validation and testing using frameworks such as Great Expectations or dbt tests to enforce expectations on nulls, ranges, and referential integrity (see the sketch after this list).
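A rough sketch of the kind of check involved, using plain PySpark and Python logging; in practice Great Expectations or dbt tests would formalize these rules, and the column names and thresholds here are assumptions.

```python
import logging

from pyspark.sql import DataFrame, functions as F

logger = logging.getLogger("silver_customer_quality")

def run_basic_checks(df: DataFrame) -> None:
    """Illustrative null and range checks on hypothetical columns."""
    null_keys = df.filter(F.col("customer_id").isNull()).count()
    bad_years = df.filter(~F.col("birth_year").between(1900, 2025)).count()

    logger.info("quality check: %d null keys, %d out-of-range birth years", null_keys, bad_years)

    if null_keys or bad_years:
        # Fail fast so the orchestrating ADF/Databricks job surfaces the problem
        # instead of silently promoting bad data to the next layer.
        raise ValueError(f"data quality checks failed: {null_keys} null keys, {bad_years} bad years")
```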
Security, DevOps & Deployment:
- Store and manage credentials securely using Azure Key Vault during pipeline execution (see the sketch after this list).
- Maintain pipeline code using Azure DevOps Repos and participate in peer reviews and promotion workflows via Azure DevOps Pipelines.
- Deploy notebooks, configurations, and transformations using CI/CD best practices in repeatable environments.
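For example, run-time secret retrieval combined with a Db2 JDBC ingest might look like the sketch below in a Databricks notebook, where `dbutils` and `spark` are provided by the runtime. The secret scope, key names, host, database, and table are all hypothetical.

```python
# Read Db2 credentials from an Azure Key Vault-backed Databricks secret scope.
# Scope and key names are assumptions for illustration.
db2_user = dbutils.secrets.get(scope="kv-dataplatform", key="db2-ingest-user")
db2_password = dbutils.secrets.get(scope="kv-dataplatform", key="db2-ingest-password")

# Ingest a Db2 table over JDBC using the IBM Data Server driver.
db2_df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:db2://db2-host.example.com:50000/SAMPLEDB")
    .option("driver", "com.ibm.db2.jcc.DB2Driver")
    .option("dbtable", "CRM.CUSTOMER")
    .option("user", db2_user)
    .option("password", db2_password)
    .load()
)

db2_df.write.format("delta").mode("overwrite").saveAsTable("bronze.crm_customer")
```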
Collaboration & Profiling:
- Collaborate with architects to ensure alignment with data platform standards and governance models.
- Work with analysts and SMEs to profile data, refine cleansing logic, and conduct variance analysis using Databricks Notebooks and Databricks SQL Warehouse (see the profiling sketch after this list).
- Support metric publication and lineage registration using Microsoft Purview and Unity Catalog, and contribute to profiling datasets for Power BI consumption.
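A lightweight profiling pass of the kind run interactively in a Databricks notebook might look like this; the table name is a placeholder, and real profiling would typically add frequency and variance breakdowns driven by the business rules under review.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # ambient session on Databricks

df = spark.read.table("silver.customer")  # hypothetical table

# Per-column null and distinct counts: a common first pass before refining
# cleansing rules or investigating variances with analysts and SMEs.
profile = df.select(
    [F.count(F.when(F.col(c).isNull(), 1)).alias(f"{c}_nulls") for c in df.columns]
    + [F.countDistinct(c).alias(f"{c}_distinct") for c in df.columns]
)

print(f"rows: {df.count()}")
profile.show(truncate=False)
```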
Required Skills & Experience:
- 5+ years of experience in data engineering or ETL development roles.
Proficiency in:
- Databricks, PySpark, and SQL
- Delta Lake and Azure Data Lake Storage Gen2
- Azure Data Factory for orchestration and event-driven workflows
Experience with:
- Cleansing, deduplication, parsing, and merging of high-volume datasets
- Parsing EBCDIC/COBOL-formatted VSAM files using the Spark-Cobol library
- Connecting to Db2 databases using JDBC drivers for ingestion
Familiarity with:
- Git and Azure DevOps Repos & Pipelines
- Great Expectations or dbt for validation
- Azure Monitor and Log Analytics for job tracking and alerting
- Azure Key Vault for secrets and credentials
- Microsoft Purview and Unity Catalog for metadata and lineage registration