Location: Dallas, TX (fully onsite)
Duration: Long Term Contract
Interview: Video
Job Description:
The IAM Data Modernization program focuses on migrating an on-premises SQL data warehouse to a modern GCP-based Data Lake. The initiative enables enterprise reporting, analytics, and GenAI-driven capabilities such as natural language querying, smart summarization, and cross-domain insight generation.
Project Overview - IAM Data Modernization
Key Project Highlights
- Integrating 30+ source systems into a unified cloud data platform
- Supporting downstream needs for reporting, analytics, and cyber intelligence
- Delivering highly scalable storage, historical data retention, and governed metric layers
- Establishing a single source of truth for enterprise-wide data and GenAI enablement
Role Summary
As a Data Engineer (GCP), you will build and optimize ingestion pipelines, transformations, and data models across the GCP data lake. The role requires strong hands-on experience with BigQuery, Dataflow/Spark, Pub/Sub, and modern cloud data engineering practices.
You will collaborate with architects, analysts, and data governance teams to deliver reliable, secure, and high-performance data solutions.
Key Responsibilities
- Data Lake Engineering & Storage
- Develop and maintain multi-layered data lake structures (Bronze/Silver/Gold)
- Design GCS buckets, lifecycle policies, naming conventions, and access configurations
- Work with analytics file formats such as Parquet, ORC, and Avro
- Implement partitioning, clustering, and optimized data organization (see the sketch following this group)
- Build analytics-friendly data models and curated datasets
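For context, the partitioning and clustering work above can be pictured with a small example. This is a minimal sketch using the google-cloud-bigquery Python client; the project, dataset, table, and column names are hypothetical placeholders, not actual project assets.

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # hypothetical project id

# Hypothetical "gold" layer table: day-partitioned on event time and
# clustered by source system so per-source queries scan less data.
table = bigquery.Table(
    "my-project.analytics_gold.iam_events",
    schema=[
        bigquery.SchemaField("event_id", "STRING", mode="REQUIRED"),
        bigquery.SchemaField("source_system", "STRING"),
        bigquery.SchemaField("event_ts", "TIMESTAMP", mode="REQUIRED"),
        bigquery.SchemaField("payload", "STRING"),  # raw payload kept as a string
    ],
)
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY, field="event_ts"
)
table.clustering_fields = ["source_system"]
client.create_table(table, exists_ok=True)
```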
- Data Ingestion & Orchestration
- Build batch and streaming pipelines using Dataflow, Pub/Sub, Dataproc, BigQuery
- Implement change data capture (CDC), incremental loads, and deduplication logic
- Set up Airflow/Cloud Composer pipelines for orchestration
- Build robust error-handling, replay, and backfill mechanisms (a minimal orchestration sketch follows this group)
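As referenced above, a small Cloud Composer (Airflow) DAG illustrates the incremental-load and backfill responsibilities. This sketch assumes Airflow 2.x with the Google provider installed; the dataset, table, and column names are hypothetical, and the MERGE stands in for whatever CDC/deduplication logic the project actually requires.

```python
from datetime import datetime
from airflow import DAG
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

# Hypothetical dataset/table names, for illustration only.
MERGE_SQL = """
MERGE `analytics_silver.iam_accounts` AS t
USING (
  SELECT *
  FROM `raw_bronze.iam_accounts_changes`
  WHERE DATE(change_ts) = '{{ ds }}'  -- one logical day per run
  QUALIFY ROW_NUMBER() OVER (PARTITION BY account_id ORDER BY change_ts DESC) = 1
) AS s
ON t.account_id = s.account_id
WHEN MATCHED THEN UPDATE SET t.status = s.status, t.updated_ts = s.change_ts
WHEN NOT MATCHED THEN INSERT (account_id, status, updated_ts)
  VALUES (s.account_id, s.status, s.change_ts)
"""

with DAG(
    dag_id="iam_accounts_incremental",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",   # Airflow 2.4+ style; older versions use schedule_interval
    catchup=True,        # rerunning past dates gives a simple backfill path
) as dag:
    merge_incremental = BigQueryInsertJobOperator(
        task_id="merge_incremental",
        configuration={"query": {"query": MERGE_SQL, "useLegacySql": False}},
    )
```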
- Data Processing & Transformation
- Develop ETL/ELT data pipelines using Dataflow (Apache Beam) or Spark (see the pipeline sketch following this group)
- Write optimized BigQuery SQL (partitioning, clustering, cost controls)
- Manage schema evolution with minimal downstream disruption
- Write clean, modular Python code with appropriate test coverage
- Utilize Hadoop ecosystem tools when required
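For the Dataflow/Beam work mentioned above, a stripped-down streaming job might look like the following. This is a sketch assuming the apache-beam[gcp] package and hypothetical Pub/Sub subscription, table, and field names; a real pipeline would add dead-letter handling, schema management, and tests.

```python
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def parse_event(raw: bytes) -> dict:
    """Decode one Pub/Sub message into a flat row for BigQuery."""
    msg = json.loads(raw.decode("utf-8"))
    return {"event_id": msg["id"], "source_system": msg["src"], "event_ts": msg["ts"]}


def run() -> None:
    # In practice this would be launched with --runner=DataflowRunner and project flags.
    opts = PipelineOptions(streaming=True)
    with beam.Pipeline(options=opts) as p:
        (
            p
            | "ReadPubSub" >> beam.io.ReadFromPubSub(
                subscription="projects/my-project/subscriptions/iam-events-sub"
            )
            | "Parse" >> beam.Map(parse_event)
            | "WriteBQ" >> beam.io.WriteToBigQuery(
                "my-project:raw_bronze.iam_events",
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
            )
        )


if __name__ == "__main__":
    run()
```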
- Analytics & Data Serving
- Optimize BigQuery tables for cost and performance
- Build semantic layers and standardized metric definitions
- Expose data via views, curated datasets, or APIs (see the view sketch following this group)
- Partner with BI teams to support dashboard and reporting needs
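As noted above, the serving-layer work can be as simple as publishing governed views with standardized metric definitions. The example below is a sketch run through the google-cloud-bigquery client; the project, dataset, view, and column names are hypothetical.

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # hypothetical project id

# Curated, governed view acting as a simple "gold" metric/serving layer.
client.query(
    """
    CREATE OR REPLACE VIEW `my-project.analytics_gold.vw_daily_access_grants` AS
    SELECT
      DATE(event_ts)             AS grant_date,
      source_system,
      COUNT(*)                   AS grants,             -- standardized metric definition
      COUNT(DISTINCT account_id) AS distinct_accounts
    FROM `my-project.analytics_silver.iam_events`
    WHERE event_type = 'ACCESS_GRANTED'
    GROUP BY grant_date, source_system
    """
).result()
```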
- Data Governance, Quality & Metadata
—