Title: Data Engineer (PySpark + AWS + Iceberg)
Location: Chicago, IL (Onsite)
Visa: All work authorizations accepted except GC and CPT
Employment Type: Contract
Domain: Banking or Finance
Job Summary
We are looking for a skilled Data Engineer to design and build scalable data solutions using PySpark and AWS services. The ideal candidate will have hands-on experience building modern data platforms with Apache Iceberg and implementing the Medallion architecture on AWS.
Key Responsibilities
- Design and implement end-to-end data solutions using PySpark, ensuring scalability and performance.
- Build and manage data pipelines using AWS services such as AWS Glue, EMR, and Lambda.
- Develop data products using PySpark + AWS Glue stack.
- Implement Medallion Architecture (Bronze, Silver, Gold layers) for structured data processing.
- Work with Apache Iceberg tables for efficient data storage, versioning, and schema evolution.
- Ensure data quality, governance, and optimization across pipelines.
- Collaborate with cross-functional teams to understand business requirements and translate them into technical solutions.
- Optimize data processing jobs and improve performance and cost-efficiency on AWS.
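To make the Bronze/Silver/Gold responsibility above concrete, here is a minimal, framework-agnostic sketch of the Medallion flow. Plain Python is used for brevity; in a real pipeline each layer would be a PySpark/Glue job reading and writing Iceberg tables on S3, and all function and field names below are hypothetical illustrations, not a prescribed design.

```python
# Minimal sketch of Medallion layering (Bronze -> Silver -> Gold).
# Hypothetical record shapes; production layers would be PySpark jobs
# persisting each stage as an Iceberg table.

def bronze_ingest(raw_rows):
    """Bronze: land raw records as-is, tagging each with its source."""
    return [{"raw": r, "source": "trades_feed"} for r in raw_rows]

def silver_clean(bronze_rows):
    """Silver: parse and validate, dropping malformed records."""
    cleaned = []
    for row in bronze_rows:
        parts = row["raw"].split(",")
        if len(parts) == 2 and parts[1].strip().isdigit():
            cleaned.append({"account": parts[0].strip(),
                            "amount": int(parts[1])})
    return cleaned

def gold_aggregate(silver_rows):
    """Gold: business-level aggregate (total amount per account)."""
    totals = {}
    for row in silver_rows:
        totals[row["account"]] = totals.get(row["account"], 0) + row["amount"]
    return totals

raw = ["acct1, 100", "acct2, 250", "bad record", "acct1, 50"]
gold = gold_aggregate(silver_clean(bronze_ingest(raw)))
# gold -> {"acct1": 150, "acct2": 250}
```

The point of the layering is that each stage is independently re-runnable and auditable: raw data is never mutated in Bronze, validation rules live only in Silver, and business logic only in Gold.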
Required Skills & Experience
- Strong experience in PySpark for data processing and pipeline development.
- Hands-on experience with AWS ecosystem (Glue, EMR, Lambda, S3).
- Experience implementing Medallion Architecture.
- Practical knowledge of Apache Iceberg or similar table formats.
- Strong understanding of distributed data processing and big data frameworks.
- Experience designing scalable and reliable data pipelines.
- Good understanding of data modeling and ETL/ELT concepts.
Preferred Qualifications
- Experience working outside of Databricks-only environments (ability to build solutions using the native AWS stack).
- Familiarity with modern data lake architectures and open table formats.
- Knowledge of performance tuning and cost optimization in AWS.
- Experience with CI/CD pipelines for data engineering workflows.
What the Client is Specifically Looking For
- Engineers who can independently design solutions using PySpark (not limited to Databricks).
- Strong expertise in AWS-native data engineering tools.
- Hands-on implementation experience with Apache Iceberg (preferred over Delta).
- Ability to build data products using Glue + PySpark stack.
- Clear understanding and implementation of Medallion architecture using AWS services.