Location: New York City, New York, US (3 days per week in the office are mandatory; this requirement is non-negotiable, so please ensure all candidates are fully aware before proceeding)
Local candidates only
The second round is an in-person interview
Project description
Responsibilities
Design, develop, and maintain scalable data pipelines using Apache Spark (batch and/or streaming).
Build, optimize, and manage ETL/ELT workflows integrating multiple data sources.
Develop data solutions in Python for data transformations, automation, and orchestration.
Leverage AWS services (S3, EMR, Glue, Lambda, Redshift, Kinesis, etc.) to implement cloud-native data platforms.
Write efficient SQL queries for data extraction, transformation, and reporting.
Ensure data quality, lineage, and governance across pipelines.
Skills
Must have
8+ years of experience in data engineering or backend development.
Hands-on experience with Apache Spark (PySpark) in large-scale data environments.
Strong proficiency in Python programming.
Expertise in SQL (including advanced queries, performance tuning, and optimization).
Experience working with AWS services such as S3, Glue, EMR, Lambda, Redshift, or Kinesis.
Understanding of data warehousing concepts and ETL best practices.
Strong problem-solving skills and ability to work in an agile, collaborative environment.
Nice to have
Experience with Databricks or similar Spark-based platforms.
Knowledge of streaming technologies such as Kafka and Flink.