Big Data Engineer (PySpark / Scala Spark)

Contract

qubixtalentconsulting

shahebee@qubixtalentconsulting.com

Only genuine H-1B candidates who can provide their I-94 details will be considered.

 

Job Description: Spark / Big Data Engineer
Position: Big Data Engineer (PySpark / Scala Spark)
Location: (Specify Onsite/Hybrid/Remote)
Duration: Long-term Contract / Full-time
Experience Required: 5–10+ years

Overview
We are seeking an experienced Big Data Engineer with strong expertise in PySpark, Scala Spark, Hadoop ecosystem, and SQL to design, develop, and optimize large-scale data processing pipelines. The ideal candidate will have hands-on experience in building enterprise-grade ETL workflows, working with distributed systems, and delivering high-performance data solutions.

Responsibilities
Develop, maintain, and optimize large-scale data processing pipelines using PySpark and Scala Spark.
Build Spark processes to extract data from multiple structured and unstructured data sources and create curated, analytics-ready datasets.
Work extensively with HDFS, Hive, and other Hadoop ecosystem components for storage, ingestion, and querying.
Perform ETL, data cleansing, transformation, aggregation, and validations for various business use cases.
Optimize Spark jobs through partitioning, caching, and configuration tuning for performance (a short PySpark sketch follows this list).
Write and debug complex SQL queries for data extraction, validation, analysis, and performance optimization.
Work with relational and NoSQL databases such as MySQL, PostgreSQL, SQL Server, and Cassandra.
Monitor, troubleshoot, and ensure smooth execution of data pipelines in production.
Collaborate with cross-functional teams (Data Engineering, Analytics, Product, DevOps) to understand requirements and deliver reliable, scalable data solutions.
Maintain high-quality documentation, coding standards, and data governance best practices.
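
As a rough illustration of the pipeline work described in this list, below is a minimal PySpark sketch. The paths, table names, and columns (orders, customers, amount) are hypothetical placeholders, not anything specific to this role.

# Minimal PySpark ETL sketch: extract, cleanse, aggregate, and write a
# curated dataset. All paths, tables, and columns are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder
    .appName("curated-orders-pipeline")
    .config("spark.sql.shuffle.partitions", "200")  # tune to cluster size
    .getOrCreate()
)

# Extract: raw events from HDFS, reference data from a Hive table
raw = spark.read.parquet("hdfs:///data/raw/orders/")   # hypothetical path
customers = spark.table("reference_db.customers")      # hypothetical table

# Transform: cleanse and validate
cleaned = (
    raw.dropDuplicates(["order_id"])
       .filter(F.col("amount").isNotNull() & (F.col("amount") > 0))
)
cleaned.cache()  # cached because it is read twice below

# Validate: fail fast if cleansing dropped everything
assert cleaned.count() > 0, "no valid orders after cleansing"

# Aggregate into an analytics-ready dataset
daily_totals = (
    cleaned.join(customers, "customer_id", "left")
           .groupBy("order_date", "region")
           .agg(F.sum("amount").alias("total_amount"),
                F.count("order_id").alias("order_count"))
)

# Load: partition by date so downstream queries can prune partitions
(daily_totals.write
    .mode("overwrite")
    .partitionBy("order_date")
    .parquet("hdfs:///data/curated/daily_order_totals/"))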

Required Skills
Strong programming experience in PySpark and Scala Spark.
Hands-on experience with Hadoop ecosystem (HDFS, Hive, YARN).
Strong knowledge of ETL workflows, data quality, and data modeling.
Expertise in SQL and query performance tuning (see the Spark SQL example after this list).
Experience working with RDBMS and NoSQL databases.
Good understanding of distributed computing and big data architecture.
Experience with debugging production data pipelines and job monitoring.
Familiarity with version control (Git), CI/CD, and Agile methodologies.
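
For the SQL and tuning items above, here is a short Spark SQL sketch that queries the hypothetical curated dataset from the earlier pipeline example. Inspecting the physical plan is one common first step when tuning; the path and columns are placeholders.

# Run an analytical query over the curated output and inspect the physical
# plan before tuning. The path and columns are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql-tuning-example").getOrCreate()

revenue_by_region = spark.sql("""
    SELECT region,
           order_date,
           SUM(total_amount) AS revenue
    FROM parquet.`hdfs:///data/curated/daily_order_totals/`
    WHERE order_date >= DATE '2024-01-01'   -- prunes date partitions
    GROUP BY region, order_date
""")

# Formatted plans (Spark 3.x) show partition pruning, join strategy, shuffles
revenue_by_region.explain(mode="formatted")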

Preferred Skills
Experience with Airflow, Oozie, or other orchestration tools (a minimal Airflow sketch follows this list).
Experience in AWS / Azure / GCP cloud data ecosystems.
Knowledge of Kafka and data streaming is a plus.
Background in data warehousing or data lake architecture.
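
As a sketch of the orchestration experience listed above, assuming Airflow 2.4+ with the apache-airflow-providers-apache-spark package installed; the DAG id, schedule, and application path are hypothetical.

# Hypothetical Airflow DAG that submits the PySpark pipeline daily.
from datetime import datetime
from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

with DAG(
    dag_id="daily_order_totals",          # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                    # Airflow 2.4+ parameter name
    catchup=False,
) as dag:
    run_pipeline = SparkSubmitOperator(
        task_id="run_curated_pipeline",
        application="/opt/jobs/curated_orders_pipeline.py",  # hypothetical path
        conn_id="spark_default",
    )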

To apply for this job, email your details to shahebee@qubixtalentconsulting.com.
