Immediate need: PySpark Developer @ Columbus, OH (Remote) on C2C
Hope you are well.
My name is Suneetha, and I am looking to fill the position of PySpark Developer @ Columbus, OH (Remote) on C2C.
I’ve included the job details below and am wondering whether this opening would interest you.
If you are available, interested, and a good fit for the position, please email an updated copy of your resume along with the required details to firstname.lastname@example.org. You can also call me at 223-203-9757.
Title: PySpark Developer
Location: Columbus, OH (Remote)
Duration: 6 Months
5+ years of experience handling Data Warehousing and Business Intelligence projects in the Banking, Finance, Credit Card, and Insurance industries.
Design and develop real-time streaming pipelines that source data from IoT devices, and define strategy for data lakes, data flow, retention, aggregation, and summarization to optimize the performance of analytics products.
Extensive experience in data analytics.
Good knowledge of Hadoop architecture and its ecosystem.
Extensive knowledge of Hadoop technology, with experience in storage, writing queries, and processing and analyzing data.
Experience migrating on-premises ETL processes to the cloud.
Experience working with various Hadoop file formats.
Experience in Data Warehousing applications, with responsibility for the Extraction, Transformation, and Loading (ETL) of data from multiple sources into a data warehouse.
Experience optimizing Hive SQL queries, DataStage jobs, and Spark jobs.
Implemented frameworks for Data Quality Analysis, Data Governance, Data Trending, Data Validation, and Data Profiling using technologies such as Spark, Python, and DB2.
Experience creating technical documentation, including functional requirements, impact analyses, technical design documents, and data flow diagrams in MS Visio.
Experience delivering highly complex projects using Agile and Scrum methodologies.
Quick learner who stays up to date with industry trends.
Excellent written and oral communication, analytical, and problem-solving skills; a good team player.
Well organized and able to work independently.
Design and develop ETL integration patterns using Python on Spark.
Develop a framework for converting existing DataStage mappings to PySpark (Python and Spark) jobs.
Create a PySpark frame to bring data from DB2.
Translate business requirements into maintainable software components and understand their impact (technical and business).
Provide guidance to the development team working on PySpark as an ETL platform.
Optimize PySpark jobs to run on a Kubernetes cluster for faster data processing.
Provide workload estimates to the client.
Migrate on-prem ETL processes to the AWS cloud and Snowflake.
Implement a CI/CD (Continuous Integration and Continuous Delivery) pipeline for code deployment.
Review components developed by team members.
Required Skills: PySpark, Hadoop, DataStage/SSIS, DB2