Location: Charlotte, NC (Onsite)
Experience: 10+ Years
Employment Type: Long-Term Contract
Client: Mphasis
Job Summary:
We are seeking a skilled Data Engineer with hands-on experience designing and developing robust data platforms within the AWS ecosystem. The ideal candidate will have strong expertise in RDBMS, data formats (JSON, Parquet), and ETL/ELT pipelines built with tools such as DataStage, AWS Glue, PySpark, and Python. This role requires managing and optimizing data workflows to ensure data accuracy, scalability, and performance across large datasets.
Key Responsibilities:
- Design, build, and maintain scalable data pipelines and ETL processes across diverse data sources.
- Work with structured and unstructured data in various formats (e.g., JSON, Parquet, CSV).
- Develop and optimize data workflows using AWS Glue, PySpark, and Python.
- Collaborate with data scientists, analysts, and stakeholders to define data requirements and deliver insights.
- Manage and optimize data storage solutions on AWS S3, Redshift, and other AWS services.
- Utilize DataStage or similar ETL tools for batch and streaming data ingestion.
- Implement best practices for data quality, governance, and security.
- Tune SQL queries and database performance on PostgreSQL, SQL Server, or similar RDBMS platforms.
- Monitor and troubleshoot data workflows to ensure high availability and reliability.
Required Skills & Experience:
- Strong experience with RDBMS such as PostgreSQL, SQL Server, or equivalent.
- Proficiency in working with data formats like JSON, Parquet, Avro, and CSV.
- Hands-on experience in the AWS ecosystem (S3, Glue, Lambda, Redshift, IAM, CloudWatch).
- Experience in ETL/ELT development using DataStage or similar tools.
- Strong programming skills in Python and PySpark.
- Deep understanding of data warehousing and data modeling concepts.
- Experience with version control (Git) and CI/CD pipelines for data deployment.
- Knowledge of data quality, governance, and metadata management practices.
