Client – Virtusa
Location – Torrance, CA (Onsite)
Key Responsibilities:
- Data Architecture Design:
- Architect and implement a scalable data hub solution on AWS using best practices for data ingestion, transformation, storage, and access control.
- Define data models, data lineage, and data quality standards for the DataHub.
- Select appropriate AWS services (S3, Glue, Redshift, Athena, Lambda) based on data volume, access patterns, and performance requirements.
- Design the architecture to accommodate AI/ML applications in the next phase.
- Data Ingestion and Integration:
- Design and build data pipelines to extract, transform, and load data from various sources (databases, APIs, flat files) into the DataHub using AWS Glue, AWS Batch, or custom ETL processes.
- Implement data cleansing and normalization techniques to ensure data quality.
- Manage data ingestion schedules and error handling mechanisms.
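In practice these ingestion steps would run inside AWS Glue or a custom ETL job; as a plain-Python illustration of the cleansing and normalization responsibilities (the `customer_id` field and sample records are hypothetical), the core transform might look like:

```python
def clean_record(record):
    """Normalize one raw record: trim strings, lowercase keys, drop empty fields."""
    cleaned = {}
    for key, value in record.items():
        if isinstance(value, str):
            value = value.strip()
        if value in ("", None):
            continue  # drop empty fields so downstream quality checks see only real values
        cleaned[key.lower()] = value
    return cleaned

def load(records):
    """Transform step: cleanse each record, keep only those passing a quality check."""
    cleaned = [clean_record(r) for r in records]
    # Hypothetical quality rule: a record must carry a customer_id to be loaded.
    return [r for r in cleaned if "customer_id" in r]

rows = [{"Customer_ID": "C-1001", "Email": " a@b.com "},
        {"Customer_ID": "", "Email": "x@y.com"}]
print(load(rows))  # → [{'customer_id': 'C-1001', 'email': 'a@b.com'}]
```

The same cleanse-then-validate pattern maps directly onto a Glue PySpark job, with the filter step feeding an error-handling path for rejected records.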
- Data Governance and Access Control:
- Establish data access controls and security policies to protect sensitive data within the DataHub using IAM roles and policies.
- Develop data governance frameworks including data quality checks, data lineage tracking, and data retention policies.
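A least-privilege access control of the kind described above is typically expressed as an IAM policy document attached to a role. A minimal sketch, assuming a hypothetical bucket name and a `curated/` zone (in practice this would be attached via the console, CloudFormation, or boto3):

```python
import json

# Hypothetical read-only policy scoped to the curated zone of the data hub bucket.
datahub_read_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ReadCuratedZone",
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::example-datahub-bucket",            # hypothetical bucket
                "arn:aws:s3:::example-datahub-bucket/curated/*",  # curated zone only
            ],
        }
    ],
}

print(json.dumps(datahub_read_policy, indent=2))
```

Scoping the `Resource` list to a single prefix keeps raw and sensitive zones invisible to analyst roles by default.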
- Data Analytics Enablement:
- Create data catalogs and metadata management systems to facilitate data discovery and understanding by business users and data analysts.
- Design and implement data views and dashboards using Power BI to enable data exploration and visualization.
- Create data warehouses and data marts to meet the reporting and analytics needs of the business.
- Monitoring and Optimization:
- Monitor data pipeline performance, data quality, and system health to identify and resolve issues proactively.
- Optimize data storage and processing costs by leveraging AWS cost optimization features.
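A proactive data-quality monitor of the kind described above often reduces to a metric plus a threshold that a scheduled job (e.g. via EventBridge and CloudWatch alarms) can alert on. A minimal sketch, with a hypothetical null-rate check and sample rows:

```python
def null_rate(rows, column):
    """Fraction of rows where the given column is missing or None."""
    if not rows:
        return 0.0
    missing = sum(1 for r in rows if r.get(column) is None)
    return missing / len(rows)

def check_quality(rows, column, threshold=0.05):
    """Return (ok, rate); a scheduled monitor could raise an alarm when ok is False."""
    rate = null_rate(rows, column)
    return rate <= threshold, rate

rows = [{"id": 1}, {"id": None}, {"id": 3}, {"id": 4}]
print(check_quality(rows, "id", threshold=0.1))  # → (False, 0.25)
```

Publishing the computed rate as a custom CloudWatch metric would let the same check drive dashboards and alarms without extra plumbing.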
- Data Exchange:
- Develop the required governance, security, monitoring, and guardrails to enable efficient data exchange between internal applications and their external vendors, partners, and SaaS providers.
- Develop intake processes, SLAs, and usage rules for internal and external dataset producers and consumers.
Required Skills and Experience:
- AWS Expertise: Deep understanding of AWS data services including S3, Glue, Redshift, Athena, Lake Formation, Step Functions, CloudWatch, and EventBridge.
- Data Modeling: Proficiency in designing dimensional (star and snowflake) data models for data warehousing and data lakes.
- Data Engineering Skills: Experience with ETL/ELT processes, data cleansing, data transformation, and data quality checks. Experience with Informatica IICS and ICDQ is a plus.
- Programming Languages: Proficiency in Python and SQL, with PySpark experience for distributed data processing and manipulation.
- Data Governance: Knowledge of data governance best practices including data classification, access control, and data lineage tracking.
Preferred Qualifications:
- Experience with data lakehouse architectures and the ability to leverage both structured and unstructured data.
- Familiarity with data visualization tools like Tableau or Power BI.
- Strong communication and collaboration skills to work with stakeholders across business and technical teams.
- AWS certifications related to data analytics and architecture.
Neha Chaudhary
