Title: AI/ML Architect with Databricks and AWS
Location: Los Angeles, CA (Hybrid)
Any visa status is fine.
Job Summary:
Designing Databricks‑based lakehouse architectures on AWS (Delta Lake + S3 + Unity Catalog).
Clear separation of compute vs. serving layers in distributed architectures.
Low-latency API strategy where Spark is insufficient (e.g., leveraging optimized services or caching).
Caching strategies to accelerate reads and reduce compute cost.
Data partitioning, file size tuning, and optimization strategies for large-scale pipelines.
Experience handling multi-terabyte structured time‑series workloads.
Ability to distill architectural significance from ambiguous business requirements.
Strong curiosity, questioning, and requirement‑probing mindset.
Player‑coach approach: hands-on technical depth + ability to guide design.
AI/ML & Advanced Analytics
Develop, train, and optimize ML models using Python, PySpark, MLflow, and Databricks Machine Learning.
Conduct exploratory data analysis (EDA) to identify patterns, trends, and insights in large datasets.
Deploy ML models into production using MLflow, Databricks Workflows, or other MLOps pipelines.
Build analytics solutions such as forecasting, anomaly detection, segmentation, or recommendation systems.
Design ML architectures aligned with Databricks Lakehouse on AWS.
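As a minimal illustration of one of the analytics solutions listed above (anomaly detection), here is a self-contained z-score sketch. The function name, threshold, and sample data are illustrative only; a production Databricks pipeline would typically use PySpark or an ML library rather than plain Python.

```python
from statistics import mean, stdev

def zscore_anomalies(values, threshold=3.0):
    """Flag indices whose z-score exceeds the threshold.

    `threshold` is an illustrative parameter; real workloads would
    tune it (or use a learned model) per dataset.
    """
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # no spread, nothing to flag
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]

# Example: a sensor stream with one obvious outlier at index 5
readings = [10.1, 9.9, 10.0, 10.2, 9.8, 55.0, 10.1]
print(zscore_anomalies(readings, threshold=2.0))  # → [5]
```

The same pattern scales out on Databricks by computing the mean and standard deviation with PySpark aggregations over a window of the time-series data.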
Data Engineering & Lakehouse Architecture
Architect and build scalable ETL/ELT pipelines using PySpark, SQL, and Databricks Workflows.
Implement Delta Lake best practices, including OPTIMIZE, ZORDER, partitioning, and schema evolution.
Design lakehouse layers (Bronze/Silver/Gold) with strong separation of compute and serving layers.
Optimize cluster performance and jobs using Spark tuning, caching, and shuffle minimization.
Work with multi-terabyte, time-series, high‑velocity data in a distributed environment.
Ensure robust data availability for downstream ML and analytics workloads.
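As a sketch of the file-size tuning mentioned above: a common approach is to pick a partition count so each written file lands near a target size. The 128 MB default below is an illustrative figure, not a prescription; the helper name and signature are hypothetical.

```python
import math

def target_partitions(total_bytes, target_file_mb=128):
    """Estimate how many partitions to write so each output file
    lands near the target size. target_file_mb=128 is an
    illustrative default; real tuning depends on the workload.
    """
    target_bytes = target_file_mb * 1024 * 1024
    return max(1, math.ceil(total_bytes / target_bytes))

# A 10 GiB table aimed at ~128 MB files → 80 partitions
print(target_partitions(10 * 1024**3))  # → 80
```

In a Spark job this estimate would typically feed `df.repartition(n)` before the write, with a Delta `OPTIMIZE ... ZORDER BY (...)` pass afterwards to compact and co-locate frequently filtered columns.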
AWS Cloud Integration
Architect end-to-end data and ML solutions using AWS services, including:
S3 for storage
IAM for identity & access
Glue Catalog for metadata management
Networking for secure, high‑throughput data movement
Integrate Databricks with AWS-native compute, API layers, and low-latency endpoints.
Please contact: leena.arokiyaraj@nityo.com