Role Expectations (Mandatory Skills)
- Build and operate production-grade Databricks/Spark pipelines (batch and streaming) using Delta/Iceberg.
- Apply robust data quality checks and maintain data lineage (using tooling or catalog-driven approaches).
- Understand and design agentic workflows for data and GenAI-driven use cases (orchestration of multiple tools/agents, reliability, and observability).
- Implement CI/CD for data (Git-based workflows, automated tests, promotion/rollback between environments).
Nice to Have
- Practical exposure to RAG/LLM-based data pipelines and GenAI/LLM integration in data engineering solutions, including agentic workflow patterns.
About the Role
We are seeking a skilled Data Engineer with hands-on experience in Databricks to join our team in Santa Clara. The ideal candidate will have a strong background in data engineering, excellent coding skills, and the ability to work collaboratively in a fast-paced environment.
Responsibilities:
- Design, develop, and maintain scalable data pipelines and ETL processes using Databricks.
- Collaborate with data scientists and analysts to understand data requirements and deliver solutions.
- Optimize and troubleshoot data workflows to ensure high performance and reliability.
- Implement data quality checks and ensure data integrity.
- Prior experience building GenAI models or custom solutions on Databricks is preferred; extensive use of Databricks Genie is an added advantage.
- Participate in code reviews and contribute to best practices for data engineering.
Requirements:
- Bachelor's degree in Computer Science, Engineering, or a related field.
- Proven experience as a Data Engineer with hands-on coding experience.
- Proficiency in Databricks and Spark.
- Strong programming skills in Python, Scala, or Java.
- Experience with SQL and relational databases.
- Familiarity with cloud platforms such as AWS, Azure, or GCP.
- Excellent problem-solving skills and attention to detail.
- Strong communication and teamwork abilities.