We are seeking experienced professionals with deep Databricks expertise for an upcoming project in Indianapolis, IN. This is a contract opportunity.
1. Role Objective
- Build, operate, and govern production-grade data and analytics solutions that span Databricks (Pipelines, Delta Live Tables, Genie, Agent Bricks) and Microsoft Fabric (Data Engineering, Lakehouse, Data Warehouse, Power BI).
- Deliver fast, reliable, and cost-optimized data flows while maintaining enterprise-grade security and observability.
2. Core Responsibilities
Architecture & Design
- Design end-to-end ingestion, transformation, and serving layers across Databricks and Fabric.
- Define data model standards: star schema, change data capture (CDC), and semi-structured data handling (see the CDC sketch below).
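By way of illustration, here is a minimal PySpark sketch of applying a CDC batch to a Delta dimension table with MERGE. The table, path, and column names (gold.dim_customer, customer_id, op) are hypothetical, and the op-column convention is an assumption, not a prescribed standard.

```python
# Minimal CDC-apply sketch using Delta Lake MERGE.
# Table, path, and column names below are hypothetical.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Change batch: one row per key, with an 'op' column of 'upsert' or 'delete'.
changes = spark.read.format("delta").load("/mnt/landing/customer_changes")

target = DeltaTable.forName(spark, "gold.dim_customer")

(target.alias("t")
    .merge(changes.alias("s"), "t.customer_id = s.customer_id")
    .whenMatchedDelete(condition="s.op = 'delete'")
    .whenMatchedUpdateAll(condition="s.op = 'upsert'")
    .whenNotMatchedInsertAll(condition="s.op = 'upsert'")
    .execute())
```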
Pipeline Development
- Implement CI/CD-ready pipelines using the Databricks Pipelines/Jobs API and Fabric pipelines (Spark SQL, notebooks).
- Enable real-time streaming (Event Hub/Kafka → Structured Streaming → Fabric Lakehouse); see the streaming sketch below.
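A minimal sketch of that streaming path, reading from Event Hub's Kafka-compatible endpoint with Structured Streaming and appending to a Delta table. The namespace, topic, checkpoint path, and table name are placeholders, and the SASL auth options are omitted.

```python
# Minimal Structured Streaming sketch: Event Hub (Kafka endpoint) -> Delta.
# Namespace, topic, checkpoint path, and table name are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "<namespace>.servicebus.windows.net:9093")
       .option("subscribe", "telemetry")
       # SASL/TLS auth options (Event Hub connection string) omitted for brevity.
       .load())

events = raw.select(col("value").cast("string").alias("payload"), col("timestamp"))

(events.writeStream
    .format("delta")
    .option("checkpointLocation", "/lakehouse/checkpoints/telemetry")
    .toTable("bronze.telemetry_events"))
```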
Data Quality & Governance
- Register assets in Unity Catalog and the Fabric Lakehouse catalog; enforce row-level security, data masking, and Microsoft Purview lineage (see the governance sketch below).
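As a flavor of what that enforcement looks like, a minimal sketch using Unity Catalog row filters and column masks; the catalog, schema, table, and group names are hypothetical.

```python
# Minimal Unity Catalog row-filter and column-mask sketch.
# Catalog/schema/table and group names below are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Row filter: admins see every row; others only rows for their region.
spark.sql("""
    CREATE OR REPLACE FUNCTION main.governance.region_filter(region STRING)
    RETURN is_account_group_member('data_admins') OR region = 'US'
""")
spark.sql("ALTER TABLE main.sales.orders "
          "SET ROW FILTER main.governance.region_filter ON (region)")

# Column mask: redact email for users outside the PII readers group.
spark.sql("""
    CREATE OR REPLACE FUNCTION main.governance.mask_email(email STRING)
    RETURN CASE WHEN is_account_group_member('pii_readers') THEN email ELSE '***' END
""")
spark.sql("ALTER TABLE main.sales.orders "
          "ALTER COLUMN email SET MASK main.governance.mask_email")
```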
Performance & Cost Optimization
- Tune Spark clusters and leverage Photon and Genie autotuning.
- Use Fabric’s hot/cold storage tiers, materialized views, and autoscale compute to keep spend under budget (see the cluster-spec sketch below).
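For instance, a cost-conscious job-cluster spec for the Databricks Jobs API might look like the sketch below; the runtime version and Azure node type are illustrative assumptions, not recommendations.

```python
# Illustrative Jobs API new_cluster payload: Photon plus autoscaling.
# Runtime version and node type are assumptions; size them per workload.
job_cluster = {
    "new_cluster": {
        "spark_version": "15.4.x-scala2.12",
        "node_type_id": "Standard_D4ds_v5",        # Azure VM size
        "runtime_engine": "PHOTON",                # enable the Photon engine
        "autoscale": {"min_workers": 1, "max_workers": 8},
        "spark_conf": {
            # Databricks auto-optimized shuffle: let AQE size shuffle partitions.
            "spark.sql.shuffle.partitions": "auto"
        },
    }
}
```

A job cluster defined this way terminates when the run finishes, so there is no idle spend.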
Collaboration & Enablement
- Partner with data scientists, analysts, and product owners to translate business needs into reliable data solutions.
- Create reusable templates and documentation, and run knowledge-sharing sessions on Databricks and Fabric best practices.
3. Minimum Required Skills
- Databricks – 4+ years with Pipelines, Delta Live Tables, Genie, Agent Bricks; strong PySpark/Scala; Unity Catalog administration.
- Azure Cloud – ADLS Gen2, Event Hub, Service Bus, Azure Functions, Key Vault, Azure DevOps/GitHub Actions, Terraform/ARM.
- Data Modeling – Star schema, CDC, and handling of JSON/Parquet/Avro formats.
- Governance & Security – Unity Catalog, Microsoft Purview, row-level security, GDPR/CCPA compliance.
- CI/CD & Testing – Automated unit/integration/end-to-end tests; GitOps workflow.
- Observability – Azure Monitor, Log Analytics, dashboards for pipeline health.
- Soft Skills – Clear communication, stakeholder management, self-starter in a fast-moving team.
4. Preferred / Nice to Have
- Databricks Certified Data Engineer (Associate/Professional).
- Microsoft Certified: Azure Data Engineer Associate.
- Experience with Genie AI-assisted pipeline generation and Fabric Copilot.
- Knowledge of Delta Lake Time Travel, Z-Ordering, and Fabric Direct Lake query optimizations (see the sketch after this list).
- Exposure to MLflow or Azure ML for model-serving pipelines.
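On the Delta Lake points above, a minimal sketch; the table and column names are hypothetical.

```python
# Minimal Delta Lake Z-Ordering and Time Travel sketch.
# Table and column names below are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Z-order on a frequent filter column to improve file-level data skipping.
spark.sql("OPTIMIZE gold.fact_sales ZORDER BY (customer_id)")

# Time Travel: query the table as of an earlier version or timestamp.
v0 = spark.sql("SELECT * FROM gold.fact_sales VERSION AS OF 0")
snapshot = spark.sql("SELECT * FROM gold.fact_sales TIMESTAMP AS OF '2024-01-01'")
```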