Position: Senior Data Engineer (Local)
Location: NYC, NY (3 days onsite minimum)
We are seeking a talented and experienced Senior Data Engineer to join our team. This role requires expertise in data engineering, with a strong focus on Retrieval-Augmented Generation (RAG) and AI technologies. The ideal candidate will have hands-on experience with modern data tools and frameworks, and a passion for building scalable, efficient data pipelines and AI solutions.
Key Responsibilities:
Design, develop, and maintain robust data pipelines and architectures using Python, PySpark, Snowflake, and Databricks.
Implement and optimize data workflows for large-scale data processing.
Collaborate with data scientists and AI teams to integrate RAG and other AI models into production systems.
Understand and apply RAG concepts to enhance retrieval and generation capabilities.
Manage cloud infrastructure and resources using Terraform (good to have).
Ensure data quality, security, and compliance across all data processes.
Monitor and troubleshoot data pipelines and resolve any issues promptly.
Document data architecture, pipelines, and processes for team reference and knowledge sharing.
Qualifications:
Proven experience as a Data Engineer or similar role, with a focus on data pipeline development.
Strong proficiency in Python and PySpark.
Extensive hands-on experience with Snowflake and Databricks.
Solid understanding of AI/ML fundamentals, especially Retrieval-Augmented Generation (RAG).
Familiarity with cloud infrastructure and deployment tools, especially Terraform.
Knowledge of ETL/ELT processes, data modeling, and data warehousing.
Good problem-solving skills and attention to detail.
Excellent communication and teamwork abilities.
Preferred Skills:
Experience with other cloud platforms (Azure, AWS, GCP).
Knowledge of orchestration tools like Airflow.
Understanding of NLP and deep learning techniques.
Prior exposure to Terraform for infrastructure as code.