Location: USA (Remote)
Duration: 6 months minimum
Exp: 12+ years
Skills: LLMs, machine learning, PyTorch and/or TensorFlow
Role
We are looking for a senior-level engineer with deep expertise in large-scale ML/LLM systems. You will design, optimize, and deploy high-performance training and inference pipelines; build retrieval-augmented generation (RAG) architectures; and own end-to-end ML workflows across research, experimentation, and production. This role requires strong systems thinking, hands-on capability with modern ML frameworks, and the ability to independently drive architectural decisions.
Responsibilities
- LLM Training & Fine-Tuning
Execute supervised fine-tuning, preference optimization, and evaluation workflows.
Build reproducible training pipelines using PyTorch/TensorFlow, DeepSpeed, FSDP, or distributed training stacks.
Implement evaluation harnesses for task-specific and generative metrics.
- Retrieval & RAG System Architecture
Design embedding strategies (dense, sparse/hybrid, and instruction-tuned embeddings).
Build vector search pipelines using FAISS, Milvus, or Pinecone with optimal indexing, sharding, and recall/latency tradeoffs.
Develop production-grade RAG pipelines with prompt templates, context optimization, caching, and guardrails.
- Inference Optimization
Implement low-latency inference flows using ONNX Runtime, TensorRT, vLLM, or other optimized runtimes.
Apply quantization (INT8/FP8/4-bit), model distillation, graph optimizations, and batching strategies.
Build and operate model-serving stacks (TorchServe, BentoML, custom gRPC microservices).
- Applied ML Systems Engineering
Build ML workflows across structured/semi-structured/unstructured data.
Develop feature pipelines, embeddings services, and data preprocessing stacks.
Own CI/CD, experiment tracking, artifact management, and production deployment.
- Security, Compliance, and Architecture
Implement secure data pipelines: PII redaction, encryption, access controls, lineage/auditability.
Design privacy-preserving ML workflows (differential privacy, secure enclaves, RBAC/ABAC).
Make architectural decisions around storage formats, compute orchestration, containerization, and observability.
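To make the retrieval responsibilities above concrete, here is a purely illustrative, brute-force sketch of dense vector search. All function names and the toy 3-dimensional "embeddings" are hypothetical; a production system would use FAISS, Milvus, or Pinecone with approximate indexes (IVF, HNSW) rather than an exhaustive scan.

```python
import math

def cosine(a, b):
    # Cosine similarity between two dense embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, doc_vecs, k=2):
    # Brute-force nearest-neighbour search over every document.
    # Approximate indexes replace this loop at production scale,
    # trading a little recall for much lower latency.
    scored = sorted(
        doc_vecs.items(),
        key=lambda kv: cosine(query_vec, kv[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in scored[:k]]

# Toy 3-dimensional "embeddings" (hypothetical data).
docs = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.1, 0.9, 0.0],
    "doc_c": [0.0, 0.2, 0.9],
}
print(retrieve([1.0, 0.0, 0.1], docs))  # doc_a ranks first
```

The top-k list returned here is what a RAG pipeline would feed into prompt templates as retrieved context.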
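The INT8 quantization mentioned under inference optimization can likewise be sketched with a minimal symmetric per-tensor scheme. This is a hedged illustration only; `quantize_int8` is a hypothetical helper, and real serving stacks rely on calibrated quantization in TensorRT, ONNX Runtime, or similar runtimes.

```python
def quantize_int8(weights):
    # Symmetric per-tensor INT8 quantization: map floats in
    # [-absmax, absmax] onto integers in [-127, 127].
    absmax = max(abs(w) for w in weights)
    scale = absmax / 127.0 if absmax else 1.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    # Recover approximate float weights from the integer codes.
    return [v * scale for v in q]

w = [0.5, -1.27, 0.02, 1.0]
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
# Reconstruction error is bounded by half a quantization step (scale / 2).
```

Per-channel scales, asymmetric ranges, and FP8/4-bit formats refine the same basic idea.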
Required Skills and Experience
- 6+ years of relevant work experience
- Expert-level experience with LLMs, fine-tuning, evaluation, and scalable deployment.
- Strong proficiency in PyTorch or TensorFlow, plus Hugging Face Transformers/PEFT, LangChain, and vector DBs (FAISS, Milvus, Pinecone).
- Strong background designing retrieval systems, RAG architectures, embeddings, and hybrid search strategies.
- Hands-on experience with model acceleration frameworks: TensorRT, ONNX Runtime, vLLM, DeepSpeed, FSDP, etc.
- Ability to own and deliver complete ML systems—data pipelines → training → evaluation → serving → monitoring.
- Strong engineering fundamentals: Python, distributed systems, performance tuning, containerization, cloud compute.
- Ability to operate independently, make architectural calls, and drive complex ML/LLM projects without heavy guidance.