Location: USA (Remote)
Duration: 6 months minimum
Exp: 12+ years
Skills: LLMs, machine learning, PyTorch and/or TensorFlow
Role
We are looking for a senior-level engineer with deep expertise in large-scale ML/LLM systems. You will design, optimize, and deploy high-performance training and inference pipelines; build retrieval-augmented generation (RAG) architectures; and own end-to-end ML workflows across research, experimentation, and production. This role requires strong systems thinking, hands-on capability with modern ML frameworks, and the ability to independently drive architectural decisions.
Responsibilities
- LLM Training & Fine-Tuning
Execute supervised fine-tuning, preference optimization, and evaluation workflows.
Build reproducible training pipelines using PyTorch/TensorFlow, DeepSpeed, FSDP, or distributed training stacks.
Implement evaluation harnesses for task-specific and generative metrics.
- Retrieval & RAG System Architecture
Design embedding strategies (dense, sparse/hybrid, and instruction-tuned embeddings).
Build vector search pipelines using FAISS, Milvus, or Pinecone with optimal indexing, sharding, and recall/latency tradeoffs.
Develop production-grade RAG pipelines with prompt templates, context optimization, caching, and guardrails.
- Inference Optimization
Implement low-latency inference flows using ONNX Runtime, TensorRT, vLLM, or other optimized runtimes.
Apply quantization (INT8/FP8/4-bit), model distillation, graph optimizations, and batching strategies.
Build and operate model-serving stacks (TorchServe, BentoML, custom gRPC microservices).
- Applied ML Systems Engineering
Build ML workflows across structured/semi-structured/unstructured data.
Develop feature pipelines, embeddings services, and data preprocessing stacks.
Own CI/CD, experiment tracking, artifact management, and production deployment.
- Security, Compliance, and Architecture
Implement secure data pipelines: PII redaction, encryption, access controls, lineage/auditability.
Design privacy-preserving ML workflows (differential privacy, secure enclaves, RBAC/ABAC).
Make architectural decisions around storage formats, compute orchestration, containerization, and observability.
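To make the retrieval responsibilities above concrete, here is a purely illustrative, brute-force sketch of dense vector search. All function names and the toy 3-dimensional "embeddings" are hypothetical; a production system would use FAISS, Milvus, or Pinecone with approximate indexes (IVF, HNSW) rather than an exhaustive scan.

```python
import math

def cosine(a, b):
    # Cosine similarity between two dense embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, doc_vecs, k=2):
    # Brute-force nearest-neighbour search over every document.
    # Approximate indexes replace this loop at production scale,
    # trading a little recall for much lower latency.
    scored = sorted(
        doc_vecs.items(),
        key=lambda kv: cosine(query_vec, kv[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in scored[:k]]

# Toy 3-dimensional "embeddings" (hypothetical data).
docs = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.1, 0.9, 0.0],
    "doc_c": [0.0, 0.2, 0.9],
}
print(retrieve([1.0, 0.0, 0.1], docs))  # doc_a ranks first
```

The top-k list returned here is what a RAG pipeline would feed into prompt templates as retrieved context.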
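The INT8 quantization mentioned under inference optimization can likewise be sketched with a minimal symmetric per-tensor scheme. This is a hedged illustration only; `quantize_int8` is a hypothetical helper, and real serving stacks rely on calibrated quantization in TensorRT, ONNX Runtime, or similar runtimes.

```python
def quantize_int8(weights):
    # Symmetric per-tensor INT8 quantization: map floats in
    # [-absmax, absmax] onto integers in [-127, 127].
    absmax = max(abs(w) for w in weights)
    scale = absmax / 127.0 if absmax else 1.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    # Recover approximate float weights from the integer codes.
    return [v * scale for v in q]

w = [0.5, -1.27, 0.02, 1.0]
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
# Reconstruction error is bounded by half a quantization step (scale / 2).
```

Per-channel scales, asymmetric ranges, and FP8/4-bit formats refine the same basic idea.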
Required Skills and Experience
- 6+ years of relevant work experience
- Expert-level experience with LLMs, fine-tuning, evaluation, and scalable deployment.
- Strong proficiency in PyTorch or TensorFlow, plus Hugging Face Transformers/PEFT, LangChain, and vector DBs (FAISS, Milvus, Pinecone).
- Strong background designing retrieval systems, RAG architectures, embeddings, and hybrid search strategies.
- Hands-on experience with model acceleration frameworks: TensorRT, ONNX Runtime, vLLM, DeepSpeed, FSDP, etc.
- Ability to own and deliver complete ML systems—data pipelines → training → evaluation → serving → monitoring.
- Strong engineering fundamentals: Python, distributed systems, performance tuning, containerization, cloud compute.
- Ability to operate independently, make architectural calls, and drive complex ML/LLM projects without heavy guidance.