Location: San Francisco, CA
Job Description:
Job Summary
We are seeking an experienced AI Systems Architect to design, build, and scale high-performance distributed AI systems. The ideal candidate will have deep expertise in GenAI, LLMs, and cloud-native architectures, along with hands-on experience in building enterprise-scale AI/ML platforms and agent-based systems.
Must-Have Skills
Strong experience in designing and implementing high-performance, large-scale distributed systems
Proven experience in implementing and deploying AI/ML platforms at scale
Expertise in building agent-based architectures, evaluation frameworks, and prompt/context engineering
Knowledge of MCP (Model Context Protocol) servers
Hands-on experience in LLM inference optimization, including batching and caching strategies
Strong experience with Kubernetes and cloud infrastructure (AWS/Azure/GCP)
Proficiency in at least one programming language (Python, Java, Go, etc.)
Expertise in designing agent data stacks and retrieval systems, including:
Vector databases
Hybrid search
Data freshness strategies
Memory systems
Graph reasoning
BM25 and advanced retrieval techniques
Key Responsibilities
Architect and deliver scalable, high-performance distributed systems
Design and deploy AI/ML and GenAI platforms at enterprise scale
Build and manage agent-based architectures, including:
Prompt and context engineering
MCP servers
Evaluation frameworks
Optimize LLM inference pipelines for latency, throughput, and efficiency
Design and implement agent data and retrieval systems (vector databases, hybrid search, memory systems, graph-based reasoning)
Lead Kubernetes-based, cloud-native deployments
Provide technical leadership, architecture governance, and hands-on mentoring to engineering teams
Balaji. S
201-781-8058 EXT 145
bala@realtekconsulting.net