cloudraninc.com
**Only qualified Senior Java Developer candidates located near Alpharetta, GA, will be considered, as the position requires an onsite presence.**
Required Skills, Experience, & Abilities:
Job Location – Alpharetta, GA, USA (local candidates)
Must-Have Skills/Attributes
A/B testing, AWS, Azure, Java, Python, TypeScript
Core Software Engineering Skills (Must Have)
- Strong coding skills in Python (often primary) and/or TypeScript/Java.
- Solid fundamentals: data structures, APIs, concurrency/async, error handling, clean architecture.
- Experience building microservices and integrating with internal/external APIs.
- Familiarity with CI/CD, automated testing, code reviews, and release management.
Agentic / LLM Engineering Skills (Must Have)
- Designing and implementing agent workflows: planning → tool selection → execution → verification.
- Tool/function calling patterns and building reliable tool interfaces (idempotency, retries, timeouts).
- Prompt engineering plus prompt/version management and safe prompt templating.
- Handling non-determinism: evaluations, guardrails, deterministic fallbacks, and replayable runs.
- Building multi-step orchestration (state machines, DAGs, workflow engines, or agent frameworks).
Retrieval + Knowledge Integration (Often Required)
- Building RAG pipelines: chunking, embeddings, indexing, retrieval strategies, and reranking.
- Working knowledge of vector databases (e.g., Pinecone, pgvector, FAISS, Weaviate) and search.
- Grounding and citation approaches; freshness and permission-aware retrieval.
Reliability, Safety, and Governance
- Implementing guardrails: PII redaction, prompt injection defenses, policy filters, and allow/deny tool lists.
- Designing for observability: tracing agent steps, tool calls, latency, token/cost metrics.
- Building robust fallbacks (rule-based flows, smaller models, cached answers, human escalation).
- Secure handling of secrets and credentials; least-privilege tool access.
Evaluation & Quality (Critical for Agentic Systems)
- Creating evaluation suites: golden tasks, regression sets, scenario tests, adversarial tests.
- Defining success metrics (task completion rate, groundedness, hallucination rate, latency, cost).
- Experience with A/B testing or online evaluation in production.
Platform/Infrastructure (Preferred)
- Cloud experience (AWS/Azure/GCP), containers (Docker), optionally Kubernetes.
- Familiarity with scalable data pipelines and queues (Kafka/SQS/PubSub) for async agent work.
- Experience optimizing inference costs/latency (model choice, batching, caching, token reduction).
Nice-to-Haves
- Experience with frameworks like LangChain, LlamaIndex, or workflow engines (Temporal, Step Functions, Airflow).
- Knowledge of security engineering relevant to LLM apps (prompt injection, data exfiltration patterns).
- Domain expertise in the product area (e.g., telecom operations, network management, customer support).
Soft Skills
- Strong cross-functional collaboration with product, security, and platform teams.
- Ability to translate business workflows into agent capabilities and measurable outcomes.
- Comfortable operating in ambiguity and iterating from prototype → hardened production.
To apply for this job, email your details to praveenn@cloudraninc.com