SRE Infrastructure Engineer C2C jobs in SFO, CA (5 Days Onsite)

Location: SFO, CA (5 Days Onsite)

Job Description:

We are seeking an SRE Infrastructure Engineer with 8+ years of professional experience ensuring the reliability, scalability, and performance of Google Cloud-based services through automation, monitoring, and proactive engineering. Key responsibilities include managing infrastructure as code (Terraform), optimizing GKE/Kubernetes, responding to incidents, and implementing SLIs/SLOs, with an emphasis on minimizing manual toil.

This role requires close collaboration with cross‑functional teams, adherence to DevOps and Agile practices, and ownership of service quality and delivery.

Key Responsibilities

GCP Infrastructure Management: Design, deploy, and maintain robust infrastructure components, including VPCs, Compute Engine, GKE (Kubernetes), and storage solutions.
Automation & IaC: Use Terraform or Deployment Manager to manage cloud resources and build CI/CD pipelines to automate deployments. Minimize manual, repetitive tasks by developing automation scripts and custom tools that streamline deployments and operations.
Observability: Develop monitoring, alerting, and logging systems (e.g., Cloud Monitoring, Prometheus, Grafana).
Incident Management: Act as primary on-call and first responder to troubleshoot production incidents and system outages, and conduct deep-dive root cause analysis (post-mortems) to prevent recurrence.
CI/CD Pipeline Management: Design and support automated deployment pipelines using Jenkins, ArgoCD, Artifactory, GitLab CI, or GitHub Actions, following DevSecOps practices.
Reliability Engineering: Define and maintain Service Level Indicators (SLIs) and Service Level Objectives (SLOs) based on the four golden signals: latency, traffic, errors, and saturation.
Optimization & Security: Proactively optimize infrastructure for cost, performance, and security compliance.
AI Workload Reliability: Focus specifically on AI workload health and Google Compute Engine (GCE) visibility.

Mandatory Technical Skills & Competencies

Experience: 8+ years in SRE, DevOps, or systems engineering, specifically with Google Cloud Platform.
Technical Skills: Deep knowledge of Linux, Kubernetes (GKE), networking (VPCs, CDNs), and containerization.
Programming: Proficiency in scripting/programming languages like Python, Go, or Shell.
Methodologies: Strong understanding of GitOps, CI/CD pipelines, and SRE principles (error budgets, toil reduction).
Strong troubleshooting skills across the full stack (network, OS, application).
Ability to balance system stability with the need for rapid deployment.
Observability Tools: Experience implementing monitoring and logging stacks like Prometheus, Grafana, or the Google Cloud Operations Suite.
Collaboration: Excellent collaboration skills to work with development teams on service ownership.

Soft Skills

Strong problem-solving and analytical skills
Clear communication with technical and non‑technical stakeholders
Ownership mindset and production‑grade engineering discipline
Ability to work independently and within cross‑functional teams

Neha Chaudhary
Team Lead – Recruitment
e: neha.chaudhary@compunnel.com
o: (+1) 609-606-9010 ext.2469
m: (+1) 732-743-9949
HQ: 4390 Route 1 North, Suite 302, Princeton, NJ 08540, USA.
