Site Reliability Engineer (SRE) — Cloud & AI Operations

Contract

Zenmid Sols LLC

Role Overview:
Drive the reliability, security, and scalability of mission-critical cloud platforms with an AI-first and automation-driven approach. Collaborate across functions to optimize cost, enhance operational efficiency, and champion service excellence.

Key Responsibilities:

Ensure maximum site reliability and uptime through proactive monitoring and rapid incident response.
Leverage AI-driven tools and analytics for intelligent operations, automation, and continuous improvement.
Optimize resources and operating costs, always seeking efficiencies across teams and processes.
Lead incident management, including rapid troubleshooting, detailed postmortem reviews, and systematic resilience improvements.
Foster a culture of upskilling and knowledge sharing, guiding team members with the latest technical innovations.
Collaborate with development, QA, product, and operations to align architecture and reliability goals.
Champion a product-focused mindset, delivering robust solutions and dependable user experiences.
Maintain strict adherence to security, compliance, and trust standards as defined by firm-wide technology governance.
Build and maintain advanced automation and tooling to minimize manual intervention and increase productivity.
Implement industry-leading monitoring, observability, and alerting solutions to quickly detect and resolve issues.
Conduct strategic capacity planning to ensure scalability and business continuity.
Apply financial acumen to identify and deploy cost-effective technology services.
Lead disaster recovery initiatives and controlled failure testing to strengthen operational resiliency.
Communicate clearly with stakeholders and internal teams, driving alignment and shared understanding.
Stay current on the latest cloud, SRE, and platform technologies; proactively apply new innovations to supported applications.
Enhance TechOps and product team visibility and capabilities through bespoke solutions.

Required Skills:

Advanced troubleshooting skills and the ability to make fast, data-driven decisions.
Expert knowledge of Azure, AKS, serverless computing, and cloud-native architecture.
Strong networking fundamentals and deep understanding of platform use cases.
Skilled at cross-team coordination and stakeholder communication.
Proven ability to lead by example, drive continuous improvement, and adapt to evolving business needs.
Solid understanding of technology governance and compliance.
Commitment to innovation, automation, and proactive learning.

To apply for this job, email your details to paritosh.sood@zenmidsols.com.