Techridge Inc.
We are looking for a Site Reliability Engineer (SRE) with a strong application engineering background to improve application reliability, observability, and incident resolution across a complex enterprise landscape.
This role will focus on understanding application behavior, diagnosing performance issues, and reducing Mean Time to Resolution (MTTR), rather than solely managing infrastructure or CI/CD pipelines.
Required Skills & Experience
5–10 years of experience in application engineering, production support, or SRE roles
Strong experience in application troubleshooting and debugging (Java/.NET/Node.js preferred)
Solid understanding of distributed systems and microservices architectures
Experience with application logs, debugging tools, and performance profiling
Familiarity with observability tools (Splunk, Dynatrace, AppDynamics, Datadog, etc.)
Strong understanding of API behavior, database interactions, and system integrations
Experience working in production support / incident management environments
Experience implementing distributed tracing (OpenTelemetry, Jaeger, Zipkin)
Knowledge of cloud environments (AWS/Azure/GCP)
Exposure to resiliency patterns (circuit breakers, retries, fallbacks)
Experience with performance tuning and load analysis
To apply for this job, email your details to krish@techridge.net