Location: Plano, TX (5 days onsite; 24×7 rotational shifts: Shift 1 (8 AM – 5 PM), Shift 2 (4 PM – 1 AM), Shift 3 (12 AM – 9 AM); weekend coverage based on the roster)
Duration: 12 Months
Rate: $55/hr on C2C
Job Summary:
We are seeking a Site Reliability Engineer (SRE) with strong middleware expertise to design, operate, and continuously improve highly available, secure, and scalable enterprise platforms. This role blends deep middleware operations (WebLogic, API gateways, Java platforms) with SRE principles such as automation, observability, SLIs/SLOs, error budgets, and incident reduction. The ideal candidate will partner with application, infrastructure, security, and DevOps teams to ensure platform reliability while driving automation, standardization, and operational excellence.
Key Responsibilities:
Reliability & SRE Practices:
- Define, implement, and track SLIs, SLOs, and error budgets for middleware and platform services
- Drive MTTR reduction, availability improvements, and operational resilience
- Lead incident response, root cause analysis (RCA), and post-incident reviews
- Implement proactive monitoring and alerting to reduce noise and prevent outages
Middleware Platform Engineering:
- Administer and support enterprise middleware platforms, including Oracle WebLogic, Apache, NGINX, API gateways (Apigee Edge / X), Java application servers, and JVM-based services
- Perform patching, upgrades, configuration tuning, and capacity planning
- Manage certificates, keystores, truststores, and TLS configurations
- Ensure platform security, compliance, and performance standards
Observability & Monitoring:
- Design and maintain end-to-end observability using tools such as Dynatrace, ELK/Kibana, Splunk (or equivalent)
- Build executive and operational dashboards for real-time health visibility
- Reduce alert fatigue through smart alerting, thresholds, and suppression
- Monitor JVM metrics, GC behavior, thread utilization, and API performance
Automation & Infrastructure Efficiency:
- Collaborate with DevOps teams on deployment pipelines and release strategies
Collaboration & Leadership:
- Act as a reliability advisor to application and development teams
- Partner with Unix/Linux, Database, Network, and Security teams
- Provide mentoring, documentation, and best-practice guidance
- Participate in on-call rotations and production support leadership
Required Skills & Experience:
Technical Skills:
- 5+ years of experience in Middleware / Platform Operations / SRE
- Strong expertise in WebLogic, Java middleware, Apache/NGINX
- Hands-on experience with observability platforms (Dynatrace, ELK, Splunk)
- Solid understanding of Linux/Unix systems and networking fundamentals
- Experience with API platforms (Apigee preferred)
- Automation and scripting skills (Shell, Python, Ansible, Terraform)
- Experience with Kubernetes/OpenShift and containerized workloads
SRE & Operational Excellence:
- Practical experience implementing SRE principles in production
- Strong troubleshooting skills (thread dumps, heap analysis, GC logs)
Nice-to-Have:
- Experience with cloud-native architectures and service meshes
- Knowledge of IAM / Security integrations (OAuth, SAML, mTLS)
- Exposure to CI/CD tools (Jenkins, GitHub Actions, GitLab CI)