Title: Site Reliability Engineering (SRE) – Triage Specialist
Location: Miami, FL (Onsite)
Terms: Contract
The Site Reliability Engineering (SRE) Triage Specialist is responsible for quickly assessing and prioritizing incidents and issues raised by the Monitoring team and determining the best course of action. This role is vital for minimizing downtime and ensuring that the right teams are engaged as efficiently as possible during a service disruption.
Roles and Responsibilities
- Incident triage and initial response: Act as the first responder for alerts, incidents, and field-reported production issues, and escalate high-priority problems to the appropriate engineering or application support team.
- Initial investigation and diagnosis: Perform initial analysis to understand the scope and potential cause of an incident, gathering key metrics, logs, and traces.
- Runbook execution: Follow predefined runbooks and standard operating procedures to mitigate and resolve common issues swiftly.
- Technical collaboration: Work with implementation, engineering, and application SMEs and other stakeholders during an incident to provide real-time updates and coordinate resolution efforts.
- Communication: Provide clear and timely updates to stakeholders about the status of an incident, including warm shift handoffs.
- Documentation and knowledge management: Summarize incident details, update knowledge bases, and contribute to the Problem Resolution Database.
- Process improvement: Identify opportunities to improve observability and incident response procedures based on daily triage activities.
Technical skills:
- Systems knowledge: Strong understanding of Oracle (12c/19c/23ai), SQL, AWS RDS, and data replication.
- Expertise: Oracle RAC, ASM, Clusterware, RMAN.
- Triaging: Experience with AWR, scripting, and OEM (Oracle Enterprise Manager).
- Scripting: PL/SQL.
Soft skills:
- Problem-solving: Excellent analytical and troubleshooting skills to quickly diagnose complex issues under pressure.
- Communication: Strong verbal and written communication skills to articulate technical issues and interact with multiple teams effectively.
- Composure under pressure: Ability to remain calm and make critical decisions during high-pressure incidents.
Thanks & Regards,
Mayank Jaiswal | Senior Talent Acquisition Specialist
Amaze Systems Inc
USA: 8951 Cypress Waters Blvd, Suite 160, Dallas, TX 75019
Canada: 55 York Street, Suite 401, Toronto, ON M5J 1R7
E: mayank.jaiswal@amaze-systems.com