Role: Senior Site Reliability Engineer
Location: New York, NY (Hybrid)
Hire type: Contract
Key Responsibilities:
- Support the SRE team in developing and implementing enhancements to support workflows, focusing on automation and efficiency improvements
- Handle technical escalations, troubleshoot complex FIX and API connectivity issues, and participate in on-call rotations during non-traditional hours to ensure rapid response and resolution
- Adhere to and administer incident and change management policies
- Coordinate incident resolution efforts and implement change management protocols to maintain and enhance system reliability
- Work closely with the Lithuania office to ensure smooth operation and alignment of SRE practices across time zones
- Coordinate incident postmortems and root cause analysis (RCA)
- Design, implement, and maintain comprehensive monitoring, logging, and tracing solutions (observability stack) to provide deep insights into system performance and user experience
- Partner with product and engineering teams to define clear Service Level Indicators (SLIs) and Service Level Objectives (SLOs), managing error budgets to ensure service reliability meets business needs
Required Qualifications:
- 5+ years in a senior SRE role or a similar position, demonstrating deep knowledge and expertise in site reliability engineering and operations
- Knowledge of the FIX protocol and its messages, including the ability to read FIX logs
- Familiarity with REST APIs and a strong understanding of API integration
- Proficiency in Python and scripting for automation and system management, with a proven track record of developing and implementing automation solutions
- Expertise in SQL and transactional databases, including querying and troubleshooting
- Strong analytical and troubleshooting skills with a proven ability to identify and resolve technical issues through root cause analysis
- In-depth knowledge of core networking concepts, including TCP/IP, routing, and DNS
- Familiarity with maintaining and troubleshooting systems in both cloud (AWS) and co-location (colo) environments
- Availability for flexible work hours and willingness to cover US market trading sessions, including L2 on-call coverage
- Knowledge of change management processes and risk management
Preferred Qualifications:
- Experience in the brokerage or financial industry.
- Proficiency with cloud services, particularly AWS, and knowledge of cloud architecture best practices, including IAM, EC2, S3, and DynamoDB.
- Experience maintaining and supporting containerized systems, with familiarity with orchestration tools.
- Knowledge of Infrastructure as Code (IaC) practices and tools such as Terraform or CloudFormation.
- Ability to manage and troubleshoot job scheduling tools like Rundeck or Apache Airflow.
- Advanced skills in managing containerized environments using Kubernetes and OpenShift.
- Practical experience with Confluent Cloud or Redpanda for event streaming architectures.
- Experience with API-based applications and a basic understanding of using the browser developer console for front-end debugging.
Looking forward to hearing from you.
Thanks & Regards
Dhanashri Shete
Nityo InfoTech Corp.