Website – cloudraninc.com
Position – SRE (Payment industry preferred)
Location – Remote
Client – Mastercard
***Only qualified Site Reliability Engineer candidates able to work REMOTE within the US will be considered for this position.***
Required Education:
• Bachelor’s degree in Computer Science or related technical field involving coding (e.g., physics or mathematics), or equivalent practical experience.
Required Skills, Experience, & Abilities:
• Experience with algorithms, data structures, scripting, pipeline management, and software design.
• Systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive.
• Ability to help debug, optimize code, and automate routine tasks.
• We support many different stakeholders. Experience in dealing with difficult situations and making decisions with a sense of urgency is needed.
• Experience in one or more of the following is preferred: Python, Go, Bash scripting.
• Interest in designing, analyzing, and troubleshooting large-scale distributed systems.
• We need team members with an appetite for change who push the boundaries of what can be done with automation and who can support our on-call shifts. Experience working across development, operations, and product teams to prioritize needs and build relationships is a must.
• For work on our ops team, we need an engineer with experience in industry-standard tools such as Git/Bitbucket, Jenkins/XLR, Chef, Splunk, and Dynatrace. Experience designing and implementing an effective and efficient CI/CD flow that gets code from dev to prod with high quality and minimal manual effort is required.
Role:
Our client’s BizOps team is looking for a Site Reliability Engineer who can help us solve problems, build our pipelines and lead Mastercard in automation and best practices.
Responsibilities:
• Engage in and improve the whole lifecycle of services—from inception and design, through deployment, operation, and refinement.
• Analyze ITSM activities of the platform and provide a feedback loop to development teams on operational gaps or resiliency concerns.
• Support services before they go live through activities such as system design consulting, capacity planning and launch reviews.
• Maintain services once they are live by measuring and monitoring availability, latency, and overall system health with automated alerts.
• Scale systems sustainably through mechanisms like automation and evolve systems by pushing for changes that improve reliability and velocity.
• Practice sustainable incident response and detailed postmortems.
• Take a holistic approach to problem solving by connecting the dots across the various technology stacks that make up the platform during a production event, to optimize mean time to recover.
• Work with a global team spread across tech hubs in multiple geographies and time zones.
• Share knowledge and mentor junior resources.
To apply for this job, email your details to praveenn@cloudraninc.com