Summary: This position is a lead SRE role. We are targeting someone with high-level automation skills in at least one scripting or programming language (no preference). This person will be the technical leader in handling production issues.

MUST HAVE:
- A strong automation background (our top priority): scripting or programming, and/or building automation tools from scratch
- 8+ years of SRE experience, ideally including lead experience
- AWS (the environment is 99.99% AWS): networking (firewalls, routing, load balancing), security groups, EC2, API Gateway, CloudFormation, and an understanding of ports. This person will be in charge of troubleshooting rather than deploying.
- Strong experience in at least one of the following: Kubernetes, scripting, programming, infrastructure (networking, systems, storage, etc.), or security

Job Description:

Job Summary
The Lead SRE Engineer is a hands-on engineer responsible for provisioning, configuring, and automating cloud services and cloud-native solutions, as well as designing and implementing automated software delivery pipelines for enterprise deployments. A key focus of this role is providing solutions that are robust, scalable, and highly available. The Lead SRE is passionate about learning and enjoys deep dives into complex problems and technologies. Critical to this role is the ability to explain non-trivial concepts clearly and concisely to mixed audiences.

Job Functions
- Configure cloud services and cloud-native container orchestration platforms for high-traffic applications.
- Collaborate with technical teams and cloud vendors to define best practices for adopting cloud technology and to deliver best-of-breed solutions.
- Automate application delivery pipelines for containerized and legacy applications running in the cloud and on-premises. Provision monitoring and APM tools to provide insight into application health.
- Provision, configure, and maintain infrastructure for multiple production and test environments.
- Collaborate with cross-functional teams to proactively reduce the MTTR of production incidents through automated healing mechanisms and well-curated health indicators.
- Troubleshoot issues in real time as part of incident response.
Skills
- 7+ years of overall experience in system operations, software development, or both, plus 3+ years of experience designing, provisioning, and automating distributed infrastructure and applications in cloud and on-premises environments.
- Awareness and use of security technologies in eCommerce development.
- Strong experience with Kubernetes container orchestration and Docker.
- Strong experience deploying and managing applications on public cloud platforms, e.g., AWS, Azure, or Google Cloud Platform.
- Strong practical Windows and Linux systems administration skills in a cloud or virtualized environment.
- Experience building sophisticated and highly automated infrastructure.
- Prior success in automating a real-world production environment.
- Experience with automated build scripts used for release management across all environments.
- Strong ASP.NET, JavaScript and C# programming skills.
- Experience with configuration management tools like Chef, Puppet, Salt, or Ansible in production environments with many nodes.
- Experience building CI/CD pipelines with products such as TeamCity/Octopus, Jenkins, or Bamboo.
- Strong scripting skills, e.g., PowerShell, Python, Bash, Ruby, or Perl.
- Knowledge of IP networking, VPNs, DNS, load balancing, and firewalls.
- Familiarity with monitoring tools such as Splunk, AppDynamics, Nagios, or New Relic.
- Experience with version control systems such as Git, TFS, SVN, or Mercurial.
- Experience with automated testing tools (e.g., Selenium, JMeter).
- Understanding of, and experience with, code deployment, including tagging.
- Understanding of service-oriented architecture (SOA) and REST.