Job Title: Observability Architect
Location: Dallas, TX (FULLY ONSITE)
Terms: Long-term contract
Role Description:
We are seeking an observability architect with hands-on experience in New Relic, Splunk, CloudWatch, Kibana, APM, and other monitoring solutions. This individual will champion automated monitoring solutions, including triage, incident management, and self-healing.
Key Responsibilities:
- Design and implement end-to-end observability strategies covering metrics, logs, traces, and user experience monitoring
- Architect custom monitoring frameworks tailored to specific business applications and infrastructure landscapes
- Implement and manage observability platforms including New Relic, Splunk, AWS CloudWatch, and Kibana
- Develop and maintain APM scripts, synthetic monitors, custom dashboards, and alerting mechanisms
- Integrate observability tools with CI/CD pipelines for proactive issue detection and faster MTTR
- Collaborate with application, infrastructure, DevOps, and security teams to ensure observability coverage across systems
- Conduct root cause analysis using correlation across metrics, logs, and traces
- Provide technical leadership in observability best practices, architecture reviews, and roadmap planning
- Define and enforce standards for SLAs, SLOs, and SLIs across environments
- Mentor and guide engineering teams in the effective use of observability tools
Key Skills and Technologies:
- Monitoring & APM Tools:
  - Deep experience with New Relic (including APM, infrastructure, synthetics, and custom instrumentation)
  - Strong proficiency in Splunk (querying, dashboards, alerts, and ingestion pipeline design)
  - Hands-on experience with AWS CloudWatch (metrics, logs, alarms, insights)
  - Working knowledge of Kibana and the Elastic Stack (ELK)
- Scripting & Customization:
  - Experience with APM scripting and custom instrumentation (using Java, Python, or Node.js agents)
  - Ability to create synthetic monitors, custom event generators, and automated dashboards
  - Familiarity with Terraform, CloudFormation, or scripting languages (Shell, Python) for observability automation
- Architecture & Integration:
  - Expertise in designing observability frameworks for cloud-native (AWS/GCP/Azure) and hybrid environments
  - Understanding of distributed systems, microservices, and event-driven architectures
  - Ability to integrate observability platforms with DevOps pipelines, incident response, and ITSM tools
Qualifications:
- Bachelor’s or master’s degree in Computer Science, Engineering, or a related field.
- 15+ years of experience in software engineering or infrastructure roles, including at least 5 years in operations.
- Proven success managing high-availability, large-scale distributed systems (e.g., microservices, cloud-native apps).
- Deep understanding of cloud platforms (AWS, GCP), containers (Docker, Kubernetes), monitoring tools (Prometheus, Grafana, Datadog, New Relic), and automation tools (Terraform, Ansible, etc.).
- Experience with modern CI/CD tools (e.g., Jenkins, ArgoCD, GitHub Actions).
- Strong leadership, communication, and team development skills.
Preferred Qualifications:
- Experience in regulated industries (e.g., telecom and communications), particularly with global telco leaders.
- Certifications in cloud platforms (AWS Certified DevOps Engineer, Google SRE Certificate, etc.).
- Experience managing hybrid or multi-cloud environments.
Thanks & Regards,
Mayank Jaiswal | Senior Talent Acquisition Specialist
Amaze Systems Inc
USA: 8951 Cypress Waters Blvd, Suite 160, Dallas, TX 75019
Canada: 55 York Street, Suite 401, Toronto, ON M5J 1R7
E: mayank.jaiswal@amaze-systems.com