BUFFALO GROVE, IL
C2C
- Strong knowledge of ERP systems (implementation and validation).
- Expert in writing test scripts and in testing functionality, data flow, and documentation.
- Experience with CSV (computer systems validation).
- Has worked in the manufacturing/medical device space.
Washington, DC
Randstad is seeking a skilled and proactive Site Reliability Engineer (SRE) to join our client in the Washington D.C. area, focusing on optimizing the availability, performance, and scalability of critical production services. The ideal candidate will bridge the gap between development and operations by applying software engineering principles to infrastructure and operational problems. This role requires a strong background in CI/CD pipeline development, infrastructure automation using Infrastructure-as-Code (IaC), incident response, and deep experience with cloud platforms, preferably AWS. The SRE will collaborate across engineering teams to drive automation, enhance observability, and ensure the continuous, secure delivery of high-quality software.
Responsibilities
- Deployment & Automation: Design, build, and maintain robust Continuous Integration/Continuous Delivery (CI/CD) pipelines using tools such as GitHub Actions, Jenkins, or AWS CodePipeline.
- Infrastructure-as-Code (IaC): Automate the provisioning and management of cloud infrastructure using IaC tools like Terraform, CloudFormation, or AWS CDK.
- Monitoring & Observability: Develop comprehensive monitoring dashboards, alerting rules, and logging configurations using platforms such as AppDynamics, CloudWatch, or Dynatrace to proactively ensure systems meet defined Service Level Objectives (SLOs).
- Incident Response & Remediation: Participate in a rotating on-call schedule, triage and resolve high-priority incidents, and conduct blameless postmortem reviews to identify and implement root cause remediations.
- Security & Compliance: Contribute to a DevSecOps culture by assisting with secrets management and integrating security scanning tools (e.g., Amazon ECR image scanning, Checkmarx, Snyk) directly into CI/CD pipelines.
- Documentation & Knowledge Sharing: Create and maintain high-quality technical documentation, runbooks, and escalation procedures to ensure system readiness and operational efficiency.
- Cross-Functional Collaboration: Partner with application developers, infrastructure engineers, and security teams to successfully deploy and sustain production-grade services.
- Database Management: Apply knowledge of relational (MySQL, PostgreSQL) and NoSQL (MongoDB) databases to optimize database structures and contribute to data modeling efforts.
Qualifications
Required Experience & Technical Skills
- 2+ years of hands-on experience in a Site Reliability Engineering, DevOps, or Infrastructure support role.
- Proficiency with at least one major cloud platform (AWS experience is strongly preferred).
- Experience with building and managing CI/CD pipelines (e.g., Jenkins, GitHub Actions, AWS CodePipeline).
- Proficiency in automating infrastructure with an IaC tool (e.g., Terraform, CloudFormation).
- Strong working knowledge of Linux-based systems and shell scripting.
- Familiarity with version control systems, particularly Git.
- Understanding of core monitoring and alerting principles and experience with common observability tools.
- Basic understanding of core cloud services (e.g., Amazon S3, EFS, Kinesis) and common troubleshooting techniques.
- Willingness to participate in on-call rotations and take ownership of service reliability.
Education & Soft Skills
- Bachelor’s degree in Computer Science, Information Systems, Engineering, or a related technical field, or equivalent hands-on professional experience.
- Strong desire to learn and continually grow expertise in automation, observability, and SRE best practices.
- Excellent problem-solving, analytical, and communication skills to work effectively with diverse teams.
Job Title iOS Native Software Development Engineer in Test (SDET)
Location Washington, DC
Job Description
We are seeking a highly skilled and motivated iOS Native Software Development Engineer in Test (SDET) to join our dynamic Agile team, supporting a Randstad client in the Washington D.C. area. This hybrid role is crucial for ensuring the quality, reliability, and performance of our native iOS applications through advanced test automation. The ideal candidate will possess strong programming skills, a deep understanding of software development and quality assurance methodologies, and practical experience with SAFe/Agile frameworks. This role requires a commitment to continuous improvement, close collaboration with development teams, and the ability to self-manage and self-organize within iteration commitments and capacity planning. On-site presence in D.C. is required as needed, including attendance at key SAFe ceremonies such as Program Increment (PI) planning.
Key Responsibilities
- Design, develop, and maintain robust automated test scripts and comprehensive test frameworks for native iOS applications.
- Collaborate closely with development teams to understand application changes and their impact on testing, advocating for testability in design reviews.
- Execute and evaluate various test cases, including functional, regression, performance, and API testing, to proactively identify software defects.
- Integrate automated tests seamlessly into Continuous Integration/Continuous Delivery (CI/CD) pipelines (e.g., Jenkins, GitLab CI).
- Analyze test failures, perform root cause analysis, and report bugs with clear, detailed, and reproducible documentation.
- Apply both white-box and black-box testing techniques to ensure thorough quality coverage.
- Contribute significantly to the overall test strategy, planning, and estimation activities for the Agile/SAFe team.
- Ensure rigorous adherence to quality standards and best practices throughout the entire software development lifecycle.
Required Qualifications and Skills
- Education/Certification: AWS Certification is required.
- Programming Proficiency: Strong coding skills in a relevant language, such as Swift, Java, Python, or C#.
- Testing Expertise: Proven experience with testing tools relevant to mobile and automation (e.g., Appium, Selenium, JUnit, TestNG, JMeter).
- Methodologies: Deep understanding of software development and Quality Assurance (QA) methodologies.
- Agile Experience: Direct experience working in Agile, DevOps, and Test-Driven Development (TDD) environments.
- CI/CD: Familiarity with integrating automation into CI/CD tools (e.g., Jenkins, GitLab CI).
- SAFe: SAFe training is strongly preferred; contractors are expected to fully embrace SAFe principles and self-management and to attend required ceremonies.
- Soft Skills: Excellent analytical, problem-solving, communication, and collaboration abilities.
- Location/Work: Ability to work a hybrid schedule, with the flexibility to come to D.C. as needed, especially for important SAFe ceremonies (e.g., PI planning).
Job Title Principal Site Reliability Engineer
Location Washington, DC
The Principal Site Reliability Engineer will be a critical technical leader responsible for driving the operational excellence, resilience, and security of our core systems for a key Randstad client in the Washington D.C. area. This senior role merges deep expertise in infrastructure automation (IaC), CI/CD architecture, and cloud security with the foundational principles of Site Reliability Engineering (SRE), including defining SLOs, managing error budgets, and leading incident response. You will mentor cross-functional teams, implement cost-efficient cloud practices, and build the foundational tools and platforms that enable our developers to deliver secure, highly available, and scalable services with velocity.
Responsibilities
- Reliability Engineering & Operations: Define, implement, and maintain rigorous Service Level Objectives (SLOs) and Service Level Indicators (SLIs), establish effective error budgeting, and lead incident response, root cause analysis, and postmortem processes to ensure continuous service improvement.
- Infrastructure Automation: Architect, implement, and manage secure, scalable, and repeatable cloud environments leveraging Infrastructure-as-Code (IaC) tools such as Terraform, Ansible, and CloudFormation.
- CI/CD Optimization & Security: Design and optimize secure, high-performance CI/CD pipelines (e.g., GitHub Actions, Jenkins) incorporating advanced deployment techniques like automated rollback, canary, and blue/green strategies, and ensuring artifact validation.
- Observability & Telemetry: Develop comprehensive observability solutions, including building robust dashboards, configuring alerts, implementing synthetic checks, and maintaining telemetry pipelines (metrics, logs, traces) to ensure deep visibility into system performance, availability, and cost.
- Security & Compliance Enforcement: Integrate security tooling (SAST, DAST, SBOM, secrets scanning) directly into the deployment lifecycle and enforce security policy-as-code within deployment workflows to maintain strict compliance and a secure posture.
- Cost & Capacity Management: Implement tooling and financial practices to proactively monitor cloud cost trends, perform right-sizing of infrastructure resources, and strategically plan capacity to ensure optimal cost-to-performance ratio and high availability.
- Internal Platform Enablement: Design and build reusable internal tools, shared playbooks, and self-service platforms that significantly enhance developer productivity and enforce consistent, high-quality delivery standards across engineering teams.
- Mentorship & Technical Leadership: Serve as a senior technical mentor and subject matter expert across platform, security, and engineering teams, establishing and promoting best practices in operational readiness, fault tolerance, and secure delivery.
Qualifications
- Experience:
  - Bachelor’s degree in Computer Science, Engineering, or a related technical discipline.
  - Minimum of 5 years of progressive experience in DevOps, Site Reliability Engineering (SRE), or Platform Engineering, with proven leadership experience in infrastructure reliability and automation.
  - 3+ years of direct, hands-on experience managing high-availability production environments with modern cloud-native security and observability tooling.
- Technical Expertise:
  - Deep expertise in a major cloud platform (e.g., AWS, Azure, GCP), particularly in core services such as Compute, Networking, Identity and Access Management (IAM), and monitoring.
  - Proficiency with Infrastructure-as-Code tools, specifically Terraform and CloudFormation, and with container technologies such as Docker and Kubernetes.
  - Strong working knowledge of Linux systems and shell scripting.
  - In-depth familiarity with observability stacks (e.g., Prometheus, Grafana, ELK, Datadog, CloudWatch).
  - Demonstrated experience designing, implementing, and managing CI/CD systems that incorporate security tollgates, rollback logic, and GitOps patterns.
- Skills & Knowledge:
  - Strong scripting and programming skills in Python, Go, or Bash for automation and tooling development.
  - In-depth understanding of core SRE practices, including incident response, SLO/SLA management, chaos engineering, and capacity modeling.
  - Proven track record of creating shared tooling, documentation, and best practices that drive operational excellence and knowledge transfer across an organization.
Thanks & Regards,
Mohammed Asif
Sr. Executive – Talent Acquisition
_______________________________
E Mail: asif@vtekis.com