Senior Site Reliability Engineer at Mueller Water Products in Atlanta, Georgia

Posted in Other 2 days ago.





Job Description:

We are currently searching for a Senior Site Reliability Engineer to join Mueller's Smart Water Infrastructure team. This role will be based in our Atlanta, GA office on a hybrid office/remote schedule.


The Senior Site Reliability Engineer (SRE) is responsible for deploying and monitoring software products and for ensuring their availability, reliability, scalability, and performance against operational targets. The SRE is also responsible for the design, implementation, and maintenance of the infrastructure required to support those products.



Key responsibilities



  • Collaborate with software development teams to ensure that services are designed with availability, security, scalability, reliability, and performance in mind from the outset.

  • Monitor and manage live production environments, identifying and resolving issues as they arise and implementing long-term solutions to prevent their recurrence.

  • Develop and maintain automation tools for system health, performance monitoring, and incident response to ensure rapid detection and resolution of issues.

  • Resolve support issues that need your experience to diagnose quickly and to find an appropriate resolution.

  • Lead root cause analysis of critical outages, contributing to a culture of learning and continuous improvement.

  • Provide SRE/DevOps/Infrastructure services and guidance to the Software Team.

  • Support services that are not vendor-managed, such as databases.

  • Coordinate internal and external security and penetration tests, and manage the prioritization and resolution of any findings.

  • Produce well-written documentation and architecture diagrams.

  • Be available out of hours if required to complete specific tasks and to support customers in emergency or disaster scenarios. This is not a regular occurrence.

  • Mentor junior engineers, fostering a culture of technical excellence and collaborative problem-solving.




Key competencies



  • Strong technical competency in software product operations.

  • Strong collaboration skills to work effectively with cross-functional teams.

  • Excellent communication skills, both verbal and written, to effectively articulate technical and product information.

  • Ability to prioritize and manage multiple tasks simultaneously and work under tight deadlines.

  • Exceptional problem-solving abilities and a systematic approach to root cause analysis.




Experience required



  • Bachelor's or Master's degree in a computing or scientific/engineering discipline, or equivalent demonstrable experience.

  • 5+ years of Site Reliability Engineer experience.

  • Operational experience with AWS serverless technologies

  • Linux and Windows system administration

  • CI/CD pipelines

  • Database administration

  • Patch management and disaster recovery

  • Advanced monitoring knowledge

  • Automation scripting in a mainstream programming language

  • Infrastructure as Code (IaC)




Experience desired



  • Git

  • Monitoring tools (Datadog, CloudWatch, Grafana).

  • Terraform

  • Coding experience outside scripting tools.

  • Networking understanding (DNS/Firewalls/Certificates).

  • Exposure to ISO certified environments.

  • Security fundamentals and experience with security tools such as Snyk and tfsec.



We are an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, status as a protected veteran, or any other category protected by law.