Manager of site reliability at FTSi.Tech in Pittsburgh, Pennsylvania

Posted in Other 2 days ago.

Type: full-time

Job Description:

Manager Site Reliability Engineering Job Description

Position Title: Manager Site Reliability Engineering

Reports to: Director of Systems Engineering

Position Summary

This position is responsible for managing the overall stability of the customer engineering organization, leading a team of dedicated engineers while coordinating with stakeholders in development, infrastructure, product, and leadership. This position is responsible for managing the stability of the website and store fleet when incidents occur, as well as identifying how we can improve in the future. The manager of the Site Reliability Engineering team has the opportunity to develop processes and technological solutions to address site stability, and will have full control over the direction of the stability roadmap.

Responsibilities
• Manage the Site Reliability Engineering roadmap, backlog, and active triages to ensure the team is delivering on both the proactive and reactive stability needs of the customer engineering organization
• Make and deliver on tactical decisions while maintaining the quality of day-to-day activities through effective management of full-time and contract resources
• Define day-to-day tasks and projects for team members, and track and manage the delivery of work
• Communicate effectively, verbally and in writing, with leadership, cross-functional partners, and individual contributors regarding incidents, follow-up, and team deliverables
• Maintain and enhance stability benchmarks that reflect the overall stability of the site through KPIs, SLAs, SLOs, and SLIs, and report on these metrics regularly
• Identify opportunities for process, people, and technology improvements in the stability organization, and formalize plans to execute on these improvements
• Reduce manual tasks through automation, process improvement, training, or eliminating the need for manual work
• Mentor individual contributors to achieve technical maturity and personal growth
• Participate in business-critical incidents and facilitate coordination, communication, and resolution, as well as incident follow-up and prevention
• Partner with development teams to understand how applications and features will impact the overall stability of the site, and introduce or modify monitoring and operational processes to meet these needs
• Partner with cross-functional teams to identify and mitigate risks to system reliability and ensure application stability

Qualifications
• Experience as an Engineering Lead / Manager (Infrastructure, SRE, DevOps, Development, Incident Management)
• Experience in business-critical technical incident triage and troubleshooting
• Expertise in monitoring tools and technologies (New Relic, Datadog, Dynatrace, Splunk, ELK, Google Observability) and their usage in triage and problem investigation
• Experience in automation tools (Ansible, Chef, Puppet, Terraform)
• Understanding of cloud platforms (AWS, GCP, Azure)
• Effective verbal and written communication with technical and non-technical audiences
• Demonstrated hands-on experience with, and understanding of, software development, testing, deployment, and project management methodologies
• Experience developing and executing plans, meeting deadlines, and operating under tight time constraints
• Demonstrated ability to anticipate, mitigate, and resolve technical challenges across numerous disciplines
