Terms of Employment
• Contract, 6 Months (Likely Extension)
• Initially, this individual should be comfortable working onsite once per week in Reston, VA. Once settled in, this individual can work remotely, with in-person events roughly once per quarter.
Overview & Responsibilities
• We are seeking an experienced Infrastructure Engineer to join our team. In this role, you will play a crucial part in maintaining and optimizing our infrastructure systems by identifying, analyzing, and troubleshooting issues to ensure high availability and performance.
• You will conduct in-depth server troubleshooting, perform root cause analysis, and collaborate with vendors, especially IBM, to keep our systems functioning smoothly. Additionally, you'll lead calls to provide insight and explanation into the issues and solutions, ensuring effective communication across teams.
Key responsibilities include:
• Performing server health checks, identifying production issues, and assessing system functionality.
• Conducting root cause analysis to determine the underlying issues, diagnosing performance bottlenecks, and implementing solutions.
• Engaging with multiple vendors (e.g., IBM) for support on infrastructure tools and system issues, with a focus on MQ (Message Queue) services and other IBM products.
• Managing incidents and service requests, leading troubleshooting efforts to resolve infrastructure issues efficiently.
• Monitoring and analyzing system performance metrics to prevent future issues and improve system uptime.
• Supporting continuous improvement by planning and implementing infrastructure enhancements based on findings.
• Collaborating with cross-functional teams to understand system dependencies and the flow of the Software Development Life Cycle (SDLC).
• Documenting troubleshooting processes and communicating solutions to internal stakeholders and clients.
Required Skills & Experience
• 5+ years of experience in infrastructure engineering with strong knowledge of server troubleshooting, root cause analysis, and problem resolution.
• Demonstrated expertise with IBM products and tools, including MQ services.
• Strong experience with AWS and Bitbucket.
• Proven ability to analyze and resolve network issues, identify system health status, and execute recovery plans.
• Excellent communication skills, with the ability to lead calls and articulate root cause analysis, solutions, and next steps effectively to both technical and non-technical stakeholders.
• Thorough understanding of SDLC processes, infrastructure frameworks, and operational best practices.
Preferred Skills & Experience
• Experience working with or alongside IBM vendor support and familiarity with troubleshooting protocols specific to IBM environments.
• Familiarity with automation tools and scripting languages for enhancing infrastructure processes and managing health checks.
• Background in production support for complex environments with high availability requirements.
• Proficiency in managing cloud-based services within AWS and following infrastructure-as-code best practices.