You will be joining the PE Network AI team, which is responsible for the end-to-end health (performance and reliability) of Meta's backend datacenter networks that support our GPU-based AI training clusters. You will build tools and use automation to efficiently scale how we mitigate real-time impact to the network, identify and investigate long-term trends in performance and risk in our data center networks, and drive innovative solutions to monitor and improve Meta's current and future DC network products. AI is essential in driving more relevant content recommendations and ads, enhancements in engagement, and improved user experiences. A Network Production Engineer supporting the Network AI PE team is pivotal in ensuring that the backend DC network is robust, efficient, and capable of supporting Meta's AI training clusters effectively. This role is crucial for driving AI innovations and enhancements that impact various aspects of Meta's operations and services.
Engineers who typically thrive in this role are hybrid software and network engineers with experience working with systems, understanding how they fail, and increasing their reliability. You will have the opportunity to dig into interesting challenges in the networking and software domains, at a scale that offers new challenges on a daily basis.
Network Production Engineer - Network.AI Responsibilities:
Write and review code, develop documentation and capacity plans, and debug the hardest problems, live, on some of the largest and most complex networks and systems in the world
Participate in a weekly on-call rotation and be an escalation contact for service incidents
Perform deep dives on complex technical issues across networks, ranging from automated tooling to hardware failures and network issues
Analyze data to diagnose network issues and identify their root causes
Define, develop, and optimize automated network monitoring systems to mitigate and remediate network events
Proactively find gaps that impact multiple teams, come up with an execution plan, and drive the project both directly and by influencing other teams
Contribute to team growth and development through peer mentorship
Minimum Qualifications:
Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
4+ years of experience coding in higher-level languages (e.g., Python, C++, Go)
5+ years of experience understanding and mitigating network hardware and topology failures
Experience configuring and maintaining network devices and network management systems (NMS), or applications such as web servers, load balancers, relational databases, storage systems, and messaging systems
Experience learning software, frameworks and APIs
Experience developing and understanding network device configuration for at least one vendor (Juniper, Cisco, Arista, Brocade, etc.)
Knowledge of routing and switching, including hardware design and forwarding and data planes
Expert knowledge of data center networking concepts (routing, switching, etc.)
Preferred Qualifications:
BS or MS in Computer Science, Computer Engineering, or Network Engineering
Expert knowledge of TCP/IP and IPv6
Experience working in a multi-vendor network environment
Experience with developing distributed systems and operating them at scale
Experience with automation frameworks and tools such as Ansible, Puppet, or Chef
Experience designing, implementing, operating, and troubleshooting servers and networking components
About Meta:
Meta builds technologies that help people connect, find communities, and grow businesses. When Facebook launched in 2004, it changed the way people connect. Apps like Messenger, Instagram and WhatsApp further empowered billions around the world. Now, Meta is moving beyond 2D screens toward immersive experiences like augmented and virtual reality to help build the next evolution in social technology. People who choose to build their careers by building with us at Meta help shape a future that will take us beyond what digital connection makes possible today: beyond the constraints of screens, the limits of distance, and even the rules of physics.
Meta is proud to be an Equal Employment Opportunity and Affirmative Action employer. We do not discriminate based upon race, religion, color, national origin, sex (including pregnancy, childbirth, or related medical conditions), sexual orientation, gender, gender identity, gender expression, transgender status, sexual stereotypes, age, status as a protected veteran, status as an individual with a disability, or other applicable legally protected characteristics. We also consider qualified applicants with criminal histories, consistent with applicable federal, state and local law. Meta participates in the E-Verify program in certain locations, as required by law. Please note that Meta may leverage artificial intelligence and machine learning technologies in connection with applications for employment.
Meta is committed to providing reasonable accommodations for candidates with disabilities in our recruiting process. If you need any assistance or accommodations due to a disability, please let us know at accommodations-ext@fb.com.
$147,000/year to $208,000/year + bonus + equity + benefits
Individual compensation is determined by skills, qualifications, experience, and location. Compensation details listed in this posting reflect the base hourly rate, monthly rate, or annual salary only, and do not include bonus, equity or sales incentives, if applicable. In addition to base compensation, Meta offers benefits. Learn more about benefits at Meta.