Site Reliability Engineer (SRE) - Infrastructure

Cupertino, CA 95014
  • Job Code: 200095113
Summary

Posted: Jul 8, 2020

Weekly Hours: 40

Role Number: 200095113

Services and Infrastructure (S&I) is seeking a customer-service-oriented, self-driven, and motivated Infrastructure SRE to join our team. S&I is a diverse group of engineers who form the foundation of the build system responsible for assembling Apple's software products. The ideal candidate can analyze and troubleshoot a broad spectrum of problems. As an Infrastructure SRE, you will help implement the infrastructure that supports the continued growth of the build system and help reinvent the way we monitor our environment. You will join an existing team dedicated to supporting software engineering teams within Apple.

Key Qualifications

  • Minimum of 5-7 years of experience in a production data center with at least 1,000 servers
  • Experience troubleshooting complex issues by correlating data from multiple sources, e.g., environmental data, server sensors, and the OS
  • Experience gathering server data from various vendors' BMCs, e.g., HP iLO, Dell DRAC, and IPMI
  • Broad experience supporting and maintaining common Linux/Unix applications and services, as well as a good understanding of DNS, DHCP, LDAP, NFS, Kerberos, PAM, PXE, SNMP, SSH, HTTP/S, and NTP
  • Experience with common version control software such as Git
  • Experience with monitoring using Prometheus, Grafana, and Splunk

Description

Specific responsibilities include:

  • Work cross-functionally with vendors and a variety of other teams at Apple to identify infrastructure instabilities and help resolve them
  • Troubleshoot hardware and Linux systems, both hands-on and remotely
  • Document policies and procedures
  • Troubleshoot Layer 2 / Layer 3 networking (Arista/Cisco preferred)
  • Support day-to-day operations of the environment, including monitoring, measuring, and troubleshooting infrastructure and services
  • Automate tasks and processes by identifying, owning, collaborating on, and driving new or expanded automation to improve the stability of the environment
  • Self-manage large projects, including setting and meeting deadlines
  • Participate in a regular on-call rotation

Additional Requirements

Preferred Qualifications

  • Experience with DCIM software, e.g., StruxureWare
  • Cisco/Arista networking experience
  • Experience using monitoring and metrics to gather statistical data for strategic planning
  • Understanding of the server deployment process using PXE
  • Understanding of rack elevations, power requirements, and cooling for capacity planning
  • Experience working with remote data center service teams


Posted: 2020-10-07 Expires: 2020-11-06

Apple, Inc.