
Site Reliability Engineer

San Diego, CA 92101
  • Job Code
    200206925
Summary

Posted: Mar 19, 2021

Weekly Hours: 40

Role Number: 200206925

Software Delivery Services & Infrastructure is focused on ensuring engineers at Apple can do amazing things, and we are looking for a talented application Site Reliability Engineer (SRE) to join us in this mission. You'll play a critical role in the day-to-day operations of services relied upon across Apple. You'll partner with engineering teams to ensure they're successful. You'll look for opportunities to innovate, all while driving for rock-solid operations.

Responsibilities will include:
- Adopt and apply SRE best practices to services you support
- Keep users, key stakeholders, and leadership informed through regular reporting and communications
- Identify areas of automation for manual tasks/toil
- Develop playbooks related to actionable alerts
- Foster strong relationships with cross-functional teams
- Participate in on-call rotations
- Perform validation testing for production deployments
- Continuously validate customer experience and analyze performance
- Perform regular disaster recovery (DR) testing and fail-overs
- Participate in incident postmortems and implement preventive findings
- Ensure services adhere to published specs/standards
- Perform predictive analysis or apply AI to avoid issues

Key Qualifications

  • A positive and respectful attitude
  • A passion for providing reliable services at scale, on bare metal as well as in cloud environments
  • A deep understanding of CI/CD technologies such as Jenkins
  • Strong working knowledge of Git and code-review systems such as Gerrit, Bitbucket, and GitHub
  • Good understanding of administration of Linux services
  • Experience using Prometheus, Grafana, and Splunk
  • Superb collaboration skills with excellent written and verbal communication
  • The ability to troubleshoot large-scale systems
  • A deep understanding of web services: how they operate and what needs monitoring and alerting
  • Good understanding of security principles and design
  • The desire to be proactive at all times in issue prevention
  • The desire to do what is right for the customer and to provide a great customer experience

Description

As part of Software Delivery Services & Infrastructure SRE, you will be responsible for delivering reliable services and driving projects to a successful outcome. This role will focus on operating and supporting a distributed development workflow used by teams in Software Engineering. You will monitor SLOs, respond to incidents, troubleshoot issues, and ensure the service is up to date and secure. You will collaborate with engineering teams to implement best practices and shape technical decisions.

To ensure your success, this job will provide you with:
- Passionate and talented coworkers around the globe who are ready to collaborate, mentor, and learn from you
- Ownership to drive meaningful improvements to the operational reliability of the services you manage
- Opportunities to contribute to the best practices used by SRE teams within Software Delivery

Additional Requirements

  • Prior experience as an SRE, software engineer, or system administrator
  • Proven ability to self-manage large projects and meet deadlines


Posted: 2021-03-14 Expires: 2021-04-13
