HPC / AI Cluster Administrator (Remote)

Intel
Atlanta, GA 30303 Work Remotely
  • Job Code
    JR0212237
Job Description

Intel is seeking a qualified and versatile software engineer to join the Intel DevCloud team within the Software and Advanced Technology Group (SATG). Intel DevCloud (http://devcloud.intel.com/oneapi) is a development environment for developing, testing, and running workloads across a range of the latest Intel CPUs, GPUs, and FPGAs using Intel's software and tools.

We are looking for someone who has built out AI / HPC clusters from concept to production and has managed clusters in an on-prem bare metal environment. In this role you will help us realize the technical opportunity of creating a scalable, secure, and performant cluster across our bare metal heterogeneous computing environments.

Responsibilities:

  • In this role you will be responsible for designing, building, and maintaining a cluster from concept to production in a heterogeneous bare metal environment.

  • You will use your expertise in networking, storage, and compute to create and manage an on-prem bare metal Linux environment, including cluster provisioning, operations, workload optimization, workflow orchestration, log monitoring, security, and troubleshooting.


Qualifications

Minimum Qualifications:

  • BS with at least 6 years or MS with at least 4 years of experience in computer science or a related field.

  • Experience in designing, building and administering clusters in a production environment for an enterprise or a university.

  • Experience with implementing and tuning cluster file system solutions such as NFS, Lustre, GPFS, or others.

  • Experience in HPC cluster networking to create high bandwidth, low latency and secure networks. Knowledge of routing, switching, and load balancing.

  • Administration experience with Linux OS (SLES, RHEL, CentOS, Ubuntu, etc.).

  • Experience with cluster management software such as PBS, TORQUE, or Slurm.

  • Experience in at least one scripting language like Shell/Bash/Python.

  • Experience in remediating security vulnerabilities including Linux patching and package management.

  • Experience in analyzing server logs and error messages, and in troubleshooting issues as needed.


Preferred Qualifications:

  • Knowledge of designing HPC clusters using Mellanox InfiniBand interconnect solutions is a plus.

  • Experience in working with Git and supporting CI/CD pipelines is a plus.

  • Experience with containers (Docker/Singularity/Podman/Kubernetes) is a plus.

  • Knowledge of cloud-native architectures, microservices and operational best practices in the cloud is a plus.

  • Knowledge of virtualization, multi-cloud and distributed systems is a plus.

  • Willingness to pick up a new area quickly.

  • Comfort with working in a self-directed environment.

Inside this Business Group

Enable amazing computing experiences with Intel Software, which continues to shape the way people think about computing across CPU, GPU, and FPGA architectures. Get your hands on new technology and collaborate with some of the smartest people in the business. Our developers and software engineers work in all software layers, across multiple operating systems and platforms, to enable cutting-edge solutions. Ready to solve some of the most complex software challenges? Explore an impactful and innovative career in Software.



Other Locations

US, Oregon, Hillsboro; Virtual US and Canada


Intel strongly encourages employees to be vaccinated against COVID-19. Intel aligns to federal, state, and local laws and, as a contractor to the U.S. Government, is subject to government mandates that may be issued. Intel policies for COVID-19, including guidance about testing and vaccination, are subject to change over time.



Posting Statement

All qualified applicants will receive consideration for employment without regard to race, color, religion, religious creed, sex, national origin, ancestry, age, physical or mental disability, medical condition, genetic information, military and veteran status, marital status, pregnancy, gender, gender expression, gender identity, sexual orientation, or any other characteristic protected by local law, regulation, or ordinance.

Annual Salary Range for jobs which could be performed in US, Colorado:
$132,940.00-$199,800.00


Benefits:
We offer a total compensation package that ranks among the best in the industry. It consists of competitive pay, stock, bonuses, and benefit programs. Find more information about our Amazing Benefits here

Work Model for this Role

This role is available as fully home-based and generally would require you to attend Intel sites only occasionally based on business need.


Intel is committed to a culture of accessibility.  Intel provides accommodations to applicants and employees with disabilities.  Find information and request accommodation here

Posted: 2022-04-30 Expires: 2022-05-31
