Deep Learning Distributed Training Engineer

Santa Clara, CA 95050
Job Description

Design, develop, and optimize deep learning training for data-center discrete GPU and CPU clusters. Implement distributed algorithms in deep learning frameworks, such as model- and data-parallel frameworks, parameter servers, and dataflow-based asynchronous data communication. Transform computational-graph representations of neural network models. Develop deep learning primitives in math libraries. Profile distributed DL models to identify performance bottlenecks, and propose solutions across the individual component teams. Optimize code for a range of hardware backends. Work with deep learning researchers and bring hands-on experience with deep learning frameworks.

We work in an agile development environment, so you should be able to juggle multiple tasks and make steady, demonstrable progress that delivers impact. You will have the opportunity to work with external and internal teams who are passionate about AI/DL training.

The ideal candidate should exhibit the following behavioral skills:

  • Strong communication skills. The ability to produce high-quality, externally publishable material is a plus.
  • Work well in a dynamic team environment


You must possess the minimum qualifications below to be initially considered for this position. Preferred qualifications are in addition to the minimum requirements and are considered a plus in identifying top candidates. The experience listed below may have been obtained through any combination of schoolwork, classes, research, and/or relevant previous job or internship experience.

Minimum Qualifications:

  • Master's degree with 4+ years of experience, or PhD with 2+ years of relevant industry experience, in Computer Science, Computer Engineering, Electrical Engineering, AI, Computer Vision, Software Engineering, Physics, Mathematics, or a related technical discipline.
  • 2+ years of experience with the following skills:
    • Excellent programming skills in languages such as Python, C/C++, and CUDA.
    • Low-level programming and performance optimization skills for CPU and GPU, including code generation, distributed compute, and resource management.
    • Understanding of deep learning algorithms and experience deploying and optimizing distributed training on GPU/CPU clusters.
    • Familiarity with DL frameworks (e.g., TensorFlow, PyTorch, MXNet).

Preferred Qualifications:

  • 2+ years of knowledge of or experience with artificial intelligence solutions applied to segments such as HPC, cloud, visual computing, and/or enterprise.
  • Prior experience with deployment strategies, performance optimization, distributed computing algorithms, and multi-node, multi-GPU scaling is a big plus.
  • Experience training large-scale language models (e.g., GPT-x, Megatron) on compute clusters is a definite plus.
  • Experience in machine learning infrastructure development and optimization (frameworks, ML pipelines, deployment).
  • Experience or training in one or more parallel programming methodologies (SYCL, C++, OpenMP, MPI, CUDA) is highly desired.

Inside this Business Group

Enable amazing computing experiences with Intel Software, which continues to shape the way people think about computing across CPU, GPU, and FPGA architectures. Get your hands on new technology and collaborate with some of the smartest people in the business. Our developers and software engineers work in all software layers, across multiple operating systems and platforms, to enable cutting-edge solutions. Ready to solve some of the most complex software challenges? Explore an impactful and innovative career in Software.

Intel strongly encourages employees to be vaccinated against COVID-19. Intel aligns with federal, state, and local laws and, as a contractor to the U.S. Government, is subject to any government mandates that may be issued. Intel policies for COVID-19, including guidance about testing and vaccination, are subject to change over time.

Posting Statement

All qualified applicants will receive consideration for employment without regard to race, color, religion, religious creed, sex, national origin, ancestry, age, physical or mental disability, medical condition, genetic information, military and veteran status, marital status, pregnancy, gender, gender expression, gender identity, sexual orientation, or any other characteristic protected by local law, regulation, or ordinance.

Work Model for this Role

This role is available as fully home-based and generally would require you to attend Intel sites only occasionally based on business need.

Posted: 2022-05-05 Expires: 2022-06-05