Senior Software Engineer - HPC

1 Month ago • 10 Years + • DevOps • $184,000 PA - $356,500 PA

Job Summary

Job Description

NVIDIA seeks a Senior Software Engineer for its HPC infrastructure team. Responsibilities include designing highly available and scalable systems, evaluating new technologies, improving infrastructure provisioning and management using automation, supporting a multi-cloud environment (AWS, GCP, on-prem), collaborating with cross-functional teams, ensuring high uptime and QoS, and participating in on-call rotations. The ideal candidate has 10+ years of experience in large engineering projects, proficiency in at least two programming languages (Golang, Java, C/C++, Scala, Python, Elixir), cloud computing expertise, and strong CI/CD skills.
Must have:
  • 10+ years experience in large engineering projects
  • Proficiency in at least two programming languages
  • Cloud computing expertise (GCP, AWS, Azure)
  • Strong CI/CD, GitOps, and IaC skills
  • Design highly available and scalable systems
  • Experience with HPC clusters (Slurm or Kubernetes)
Good to have:
  • Strong understanding of Linux and TCP/IP
Perks:
  • Equity
  • Benefits

Job Details

NVIDIA has continuously reinvented itself over two decades. Our invention of the GPU in 1999 fueled the growth of the PC gaming market, redefined modern computer graphics, and revolutionized parallel computing. More recently, GPU deep learning ignited modern AI and enabled the next era of computing. NVIDIA is a “learning machine” that constantly evolves by adapting to new opportunities that are hard to address, that matter to the world, and that only we can address. This is our life’s work: to amplify human imagination and intelligence, and expand what is possible. We’re seeking strategic, bold, hard-working, and creative individuals who are passionate about helping us tackle challenges no one else can solve. Make the choice to join us today.
 

We are looking for a Senior Software Engineer to join our mission to continue improving our HPC infrastructure. Our team builds and operates sophisticated infrastructure to enable business-critical services and AI applications. You will be working with a team of passionate and skilled engineers who are continuously working to provide better tools to build and manage this infrastructure. The ideal candidate is strong in software development, skilled at designing and creating reliable distributed systems, and able to implement a well-thought-out, long-term maintenance strategy.


What you’ll be doing:

  • Design highly available and scalable systems to meet the demands of our HPC clusters

  • Evaluate new and innovative technologies as the landscape evolves

  • Continuously improve infrastructure provisioning and management using automation

  • Support a globally distributed, multi-cloud hybrid environment: AWS, GCP, and on-prem

  • Build strong cross-functional relationships and align with partners across various business units

  • Ensure the highest level of uptime and Quality of Service (QoS) for our users through operational excellence

  • Participate in the team's on-call rotation and be a contact for service incidents


What we need to see:

  • 10+ years of experience in design, implementation, and delivery of large engineering projects

  • Comfortable with at least two of the following programming languages: Golang, Java, C/C++, Scala, Python, Elixir.

  • Understands scalability challenges and the performance of server-side code. Able to design and build horizontally scalable, resilient systems that perform well under load.

  • Versatile technologist with experience in the full software development lifecycle – from inception and design to deployment, operation, and iterative development.

  • Proficient in cloud computing, with hands-on experience in at least one cloud platform: GCP, AWS, or Azure.

  • Proficient in modern CI/CD techniques, GitOps, and Infrastructure as Code (IaC)

  • Strong work ethic and a passion for problem solving

  • B.S. degree in Computer Science or related technical field (or equivalent experience)

  • Detail-oriented with great communication and collaboration skills


Ways to stand out from the crowd:

  • Prior experience building solutions for HPC clusters based on Slurm or Kubernetes

  • Strong understanding of the Linux operating system and TCP/IP fundamentals

The base salary range is 184,000 USD - 356,500 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

About The Company

Since its founding in 1993, NVIDIA (NASDAQ: NVDA) has been a pioneer in accelerated computing. The company’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined computer graphics, ignited the era of modern AI and is fueling the creation of the metaverse. NVIDIA is now a full-stack computing company with data-center-scale offerings that are reshaping industry.

