Solutions Architect, Infrastructure - Research Computing

1 Month ago • 5 Years + • DevOps • $148,000 PA - $235,750 PA

Job Summary

Job Description

NVIDIA seeks a Solutions Architect, Infrastructure - Research Computing to work with universities and research institutions. Responsibilities include designing, building, and optimizing university-level research computing infrastructures, utilizing GPU-accelerated workflows and tools like NVIDIA Base Command, Kubernetes, Slurm, and Jupyter. The role involves implementing system monitoring, documenting learnings, providing customer feedback, and optimizing resource utilization at scale. A strong background in building and deploying research computing clusters and AI workloads is essential.
Must have:
  • MS/PhD in relevant field or equivalent experience
  • 5+ years relevant experience
  • GPU-accelerated computing infrastructure design & deployment
  • Cluster orchestration (Slurm, Kubernetes)
  • Container tools (Docker, Singularity)
  • System monitoring & optimization expertise
Good to have:
  • LLM training and inference experience
  • Academic research computing experience
  • High-performance parallel file systems knowledge
  • OpenMPI and NCCL knowledge
  • Debugging and profiling tool experience
Perks:
  • Competitive salary
  • Comprehensive benefits package
  • Excellent engineering work culture

Job Details

Are you an experienced systems architect with an interest in advancing artificial intelligence (AI) and high-performance computing (HPC) in academic and research environments? We are looking for a Solutions Architect to join the higher education and research team! In this role you will work with universities and research institutions to optimize the design and deployment of AI infrastructure. Our team applies expertise in accelerated software and hardware systems to help enable groundbreaking advancements in AI, deep learning, and scientific research. This role requires a strong background in building and deploying research computing clusters, deploying AI workloads, and optimizing system performance at scale.

What you’ll be doing:

  • Serve as a technical advisor for the design, build-out, and optimization of university-level research computing infrastructures that include GPU-accelerated scientific workflows.

  • Work with university research computing teams to optimize hardware utilization with software orchestration tools such as NVIDIA Base Command, Kubernetes, Slurm, and Jupyter notebook environments.

  • Implement systems monitoring and telemetry tools to help optimize resource utilization and track the most demanding application workloads at research computing centers.

  • Document what you learn. This can include building targeted training, writing whitepapers, blogs, and wiki articles, and working through hard problems with a customer on a whiteboard.

  • Provide customer requirements and feedback to product and engineering teams.

What we need to see:

  • MS or PhD in Engineering, Mathematics, Physical Sciences, or Computer Science (or equivalent experience).

  • 5+ years of relevant work experience.

  • Strong experience in designing and deploying GPU-accelerated computing infrastructure.

  • In-depth knowledge of cluster orchestration and job scheduling technologies (e.g., Slurm, Kubernetes, Ansible, and/or Open OnDemand), and experience with container tools (Docker, Singularity, Enroot/Pyxis), including at-scale deployment of containerized environments.

  • Expertise in systems monitoring, telemetry, and systems performance optimization of research computing environments. Familiarity with tools such as Prometheus, Grafana, or NVIDIA DCGM.

  • Understanding of datacenter networking technologies (InfiniBand, Ethernet, OFED) and experience with network configuration.

  • Familiarity with power and cooling systems architecture for datacenter infrastructure.

Ways to stand out from the crowd:

  • Experience in deploying LLM training and inference workflows in a research computing environment.

  • Experience working with technical computing customers in the academic research computing space.

  • Practical knowledge of high-performance parallel file systems.

  • Applications and systems-level knowledge of OpenMPI and NCCL.

  • Experience with debugging and profiling tools, e.g., Nsight Systems, Nsight Compute, Compute Sanitizer, GDB, or Valgrind.

With highly competitive salaries, a comprehensive benefits package, and an excellent engineering work culture, NVIDIA is widely considered to be one of the industry's most desirable employers.

The base salary range is 148,000 USD - 235,750 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.

You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

Similar Jobs

Playtech - Dev Ops Engineer

Playtech

London, England, United Kingdom (On-Site)
2 Months ago
Zones - Cloud Engineer

Zones

Mumbai, Maharashtra, India (On-Site)
1 Month ago
DMarket - Sr. Back-end Developer

DMarket

Ukraine (Remote)
1 Month ago
Williams-Sonoma, Inc - Sr Software Engineer (Machine Learning)

Williams-Sonoma, Inc

Pune, Maharashtra, India (On-Site)
4 Months ago
Microsoft - Software Engineer – Cloud Data Warehouse- Barcelona (Spain)

Microsoft

Barcelona, Catalonia, Spain (On-Site)
1 Month ago
Playtech - Platform Engineer

Playtech

London, England, United Kingdom (On-Site)
1 Month ago
The Walt Disney Company - Manager, Software Engineering - Ads Data Infrastructure and Devops

The Walt Disney Company

Santa Monica, California, United States (On-Site)
2 Months ago
ION - Cloud Engineer Kubernetes

ION

Italy (Hybrid)
4 Months ago
Rackspace Technology - AWS Migration Engineer

Rackspace Technology

India (Remote)
2 Days ago


Similar Skill Jobs

BSH Home Appliances India - Architect MES Foundation

BSH Home Appliances India

Bengaluru, Karnataka, India (On-Site)
3 Months ago
Go Fund Me - Staff Software Engineer (Backend)

Go Fund Me

Buenos Aires, Buenos Aires, Argentina (On-Site)
3 Months ago
PlayStation Global - Senior Full Stack Software Engineer

PlayStation Global

Madison, Wisconsin, United States (On-Site)
1 Month ago
Phoenix Labs - Senior Services Engineer - Dauntless

Phoenix Labs

Canada (Remote)
3 Weeks ago
PlayStation Global - Info Sys Engineer 3

PlayStation Global

Bellevue, Washington, United States (On-Site)
5 Months ago
Revolgy - L2 Cloud Ops Engineer

Revolgy

(Remote)
1 Week ago
DEVOTEAM - Data Driven | MLOps Engineer

DEVOTEAM

Lisbon, Lisbon, Portugal (Remote)
4 Months ago
Infoblox - Staff/Senior Data Engineer

Infoblox

Pune, Maharashtra, India (Hybrid)
3 Months ago
Velotio Technologies - Senior Engineer (ROR + NodeJS)

Velotio Technologies

Pune, Maharashtra, India (Remote)
2 Months ago
RoofStack - Senior Software Developer

RoofStack

İstanbul, İstanbul, Türkiye (On-Site)
2 Days ago


Jobs in New York, New York, United States

Onward Search - Art Director – Marketing Campaigns (Contract)

Onward Search

Cleveland, Ohio, United States (Remote)
2 Months ago
Milestone - Lead Data Engineer

Milestone

United States (Remote)
1 Day ago
Google - Software Engineering Manager II, Infrastructure, Google Cloud

Google

Durham, North Carolina, United States (On-Site)
1 Month ago
NVIDIA - Senior Systems Software Engineer, CUDA Driver

NVIDIA

Santa Clara, California, United States (Remote)
1 Week ago
Bonfire Studios - Systems Engineer (Senior/Principal/Lead)

Bonfire Studios

California, United States (On-Site)
7 Months ago
Penumbra - Human Resources Assistant

Penumbra

Roseville, California, United States (On-Site)
2 Months ago
Twitch - Product Manager - Content Moderation

Twitch

San Francisco, California, United States (Remote)
5 Months ago
The Walt Disney Company - National Geographic Yellow Border Production Services Intern, Summer 2025

The Walt Disney Company

Washington, District Of Columbia, United States (On-Site)
2 Days ago
The Walt Disney Company - Youth Activities Counselor (Japanese Speaking)

The Walt Disney Company

Kapolei, Hawaii, United States (On-Site)
2 Weeks ago
Zoox - Systems Engineer - Functional Safety

Zoox

Foster City, California, United States (Hybrid)
4 Months ago


DevOps Jobs

Microsoft - Cambridge Internship in ML Model Optimization

Microsoft

Cambridge, England, United Kingdom (On-Site)
3 Weeks ago
Warner Bros Discovery - Sr. Manager, Integrations

Warner Bros Discovery

Mexico City, Mexico City, Mexico (On-Site)
2 Months ago
The Walt Disney Company - Lead Software Engineer

The Walt Disney Company

Burbank, California, United States (On-Site)
1 Month ago
Larian Studios - DEVOPS BUILD ENGINEER

Larian Studios

Quebec, Canada (On-Site)
1 Month ago
Unity - Senior Site Reliability Developer

Unity

Montreal, Quebec, Canada (On-Site)
5 Months ago
PwC - ETIC, OCI Technical Support Engineer - Senior Associate

PwC

Cairo, Cairo Governorate, Egypt (On-Site)
3 Months ago
Zeta - Site Reliability Engineer I (Payzapp)

Zeta

Bengaluru, Karnataka, India (On-Site)
4 Months ago
Axon - Manager, Site Reliability Engineering

Axon

Canada (Remote)
3 Days ago
Microsoft - Senior DPU Software Engineer

Microsoft

Bengaluru, Karnataka, India (On-Site)
1 Month ago
Zeta - Sr. Site Reliability Engineer

Zeta

Bengaluru, Karnataka, India (On-Site)
4 Months ago


About The Company

Since its founding in 1993, NVIDIA (NASDAQ: NVDA) has been a pioneer in accelerated computing. The company’s invention of the GPU in 1999 sparked the growth of the PC gaming market, redefined computer graphics, ignited the era of modern AI and is fueling the creation of the metaverse. NVIDIA is now a full-stack computing company with data-center-scale offerings that are reshaping industry.

