Principal AI Platform Architect

23 Minutes ago • 10+ Years • DevOps

About the job

Job Description

Microsoft seeks a Principal AI Platform Architect to lead the design and architecture of next-generation AI supercomputers and platforms within the Azure ecosystem. This role demands expertise in AI platform architecture, rack-scale server design, and collaboration with diverse engineering teams (software, electrical, mechanical, thermal). Responsibilities include driving architectural concepts, partnering with silicon development organizations, and conducting trade-off studies considering TCO, performance, and power efficiency. The ideal candidate will possess deep experience in deploying AI/GPU systems at scale and have a strong understanding of AI workloads and hardware impact on performance.
Must have:
  • 10+ years AI platform architecture experience
  • Expertise in co-designing with datacenter/server teams
  • Ability to articulate architectural tradeoffs
  • Drive technology partners towards optimal solutions
Good to have:
  • Experience deploying AI systems at scale in cloud environments
  • Knowledge of AI training/inference workloads
  • TCO analysis expertise
  • PCBA design experience
Perks:
  • Industry-leading healthcare
  • Educational resources
  • Product and service discounts
  • Savings and investment programs
  • Maternity/paternity leave
  • Generous time away
  • Giving programs
  • Networking opportunities

Overview

The Azure Platform Architecture team is at the forefront of technology and system design, leading the way for the next generation of systems and AI supercomputers. Our mission is to architect the most performant, secure, reliable, and cost- and power-optimized solutions that are deployed and managed at hyperscale and power Azure. Leading the AI platform architecture for these systems, which power one of the largest hardware deployments on earth, requires deep technical knowledge and partnership across many teams. This individual will act as the subject matter expert and platform architect for Microsoft's internal artificial intelligence (AI) accelerator family of products, helping articulate and define our next-generation platforms. This requires working across multiple domains, including product, software, electrical, mechanical, thermal, performance, and deployment, to find the right solution trade-offs.

We are looking for a Principal AI Platform Architect to join the team. 

Our team is part of a broader hardware and infrastructure organization known as Silicon, Cloud Hardware, and Infrastructure Engineering (SCHIE). SCHIE is the team behind Microsoft’s expanding Cloud Infrastructure and is responsible for powering Microsoft’s “Intelligent Cloud” mission. SCHIE delivers the core infrastructure and foundational technologies for Microsoft’s 200+ online businesses, including Teams, OneDrive, Office 365, Xbox Live, Skype, Bing, MSN, and the Microsoft Azure platform globally.

We architect and design the server and data center infrastructure, security and compliance, operations, globalization, and manageability solutions to support these businesses. Our focus is on smart growth, high efficiency, and delivering a trusted experience to customers and partners worldwide. As Microsoft's cloud business continues to grow, the ability to deploy new offerings and hardware infrastructure on time, at hyperscale, with high reliability and the best performance/price level is paramount.

Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.  

In alignment with our Microsoft values, we are committed to cultivating an inclusive work environment for all employees to positively impact our culture every day.

Qualifications

Required Qualifications:  

  • 10+ years of technical engineering experience
      o OR Bachelor's degree in Electrical Engineering, Computer Engineering, Mechanical Engineering, or related field AND 8+ years of technical engineering experience
      o OR Master's degree in Electrical Engineering, Computer Engineering, Mechanical Engineering, or related field AND 5+ years of technical engineering experience
      o OR Doctorate degree in Electrical Engineering, Computer Engineering, Mechanical Engineering, or related field AND 4+ years of technical engineering experience.

  • 10+ years of demonstrated expertise in AI platform and/or rack-scale server architecture and design.
  • 10+ years of demonstrated expertise in co-designing with datacenter, server, silicon, firmware/software orchestration, and manufacturing engineering organizations.

 

Other Requirements:  

The ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings: Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.

 

Preferred Qualifications: 

  • Experience deploying AI or GPU systems at scale within a cloud service provider or hyper-scale company. 
  • Knowledge of AI training and inference workloads, and understanding of how hardware impacts AI performance, operations, and efficiencies. 
  • Ability to analyze AI system concepts from a total cost of ownership (TCO), performance per TCO, and performance per watt perspective, including understanding system constraints that drive design tradeoffs. 
  • Expertise in conducting tradeoff studies for electrical, mechanical, thermal, and hardware systems. 
  • Experience in PCBA (Printed Circuit Board Assembly) design, including schematic creation, layout, routing, power, and signal integrity. 

 

Hardware Engineering IC5 - The typical base pay range for this role across the U.S. is USD $137,600 - $267,000 per year. A different range applies to specific work locations within the San Francisco Bay Area and New York City metropolitan area; the base pay range for this role in those locations is USD $180,400 - $294,000 per year. Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here:

 

Microsoft will accept applications for the role until Jan 11th, 2025. 

 

 

#AfroTech2024 

Responsibilities

  • Drive platform, rack, and datacenter-level architectural concepts and definition for Microsoft AI system products.  
  • Build relationships with our internal silicon development organizations, technology, and development partners to drive leading-edge innovation into our next-generation products.
  • Partner across Microsoft teams and collaborate to deliver industry leading products.  
  • Distill and articulate architectural tradeoffs encompassing electrical, signal integrity, mechanical, power, and thermal inputs in terms of key metrics such as total cost of ownership (TCO), performance, power efficiency, schedule, and risk.
  • Drive and influence technology providers and design partners towards optimal components and solutions to meet the future requirements for Azure’s infrastructure. 
  • Embody our and  
Benefits/perks listed below may vary depending on the nature of your employment with Microsoft and the country where you work.
  • Industry leading healthcare
  • Educational resources
  • Discounts on products and services
  • Savings and investments
  • Maternity and paternity leave
  • Generous time away
  • Giving programs
  • Opportunities to network and connect
$137.6K - $294.0K/yr (Outscal est.)
$215.8K/yr avg.
Redmond, Washington, United States

About The Company

Microsoft is a tech giant that develops, licenses, and supports a range of software products, services, and devices.
