Research Intern - LLM Inference Acceleration and Optimization

1 Month ago • Up to 1 Year • $78,600 PA - $154,560 PA

Job Summary

Job Description

This Research Internship at Microsoft's AIFX team focuses on accelerating and optimizing Large Language Model (LLM) inference. Interns will investigate and implement cutting-edge techniques like quantized KV-caches, flash/paged/radix attention, speculative decoding, and advanced collective communication on GPUs. The work involves leveraging state-of-the-art approaches like "You only cache once (YOCO)" to improve LLM serving efficiency at scale. The internship includes exploring, implementing, optimizing, and potentially publishing research findings related to real-world production workloads. Collaboration with Microsoft teams and contributions to open-source projects like vLLM, SGLang, and HuggingFace are key aspects of this role.
Must have:
  • Accepted into or currently enrolled in a PhD program in CS or a related STEM field
  • 6+ months LLM training/inference experience
  • Experience with LLMs like Llama and Phi
Good to have:
  • Experience with large-scale GPU communication
  • AI framework benchmarking experience (PyTorch, vLLM, SGLang)
  • Ability to convert research ideas into code
  • Strong interpersonal skills
  • Open to fast iteration and ambitious ideas
Perks:
  • Industry leading healthcare
  • Educational resources
  • Product and service discounts
  • Savings and investments
  • Maternity/paternity leave
  • Generous time away
  • Giving programs
  • Networking opportunities

Job Details

Overview

Research Internships at Microsoft provide a dynamic environment for research careers with a network of world-class research labs led by globally-recognized scientists and engineers, who pursue innovation in a range of scientific and technical disciplines to help solve complex challenges in diverse fields, including computing, healthcare, economics, and the environment.

If you are excited about investigating and implementing cutting-edge large language model (LLM) inference techniques and optimizations like quantized KV-caches, flash/paged/radix attention, speculative decoding, and advanced collective communication on graphics processing units (GPUs), come join the AIFX team at Microsoft Azure and contribute to a production-focused, planetary-scale LLM serving stack that is being built on top of excellent open-source efforts like vLLM, SGLang, and HuggingFace. The work includes investigation of cutting-edge, state-of-the-art approaches like "You only cache once (YOCO)" and leveraging them to save memory and compute for serving LLMs at scale. You will get a chance to explore, implement, optimize, and publish your research ideas in collaboration with teams at Microsoft working on real-world production workloads at an unprecedented scale.

Qualifications

Required Qualifications

  • Accepted into or currently enrolled in a PhD program in Computer Science or a related STEM field.
  • At least 6 months of experience with training and/or inference of recent LLMs like Llama and Phi.

Other Requirements

  • Research Interns are expected to be physically located in their manager’s Microsoft worksite location for the duration of their internship.
  • In addition to the qualifications below, you’ll need to submit a minimum of two reference letters for this position, as well as a cover letter and any relevant work or research samples. After you submit your application, a request for letters may be sent to your list of references on your behalf. Note that reference letters cannot be requested until after you have submitted your application, and that they might not be requested automatically for all candidates. You may wish to alert your letter writers in advance so they will be ready to submit your letter.

Preferred Qualifications

  • Experience with large-scale collective communication on GPUs.
  • Experience with performance benchmarking of AI frameworks like PyTorch, vLLM, and/or SGLang.
  • Ability to convert research ideas into working code that runs and scales on real systems.
  • Strong interpersonal skills and a growth mindset.
  • Open to failing fast in pursuit of ambitious ideas.

The base pay range for this internship is USD $6,550 - $12,880 per month. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $8,480 - $13,920 per month.

 

Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here: 

Microsoft accepts applications and processes offers for these roles on an ongoing basis.

Responsibilities

Research Interns put inquiry and theory into practice. Alongside fellow doctoral candidates and some of the world’s best researchers, Research Interns learn, collaborate, and network for life. Research Interns not only advance their own careers, but they also contribute to exciting research and development strides. During the 12-week internship, Research Interns are paired with mentors and expected to collaborate with other Research Interns and researchers, present findings, and contribute to the vibrant life of the community. Research internships are available in all areas of research, and are offered year-round, though they typically begin in the summer.

Benefits/perks listed below may vary depending on the nature of your employment with Microsoft and the country where you work.
  • Industry leading healthcare
  • Educational resources
  • Discounts on products and services
  • Savings and investments
  • Maternity and paternity leave
  • Generous time away
  • Giving programs
  • Opportunities to network and connect

About The Company

Microsoft is a tech giant that develops, licenses, and supports a range of software products, services, and devices.
