Senior Site Reliability Engineering Manager

1 Month ago • 6-7 Years • Network Engineering • $117,200 PA - $250,200 PA

Job Summary

Job Description

The Senior Site Reliability Engineering Manager at Azure Storage will lead a team optimizing fleet availability and health for one of the world's largest storage services. Responsibilities include designing, developing, and improving automation and uptime; investigating complex issues at scale; and planning solutions to maximize efficiency. This role requires strong leadership in Agile/SCRUM, incident response, and cross-team collaboration. Significant impact on cost reduction and high-level visibility are key aspects. The position involves developing, testing, and implementing code changes for scalability, troubleshooting hardware and system issues, and understanding long-term organizational goals. The role includes on-call rotations and post-mortem reporting.
Must have:
  • 6+ years experience in relevant field
  • 4+ years in Agile/SCRUM leadership
  • Expertise in distributed systems
  • Problem-solving and investigation skills
  • Develop, test, and implement code changes
  • Incident response and post-mortem reporting
Good to have:
  • Understanding of server architecture
  • Familiarity with server components, firmware, BIOS
  • Understanding management techniques and scope control
Perks:
  • Industry-leading healthcare
  • Educational resources
  • Product and service discounts
  • Savings and investments
  • Maternity and paternity leave
  • Generous time away
  • Giving programs
  • Networking opportunities

Job Details

Overview

Are you passionate about hardware and enabling new technology? Do you enjoy complex problem solving and investigation? Azure has one of the largest storage services on the planet, holding exabytes of data and files not just for our third-party customers, but also for many of Microsoft’s own services. This role will focus on managing an ever-growing and changing fleet at scale to maximize efficiency while providing a stable environment for our customers.

As a Senior Site Reliability Engineering Manager on the Azure Storage team, you will lead a team of engineers focused on optimizing fleet availability and health, and on designing, developing, and improving automation and uptime. You will take the lead in planning, investigating complex issues, and designing solutions to solve problems at scale.

This opportunity will allow you to deepen your knowledge and experience with massive distributed systems, have a significant impact on reducing cost to the business, and gain exposure and visibility at the VP and CVP levels. This position is located in Redmond and has a flexible work environment that supports working from home.

Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond. 

Qualifications

Required Qualifications:

  • 6+ years technical experience in software engineering, network engineering, or systems administration
    • OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 3+ years technical experience in software engineering, network engineering, or systems administration
    • OR Master's Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration.
  • 4+ years of experience in Agile/SCRUM planning and leading large cross-team efforts.

 

Other Requirements:

  • The ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings:
    • Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.

 

Preferred Qualifications:

  • 7+ years technical experience in software engineering, network engineering, or systems administration
    • OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 4+ years technical experience in software engineering, network engineering, or systems administration
    • OR Master's Degree in Computer Science, Information Technology, or related field AND 3+ years technical experience in software engineering, network engineering, or systems administration.
  • Understanding of server architecture and the ability to debug and troubleshoot issues impacting the fleet.
  • Understanding of server components, firmware, and BIOS, and how they interact.
  • Understanding of management techniques and methods for ensuring scope control.
  • Familiarity with distributed systems. 

 

Site Reliability Engineering M4 - The typical base pay range for this role across the U.S. is USD $117,200 - $229,200 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $153,600 - $250,200 per year.

Certain roles may be eligible for benefits and other compensation.


Microsoft will accept applications for the role until September 9, 2024.

 

 


Responsibilities

  • Develop, test, and implement changes to optimize code and improve scalability. You leverage end-to-end technical expertise and telemetry analysis to identify patterns and opportunities to implement configuration and automation improvements. You review the effect of changes to documents and share development insights within your team.
  • Drive Sprint planning, SCRUM stand-ups, and code/design reviews, and host regular cross-team and cross-org meetings.
  • Investigate hardware and system issues that impact available capacity and customers.
  • Understand the long-term goals of the organization and the steps your team will have to take to achieve them.
  • Respond to incidents during regular on-call rotations and share details of incidents and their resolution through post-mortem reports and regular review meetings. As a member of the team, you will be expected to help drive bridges for recovery during major outages.
  • Embody our culture and values.

Benefits/perks listed below may vary depending on the nature of your employment with Microsoft and the country where you work.
  • Industry-leading healthcare
  • Educational resources
  • Discounts on products and services
  • Savings and investments
  • Maternity and paternity leave
  • Generous time away
  • Giving programs
  • Opportunities to network and connect

About The Company

Microsoft is a tech giant that develops, licenses, and supports a range of software products, services, and devices.
