Site Reliability Engineer | Core Platform

21 Minutes ago • All levels • DevOps

Job Summary

Job Description

As a Site Reliability Engineer (SRE) on the Observability team at King, you'll engineer and manage monitoring and observability environments for its vast gaming platform, which processes over 100 billion events daily. Responsibilities include building and maintaining the metrics, logs, and tracing stack; creating developer-focused monitoring tools and dashboards; driving automation for alerting and anomaly detection; working on scalable, high-throughput systems; and improving incident response workflows. You'll collaborate closely with developers to create self-service solutions and ensure platform reliability and efficiency. Experience with observability tools (Prometheus, Loki, OpenTelemetry), distributed systems, and scripting languages (Python, Go) is crucial.
Must have:
  • Strong software development background
  • Experience with observability tools
  • Understanding of distributed systems
  • Collaboration with developers
  • Kubernetes and cloud familiarity
  • Linux performance debugging
  • Problem-solving skills
Perks:
  • Autonomy & Ownership
  • Collaboration
  • Continuous Learning
  • Impactful work

Job Details

Craft:

Technology & Development

Job Description:

At King, millions of players connect to our games every day and expect to continue playing from where they left off. All this user and game progression data is stored in our infrastructure. We are looking for someone eager to help us engineer and manage the monitoring and observability environments at the heart of this ecosystem.
We believe that you share our passion for learning new things, coding, quality, automation, continuous improvement, and actively building and upholding a great culture. Above all, we would like to see that you have a genuine interest in high-performance observability.

Your role within our Kingdom

We are looking for a Site Reliability Engineer (SRE) with a strong development background to join our Observability team and help shape the future of how we monitor, debug, and optimize our platform, services, and applications at scale. Our mission is to empower developers with the right tools and insights to keep our services running smoothly, efficiently, and reliably.

As part of the Observability team, you will build and maintain our monitoring, logging, and tracing platform, working closely with developers to create self-service solutions that enhance performance, reliability, and troubleshooting capabilities. Our platform processes over 100 billion events per day, requiring innovative approaches to scalability, automation, and efficiency.

We care deeply about our culture and believe in:
● An inclusive and diverse workplace
● Continuous improvement of everything we do
● Automation and coding as much as possible
● Collaboration and blame-free, respectful problem solving
● Asking for help and sharing ideas openly

What you will work on:
Observability Platform – Engineer and operate our metrics, logs, and tracing stack, ensuring it scales reliably across the organization.

Developer-Focused Monitoring – Build APIs, tools, and dashboards that give teams insight into their services with minimal friction.

Automation & Self-Service – Drive automation efforts for alerting, event correlation, and proactive anomaly detection.

Scalable Infrastructure – Work on distributed monitoring systems that handle high-throughput data ingestion and querying.

Incident Response & Troubleshooting – Improve on-call workflows, alerting systems, and root cause analysis processes.

Our Observability Stack

We use a combination of open-source and cloud-native technologies, including:

Metrics & Tracing: OpenTelemetry, Prometheus, Mimir, InfluxDB

Log Management: Loki, Elasticsearch

Alerting & Incident Response: Grafana OnCall, PagerDuty, Alertmanager

Infrastructure as Code: Terraform, Ansible

Automation & Scripting: Python, Go, Bash

Skills to create thrills

Strong software development background – Comfortable writing production-quality code in Python, Java, Go, or similar languages.

Experience with observability tools (Prometheus, Loki, OpenTelemetry, etc.).

Deep understanding of distributed systems monitoring and incident response.

Ability to collaborate with developers and drive best practices for instrumenting services.

Familiarity with Kubernetes and cloud environments (GCP/AWS/Azure).

Solid knowledge of Linux performance debugging and network troubleshooting.

Strong problem-solving skills and a proactive mindset for improving reliability.

Excellent communication skills in English (both written and spoken).

Why Join Us?

We believe in:
Autonomy & Ownership – We enable developers to self-serve monitoring solutions and own their observability needs.
Collaboration – We work closely across teams to improve reliability and troubleshoot challenges in a blame-free way.
Continuous Learning – We experiment, iterate, and improve everything we do.
Impact – Your work will directly affect how all our games and platforms operate at scale.

We think that you are a curious, humble, driven, collaborative, and responsible person who loves to work with infrastructure as code.

About King

With a mission of Making the World Playful, King is a leading interactive entertainment company with more than 20 years of history delivering some of the world’s most iconic mobile games, including the world-famous Candy Crush franchise and other hits such as Farm Heroes Saga. King games are played by more than 200 million monthly active users. King, part of Microsoft (NASDAQ: MSFT), has Kingsters in Stockholm, Malmö, London, Barcelona, Berlin, Dublin, San Francisco, New York, Los Angeles and Malta.
