About the job
Summary by Outscal
Bungie seeks a Data Reliability Engineer to design, deploy, and maintain highly available data infrastructure, including Kafka, RabbitMQ, Redis, Elasticsearch, and Graphite. You'll troubleshoot issues, ensure data security, and collaborate with engineering teams on projects and services. Must have experience with Linux, infrastructure automation, and distributed production environments.
Data Reliability Engineering at Bungie is a core team of the Central Tech area that keeps our games and tooling running at scale. Our team owns the overall scalability, observability, and resilience of the databases, data processing platforms, and in-memory key-value stores used throughout the Bungie ecosystem. We partner with our engineering teams and business units on projects, services, designs, and processes. We are the stewards of architecture and provide tools and services to enable engineering teams to meet their design requirements.
RESPONSIBILITIES
- Design, deploy, and maintain highly available and scalable data infrastructure components including Kafka, RabbitMQ, Redis, Elasticsearch, and Graphite
- Perform capacity planning and scalability assessments for data platforms
- Troubleshoot and resolve issues related to data processing pipelines, message queuing, and performance, including participation in an on-call rotation
- Ensure data security, integrity, and compliance with industry best practices and regulatory requirements
- Document system configurations, procedures, and operational knowledge
- Advise service owners on industry and company standards and best practices
- Maintain reliability and performance levels for core data platform infrastructure
- Define and implement a data observability strategy
- Define and document a data ownership strategy
REQUIRED SKILLS
- Strong understanding of Linux operating systems and their administration
- Strong communication skills and the ability to collaborate effectively in a team environment
- Experience with infrastructure automation and configuration management (e.g., Ansible, Terraform…)
- Excellent troubleshooting skills and the ability to analyze and resolve complex infrastructure resource and application deployment issues
- Experience working in a distributed production environment
- Deep understanding of cluster management areas, such as scaling, consistency tuning, replication, and multi-datacenter configuration
- Familiarity with time-series monitoring systems and tools (e.g., Datadog, Prometheus, Grafana, and ELK)
- Experience designing and implementing logging and metric pipelines