Senior Site Reliability Engineer

4 Weeks ago • 5-10 Years

About the job

Summary by Outscal

Join our SRE team to design, implement, and maintain highly scalable and reliable systems in a cloud-native environment. Candidates must have experience with AWS, Kubernetes, and observability tools such as Prometheus and Grafana. Experience with automation tools such as Terraform and Jenkins is essential.

Why Lytx:

Join our dynamic and passionate team of driven, low-ego engineers who are at the forefront of designing and supporting cutting-edge IoT infrastructure. As we rapidly grow and transition to the cloud, we're diving into the exciting realms of "Operations as Code," "Infrastructure as Code," and innovative infrastructure automation.

Our Site Reliability Engineering (SRE) team is pivotal in ensuring the availability, reliability, observability, and resilience of Lytx's services, both on-premises and in the cloud. We're not just keeping the lights on; we're engineering the future of our business's continuity.

If you're energized by crafting transformative solutions and excel at designing robust, detailed cloud infrastructure with a focus on continuous improvement, this could be the perfect role for you!

Responsibilities:

  • System Design and Architecture: Design, implement, and maintain scalable and reliable systems, ensuring they can handle both current and future demands.
  • Incident Management: Lead incident response efforts, diagnose root causes, and implement long-term solutions to prevent recurrence. Ensure effective communication during outages.
  • Monitoring and Observability: Develop and maintain comprehensive monitoring and alerting systems to proactively identify and address issues before they impact users.
  • Automation and Efficiency: Automate repetitive tasks and processes to improve operational efficiency and reduce manual intervention.
  • Performance Tuning: Continuously optimize system performance, including fine-tuning applications, databases, and infrastructure to meet service level objectives (SLOs).
  • Capacity Planning: Forecast future system requirements based on growth trends and current usage, and plan capacity upgrades to ensure system reliability.
  • Collaboration and Mentoring: Work closely with development teams to integrate reliability into the software development lifecycle. Mentor junior SREs and share best practices.
  • Documentation and Knowledge Sharing: Create and maintain detailed documentation on system design, incident response procedures, and operational practices to ensure knowledge is preserved and accessible.

Requirements:

  • 5+ years of experience as an SRE within AWS environments at medium to large-scale organizations.
  • 3+ years of hands-on experience implementing and managing observability tools, such as Prometheus, New Relic, Grafana, or similar.
  • Advanced programming skills in Python, Groovy, and Bash.
  • Strong understanding of database technologies, including both SQL and NoSQL systems.
  • 3+ years of experience developing and managing infrastructure deployment pipelines using Git, Terraform, Helm, Jenkins/Jenkins X/ArgoCD, or similar tools.
  • Proven expertise in designing, evaluating, and supporting production environments in AWS, including VPCs, EKS, IAM, AMI, EC2, CloudWatch, CloudTrail, Control Tower, GuardDuty, MSK, S3, Glacier, Gateways, Direct Connect, Route 53, RDS, ALBs, Autoscaling, and more.
  • Hands-on experience with Linux systems and with protocols and technologies such as HTTP, REST, TCP/IP, SSL, DNS, SMTP, SSH, NTP, load balancing, SQL/NoSQL databases, message brokers, Nginx, and Vault.
  • Extensive experience with Kubernetes and various container and cloud-native technologies.
  • Significant experience in managing 24/7 on-call rotations, creating runbooks, establishing support procedures, and proactively monitoring systems across multiple geographic locations.
  • Ability to thrive under pressure and excel in a technically challenging environment.

About The Company

Karnataka, India (Hybrid)
