Senior Site Reliability Engineer

4 Months ago • 5 Years + • DevOps • Undisclosed

About the job

Job Description

Senior Site Reliability Engineer with 5+ years of experience in cloud and on-prem SRE design and implementation. Must have expertise in infrastructure automation, distributed systems, and cloud platforms such as AWS, Azure, and GCP. Strong knowledge of monitoring, logging, and configuration management is essential.
Must have:
  • Infrastructure Automation
  • Distributed Systems
  • Cloud Platforms
  • Monitoring Concepts
Good to have:
  • Containerization Tech
  • Network Experience
  • Elasticsearch
  • Prometheus
Perks:
  • Global IT Team
  • Fast-Paced Environment

Responsibilities:

About Tencent Overseas IT:
Tencent Overseas IT has the mission to empower Tencent’s rapid global growth with future-ready, global IT platforms, applications, and services. We are chartered to lead the Overseas IT strategy, architecture, roadmap, and execution. Satisfying our internal/external customers and becoming a world-class global IT team are our top aspirations.


We are seeking a Sr. Site Reliability Engineer with extensive cloud and on-prem SRE design and implementation experience.

Duties and Responsibilities:
This senior role will work closely with our internal IT and cloud providers to design the best global SRE architecture and solution in the cloud. This role will also support the studio’s infrastructure, the game publishing infrastructure, and their evolution to the cloud. Our customers include internal or acquired gaming studios, game publishing services, innovative offices/workplaces, various business groups, and external customers. The work scope includes understanding the internal customers’ business requirements, collecting the technical requirements, developing reference architectures and prototypes based on industry best practices, leading implementation and deployment for global locations, and troubleshooting issues when necessary.

For this SRE job, you will:
• Design, implement, and support the operations and reliability of large-scale cloud-enabled studios, with a focus on performance at scale, real-time monitoring, logging, analysis, and alerting.
• Maintain services once they go live by measuring and monitoring availability, latency, and overall system health.
• Design and develop robust and scalable products and tools to enhance operational efficiency.
• Scale systems sustainably through mechanisms like automation and evolve systems by pushing for changes that improve reliability and velocity.
• Participate in incident response and troubleshooting efforts to minimize downtime and ensure system reliability.
• Maintain project and product documentation and knowledge.
• Be part of an on-call rotation to support production systems (if needed).


Based in Shanghai, China, this person will work closely with the global IT team and HQ teams.

Whom we are looking for:

  • A quick learner
  • A positive, self-motivated, and passionate person
  • Independent, persistent, and open-minded
  • A great team player who is both dependable and autonomous
  • Customer-oriented and able to work at a very fast pace

Requirements:

  • 5+ years of experience with infrastructure automation and distributed systems design, including designing and developing tools for running large-scale private or public cloud systems in production
  • In-depth knowledge and understanding of monitoring concepts, alert mechanisms, log monitoring, anomaly detection, and the creation and setup of dashboards
  • In-depth knowledge of and experience with Elasticsearch and Prometheus
  • Expertise in configuration management with frameworks such as Ansible, Terraform, or Helm
  • Proficiency with programming languages such as Python and Golang, plus shell scripting, to automate tasks
  • Passion for infrastructure and monitoring as code
  • Bachelor’s degree (or higher) in Computer Science, Mathematics, or a related science or engineering major
  • Solid understanding of cloud platforms (e.g., AWS, Azure, GCP) and containerization technologies (e.g., Docker, Kubernetes)
  • Good understanding of and hands-on experience with networking is a plus
  • Bilingual preferred (English, Chinese)

About The Company

Tencent is a world-leading internet and technology company that develops innovative products and services to improve the quality of life of people around the world.


Founded in 1998 with its headquarters in Shenzhen, China, Tencent's guiding principle is to use technology for good. Our communication and social services connect more than one billion people around the world, helping them to keep in touch with friends and family, access transportation, pay for daily necessities, and even be entertained.


Tencent also publishes some of the world's most popular video games and other high-quality digital content, enriching interactive entertainment experiences for people around the globe.


Tencent also offers a range of services such as cloud computing, advertising, FinTech, and other enterprise services to support our clients' digital transformation and business growth.


Tencent has been listed on the Stock Exchange of Hong Kong since 2004.
