Seeking a strong, resourceful and committed support engineer to maintain our AWS-based production system and services and keep them in proper working order. The candidate must have a proven record of supporting AWS-based backend systems by setting up monitoring and alerting and performing a first round of troubleshooting as needed. The candidate must have hands-on experience analyzing large data sets to pinpoint potential production issues. The candidate must demonstrate and expand the use of best practices for AWS cost control and production efficiency.
Responsibilities
- Monitor, maintain and support the team’s production assets in AWS, including Collections and Crediting
- Perform reviews of our systems, track updates, and generally ensure the health and functional order of all our stage and production assets in AWS
- As this is a large system processing large amounts of data 24x7 with daily downstream delivery contracts, ensure such daily delivery commitments are met per SLAs with NO delay or miss
- Automate any manual processes or introduce new processes and best practices to ensure the proper functional order of our AWS-based production environment
- Execute post-deployment impact analysis and checks following releases to ensure no quality escape
- Perform data analysis at the request of both our own team’s senior members and external stakeholders. Such analysis will involve collecting, combining and joining multiple data sets from multiple tables coming from our databases or MDL (Media Data Lake) environments and highlighting any anomalies or trend breaks
- Engage appropriate members of the team (leaders, developers) when alerts are reported
- Perform a first level of troubleshooting and analysis on problems, assisting and supporting further in-depth analysis by the developers themselves
- Monitor AWS systems performance and advise of any necessary infrastructure changes
- Be available at any time an alert in production is reported, assess severity and engage the team as needed
- Document processes and procedures in RunBooks or any other type of documentation (for support and audit purposes)
- Keep and maintain good records and logs of releases, upgrades, issues and actions
Key skills
- At least 2-3 years of professional, hands-on experience in supporting and maintaining large AWS systems and services, ensuring quality and proper working order
- Working knowledge of the AWS ecosystem and services
- SQL programming for writing queries for data checks and analysis
- Knowledge of Python programming
- Knowledge of AWS security best practices, including IAM roles and security groups
- Knowledge of Unix and Windows environments
- Very good knowledge of AWS Monitoring, Alerting and Automation concepts and tools. Experience in both setting up and using such tools on an ongoing basis
- Any prior experience in designing datasets and visualizations with tools such as Superset or Grafana is a definite PLUS
- Resourceful, self-starter, proactive and a team-player. Able to quickly assess possible problems and take quick decisions to protect our environment and our data
- Detail oriented, problem solver
- Effective communication; ability to describe and explain potential or existing issues and problems efficiently and accurately. Very good writing skills
- Experience working in an Agile