Site Reliability Engineer - Product
Position: Site Reliability Engineer
Location: Pune (currently work from home; relocation to Pune required after the pandemic)
About the Organization:
A publicly listed product development company headquartered in Singapore, with offices in Australia, the United States, Germany, the United Kingdom, and India. You will gain experience working in a global environment.
We are looking for an experienced DevOps/Site Reliability Engineer to join our team and be instrumental in taking our products to the next level.
In this role, you will work on bleeding-edge hybrid cloud/on-premises infrastructure handling billions of events and terabytes of data a day.
You will be responsible for working closely with various engineering teams to design, build and maintain a globally distributed infrastructure footprint.
As part of this role, you will be responsible for researching new technologies, managing a large fleet of active services and their underlying servers, automating the deployment, monitoring, and scaling of components, and optimizing the infrastructure for cost and performance.
Responsibilities:
- Ensure the operational integrity of the global infrastructure
- Design repeatable continuous integration and delivery systems
- Test and measure new methods, applications and frameworks
- Analyze and leverage various AWS-native functionality
- Support and build out an on-premises data center footprint
- Provide support to other teams and diagnose issues related to our infrastructure
- Participate in a 24/7 on-call rotation (if required)
Requirements:
- Expert-level administrator of Linux-based systems
- Experience managing distributed data platforms (Kafka, Spark, Cassandra, etc.); Aerospike experience is a plus
- Experience with production deployments of Kubernetes clusters
- Experience automating the provisioning and management of hybrid-cloud infrastructure (AWS, GCP, and on-premises) at scale
- Knowledge of monitoring platforms (Prometheus, Grafana, Graphite)
- Experience with distributed storage systems such as Ceph or GlusterFS
- Experience with virtualisation using KVM, oVirt, and OpenStack
- Hands-on experience with infrastructure-as-code and configuration management tools such as Terraform and Ansible
- Expertise in Bash and Python scripting
- Network troubleshooting experience (TCP, DNS, IPv6 and tcpdump)
- Experience with continuous delivery systems (Jenkins, GitLab, Bitbucket, Docker)
- Experience managing hundreds to thousands of servers globally
- Enjoy automating tasks rather than repeating them
- Capable of estimating the costs of different approaches and finding simple, inexpensive solutions to complex problems
- Strong verbal and written communication skills
- Ability to adapt to a rapidly changing environment
- Comfortable collaborating and supporting a diverse team of engineers
- Ability to troubleshoot problems in complex systems
- Flexible working hours and the ability to participate in 24/7 on-call support with other team members whenever required
Note: We are looking for candidates from product organizations who can join as early as possible.
Must-Have Skills:
- English (fluent)