Site Reliability Engineer - Product

Position: Site Reliability Engineer

Location: Pune (currently work from home; relocation to Pune is required after the pandemic)

About the Organization:

A listed product development company, headquartered in Singapore with offices in Australia, the United States, Germany, the United Kingdom, and India. You will gain work experience in a global environment.

Job Description:

We are looking for an experienced DevOps / Site Reliability Engineer to join our team and be instrumental in taking our products to the next level.

In this role, you will work on bleeding-edge hybrid cloud / on-premise infrastructure handling billions of events and terabytes of data a day.

You will be responsible for working closely with various engineering teams to design, build and maintain a globally distributed infrastructure footprint.

As part of this role, you will be responsible for researching new technologies; managing a large fleet of active services and their underlying servers; automating the deployment, monitoring and scaling of components; and optimizing the infrastructure for cost and performance.

Day-to-day Responsibilities:

Ensure the operational integrity of the global infrastructure
Design repeatable continuous integration and delivery systems
Test and measure new methods, applications and frameworks
Analyze and leverage AWS-native functionality
Support and build out an on-premise data center footprint
Support other teams and diagnose issues related to our infrastructure
Participate in a 24/7 on-call rotation (if required)

Candidate Profile:

Expert-level administrator of Linux-based systems
Experience managing distributed data platforms (Kafka, Spark, Cassandra, etc.); Aerospike experience is a plus
Experience with production deployments of Kubernetes clusters
Experience automating the provisioning and management of hybrid-cloud infrastructure (AWS, GCP and on-premise) at scale
Knowledge of monitoring platforms (Prometheus, Grafana, Graphite)
Experience with distributed storage systems such as Ceph or GlusterFS
Experience with virtualisation using KVM, oVirt and OpenStack
Hands-on experience with infrastructure-as-code and configuration management tools such as Terraform and Ansible
Bash and Python scripting expertise
Network troubleshooting experience (TCP, DNS, IPv6 and tcpdump)
Experience with continuous delivery systems (Jenkins, GitLab, Bitbucket, Docker)
Experience managing hundreds to thousands of servers globally
Enjoy automating tasks, rather than repeating them
Capable of estimating costs of various approaches, and finding simple and inexpensive solutions to complex problems
Strong verbal and written communication skills
Ability to adapt to a rapidly changing environment
Comfortable collaborating with and supporting a diverse team of engineers
Ability to troubleshoot problems in complex systems
Flexible working hours and the ability to participate in 24/7 on-call support with other team members whenever required

Note: We are looking for candidates from product organizations who can join at the earliest.

Job Type: Payroll

Positions: DevOps Engineers

Must-have Skills:

DevOps - 4 Years (Advanced)
Ansible - 2 Years (Expert)
Kubernetes - 2 Years (Advanced)
AWS Cloud - 3 Years (Expert)
Prometheus - 3 Years (Expert)
Grafana - 2 Years (Expert)

Languages:

English - Fluent

Annual Salary: USD 25K - 41K per year

Duration: Long-term

Work Arrangement: Partially remote (Pune, Maharashtra, India)

