Site Reliability Engineer

The Opportunity:

Our Private SaaS offering has grown significantly over the past year, and we now orchestrate and monitor our event pipelines across more than 150 customer-owned AWS & GCP sub-accounts. Each account has its own individualised, optimised stack, and each is capable of processing many billions of events per month.

We are looking for another SRE to help us scale to managing 1,000 and then 10,000 AWS, GCP & Azure accounts. You will pioneer ways of managing estates of this size through cutting-edge monitoring and automation. You’ll work closely with our Tech Ops Lead on all aspects of our proprietary deployment, orchestration and monitoring stacks.

Tech Ops has two areas of responsibility: the centralised services we provide to customers, and the pipeline infrastructure hosted in their own AWS or GCP accounts. In both domains we are striving to increase service reliability, fulfil customer requests in a timely fashion, and automate recurring tasks. Task automation becomes essential as our customer base grows because, unlike at most software businesses, our infrastructure estate scales linearly with our number of customers.

Automating the maintenance and deployment of thousands of individualised stacks is an enormously ambitious undertaking and a hugely exciting infrastructure automation challenge!

The environment you’ll be working in:

Our company values are Transparency, Honesty, Ownership, Inclusivity, Empowerment, Customer-centricity, Growth and Technical Excellence. These aren’t just words we plucked out of thin air: we came up with them together as a company, and we are continually looking for new ways to weave them into our day-to-day operations. From flexible hours and working locations to the way we give feedback, we’re passionate about building a company that supports both company and individual development.

What you’ll be doing:

- Maintaining and developing our growing Terraform infrastructure-as-code stacks, which we use to deploy infrastructure for all internal and client use cases

- Maintaining our internal infrastructure stacks, which include the suite as well as our Insights UI and VPNs

- Participating in our on-call rotation to help us serve our client base 24/7

- Taking rotations of L3 Technical Support, where you will be responsible for triaging and dealing with infrastructure issues

- Handling high-severity internal or customer incidents, ensuring we meet all SLAs

What you bring to the team:

- Has worked with AWS in a production capacity - experience in GCP and/or Azure is a bonus

- Has worked with Terraform, CloudFormation or some form of infrastructure-as-code tooling

- Experience with the HashiCorp stack (Vault, Consul, Nomad) and an understanding of each tool’s role in infrastructure automation is a bonus

- Has worked with Docker and is familiar with container-based architectures

- Knowledgeable about the Linux operating system and how to manage servers in a production capacity

- Knowledgeable about cloud networking principles and how to troubleshoot issues in this space

- Comfortable scripting in one or more of: Bash, Python, Ruby or Perl

- Comfortable programming in one or more of: Java, Scala, Golang or Python

Position

DevOps Engineer

Must have Skills

- AWS: Beginner
- Docker: Beginner
- Terraform: Beginner
- Linux: Beginner


Job Type

Client Payroll

Languages

English: Basic

Up to 200 K/Year USD (Annual salary)

Long-term (Duration)

Fully Remote
