
Site Reliability Engineer (DC/NY/SF/Remote)

What you'll do:


Internal-facing


  • Implement monitoring, logging, and alerting for legacy systems we've inherited from a previous team

  • Provide guidance to teams as they prepare new systems for production launch

  • Manage a Root Cause Analysis process used by multiple scrum teams across dozens of systems

  • Help teams set up and run quarterly fire drill exercises

  • Help define an approach for bringing Chaos Engineering to the agency


External-facing


  • Maintain a status page that displays service availability through a mix of automated monitoring and manual updates

  • Document SRE best practices for use by other application teams throughout the agency

  • Develop a training process that helps application teams new to SRE practices and cloud infrastructure build reliable, scalable, and secure applications

  • Consult with application teams, as needed, on how to properly configure monitoring, logging, and alerting

  • Consult with application teams on how to set up processes such as incident response and root cause analysis (RCA)



What we're looking for:


  • At least 1 year of production on-call experience

  • Previous experience maintaining a medium- or large-scale production system, especially with regard to monitoring, logging, and alerting

  • Experience debugging issues across a complex system architecture

  • Ability to make changes to an existing codebase, such as installing a new monitoring agent (not required, but helpful)

  • Ability to perform light scripting, such as writing a simple Bash, Python, Ruby, or Go script (not required, but helpful)

  • Experience configuring Selenium scripts, New Relic Synthetics, or other automated functional tests

  • Excellent written and verbal communication skills, technical and otherwise

  • Ability to communicate complex technical topics to a range of audiences, from highly technical to non-technical

  • Ability to develop clear, repeatable processes and to produce documentation and runbooks that are accessible to a range of audiences

  • Experience with the following systems is a plus: AWS, Azure, New Relic, Splunk, ELK, CloudWatch

  • Education requirements: Bachelor’s degree

Job Type

Client Payroll


Position

DevOps Engineer


Must-Have Skills

  • Shell Scripting

    Beginner

  • Python

    Beginner

  • Selenium

    Beginner

  • AWS

    Beginner


Languages

English - Fluent

Up to 450K/year USD (annual salary)

Long-term (Duration)

Fully Remote

Active a month ago

Tracey J | United States