Site Reliability Engineer (DC/NY/SF/Remote)
What you'll do:
Internal-facing
- Implement monitoring, logging and alerting for legacy systems we've inherited from a past team
- Provide guidance to teams as they prepare new systems for production launch
- Manage a Root Cause Analysis process used by multiple scrum teams across dozens of systems
- Help teams set up and run fire drill exercises on a quarterly cadence
- Help define an approach for bringing Chaos Engineering to the agency
External-facing
- Maintain a status page displaying service availability through a mix of automated monitoring and manual updates
- Document SRE best practices for use by other application teams throughout the agency
- Develop a training process that helps application teams new to SRE practices and cloud infrastructure build reliable, scalable, and secure applications
- Consult with application teams on an as-needed basis on how to properly configure monitoring, logging and alerting
- Consult with application teams on how to set up processes such as incident response and RCA
What we're looking for:
- At least 1 year of production on-call experience
- Previous experience maintaining a medium- or large-scale production system, especially with regard to monitoring, logging, and alerting
- Experience debugging issues across a complex system architecture
- Ability to make changes to an existing codebase, such as installing a new monitoring agent (not required, but helpful)
- Ability to perform light scripting, such as writing a simple Bash/Python/Ruby/Go script (not required, but helpful)
- Experience configuring Selenium scripts, New Relic Synthetics, or other automated functional testing
- Excellent written and verbal communication skills, technical and otherwise
- Ability to communicate complex technical topics to a range of audiences, from highly technical to non-technical
- Ability to develop clear, repeatable processes, and to produce documentation and runbooks that are accessible to a range of audiences
- Experience with the following systems is a plus: AWS, Azure, New Relic, Splunk, ELK, CloudWatch
- Education requirements: Bachelor’s degree
Job Type: Client Payroll
Positions: DevOps Engineer
Must-have skills:
- Languages: English (fluent)
Up to 450 USD/Hour
Up to 450K USD/year (annual salary)
Long-term (duration)
Fully Remote
Tracey J