DevOps / Site Reliability Engineer (SRE)
Hasura Cloud is a unique GraphQL product that reduces the effort of building backends for applications. Our customers can use Hasura Cloud to generate a fully featured, unified GraphQL API connected to several databases and other REST/GraphQL APIs. DevOps Engineers and Site Reliability Engineers (SREs) are responsible for keeping Hasura Cloud systems running smoothly and making sure updates can be rolled out reliably without any downtime.
- Build out our infrastructure with Terraform, Kubernetes, VMs and bare metal instances.
- Design, build and maintain the core infrastructure pieces that allow Hasura Cloud to scale to thousands of concurrent requests from our users.
- Expand Hasura Cloud to support multiple cloud providers.
- Improve the deployment process to make it as reliable and boring as possible.
- Be on a PagerDuty rotation to respond to Hasura Cloud availability incidents and to support service engineers during customer incidents.
- Use your dev time to address the systemic issues you’ve identified, to proactively prevent incidents from happening.
- Design smart monitoring that alerts on symptoms (our SLIs) rather than on causes, to make each alert meaningful and actionable.
- Document every action so your findings turn into repeatable actions, and then into automation.
- Debug production issues across services and levels of the stack.
- Plan the growth of Hasura Cloud's infrastructure.
You may be a fit for this role if you:
- Think about systems: edge cases, failure modes, behaviors, specific implementations.
- Know your way around Linux and the Unix shell.
- Know how to use declarative infrastructure tools like Terraform.
- Have strong programming skills (Go/Python).
- Value asynchronous collaboration and communication with your globally distributed team.
- Enjoy documenting all the things so you don't need to learn the same thing twice.
- Have an urge to build automation and tooling so that you never have to do the same work twice.
- Have an enthusiastic, go-for-it attitude. When you see something broken, you can't help but fix it.
- Have experience with Nginx, OpenResty, Docker, Kubernetes, Terraform, or similar technologies.
- Have experience with cloud providers such as AWS, GCP, Azure and DigitalOcean, and with their systems, products and APIs.
- Have experience with monitoring tools like Honeycomb/Datadog/Prometheus/Grafana.
Bonus points if you:
- Have experience with Hasura and its GraphQL APIs.
- Have strong fundamentals in SQL, particularly with PostgreSQL.
- Have experience with database management and scaling.
Must-have skills
New Delhi [UTC +5:30]
English - Fluent