Site Reliability Engineer
Your Company
Full-time, remote, $140k to $200k
About the role
As a Site Reliability Engineer, you'll ensure our platform is fast, reliable, and available for customers around the clock. You'll build the tools and systems that keep our infrastructure running smoothly while driving a culture of operational excellence.
Responsibilities
- Define and maintain SLOs, SLIs, and error budgets for critical services
- Build and improve observability with monitoring, alerting, and dashboards
- Automate operational tasks and incident response procedures
- Lead incident management and conduct blameless postmortems
- Optimize system performance, reliability, and cost efficiency
- Collaborate with engineering teams on scalability and resilience
- Manage on-call rotations and escalation procedures
- Drive adoption of SRE best practices across the organization
Requirements
- 5+ years of SRE, DevOps, or systems engineering experience
- Strong proficiency in Python, Go, or similar languages
- Deep experience with cloud infrastructure (AWS, GCP, or Azure)
- Expertise with container orchestration (Kubernetes) and infrastructure as code (Terraform)
- Knowledge of monitoring tools such as Datadog, Prometheus, or Grafana
- Understanding of distributed systems and failure modes
- Experience with incident management and postmortem processes
Nice to have
- Experience with chaos engineering practices
- Knowledge of FinOps and cloud cost optimization
- Background in capacity planning and load testing
- Contributions to open-source infrastructure tools
Benefits
- Fully remote with on-call compensation
- Competitive salary and equity
- Learning and certification budget
- Home office stipend
- Comprehensive health benefits
About your company
Add a compelling description of your company, culture, mission, and values. This section helps candidates understand why they should work with you.