Senior Site Reliability Manager - REMOTE

REMOTE SRE needed for top SaaS company. Join a growing team that delivers a SaaS product used 24/7 by leading US companies!

  • Denver, CO
  • $150,000 - $200,000

A bit about us:

We are a growing team that delivers a SaaS product used 24/7 by development and site-reliability teams at leading enterprises such as Lyft and DoorDash!

Why join us?

Our Site Reliability Engineering team is growing with a laser focus on ensuring our SaaS offering operates reliably and at scale.

We are looking for someone who will play a key role in building and operating the world's best real-time data collection and visualization system, and who will work closely with developers to provide a secure, reliable, scalable application for our customers.

Benefits Summary:
  • 401(k) matching
  • Employee Stock Purchase Plan
  • Medical coverage, retirement, and parental leave plans for all family types
  • Generous time-off programs
  • 40 hours of paid time to volunteer in your community
  • Rethink's Neurodiversity program, which supports parents raising children with learning or behavior challenges or developmental disabilities
  • Financial contributions to your ongoing development (conference participation, trainings, coursework, etc.)
  • Healthy, locally inspired snacks in all our pantries when visiting an office

Job Details

This position will perform work that the U.S. government has specified can only be performed by a U.S. citizen on U.S. soil, so any offer will be contingent upon verification of both requirements. Additionally, we are looking for someone who is willing to obtain a security clearance.

  • Bring advanced experience with cloud platforms, security, Linux systems, and automation, plus strong knowledge of running workloads at scale.
  • Support thousands of cloud instances across multiple regions, and share your learnings and best practices with others.
  • Have experience working in, and enjoy being part of, a fully remote, distributed team.

Your mission:
We need someone who is passionate about security, automation, infrastructure as code, and configuration as code, and who can develop and deploy software that improves the availability, manageability, and visibility of our SaaS infrastructure.
In this role, you will take part in the SRE on-call rotation and drive improvements that continuously increase the signal-to-noise ratio of alerts. You will contribute to the security and development of tools for metrics gathering, introspection, monitoring, automated remediation, and orchestration.

  • Act as a leader on the team by mentoring others and working collaboratively with your peers and the product engineering team.
  • Demonstrate knowledge of cloud architecture security, scaling, and management principles, with experience working on AWS, GCE, or Azure cloud infrastructure.
  • Demonstrate a commitment to improving incident management, reducing Mean Time to Resolution (MTTR), and resolving each technical issue with steps that ensure it doesn't happen again.
  • Drive assigned projects to completion, being clear when tradeoffs are needed and when deadlines must be adjusted to accommodate higher-priority work.
  • Help drive the migration from AWS VMs to containers by designing, deploying, and maintaining production services on container technology such as Kubernetes (EKS) or ECS.
  • Drive reliability and feature improvements within the product by providing feedback to the product management team, acting as customer zero.
  • Collaborate remotely via Slack, Zoom, and similar tools.
Managed by Jobot Pro