Your Career
We're seeking an experienced, hands-on Cloud SRE manager to lead high-severity incident and problem management across our GCP-centric platforms. The role combines deep technical troubleshooting with process ownership, ensuring rapid recovery, elimination of root causes, and long-term reliability improvements. You will own L3 on-call responsibilities, drive post-incident learning, and champion automation and operational excellence.
Implement and lead post-mortem processes within SLAs, identify root causes, and drive corrective actions to reduce repeat incidents.
More information about the Cortex product can be found
Your Impact
Your Experience

Key skills: Site Reliability Engineering, GKE, Kubernetes, Google Cloud Platform, SRE, AI, Site Reliability, SLA, Alerting, PostgreSQL, Reliability Engineering, Cassandra, GCP, DevOps, Kafka, MySQL, GitLab, Terraform, Prometheus, AWS, Infrastructure as Code