You'll lead efforts to instrument and monitor our production environment with deep visibility and proactive issue detection. This includes tracking Core Web Vitals, feature KPIs, funnel conversions, API responsiveness, and broader traffic shifts. Your work will empower WM.com to measure daily change with precision and respond to anomalies before customers are impacted.
Key Responsibilities
Architect and maintain robust monitoring frameworks using LogRocket, Datadog, AppDynamics, LaunchDarkly, BrowserStack, and more
Define and track performance indicators such as Core Web Vitals, feature-specific KPIs, and system throughput metrics
Quickly identify, analyze, and escalate production issues with full operational context
Build automated alerting and escalation systems to streamline support responses
Recommend and implement new observability tools to enhance coverage and reduce blind spots
Partner with engineering and support teams to develop best-in-class incident response playbooks
Qualifications
3+ years of experience in Site Reliability Engineering, DevOps, or Infrastructure roles
Hands-on expertise with modern observability platforms and cloud ecosystems
Strong troubleshooting and root cause analysis skills across distributed systems
Passion for clean instrumentation, operational excellence, and building resilient platforms
Bonus: Experience with high-traffic consumer-facing platforms or with Kubernetes/Docker setups
Job Classification
Industry: IT Services & Consulting
Functional Area / Department: Other
Role Category: Other
Role: Other
Employment Type: Full time