About Us:
Our purpose is to help clients exceed their financial health goals. Across the reimbursement cycle, our scalable solutions and clinical expertise help solve programmatic needs. Equipping our teams with leading technology allows analytics to guide our solutions and keeps us accountable for achieving goals.
We build long-term careers by investing in YOU. We seek to create an environment that cultivates your professional development and personal growth, as we believe your success is our success.
ESSENTIAL DUTIES AND RESPONSIBILITIES:
Note: The essential duties and responsibilities below are intended to describe the general duties and responsibilities of this position and are not intended to be an exhaustive statement of duties. This position may perform all or most of the primary duties listed below. Specific tasks, responsibilities or competencies may be documented in the Team Member's performance objectives as outlined by the Team Member's immediate Leadership Team Member.
We are seeking a skilled Site Reliability Engineer (SRE) with expertise in AWS, Azure, Kubernetes, Serverless (Lambda), Python, CI/CD, and observability tooling. The SRE will ensure system reliability, scalability, and performance across our cloud platforms, while driving automation, observability, and operational excellence.
Key Responsibilities:
Cloud & Infrastructure: Design, deploy, and optimize workloads on AWS and Azure. Manage Kubernetes clusters (AKS, EKS) and serverless solutions like AWS Lambda. Implement secure, scalable, and cost-optimized infrastructure.
Automation & Engineering: Build automation frameworks and operational tooling using Python. Implement Infrastructure as Code (IaC) using Terraform, CloudFormation, or Azure Bicep. Automate deployments, scaling, monitoring, and remediation.
CI/CD & DevOps Practices: Develop and maintain CI/CD pipelines (Jenkins, GitHub Actions, GitLab CI/CD, Azure DevOps). Partner with development teams to improve release velocity and reliability.
Observability & Monitoring: Implement end-to-end observability practices: metrics, logging, tracing. Manage logging & monitoring solutions (Loggly, ELK/EFK stack, CloudWatch, Azure Monitor, Prometheus, Grafana, Datadog, New Relic). Establish alerting & escalation workflows with PagerDuty, OpsGenie, or similar incident management platforms.
Site Reliability & Incident Management: Participate in 24/7 on-call rotations for critical systems. Diagnose, resolve, and perform root cause analysis for incidents. Drive post-incident reviews and continuous improvement in reliability.
Collaboration & Mentorship: Work closely with Dev, QA, and Security teams to ensure production readiness. Champion SRE best practices across teams. Mentor junior engineers on cloud-native operations and observability.
Experience: 3-6 years in SRE, DevOps, or Cloud Engineering.
Cloud Platforms: Strong experience with AWS (EC2, S3, Lambda, CloudWatch, RDS, VPC, etc.) and Azure (VMs, Functions, Monitor, Networking, etc.).
Containers & Orchestration: Hands-on with Kubernetes (EKS, AKS) and containerization (Docker).
Programming: Proficient in Python for automation and system integrations.
CI/CD Tools: Experience with Jenkins, GitHub Actions, GitLab, Azure DevOps.
IaC: Proficiency with Terraform / CloudFormation / Bicep.
Observability: Experience with Prometheus, Dynatrace, Grafana, ELK/EFK, Datadog, New Relic, or similar.
Incident Management: Familiar with PagerDuty, OpsGenie, or equivalent tools.
Networking & Security: Solid understanding of IAM, VPC, firewalls, encryption, and compliance.
Soft Skills: Strong analytical, troubleshooting, and collaboration skills.
Certifications: AWS Certified SysOps Administrator / Solutions Architect, Microsoft Azure Administrator Associate, CKA (Certified Kubernetes Administrator).
Experience with multi-cloud deployments.
Familiarity with service mesh (Istio/Linkerd) and advanced observability (OpenTelemetry).
PHYSICAL DEMANDS:
Note: Reasonable accommodations may be made to enable individuals with disabilities to perform the essential functions as described. Regular eye-hand coordination and manual dexterity are required to operate office equipment. The ability to perform work at a computer terminal for 6-8 hours a day and function in an environment with constant interruptions is required. At times, Team Members are subject to sitting for prolonged periods. Infrequently, a Team Member must be able to lift and move material weighing up to 20 lbs. A Team Member may experience elevated levels of stress during periods of increased activity and with work entailing multiple deadlines.
A job description is only intended as a guideline and is only part of the Team Member's function. The company has reviewed this job description to ensure that the essential functions and basic duties have been included. It is not intended to be construed as an exhaustive list of all functions, responsibilities, skills and abilities. Additional functions and requirements may be assigned by supervisors as deemed appropriate.
