Site Reliability Engineer (AI/ML Expertise) - BTP @ SAP

Job Description

SAP is seeking a Site Reliability Engineer (SRE) with AI/ML operational expertise, basic DevOps skills, and strong proficiency in Red Hat Linux and Shell/Bash scripting. This role focuses on reliability engineering, observability, automation, and operational excellence for SAP-centric platforms and AI-driven services. Experience with SAP Integration Suite, SAP API Management, or Google Apigee is essential.
What you'll build:
  • Reliability Engineering
    • Define and implement SLIs, SLOs, and SLAs for SAP applications.
    • Apply error budgets and reliability principles to guide release decisions.
    • Conduct capacity planning, performance tuning, and chaos engineering for resilience.
  • Observability & Incident Management
    • Build end-to-end observability: metrics, logs, traces, and distributed monitoring.
    • Implement Dynatrace for application and infrastructure monitoring.
    • Automate alerting, runbooks, and incident response workflows.
    • Drive postmortems, root cause analysis, and continuous improvement.
  • AI/ML Operations
    • Operationalize ML models: deployment, monitoring, drift detection, and rollback strategies.
    • Ensure model reliability, fairness, and compliance in production environments.
    • Automate model retraining pipelines and integrate with SAP BTP AI services.
  • API Management & Integration
    • Manage and secure APIs using SAP Integration Suite, SAP API Management, or Google Apigee.
    • Implement API governance, traffic management, and monitoring for reliability.
    • Collaborate with development teams to ensure API-first architecture and integration best practices.
  • Basic DevOps
    • Support CI/CD pipelines for SAP and non-SAP workloads.
    • Implement Infrastructure as Code (IaC) for environment provisioning.
    • Assist in containerization and Kubernetes operations.
  • Security & Compliance
    • Embed security controls into operational workflows (secrets management, vulnerability scanning).
    • Ensure compliance with data privacy, auditability, and SAP security standards.
What you bring:
  • Bachelor's or Master's degree in a relevant field.
  • 5+ years of experience in SRE/Platform Reliability roles, including AI/ML operations experience.
  • Hands-on experience with SAP Integration Suite, SAP API Management, or Google Apigee.
  • Strong proficiency in Red Hat Linux and Shell/Bash scripting.
  • Exposure to basic DevOps practices (CI/CD, IaC, container orchestration).
Core SRE Skills
  • SLIs/SLOs, error budgets, chaos engineering.
  • Observability: Dynatrace, Prometheus, Grafana, ELK/EFK, OpenTelemetry.
  • Incident response: Knowledge of ticketing tools such as Jira and ServiceNow. Ability to investigate incidents and prepare CF RCAs.
  • Familiarity with multiple cloud platforms such as AWS, Azure, and Google Cloud Platform, including their respective offerings, services, and best practices.
API Management
  • SAP Integration Suite, SAP API Management, or Google Apigee.
  • API security, throttling, analytics, and governance.
Automation & Infrastructure
  • Kubernetes, Helm, Terraform, Argo CD/Flux.
  • Self-healing and auto-remediation strategies.
AI/ML Operations
  • MLflow, Kubeflow, Airflow, model monitoring tools (Evidently AI, WhyLabs).
  • Feature stores, model registries, inference serving (Triton, Seldon).
Basic DevOps
  • CI/CD tools: Azure DevOps, GitHub Actions, Jenkins.
  • Containers: Docker, Kubernetes.
SAP Exposure
  • SAP BTP (Neo, Cloud Foundry), SAP AI Core, SAP HANA ML.
Programming
  • Python (for ML workflows), Shell/Bash scripting, Go/Java for reliability tooling.
KEY PERFORMANCE INDICATORS
  • SLO attainment, MTTR reduction, incident frequency.
  • API reliability and performance metrics.
  • Model reliability metrics (drift detection time, rollback success).
WHAT YOU GET
  • Opportunity to shape reliability engineering for AI-driven SAP solutions.
  • Work on API management and observability in hybrid cloud environments.
  • Collaborative culture focused on operational excellence and innovation.

Job Classification

Industry: IT Services & Consulting
Functional Area / Department: Engineering - Software & QA
Role Category: DevOps
Role: Site Reliability Engineer
Employment Type: Full time

Contact Details:

Company: SAP
Location(s): Bengaluru

Key skills: Performance tuning, Automation, SAP, Operational excellence, Reliability engineering, Continuous improvement, Bash scripting, Analytics, Python, Capacity planning

Salary: ₹ Not Disclosed
