Manager @ Infinite

Job Description

  • Collaborate with development teams, product owners, and stakeholders to define, enforce, and track SLOs and manage error budgets.
  • Improve system reliability by designing for failure, testing edge cases, and monitoring key metrics.
  • Boost performance by identifying bottlenecks, optimizing resource usage, and reducing latency across services.
  • Build scalable systems that handle growth in traffic or data without compromising performance.
  • Stay directly involved in technical work, contributing to the codebase and leading by example in solving complex infrastructure challenges.
AI Ops:
  • Design and implement scalable deployment strategies optimized for large language models such as Llama, Claude, Cohere, and others.
  • Set up continuous monitoring for model performance, ensuring robust alerting systems are in place to catch anomalies or degradation.
  • Stay current with advancements in MLOps and Generative AI, proactively introducing innovative practices to strengthen AI infrastructure and delivery.
Monitoring and Alerting:
  • Set up monitoring and observability using Prometheus, Grafana, and CloudWatch, with logging through OpenSearch/ELK.
  • Proactively identify and resolve issues by leveraging monitoring systems to catch early signals before they impact operations.
  • Design and maintain alerting mechanisms that are clear, actionable, and tuned to avoid unnecessary noise or alert fatigue.
  • Continuously improve system observability to enhance visibility, reduce false positives, and support faster incident response.
  • Apply best practices for alert thresholds and monitoring configurations to ensure reliability and maintain system health.
Cost Management:
  • Monitor infrastructure usage to identify waste and reduce unnecessary spending.
  • Optimize resource allocation by using right-sized instances, auto-scaling, and spot instances where appropriate.
  • Implement cost-aware design practices during architecture and deployment planning.
  • Track and analyze monthly cloud costs to ensure alignment with budget and forecast.
  • Collaborate with teams to increase cost visibility and promote ownership of cloud spend.
Required Skills & Experience:
  • Strong experience as an SRE with a proven track record of managing large-scale, highly available systems.
  • Knowledge of core operating system principles, networking fundamentals, and systems management.
  • Strong understanding of cloud deployment and management practices.
  • Hands-on experience with Terraform/OpenTofu, Helm, Docker, Kubernetes, Prometheus, and Istio.
  • Hands-on experience with tools and techniques to diagnose container performance issues.
  • Skilled with AWS services from both technology and cost perspectives.
  • Skilled in DevOps/SRE practices and build/release pipelines.
  • Experience working with mature development practices and tools for source control, security, and deployment.
  • Hands-on experience with Python/Golang/Groovy/Java.
  • Excellent written and verbal communication skills.
  • Strong analytical and problem-solving skills.
Preferred Qualifications:
  • Experience scaling Kubernetes clusters and managing ingress traffic.
  • Familiarity with multi-environment deployments and automated workflows.
  • Knowledge of AWS service quotas, cost optimization, and networking nuances.
  • Strong troubleshooting skills and effective communication across teams.
  • Prior experience in regulated environments (HIPAA, SOC 2, ISO 27001) is a plus.

Job Classification

Industry: IT Services & Consulting
Functional Area / Department: Engineering - Software & QA
Role Category: DevOps
Role: Release Manager
Employment Type: Full-time

Contact Details:

Company: Infinite
Location(s): Bengaluru

Keyskills:   kubernetes python iso hipaa resource soc sre golang cloud deployment tracking helm docker practice management system administration groovy java grafana writing troubleshooting terraform prometheus aws communication skills amazon cloudwatch

₹ Not Disclosed

Infinite

About Xperience Infinite: Xperience Infinite specializes in developing cutting-edge Computerized Maintenance Management System (CMMS) solutions tailored for UK-based retail giants. Our innovative products are designed to manage and maintain assets and resources efficiently, ensuring optimal perfor...