MLops Architect @ Citiustech

Job Description

Observability & MLOps Engineer

Primary Focus

Observability & ML Lifecycle Management

Core Responsibilities

- Design observability stack
- Implement distributed tracing
- Build Grafana dashboards & alerts
- Integrate telemetry across clouds

Core Skills

- Metrics, logs, traces
- Grafana & alerting
- MLOps engineering
- Python/Scripting

Good-to-Have

- Airflow basics
- Multi-cloud observability

Overlap

- Python/Scripting
- Cloud familiarity

Senior Observability Specialist.

Location: [Chennai, Pune, Bangalore]

Employment Type: Full-time

Experience Required: [15-18]

Job Summary:

We are seeking a highly skilled Senior Observability Specialist to design, implement, and manage end-to-end observability strategies across cloud and on-premises environments. This role requires expertise in modern monitoring, logging, and tracing tools to ensure system reliability, performance optimization, and proactive incident detection. The ideal candidate will have experience with Dynatrace, Datadog, and open-source solutions including Grafana, Loki, Tempo, Mimir, and Prometheus.

Key Responsibilities:

  • Design and implement full-stack observability architectures that provide seamless monitoring, logging, and tracing capabilities.
  • Define best practices for observability across hybrid cloud, multi-cloud, and on-premises environments.
  • Ensure scalability, availability, and resilience of monitoring solutions in high-traffic applications.

Monitoring & Dashboarding Architecture:

  • Architect Grafana-based observability platforms for real-time visualization and analysis of metrics.
  • Establish Prometheus-based metric collection pipelines optimized for high-volume environments.
  • Integrate Dynatrace and Datadog into cloud-native infrastructure for proactive monitoring.

Centralized Logging & Distributed Tracing:

  • Design and implement centralized logging solutions using Loki, ensuring efficient log ingestion, indexing, and querying.
  • Develop distributed tracing strategies with Tempo to enhance performance monitoring in microservices architectures.
  • Optimize Mimir-based metric storage for seamless data retrieval and scalability.

Observability Strategy & Automation:

  • Define and implement observability-driven DevOps methodologies to improve system reliability.
  • Lead automation initiatives for log analysis, alerting, and anomaly detection using machine learning models.
  • Architect automated alerting workflows using Prometheus Alertmanager, Dynatrace AI alerts, and Datadog event notifications.
  • Ensure efficient KPI tracking and proactive troubleshooting based on observability insights.

Scripting & API Integrations:

  • Develop custom API integrations using Python or Go to query, retrieve, and process monitoring data.
  • Architect event-driven observability pipelines for automated data collection and reporting.

DevOps & CI/CD Integration:

  • Collaborate with DevOps teams to integrate observability tooling within CI/CD pipelines.
  • Optimize system performance and resource utilization through proactive monitoring.
  • Advocate for best practices in observability-driven software development.

Cloud-Native Observability & DevOps Alignment:

  • Design observability strategies tailored for Kubernetes-based microservices and cloud-native architectures.
  • Collaborate with DevOps teams to embed observability practices within CI/CD pipelines for continuous monitoring.
  • Optimize logging and metrics pipelines to support containerized and serverless environments.

Qualifications & Skills

Architectural Focus:

  • Strong expertise in designing observability frameworks across Dynatrace, Datadog, Grafana, Loki, Tempo, Mimir, and Prometheus.
  • Proficiency in observability architecture, ensuring scalable and reliable monitoring solutions.
  • Advanced experience in scripting with Python or Go for custom API integrations.
  • Deep understanding of DevOps methodologies, CI/CD best practices, and cloud-native observability tools.
  • Experience in microservices architecture and distributed systems monitoring.
  • Ability to troubleshoot bottlenecks, optimize performance, and implement predictive observability insights.

Preferred Certifications (Optional):

  • Certified Kubernetes Administrator (CKA)
  • AWS Certified DevOps Engineer
  • Dynatrace Performance Monitoring Certification
  • Prometheus Certified Associate

Job Classification

Industry: IT Services & Consulting
Functional Area / Department: Engineering - Software & QA
Role Category: DevOps
Role: DevOps Consultant / Architect
Employment Type: Full time

Contact Details:

Company: Citiustech
Location(s): Pune


Keyskills:   mlops grafana aks MLflow Kubeflow tensorflow Terraform Vertex AI aimlops Keras observability Azure Devops azure

₹ Not Disclosed

Citiustech

CitiusTech is a specialist provider of healthcare technology services and solutions, with a strong presence across the globe. As a strategic partner to some of the world's largest healthcare organizations, CitiusTech plays a deep and meaningful role in accelerating technology innovation and shaping th...