Observability & MLOps Engineer
Primary Focus
Observability & ML Lifecycle Management
Core Responsibilities
- Design observability stack
- Implement distributed tracing
- Build Grafana dashboards & alerts
- Integrate telemetry across clouds
Core Skills
- Metrics, logs, traces
- Grafana & alerting
- MLOps engineering
- Python/Scripting
Good-to-Have
- Airflow basics
- Multi-cloud observability
Overlap
- Python/Scripting
- Cloud familiarity
Senior Observability Specialist
Location: [Chennai, Pune, Bangalore]
Employment Type: Full-time
Experience Required: [15-18] years
Job Summary:
We are seeking a highly skilled Senior Observability Specialist to design, implement, and manage end-to-end observability strategies across cloud and on-premises environments. The role requires expertise in modern monitoring, logging, and tracing tools to ensure system reliability, optimize performance, and detect incidents proactively. The ideal candidate will have experience with Dynatrace, Datadog, and open-source solutions including Grafana, Loki, Tempo, Mimir, and Prometheus.
Key Responsibilities:
Monitoring & Dashboarding Architecture:
Centralized Logging & Distributed Tracing:
Observability Strategy & Automation:
Scripting & API Integrations:
DevOps & CI/CD Integration:
Cloud-Native Observability & DevOps Alignment:
Qualifications & Skills (Architectural Focus):
- Strong expertise in designing observability frameworks across Dynatrace, Datadog, Grafana, Loki, Tempo, Mimir, and Prometheus.
- Proficiency in observability architecture, ensuring scalable and reliable monitoring solutions.
- Advanced scripting experience in Python or Go for custom API integrations.
- Deep understanding of DevOps methodologies, CI/CD best practices, and cloud-native observability tools.
- Experience with microservices architecture and distributed-systems monitoring.
- Ability to troubleshoot bottlenecks, optimize performance, and implement predictive observability insights.
Preferred Certifications (Optional):
- Certified Kubernetes Administrator (CKA)
- AWS Certified DevOps Engineer
- Dynatrace Performance Monitoring Certification
- Prometheus Certified Associate

Key Skills: MLOps, Grafana, AKS, MLflow, Kubeflow, TensorFlow, Terraform, Vertex AI, AI/MLOps, Keras, Observability, Azure DevOps, Azure
CitiusTech is a specialist provider of healthcare technology services and solutions, with strong presence across the globe. As a strategic partner to some of the world's largest healthcare organizations, CitiusTech plays a deep and meaningful role in accelerating technology innovation and shaping th...