JD:
1.1 Responsibilities (Principal SRE)
- Be responsible for both on-premises and cloud-based infrastructure.
- Function as an extension of the existing staff and teams.
- Possess deep expertise in:
- Troubleshooting
- Infrastructure (physical and cloud)
- Automation
- Enterprise-level system administration
- Agile workflows
- Enterprise change management
1. Technical Proficiencies
Cloud Technologies:
- AWS native load balancers
- AWS EC2, ECS, EKS, Containers
- Terraform
Monitoring & Observability:
- Splunk Cloud Observability
- CloudWatch
DevOps & Automation:
- CI/CD
- Jenkins
- Automation with Python
2. AWS Infrastructure Management
- Deploy and maintain shared platform team assets (e.g., ECS clusters, ALBs)
- Deploy and maintain unique or non-standard infrastructure assets
- Assist developer teams in standardizing deployments
3. Cost Containment
- Minimize service operational costs
- Perform periodic cost analysis to identify cost-saving opportunities
4. Capacity Planning
- Conduct capacity analysis for production and non-production environments
- Right-size domain assets for performance and availability
- Leverage automation (e.g., autoscaling)
5. Monitoring
- Collaborate on defining SLOs for service availability
- Coordinate deployment quality objectives
- Develop pattern-based service monitors
- Implement uniform service measurement and monitoring
6. Performance Optimization
- Develop performance measurement techniques
- Assist in refining and improving service efficiency over time
7. Incident Management
- Provide on-call support and drive service restoration
8. Security
- Implement InfoSec-recommended patterns
- Monitor anomalies using internal tools
9. NOC Services
- Establish targeted alerting and predefined NOC response procedures
1.2 Key Responsibilities
- Develop and maintain monitoring and alerting systems
- Manage the incident response lifecycle (runbooks, dashboards, automation)
- Automate operational tasks for efficiency
- Participate in on-call rotations
- Design performance testing and capacity planning strategies
- Collaborate across teams to troubleshoot and resolve issues
1.3 Required Qualifications
- Strong problem-solving skills
Hands-on experience with:
- Cloud Platforms: AWS, Azure, GCP
- IaC Tools: Terraform or CloudFormation
- Programming Languages: Python, Java, C/C++, Go, JavaScript, or Ruby
- Log Aggregation: Splunk, ELK, or SumoLogic
- Monitoring Tools: SignalFx, Datadog, Dynatrace, AppDynamics
- Prior roles in SRE, Software Engineering, or Production Engineering
- Passion for learning and improving systems
- Interest in SLIs, SLOs, resilience, scaling, system design, and performance
1.4 Desired Qualifications
- Experience with large-scale distributed systems
Familiarity with configuration and automation tools:
- Terraform, Puppet, Ansible
Experience with CI/CD and DevOps toolchain:
- Git, Jenkins, Docker, Nexus, Artifactory, Selenium
Knowledge of cloud security practices, including:
- Intrusion detection
- Penetration testing
- Vulnerability scanning

Key skills: CI/CD Pipeline, Terraform, Ansible, AWS, Python, Git, Site Reliability Engineering
DXC Technology (NYSE: DXC) helps global companies run their mission-critical systems and operations while modernizing IT, optimizing data architectures, and ensuring security and scalability across public, private, and hybrid clouds. The world's largest companies and public sector organizations trust ...