
Network Engineer - Data center - HPC @ Cirruslabs


Job Description


Key Responsibilities

  • Operate and support data center network infrastructure, including high-speed switching, routing, optics, cabling, and transceivers in on-prem environments.
  • Manage IP addressing, VLANs, VRFs, and segmentation constructs to support secure and scalable network operations.
  • Support modern data center architectures including leaf-spine fabrics, ECMP-based forwarding, and VXLAN overlay/underlay environments, including day-2 operations.
  • Execute network changes, maintenance activities, upgrades, and validation tasks with strong focus on availability, reliability, and operational discipline.
  • Troubleshoot L1-L3 issues impacting network performance, including optics/cabling faults, interface errors, MTU mismatches, routing behavior, packet loss, congestion, retransmissions, and ECMP pathing.
  • Diagnose latency, throughput, drop, and east-west traffic performance issues in high-performance environments.
  • Use Linux CLI, logs, and common network troubleshooting tools to investigate incidents and validate infrastructure behavior.
  • Partner with compute, storage, application, and platform teams to isolate whether bottlenecks originate from the network, host, or application layer.
  • Develop and maintain network automation artifacts such as templates, playbooks, and scripts to improve consistency and reduce manual tasks.
  • Work with DevOps teams to integrate network automation into existing deployment and operational workflows without owning the DevOps platform.
  • Support network observability and telemetry, including streaming telemetry, flow data, and performance-focused monitoring capabilities. Candidates should be very familiar with end-to-end monitoring and alerting of production networks. The tool used in this environment is NetQ, but experience with an equivalent tool is acceptable.
  • Contribute to continuous improvement of network operations through standardization, documentation, and operational best practices.

Must-have (Required Qualifications)
Must have:

  • 3-5+ years in data center or enterprise network operations/support
    Must have real production support experience.
  • CCNA-level knowledge or equivalent hands-on networking foundation
    Enough to be productive quickly without teaching basic networking.
  • Strong L2/L3 and TCP/IP fundamentals
    Proven troubleshooting across L1-L3: physical, switching, routing, forwarding.
  • Experience supporting on-prem physical infrastructure
    Switches, optics, cabling, transceivers, port health, physical fault isolation.
  • Strong Linux operational comfort
    CLI, logs, interfaces, routing tables, packet tools, basic host/network troubleshooting.
  • General network monitoring and observability experience
    Must be strong in end-to-end production monitoring, alerting, and fault isolation; NetQ-equivalent experience is acceptable.
  • Ability to troubleshoot performance issues across the traffic path
    Latency, packet loss, congestion, MTU, retransmissions, ECMP pathing.
  • Understanding of modern data center architecture
    Leaf-spine, ECMP, VLAN/VRF segmentation, east-west traffic patterns for AI/HPC-style environments
  • Familiarity with underlay/overlay concepts in VXLAN environments
    Enough to support day-2 operations and trace traffic logically and physically from point A to point B
  • Foundational knowledge of RDMA/RoCE concepts
    Not deep design expertise, but enough to understand AI/HPC traffic sensitivity and why congestion behavior matters
  • Working exposure to automation/config management
    Ansible-like repeatable playbooks, structured execution, and basic scripting for network operations. This should be operational exposure, not deep Python engineering.
  • Ability to collaborate with DevOps teams
    Can work within existing tooling, workflows, and change practices.
  • Experience working in Agile/Kanban fashion
  • Basic VPC-to-cloud awareness
    Lowest-priority must-have; not central to the role.

Nice to have:

  • Experience with NVIDIA NetQ or equivalent tools such as NMX
    Helpful because NetQ aligns closely to the telemetry-heavy operating model used in Spectrum environments
  • Exposure to NVIDIA Spectrum switching platforms/ecosystem
    Valuable, but teachable within a month for someone strong in core networking
  • In modern NVIDIA AI fabrics, telemetry and operations visibility are a core part of day-2 support, not an afterthought
  • Hybrid networking experience
    Cloud interconnectivity, routing, and policy across on-prem and cloud.
  • Production EVPN-VXLAN experience
    MP-BGP EVPN, VNIs, anycast gateway, multi-tenancy/VRFs, day-2 troubleshooting
  • Good structured automation practices
    Templates, version control, peer review, rollback discipline.
  • Basic Python depth
    Useful, but lower priority than network troubleshooting, Linux, and monitoring.
  • Kubernetes networking familiarity
    CNI, overlays, service networking, and interaction with the underlay.
  • Host-side NIC tuning experience
  • Experience with high-performance or parallel storage networking
  • AI/HPC networking experience
    Rail-optimized, dual-plane, CLOS, GPU clusters, east-west optimization.
  • Exposure to GPU-cluster communication patterns such as NCCL
    Strong differentiator, not a gate.

Job Classification

Industry: Recruitment / Staffing
Functional Area / Department: Engineering - Hardware & Networks
Role Category: IT Network
Role: Network Programmer / Analyst
Employment Type: Full time

Contact Details:

Company: Cirruslabs
Location(s): Hyderabad



Keyskills: High Performance Computing, Networking, Linux, Data Center, Enterprise, HPC


₹ 15-25 Lacs P.A

Cirruslabs

We are CirrusLabs. Our vision is to become the world's most sought-after niche digital transformation company that helps customers realize value through innovation. Our mission is to co-create success with our customers, partners and community. Our goal is to enable employees to dream, grow and make...