Deploy, manage, and optimize storage solutions using ZFS and iSCSI across global data centers.
Implement and maintain automation and monitoring tools such as Puppet, Grafana, Zabbix, and Jenkins to enhance system performance and reliability.
Utilize storcli for managing server storage configurations.
Linux Systems Expertise:
Manage and maintain Ubuntu-based systems, ensuring security and compliance.
Conduct performance tuning and capacity planning for Linux servers.
Develop and implement self-healing systems and automated recovery processes on Linux platforms.
Reliability Engineering:
Develop and implement strategies for improving system availability and performance.
Conduct root-cause analysis and incident response for storage-related issues.
Collaborate with SDEs to support software development infrastructure and deploy new product features.
Preferred candidate profile
Proven experience in site reliability engineering, with a focus on storage solutions and Linux systems.
Strong knowledge of ZFS, iSCSI, and Ubuntu.
Expertise in automation and configuration management tools (e.g., Bash, Ansible, Puppet).
Familiarity with HashiCorp tools, SSH, and LDAP.
Experience with storcli for storage configuration.
Experience with monitoring tools such as Grafana, Zabbix, and InfluxDB.
Ability to conduct root-cause analysis and implement effective solutions.
High level of ownership for the assigned team problem space, including driving predictable delivery, continuous iteration and improvement, consistent and effective team communication, graceful coordination with upstream and downstream stakeholders, and reporting on project status.
Project management skills, including experience with task estimation, scheduling, Gantt charts, unblocking dependencies, and Agile methodologies (such as sprint planning or Scrum), along with attention to detail and the ability to keep projects on track. Ability to define broad, complex problems and break them into discrete, specific tasks that can be delegated.
Documentation skills, including writing standard operating procedures, design docs, policy documents, and runbooks.
Job Classification
Industry: IT Services & Consulting
Functional Area / Department: IT & Information Security
Role Category: IT Infrastructure Services
Role: IT Infrastructure Services - Other
Employment Type: Full time