Project Description:
Are you passionate about computer graphics and high-performance computing? Would you like hands-on experience with state-of-the-art hardware, sometimes even before anyone else gets the chance? We are looking for an experienced MLOps or DevOps Engineer to help deploy, maintain, and develop automation and infrastructure systems for a major hardware vendor. The ideal candidate has a background in ML operations, is proficient in collaborating with Data Engineering teams, and is well versed in automation tools for clustered deployments.
Responsibilities:
Your responsibilities will focus on supporting the local Data Engineering, Software Infrastructure, and Research teams:
- Build tailored automation systems for teams of ML developers
- Facilitate collaboration between the Data Engineering, ML Research, and Software Infrastructure teams
- Implement new helper tools, with a focus on practical deployment and cost/resource management
Mandatory Skills Description:
- Decent understanding of Unix/Linux
- Reasonable experience with either Slurm or Kubernetes (both preferred); able to set up, configure, manage, and expand clusters as needed
- Able to work with Bash, JavaScript, and Python, with solid experience in at least one of them
- Experience with GraphQL and REST APIs, including client-side throttling and caching (e.g., using Redis or custom solutions)
- Experience with Ansible, Terraform or Pulumi, Docker, and Helm
Nice-to-Have Skills Description:
- Experience with ML frameworks
- Technical academic degree in IT or a related field
- General MLOps pipeline expertise
Languages:
- English: B2 (Upper Intermediate)