Manager, Development Operations - RAPIDS Data Science
Excelero Storage
NVIDIA RAPIDS is an open-source software suite that leverages NVIDIA GPUs to accelerate data science and machine learning workflows. It provides a collection of libraries and APIs that enable users to perform tasks like data manipulation, machine learning, and graph analytics entirely on GPUs, offering significant speedups compared to traditional CPU-based processing. RAPIDS supports customers and partners ranging from individual data scientists to the world’s largest supercomputers and Fortune 500 companies.
This team supports the Data Science engineering team, including RAPIDS, an open-source GPU-accelerated data science platform that speeds up pandas, scikit-learn, and NetworkX workflows, along with NeMo Retriever for AI retrieval pipelines and NeMo Curator for processing text, image, and video data at scale for AI model training. We support CI/CD systems, infrastructure deployment and maintenance, build systems, and security compliance, working closely with build engineers, developers, and leadership to ensure the delivery of high-quality software used by data scientists and AI developers across the world. We are looking for a hardworking manager to lead NVIDIA’s Data Science Engineering Operations team, supporting multiple engineering teams working on data science (and adjacent) libraries such as RAPIDS. As a manager, you’ll have the opportunity to help support and grow the RAPIDS project. You will work closely with RAPIDS build and development teams to ensure high-quality releases of CUDA/C++ and Python libraries as well as containers.
What you’ll be doing:
Lead a team of DevOps engineers supporting multiple software projects in the data science and AI domain, many of them open-source
Collaborate with build engineers, developers, and management to ensure the delivery of high-quality software
Lead by doing, taking a hands-on approach working with the engineers on the team
Help lead a range of DevOps initiatives, including CI/CD, security/legal compliance, and SysAdmin
Help to operate and run our infrastructure and development processes
What we need to see:
Bachelor of Science in Computer Engineering, Computer Science, or related technical field, or equivalent experience
8+ years of overall technical experience, primarily in DevOps, including 3+ years as a team or technical leader
Excellent communication and interpersonal skills
Detail-oriented and comfortable supporting and prioritizing amongst multiple teams
Experience with administration, optimization, and troubleshooting of CI/CD and related tools (including Jenkins, Git, GitHub Actions)
Linux system administration experience (Ubuntu strongly preferred)
Knowledge of programming and automation with scripting languages (Bash and Python preferred)
Ways to stand out from the crowd:
You have worked with cloud services (AWS, Azure, and others), especially permissions, budget, and cost management
Experience with NVIDIA’s technology stack, including CUDA toolkit and drivers
Background with Conda and/or PyPI packaging and with container technologies such as Docker, especially building and publishing packages and images
Experience with GitHub operations, including user, repository, and organization management and permissions along with open-source development and community building on GitHub
NVIDIA is widely considered to be one of the technology world’s most desirable employers. We have some of the most forward-thinking and hardworking people in the world working for us. If you're creative and autonomous, we want to hear from you!
The base salary range is 208,000 USD - 396,750 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. You will also be eligible for equity and benefits. NVIDIA accepts applications on an ongoing basis.