Senior Software Infrastructure Engineer - Chip Design
Excelero Storage
NVIDIA is looking for a passionate and experienced software developer to join our Chip Design Technologies group, helping to build the software infrastructure and tools that shape the future of Chip Design and Verification. This role works at the intersection of infrastructure engineering, tooling, and database ownership. You’ll be responsible for the backbone systems that enable the development of NVIDIA’s industry-leading networking chips – the highway for the AI revolution.
We're looking for a senior hands-on engineer who combines deep expertise in Python automation and SQL data flows with a passion for large-scale compute environments. If you’re excited about optimizing critical systems, owning the health of compute farms, and driving reliability across cross-functional teams, you’ll feel right at home here.
What you’ll be doing:
Partner with Chip Design teams to solve engineering bottlenecks: Collaborate closely with Design, Verification, and Methodology teams to identify workflow pain points and deliver infrastructure solutions that directly improve chip design productivity and quality.
Build automation infrastructure: Design, build, and maintain robust Python-based automation infrastructure that orchestrates chip design flows, monitoring, and infrastructure tooling.
Drive data integrity and performance: Own the MySQL/SQL schemas, migrations, and data integrity for our infrastructure tools. You will optimize queries and indexes to ensure our databases scale with our massive growth.
Own the platform health: Take ownership of the day-to-day health, capacity, and reliability of our compute farm and storage systems. This includes leading incident response, triage, and implementing long-term fixes to ensure high availability.
What we need to see:
B.Sc. or M.Sc. in Computer Science, Computer Engineering, or a related field.
5+ years of experience in Python infrastructure engineering, specifically focused on automation or tooling for Linux-based compute and storage systems.
Strong SQL/MySQL expertise, including schema design, migrations, query tuning, indexing/partitioning, and performance troubleshooting.
Solid Linux systems fundamentals (processes, networking, filesystems), and comfort managing NFS, permissions, and quotas.
Proven incident response capabilities and a track record of owning critical systems with high availability requirements.
Ways to stand out from the crowd:
Familiarity with hardware verification workflows and the specific compute/storage demands of the semiconductor industry.
Hands-on experience with observability stacks for infrastructure and databases (metrics, logs, tracing) and alert tuning.
Exposure to building and maintaining REST services for internal tooling, service accounts, and secrets management.
Proven ability to identify efficiency gaps in infrastructure workflows and deliver impactful automation improvements.
NVIDIA has some of the most forward-thinking and hardworking people in the world working for us. Are you a creative and autonomous engineer who loves a challenge? Come join our team and help us build the future of HPC and data centers.