
Deep Learning Performance Architect - Perf Tools

Excelero Storage

Software Engineering, IT, Data Science
Shanghai, China
Posted on Sep 19, 2025

We are looking for a first-class Deep Learning Performance Architect to join us and shape the performance analysis infrastructure for GPUs. We build cutting-edge analysis tools and visualization frameworks that empower engineers to optimize GPU performance for Deep Learning and HPC workloads, spanning pre-silicon architectural exploration through post-silicon validation and optimization. Your work will directly shape the tools that define how NVIDIA GPUs are analyzed, tuned, and scaled for next-generation AI systems, and will influence next-generation GPU architectures.

What you'll be doing:

  • Architect Performance Tooling: Develop infrastructure tools and libraries for GPU performance analysis, visualization, and automated workflows used across the GPU software/hardware development life cycle.

  • Unlock Architectural Insights: Analyze GPU workloads to identify bottlenecks and define new hardware profiling features that enhance performance debugging and profiling capabilities.

  • AI-Powered Automation: Build AI/ML-driven tools to automate performance analysis, generate performance optimization guidance, and improve the user experience of the profiling infrastructure.

  • Cross-Stack Collaboration: Partner with kernel developers, system software teams, and hardware architects to support performance studies, improve the CUDA software stack, and co-design performance-centric solutions for current and next-generation GPU architectures.

What we need to see:

  • BS or higher in Computer Science, Electronic Engineering, or a related field (or equivalent experience)

  • 4+ years of software development experience

  • Strong software skills in design, coding (C++ and Python), analysis, and debugging of low-level programs

  • Strong grasp of computer architecture (pipelines, memory hierarchies) and operating system fundamentals

  • Experience with performance modeling, architecture simulation, profiling, and analysis.

  • Self-starter who thrives in dynamic environments and manages competing priorities effectively.

Ways to stand out from the crowd:

  • Experience building performance debugging and analysis tools for silicon and simulators. Experience developing application snapshot and replay tools is a big plus.

  • Familiarity with the CUDA system software stack (e.g., CUDA Driver/Runtime APIs), CUDA kernel optimization, and GPU architecture

  • Familiarity with GPU performance profiling tools such as Nsight Systems, Nsight Compute, and NVTX, or experience developing similar tools for other processors.

  • Practical experience or projects demonstrating AI/ML-based code generation, automated data analysis, or workflow assistants.