Internship - Optimizing GPU Utilization for Concurrent Application Execution in Healthcare Systems (Kortrijk, BE) at Barco
Barco · Kortrijk, Belgium · Onsite
Barco
Barco designs technology that makes everyday life a little better. Seeing beyond the image, we develop sight, sound, and sharing solutions to help customers work together, share insights, and wow audiences. Healthcare is one of Barco's key markets. For many years, Barco has contributed to improved healthcare through solutions in radiology, mammography, surgery, dermatology, dentistry, pathology, and more.
Innovation in healthcare
The Barco Labs Healthcare team is constantly looking for innovative solutions that push the state of the art forward and can improve healthcare models. This group takes care of the entire innovation cycle: ideation and MVP definition, market evaluation and business case creation, R&D and clinical work to create proofs of concept and solutions, market and clinical/regulatory validation of those solutions, business model and business plan creation, up to commercial introduction and early pilot sales.
The task at hand
In modern compute environments, both AI-based and traditional algorithms are increasingly transitioning from CPU to GPU execution. Within our healthcare applications, low latency and deterministic behavior are critical requirements. However, GPU architectures handle core isolation and multithreading differently from CPUs, which introduces new challenges.
With the latest generation of hardware, we have access to substantial GPU resources, yet these are often underutilized due to suboptimal configuration or scheduling strategies.
The primary goal of this project is to evaluate various methods for running applications concurrently on GPUs, with a focus on comparing default execution behavior to the use of NVIDIA’s Multi-Process Service (MPS). The study aims to:
- Identify key settings and launch parameters that influence performance under MPS (a minimal example is sketched after this list).
- Understand the low-level GPU behaviors that impact concurrency, such as register usage, cache misses, warp stalls, and other architectural bottlenecks.
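To give a concrete flavor of what this comparison involves, the sketch below shows a minimal CUDA client that runs a fixed workload and reports its wall-clock time; the comments show how two such clients could be launched with default sharing and under MPS. The kernel, the file name, and the 50% active-thread value are illustrative assumptions, not part of the project definition.

```cuda
// mps_client.cu - minimal CUDA client for comparing default sharing with MPS.
// Hypothetical usage (file names and values are illustrative):
//   nvcc -O2 mps_client.cu -o mps_client
//   ./mps_client & ./mps_client & wait       # default sharing between two processes
//   nvidia-cuda-mps-control -d               # start the MPS control daemon
//   CUDA_MPS_ACTIVE_THREAD_PERCENTAGE=50 ./mps_client &
//   CUDA_MPS_ACTIVE_THREAD_PERCENTAGE=50 ./mps_client &
//   wait                                     # two clients sharing SMs under MPS
#include <cstdio>
#include <chrono>

__global__ void busyKernel(float *data, int iters) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    float v = data[idx];
    for (int i = 0; i < iters; ++i)   // artificial compute load
        v = v * 1.000001f + 0.000001f;
    data[idx] = v;
}

int main() {
    const int n = 1 << 20;
    const int block = 256;
    float *d = nullptr;
    cudaMalloc((void **)&d, n * sizeof(float));
    cudaMemset(d, 0, n * sizeof(float));

    auto t0 = std::chrono::steady_clock::now();
    for (int launch = 0; launch < 100; ++launch)
        busyKernel<<<n / block, block>>>(d, 10000);
    cudaDeviceSynchronize();
    auto t1 = std::chrono::steady_clock::now();

    std::printf("elapsed: %.1f ms\n",
                std::chrono::duration<double, std::milli>(t1 - t0).count());
    cudaFree(d);
    return 0;
}
```

Comparing the elapsed times of concurrent clients in both modes is one simple way to see how MPS settings affect throughput and latency.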
The learning process will be based on a combination of a literature review of GPU architecture and concurrency models, and empirical measurements using both existing applications and custom-built, minimal test cases.
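As an illustration of what such a minimal test case could look like (the kernel and the block sizes below are illustrative assumptions), the following snippet asks the CUDA runtime how many blocks of a given kernel can be resident on one SM, which is one way register usage and block size surface as concurrency limits even before profiling:

```cuda
// occupancy_probe.cu - probe how a kernel's resource usage limits concurrency on one SM.
#include <cstdio>

__global__ void probeKernel(float *data) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    // Small per-thread workload; register pressure can be varied by making
    // this computation more complex and recompiling.
    data[idx] = data[idx] * 2.0f + 1.0f;
}

int main() {
    cudaDeviceProp prop{};
    cudaGetDeviceProperties(&prop, 0);

    const int blockSizes[] = {128, 256, 512, 1024};
    for (int blockSize : blockSizes) {
        int blocksPerSM = 0;
        // Ask the runtime how many blocks of this kernel can be resident per SM,
        // given its register and shared-memory footprint at this block size.
        cudaOccupancyMaxActiveBlocksPerMultiprocessor(
            &blocksPerSM, probeKernel, blockSize, /*dynamicSMemSize=*/0);

        double occupancy = (double)(blocksPerSM * blockSize) /
                           prop.maxThreadsPerMultiProcessor;
        std::printf("blockSize %4d: %d resident blocks/SM, occupancy %.0f%%\n",
                    blockSize, blocksPerSM, occupancy * 100.0);
    }

    // Launch the kernel once as well, so the same binary can be profiled.
    const int n = 1 << 20;
    float *d = nullptr;
    cudaMalloc((void **)&d, n * sizeof(float));
    cudaMemset(d, 0, n * sizeof(float));
    probeKernel<<<n / 256, 256>>>(d);
    cudaDeviceSynchronize();
    cudaFree(d);
    return 0;
}
```

The same kernel could then be inspected in NVIDIA Nsight Compute to relate register counts, cache behavior, and warp-stall reasons to the concurrency actually achieved.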
The outcome of this project will be a systematic evaluation procedure that identifies the key performance contributors of the running applications and uses this insight to recommend optimal MPS configurations. This procedure will be tailored to the specific range of GPUs in use (the RTX 6000 family), covering the Ampere, Ada, and Blackwell architectures.
Qualifications
- Basic proficiency in C and C++ programming.
- Working knowledge of Linux and basic scripting (e.g., Python, Bash).
- CUDA experience is not required: it can be learned during the project.
- Familiarity with development and profiling tools such as Visual Studio Code and NVIDIA Nsight Compute is a plus, but not mandatory; these skills can be developed throughout the project.
- Fluent spoken and written communication in English.
Desired Interests and Skills
- A strong interest in low-level hardware and systems programming.
- Motivation to learn how parallel programming with CUDA works in practice.
- Interest in understanding how hardware is abstracted into a generic software model, and how this abstraction impacts performance and programmability.
- Curiosity to explore and understand:
  - GPU memory hierarchy and cache behaviour
  - Thread scheduling and swapping
  - Resource partitioning and utilization
- A drive to deeply understand the internal workings of GPU hardware and how it affects application performance.
Furthermore, you should be a student in a technical discipline who is eligible to work at our HQ in Kortrijk, Belgium.
Apply Now