[Feature Improvement] Change large GRF warnings to trigger by debug flag #2251

Open
Stonepia opened this issue Sep 14, 2024 · 1 comment
Stonepia commented Sep 14, 2024

PR #1654 introduced automatically recompiling kernels in large GRF mode.

Could we make the cout in these lines trigger only under a debug flag, so that normal users can safely ignore it? In my opinion, these messages should be treated as warnings.

if (!is_GRF_mode_specified && n_spills > max_reg_spill) {
  std::cout << "(I): Detected " << n_spills
            << " spills, recompiling the kernel using large GRF mode"
            << std::endl;
  const std::string new_build_flags =
      build_flags_str.append(" -cl-intel-256-GRF-per-thread");
  l0_module =
      checkSyclErrors(create_module(l0_context, l0_device, binary_ptr,
                                    binary_size, new_build_flags.c_str()));
  l0_kernel = checkL0Errors(l0_module);
  gpuAssert(zeKernelGetProperties(l0_kernel, &props));
  n_spills = props.spillMemSize;
  std::cout << "(I): Kernel has now " << n_spills << " spills" << std::endl;
}

Example output from an xpu train AlbertForQuestionAnswering run:

// We wish these were not exposed to the normal user
(I): Detected 9472 spills, recompiling the kernel using large GRF mode
(I): Kernel has now 512 spills
(I): Detected 20032 spills, recompiling the kernel using large GRF mode
(I): Kernel has now 10816 spills
(I): Detected 33600 spills, recompiling the kernel using large GRF mode
(I): Kernel has now 25408 spills
Stonepia commented

I would like to explain my concern about why we don't set grf_mode=auto in the Triton config on the inductor side.

The reason is that, on the PyTorch inductor side, we are currently trying to keep the config identical to CUDA/HIP to avoid possible divergence. Large GRF mode is an XPU-only compilation optimization, so we would like to hide that complexity from users familiar with CUDA.

By the way, I am not very familiar with the differences between the grf_mode settings, so if there are any concerns, please point them out and let's discuss.
