[Feature Improvement] Change large GRF warnings to trigger by debug flag #2251

Open
Stonepia opened this issue Sep 14, 2024 · 1 comment
Stonepia commented Sep 14, 2024

PR #1654 introduced automatically recompiling kernels in large GRF mode.

Could we make the cout in these lines trigger only under a debug flag, so that normal users can safely ignore it? In my opinion, these messages should be treated as warnings.

if (!is_GRF_mode_specified && n_spills > max_reg_spill) {
  std::cout << "(I): Detected " << n_spills
            << " spills, recompiling the kernel using large GRF mode"
            << std::endl;
  const std::string new_build_flags =
      build_flags_str.append(" -cl-intel-256-GRF-per-thread");
  l0_module =
      checkSyclErrors(create_module(l0_context, l0_device, binary_ptr,
                                    binary_size, new_build_flags.c_str()));
  l0_kernel = checkL0Errors(l0_module);
  gpuAssert(zeKernelGetProperties(l0_kernel, &props));
  n_spills = props.spillMemSize;
  std::cout << "(I): Kernel has now " << n_spills << " spills" << std::endl;
}

Example output from an xpu train AlbertForQuestionAnswering run:

// We wish these were not exposed to the normal user
(I): Detected 9472 spills, recompiling the kernel using large GRF mode
(I): Kernel has now 512 spills
(I): Detected 20032 spills, recompiling the kernel using large GRF mode
(I): Kernel has now 10816 spills
(I): Detected 33600 spills, recompiling the kernel using large GRF mode
(I): Kernel has now 25408 spills
Stonepia commented

I would like to explain my concern about why we don't set grf_mode=auto in the Triton config on the inductor side.

The reason is that, on the PyTorch inductor side, we are currently trying to keep the config identical to CUDA/HIP to avoid possible divergence. Large GRF mode is an XPU-only compilation optimization, so we would like to hide that complexity from users familiar with CUDA.

By the way, I am not very familiar with the differences between the grf_mode settings, so if there are any concerns, please point them out and let's discuss.
