Port "sub-group transpose reduction" to default path #2266

Open
victor-eds opened this issue Sep 17, 2024 · 0 comments

Comments

@victor-eds
Contributor

#2109 explores layout conversion in the advanced path to improve reduction performance (see #1637 for the investigation). Porting this to the default path would involve a transformation along the following lines, applied after heuristics confirm that it is profitable:

  1. Reshape the input tensor so that no data movement is needed and elements can be reduced within each work-item (tt.reshape).
  2. Perform the reduction within the work-item (tt.reduce).
  3. Convert the layout so that a transposition within the sub-group is performed, as explained in the investigation (triton_gpu.convert_layout).
  4. Finalize the reduction within the work-item and possibly within the work-group (tt.reduce).
  5. Convert back to the initial layout (triton_gpu.convert_layout).

Note that step 5 can be dropped if the new layout is beneficial for performance.
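
To make the data flow of the steps above concrete, here is a minimal pure-Python sketch that models a single sub-group. It is only an illustration under stated assumptions, not the Triton lowering: the names (SUBGROUP_SIZE, NUM_COLS, distribute, subgroup_transpose, and so on) are hypothetical, the initial layout (each work-item owning a strided slice of every row) is assumed, the tile has exactly one row per work-item so the transposition is square, and the reduction is a sum. Step 5 (converting back) is omitted.

```python
# Illustrative model only: one sub-group of SUBGROUP_SIZE work-items ("lanes")
# reduces every row of a [SUBGROUP_SIZE x NUM_COLS] tile. All names are hypothetical.
SUBGROUP_SIZE = 4
NUM_COLS = 8


def make_tile():
    """Build a small test tile whose element at (row, col) is row * NUM_COLS + col."""
    return [[row * NUM_COLS + col for col in range(NUM_COLS)]
            for row in range(SUBGROUP_SIZE)]


def distribute(tile):
    """Assumed initial layout: lane l owns columns l, l + SUBGROUP_SIZE, ... of every row.

    Returns per_lane[lane][row] -> list of elements held by that lane for that row.
    """
    return [[[row[col] for col in range(lane, NUM_COLS, SUBGROUP_SIZE)]
             for row in tile]
            for lane in range(SUBGROUP_SIZE)]


def reduce_within_work_item(per_lane):
    """Steps 1-2: each lane sums the elements it already owns, row by row."""
    return [[sum(values) for values in lane_rows] for lane_rows in per_lane]


def subgroup_transpose(partials):
    """Step 3: after the transposition, lane r holds the partial of row r from every lane."""
    return [[partials[lane][row] for lane in range(SUBGROUP_SIZE)]
            for row in range(SUBGROUP_SIZE)]


def finalize(transposed):
    """Step 4: each lane sums the partials it now owns, again without cross-lane traffic."""
    return [sum(values) for values in transposed]


def transpose_reduce(tile):
    """Run steps 1-4; element r of the result is the sum of row r, held by lane r."""
    partials = reduce_within_work_item(distribute(tile))
    return finalize(subgroup_transpose(partials))


if __name__ == "__main__":
    print(transpose_reduce(make_tile()))  # [28, 92, 156, 220]
```

In this model, the transposition in step 3 is the only place where data crosses work-items; both reductions operate on values a work-item already holds.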

@victor-eds victor-eds self-assigned this Sep 17, 2024
@vlad-penkin vlad-penkin added this to the 4.0 [Performance] Core milestone Sep 17, 2024
@victor-eds victor-eds changed the title Port #2109 to default path Port "sub-group transpose reduction" to default path Sep 18, 2024
@victor-eds victor-eds removed their assignment Sep 18, 2024