
When Flash Attention 2 is used and use_dora = True, training errors out: "RuntimeError: FlashAttention only support fp16 and bf16 data type" #1013

Open
rohhro opened this issue Sep 11, 2024 · 2 comments



rohhro commented Sep 11, 2024

When FA2 is enabled ("FA2=True" shows up when tuning),

"Unsloth 2024.8: Fast Llama patching. Transformers = 4.44.2.
\ /| GPU: NVIDIA GeForce RTX 4090. Max memory: 23.617 GB. Platform = Linux.
O^O/ _/ \ Pytorch: 2.4.0. CUDA = 8.9. CUDA Toolkit = 12.1.
\ / Bfloat16 = TRUE. FA [Xformers = 0.0.27.post2. FA2 = True]"

and "use_dora = True," in the script,

it always errors out "RuntimeError: FlashAttention only support fp16 and bf16 data type".
And there is no way to disable FA2 in the script - I have tried many FA2 configs in the script.

The only way to use dora is to use Unsloth in a env which has no FA2 installed.
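
For reference, a minimal sketch of the kind of script that hits this. The model name and LoRA hyperparameters below are placeholders, not my exact config, and I am assuming use_dora is simply forwarded by Unsloth to PEFT's LoraConfig:

```python
# Minimal repro sketch. Model name and hyperparameters are placeholders;
# assumes use_dora is forwarded by Unsloth to PEFT's LoraConfig.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-3-8b-bnb-4bit",  # placeholder model
    max_seq_length = 2048,
    dtype = None,          # None = auto-detect; picks bfloat16 on an RTX 4090
    load_in_4bit = True,
)

model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 16,
    use_dora = True,  # with FA2 installed, training then dies with
                      # "RuntimeError: FlashAttention only support fp16 and bf16 data type"
)
```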

danielhanchen (Contributor) commented

@rohhro Sorry on the delay! Did you use bf16 = True or fp16 = True in the trainer?


rohhro commented Sep 14, 2024

> @rohhro Sorry on the delay! Did you use bf16 = True or fp16 = True in the trainer?

I have tried both bf16 = True and fp16 = True.
Same error in both cases.
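
Both attempts were plain trainer-flag changes, along these lines (a sketch; all other arguments were left unchanged):

```python
from transformers import TrainingArguments

# Attempt 1: bf16 -- fails with the FlashAttention dtype error
args = TrainingArguments(output_dir = "outputs", bf16 = True, fp16 = False)

# Attempt 2: fp16 -- fails with the same error
args = TrainingArguments(output_dir = "outputs", bf16 = False, fp16 = True)
```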
