Fails to load saved model : Trying to set a tensor of shape torch.Size([1376, 4096]) in "qweight" (which has shape torch.Size([4096, 1376])), this look incorrect. #1407

kranipa · 2024-03-21T15:58:29Z

Loading a saved quantized model fails with the error below.
Quantizing and saving the model also takes a very long time.

2024-03-21 08:48:58 [INFO] loading weights file models/4_bit_llama2-rtn/model.safetensors
2024-03-21 08:48:58 [ERROR] Trying to set a tensor of shape torch.Size([1376, 4096]) in "qweight" (which has shape torch.Size([4096, 1376])), this look incorrect.
2024-03-21 08:48:58 [ERROR] Saved low bit model loading failed, please check your model.
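The two shapes in the error are transposes of each other, so one way to narrow this down is to check which layout was actually written to disk. The following is a minimal sketch, not part of the original report, that reads only the header of a `.safetensors` file (an 8-byte little-endian length followed by a JSON header) and lists the shapes of tensors whose names end in `qweight`. The helper names `read_safetensors_shapes` and `qweight_shapes` are hypothetical.

```python
import json
import struct


def read_safetensors_shapes(path):
    """Return {tensor_name: shape} from a .safetensors header without loading weights."""
    with open(path, "rb") as f:
        # The first 8 bytes hold the JSON header length as an unsigned little-endian integer.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional free-form entry, not a tensor.
    return {
        name: tuple(info["shape"])
        for name, info in header.items()
        if name != "__metadata__"
    }


def qweight_shapes(path):
    """Keep only the packed weight tensors (hypothetical filter on the name suffix)."""
    shapes = read_safetensors_shapes(path)
    return {name: shape for name, shape in shapes.items() if name.endswith("qweight")}
```

For example, `qweight_shapes("models/4_bit_llama2-rtn/model.safetensors")` would show whether the stored tensors use the layout that the loader says it is receiving.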

I tried the following example.

import torch
from intel_extension_for_transformers.transformers import AutoModelForCausalLM, RtnConfig, GPTQConfig, AwqConfig

model_path = "meta-llama/Llama-2-7b-chat-hf" # your_pytorch_model_path_or_HF_model_name
saved_dir = "models/4_bit_llama2-rtn" # your_saved_model_dir
#model_path  = "Intel/neural-chat-7b-v3-3" 
#saved_dir = "models/4_bit_neural_chat_7b-v3-3-rtn"
# quant
woq_config = RtnConfig(bits=4, compute_dtype="int8", scale_dtype='fp32', group_size=32)
model = AutoModelForCausalLM.from_pretrained(model_path, 
                                            device_map='cpu',
                                            torch_dtype=torch.float16,
                                            quantization_config=woq_config, 
                                            trust_remote_code=True,
                                            use_neural_speed=False)
# save quant model
model.save_pretrained(saved_dir)
# load quant model
loaded_model = AutoModelForCausalLM.from_pretrained(saved_dir,trust_remote_code = True)

intel-extension-for-transformers==1.4rc2.dev8+g494a5712fa2
neural-compressor==2.4.1
neural-speed==0.4.dev21+g0ec1a6e


intellinjun · 2024-03-26T00:44:06Z

model = AutoModelForCausalLM.from_pretrained(model_path, device_map='cpu', torch_dtype=torch.float16, quantization_config=woq_config, trust_remote_code=True, use_neural_speed=False)
Do you want to use neural_speed? If so, try setting use_neural_speed=True.

kranipa · 2024-03-26T13:26:56Z

Thank you for the response.

With use_neural_speed=True, saving does not work.

I get the following error:

AttributeError: 'Model' object has no attribute 'save_pretrained'

Can you share an example of how to save a quantized model (the Model object) with neural_speed?

kevinintel · 2024-03-26T14:07:12Z

It looks like a load/save mismatch. Can you try the latest commit instead of g494a5712fa2 and set use_neural_speed=False?

kranipa · 2024-03-28T10:41:09Z

Hi, thank you. Saving works now; however, loading the saved model leads to the following error:


    raise ValueError(
ValueError: Unknown quantization type, got rtn - supported types are: ['awq', 'bitsandbytes_4bit', 'bitsandbytes_8bit', 'gptq', 'aqlm']

The following is the code snippet:

import torch
from intel_extension_for_transformers.transformers import AutoModelForCausalLM, RtnConfig, GPTQConfig, AwqConfig


model_path = "meta-llama/Llama-2-7b-chat-hf" # your_pytorch_model_path_or_HF_model_name
saved_dir = "models/4_bit_llama2-rtn" # your_saved_model_dir
#model_path  = "Intel/neural-chat-7b-v3-3" 
#saved_dir = "models/4_bit_neural_chat_7b-v3-3-rtn"
# quant
woq_config = RtnConfig(bits=4)
model = AutoModelForCausalLM.from_pretrained(model_path, 
                                            device_map='cpu',
                                            #torch_dtype=torch.float16,
                                            quantization_config=woq_config, 
                                            trust_remote_code=True,
                                            use_neural_speed=False)
# save quant model
model.save_pretrained(saved_dir)
#load quant model
loaded_model = AutoModelForCausalLM.from_pretrained(saved_dir,trust_remote_code = True)
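The ValueError above names `rtn` as the quantization type it was given, which suggests the value came from the saved configuration. The following is a minimal sketch for inspecting that value, assuming `save_pretrained` wrote a `config.json` containing a `quantization_config` section with a `quant_method` key (the usual Hugging Face layout, not confirmed in this thread). The helper name `saved_quant_method` is hypothetical.

```python
import json
from pathlib import Path


def saved_quant_method(saved_dir):
    """Return the quant_method recorded in <saved_dir>/config.json, or None if absent.

    Assumes the Hugging Face layout: a "quantization_config" dict in config.json.
    """
    config_path = Path(saved_dir) / "config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    quant_cfg = config.get("quantization_config") or {}
    return quant_cfg.get("quant_method")
```

Calling `saved_quant_method("models/4_bit_llama2-rtn")` shows what the loader will see. If it returns `rtn` and the load still fails with the error above, the loader that handled the request apparently does not recognize that type, which would be consistent with a package version mismatch.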

PenghuiCheng · 2024-04-04T07:41:04Z

@kranipa, this issue is caused by a version mismatch between ITREX and neural-compressor. Please try again with neural-compressor version 2.5.1. ITREX 1.4 has now been released, so please try that as well. Thanks very much.
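Since the suggested fix depends on which package versions are installed, it can help to print them from the same environment that runs the script. This is a minimal sketch using only the standard library; the package list is an assumption based on the versions quoted earlier in this thread, plus `transformers` and `torch`.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as they would appear in `pip list` (assumed for this environment).
PACKAGES = (
    "intel-extension-for-transformers",
    "neural-compressor",
    "neural-speed",
    "transformers",
    "torch",
)


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if not installed."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions().items():
    print(f"{name}=={ver}" if ver else f"{name}: not installed")
```

Posting this output together with the reproduction script makes it easier to tell whether the ITREX and neural-compressor versions match.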

kranipa · 2024-04-15T16:36:12Z

Okay, thank you.

PhzCode · 2024-05-31T02:03:48Z

@kranipa Did you get it to run? I'm having the same problem.

PenghuiCheng · 2024-06-20T04:10:28Z

@PhzCode, could you post your code so I can try to reproduce it? Thanks very much.

kevinintel assigned intellinjun Mar 25, 2024
