🐛 Bug

The error below seems to be related to pixel_values being padded by XLA on TPU.
WARNING:root:libtpu.so and TPU device found. Setting PJRT_DEVICE=TPU.
config.json: 100%|█████████████████████████████████████████████████████████| 3.95k/3.95k [00:00<00:00, 23.8MB/s]
configuration_internvl_chat.py: 100%|██████████████████████████████████████| 3.85k/3.85k [00:00<00:00, 26.1MB/s]
configuration_intern_vit.py: 100%|█████████████████████████████████████████| 5.55k/5.55k [00:00<00:00, 29.6MB/s]
A new version of the following files was downloaded from https://huggingface.co/radna/XLA-InternVL2-8B:
- configuration_intern_vit.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
configuration_internlm2.py: 100%|██████████████████████████████████████████| 7.00k/7.00k [00:00<00:00, 40.8MB/s]
A new version of the following files was downloaded from https://huggingface.co/radna/XLA-InternVL2-8B:
- configuration_internlm2.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
modeling_internvl_chat.py: 100%|███████████████████████████████████████████| 16.3k/16.3k [00:00<00:00, 70.4MB/s]
modeling_internlm2.py: 100%|███████████████████████████████████████████████| 61.2k/61.2k [00:00<00:00, 77.4MB/s]
A new version of the following files was downloaded from https://huggingface.co/radna/XLA-InternVL2-8B:
- modeling_internlm2.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
conversation.py: 100%|█████████████████████████████████████████████████████| 15.0k/15.0k [00:00<00:00, 82.2MB/s]
A new version of the following files was downloaded from https://huggingface.co/radna/XLA-InternVL2-8B:
- conversation.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
modeling_intern_vit.py: 100%|██████████████████████████████████████████████| 18.1k/18.1k [00:00<00:00, 75.5MB/s]
A new version of the following files was downloaded from https://huggingface.co/radna/XLA-InternVL2-8B:
- modeling_intern_vit.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
FlashAttention2 is not installed.
model.safetensors.index.json: 100%|█████████████████████████████████████████| 51.2k/51.2k [00:00<00:00, 637kB/s]
model-00001-of-00004.safetensors: 100%|████████████████████████████████████| 4.94G/4.94G [08:10<00:00, 10.1MB/s]
model-00002-of-00004.safetensors: 100%|████████████████████████████████████| 4.92G/4.92G [01:28<00:00, 55.8MB/s]
model-00003-of-00004.safetensors: 100%|████████████████████████████████████| 4.92G/4.92G [01:25<00:00, 57.2MB/s]
model-00004-of-00004.safetensors: 100%|████████████████████████████████████| 1.38G/1.38G [00:34<00:00, 39.8MB/s]
Downloading shards: 100%|████████████████████████████████████████████████████████| 4/4 [11:40<00:00, 175.20s/it]
Warning: Flash attention is not available, using eager attention instead.
Loading checkpoint shards: 100%|██████████████████████████████████████████████████| 4/4 [00:00<00:00, 5.86it/s]
generation_config.json: 100%|███████████████████████████████████████████████████| 115/115 [00:00<00:00, 679kB/s]
tokenizer_config.json: 100%|███████████████████████████████████████████████| 4.00k/4.00k [00:00<00:00, 26.3MB/s]
tokenization_internlm2.py: 100%|███████████████████████████████████████████| 8.79k/8.79k [00:00<00:00, 54.9MB/s]
A new version of the following files was downloaded from https://huggingface.co/radna/XLA-InternVL2-8B:
- tokenization_internlm2.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
tokenizer.model: 100%|█████████████████████████████████████████████████████| 1.48M/1.48M [00:00<00:00, 13.1MB/s]
added_tokens.json: 100%|███████████████████████████████████████████████████████| 179/179 [00:00<00:00, 2.05MB/s]
special_tokens_map.json: 100%|█████████████████████████████████████████████████| 844/844 [00:00<00:00, 8.83MB/s]
Traceback (most recent call last):
File "/home/kojoe/EasyAnimate/easyanimate/image_caption/template.py", line 132, in <module>
response = model.chat(tokenizer, pixel_values, question, generation_config)
File "/dev/shm/modules/transformers_modules/radna/XLA-InternVL2-8B/746cd35e611234c48f8dc5c61dbe30b5a782a208/modeling_internvl_chat.py", line 356, in chat
generation_output = self.generate(
File "/home/kojoe/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/dev/shm/modules/transformers_modules/radna/XLA-InternVL2-8B/746cd35e611234c48f8dc5c61dbe30b5a782a208/modeling_internvl_chat.py", line 410, in generate
outputs = self.language_model.generate(
File "/home/kojoe/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 2024, in generate
result = self._sample(
File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 3038, in _sample
unfinished_sequences = unfinished_sequences & ~stopping_criteria(input_ids, scores)
File "/usr/local/lib/python3.10/dist-packages/transformers/generation/stopping_criteria.py", line 511, in __call__
is_done = is_done | criteria(input_ids, scores, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/generation/stopping_criteria.py", line 502, in __call__
is_done = torch.isin(input_ids[:, -1], self.eos_token_id)
RuntimeError: Bad StatusOr access: RESOURCE_EXHAUSTED: XLA:TPU compile permanent error. Ran out of memory in memory space vmem. Used 29.95M of 16.00M vmem. Exceeded vmem capacity by 13.95M.
Program vmem requirement 29.95M:
scoped 29.95M
Largest program allocations in vmem:
1. Size: 29.66M
XLA label: register allocator spill slots call depth 2
Allocation type: scoped
==========================
2. Size: 64.0K
Shape: f32[128,3]{1,0}
Unpadded size: 1.5K
Extra memory due to padding: 62.5K (42.7x expansion)
XLA label: reduce-window.8 = reduce-window(bitcast.1020, bitcast.1021, constant.3067, constant.3067), window={size=1x1x128 pad=0_0x0_0x127_0}, to_apply=AddComputation.5421.clone
Allocation type: scoped
==========================
3. Size: 64.0K
Shape: f32[128,3]{1,0}
Unpadded size: 1.5K
Extra memory due to padding: 62.5K (42.7x expansion)
XLA label: reduce-window.8 = reduce-window(bitcast.1020, bitcast.1021, constant.3067, constant.3067), window={size=1x1x128 pad=0_0x0_0x127_0}, to_apply=AddComputation.5421.clone
Allocation type: scoped
==========================
4. Size: 64.0K
Shape: f32[128,3]{1,0}
Unpadded size: 1.5K
Extra memory due to padding: 62.5K (42.7x expansion)
XLA label: reduce-window.8 = reduce-window(bitcast.1020, bitcast.1021, constant.3067, constant.3067), window={size=1x1x128 pad=0_0x0_0x127_0}, to_apply=AddComputation.5421.clone
Allocation type: scoped
==========================
5. Size: 64.0K
Shape: f32[128,3]{1,0}
Unpadded size: 1.5K
Extra memory due to padding: 62.5K (42.7x expansion)
XLA label: reduce-window.8 = reduce-window(bitcast.1020, bitcast.1021, constant.3067, constant.3067), window={size=1x1x128 pad=0_0x0_0x127_0}, to_apply=AddComputation.5421.clone
Allocation type: scoped
==========================
6. Size: 4.0K
Shape: u8[4096]{0}
Unpadded size: 4.0K
XLA label: reduce-window.8 = reduce-window(bitcast.1020, bitcast.1021, constant.3067, constant.3067), window={size=1x1x128 pad=0_0x0_0x127_0}, to_apply=AddComputation.5421.clone
Allocation type: scoped
==========================
7. Size: 4.0K
Shape: u8[4096]{0}
Unpadded size: 4.0K
XLA label: reduce-window.8 = reduce-window(bitcast.1020, bitcast.1021, constant.3067, constant.3067), window={size=1x1x128 pad=0_0x0_0x127_0}, to_apply=AddComputation.5421.clone
Allocation type: scoped
==========================
8. Size: 4.0K
Shape: u8[4096]{0}
Unpadded size: 4.0K
XLA label: reduce-window.8 = reduce-window(bitcast.1020, bitcast.1021, constant.3067, constant.3067), window={size=1x1x128 pad=0_0x0_0x127_0}, to_apply=AddComputation.5421.clone
Allocation type: scoped
==========================
9. Size: 4.0K
Shape: u8[4096]{0}
Unpadded size: 4.0K
XLA label: reduce-window.8 = reduce-window(bitcast.1020, bitcast.1021, constant.3067, constant.3067), window={size=1x1x128 pad=0_0x0_0x127_0}, to_apply=AddComputation.5421.clone
Allocation type: scoped
==========================
10. Size: 2.0K
Shape: u8[2048]{0}
Unpadded size: 2.0K
XLA label: reduce-window.8 = reduce-window(bitcast.1020, bitcast.1021, constant.3067, constant.3067), window={size=1x1x128 pad=0_0x0_0x127_0}, to_apply=AddComputation.5421.clone
Allocation type: scoped
==========================
11. Size: 2.0K
Shape: u8[2048]{0}
Unpadded size: 2.0K
XLA label: reduce-window.8 = reduce-window(bitcast.1020, bitcast.1021, constant.3067, constant.3067), window={size=1x1x128 pad=0_0x0_0x127_0}, to_apply=AddComputation.5421.clone
Allocation type: scoped
==========================
12. Size: 2.0K
Shape: u8[2048]{0}
Unpadded size: 2.0K
XLA label: reduce-window.8 = reduce-window(bitcast.1020, bitcast.1021, constant.3067, constant.3067), window={size=1x1x128 pad=0_0x0_0x127_0}, to_apply=AddComputation.5421.clone
Allocation type: scoped
==========================
13. Size: 2.0K
Shape: u8[2048]{0}
Unpadded size: 2.0K
XLA label: reduce-window.8 = reduce-window(bitcast.1020, bitcast.1021, constant.3067, constant.3067), window={size=1x1x128 pad=0_0x0_0x127_0}, to_apply=AddComputation.5421.clone
Allocation type: scoped
==========================
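For what it's worth, the reported 42.7x expansion is consistent with XLA padding the minor dimension of each f32[128,3] buffer from 3 up to the 128-wide TPU lane, so each tensor is stored as f32[128,128]. A quick back-of-the-envelope check (my reading of the allocation report, assuming 4-byte floats):

```python
# f32[128,3]: 128 * 3 elements * 4 bytes each
unpadded = 128 * 3 * 4    # 1536 bytes = 1.5K, matching "Unpadded size: 1.5K"

# padded to f32[128,128] to fill the 128-wide lane dimension
padded = 128 * 128 * 4    # 65536 bytes = 64.0K, matching "Size: 64.0K"

print(padded / unpadded)  # ~42.67, matching "(42.7x expansion)"
```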
To Reproduce

Steps to reproduce the behavior: run the following template.py:
import math
import numpy as np
import torch
import torchvision.transforms as T
from decord import VideoReader, cpu
from PIL import Image
from torchvision.transforms.functional import InterpolationMode
from transformers import AutoModel, AutoTokenizer
import os
import torch_xla
import torch_xla.distributed.spmd as xs
import torch_xla.core.xla_model as xm
from torch_xla import runtime as xr

xr.use_spmd(auto=False)

from torch_xla.experimental.spmd_fully_sharded_data_parallel import (
    _prepare_spmd_partition_spec,
    SpmdFullyShardedDataParallel as FSDPv2,
)

IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def build_transform(input_size):
    MEAN, STD = IMAGENET_MEAN, IMAGENET_STD
    transform = T.Compose([
        T.Lambda(lambda img: img.convert('RGB') if img.mode != 'RGB' else img),
        T.Resize((input_size, input_size), interpolation=InterpolationMode.BICUBIC),
        T.ToTensor(),
        T.Normalize(mean=MEAN, std=STD)
    ])
    return transform


def find_closest_aspect_ratio(aspect_ratio, target_ratios, width, height, image_size):
    best_ratio_diff = float('inf')
    best_ratio = (1, 1)
    area = width * height
    for ratio in target_ratios:
        target_aspect_ratio = ratio[0] / ratio[1]
        ratio_diff = abs(aspect_ratio - target_aspect_ratio)
        if ratio_diff < best_ratio_diff:
            best_ratio_diff = ratio_diff
            best_ratio = ratio
        elif ratio_diff == best_ratio_diff:
            if area > 0.5 * image_size * image_size * ratio[0] * ratio[1]:
                best_ratio = ratio
    return best_ratio


def dynamic_preprocess(image, min_num=1, max_num=12, image_size=448, use_thumbnail=False):
    orig_width, orig_height = image.size
    aspect_ratio = orig_width / orig_height

    # enumerate the candidate tile grids allowed by min_num/max_num
    target_ratios = set(
        (i, j) for n in range(min_num, max_num + 1) for i in range(1, n + 1) for j in range(1, n + 1) if
        i * j <= max_num and i * j >= min_num)
    target_ratios = sorted(target_ratios, key=lambda x: x[0] * x[1])

    # find the closest aspect ratio to the target
    target_aspect_ratio = find_closest_aspect_ratio(
        aspect_ratio, target_ratios, orig_width, orig_height, image_size)

    # calculate the target width and height
    target_width = image_size * target_aspect_ratio[0]
    target_height = image_size * target_aspect_ratio[1]
    blocks = target_aspect_ratio[0] * target_aspect_ratio[1]

    # resize the image
    resized_img = image.resize((target_width, target_height))
    processed_images = []
    for i in range(blocks):
        box = (
            (i % (target_width // image_size)) * image_size,
            (i // (target_width // image_size)) * image_size,
            ((i % (target_width // image_size)) + 1) * image_size,
            ((i // (target_width // image_size)) + 1) * image_size
        )
        # split the image
        split_img = resized_img.crop(box)
        processed_images.append(split_img)
    assert len(processed_images) == blocks
    if use_thumbnail and len(processed_images) != 1:
        thumbnail_img = image.resize((image_size, image_size))
        processed_images.append(thumbnail_img)
    return processed_images


def load_image(image_file, input_size=448, max_num=12):
    image = Image.open(image_file).convert('RGB')
    transform = build_transform(input_size=input_size)
    images = dynamic_preprocess(image, image_size=input_size, use_thumbnail=True, max_num=max_num)
    pixel_values = [transform(image) for image in images]
    pixel_values = torch.stack(pixel_values)
    return pixel_values


path = 'radna/XLA-InternVL2-8B'
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    use_flash_attn=True,
    trust_remote_code=True,
).eval()

# Define the mesh and partition_spec
num_devices = xr.global_runtime_device_count()
mesh_shape = (num_devices, 1)
device_ids = np.array(range(num_devices))

# Note: the mesh must have an axis named 'fsdp', which the weights and activations will be sharded on.
mesh = xs.Mesh(device_ids, mesh_shape, ("fsdp", "model"))
xs.set_global_mesh(mesh)

model = FSDPv2(model)
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True, use_fast=False)

# set the max number of tiles in `max_num`
pixel_values = load_image('./image1.jpg', max_num=1).to(torch.bfloat16).to(xm.xla_device())
generation_config = dict(max_new_tokens=1024, do_sample=True)
xs.mark_sharding(pixel_values, xs.get_global_mesh(), _prepare_spmd_partition_spec(pixel_values, shard_maximal=True))

# single-image, single-round conversation
question = '<image>\nPlease describe the image shortly.'
response = model.chat(tokenizer, pixel_values, question, generation_config)
print(f'User: {question}\nAssistant: {response}')
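Incidentally, the tile-grid selection in dynamic_preprocess makes the leading dimension of pixel_values data-dependent, which may be where the suspected padding comes from. A condensed sketch of that selection (tile_grid is a hypothetical helper combining the ratio enumeration with find_closest_aspect_ratio, ignoring the area tie-break):

```python
def tile_grid(width, height, min_num=1, max_num=12):
    # Candidate (cols, rows) grids whose tile count lies in [min_num, max_num],
    # ordered by tile count as in dynamic_preprocess.
    ratios = sorted(
        {(i, j)
         for n in range(min_num, max_num + 1)
         for i in range(1, n + 1)
         for j in range(1, n + 1)
         if min_num <= i * j <= max_num},
        key=lambda r: r[0] * r[1])
    aspect = width / height
    # Pick the grid whose aspect ratio is closest to the image's.
    return min(ratios, key=lambda r: abs(aspect - r[0] / r[1]))

print(tile_grid(800, 600))             # a 4:3 image maps to a (4, 3) grid -> 12 tiles
print(tile_grid(800, 600, max_num=1))  # with max_num=1 only (1, 1) is possible -> 1 tile
```

So with max_num=1, as in the script above, load_image always returns a single 448x448 tile and pixel_values has shape (1, 3, 448, 448); larger max_num values yield an image-dependent tile count.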
Expected behavior

Should run the modified XLA version of the InternVL2-8B model at https://huggingface.co/radna/XLA-InternVL2-8B.

Environment

Additional context

Reproducible on TPU v2 and v3.