Error building extensions #699

lightandshadow68 · 2024-08-22T18:15:28Z

I'm attempting to install BasicSR via pip and conda.

After setting the environment variable that enables compiling the CUDA extensions, I get the following error during the build:

```
The detected CUDA version (11.5) mismatches the version that was used to compile
  PyTorch (12.1). Please make sure to use the same CUDA versions.
```
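This message comes from PyTorch's extension builder (`torch.utils.cpp_extension`), which compares the CUDA toolkit it finds with the CUDA version PyTorch was built with (`torch.version.cuda`). The sketch below reproduces that comparison so it can be checked before starting a long build. It is an approximation, not PyTorch's own code: PyTorch locates the toolkit through `CUDA_HOME` first, while this sketch simply uses `nvcc` on `PATH`, and the helper names (`parse_nvcc_release`, `parse_torch_cuda`, `classify`) are hypothetical. In recent PyTorch releases a major-version mismatch is fatal and a minor-version mismatch only warns; the sketch mirrors that rule.

```python
import re
import shutil
import subprocess
from typing import Tuple


def parse_nvcc_release(nvcc_output: str) -> Tuple[int, int]:
    """Extract (major, minor) from `nvcc --version` text such as 'release 11.5, V11.5.119'."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("no 'release X.Y' found in nvcc output")
    return int(match.group(1)), int(match.group(2))


def parse_torch_cuda(version: str) -> Tuple[int, int]:
    """Turn torch.version.cuda (e.g. '12.1') into (major, minor)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def classify(toolkit: Tuple[int, int], torch_cuda: Tuple[int, int]) -> str:
    """Approximation of PyTorch's rule: a major mismatch fails, a minor one only warns."""
    if toolkit[0] != torch_cuda[0]:
        return "error"
    if toolkit != torch_cuda:
        return "warning"
    return "ok"


def main() -> None:
    try:
        import torch  # imported lazily so the helpers above work without PyTorch
    except ImportError:
        print("PyTorch is not installed")
        return
    if torch.version.cuda is None:
        print("This PyTorch build has no CUDA support")
        return
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        print("nvcc not found on PATH")
        return
    out = subprocess.run([nvcc, "--version"], capture_output=True, text=True, check=True).stdout
    toolkit = parse_nvcc_release(out)
    torch_cuda = parse_torch_cuda(torch.version.cuda)
    print(f"nvcc {toolkit} vs PyTorch {torch_cuda}: {classify(toolkit, torch_cuda)}")


if __name__ == "__main__":
    main()
```

With the versions in this issue (nvcc 11.5, PyTorch built with CUDA 12.1), `classify((11, 5), (12, 1))` returns `"error"`, which matches the failed build.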

However, when I check the installed driver and CUDA versions, I get:

```
(ml) ubuntu@ip-10-0-83-223:~$ nvidia-smi
Thu Aug 22 18:07:23 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01             Driver Version: 535.183.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla T4                       Off | 00000000:00:1E.0 Off |                    0 |
| N/A   26C    P8               9W /  70W |      2MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
(ml) ubuntu@ip-10-0-83-223:~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0
```

What's odd is that nvcc and nvidia-smi do not seem to agree on the installed CUDA version. Or is one of them reporting the toolkit version, which is different from the actual CUDA API version?
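The two tools report different things. The `CUDA Version` in the `nvidia-smi` header is the newest CUDA version the installed driver supports (12.2 for driver 535.183.01 here), while `nvcc --version` reports the release of the CUDA toolkit (compiler) that is installed (11.5 here). A driver generally runs code built with toolkits up to its reported version, so a 12.2 driver with an 11.5 toolkit is a valid combination; the build error is about the toolkit differing from PyTorch's CUDA 12.1, not about the driver. The hypothetical helpers below parse the `nvidia-smi` header and check whether a toolkit release is within what the driver supports, using the simple "toolkit not newer than driver" rule as an assumption.

```python
import re
from typing import Tuple


def parse_smi_cuda(smi_output: str) -> Tuple[int, int]:
    """Read the 'CUDA Version: X.Y' field from the nvidia-smi header."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)
    if match is None:
        raise ValueError("no 'CUDA Version' field in nvidia-smi output")
    return int(match.group(1)), int(match.group(2))


def driver_supports(driver_cuda: Tuple[int, int], toolkit: Tuple[int, int]) -> bool:
    """True if the toolkit release is not newer than the driver's supported CUDA version."""
    return toolkit <= driver_cuda


if __name__ == "__main__":
    # Header line copied from the nvidia-smi output in this issue.
    header = "| NVIDIA-SMI 535.183.01             Driver Version: 535.183.01   CUDA Version: 12.2     |"
    driver_cuda = parse_smi_cuda(header)
    print(driver_cuda, driver_supports(driver_cuda, (11, 5)))  # prints: (12, 2) True
```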


lightandshadow68 · 2024-08-22T22:52:42Z

It looks like this was caused by manually installing the NVIDIA drivers and CUDA Toolkit on an EC2 instance created from a base AMI.
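When the mismatch comes from a hand-installed toolkit, one option is to point the build at a toolkit whose version matches `torch.version.cuda`. The sketch below assumes toolkits are installed side by side as `/usr/local/cuda-X.Y` (the usual layout of NVIDIA's Linux installers) and relies on `torch.utils.cpp_extension` reading `CUDA_HOME` before falling back to `nvcc` on `PATH`. It does not cover toolkits installed elsewhere, for example a distribution package that puts `nvcc` in `/usr/bin`. The helper names are hypothetical, and the script only prints `export` lines rather than changing anything.

```python
import re
from pathlib import Path
from typing import Dict, Optional


def installed_toolkits(root: str = "/usr/local") -> Dict[str, Path]:
    """Map 'X.Y' to root/cuda-X.Y for every such directory that contains bin/nvcc."""
    found: Dict[str, Path] = {}
    base = Path(root)
    if not base.is_dir():
        return found
    for entry in sorted(base.iterdir()):
        match = re.fullmatch(r"cuda-(\d+\.\d+)", entry.name)
        if match and (entry / "bin" / "nvcc").is_file():
            found[match.group(1)] = entry
    return found


def pick_toolkit(torch_cuda: str, root: str = "/usr/local") -> Optional[Path]:
    """Return the toolkit whose X.Y matches torch.version.cuda (e.g. '12.1'), if any."""
    major_minor = ".".join(torch_cuda.split(".")[:2])
    return installed_toolkits(root).get(major_minor)


def main() -> None:
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
        return
    if torch.version.cuda is None:
        print("This PyTorch build has no CUDA support")
        return
    toolkit = pick_toolkit(torch.version.cuda)
    if toolkit is None:
        available = ", ".join(installed_toolkits()) or "none"
        print(f"No toolkit for CUDA {torch.version.cuda} (found: {available})")
    else:
        # Run these in the shell that performs the BasicSR build; the PATH line is optional.
        print(f"export CUDA_HOME={toolkit}")
        print(f"export PATH={toolkit / 'bin'}:$PATH")


if __name__ == "__main__":
    main()
```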

When I use an Ubuntu PyTorch AMI to create the EC2 instance, the nvcc version matches, but now I get an error when BasicSR imports torchvision.transforms.functional_tensor:

```
  File "/home/ubuntu/ml/GFPGAN/overlap_fb_retouch.py", line 6, in <module>
    from gfpgan import GFPGANer
  File "/home/ubuntu/ml/GFPGAN/gfpgan/__init__.py", line 2, in <module>
    from .archs import *
  File "/home/ubuntu/ml/GFPGAN/gfpgan/archs/__init__.py", line 2, in <module>
    from basicsr.utils import scandir
  File "/opt/conda/lib/python3.10/site-packages/basicsr/__init__.py", line 4, in <module>
    from .data import *
  File "/opt/conda/lib/python3.10/site-packages/basicsr/data/__init__.py", line 22, in <module>
    _dataset_modules = [importlib.import_module(f'basicsr.data.{file_name}') for file_name in dataset_filenames]
  File "/opt/conda/lib/python3.10/site-packages/basicsr/data/__init__.py", line 22, in <listcomp>
    _dataset_modules = [importlib.import_module(f'basicsr.data.{file_name}') for file_name in dataset_filenames]
  File "/opt/conda/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/opt/conda/lib/python3.10/site-packages/basicsr/data/realesrgan_dataset.py", line 11, in <module>
    from basicsr.data.degradations import circular_lowpass_kernel, random_mixed_kernels
  File "/opt/conda/lib/python3.10/site-packages/basicsr/data/degradations.py", line 8, in <module>
    from torchvision.transforms.functional_tensor import rgb_to_grayscale
ModuleNotFoundError: No module named 'torchvision.transforms.functional_tensor'
```

Seems related to: TencentARC/GFPGAN#539
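The traceback ends in BasicSR's `degradations.py`, which imports `rgb_to_grayscale` from `torchvision.transforms.functional_tensor`. To my knowledge, torchvision deprecated that module in 0.15 and removed it in 0.17, while `rgb_to_grayscale` remains available from `torchvision.transforms.functional`. One common workaround (not taken from this thread) is to edit that import line in the installed `degradations.py`. Another, sketched below, registers an alias in `sys.modules` before BasicSR is first imported. The helper name `alias_missing_module` is hypothetical, and the alias only helps because the single function BasicSR needs from the old module also exists in the replacement.

```python
import importlib
import sys
from types import ModuleType


def alias_missing_module(missing: str, replacement: str) -> ModuleType:
    """Make `import missing` resolve to `replacement` when `missing` cannot be imported."""
    try:
        return importlib.import_module(missing)  # old torchvision: the real module still exists
    except ModuleNotFoundError:
        module = importlib.import_module(replacement)
        sys.modules[missing] = module  # later imports of `missing` find this entry
        return module


if __name__ == "__main__":
    try:
        alias_missing_module(
            "torchvision.transforms.functional_tensor",
            "torchvision.transforms.functional",
        )
    except ModuleNotFoundError:
        print("torchvision is not installed")
    else:
        # This must run before the first `import basicsr` (e.g. before `from gfpgan import GFPGANer`).
        from torchvision.transforms.functional_tensor import rgb_to_grayscale
        print(rgb_to_grayscale.__module__)
```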
