Error building extensions #699

Open
lightandshadow68 opened this issue Aug 22, 2024 · 1 comment

@lightandshadow68

I'm attempting to install BasicSR via pip and conda.

After setting the environment variable that enables extension compilation, the build fails with the following error:

 The detected CUDA version (11.5) mismatches the version that was used to compile
  PyTorch (12.1). Please make sure to use the same CUDA versions.

However, when I check the installed driver and CUDA versions, I get:

(ml) ubuntu@ip-10-0-83-223:~$ nvidia-smi
Thu Aug 22 18:07:23 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01             Driver Version: 535.183.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla T4                       Off | 00000000:00:1E.0 Off |                    0 |
| N/A   26C    P8               9W /  70W |      2MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
(ml) ubuntu@ip-10-0-83-223:~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0

What's odd is that nvcc and nvidia-smi do not agree on the installed CUDA version. Or is nvcc reporting the toolkit version, which can differ from the CUDA API version reported by the driver?
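For context, the "CUDA Version" shown by nvidia-smi is the newest CUDA version the installed driver supports, while nvcc reports the version of the installed toolkit, so the two can legitimately differ. The build error compares the toolkit version with the version PyTorch was built against. Below is a minimal sketch for checking those two values side by side. The helper names are hypothetical, and the assumption that only the major version matters for this error is my reading of the message, not something confirmed in this thread.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(nvcc_output):
    """Return (major, minor) parsed from `nvcc --version` text, or None."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def toolkit_cuda_version():
    """Version of the CUDA toolkit whose nvcc is first on PATH, or None."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run(
        [nvcc, "--version"], capture_output=True, text=True, check=False
    )
    return parse_nvcc_release(result.stdout)


def torch_cuda_version():
    """CUDA version PyTorch was built with, or None (CPU build or no torch)."""
    try:
        import torch
    except ImportError:
        return None
    if torch.version.cuda is None:
        return None
    major, minor = torch.version.cuda.split(".")[:2]
    return int(major), int(minor)


def describe_mismatch(toolkit, built_with):
    """Summarize whether the two versions are likely to clash (assumption:
    a differing major version is what triggers the build error)."""
    if toolkit is None or built_with is None:
        return "could not determine both versions"
    if toolkit[0] != built_with[0]:
        return (
            f"major versions differ: toolkit {toolkit[0]}.{toolkit[1]} "
            f"vs PyTorch build {built_with[0]}.{built_with[1]}"
        )
    return "major versions match"


if __name__ == "__main__":
    print(describe_mismatch(toolkit_cuda_version(), torch_cuda_version()))
```

If the versions differ, the usual options are to install a toolkit that matches the PyTorch build or to install a PyTorch build that matches the toolkit.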

lightandshadow68 commented Aug 22, 2024

Looks like this is due to manually installing the NVIDIA drivers and CUDA Toolkit on an EC2 instance created from a base AMI.

When I use an Ubuntu PyTorch AMI to create the EC2 instance, the nvcc version matches, but now I get an error when BasicSR imports torchvision.transforms.functional_tensor:

  File "/home/ubuntu/ml/GFPGAN/overlap_fb_retouch.py", line 6, in <module>
    from gfpgan import GFPGANer
  File "/home/ubuntu/ml/GFPGAN/gfpgan/__init__.py", line 2, in <module>
    from .archs import *
  File "/home/ubuntu/ml/GFPGAN/gfpgan/archs/__init__.py", line 2, in <module>
    from basicsr.utils import scandir
  File "/opt/conda/lib/python3.10/site-packages/basicsr/__init__.py", line 4, in <module>
    from .data import *
  File "/opt/conda/lib/python3.10/site-packages/basicsr/data/__init__.py", line 22, in <module>
    _dataset_modules = [importlib.import_module(f'basicsr.data.{file_name}') for file_name in dataset_filenames]
  File "/opt/conda/lib/python3.10/site-packages/basicsr/data/__init__.py", line 22, in <listcomp>
    _dataset_modules = [importlib.import_module(f'basicsr.data.{file_name}') for file_name in dataset_filenames]
  File "/opt/conda/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/opt/conda/lib/python3.10/site-packages/basicsr/data/realesrgan_dataset.py", line 11, in <module>
    from basicsr.data.degradations import circular_lowpass_kernel, random_mixed_kernels
  File "/opt/conda/lib/python3.10/site-packages/basicsr/data/degradations.py", line 8, in <module>
    from torchvision.transforms.functional_tensor import rgb_to_grayscale
ModuleNotFoundError: No module named 'torchvision.transforms.functional_tensor'

Seems related to: TencentARC/GFPGAN#539
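
A possible workaround, sketched under assumptions: recent torchvision releases no longer ship `torchvision.transforms.functional_tensor`, but `rgb_to_grayscale` is also exposed by `torchvision.transforms.functional`. The helper below tries the old module first and falls back to the newer one. `import_first_available` and `load_rgb_to_grayscale` are hypothetical names, not part of BasicSR or torchvision, and I have not verified this against the BasicSR version in the traceback.

```python
import importlib


def import_first_available(attr, module_names):
    """Return `attr` from the first importable module in `module_names`."""
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # Also covers ModuleNotFoundError, which subclasses ImportError.
            continue
        if hasattr(module, attr):
            return getattr(module, attr)
    raise ImportError(f"{attr!r} not found in any of: {', '.join(module_names)}")


def load_rgb_to_grayscale():
    """Resolve rgb_to_grayscale across old and new torchvision layouts."""
    return import_first_available(
        "rgb_to_grayscale",
        [
            "torchvision.transforms.functional_tensor",  # older torchvision
            "torchvision.transforms.functional",  # fallback location
        ],
    )
```

The same idea can be applied by hand: change the import on line 8 of basicsr/data/degradations.py to use `torchvision.transforms.functional`, or pin torchvision to a release that still includes `functional_tensor`.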
