chore: Run GFPGAN in docker container #103

Open: wants to merge 21 commits into base: master


mmenbawy

Why we need it:

  • Maintainers can develop in a container
  • Potential users can re-train GFPGAN in a containerized environment
  • Others can try it in an isolated environment just by pulling the image and running a container

Issue: #102

Remarks for your reviewer:

I used my personal Docker Hub account to host the Docker image. After approval and before merging, the GFPGAN project can create a free Docker Hub account and use it instead.

mmenbawy changed the title from "Run GFPGAN in docker container" to "chore: Run GFPGAN in docker container" on Nov 30, 2021

marcellodesales commented Dec 28, 2021

Hey @mmenbawy, I can't verify this PR. The build in your project is failing and I can't run the examples.

  • I rewrote the Dockerfile to use Python 3.8 correctly
  • I ran into the following error and can't run it on macOS:
GFPGAN_1  |   File "/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py", line 1436, in _write_ninja_file_and_build_library
GFPGAN_1  |     _write_ninja_file_to_build_library(
GFPGAN_1  |   File "/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py", line 1834, in _write_ninja_file_to_build_library
GFPGAN_1  |     cuda_flags = common_cflags + COMMON_NVCC_FLAGS + _get_cuda_arch_flags()
GFPGAN_1  |   File "/usr/local/lib/python3.8/dist-packages/torch/utils/cpp_extension.py", line 1606, in _get_cuda_arch_flags
GFPGAN_1  |     arch_list[-1] += '+PTX'
GFPGAN_1  | IndexError: list index out of range
gfpgan_GFPGAN_1 exited with code 1


marcellodesales commented Dec 28, 2021

Running the docker-compose build and the original image

  • Same error when running your image mostafaelmenbawy/gfpgan:latest on a regular machine:
$ docker run -ti -v $PWD/inputs:/app/inputs -v $PWD/results:/app/results -v $PWD/experiments:/app/exps mostafaelmenbawy/gfpgan:latest python3 inference_gfpgan.py --model_path /app/exps/GFPGANv1.pth --test_path /app/inputs/whole_imgs --save_root /apps/results --arch original --channel 1
No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
Traceback (most recent call last):
  File "inference_gfpgan.py", line 7, in <module>
    from basicsr.utils import imwrite
  File "/usr/local/lib/python3.6/dist-packages/basicsr/__init__.py", line 3, in <module>
    from .archs import *
  File "/usr/local/lib/python3.6/dist-packages/basicsr/archs/__init__.py", line 16, in <module>
    _arch_modules = [importlib.import_module(f'basicsr.archs.{file_name}') for file_name in arch_filenames]
  File "/usr/local/lib/python3.6/dist-packages/basicsr/archs/__init__.py", line 16, in <listcomp>
    _arch_modules = [importlib.import_module(f'basicsr.archs.{file_name}') for file_name in arch_filenames]
  File "/usr/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/usr/local/lib/python3.6/dist-packages/basicsr/archs/rrdbnet_arch.py", line 6, in <module>
    from .arch_util import default_init_weights, make_layer, pixel_unshuffle
  File "/usr/local/lib/python3.6/dist-packages/basicsr/archs/arch_util.py", line 13, in <module>
    from basicsr.ops.dcn import ModulatedDeformConvPack, modulated_deform_conv
  File "/usr/local/lib/python3.6/dist-packages/basicsr/ops/dcn/__init__.py", line 1, in <module>
    from .deform_conv import (DeformConv, DeformConvPack, ModulatedDeformConv, ModulatedDeformConvPack, deform_conv,
  File "/usr/local/lib/python3.6/dist-packages/basicsr/ops/dcn/deform_conv.py", line 19, in <module>
    os.path.join(module_path, 'src', 'deform_conv_cuda_kernel.cu'),
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/cpp_extension.py", line 1136, in load
    keep_intermediates=keep_intermediates)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/cpp_extension.py", line 1347, in _jit_compile
    is_standalone=is_standalone)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/cpp_extension.py", line 1445, in _write_ninja_file_and_build_library
    is_standalone=is_standalone)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/cpp_extension.py", line 1834, in _write_ninja_file_to_build_library
    cuda_flags = common_cflags + COMMON_NVCC_FLAGS + _get_cuda_arch_flags()
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/cpp_extension.py", line 1606, in _get_cuda_arch_flags
    arch_list[-1] += '+PTX'
IndexError: list index out of range
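Both tracebacks end in PyTorch's JIT extension builder. Below is a simplified Python model of that step. It is written for illustration and is not copied from PyTorch, and the function name `cuda_arch_list` is hypothetical. The model assumes, as these tracebacks suggest, that `torch.utils.cpp_extension` builds the architecture list from the GPUs visible to the process whenever `TORCH_CUDA_ARCH_LIST` is unset. The "No CUDA runtime is found" message shows that no GPU was visible, so the list ends up empty and indexing its last element fails.

```python
import os
from typing import List, Mapping, Optional, Sequence, Tuple


def cuda_arch_list(
    visible_capabilities: Sequence[Tuple[int, int]],
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Simplified model (not PyTorch's code) of picking CUDA archs for a JIT build."""
    env = os.environ if env is None else env
    override = env.get("TORCH_CUDA_ARCH_LIST")
    if override:
        # An explicit list wins; entries may be separated by ';' or spaces.
        return override.replace(";", " ").split()
    # Otherwise: one entry per distinct visible GPU, e.g. (7, 5) -> "7.5".
    arch_list: List[str] = []
    for major, minor in visible_capabilities:
        arch = f"{major}.{minor}"
        if arch not in arch_list:
            arch_list.append(arch)
    arch_list = sorted(arch_list)
    # With no visible GPU the list is empty, so this raises IndexError,
    # mirroring `arch_list[-1] += '+PTX'` in the tracebacks above.
    arch_list[-1] += "+PTX"
    return arch_list


try:
    cuda_arch_list([], env={})  # CPU-only host, no override set
except IndexError as exc:
    print("IndexError:", exc)  # IndexError: list index out of range
```

This model explains why an image that compiles the CUDA ops at import time only works when a GPU is actually exposed to the container.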

mmenbawy (Author)

I fixed the pipeline and the error.

The error occurred because the Docker image was meant to run on GPUs only, which is why I set the BASICSR_JIT=True environment variable at build time. I have now removed it from the image build, so the image can run on either CPU or GPU: to use the GPU, add the flag again at run time as described in the README.md.
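The run-time switch described above can be sketched as a small Python helper that builds the `docker run` command for either mode. This is a hypothetical illustration, not the README's instructions, which are not quoted in this thread. The helper names are invented. The mounts and inference arguments come from the earlier command in this thread, with the results path written as `/app/results` so that it matches the mounted volume. `--gpus all` assumes the host has the NVIDIA Container Toolkit installed. The sketch also assumes basicsr only takes its JIT-compile path when `BASICSR_JIT` is exactly the string `True`.

```python
from typing import List, Mapping

IMAGE = "mostafaelmenbawy/gfpgan:latest"  # image name taken from this thread


def basicsr_jit_enabled(env: Mapping[str, str]) -> bool:
    # Assumption: only the literal string "True" enables the JIT build.
    return env.get("BASICSR_JIT") == "True"


def docker_run_args(use_gpu: bool, cwd: str, image: str = IMAGE) -> List[str]:
    """Hypothetical helper that assembles a `docker run` argv for CPU or GPU."""
    args = [
        "docker", "run",
        # Bind mounts copied from the earlier command (absolute host paths).
        "-v", f"{cwd}/inputs:/app/inputs",
        "-v", f"{cwd}/results:/app/results",
        "-v", f"{cwd}/experiments:/app/exps",
    ]
    if use_gpu:
        # Expose host GPUs and re-enable the CUDA JIT build at run time.
        args += ["--gpus", "all", "-e", "BASICSR_JIT=True"]
    # Options must come before the image; the container command follows it.
    args += [
        image, "python3", "inference_gfpgan.py",
        "--model_path", "/app/exps/GFPGANv1.pth",
        "--test_path", "/app/inputs/whole_imgs",
        "--save_root", "/app/results",
        "--arch", "original", "--channel", "1",
    ]
    return args
```

For example, on a CPU-only host, `subprocess.run(docker_run_args(use_gpu=False, cwd=os.getcwd()), check=True)` would start the container without the JIT flag. In that mode the CUDA extension is never compiled.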

andreafalzetti

It doesn't build anymore :( Does anyone have a solution by any chance?

W: GPG error: https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY A4B469963BF863CC
E: The repository 'https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  InRelease' is not signed.

benjaminbrumbaugh

This should be merged and maintained.

andreafalzetti

Solved with a tip from @mmenbawy:

Try adding the following command right after the FROM instruction in the Dockerfile:

RUN apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/3bf863cc.pub
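The fix works because the file name in that URL is the last eight hex digits of the key ID in the GPG error: `A4B469963BF863CC` ends in `3BF863CC`. The sketch below derives the URL from the error text. It assumes NVIDIA keeps naming key files this way in the same repository layout. `key_url_from_apt_error` is a hypothetical helper and is not part of apt or any NVIDIA tooling.

```python
import re

NVIDIA_REPO = "https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64"


def key_url_from_apt_error(message: str, repo: str = NVIDIA_REPO) -> str:
    """Hypothetical helper: map apt's NO_PUBKEY error to the NVIDIA key URL."""
    match = re.search(r"NO_PUBKEY ([0-9A-Fa-f]{16})", message)
    if match is None:
        raise ValueError("no NO_PUBKEY key ID found in the message")
    # Assumption: the key file is named after the last 8 hex digits of the ID.
    short_id = match.group(1)[-8:].lower()
    return f"{repo}/{short_id}.pub"


error = (
    "W: GPG error: " + NVIDIA_REPO + "  InRelease: The following signatures "
    "couldn't be verified because the public key is not available: "
    "NO_PUBKEY A4B469963BF863CC"
)
# Prints the same RUN line as the tip above.
print(f"RUN apt-key adv --fetch-keys {key_url_from_apt_error(error)}")
```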

majedazzam

This no longer builds. Could someone please update the Dockerfile? Any help would be much appreciated. Thank you.

doniaa24

Hello everyone, does anyone have an idea how to train the GFPGAN model in a Docker container? Thanks!

8 participants