Closed
Labels: Docker (Related to docker)
Description
When running the v25 branch in Docker with GPU support, the container immediately exits on startup with a FileNotFoundError. The error originates from torch.distributed.nn.jit.instantiator when creating a temporary directory under /app/tmp/….
Creating /app/tmp manually (both on the host and inside the container) does not resolve the issue.
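The failure mode can be reproduced in isolation: if Python's tempfile module is pointed at a temp directory whose parent path does not exist at creation time, TemporaryDirectory() raises the same FileNotFoundError. A minimal sketch (the /does-not-exist path is only an illustration, not the actual container state):

```python
import tempfile

# Simulate a configured temp dir whose parent path is missing,
# e.g. when a bind mount hides a directory baked into the image.
tempfile.tempdir = "/does-not-exist/tmp"

try:
    tempfile.TemporaryDirectory()
except FileNotFoundError as exc:
    print("reproduced:", exc)
```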
Steps to reproduce
- Clean environment and clone the repo:
docker system prune -a
cd ..
rm -rf ebook2audiobook/
git clone -b v25 https://github.com/DrewThomasson/ebook2audiobook.git
cd ebook2audiobook
- Use the following docker-compose.yml (GPU enabled, building locally):
x-gpu-enabled: &gpu-enabled
devices:
- driver: nvidia
count: all
capabilities:
- gpu
x-gpu-disabled: &gpu-disabled
devices: []
services:
ebook2audiobook:
build:
context: .
args:
TORCH_VERSION: cuda128 # Available tags: [cuda121, cuda118, cuda128, rocm, xpu, cpu]
SKIP_XTTS_TEST: "true"
entrypoint: ["python", "app.py", "--script_mode", "full_docker"]
command: []
tty: true
stdin_open: true
ports:
- 7860:7860
deploy:
resources:
reservations:
<<: *gpu-enabled
limits: {}
volumes:
- ./:/app
- Build and start:
docker compose up -d
docker compose logs -f
Actual behavior
The container prints:
v25.11.11 full_docker mode
Traceback (most recent call last):
File "/app/app.py", line 495, in <module>
main()
File "/app/app.py", line 378, in main
import lib.functions as f
File "/app/lib/functions.py", line 48, in <module>
from lib.classes.voice_extractor import VoiceExtractor
File "/app/lib/classes/voice_extractor.py", line 17, in <module>
from lib.classes.background_detector import BackgroundDetector
File "/app/lib/classes/background_detector.py", line 5, in <module>
from pyannote.audio import Model
File "/usr/local/lib/python3.12/site-packages/pyannote/audio/__init__.py", line 29, in <module>
from .core.inference import Inference
File "/usr/local/lib/python3.12/site-packages/pyannote/audio/core/inference.py", line 33, in <module>
from pytorch_lightning.utilities.memory import is_oom_error
File "/usr/local/lib/python3.12/site-packages/pytorch_lightning/__init__.py", line 25, in <module>
from lightning_fabric.utilities.seed import seed_everything # noqa: E402
File "/usr/local/lib/python3.12/site-packages/lightning_fabric/__init__.py", line 35, in <module>
from lightning_fabric.fabric import Fabric # noqa: E402
File "/usr/local/lib/python3.12/site-packages/lightning_fabric/fabric.py", line 38, in <module>
from lightning_fabric.accelerators.accelerator import Accelerator
File "/usr/local/lib/python3.12/site-packages/lightning_fabric/accelerators/__init__.py", line 15, in <module>
from lightning_fabric.accelerators.accelerator import Accelerator
File "/usr/local/lib/python3.12/site-packages/lightning_fabric/accelerators/accelerator.py", line 19, in <module>
from lightning_fabric.accelerators.registry import _AcceleratorRegistry
File "/usr/local/lib/python3.12/site-packages/lightning_fabric/accelerators/registry.py", line 18, in <module>
from lightning_fabric.utilities.exceptions import MisconfigurationException
File "/usr/local/lib/python3.12/site-packages/lightning_fabric/utilities/__init__.py", line 16, in <module>
from lightning_fabric.utilities.apply_func import move_data_to_device
File "/usr/local/lib/python3.12/site-packages/lightning_fabric/utilities/apply_func.py", line 24, in <module>
from lightning_fabric.utilities.imports import _NUMPY_AVAILABLE
File "/usr/local/lib/python3.12/site-packages/lightning_fabric/utilities/imports.py", line 39, in <module>
_TORCHMETRICS_GREATER_EQUAL_1_0_0 = compare_version("torchmetrics", operator.ge, "1.0.0")
File "/usr/local/lib/python3.12/site-packages/lightning_utilities/core/imports.py", line 78, in compare_version
pkg = importlib.import_module(package)
File "/usr/local/lib/python3.12/importlib/__init__.py", line 90, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "/usr/local/lib/python3.12/site-packages/torchmetrics/__init__.py", line 37, in <module>
from torchmetrics import functional # noqa: E402
File "/usr/local/lib/python3.12/site-packages/torchmetrics/functional/__init__.py", line 129, in <module>
from torchmetrics.functional.text._deprecated import _bleu_score as bleu_score
File "/usr/local/lib/python3.12/site-packages/torchmetrics/functional/text/__init__.py", line 50, in <module>
from torchmetrics.functional.text.bert import bert_score
File "/usr/local/lib/python3.12/site-packages/torchmetrics/functional/text/bert.py", line 56, in <module>
from transformers import AutoModel, AutoTokenizer
File "/usr/local/lib/python3.12/site-packages/transformers/generation/utils.py", line 48, in <module>
from ..masking_utils import create_masks_for_generate
File "/usr/local/lib/python3.12/site-packages/transformers/masking_utils.py", line 29, in <module>
from torch.nn.attention.flex_attention import _DEFAULT_SPARSE_BLOCK_SIZE as flex_default_block_size # noqa: N811
File "/usr/local/lib/python3.12/site-packages/torch/nn/attention/flex_attention.py", line 15, in <module>
from torch._dynamo._trace_wrapped_higher_order_op import TransformGetItemToIndex
File "/usr/local/lib/python3.12/site-packages/torch/distributed/nn/jit/instantiator.py", line 21, in <module>
_TEMP_DIR = tempfile.TemporaryDirectory()
File "/usr/local/lib/python3.12/tempfile.py", line 886, in __init__
self.name = mkdtemp(suffix, prefix, dir)
File "/usr/local/lib/python3.12/tempfile.py", line 384, in mkdtemp
_os.mkdir(file, 0o700)
FileNotFoundError: [Errno 2] No such file or directory: '/app/tmp/tmpgbtrn3qt'
The container exits with code 1.
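To narrow this down, a quick check inside the container shows which directory tempfile actually resolves and whether any temp-related environment variables are set (a hypothetical diagnostic snippet, run e.g. via docker compose run with a python entrypoint):

```python
import os
import tempfile

# Show which temp-related environment variables are set, if any.
for var in ("TMPDIR", "TEMP", "TMP"):
    print(f"{var} = {os.environ.get(var)}")

# The directory Python will actually use for TemporaryDirectory().
print("tempfile.gettempdir() ->", tempfile.gettempdir())
print("exists:", os.path.isdir(tempfile.gettempdir()))
```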
Environment
- Branch: v25
- Mode: full_docker
- Docker: Docker Compose (v2)
- Base image: python:3.12
- Build args: TORCH_VERSION=cuda128, SKIP_XTTS_TEST=true
- GPU: NVIDIA (NVIDIA Container Toolkit installed)
- Host OS: Linux (WSL2-based environment)
Question
Could you please check:
- whether TMPDIR or any other temp-related environment variable is set to /app/tmp in the Dockerfile or code, and
- whether this project has a recommended temp directory configuration for GPU / PyTorch / transformers?
If you want, I can also test a patch that forces TMPDIR=/tmp (or similar) in the container entrypoint.
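As a concrete version of that patch, the override could be expressed as an extra environment entry in the docker-compose.yml above (a sketch only; whether /tmp is the right target for this project is exactly the open question):

```yaml
services:
  ebook2audiobook:
    environment:
      # Force Python's tempfile (and most tools that honor TMPDIR)
      # to use the image's /tmp instead of /app/tmp.
      - TMPDIR=/tmp
```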