When I train my PyTorch Lightning model on two GPUs in JupyterLab with strategy="ddp_notebook", only two CPU cores are used, and both sit at 100%. How can I overcome this CPU bottleneck?

Edit: I profiled the run with the PyTorch Profiler; the bottleneck turned out to be the old SSDs used on the server.
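For anyone landing here with a similar symptom, this is roughly how you can profile the input pipeline to see whether data loading (CPU/disk) rather than GPU compute is the bottleneck. This is a minimal sketch using `torch.profiler` directly; the dataset and sizes are made up for illustration.

```python
# Sketch: check whether the data-loading path dominates CPU time,
# assuming PyTorch is installed. Dataset/sizes are placeholders.
import torch
from torch.profiler import profile, ProfilerActivity
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(256, 32))
loader = DataLoader(dataset, batch_size=32, num_workers=0)

with profile(activities=[ProfilerActivity.CPU]) as prof:
    for (batch,) in loader:
        batch.sum()  # stand-in for a training step

# If DataLoader/IO ops dominate this table relative to compute ops,
# the input pipeline (e.g. slow storage) is the likely bottleneck.
table = prof.key_averages().table(sort_by="cpu_time_total", row_limit=5)
print(table)
```

With PyTorch Lightning you can get the same information without manual instrumentation via `Trainer(profiler="pytorch")`, which wraps this profiler around the training loop.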

  • @troye888

    Yup, this. If you'd like more help, we need the code, or at least a minimal reproducible example.