
[Feature request]: Allow weights transfer and/or restarting from earlier checkpoint with experiment CLI #263

Open
1 of 6 tasks
laserkelvin opened this issue Jul 29, 2024 · 0 comments
Labels
enhancement (New feature or request), training (Issues related to model training), ux (User experience, quality of life changes)

Comments

@laserkelvin
Collaborator

Feature/behavior summary

To my knowledge, the experiment CLI doesn't support load_from_checkpoint or from_pretrained_encoder style restarts, which would be very useful either for restarting training (with more epochs, say) or for transferring an encoder to a new task.

Request attributes

  • Would this be a refactor of existing code?
  • Does this proposal require new package dependencies?
  • Would this change break backwards compatibility?
  • Does this proposal include a new model?
  • Does this proposal include a new dataset?
  • Does this proposal include a new task/workflow?

Related issues

No response

Solution description

An example solution would be to include key/value pairs in the model/task YAML configuration that specify the kind of reload the user wants and where the weights are located.

For example, to continue training:

load_weights:
   method: checkpoint
   type: local
   path: ./path/to/lightning_checkpoint.ckpt

To transfer the weights from a pretrained encoder, loading them from a wandb artifact:

load_weights:
   method: pretrained
   type: wandb
   path: "<wandb_user>/<project>/<model_name>:v<version>"

The method key would determine whether Task.load_from_checkpoint or Task.from_pretrained_encoder is used, and the type key would indicate the kind of checkpoint, i.e. a local file or a wandb artifact. The latter case is a little more involved, as you would have to use the wandb API to initialize a run that doesn't synchronize, download the weight artifact, and then hand the local path off to the corresponding load method.
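
A minimal sketch of what that dispatch could look like on the CLI side (the helper names resolve_checkpoint_path and load_model are hypothetical; only Task.load_from_checkpoint, Task.from_pretrained_encoder, and the wandb public API are assumed to exist as described above):

from pathlib import Path

import wandb


def resolve_checkpoint_path(load_cfg: dict) -> Path:
    """Return a local checkpoint path, downloading a wandb artifact if needed."""
    if load_cfg["type"] == "local":
        return Path(load_cfg["path"])
    if load_cfg["type"] == "wandb":
        # The public API does not create or synchronize a run.
        api = wandb.Api()
        artifact_dir = api.artifact(load_cfg["path"]).download()
        # Assumes the artifact contains a single Lightning checkpoint file.
        return next(Path(artifact_dir).glob("*.ckpt"))
    raise ValueError(f"Unknown checkpoint type: {load_cfg['type']}")


def load_model(task_cls, config: dict):
    """Instantiate a task from its config, honoring the load_weights block."""
    load_cfg = config["load_weights"]
    ckpt_path = resolve_checkpoint_path(load_cfg)
    if load_cfg["method"] == "checkpoint":
        return task_cls.load_from_checkpoint(ckpt_path)
    if load_cfg["method"] == "pretrained":
        return task_cls.from_pretrained_encoder(ckpt_path)
    raise ValueError(f"Unknown load method: {load_cfg['method']}")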

An alternative interface could be to expose those method names directly as configuration keys:

load_from_checkpoint: "local:./path/to/lightning_checkpoint.ckpt"

and

from_pretrained_encoder: "wandb:<wandb_user>/<project>/<model_name>:v<version>"
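
Parsing that prefixed string form would be straightforward; a minimal sketch (parse_weights_spec is a hypothetical helper name):

def parse_weights_spec(spec: str) -> tuple[str, str]:
    # Split only on the first colon so wandb version tags like ":v3" survive,
    # e.g. "local:./path/to/file.ckpt" -> ("local", "./path/to/file.ckpt").
    source, _, location = spec.partition(":")
    return source, location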

Additional notes

No response

@laserkelvin added the enhancement (New feature or request), training (Issues related to model training), and ux (User experience, quality of life changes) labels on Jul 29, 2024