
[Feature request]: Allow weights transfer and/or restarting from earlier checkpoint with experiment CLI #263

Open
1 of 6 tasks
laserkelvin opened this issue Jul 29, 2024 · 0 comments
Labels
enhancement (New feature or request), training (Issues related to model training), ux (User experience, quality of life changes)

Comments

@laserkelvin
Collaborator

Feature/behavior summary

To my knowledge, the experiment CLI doesn't support load_from_checkpoint or from_pretrained_encoder style restarts, which would be very useful either for restarting training (with more epochs, say) or for transferring an encoder to a new task.

Request attributes

  • Would this be a refactor of existing code?
  • Does this proposal require new package dependencies?
  • Would this change break backwards compatibility?
  • Does this proposal include a new model?
  • Does this proposal include a new dataset?
  • Does this proposal include a new task/workflow?

Related issues

No response

Solution description

An example solution would be to include key/value pairs in the model/task YAML configuration that specify the kind of reload the user wants and where the weights are located.

For example, to continue training:

load_weights:
   method: checkpoint
   type: local
   path: ./path/to/lightning_checkpoint.ckpt

To transfer the weights from a pretrained encoder, loading them from a wandb artifact:

load_weights:
   method: pretrained
   type: wandb
   path: "<wandb_user>/<project>/<model_name>:v<version>"

The method key would determine whether Task.load_from_checkpoint or Task.from_pretrained_encoder is used, and the type key would indicate the kind of checkpoint, i.e. a local file or a wandb artifact. The latter case is a little more involved, as you would have to use the wandb API to initialize a run that doesn't synchronize, download the weight artifact, and then hand the local path off to the corresponding load method.
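
A minimal sketch of what that dispatch could look like on the CLI side (the helper names resolve_checkpoint_path and load_model are hypothetical; only Task.load_from_checkpoint, Task.from_pretrained_encoder, and the wandb public API are assumed to exist as described above):

from pathlib import Path

import wandb


def resolve_checkpoint_path(load_cfg: dict) -> Path:
    """Return a local checkpoint path, downloading a wandb artifact if needed."""
    if load_cfg["type"] == "local":
        return Path(load_cfg["path"])
    if load_cfg["type"] == "wandb":
        # The public API does not create or synchronize a run.
        api = wandb.Api()
        artifact_dir = api.artifact(load_cfg["path"]).download()
        # Assumes the artifact contains a single Lightning checkpoint file.
        return next(Path(artifact_dir).glob("*.ckpt"))
    raise ValueError(f"Unknown checkpoint type: {load_cfg['type']}")


def load_model(task_cls, config: dict):
    """Instantiate a task from its config, honoring the load_weights block."""
    load_cfg = config["load_weights"]
    ckpt_path = resolve_checkpoint_path(load_cfg)
    if load_cfg["method"] == "checkpoint":
        return task_cls.load_from_checkpoint(ckpt_path)
    if load_cfg["method"] == "pretrained":
        return task_cls.from_pretrained_encoder(ckpt_path)
    raise ValueError(f"Unknown load method: {load_cfg['method']}")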

An alternative interface could be to expose those method names directly as configuration keys:

load_from_checkpoint: "local:./path/to/lightning_checkpoint.ckpt"

and

from_pretrained_encoder: "wandb:<wandb_user>/<project>/<model_name>:v<version>"
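
Parsing that prefixed string form would be straightforward; a minimal sketch (parse_weights_spec is a hypothetical helper name):

def parse_weights_spec(spec: str) -> tuple[str, str]:
    # Split only on the first colon so wandb version tags like ":v3" survive,
    # e.g. "local:./path/to/file.ckpt" -> ("local", "./path/to/file.ckpt").
    source, _, location = spec.partition(":")
    return source, location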

Additional notes

No response

@laserkelvin added the enhancement (New feature or request), training (Issues related to model training), and ux (User experience, quality of life changes) labels on Jul 29, 2024