[Feature request]: Improve functionality of GradientCheckCallback by recording skipped samples #231

Open
1 of 6 tasks
laserkelvin opened this issue Jun 3, 2024 · 0 comments
Assignees
Labels
enhancement New feature or request

Comments

@laserkelvin
Collaborator

Feature/behavior summary

`GradientCheckCallback` lets us skip training samples that produce NaNs, which would otherwise break model training immediately. The hope is that the skipped samples will be picked up in later epochs, but we currently have no good way to monitor whether that actually happens.

The suggestion is to refactor the callback so that it records the indices of skipped samples, which would let us plot histograms of how often each sample is skipped. This should help diagnose problematic training runs.

Request attributes

  • Would this be a refactor of existing code?
  • Does this proposal require new package dependencies?
  • Would this change break backwards compatibility?
  • Does this proposal include a new model?
  • Does this proposal include a new dataset?
  • Does this proposal include a new task/workflow?

Related issues

No response

Solution description

Ostensibly, we would need to:

  1. Add global state to the callback that accumulates the indices of skipped samples per epoch.
  2. Optionally add logging, so that wandb or similar platforms can be used to view the records meaningfully.
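
As a rough illustration of step 1, here is a minimal, framework-agnostic sketch of the bookkeeping. It is not the existing callback code: the class name `SkippedSampleRecorder` and its methods `record`, `skip_counts`, and `never_recovered` are hypothetical, and it assumes the callback can get hold of the dataset indices of the samples it skips.

```python
from collections import Counter, defaultdict


class SkippedSampleRecorder:
    """Hypothetical helper: tracks which sample indices were skipped in each epoch."""

    def __init__(self):
        # Maps epoch number -> list of skipped sample indices, in the order seen.
        self.skipped_per_epoch = defaultdict(list)

    def record(self, epoch, sample_indices):
        """Append the indices of samples skipped during `epoch`."""
        self.skipped_per_epoch[epoch].extend(int(i) for i in sample_indices)

    def skip_counts(self):
        """Total number of skips per sample index, summed over all epochs."""
        counts = Counter()
        for indices in self.skipped_per_epoch.values():
            counts.update(indices)
        return counts

    def never_recovered(self, num_epochs):
        """Sample indices that were skipped in every one of the first `num_epochs` epochs."""
        per_epoch = [set(self.skipped_per_epoch.get(e, [])) for e in range(num_epochs)]
        if not per_epoch:
            return []
        return sorted(set.intersection(*per_epoch))


# Usage: sample 7 is skipped in both epochs, sample 3 only in the first.
recorder = SkippedSampleRecorder()
recorder.record(epoch=0, sample_indices=[3, 7])
recorder.record(epoch=1, sample_indices=[7])
print(recorder.skip_counts().most_common())  # [(7, 2), (3, 1)]
print(recorder.never_recovered(num_epochs=2))  # [7]
```

The per-sample counts from `skip_counts()` are the data a skip-frequency histogram would be built from, and `never_recovered` is one possible way to check whether skipped samples are in fact picked up in later epochs. How these records would be passed to wandb or another logger (step 2) is left open here.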

Additional notes

No response

@laserkelvin laserkelvin added the enhancement New feature or request label Jun 3, 2024
@laserkelvin laserkelvin self-assigned this Jun 3, 2024