Bring back loss information for multiple outputs #20023

Merged
7 commits merged into keras-team:master from multiple_outputs_log on Jul 24, 2024

Conversation

@james77777778 (Contributor) commented Jul 21, 2024

Closes #18993
Closes #19219
Related to #18467

This is an important feature, especially for multi-modal models, that is currently missing in Keras 3 compared to Keras 2.
The implementation is quite simple: it adds a Mean metric in CompileLoss to track each output's loss.
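For illustration only, here is a minimal sketch of the bookkeeping idea, not the actual CompileLoss code: each output's loss is fed into its own Mean metric so it can be reported alongside the total loss. The class and method names below are hypothetical.

from keras import metrics

class PerOutputLossTracker:
    """Hypothetical helper mirroring the idea of tracking each loss with Mean."""

    def __init__(self, output_names):
        # One Mean tracker per output loss, plus one for the total loss.
        self.trackers = {
            name: metrics.Mean(name=f"{name}_loss") for name in output_names
        }
        self.total = metrics.Mean(name="loss")

    def update(self, per_output_losses):
        # `per_output_losses` maps output name -> scalar loss for this step.
        total = 0.0
        for name, value in per_output_losses.items():
            self.trackers[name].update_state(value)
            total += value
        self.total.update_state(total)

    def result(self):
        logs = {m.name: float(m.result()) for m in self.trackers.values()}
        logs["loss"] = float(self.total.result())
        return logs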

All TODOs in model_tests.py have been resolved.

A simple demo:

import numpy as np

from keras import backend
from keras import layers
from keras import models

np.random.seed(42)
x = np.random.rand(8, 3)
y1 = np.random.rand(8, 1)  # regression target for output "a"
y2 = np.random.randint(0, 2, (8, 1))  # binary target for output "b"

# Functional model with two named outputs.
inputs = layers.Input(shape=(3,), name="inputs")
output_a = layers.Dense(1, name="a")(inputs)
output_b = layers.Dense(1, name="b")(inputs)
model = models.Model(inputs, {"a": output_a, "b": output_b})
init_weights = model.get_weights()  # snapshot of the initial weights (unused below)
# One loss per output; no extra `metrics` argument is needed to get
# per-output loss entries in the logs.
model.compile(
    optimizer="sgd",
    loss={"a": "mean_squared_error", "b": "binary_crossentropy"},
)
if backend.backend() != "numpy":  # the numpy backend cannot train
    model.fit(x, (y1, y2), batch_size=2, epochs=1, shuffle=False, verbose=1)
model.evaluate(x, (y1, y2), batch_size=2, verbose=1)

Got this:

4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 31ms/step - a_loss: 0.6758 - b_loss: 6.0316 - loss: 6.7074
4/4 ━━━━━━━━━━━━━━━━━━━━ 0s 10ms/step - a_loss: 0.5584 - b_loss: 6.0045 - loss: 6.5630
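As a hedged follow-up to the demo (not part of the original output): with return_dict=True the same per-output entries should come back as dictionary keys, with the key names assumed from the progress bar above.

# Assumed key names based on the progress-bar output above.
logs = model.evaluate(x, (y1, y2), batch_size=2, verbose=0, return_dict=True)
print(sorted(logs))  # expected something like: ['a_loss', 'b_loss', 'loss']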

EDITED:
I have introduced a new scope (SymbolicScope) to control the behavior in compute_loss. The reason is that self.losses breaks when called under backend.compute_output_spec, since it may try to read variable values that are unavailable there. Some edge cases in CI also require this.
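For readers unfamiliar with the pattern, here is a rough sketch of what such a scope can look like; this is a generic context-manager flag, not the actual Keras SymbolicScope implementation.

import contextlib

_IN_SYMBOLIC_SCOPE = False  # illustrative global flag

def in_symbolic_scope():
    return _IN_SYMBOLIC_SCOPE

@contextlib.contextmanager
def symbolic_scope():
    # Flip the flag while tracing shapes/dtypes, then restore it.
    global _IN_SYMBOLIC_SCOPE
    previous, _IN_SYMBOLIC_SCOPE = _IN_SYMBOLIC_SCOPE, True
    try:
        yield
    finally:
        _IN_SYMBOLIC_SCOPE = previous

# compute_loss-style code can then skip variable-dependent terms such as
# `self.losses` while running under backend.compute_output_spec:
#     if not in_symbolic_scope():
#         # include the self.losses regularization terms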

@codecov-commenter commented Jul 21, 2024

Codecov Report

Attention: Patch coverage is 94.20290% with 4 lines in your changes missing coverage. Please review.

Project coverage is 79.27%. Comparing base (36a0628) to head (603525c).
Report is 2 commits behind head on master.

Files Patch % Lines
keras/src/trainers/compile_utils.py 88.00% 0 Missing and 3 partials ⚠️
keras/src/backend/numpy/trainer.py 75.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #20023      +/-   ##
==========================================
+ Coverage   79.26%   79.27%   +0.01%     
==========================================
  Files         500      501       +1     
  Lines       46842    46890      +48     
  Branches     8626     8637      +11     
==========================================
+ Hits        37127    37172      +45     
  Misses       7977     7977              
- Partials     1738     1741       +3     
Flag Coverage Δ
keras 79.12% <94.20%> (+0.01%) ⬆️
keras-jax 62.45% <79.71%> (+0.01%) ⬆️
keras-numpy 57.54% <78.26%> (+0.04%) ⬆️
keras-tensorflow 63.80% <76.81%> (+0.04%) ⬆️
keras-torch 62.52% <84.05%> (+0.02%) ⬆️


@james77777778 james77777778 marked this pull request as draft July 21, 2024 11:12
@james77777778 james77777778 force-pushed the multiple_outputs_log branch 2 times, most recently from b0a318c to 6da46d3 on July 21, 2024 14:05
@james77777778 james77777778 marked this pull request as ready for review July 21, 2024 15:04
@james77777778 james77777778 marked this pull request as draft July 21, 2024 15:52
@james77777778 (Contributor, Author):

There are still some bugs that need to be resolved.
I will update this PR tomorrow.

@james77777778 james77777778 marked this pull request as ready for review July 22, 2024 03:01
@james77777778 (Contributor, Author):

This PR should be ready now.

@fchollet (Member) left a comment

Thanks for the PR!

"This is an important feature, especially for a multi-modal model"

What's wrong with just letting users specify what they want to monitor via the metrics argument?

I think that if we're going to do this, we should find a solution that only adds a minimal amount of complexity, e.g. if the loss is non-unary, add the loss configuration to CompileMetrics.


# If in symbolic scope, skip `self.losses` to ensure we don't access
# any variables. Otherwise, it might break.
if not in_symbolic_scope():
@fchollet (Member) commented on the snippet above:
This feels quite brittle -- compute_loss is a method that is intended to be overridden by users in various cases, and those users should not need to be aware of SymbolicScope. Why is it necessary?

@james77777778 (Contributor, Author) replied:

This is a workaround specific to the torch backend.
The root cause is that backend.get_stateless_scope().get_current_value(variable), called by _get_regularization_losses via self.losses, returns None, which breaks the subsequent computation within backend.compute_output_spec.

Let me try to fix it another way.

@james77777778 (Contributor, Author) followed up:

I have moved the logic to Layer._get_regularization_losses.
This should be more robust now.
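To make the described change concrete, here is a hedged sketch of what the guard in Layer._get_regularization_losses might look like; the shape is inferred from this thread and the real implementation may differ.

def _get_regularization_losses(self):
    losses = []
    for variable in self.trainable_weights:
        if variable.regularizer is None:
            continue
        if backend.in_stateless_scope() and not in_symbolic_scope():
            # Under a stateless scope, use the latest uncommitted value.
            value = backend.get_stateless_scope().get_current_value(variable)
        else:
            # In a purely symbolic pass, `get_current_value` could return
            # None, so fall back to the variable itself.
            value = variable
        losses.append(variable.regularizer(value))
    return losses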

@james77777778 (Contributor, Author) commented Jul 22, 2024

"What's wrong with just letting users specify what they want to monitor via the metrics argument?"

It is fine if the loss computation is not intensive; the redundancy in CompileMetrics is negligible.

import numpy as np

from keras import backend
from keras import layers
from keras import models

np.random.seed(42)
x = np.random.rand(8, 3)
y1 = np.random.rand(8, 1)
y2 = np.random.randint(0, 2, (8, 1))

inputs = layers.Input(shape=(3,), name="inputs")
output_a = layers.Dense(1, name="a")(inputs)
output_b = layers.Dense(1, name="b")(inputs)
model = models.Model(inputs, {"a": output_a, "b": output_b})
# The metrics duplicate the losses, so each per-output loss is computed
# twice per step just to make it show up in the logs.
model.compile(
    optimizer="sgd",
    loss={"a": "mean_squared_error", "b": "binary_crossentropy"},
    metrics={"a": "mean_squared_error", "b": "binary_crossentropy"},
)
if backend.backend() != "numpy":
    model.fit(x, (y1, y2), batch_size=2, epochs=1, shuffle=False, verbose=1)
logs = model.evaluate(x, (y1, y2), batch_size=2, verbose=1, return_dict=True)

However, things become noticeably more complicated if we override compute_loss for an intensive loss computation, like this one in KerasCV:
https://github.com/keras-team/keras-cv/blob/master/keras_cv/src/models/object_detection/yolo_v8/yolo_v8_detector.py#L534-L584
It uses a YOLOV8LabelEncoder, which is computation-intensive.

Additionally, it is inconvenient to require users to reimplement almost the same logic in compute_metrics just to monitor both the box loss and the classification loss (see the sketch below).
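To make the concern concrete, a hedged sketch loosely modelled on the linked detector; the model, label_encoder, box_loss, and classification_loss attributes are assumed, and the compute_loss signature is abbreviated.

from keras import models

class SketchDetector(models.Model):
    # Hypothetical subclass for illustration only.
    def compute_loss(self, x=None, y=None, y_pred=None, sample_weight=None):
        # Expensive target assignment (e.g. a YOLOV8-style label encoder).
        targets = self.label_encoder(y, y_pred)
        box_loss = self.box_loss(targets["boxes"], y_pred["boxes"])
        cls_loss = self.classification_loss(targets["classes"], y_pred["classes"])
        # To also report box_loss and cls_loss via the `metrics` argument,
        # the expensive encoding above would have to be repeated in
        # compute_metrics, which is the redundancy discussed here.
        return box_loss + cls_loss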

"I think that if we're going to do this, we should find a solution that only adds a minimal amount of complexity, e.g. if the loss is non-unary, add the loss configuration to CompileMetrics."

I don't think adding the loss config to CompileMetrics resolves the issue of redundant loss computation.
The per-output loss values are only available inside CompileLoss without recomputation, which is why the tracking logic has to live there.

@james77777778 (Contributor, Author) commented Jul 22, 2024

@fchollet
I just spotted a bug in TorchWrapper (a missing training argument) and fixed it. The corresponding tests have been included.

Please let me know whether this PR is a good fit for Keras 3.
If we don't want this multi-output logs feature, I can separate the TorchWrapper fix into another PR.

Just pinging some comments for visibility:

There is a comment mentioning the clarity of logs: #19159 (comment)
If that is a concern, it should be easy to provide an option to turn off multi-output logs,
something like Model.fit(disable_multi_output_logs=True).

@fchollet (Member) left a comment

With the latest update, I think this change can be merged. Thank you for the contribution!

@google-ml-butler google-ml-butler bot added the kokoro:force-run and ready to pull (Ready to be merged into the codebase) labels Jul 24, 2024
@fchollet fchollet merged commit f6a81cc into keras-team:master Jul 24, 2024
6 checks passed
@google-ml-butler google-ml-butler bot removed the awaiting review, ready to pull (Ready to be merged into the codebase), and kokoro:force-run labels Jul 24, 2024
@james77777778 james77777778 deleted the multiple_outputs_log branch July 24, 2024 05:28
@ghsanti (Contributor) commented Jul 25, 2024

This is mostly just a passing comment, but shouldn't keras.utils.set_random_seed() be used instead of NumPy's random seed? Otherwise the Dense layers' kernel initializers will not be reproducible unless a seed is explicitly passed.

Also, imho, lists should be used instead of dicts, since it is a bit confusing that .fit is called with a tuple (y1, y2) while the model outputs are a dict.
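For reference, the suggested change would look roughly like this; keras.utils.set_random_seed seeds Python, NumPy, and the backend framework in one call.

import keras

# Makes kernel initializers reproducible as well, not just the NumPy data.
keras.utils.set_random_seed(42)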

@james77777778 (Contributor, Author) replied:

Hey @ghsanti
I'm unsure what you mean.
If you are talking about the code for the simple demo, it's just a demo to showcase this PR.

@ghsanti (Contributor) commented Jul 26, 2024

"Hey @ghsanti I'm unsure what you mean. If you are talking about the code for the simple demo, it's just a demo to showcase this PR."

I see, it was for showing the losses. Thanks for the reply!
