Correct minor issue in the guide "Understanding masking & padding" #1904

Open · wants to merge 2 commits into base: master
Conversation


palc001 commented Aug 1, 2024

Correct the axis used to compute the denominator of the softmax in the example that creates a `TemporalSoftmax` class.


google-cla bot commented Aug 1, 2024

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

palc001 changed the title from "Update understanding_masking_and_padding.md" to "Correct minor issue in the guide 'Understanding masking & padding'" on Aug 1, 2024
palc001 (Author) commented Aug 1, 2024

@fchollet As per my understanding, a softmax over the time dimension (axis 1) of an input sequence means the softmax should be applied along the sequence-length dimension, independently for each embedding dimension.

To illustrate with a concrete example:
You'd typically use such a `TemporalSoftmax` layer to compute weights over a given input sequence (to, say, get a single embedding for the entire sequence). Here you'd want the weights across the sequence to sum to 1, not the embedding of each time step to sum to 1, which is what the guide currently does with `axis=-1`. A sketch of the corrected layer follows.
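For reference, here's a minimal sketch of the `TemporalSoftmax` layer with the proposed fix applied (this paraphrases the layer definition from the guide; the only functional change is `axis=1` in the `reduce_sum`):

```python
import tensorflow as tf

class TemporalSoftmax(tf.keras.layers.Layer):
    def call(self, inputs, mask=None):
        # Zero out masked time steps so padding contributes nothing.
        broadcast_float_mask = tf.expand_dims(tf.cast(mask, "float32"), -1)
        inputs_exp = tf.exp(inputs) * broadcast_float_mask
        # Normalize over the time dimension (axis=1), not the feature
        # dimension (axis=-1), so the weights across the sequence sum to 1.
        inputs_sum = tf.reduce_sum(
            inputs_exp * broadcast_float_mask, axis=1, keepdims=True
        )
        return inputs_exp / inputs_sum
```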
Rewriting the code snippet from the example to compute weights (I've only changed the variable names and added imports; everything else is the same):

```python
import numpy as np
import tensorflow as tf

inputs = tf.keras.Input(shape=(None,), dtype="int32")
embeddings = tf.keras.layers.Embedding(input_dim=10, output_dim=32, mask_zero=True)(inputs)
weights = tf.keras.layers.Dense(1)(embeddings)  # one scalar score per time step
weights = TemporalSoftmax()(weights)  # normalize the scores across the sequence

model = tf.keras.Model(inputs, weights)
y = model(np.random.randint(0, 10, size=(32, 100)))
```
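As a quick sanity check (not in the original snippet; it assumes the corrected `axis=1`), each sequence's weights should sum to 1 over the time dimension, with padded positions receiving zero weight:

```python
# y has shape (32, 100, 1); summing over the time dimension (axis=1)
# should yield 1.0 for every sequence.
assert np.allclose(tf.reduce_sum(y, axis=1).numpy(), 1.0, atol=1e-5)
```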

With my suggested correction to the axis, you get the weights correctly, and you can then use them to weight the embeddings of each time step, or to combine the embeddings of all time steps into a single embedding for the whole sequence, as sketched below.
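For instance, assuming the `weights` and `embeddings` tensors from the snippet above, a mask-aware weighted pooling step could look like this (an illustrative sketch, not code from the guide):

```python
# weights:    (batch, time, 1), summing to 1 over the time dimension
# embeddings: (batch, time, 32)
# The weighted sum over time collapses each sequence into one embedding.
sequence_embedding = tf.reduce_sum(weights * embeddings, axis=1)  # (batch, 32)
```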


Edit: I also saw from `git blame` and PR #1424 that this was previously implemented correctly; I'm unsure why it was changed. If my understanding is incorrect, please let me know.
