Correct minor issue in the guide "Understanding masking & padding" #1904

Open · wants to merge 2 commits into base: master
Conversation


palc001 commented Aug 1, 2024

Correct the axis used to compute the denominator of the softmax in the example that creates a `TemporalSoftmax` class.


google-cla bot commented Aug 1, 2024

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

palc001 changed the title from "Update understanding_masking_and_padding.md" to "Correct minor issue in the guide 'Understanding masking & padding'" on Aug 1, 2024
palc001 (Author) commented Aug 1, 2024

@fchollet As per my understanding, a softmax over the time dimension (axis 1) of an input sequence means the softmax should be applied along the sequence-length dimension, independently for each embedding dimension.

To illustrate with a concrete example:
You'd typically use such a `TemporalSoftmax` layer to compute weights over a given input sequence (to, say, get a single embedding for the entire sequence). Here you'd want the weights across the sequence to sum to 1, not the embedding of each time step to sum to 1, which is what the guide currently does with `axis=-1`. A sketch of the corrected layer follows.
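For reference, here's a minimal sketch of the `TemporalSoftmax` layer with the proposed fix applied (this paraphrases the layer definition from the guide; the only functional change is `axis=1` in the `reduce_sum`):

```python
import tensorflow as tf

class TemporalSoftmax(tf.keras.layers.Layer):
    def call(self, inputs, mask=None):
        # Zero out masked time steps so padding contributes nothing.
        broadcast_float_mask = tf.expand_dims(tf.cast(mask, "float32"), -1)
        inputs_exp = tf.exp(inputs) * broadcast_float_mask
        # Normalize over the time dimension (axis=1), not the feature
        # dimension (axis=-1), so the weights across the sequence sum to 1.
        inputs_sum = tf.reduce_sum(
            inputs_exp * broadcast_float_mask, axis=1, keepdims=True
        )
        return inputs_exp / inputs_sum
```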
Rewriting the code snippet from the example to compute weights (I've only changed the variable names and added imports; everything else is the same):

```python
import numpy as np
import tensorflow as tf

inputs = tf.keras.Input(shape=(None,), dtype="int32")
embeddings = tf.keras.layers.Embedding(input_dim=10, output_dim=32, mask_zero=True)(inputs)
weights = tf.keras.layers.Dense(1)(embeddings)  # one scalar score per time step
weights = TemporalSoftmax()(weights)  # normalize the scores across the sequence

model = tf.keras.Model(inputs, weights)
y = model(np.random.randint(0, 10, size=(32, 100)))
```
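As a quick sanity check (not in the original snippet; it assumes the corrected `axis=1`), each sequence's weights should sum to 1 over the time dimension, with padded positions receiving zero weight:

```python
# y has shape (32, 100, 1); summing over the time dimension (axis=1)
# should yield 1.0 for every sequence.
assert np.allclose(tf.reduce_sum(y, axis=1).numpy(), 1.0, atol=1e-5)
```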

With my suggested correction to the axis, you get the weights correctly, and you can then use them to weight the embeddings of each time step, or to combine the embeddings of all time steps into a single embedding for the whole sequence, as sketched below.
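For instance, assuming the `weights` and `embeddings` tensors from the snippet above, a mask-aware weighted pooling step could look like this (an illustrative sketch, not code from the guide):

```python
# weights:    (batch, time, 1), summing to 1 over the time dimension
# embeddings: (batch, time, 32)
# The weighted sum over time collapses each sequence into one embedding.
sequence_embedding = tf.reduce_sum(weights * embeddings, axis=1)  # (batch, 32)
```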


Edit: I also saw from `git blame` and PR #1424 that this was previously implemented correctly; I'm unsure why it was changed. If my understanding is incorrect, please let me know.
