Embedding Projector using TensorBoard callback #20210

Open
miticollo opened this issue Sep 4, 2024 · 4 comments
Assignees
Labels
stat:awaiting keras-eng Awaiting response from Keras engineer type:Bug

Comments

miticollo commented Sep 4, 2024

Environment

  • Python 3.12.4
  • Tensorflow v2.16.1-19-g810f233968c 2.16.2
  • Keras 3.5.0
  • TensorBoard 2.16.2

How to reproduce it?

I tried to visualize data using the Embedding Projector in TensorBoard, so I added the following arguments to the TensorBoard callback:

import os

import keras

# logs_path and vectorizer are defined elsewhere in the notebook.
metadata_filename = "metadata.tsv"
os.makedirs(logs_path, exist_ok=True)

# Save the labels, one per line.
with open(os.path.join(logs_path, metadata_filename), "w") as f:
    for token in vectorizer.get_vocabulary():
        f.write("{}\n".format(token))

keras.callbacks.TensorBoard(
    log_dir=logs_path,
    embeddings_freq=1,
    embeddings_metadata=metadata_filename,
)

However, the TensorBoard Projector tab only shows this HTML page.

Issues

The HTML page above is returned because dataNotFound is true. This happens because the route http://localhost:6006/data/plugin/projector/runs returns an empty JSON response. This route is handled by this Python function, which tries to find the latest checkpoint by calling tf.train.latest_checkpoint. As its docstring states, that TF function finds a TensorFlow (2 or 1.x) checkpoint. The TensorBoard callback does save a checkpoint at the end of the epoch, but it is a Keras checkpoint, so the lookup finds nothing.
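As a rough diagnostic for the point above, the sketch below scans a log directory for the `checkpoint` state file that TensorFlow's checkpoint lookup reads by default. The helper names are hypothetical, and the check is only a necessary condition: a state file can exist while the checkpoint files it points to are missing.

```python
import os


def has_tf_checkpoint_state(directory):
    """Return True if `directory` contains a TF `checkpoint` state file."""
    return os.path.isfile(os.path.join(directory, "checkpoint"))


def find_checkpoint_dirs(log_dir):
    """List every directory under `log_dir` (itself included) that holds
    a `checkpoint` state file."""
    return [
        root
        for root, _dirs, files in os.walk(log_dir)
        if "checkpoint" in files
    ]
```

If `find_checkpoint_dirs(logs_path)` comes back empty after training, no directory under the log path looks like a TensorFlow checkpoint directory, which would be consistent with the empty response described above.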

Furthermore, projector_config.pbtxt is written in the wrong place: TensorBoard expects this file in the same directory where the checkpoints are saved.

Finally, choosing a fixed tensor name is a strong assumption. In my model, the tensor associated with the Embedding layer had a different name (obviously).
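To illustrate the last two points, here is a minimal sketch that writes projector_config.pbtxt by hand into the checkpoint directory, with the tensor name passed in rather than fixed. The function name `write_projector_config` is hypothetical, and the tensor name in the usage note is only an example; the field names follow the Projector's text-format config (`tensor_name`, `metadata_path`).

```python
import os


def write_projector_config(checkpoint_dir, tensor_name, metadata_filename):
    """Write a one-embedding projector_config.pbtxt next to the checkpoints.

    `tensor_name` must match the name under which the embedding tensor is
    stored in the checkpoint; it is a parameter here, not a fixed constant.
    """
    config_text = (
        "embeddings {\n"
        f'  tensor_name: "{tensor_name}"\n'
        f'  metadata_path: "{metadata_filename}"\n'
        "}\n"
    )
    os.makedirs(checkpoint_dir, exist_ok=True)
    config_path = os.path.join(checkpoint_dir, "projector_config.pbtxt")
    with open(config_path, "w") as f:
        f.write(config_text)
    return config_path
```

For example, `write_projector_config(logs_path, "embedding/.ATTRIBUTES/VARIABLE_VALUE", "metadata.tsv")` would place the config in the log directory, assuming the checkpoint is saved there too and the metadata path is relative to that directory.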

Notes

IMO this feature stopped working when the callback was updated for TF 2.0. Indeed, the TF 1.x callback should work: for example, it saves checkpoints in the TF format. But when the callback was updated to be compatible with TF 2.0, tf.keras.Model.save_weights was used instead of tf.train.Checkpoint, which is perfectly legitimate, as reported here.

Possible solution

A possible fix is to save only the weights of the Embedding layer. Here, you can find an example. To get the model, you can use self._model. It is also unnecessary to specify the tensor name, because there is only one tensor to save. The only drawback: how should two or more embeddings be handled?
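A minimal sketch of this idea, assuming TensorFlow and TensorBoard are installed: wrap the embedding matrix in a `tf.Variable`, save it with `tf.train.Checkpoint`, and write the projector config into the same directory. The function name `save_embedding_for_projector` is hypothetical and this is not the callback's actual implementation. The tensor name is the one TensorFlow's object-based checkpoints use for an attribute called `embedding`.

```python
import os

import tensorflow as tf
from tensorboard.plugins import projector


def save_embedding_for_projector(weights, log_dir, metadata_filename="metadata.tsv"):
    """Save one embedding matrix as a TF checkpoint plus projector config."""
    os.makedirs(log_dir, exist_ok=True)

    # Store the matrix under the attribute name "embedding".
    variable = tf.Variable(weights)
    checkpoint = tf.train.Checkpoint(embedding=variable)
    checkpoint.save(os.path.join(log_dir, "embedding.ckpt"))

    # Point the Projector at that tensor and at the metadata file.
    config = projector.ProjectorConfig()
    embedding = config.embeddings.add()
    embedding.tensor_name = "embedding/.ATTRIBUTES/VARIABLE_VALUE"
    embedding.metadata_path = metadata_filename
    projector.visualize_embeddings(log_dir, config)

    # Should now resolve, since a TF-format checkpoint exists in log_dir.
    return tf.train.latest_checkpoint(log_dir)
```

Inside a callback, `weights` could come from the Embedding layer, for example `layer.get_weights()[0]`. The number of rows must match the number of lines in the metadata file. Handling two or more embeddings would need one checkpoint attribute and one config entry per layer, which this sketch does not attempt.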

@mehtamansi29
Collaborator

Hi @miticollo -

Can you provide the "metadata.tsv" file or a relevant sample file to reproduce this issue?

@miticollo
Author

Look here. It is based on this tutorial.

@mehtamansi29
Collaborator

Hi @miticollo -

Thanks for the dataset. I am able to reproduce the issue. I will dig into it and post an update.

@mehtamansi29 mehtamansi29 added the keras-team-review-pending Pending review by a Keras team member. label Sep 5, 2024
@miticollo
Author

I updated the notebook to use keras-nightly, tf-nightly, and tb-nightly.

@SamanehSaadat SamanehSaadat added stat:awaiting keras-eng Awaiting response from Keras engineer and removed keras-team-review-pending Pending review by a Keras team member. labels Sep 19, 2024

3 participants