
[WIP] Use privateuseone dispatch key #7705

Open
wants to merge 2 commits into master
Conversation

will-cromar (Collaborator) commented Jul 17, 2024

Following https://github.com/bdhirsh/pytorch_open_registration_example/blob/master/cpp_extensions/open_registration_extension.cpp as an example for C++ code.

With this change, we can actually create XLATensor2s using the "jax" device without explicitly overriding the dispatcher:

>>> torch.tensor([0], device='jax:0')
XLATensor2(..., device='meta', size=(1,), dtype=torch.int64)
>>> torch.tensor([0], device='jax:0') + torch.tensor([2], device='jax:0').numpy()
tensor([2])

Base automatically changed from wcromar/fix-incorrect-registration to master July 17, 2024 21:30
will-cromar (Collaborator, Author)

This looks really promising so far.

We can at the very least get better device semantics by implementing torch.ops.aten._to_copy in Python and registering it for the PrivateUse1 dispatch key with torch.library.impl. We can then rename privateuseone to something more user-friendly (jax in my example) with torch.utils.rename_privateuse1_backend. Instead of having to wrap all tensor creation in a mode or use our custom to_xla utility, you could use tensor.to or create tensors directly on our device with jax:0, e.g.

>>> import torch
>>> import torch_xla2.custom_device
>>> torch.tensor([0]).to('jax:0')
XLATensor2(..., device='meta', size=(1,), dtype=torch.int64)
>>> torch.tensor([0], device='jax:0')
XLATensor2(..., device='meta', size=(1,), dtype=torch.int64)

Even that is a nice improvement.

What I'm experimenting with now is actually setting our XLATensor2's device to jax (instead of meta), effectively removing __torch_dispatch__ and relying on the torch dispatcher instead. I also registered all of the ops in jaten with torch.library.impl. What's really interesting here is that our lowerings still get passed our Python subclass XLATensor2, so we can easily pull out the wrapped JAX array.
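Registering a whole table of lowerings looks roughly like this sketch. `jaten_lowerings` is a stand-in for torch_xla2's actual jaten registry, and the torch builtins used as values are placeholders for real JAX-backed lowerings.

```python
import torch

# Stand-in registry mapping aten op names to lowerings; the real jaten
# table would map to functions that operate on the wrapped JAX array.
jaten_lowerings = {
    "abs": torch.abs,
    "neg": torch.neg,
}

aten_lib = torch.library.Library("aten", "IMPL")
for op_name, lowering in jaten_lowerings.items():
    # Route the PrivateUse1 key for each op to our lowering. Because the
    # dispatcher still hands these kernels the Python subclass, each
    # lowering can reach the wrapped array directly.
    aten_lib.impl(op_name, lowering, "PrivateUse1")
```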

The problem is that if I try to print an XLATensor2 when it has the jax:0 device, I get this error:

>>> torch.tensor([0], device='jax:0').cpu()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NotImplementedError: Could not run 'aten::reshape' with arguments from the 'Autogradjax' backend. [...] 'aten::reshape' is only available for these backends: [list of every dispatch key that does not include an AutogradPrivateUse1....].

Why does printing require a reshape autograd implementation? No idea. More interestingly, why haven't we registered one? Looking at the PrivateUse1 doc, it looks like we should get the autograd implementations for "free" with TORCH_LIBRARY_IMPL. Importantly, I'm using the Python version, torch.library.impl, which may not do the same thing.

Using the torch dispatcher instead of __torch_dispatch__ might help us correctly handle some code that's in C++ (e.g. the DDP implementation) or keep Dynamo from trying to trace into our Python dispatch logic. I'll keep investigating.
