Hi, team,
In `ShardedEmbeddingBagCollection`, I found that torchrec explicitly wraps the dp (data-parallel) lookup in `DistributedDataParallel` (code here). I also know that `DistributedModelParallel` has a ddp wrapper that wraps the non-sharded part of the model, such as the MLP, and that this wrapper uses `DistributedDataParallel` as well.
So I am wondering: why do we explicitly wrap the dp lookup instead of letting the ddp wrapper in `DistributedModelParallel` process the dp lookup and the MLP together? Is there any hidden restriction?
Since `DistributedDataParallel` relies on `.named_parameters()` (code here), would overriding `.named_parameters()` in `ShardedEmbeddingBagCollection` allow the ddp wrapper in `DistributedModelParallel` to process the dp lookup?
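To make the last question concrete, here is a minimal sketch in plain PyTorch of what "overriding `.named_parameters()`" could look like. `ToyShardedModule` and its `_lookups` list are hypothetical names made up for illustration; this is not how `ShardedEmbeddingBagCollection` is implemented. The sketch only shows that the override changes what `named_parameters()` and `parameters()` report. It does not show whether `DistributedDataParallel` would pick those parameters up.

```python
import torch
from torch import nn


class ToyShardedModule(nn.Module):
    """Hypothetical stand-in for a sharded module (illustration only)."""

    def __init__(self) -> None:
        super().__init__()
        # Assumption: the lookup lives in a plain Python list, so nn.Module
        # does not register it and the default traversal never sees it.
        self._lookups = [nn.EmbeddingBag(10, 4)]
        # Registered the normal way, so it is visible by default.
        self.mlp = nn.Linear(4, 2)

    def named_parameters(self, prefix: str = "", recurse: bool = True, **kwargs):
        # First yield everything nn.Module already knows about.
        yield from super().named_parameters(prefix=prefix, recurse=recurse, **kwargs)
        if not recurse:
            return
        # Then expose the otherwise hidden lookup parameters.
        for i, lookup in enumerate(self._lookups):
            for name, param in lookup.named_parameters():
                full_name = f"_lookups.{i}.{name}"
                yield (f"{prefix}.{full_name}" if prefix else full_name), param


module = ToyShardedModule()
# nn.Module.parameters() is built on named_parameters(), so both views
# now include the lookup weight in addition to the MLP weight and bias.
print([name for name, _ in module.named_parameters()])
print(len(list(module.parameters())))
```

Without the override, only `mlp.weight` and `mlp.bias` would be listed, so any wrapper that discovers parameters purely through these methods would skip the lookup.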
shijieliu changed the title from "[Question] why we explicit make dp lookup as DistributedDataParallel instead of letting DistributedModelParallel handle it?" to "[Question] why torchrec explicit make dp lookup as DistributedDataParallel instead of letting DistributedModelParallel handle it?" on Mar 27, 2024