Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Drops duplicate edges in non-MultiGraph PLC SGGraph instances #4658

Open
wants to merge 2 commits into
base: branch-24.10
Choose a base branch
from

Conversation

rlratzel
Copy link
Contributor

Graph input with duplicate edges intended for Graph/DiGraph instances resulted in internal PLC SGGraph instances with duplicate edges, which were effectively treated as MultiGraphs and caused incorrect results from algorithms like pagerank.

This PR sets the drop_multi_edges PLC SGGraph ctor option to have PLC remove duplicate edges on SGGraph creation.

The overhead to drop duplicate edges for non-MultiGraphs is negligible, and in the case of a large test graph (wikipedia data, 37.5M nodes, 464.5M edges) resulted in an overall speedup for pagerank going from 12.2 seconds to 10.7 seconds on my workstation, likely due to fewer edges to process.

A test was added that uses pagerank to ensure Graphs vs. MultiGraphs are handled correctly and duplicate edges are dropped as needed. The results when run without drop_multi_edges set:

>       assert actual_pr_for_G == approx(expected_pr_for_G)
E       assert {0: 0.0875795...7955580949783} == approx({0: 0....32 ± 1.8e-07})
E
E         comparison failed. Mismatched elements: 4 / 4:
E         Max absolute difference: 0.08785887916592061
E         Max relative difference: 0.5007959662968462
E         Index | Obtained            | Expected
E         0     | 0.08757955580949783 | 0.17543839772251532 ± 1.8e-07
E         1     | 0.41242048144340515 | 0.32456160227748454 ± 3.2e-07
E         2     | 0.41242048144340515 | 0.32456160227748454 ± 3.2e-07
E         3     | 0.08757955580949783 | 0.17543839772251532 ± 1.8e-07

The same test passes when run with the changes in this PR to set drop_multi_edges.

…ank to ensure Graphs w/out multi edges are created and that MultiGraphs are still created correctly.
@rlratzel rlratzel added bug Something isn't working non-breaking Non-breaking change nx-cugraph labels Sep 19, 2024
@rlratzel rlratzel self-assigned this Sep 19, 2024
@rlratzel rlratzel requested a review from a team as a code owner September 19, 2024 19:54
@@ -702,6 +710,7 @@ def _get_plc_graph(
renumber=False,
do_expensive_check=False,
vertices_array=self._node_ids,
drop_multi_edges=not self.is_multigraph(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. All you did was setting a flag and no performance degradation

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working non-breaking Non-breaking change nx-cugraph python
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants