PROTON-2818: Move epoll proctor connection logic to a task thread. #427

cliffjansen · 2024-04-22T14:43:04Z

This patch moves connection logic that can block out of the calling thread and into the first actual scheduling of the task itself. It otherwise makes no change to the existing logic, including in the following cases:

The trimmed "if (conn->disconnected)..." bits are code simplifications where the value of "disconnected" cannot have changed.

Similarly, the schedule_if_inactive() calls are trimmed as the inactive test is necessarily false with the newly added task in the task list.
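The deferral described above can be pictured with a minimal, single-threaded sketch. It is not the code from epoll.c: `demo_task_t`, `demo_connect`, `demo_process` and `demo_blocking_lookup` are hypothetical names, and the "blocking" step is only a stand-in for a call such as getaddrinfo(). The point is only that the connect call records intent and returns, while the work that may block runs the first time the task is processed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a proactor task; not the real pconnection_t. */
typedef struct {
  bool first_schedule;  /* set by the connect call, cleared on the task thread */
  bool resolved;        /* set once the potentially blocking lookup has run */
} demo_task_t;

/* Hypothetical blocking step, standing in for getaddrinfo(). */
static void demo_blocking_lookup(demo_task_t *t) {
  t->resolved = true;
}

/* Runs on the application (calling) thread: records intent only, never blocks. */
static void demo_connect(demo_task_t *t) {
  t->first_schedule = true;
  t->resolved = false;
  /* A real proactor would now hand the task to its scheduler. */
}

/* Runs on a worker thread each time the task is scheduled. */
static void demo_process(demo_task_t *t) {
  if (t->first_schedule) {
    /* First scheduling: resume the logic that the connect call deferred. */
    t->first_schedule = false;
    demo_blocking_lookup(t);
  }
  /* Normal event processing would continue here. */
}
```

Under these assumptions, calling `demo_connect()` leaves `resolved` false, and the first `demo_process()` call performs the lookup exactly once.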

astitcher · 2024-04-22T21:41:14Z

c/src/proactor/epoll.c

@@ -1139,6 +1141,21 @@ static pn_event_batch_t *pconnection_process(pconnection_t *pc, uint32_t events,
  }
  if (sched_ready) schedule_done(&pc->task);

+  if (pc->first_schedule) {
+    // Normal case: resumed logic from pn_proactor_connect2.
+    // But possible tie: pn_connection_wake() or pn_proactor_disconnect().


typo? "possibility of"?

cliffjansen · 2024-05-06

Not a typo, but obviously not clear. While the original logic is preserved as much as possible, dropping the task lock and the resulting context switch allow competing threads to intervene in ways that were not possible prior to this change. Either of those two calls is possible from an arbitrary thread between the setting of first_schedule and arriving at this code.

A comment which doesn't make sense on its own is obviously not helpful. I will try to rework the comments and code structure so that they are clear on their own.
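The tie described above can be illustrated with a small single-threaded simulation. This is a sketch under stated assumptions, not the patch itself: `race_task_t`, `race_wake`, `race_first_connect_lh` and `race_process` are hypothetical names, the `locked` flag only imitates the task mutex, and the `competitor` hook stands in for another thread calling a wake function while the lock is dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical task; 'locked' imitates the task mutex in this simulation. */
typedef struct race_task {
  bool locked;
  bool first_schedule;
  bool wake_pending;
  /* Hypothetical hook run while the lock is dropped, simulating another thread. */
  void (*competitor)(struct race_task *);
} race_task_t;

/* Simulated wake from an arbitrary thread: takes the lock, records the wake. */
static void race_wake(race_task_t *t) {
  assert(!t->locked);
  t->locked = true;
  t->wake_pending = true;
  t->locked = false;
}

/* Blocking first-connect step: drops the lock, then reacquires it. */
static void race_first_connect_lh(race_task_t *t) {
  t->locked = false;                    /* lock dropped around blocking work */
  if (t->competitor) t->competitor(t);  /* a competitor may run here */
  t->locked = true;                     /* lock reacquired */
}

/* Returns true if a wake slipped in and still needs handling in this pass. */
static bool race_process(race_task_t *t) {
  bool more = false;
  t->locked = true;
  if (t->first_schedule) {
    t->first_schedule = false;
    race_first_connect_lh(t);
    /* Re-check only after the lock is held again. */
    more = t->wake_pending;
  }
  t->locked = false;
  return more;
}
```

In the normal case no competitor runs and `race_process()` reports nothing further to do; when the simulated wake arrives while the lock is dropped, the re-check after reacquiring the lock catches it.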

astitcher

In general, if this works and passes the tests it's fine. But I feel that we've added (yet another) state in the implicit lifecycle state machine of connections and raw_connections. It would be easier to understand the lifecycle IMO if this state machine was explicit instead of being represented by ostensibly orthogonal booleans which I suspect are not truly orthogonal.
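As a rough illustration of what an explicit lifecycle state machine could look like, here is a minimal sketch. The state names and the transition table are assumptions made for the example; they are not taken from epoll.c or epoll_raw_connection.c, and the real lifecycle may need more states.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lifecycle states; illustrative names only. */
typedef enum {
  LC_CREATED,      /* connect requested, task not yet scheduled */
  LC_RESOLVING,    /* task thread is running the address lookup */
  LC_CONNECTING,   /* socket exists, connect in progress */
  LC_CONNECTED,
  LC_DISCONNECTED
} lc_state_t;

/* Returns true if moving from 'from' to 'to' is a legal transition. */
static bool lc_can_transition(lc_state_t from, lc_state_t to) {
  switch (from) {
    case LC_CREATED:      return to == LC_RESOLVING  || to == LC_DISCONNECTED;
    case LC_RESOLVING:    return to == LC_CONNECTING || to == LC_DISCONNECTED;
    case LC_CONNECTING:   return to == LC_CONNECTED  || to == LC_DISCONNECTED;
    case LC_CONNECTED:    return to == LC_DISCONNECTED;
    case LC_DISCONNECTED: return false;
  }
  return false;
}

/* Applies the transition if it is legal; otherwise leaves the state unchanged. */
static bool lc_advance(lc_state_t *state, lc_state_t to) {
  if (!lc_can_transition(*state, to)) return false;
  *state = to;
  return true;
}
```

With a single enum, combinations that independent booleans would permit (for example "first schedule pending" together with "connected") cannot be represented, and every allowed transition is listed in one place.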

astitcher · 2024-04-22T22:01:44Z

c/src/proactor/epoll_raw_connection.c

+    if (rc->first_schedule) {
+      // Normal case: resumed logic from pn_proactor_raw_connect.
+      // But possible tie: pn_raw_connection_wake()
+      // Defer wake check until getaddrinfo is done.
+      rc->first_schedule = false;
+      assert(!events); // No socket yet.
+      praw_connection_first_connect_lh(rc);  // Drops and reacquires lock.
+      if (rc->psocket.epoll_io.fd != -1 && !pni_task_wake_pending(&rc->task)) {
+        unlock(&rc->task.mutex);
+        return NULL;
+      }
+    }


I wonder if this code should be the first piece of code in the possibilities, as it is the first that should happen. Currently the logic to kick off the connect is first, but the logic to do the lookup must now come earlier in the lifecycle of the connection, so for clarity it should be the first condition in the sequence (unless the semantics mean that this doesn't work for some reason).

cliffjansen · 2024-05-06

Hopefully addressed in the reworked version.

Agreed about the lifecycle state issue. This proposed fix strives for the minimal code logic changes needed to move the blocking activity to a different thread. The subsequent "real fix" for the parent JIRA will necessarily introduce a new state (presumably with an early cancel option, compared to the current blocked-until-done behavior). The initiation of the getaddrinfo call will also presumably be moved back to the pn_xxx_connect call to avoid a pointless thread switch, and the first_call boolean will have no purpose.

astitcher · 2024-05-06T18:32:53Z

@cliffjansen Thanks for these changes - this is definitely clearer than the original change.
I'd still love a clearer lifecycle state machine but this will do for now.

astitcher · 2024-05-30T15:07:34Z

Merged

PROTON-2818: Move epoll proctor connection logic to a task thread.

079bf64

astitcher reviewed Apr 22, 2024


astitcher approved these changes Apr 22, 2024


PROTON-2818: clearer code and comments.

7a035ae

astitcher closed this May 30, 2024


