
Optimize connection pool implementation #924

Closed

Conversation

MarkusSintonen (Contributor) commented Jun 10, 2024

Summary

Second PR in a series of optimizations to improve the performance of httpcore and reach the performance levels of aiohttp (and the urllib3 library).
Related discussion: encode/httpx#3215 (comment)

Optimizes the connection pool by reducing the time complexity of the idle-connection checks. It also no longer checks the socket's readable status on every pool operation. The pool's has_expired check uses socket polling via the is_readable check, which is relatively expensive. Polling is now done at smudged intervals (smudged so that all polls don't happen at exactly the same time). The chance of picking up a broken keep-alive connection from the pool should still stay relatively low.
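A minimal sketch of the interval-gated expiry check described above (the class and names like `_last_poll_at` and `_socket_is_readable` are illustrative, not httpcore's actual internals):

```python
import time

class PooledConnectionSketch:
    """Illustrative only: gate the expensive socket poll behind a timer."""

    def __init__(self, poll_interval: float) -> None:
        self._poll_interval = poll_interval  # smudged per connection
        self._last_poll_at = time.monotonic()

    def _socket_is_readable(self) -> bool:
        # Stand-in for the expensive is_readable socket poll.
        raise NotImplementedError

    def has_expired(self) -> bool:
        now = time.monotonic()
        if (now - self._last_poll_at) < self._poll_interval:
            # Too soon since the last poll: skip the expensive check
            # and assume the connection is still usable.
            return False
        self._last_poll_at = now
        # An idle connection whose socket reports "readable" has usually
        # been closed by the server (there is an EOF waiting to be read).
        return self._socket_is_readable()
```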

Async previously: [benchmark chart "old1"]

Async with PR: [benchmark chart "new2"]

Async request latency is not yet as stable (as this doesn't include #922), but the overall duration of the benchmark improves by 7.5x. (The difference diminishes once server latency rises above 100 ms or so.)

Sync previously: [benchmark chart "sync_old"]

Sync with PR: [benchmark chart "sync_new"]

Sync request latency improves by 2.5x. With this, httpcore matches the performance of the urllib3 library.

Checklist

  • I understand that this PR may be closed in case there was no previous discussion. (This doesn't apply to typos!)
  • I've added a test for each change that was introduced, and I tried as much as possible to make a single atomic change.
  • I've updated the documentation accordingly.

@@ -238,24 +238,27 @@ def _assign_requests_to_connections(self) -> List[AsyncConnectionInterface]:

```python
        those connections to be handled separately.
        """
        closing_connections = []
        idling_connections = {c for c in self._connections if c.is_idle()}
```
Member
It seems like we use this collection just for tracking the count of idle connections. Maybe it should just be an integer for simplicity?

MarkusSintonen (Contributor, Author) commented Jun 11, 2024

With an integer we would need to recheck all the connections again. We can't just decrement in the if-elif branches below, as we could end up going negative, etc.

MarkusSintonen (Contributor, Author)

Heh, sorry, you are right. We can just check e.g. whether the expired connection was also an idle one and decrement!
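A minimal sketch of that counting approach (illustrative names, not the actual _assign_requests_to_connections internals; max_keepalive stands in for the pool's keep-alive limit):

```python
from typing import List, Protocol

class Connection(Protocol):
    def is_idle(self) -> bool: ...
    def has_expired(self) -> bool: ...

def pick_closing_connections(
    connections: List[Connection], max_keepalive: int
) -> List[Connection]:
    # Count the idle connections once, up front.
    idle_count = sum(1 for c in connections if c.is_idle())
    closing: List[Connection] = []

    for conn in list(connections):
        if conn.has_expired():
            if conn.is_idle():
                # The expired connection was counted as idle, so decrement
                # here instead of re-scanning the pool; this is what keeps
                # the counter from going negative.
                idle_count -= 1
            connections.remove(conn)
            closing.append(conn)
        elif conn.is_idle() and idle_count > max_keepalive:
            # Too many idle connections: close the excess ones.
            idle_count -= 1
            connections.remove(conn)
            closing.append(conn)
    return closing
```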

Member

Does that part significantly change the performance? Did you run any tests? Maybe we get the same performance boost from the polling change alone?

MarkusSintonen (Contributor, Author)

When I checked with pyinstrument, it showed this exact spot as a hot spot, spending time iterating the connections in the outer and inner loops. I'll rerun the benchmark to check it again.

MarkusSintonen (Contributor, Author)

Anyway, I have now pushed the integer fix.

MarkusSintonen (Contributor, Author)

Without the fix in _assign_requests_to_connections, performance starts to deteriorate as the number of connections in the pool increases (not surprising, as there is a loop within the loop).

With 20 concurrency (sync): [benchmark chart "sync_20"]

With 40 concurrency (sync): [benchmark chart "sync_40"]

karpetrosyan (Member) commented Jun 11, 2024

I have similar results without this PR. Am I doing something wrong?
I am using the benchmark script from your other PR.

Here is what I got on my machine without this PR:
[benchmark screenshot]

UPDATE:

However, the test results for httpcore vary between 3500 and 5500 for me (without this change).

MarkusSintonen (Contributor, Author)

> I have similar results without this PR. Am I doing something wrong?

What kind of system are you on, and which Python version?

Comment on lines +297 to +307:

```python
# Checking the readable status is relatively expensive so check it at a lower frequency.
if (now - self._network_stream_used_at) > self._socket_poll_interval():
    self._network_stream_used_at = now
    server_disconnected = (
        self._state == HTTPConnectionState.IDLE
        and self._network_stream.get_extra_info("is_readable")
    )
    if server_disconnected:
        return True

return False
```
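For readers following along, a guess at the shape of the `_socket_poll_interval()` helper referenced above (the constant and the smudge range are illustrative, not the PR's actual values):

```python
import random

BASE_SOCKET_POLL_INTERVAL = 1.0  # seconds; an illustrative value

def socket_poll_interval() -> float:
    # Scale the base interval by a random "smudge" factor so the polls
    # of many pooled connections don't all fire at the same instant.
    return BASE_SOCKET_POLL_INTERVAL * random.uniform(0.75, 1.25)
```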
Contributor

Actually, I'm not happy with the interval calculation.

Could we improve is_readable instead? FWIW, I noticed the anyio and sync backends use httpcore._utils.is_socket_readable while the trio backend uses its own; @MarkusSintonen, is there any benchmark difference when you switch the backend to trio?

For improving is_readable in get_extra_info:

  • Always assume it is readable, and flip it to false on specific events such as receiving a socket close from the server.
  • Use a synchronized Event on readability status changes.

I didn't dig deep into whether these cases are possible or not; they are only my opinions 🤷‍♂️

MarkusSintonen (Contributor, Author)

Trio also suffers badly from the constant socket polling.

Without interval-based socket polling: [benchmark chart "old_trio"]

With interval-based socket polling: [benchmark chart "new_trio"]

So in trio it's over 5x slower when constantly doing the socket polling.

MarkusSintonen (Contributor, Author) commented Jun 12, 2024

> For improving is_readable in get_extra_info:
> Always assume it is readable, and flip it to false on specific events such as receiving a socket close from the server.

I'm not aware of any way to get events about the socket being closed. As far as I know, the only way to find out is to use the socket. 🤔 But I agree is_readable could be better, so it's not so heavyweight. We could make it flag-based: on the networking side, we just set a boolean flag when we detect a network error during usage. This has the downside of a greater probability of handing out already-broken connections from the pool, but as far as I know, this is how it's usually done.
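A minimal sketch of that flag-based idea (hypothetical names; nothing here is httpcore's actual API):

```python
class FlaggedStreamSketch:
    """Illustrative only: replace socket polling with an error flag."""

    def __init__(self) -> None:
        self._broken = False

    def mark_broken(self) -> None:
        # The networking code would call this whenever a read or write
        # fails, instead of the pool polling the socket.
        self._broken = True

    @property
    def is_broken(self) -> bool:
        # Cheap check for the pool: no syscall involved. The trade-off is
        # that a connection closed quietly by the server looks "healthy"
        # until its next use fails.
        return self._broken
```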

MarkusSintonen (Contributor, Author)

Another option would be to move the interval-based polling into the network backends. Or get even more elaborate and run the socket polling at a specific interval via loop.call_later, which runs the actual poll via loop.run_in_executor to avoid any possibly non-async socket I/O in async land. It easily gets hairy 😄
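Very roughly, that loop.call_later + loop.run_in_executor variant might be shaped like this (stream.check_readable is a hypothetical blocking poll; this is a sketch of the idea, not working httpcore code):

```python
import asyncio

POLL_INTERVAL = 1.0  # seconds; an illustrative value

def schedule_socket_poll(loop: asyncio.AbstractEventLoop, stream) -> None:
    """Periodically run a blocking readability poll off the event loop."""

    def _run_poll() -> None:
        # run_in_executor keeps the synchronous socket poll on a worker
        # thread, out of async land.
        future = loop.run_in_executor(None, stream.check_readable)
        # Once this poll completes, arm the next one.
        future.add_done_callback(
            lambda _f: schedule_socket_poll(loop, stream)
        )

    loop.call_later(POLL_INTERVAL, _run_poll)
```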

T-256 (Contributor) left a comment

Except for that interval mechanism (which perhaps could be excluded from this PR), LGTM, thank you!

MarkusSintonen (Contributor, Author) commented Jun 12, 2024

> (perhaps could be excluded from this PR)

That was the beef of the PR, as the constant socket polling is the most expensive part of the whole pooling implementation 😅 (the loop complexity is shadowed by the socket polling).

tomchristie (Member)

Let's get this split into two separate PRs, please, so we can look at each in isolation and determine whether it's a benefit.

MarkusSintonen (Contributor, Author) commented Jun 13, 2024

> Let's get this split into two separate PRs, please, so we can look at each in isolation and determine whether it's a benefit.

Will do. FYI, here was the pool without the loop-complexity fix: #924 (comment)

MarkusSintonen (Contributor, Author)

@tomchristie / @T-256
I have now split the PR into two. I'll close this one.

MarkusSintonen deleted the optimize-conn-pool branch June 13, 2024 18:38