Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

os/linux/ld: harden brewed_ld_so_diagnostics against TypeError #18333

Merged
merged 3 commits into from
Sep 17, 2024

Conversation

carlocab
Copy link
Member

  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same change?
  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your changes? Here's an example.
  • Have you successfully run brew style with your changes locally?
  • Have you successfully run brew typecheck with your changes locally?
  • Have you successfully run brew tests with your changes locally?

I think this is a bug in Ruby, but I've no idea how to track it down. I
can reproduce it intermittently in a codespace when brew installing a
large number of formulae.

To work around this:

  • cache the return value of brewed_ld_so_diagnostics so that we can
    limit the number of calls to IO.popen
  • retry once when we see a TypeError

Closes #17828.

@carlocab carlocab requested a review from Bo98 September 16, 2024 04:48
@Bo98
Copy link
Member

Bo98 commented Sep 16, 2024

Hmm definitely a hack alright, but at least we're getting closer by narrowing it down to this so thanks for finding it! I wonder if it's something specific to this or if it's just from this having a higher hit rate from the lack of caching.

Good to know it's reproducible in Codespaces. How consistently are you seeing this? What commands are you running that give a reasonable hit rate?

I'm willing to spend some time this afternoon to track down the precise (potentially Ruby) bug here.

@carlocab
Copy link
Member Author

How consistently are you seeing this? What commands are you running that give a reasonable hit rate?

I've seen it twice in the past two days (but that's also the number of times I've tried using a codespace to test something). I definitely triggered it with a brew install gnupg, and I also triggered it while running

brew install --only-dependencies --verbose --formula --build-bottle na-trium-144/webcface/webcface@2

to test out Homebrew/discussions#5607.

I'm willing to spend some time this afternoon to track down the precise (potentially Ruby) bug here.

@carlocab carlocab force-pushed the ld-so-diagnostics-typeerror branch 2 times, most recently from 1519b0d to 65b76ed Compare September 16, 2024 05:07
I think this is a bug in Ruby, but I've no idea how to track it down. I
can reproduce it intermittently in a codespace when `brew install`ing a
large number of formulae.

To work around this:
- cache the return value of `brewed_ld_so_diagnostics` so that we can
  limit the number of calls to `IO.popen`
- retry once when we see a `TypeError`

Closes #17828.
Copy link
Member

@MikeMcQuaid MikeMcQuaid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @carlocab, looks good so far!

Library/Homebrew/os/linux/ld.rb Outdated Show resolved Hide resolved
Library/Homebrew/os/linux/ld.rb Outdated Show resolved Hide resolved
Library/Homebrew/os/linux/ld.rb Outdated Show resolved Hide resolved
@Bo98
Copy link
Member

Bo98 commented Sep 17, 2024

Haven't managed to even hit this once yet. I even did:
brew ruby -e '1000.times { OS::Linux::Ld.brewed_ld_so_diagnostics }'

With that said, this isn't even the only call site that triggers a TypeError anyway (#15037) so probably doesn't make sense special casing that here. The higher likelihood of it happening her is probably be tied to the high call rate so we can probably slim this down to the cache changes which will end up "fixing" the issue for the vast majority by reducing the surface area chance. Or at least try the cache change first and do a second iteration if need be.

I will still try to track down the TypeError since it has been happening for years now.

@carlocab
Copy link
Member Author

carlocab commented Sep 17, 2024

this isn't even the only call site that triggers a TypeError anyway (#15037)

This backtrace suggests that the call site is close by (or possibly even refactored into OS::Linux::Ld), though: #15037 (comment)

Not sure about the first one. But I'll drop the retry logic and we can see if caching is enough.

This is a hack, so let's see if we can get away with skipping it for
now.
@Bo98
Copy link
Member

Bo98 commented Sep 17, 2024

This backtrace suggests that the call site is close by

ld.so popen is very new - was done a few months ago.

Initial backtrace in that issue mentioned GCC post-install which also has several popen calls.

@carlocab
Copy link
Member Author

ld.so popen is very new - was done a few months ago.

Yup, but the second backtrace references dynamically_linked_libraries -- of which part has been refactored into here.

Initial backtrace in that issue mentioned GCC post-install which also has several popen calls.

Wonder if we should just stop passing Pathnames to Utils.popen_read, which seems to be something in common across all of these.

@carlocab carlocab merged commit 04a2c99 into master Sep 17, 2024
27 checks passed
@carlocab carlocab deleted the ld-so-diagnostics-typeerror branch September 17, 2024 04:09
@Bo98
Copy link
Member

Bo98 commented Sep 17, 2024

Possibly. A piece of information I'd love to know is if the TypeError is happening before, during or after the block is called.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

systemd brew.rb "Error: no implicit conversion of false into String" error
3 participants