How to debug random unhandled EPIPE error #4470
Comments
It's hard to debug when there isn't any code provided. Maybe try asking on another platform?
In general, installing an error handler and logging incidents for all your streams is a best practice; that way we will know which stream broke.
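As a minimal sketch of that advice, the helper below attaches a labelled `error` listener to any stream so the log records which one failed. The name `watchStream`, the label, and the `log` callback are hypothetical and only for illustration; a real application would pass its own logger.

```typescript
import { EventEmitter } from "node:events";
import { PassThrough } from "node:stream";

// Hypothetical helper (not part of Node.js): tag a stream with a label so
// that any 'error' event is logged together with the stream's identity.
function watchStream<T extends EventEmitter>(
  stream: T,
  label: string,
  log: (msg: string) => void = console.error,
): T {
  stream.on("error", (err: Error) => {
    log(`[${label}] stream error: ${err.message}`);
  });
  return stream;
}

// Usage: collect log lines in an array instead of printing them.
const seen: string[] = [];
const demo = watchStream(new PassThrough(), "demo-stream", (m) => seen.push(m));

// Simulate a failure by emitting 'error' directly on the stream.
demo.emit("error", new Error("write EPIPE"));
console.log(seen);
```

This only covers streams the application creates itself; streams opened inside third-party libraries still need their own handlers.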
@gireeshpunathil we do add an error handler to all our streams; unfortunately it does not seem to trigger in the case above. For example, when we open a WebSocket we add:
ws.addEventListener('error', (event: WebSocket.ErrorEvent) => {
  logger.error(event.error, `WebSocket error encountered`);
});
In the case above, though, we see no log lines, meaning some other stream seems to have faulted. As we don't control all the code that opens streams (e.g. code from NestJS or some other library we use), we can't just "check / instrument all streams".
Thus the ask: how do we debug Node.js errors like this one? If I may be so bold: a system that throws errors that are not debuggable isn't a great system, as any library or developer error has the power to break the system in a way that can't be traced. Surely there's some method to at least get a complete stack trace showing which part failed to handle the error?
If you increase your stack trace limit, the error will provide more context.
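A minimal sketch of raising the limit, assuming a V8-based runtime such as Node.js: `Error.stackTraceLimit` controls how many frames each new error records (the default is 10), and the same setting can be passed on the command line as `node --stack-trace-limit=100`. The helper `framesAtDepth` and the chosen depth are illustrative only.

```typescript
// Count the "at ..." frames recorded by an error created `depth` calls deep.
function framesAtDepth(depth: number): number {
  if (depth > 0) {
    const count = framesAtDepth(depth - 1);
    return count;
  }
  const stack = new Error("probe").stack ?? "";
  return stack.split("\n").filter((line) => line.trim().startsWith("at ")).length;
}

Error.stackTraceLimit = 10; // V8's default: only the innermost 10 frames
const shallow = framesAtDepth(30);

Error.stackTraceLimit = 100; // keep up to 100 frames per error
const deep = framesAtDepth(30);

console.log({ shallow, deep });
```

A higher limit costs a little memory and time per error, which is usually acceptable while hunting a rare crash.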
agree, but that is where
I followed your suggestion and managed to get a nice snippet:
Judging from that, I'm going to debug our connection to Jaeger (we send telemetry there, and any failure to send telemetry should not impact stability).
This is a great finding! It looks like we had already read 32KB of data before the pipe broke. You are right that telemetry shouldn't break the app, so it is worthwhile to see i) why the remote is breaking up in the middle, and ii) whether the Jaeger connection has an error handler.
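The second point matters because of how Node.js treats `error` events: if an emitter has no `error` listener, `emit` throws the error, and when that happens inside an I/O callback it surfaces as an uncaught exception. The sketch below uses a plain `EventEmitter` as a stand-in for the telemetry connection; it is not the Jaeger client's actual API.

```typescript
import { EventEmitter } from "node:events";

// Stand-in for a telemetry connection (an assumption for illustration).
const telemetry = new EventEmitter();
const epipe = Object.assign(new Error("write EPIPE"), { code: "EPIPE" });

// Without an 'error' listener, emit() throws the error object itself.
// Here it is caught by try/catch; inside Node's I/O callbacks it is not.
let thrown: unknown = null;
try {
  telemetry.emit("error", epipe);
} catch (err) {
  thrown = err;
}

// With a listener, the failure is contained and can simply be logged.
const contained: string[] = [];
telemetry.on("error", (err: Error) => contained.push(err.message));
telemetry.emit("error", epipe);

console.log({ threwWithoutListener: thrown === epipe, contained });
```

Attaching such a listener to the real connection keeps a broken telemetry pipe from terminating the application.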
Node.js Version
v20.17.0
NPM Version
10.8.1
Operating System
Darwin ip-192-168-201-130.ap-southeast-1.compute.internal 23.6.0 Darwin Kernel Version 23.6.0: Mon Jul 29 21:14:30 PDT 2024; root:xnu-10063.141.2~1/RELEASE_ARM64_T6030 arm64
Subsystem
events, process
Description
My application randomly terminates with the following lines being last in the log:
I am unsure how to best debug this issue as the stack trace doesn't contain any part in our codebase.
We use NestJS, which adds another layer of complexity as the pipe which seemingly couldn't be written to might not even be something in our application code.
Due to the random nature, I'm hesitant to always run with NODE_DEBUG=net, as I cannot reproduce the issue reliably.
Minimal Reproduction
Unable, codebase is proprietary and issue appears randomly
Output