
BLE high frequency of disconnects #761

Open
bendauphinee opened this issue Apr 1, 2024 · 12 comments

@bendauphinee

Subject of the issue

Over 8 hours yesterday, I ran a survey with the new 4.0 firmware. I've had occasional disconnects before, but the new version seems particularly unstable, with well over 50 disconnects over the course of my day. After increasing the update interval from 0.25 sec to 0.33 sec (i.e., lowering the update rate), it seemed to stabilize some, but still had issues.

I tried to find a pattern, but none was apparent. Disconnects happened while:

  • moving too fast in a high-clutter area (forest survey)
  • standing in a non-clutter area (a field)
  • tilting it away from the sky (random)
  • walking too fast
  • at random (when surveying a point, I noticed the vertical accuracy would start degrading slowly, and then it would disconnect)

I guess as a first step, if y'all could either generate a build that logs whatever data you might need to diagnose this, or tell me what to turn on to log it to the SD card, we can start there. I likely have another long survey run next weekend, so it would be a good time to do this.

Your workbench

  • RTK firmware: 4.0
  • ZED-F9P firmware: HPG 1.32
  • Radios in use: Bluetooth / BLE mode
  • Software: SW Maps / iPhone
  • Transmitting NTRIP back to the device? Yes
@tonycanike
Contributor

Can you tell us more about your hardware configuration please?

Which specific SparkFun hardware are you using? Facet? Facet LB? Surveyor? Express? Breakout board? etc? Are you using two SparkFun units in Base/Rover? Or are you using one and getting connections from a network source?

By "transmitting NTRIP", are you using IP to transmit your own RTCM from your base to your rover, or are you getting RTCM via NTRIP from a network provider and transmitting it to your rover? (The problem could have been in the network or at the RTCM provider.)

Can you describe what you mean by "disconnect" ?
Are you saying SWMaps on the iPhone disconnects from the bluetooth connection with the Facet (for example) ?
Or is the NTRIP data stream interrupted (eg are you losing your internet connection) ?
Or are you saying you are losing RTK Fix?
Or something else?

@bendauphinee
Author

Great stack of questions! You can find my detailed writeup here, but I'll summarize:

Using a SparkFun GPS-RTK-SMA Breakout – ZED-F9P transmitting via RTK Base to RTK2GO. Receiving via SW Maps, and passing it into my SparkFun RTK Facet via SW Maps.

By "disconnect" I mean there seems to be a Bluetooth disconnect happening, so the device drops out of SW Maps. Once it did this right after reconnecting, before I had even reconnected to NTRIP. Usually it reconnects with no issue when I connect back to it in SW Maps, but every once in a while it would lag and refuse to connect again until rebooted.

RTK performance was fine, and it would always be in RTK Fix right up until a connection drop. NTRIP performance is also fine, with both base and my phone on stable LTE connections.

I'm assuming it's BLE bandwidth related, due to the fact that turning down the update frequency cut way back on the disconnections. This is in contrast to 3.10, where I would only hit occasional disconnects, with the default update frequency.
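The bandwidth reasoning above can be made concrete with a little arithmetic: the data sent over BLE scales linearly with the update rate, which is the inverse of the update interval. The sketch below assumes a hypothetical, fixed payload per position update (`BYTES_PER_EPOCH`, not a measured value) and the helper names are illustrative only.

```python
# Hypothetical average payload (NMEA bytes) sent per position update.
# This is an illustrative assumption, not a value measured on an RTK device.
BYTES_PER_EPOCH = 600


def update_rate_hz(interval_s: float) -> float:
    """Convert an update interval in seconds to an update rate in Hz."""
    return 1.0 / interval_s


def outgoing_bytes_per_second(interval_s: float, bytes_per_epoch: int = BYTES_PER_EPOCH) -> float:
    """Approximate BLE load from the RTK device, assuming a fixed payload per update."""
    return bytes_per_epoch * update_rate_hz(interval_s)


if __name__ == "__main__":
    for interval in (0.25, 0.33, 1.0):
        print(f"{interval:.2f} s -> {update_rate_hz(interval):.2f} Hz, "
              f"~{outgoing_bytes_per_second(interval):.0f} B/s")
```

Whatever the real payload size, going from 0.25 s to 0.33 s only trims the outgoing load by about a quarter (0.25 / 0.33 ≈ 0.76), whereas dropping to 1 Hz cuts it by three quarters.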

@nseidle
Copy link
Member

nseidle commented Jul 22, 2024

We've seen a variety of issues with BLE on v2.x of the ESP32 Arduino core. Without a way to replicate your exact issue/setup, I cannot explain why v3.x of the RTK Firmware was better. We have had a lot more success with RTK Everywhere, where the products have more PSRAM, so we are able to move to v3.0.x of the ESP32 core.

To clarify, are you running on your own hardware? Not an RTK product? If you're on your own hardware and up for some hacking, you could run RTK Everywhere on an ESP32 dev board with PSRAM.

If you can give us a way to replicate I'm happy to take a look.

@bendauphinee
Author

To clarify, are you running on your own hardware? Not an RTK product?

I'm running y'all's RTK Facet, I just dumped the reported hardware for reference.

If you can give us a way to replicate I'm happy to take a look.

I don't know that I can give you a way to replicate, which is why I suggested that if you can make me a logging build, I can run it and give you the results if it's still flaky.

@nseidle
Member

nseidle commented Jul 23, 2024

Ah! I saw "Using a SparkFun GPS-RTK-SMA Breakout" and was confused.

Are you running any settings that are of note? Did you turn on any additional messages?

@bendauphinee
Author

Other than being in BLE mode, tweaking the frequency down, and transmitting NTRIP back to device, no settings of note.

@YashuSystems

YashuSystems commented Aug 7, 2024

I just acquired an RTK Express+ since I need the ZED-F9R. I also have a Facet with a ZED-F9P, which I obtained about a year ago. I have v4.1 running on both, and I'm using the latest SW Maps (as of this post date) with an iPhone 13.

I am seeing these BLE disconnects on the Express+ but not on the Facet. The disconnects are unacceptably frequent (typically less than 60 sec after associating from SW Maps). The disconnect issue seems more prevalent once I start streaming RTCM corrections from the SW Maps client. So the topology I was planning to use for my rover is now essentially unusable with the Express+.

I am unsure if there is some kind of issue with the ZED-F9R vs. the ZED-F9P. At this point, I am trying to determine whether the Express+ has hardware issues and whether I need to exchange it for a new unit. Right now, I do not have time to trace this BLE disconnect issue with my Ellisys Vanguard. Maybe in a few weeks.

Please advise whether SparkFun plans to investigate and resolve this issue with a firmware update. Otherwise I will need to create my own corrections stream path and simply deem the Express+ BLE link unreliable. I also do not have time to solve the ESP32 firmware or hardware issue myself.

"We've seen a variety of issues with BLE on the v2.x of the ESP32 Arduino core." Please elaborate.

FYI, I had to reposition the OLED on the Express+ because the cutout registration on the Hammond enclosure was way off (> 2 mm), so the graphic overlay's viewing window was misaligned and obscured a clear view of all the OLED pixels. The hot glue attachment should also be changed to screws for these rather expensive units. I modified my Express+ unit to use self-tapping screws.

@YashuSystems

Here is a Vanguard trace of the anomalous single byte from the Express+ when it is streaming NMEA sentences to SW Maps. SW Maps then appears to initiate a disconnection sequence.
[Image: Express+_SingleByteXmitDuringGPS-OutputStreamAnomaly_EllisysVanguard]

@pkramer509

Don't have anything to add, other than that on an out-of-the-box Express+ running 4.0, BLE to iOS/SW Maps is functionally useless, as it doesn't stay connected long enough to do anything useful.

@nseidle
Copy link
Member

nseidle commented Sep 13, 2024

Here's what I can replicate:

With an RTK Express running v4.1, connected to iOS and SW Maps, using factory defaults, we can very reliably connect over BLE. Once NTRIP is turned on (data is now being sent from the phone back to the RTK device), the NTRIP Client will disconnect after 15 to 30 seconds. After lots of testing, it appears the BLE connection is getting saturated and the RTK device does not read the incoming BLE bytes from the phone. When this happens, iOS will disconnect the BLE RX service (because it's not being read from), and the NTRIP Client disconnects.

Reducing the RTK device's measurement rate from the default of 4 Hz to 1 Hz, and disabling the GSV NMEA message, increases the NTRIP connection time to more than 45 minutes (we're still running longer tests).

If you are experiencing problems over BLE, try the following:

  • Reduce the measurement rate on the GNSS receiver from 4 Hz to 1 Hz.
  • Reduce the number of NMEA messages transmitted to the phone. For example, GSV (satellite information) is a large amount of data that is usually not needed for GIS operations.
  • Reduce the size of the RTCM data being transmitted to the RTK device. Change the base station output to reduce the number and/or rate of messages transmitted. For example, RTCM1005 and/or RTCM1230 can be reduced to 0.1 Hz, and RTCM1074/1084/1094/1124 can be reduced to 0.5 Hz.
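To see how these three suggestions shrink the load in each direction, here is a rough budget sketch. The per-message byte counts in `NMEA_SIZES` and `RTCM_SIZES` are hypothetical placeholders, not measurements (real sizes vary with the number of satellites and constellations tracked), and the helper names are illustrative; substitute sizes logged from your own setup.

```python
# Hypothetical per-message sizes in bytes. Illustrative placeholders only,
# NOT measured values; GSV is modeled as several sentences per epoch.
NMEA_SIZES = {"GGA": 75, "GSA": 65, "GST": 60, "RMC": 70, "GSV": 840}
RTCM_SIZES = {"1005": 25, "1230": 15, "1074": 300, "1084": 250, "1094": 300, "1124": 250}


def nmea_load(measurement_hz: float, enabled: list[str]) -> float:
    """Bytes/s from the RTK device to the phone, assuming each enabled sentence is sent once per epoch."""
    return measurement_hz * sum(NMEA_SIZES[name] for name in enabled)


def rtcm_load(rates_hz: dict[str, float]) -> float:
    """Bytes/s from the phone to the RTK device for the given RTCM message rates."""
    return sum(RTCM_SIZES[msg] * hz for msg, hz in rates_hz.items())


if __name__ == "__main__":
    default_tx = nmea_load(4, list(NMEA_SIZES))
    reduced_tx = nmea_load(1, [name for name in NMEA_SIZES if name != "GSV"])

    default_rx = rtcm_load({msg: 1.0 for msg in RTCM_SIZES})
    reduced_rx = rtcm_load({"1005": 0.1, "1230": 0.1,
                            "1074": 0.5, "1084": 0.5, "1094": 0.5, "1124": 0.5})

    print(f"NMEA to phone:  {default_tx:.0f} -> {reduced_tx:.0f} B/s")
    print(f"RTCM to device: {default_rx:.0f} -> {reduced_rx:.0f} B/s")
```

The absolute numbers depend entirely on the placeholder sizes. The point is structural: the NMEA load is multiplied by the measurement rate, so the 4 Hz to 1 Hz change and dropping GSV compound, while the RTCM load is a sum of size × rate per message.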

While I can't replicate the original "BLE disconnects", I believe they may be related to what we're seeing with the NTRIP Client disconnecting, combined with users' other base setups (which we can't easily replicate). I would love to get additional feedback or ways we can replicate this issue. Please keep the feedback coming.

[Screenshot: GNSS receiver measurement rate setting]

Above, the GNSS receiver measurement rate is set to 1 Hz.

[Screenshot: UBX_NMEA_GSV message setting]

Above, UBX_NMEA_GSV is set to 0 (disabled).

@gdt
Contributor

gdt commented Sep 14, 2024

I'm still on 3.10 (I hadn't used my Express for a while and didn't update when I started again a week or two ago), and it is stable (Android, QField getting NMEA over BLE, and RTCM obtained over Wi-Fi from MaCORS). No issues, other than poor accuracy when really close to the north side of a building with a bad sky view (no surprise). I am thinking about upgrading, and am scared of 4.x. Are there any reports of BLE instability with Android, or does this feel like a dropped byte causes iOS to be twitchy and give up?

You said "saturated", and I wonder whether the problem is the ESP32 keeping up with reads, or the data transmission being more than fits in the channel, so the transmit buffers fill up. Since it's a shared physical channel, it makes sense that high-rate NMEA from the F9P would cause RF congestion and thus cause the RTCM stream to back up, and the iOS device to get upset about that. Is that what you think is happening?

I will read all the release notes and plan testing. I might also try to configure BluetoothGNSS and mock GPS to get NMEA into QField, which would result in RTCM and NMEA both on BLE.

@nseidle
Member

nseidle commented Sep 14, 2024

Are there any reports of BLE instability with Android

No, I believe this is predominantly an iOS issue.

Since it's a shared physical channel it makes sense that high-rate NMEA from the F9P would cause RF congestion and thus cause the RTCM stream to back up

I have removed the RTK Firmware from the equation and am not concerned about GNSS receivers at this point. I believe it has to do with the ESP32's BLE stack and how iOS deals with devices that do not empty out the iOS phone's buffer in a timely manner. Our testing is done with nothing but a stripped-down BLE serial transmission and reception test. In one direction (TX from RTK device to iOS phone), everything is fine up to ~4 kB/s; the BLE TX service remains intact even as buffers begin to fill. Once you add in data coming from the phone, with data flowing in both directions, the limit drops to about ~1.5 kB/s. At that point, if a packet from the phone doesn't get through because it is blocked by a TX from the RTK device, the phone stops trying to transmit and the BLE RX service is closed, while the BLE TX service remains intact.
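The thresholds from that test can be turned into a quick sanity check for a planned configuration. This is a minimal sketch under stated assumptions: it treats "kB" as 1000 bytes, and it interprets the ~1.5 kB/s bidirectional figure as applying to each direction, which the observation above does not pin down. The function name `link_at_risk` is hypothetical.

```python
# Rough limits taken from the stripped-down BLE serial test described above.
# Assumptions: 1 kB = 1000 bytes, and the bidirectional figure applies per direction.
TX_ONLY_LIMIT = 4000        # bytes/s, RTK device -> phone with no incoming data
BIDIRECTIONAL_LIMIT = 1500  # bytes/s, per direction when both directions are active


def link_at_risk(tx_bps: float, rx_bps: float) -> bool:
    """Return True if the planned load exceeds the rough limits seen in testing.

    tx_bps: bytes/s sent from the RTK device to the phone (e.g. NMEA).
    rx_bps: bytes/s sent from the phone to the RTK device (e.g. RTCM via NTRIP).
    """
    if rx_bps == 0:
        # One-way traffic only: the higher TX-only limit applies.
        return tx_bps > TX_ONLY_LIMIT
    # Traffic in both directions: the lower limit applies to the busier direction.
    return max(tx_bps, rx_bps) > BIDIRECTIONAL_LIMIT


if __name__ == "__main__":
    print(link_at_risk(3000, 0))    # NMEA only, below the TX-only limit
    print(link_at_risk(3000, 500))  # same NMEA load once corrections start flowing
```

This mirrors the reported behavior: a TX load that is comfortable on its own can become a problem as soon as the phone starts sending corrections back.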
