So in other words, while the underlying reliability of our transports has remained the same in 3.6.80 (vs 3.6.57), we now see a lot more “failed” states, which is what we use to signal to our users that their connection has been disrupted. Users complained because they were seeing many more network notifications. The other side of the coin is worse, however: we had not been telling Chrome users that their connection had “failed” even when it had actually died, because the state stayed at “disconnected” and we never became aware of it.
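To make the “disconnected” vs “failed” distinction concrete, here is a minimal sketch of one way an app might map connection states to user-facing notifications. It assumes the states in question are the values of `RTCPeerConnection.iceConnectionState`, and it uses a hypothetical policy (not taken from any library): a transient “disconnected” only triggers a warning after a grace period, while “failed” is always surfaced. `noticeFor`, `UserNotice` and the 5-second default are illustrative names and values.

```typescript
// Mirrors the values of RTCPeerConnection.iceConnectionState; declared locally
// so the sketch compiles without the DOM lib.
type IceState =
  | "new" | "checking" | "connected" | "completed"
  | "disconnected" | "failed" | "closed";

// Hypothetical notification levels shown to the user.
type UserNotice = "none" | "reconnecting" | "connection-lost";

// Hypothetical policy: "disconnected" can recover on its own, so only warn once
// it has persisted for graceMs; "failed" will not recover without an ICE
// restart, so it is always surfaced.
function noticeFor(state: IceState, msInState: number, graceMs = 5000): UserNotice {
  switch (state) {
    case "failed":
      return "connection-lost";
    case "disconnected":
      return msInState >= graceMs ? "reconnecting" : "none";
    default:
      return "none";
  }
}
```

With this split, the extra “failed” states reported by the newer version map to a definite “connection lost” notice, while brief “disconnected” blips stay silent.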
So the updated version is actually great, because it has forced us to come to terms with how unreliable these connections can be over the longer term: we see a 10-20% failure rate over roughly 20-30 minutes. That seems quite high, but I don’t have external data to compare it against. Is it high? As a result of these findings, we’re now implementing connection rescue and proper reconnection mechanisms, as well as digging into transport reliability.
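For the reconnection side, one common approach (sketched here as an assumption, not necessarily what we are building) is to retry recovery on a capped exponential backoff schedule, where each attempt might call `RTCPeerConnection.restartIce()` and renegotiate. `backoffDelays` and its default values are hypothetical.

```typescript
// Hypothetical retry schedule for recovering a "failed" connection.
// Delay i is baseMs * 2^i, capped at maxMs, for the given number of attempts.
function backoffDelays(attempts: number, baseMs = 1000, maxMs = 16000): number[] {
  const delays: number[] = [];
  for (let i = 0; i < attempts; i++) {
    delays.push(Math.min(maxMs, baseMs * 2 ** i));
  }
  return delays;
}

// Example: six attempts with the defaults wait 1s, 2s, 4s, 8s, 16s, 16s.
const schedule = backoffDelays(6);
```

In practice you would likely add random jitter to each delay, so that many clients recovering from the same outage don't all retry in lockstep.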
Some more very relevant discussions on this topic for anyone who stumbles across this in the future: