I noticed the something occurring in the Kodi log, which seemed to coincide with the network connectivity issue. I think I may have nailed the issue down to the osmcsetting.updates module (or related module)
I believe that something is going wrong, causing the updatesetting service to initialize when it should not. The service appears to get stuck in a holding pattern indefinitely, and may be blocking other network traffic as a result.
Here’s a debug log taken from a unit that was working normally with network connection up.
Here’s the debug log of a unit that was working, but eventually stopped working. – I had to cut parts out because it was too big to fit in a single paste, but the messages in that chunk were just repeating ad infinitum.
The problem I’m having seems to coincide with the time when the following osmc.setting.updates
messages begin repeating infinitely. These messages do not appear when the network is working normally.
00:19:14 38730.066406 T:1796068336 DEBUG: script.module.osmcsetting.updates : - update_service.update_now received: pos0: <update_service.Main object at 0x5f3cff10>
00:19:14 38730.148438 T:1796068336 DEBUG: script.module.osmcsetting.updates : - update_service.check_update_conditions received: pos0: <update_service.Main object at 0x5f3cff10>
00:19:14 38730.152344 T:1796068336 DEBUG: script.module.osmcsetting.updates : result of Player.GetActivePlayers - {u'jsonrpc': u'2.0', u'id': 1, u'result': [{u'playerid': 2, u'type': u'picture'}]}
00:19:14 38730.152344 T:1796068336 DEBUG: script.module.osmcsetting.updates : - Update CONDITION : player playing
00:19:14 38730.152344 T:1796068336 DEBUG: script.module.osmcsetting.updates : - update_service.check_update_conditions [0.00170302391052] returns: (False, 'Update CONDITION : player playing')
00:19:14 38730.152344 T:1796068336 DEBUG: script.module.osmcsetting.updates : - update_service.update_now -nowait
00:19:14 38730.152344 T:1796068336 DEBUG: script.module.osmcsetting.updates : trigger_time - 2016-09-01 00:19:13.625363
00:19:14 38730.152344 T:1796068336 DEBUG: script.module.osmcsetting.updates : - update_service.holding_pattern_update received: pos0: <update_service.Main object at 0x5f3cff10>
00:19:14 38730.152344 T:1796068336 DEBUG: script.module.osmcsetting.updates : - update_service.check_update_conditions received: pos0: <update_service.Main object at 0x5f3cff10>
00:19:14 38730.152344 T:1796068336 DEBUG: script.module.osmcsetting.updates : result of Player.GetActivePlayers - {u'jsonrpc': u'2.0', u'id': 1, u'result': [{u'playerid': 2, u'type': u'picture'}]}
00:19:14 38730.152344 T:1796068336 DEBUG: script.module.osmcsetting.updates : - Update CONDITION : player playing
I don’t do much socket programming at all, but taking a look at the comms.py
file… I notice that the socket is essentially in an infinite try/except block of
log('Comms started')
while <some_condition>:
try:
conn, addr = self.sock.accept()
except socket.timeout:
continue
except:
log('An error occured while waiting for a connection')
break
log('Connection active.')
#more stuff
(The fact that it’s stuck here forever can be verified by looking at the logs, where "Comms started"
is logged, but "Connection active."
is never logged until I came along and interrupted things. 'An error occurred while waiting for a connection.'
is never logged, so the except condition being hit must be socket.timeout
)
Like I said, I’m no expert with sockets, but that doesn’t stop me from trying to string together a vague idea of what might be wrong… Some line items that almost make this click for me:
- A socket with a timeout blocks during the timeout period.
- If the
accept
is retried immediately after the timeout, that leaves no time in between this blocking call.
- Given 1 and 2, wouldn’t this essentially be the same thing as setting timeout to
None
?
- Do you suppose this behavior would be fixed if the socket was either nonblocking and/or waited a period of time (30 secs?) between trying to connect/accept?
All that said, this could be a byproduct of the problem and not necessarily the cause and I’m just barking up the wrong tree.
Anyhow, I’d like to test my theory. Is there a way to disable the settings update? If not, I can try monkeypatching the module to do effectively nothing to test the behavior, or adjust the socket behavior to try and fix what I believe the problem to be. That is, unless that will definitely break things in an unexpected way… who knows… Also tossing around the idea of increasing the _daemon
wait time to more than 500ms
To give a little background on my setup:
I’ve got several raspberry pi’s running OSMC. I have an autoexec.py
script that contacts a central server, providing its location, and makes a query for what should be in its playlist. From the result, the script downloads and adds/removes the appropriate files in the playlist folder. (/home/osmc/current) – Then the script starts a slideshow to that directory. In other cases, it may start a video playlist or something else, but the idea is the same.
I deployed the different units by setting up 1 unit, then cloning the SD card to the other units. After loading the image onto the units, I remove the network settings (the one tied to the base unit’s MAC address) from connman then setup a new network connection through the GUI.
It seems like a crapshoot as to when the problem will occur. Sometimes it’s multiple hours after starting the slideshow, sometimes it’s within minutes. But at some point, it always happens.
Thoughts?