Freeze with yellow activity light permanently on

Hello, I’m hoping you can help with an intermittent issue.

My Raspberry Pi 3 B+ running OSMC freezes and cannot be contacted via SSH. When this happens, the activity light stays permanently on until I give up waiting and unplug the power. Infrequently (only twice), when left in this state for a long time, it has eventually shown the OSMC sad face and rebooted itself.

I’ve experienced this issue while watching a TV show (like this evening), but I’ve also come across the Pi in this state when it was left idle.

I’ve searched the forum, but I cannot find anyone with this specific scenario, or their causes were not part of my configuration.

I cannot replicate the issue on demand… it happens randomly maybe once every few days.

OSMC is running on a Raspberry Pi 3 B+ with the Element 14 Universal Power Supply 2.5A, 5.1V.

Peripherals… a powered USB hub with two powered, NTFS-formatted drives attached, and a wireless keyboard dongle. I use Ethernet, not WiFi.

I have experienced this across updates… no specific OSMC version.

My initial suspicion was that it was related to the drives, hence the introduction of the powered USB hub. Since it continued, and since it happens when unattended, I was thinking it was something I’ve installed, but I can’t figure out what it might be. I’m using the Aeon Nox: SiLVO skin with various widgets enabled, and I do see a little struggle sometimes when navigating between main menu screens, but I can’t pinpoint whether this is the cause.

Since I cannot reproduce the issue on demand, I haven’t enabled debugging. At this stage, I’m hoping the cause may be obvious without that (especially to those who know how to read logs): https://paste.osmc.tv/ohubojanux

Looking at the Kodi log, there’s obviously a lot of “CRenderManager::WaitForBuffer - timeout waiting for buffer” and “CAESinkPi:AddPackets Underrun (delay:0.00 frames:2400)” in the lead-up to this latest freeze, but I don’t really know what that means or whether it’s relevant.

Of course, I’m open to suggestions, including turning on the debugging for more detailed info.

I look forward to your advice. Many thanks.

Try limiting Ethernet to 100M, as the Gigabit interface still has issues.

Hi,

Your cache settings are out of date; that format is from Kodi 16.

https://kodi.wiki/view/HOW-TO%3AModify_the_video_cache

Change:

<network>
   <buffermode>1</buffermode>
   <readbufferfactor>1.5</readbufferfactor>
   <cachemembuffersize>104857600</cachemembuffersize>
   <curlclienttimeout>45</curlclienttimeout>
</network>

To:

<cache>
   <buffermode>1</buffermode>
   <memorysize>104857600</memorysize>
   <readfactor>1.5</readfactor>
</cache>
<network>
   <curlclienttimeout>45</curlclienttimeout>
</network>
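For completeness, both fragments live inside the `<advancedsettings>` root element of `~/.kodi/userdata/advancedsettings.xml`, so a minimal complete file (assuming no other customisations are present) would look something like:

```xml
<advancedsettings>
   <cache>
      <buffermode>1</buffermode>
      <memorysize>104857600</memorysize>
      <readfactor>1.5</readfactor>
   </cache>
   <network>
      <curlclienttimeout>45</curlclienttimeout>
   </network>
</advancedsettings>
```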

Thanks Tom.

Ah, yes, thanks, I’ve made that change.

Cool. Thank you for the suggestion. I’ll try to do that now. Curious… is the easier option just to switch over to WiFi and avoid Ethernet altogether or does that just open up a different can of worms?

That really depends on the quality of your WiFi environment. A good, uncongested WiFi network would be easier until the Gigabit Ethernet issue is fixed.

OK, so I’ve stayed with Ethernet for the moment… As suggested, I installed ethtool and confirmed that the command switches the link to 100 Mbit/s… And since there’s an issue with persistence after a reboot when using rc.local, I followed the linked instructions and configured a systemd service instead… after a reboot, the setting is now permanent.
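For anyone following along, the pieces involved look roughly like this. The interface name `eth0` and the unit filename are assumptions on my part; the ethtool flags are the standard way to force a link speed:

```shell
# One-off: force the link to 100 Mbit/s full duplex (lost on reboot)
sudo ethtool -s eth0 speed 100 duplex full autoneg off
```

And to make it persistent, a minimal oneshot systemd unit (hypothetical name `ethtool-100m.service`):

```ini
# /etc/systemd/system/ethtool-100m.service
[Unit]
Description=Limit eth0 to 100 Mbit/s
After=network.target

[Service]
Type=oneshot
ExecStart=/sbin/ethtool -s eth0 speed 100 duplex full autoneg off

[Install]
WantedBy=multi-user.target
```

Enabled with `sudo systemctl enable ethtool-100m.service`, it reapplies the setting on every boot.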

I guess I’ll wait and see if I experience a freeze again over the next few days. I’ll report back accordingly. Thank you for the speedy response and assistance thus far. I appreciate it.

Hello guys,

It took nine days for it to occur again in my presence… this is after addressing the advancedsettings and 100 Mbit issues from before. I’ve observed that the Raspberry Pi has rebooted by itself on a number of occasions since then, but I haven’t caught it in the act myself.

Tonight, I observed that the light was permanently on at 9:07pm… the time was stuck at 9:06pm… while the light was solid, I decided to just wait… there was no other activity except that the time updated at 9:15pm, then 9:28pm, then 9:36pm… Finally at 10:02pm the OSMC sad face appeared.

Once rebooted, I uploaded the logs: https://paste.osmc.tv/jizefuyipo

I tried to interpret them myself… and searched the forums again…

This article seemed to indicate that it was a memory issue, because I can see the same return code, and I see this in the logs: “Out of memory: Kill process 570 (kodi.bin) score 867 or sacrifice child”.
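For what it’s worth, if the kernel’s OOM killer is terminating kodi.bin, the same message should be retrievable from the kernel log; a quick way to check is with standard kernel/systemd tooling (nothing OSMC-specific, and `journalctl -b -1` only works if the journal is persisted to disk):

```shell
# Search current kernel messages for OOM-killer activity
dmesg | grep -i "out of memory"

# Or, with a persistent journal, kernel messages from the previous boot
journalctl -k -b -1 | grep -iE "out of memory|oom"
```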

I knew my cache settings were modified as follows:
<buffermode>1</buffermode>
<memorysize>104857600</memorysize>
<readfactor>1.5</readfactor>

So I went searching for what they should be…

So for the moment, I’ve decided to try changing those settings.
<buffermode>1</buffermode>
<memorysize>83886080</memorysize>
<readfactor>16</readfactor>

I don’t understand how this cache setting can cause the Raspberry Pi to run out of memory if it isn’t playing anything and just sitting idle… so I’m grasping at straws really.
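For context on why the size matters during playback (though, as noted, it shouldn’t while idle): the Kodi wiki warns that the video cache can consume up to three times the configured `<memorysize>` in RAM, which is significant on a 1 GB Pi. A quick back-of-the-envelope check (the 3x multiplier is the wiki’s rule of thumb, not something from my logs):

```python
# Rough RAM budget for the Kodi video cache on a 1 GB Raspberry Pi 3 B+.
# Per the Kodi wiki, the cache may use up to 3x the configured <memorysize>.

MIB = 1024 * 1024

def cache_ram_mib(memorysize_bytes: int) -> int:
    """Worst-case cache RAM use in MiB, using the 3x rule of thumb."""
    return 3 * memorysize_bytes // MIB

print(cache_ram_mib(104857600))  # old setting: 100 MiB -> 300 MiB
print(cache_ram_mib(83886080))   # new setting:  80 MiB -> 240 MiB
```

So the original 100 MiB setting could tie up roughly 300 MiB during playback, and the reduced 80 MiB setting about 240 MiB.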

I’m open to other suggestions.

So this issue still occurs.

Since the last update I turned debugging on… I tried to upload the logs, but I can only assume they were too big: the uploader kept generating the URL without a valid code appended. I exported locally instead… the file was 39MB. I followed another forum entry which said the fix for the URL problem was to reboot the Pi… That was a mistake, because I lost the prior logs as a result.

Even with debugging mode on, there is a 40-minute gap in the logs when the freeze kicks in, and for whatever reason I don’t see the same Kill Process line from the non-debugging logs… I can only assume that’s because I need to set the debugging level to 2… so I’ve done that now in order to catch it next time.

Even with debugging set to 1, only one thing was triggered some time before the freeze: a local backup, which only occurs before an update. I opened the backup directory, and the dates/times roughly coincide with the freeze times.

Looking back at the non-debugging logs from before, I can also see another pattern: the freeze seems to only occur after 24 hours of uptime.

Thus, I think I’m looking for a task that happens daily (if not the backup or update itself).

I’ve also determined that the freeze is somewhat temporary… it only lasts for about 50 minutes… that’s when the sad face appears (I assume that’s when the kodi.bin process is killed) and the Pi restarts.

Just in case the cause is the backup being written locally, I’ve moved the path to an external drive.

I’ve also manually rebooted at 9:30 pm this evening so I can see if it occurs at 9:30 pm tomorrow… Of course, if it is related to updates & backups, there will need to be an available update then.

Again, any other suggestions welcome. I’m still at a loss.

OK, so even with debugging set to level 2 I’m still struggling to identify the offending process or trigger.

In order to eliminate system updates or backups from the equation, I scheduled them for 8:30am… At that time there were no memory spikes or solid lights.

On the upside, I have discovered without question that the issue occurs at exactly 24 hours of uptime. I also know the freeze can last for as little as 10 minutes, or even 1 hour and 10 minutes like it did tonight. Every time, it culminates in a sad face and Kodi restarts. I don’t know what that means… I’ve been trying to work out what happens at 24 hours of uptime, to no avail.

I’ve uploaded tonights logs here: https://paste.osmc.tv/iwakakivag

Tonight I remoted in and watched the performance via htop. Initially there was a slight increase in CPU, but by 22:13 kodi.bin was using 84.8% of memory… and CPU was negligible… this held the entire time, and the activity light was solid throughout. The kodi.bin process had 50/51 threads the whole time. I have some screenshots of the htop output if anyone is interested.

Now, usually I only see a gap in my logs but tonight I have the following sequence repeating during this time…

22:16:54.007 T:1087365888 DEBUG: Thread JobWorker 1087365888 terminating (autodelete)
22:17:04.251 T:1925831168 INFO: CheckIdle - Closing session to http://www.msftncsi.com (easy=0x49437090, multi=(nil))
22:17:46.166 T:1087365888 DEBUG: Thread JobWorker start, auto delete: true
22:17:50.957 T:1087365888 INFO: easy_aquire - Created session to http://www.msftncsi.com
22:18:01.653 T:1087365888 ERROR: CCurlFile::Exists - Failed: Timeout was reached(28) for http://www.msftncsi.com/ncsi.txt
22:18:09.989 T:1087365888 INFO: easy_aquire - Created session to http://www.w3.org
22:18:17.892 T:1087365888 ERROR: CCurlFile::Exists - Failed: Timeout was reached(28) for http://www.w3.org/

Again, I don’t know what I’m looking at here… Apparently those URLs are used to test network connectivity. I can’t tell whether this is relevant to the issue or not.
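As a quick way to separate connectivity-check noise from genuine errors, a few lines of Python can tally the repeating curl timeouts per URL. The sample lines are copied from the log excerpt above; the regex is just my guess at what’s worth counting:

```python
import re
from collections import Counter

# Sample lines copied from the Kodi log excerpt above.
LOG = """\
22:17:04.251 T:1925831168 INFO: CheckIdle - Closing session to http://www.msftncsi.com (easy=0x49437090, multi=(nil))
22:18:01.653 T:1087365888 ERROR: CCurlFile::Exists - Failed: Timeout was reached(28) for http://www.msftncsi.com/ncsi.txt
22:18:17.892 T:1087365888 ERROR: CCurlFile::Exists - Failed: Timeout was reached(28) for http://www.w3.org/
"""

# Count curl timeouts per URL so recurring failures stand out.
pattern = re.compile(r"ERROR: CCurlFile::Exists.*?Timeout was reached\(28\) for (\S+)")
timeouts = Counter(pattern.findall(LOG))
print(timeouts)
```

Run against a full (39MB) debug log instead of this snippet, the counts would show whether the timeouts cluster around the freeze window or occur all day.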

I looked again through the forum… I found a reference to “100% cpu and frozen picture” which mentioned turning off UPnP.

I guess I’ll try that next. I don’t know what else to do. This is really starting to get me down. I don’t understand why it’s so difficult to find out what Kodi is doing (and why) at a given time.

Anyone?

Well, I’d say that whatever add-on seems to be grabbing random URLs may be your culprit here. Have you tried doing a new, clean install?

W3 and MSFT URLs are used for Kodi’s network connectivity tests.

Yes, that was what I had concluded from other forums… that Kodi was checking network connectivity. Of course, I guess that doesn’t rule out an addon doing something to trigger that check.

No, I haven’t tried that yet. It seems like such a sledgehammer for an issue that should be simple to diagnose. Plus, if I do that, I’ll never know the cause, or how to avoid or diagnose it in future. I’m surprised that there are no clues in the logs.

Is a clean install really where I’m at?

Why not backup your current first?

Thanks, yeah, I will. In fact I only finished a full backup just now in preparation.

I guess I got caught up with solving it every night, to the point of frustration, that I didn’t want to admit defeat.

Using such an approach may still identify the culprit… so long as I limit how many things I reinstall every 24 hours. I’ll update here if I find it.

I appreciate the advice from you all, especially getting me out of the never-surrender head-space. Thank you all for the time and assistance thus far.

Side-note: UPnP wasn’t enabled anyway.

Another user had a similar issue with freezes due to a dodgy UPnP server on the network. This occurred even with UPnP disabled.

We used iptables to block traffic to that machine in the end
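For reference, blocking a misbehaving device with iptables looks something like this. The address `192.168.1.50` is a placeholder for the offending machine, and rules added this way do not survive a reboot without something like `iptables-persistent`:

```shell
# Drop all traffic to and from the offending device (placeholder IP)
sudo iptables -A INPUT  -s 192.168.1.50 -j DROP
sudo iptables -A OUTPUT -d 192.168.1.50 -j DROP
```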

That’s interesting… Thanks for that info.

To that end, I downloaded an app on my phone to search for UPnP devices. It identified UPnP on my Router, including WPS, streaming capability on my Windows PC and Remote Device Renderer from my TV. I disabled each of their settings. Now the app shows nothing on my network. I hope that was a reasonable approach.

I’ll see how that goes. Thanks again.