No network and sad-face loop after Buster upgrade: how to get a rescue console?

It says something about osmc’s reliability that despite all the crazy things I’ve been doing with it this is the first time anything has ever gone wrong with it, after many years of usage. Even the lirc transition was painless, despite my putting it off for ages in the certainty that it would be a disaster and I’d have to reprogram my remote control again. No! Amazing!

… but I just upgraded to Buster, and, um, things did not go well.

I didn’t expect things to go exactly perfectly since I’m using my own OSMC build with an epoch attached[1] (and I haven’t rebuilt yet), so I’d expect OSMC to fail to start until I rebuilt it – but I didn’t expect what I actually saw, which was a complete failure to bring up the network (so none of my NFS mounts came up, NTP couldn’t come up, and I couldn’t ssh in to diagnose anything). A sad-face boot loop (but not apparently a reboot loop) soon followed, of course, likely due to the un-upgraded osmc, but with no network and no console I can’t really fix that. The network is actually wifi, but it comes over Ethernet from a Netgear wifi router so it appears as straight Ethernet with working DHCP.

I’m not aware of any way to interrupt the boot to get a console, so I attached the SD card to a working machine to try to diagnose it, and found no useful logs at all! /var/log has only a few logs from the upgrade, all of which look perfectly OK: fontconfig.log, alternatives.log and dpkg.log. No boot log. There is no /etc/network/interfaces at all: I guess connman is meant to be bringing it up? I do see working DHCPDISCOVERs and acks on the DHCP server, so I guess it’s getting an IP address, but with no logs I can’t see what’s going wrong after that. There is no sign of a systemd journal either.

There is a lot of whining in /var/log/apt/term.log, where dpkg is asking curses-mode questions about openssh-server and is just randomly striking ‘q’ until the installation continues. I’m not sure if that’s the only problem, because I don’t know how to get a rescue console or how to get the osmc splash screen out of the way so I can do some proper diagnosis on a running system.

I believe from the contents of checkmodifier.c that holding down a control key on the console terminal at boot should give me a shell, but this doesn’t appear to be documented. I will give it a try in umpty hours once OSMC has rebuilt.

(Some logs in http://www.esperi.org.uk/~nix/temporary/logs.tar.gz: can add more on request.)

[1] It’s got a patch attached which uses a stored procedure to create the mariadb database. I’m still using an old mysql 5.7 as the server, so I’m expecting to have to upgrade that too, not my favourite thing ever… For now, so that I can get things back up and running without also having to upgrade other things at the same time, I’ve tweaked the kodi configure flags in my local build to turn mysql back on. Lazy, moi? (Yes. That’s why I’m building kodi from scratch in an osmc build tree in a containerized “installation” of Debian atop a totally different distro on the host system. Doesn’t everyone do that? :slight_smile: )

(Again, there is probably nothing wrong and I suspect I’ll fix this myself once I get a chance, probably this weekend: but it’s worth noting the existence of this failure, and the complete lack of logs. Once I figure out the cause, I’ll note it here, so that the robustness of… whatever went wrong can improve in future. Unless, of course, it is “osmc doesn’t work because I didn’t provide a suitably rebuilt kodi at the right time”, in which case this is all my own fault…)

(btw, do you really mean to build on stretch and run on buster? The armv7-toolchain-osmc package has not been updated, and the osmc-pc-filesystem is still based on stretch.)

What makes you say that?

I wiped my /opt/osmc-tc/armv7-toolchain-osmc dir and let it recreate it using tip-of-master-branch https://github.com/osmc/osmc. It pulled in stretch, as I’d expect from this line in filesystem/osmc-pc-filesystem/build.sh:

RLS="stretch"

… and the buster branch has that changed. Perhaps only the buster branch works now? Since master has a lot of buster stuff in it, maybe buster should be merged to master… mea culpa, I guess, I assumed master was tip-of-development :slight_smile:

FWIW, this is definitely the cause of my failure: the soname of libmariadb has changed between buster and stretch, so kodi fails to start.
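A soname break like this can be confirmed from a console by asking the dynamic linker which libraries the binary can no longer resolve. The following is only a minimal sketch: `missing_libs` is a helper name invented here, and the Kodi binary path in the usage comment is an assumption that may not match your build.

```shell
#!/bin/sh
# Print the names of shared libraries that ldd reports as "not found".
# Reads ldd output on stdin, so it can be exercised without a real binary.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Usage (the binary path is an assumption, adjust it for your install):
#   ldd /usr/lib/kodi/kodi.bin | missing_libs
```

If the helper prints a libmariadb-related name, the binary was linked against a library version that the upgraded system no longer ships, and a rebuild against the new release is the fix.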

(The hold-down-ctrl-at-booting thing works perfectly. I love osmc :slight_smile: I’ll update the accessing-command-line wiki page later today.)

Well spotted! I’m not sure why the Buster toolchain hasn’t been merged yet.

No, I can’t read. The buster branch is clearly outdated (so I should be using master) and in any case hasn’t changed either. In both cases,

+RLS="stretch"

… but of course it’s easier just to edit /etc/apt/sources.list by hand. (It’s only new installations that will get confused). Rebuilding now…

The networking problem turned out to be that sshd didn’t start because the old /etc/ssh/sshd_config had a hardwired list of KexAlgorithms, Ciphers and MACs, and some of those were obsoleted and no longer valid by the time buster’s openssh came round. ucf tried to let me fix the problem on upgrade, but as noted above the upgrade script didn’t let me see that, leaving me with a broken openssh. (This was, it turned out, not too disastrous a problem, on account of the hold-down-ctrl way of getting to the console.)
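For anyone else carrying a hardwired algorithm list across an upgrade, a check along these lines would have caught the problem before rebooting. It is a rough sketch rather than a tested tool: `unsupported_algos` is a made-up helper name, and it assumes each directive sits on one line in the simple `Keyword value,value` form.

```shell
#!/bin/sh
# List algorithms named in an sshd_config directive that do not appear in a
# file of supported algorithm names.
#   $1 = directive name (Ciphers, MACs or KexAlgorithms)
#   $2 = path to sshd_config
#   $3 = file with one supported algorithm per line
unsupported_algos() {
    awk -v key="$1" 'tolower($1) == tolower(key) { print $2 }' "$2" |
        tr ',' '\n' |          # one algorithm per line
        sed 's/^[-+^]//' |     # drop a leading +, - or ^ list modifier
        grep -v '^$' |         # ignore empty entries
        grep -Fxv -f "$3" || true   # keep names missing from the supported list
}

# Usage: "ssh -Q cipher" prints the ciphers the installed OpenSSH supports
# ("ssh -Q mac" and "ssh -Q kex" do the same for the other two lists).
#   ssh -Q cipher > /tmp/supported
#   unsupported_algos Ciphers /etc/ssh/sshd_config /tmp/supported
```

Running `sshd -t` as root is the simpler sanity check: it only validates the configuration file and keys, and complains about a bad line without starting the daemon.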

I’m a bit confused. We updated our toolchains to use Buster in June, and this is merged in master. See:

It should – but unfortunately with newer kernels (4.x) this is somewhat broken. We looked into it, but couldn’t find an exact cause. Our guess is that the boot sequence has changed slightly, and this means that we’re not able to process the input at the time when the checkmodifier runs.

Oh, sorry! I misinterpreted the purpose of the various osmc-*-filesystem directories. Clearly it is in fact updated, and I was a) looking at the wrong directory (as the lack of a static qemu should have made obvious) and b) expecting it to update the sources.list when I did a build.sh (it doesn’t: if you already have a build dir, you have to do it by hand).
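The by-hand step for an existing build dir can be sketched roughly as below. `retarget_sources` is a name invented for this example, and the path in the usage comment simply combines the toolchain directory and the sources.list location mentioned earlier in the thread, so treat it as an assumption.

```shell
#!/bin/sh
# Rewrite every "stretch" suite name in an apt sources file to "buster",
# keeping a .bak copy of the original alongside it.
#   $1 = path to the sources.list inside the existing build directory
retarget_sources() {
    cp "$1" "$1.bak" &&
        sed 's/\([[:space:]]\)stretch\([[:space:]/-]\)/\1buster\2/g' "$1.bak" > "$1"
}

# Usage (path is an assumption, check where your build dir actually lives):
#   retarget_sources /opt/osmc-tc/armv7-toolchain-osmc/etc/apt/sources.list
```

The pattern only matches `stretch` when it stands as a suite name (followed by a space, `/` or `-`), so a URL that happens to contain the word is left alone. The package lists inside the build dir still need refreshing afterwards.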

So the only real bug here is that suppressing all user input during the upgrade broke ssh, which is a little hard to fix given that the whole point of this thing is to work without an attached keyboard. It would be nice to print ucf’s output on the screen somewhere, or at least to report that one or more ucf-suggested changes were declined, so that users know that trouble might lie ahead.

(Note that while this problem stopped my sshd from starting, it still does not explain why my boots after upgrade but before I pulled the SD card for investigation were failing to bring up NFS. I have no explanation for that at all, and NFS is working fine now. I know it can’t be the NFS server because my home directories are served from the same machine, and it’s where most of my work takes place: if it went down I would know in a fraction of a second. Maybe my wifi stuttered, but it’s very odd that the wifi stutter still let DHCP through, and that it happened right after an upgrade.)

FYI, with 4.19.122 checkmodifier still worked really well for me. I held down the key constantly from power-on until the getty came up, just in case. So it still works sometimes! and is damn helpful when it does.

That’s good to know.
CC @DBMandrake

You might have some luck with a UART – we’ll spawn a getty on it automatically, if you’re able to get that far in the boot process.

OK, I can confirm that when the build tree is actually running the right distro, and the sshd_config’s hardwired ciphers list is removed so it no longer cites nonexistent ciphers and the like… everything works like a dream. osmc continues its record of excellent service to the cause of making my guests jealous :slight_smile: (assuming I could have any guests in the middle of a pandemic.)


Thanks for the follow-up and kind words. I’m glad things are working as expected now.

Anything we can do for you, give us a shout.

Sam