scraping and cleanstrings

Some of my movie folder names (always in a set) use prefixes to maintain a certain order on the filesystem. I had done this long before I started to use Kodi and I would like to keep that system.

e.g. Fast and the Furious Tokyo Drift was the 3rd movie that came out, but was chronologically after Fast & Furious 6.
Another example here:

Alien Anthology
    01. Alien (1979)
    02. Aliens (1986)
    03. Alien 3 (1992)
    04. Alien Resurrection (1997)
    A. Prometheus (2012)
    B. Alien Covenant (2017)

Or for the Star Wars movies I use letter prefixes for the side story movies (like Rogue, Solo, …) and number prefixes for the episodes.

However, the scraper does not find anything, because the prefixes mess things up. There’s an advanced setting in Kodi called cleanstrings, but I am a bit puzzled by the description, which states: “Please note that everything right of the match (at the end of the file name) is removed.”

I don’t have issues with writing regex, but this sentence makes no sense to me, especially when you look at the default settings for cleanstrings.

How can I use cleanstrings so that the prefixes are ignored by the scraper?

Maybe I should have posted this in the Kodi forum, but I reckoned somebody here must have had a similar problem.

Please don’t tell me to remove the prefixes or to use NFO files or the sort field in Kodi. I asked a specific question and if you don’t know the answer, please don’t reply with a workaround.

Any pointers are highly appreciated.

I do exactly the same thing with movie sets…files are in the format “01 - Movie Title.mkv”.

The only way to deal with it is to help the scraper find the right movie by creating a .NFO file with the basic information. For example, here’s one for Alien:

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<movie>
    <title>Alien</title>
    <sorttitle>Alien 01: Alien</sorttitle>
    <set>
        <name>Alien (collection)</name>
    </set>
    <year>1979</year>
</movie>
http://www.imdb.com/title/tt0078748

The title and year are generally enough to let the scraper find the right match, but you can also add the URL to IMDB as the last line (like above). The sorttitle is used to make the movies sort the way I want in the set. Since this seems important to you, you’ll have to do the same thing if you want Kodi to show the movies in an order that isn’t one of the defaults (year, title, etc.).

The set info allows me to name the set the way I want, and to put movies into sets the way I want (so that all of Marvel Cinematic Universe is in one set, instead of a bunch of sets with a few standalone movies, which is how TMDB does it).

This is the only way to do it, because the cleanstrings feature removes everything after the regex match. Since you want to remove stuff at the beginning, it won’t work. The default settings are used to deal with file names like the following:

Tolkien.2019.1080p.BluRay.REMUX.AVC.DTS-HD.MA.5.1-FGT.mkv

The default regex would match at “1080p”, resulting in:

Tolkien.2019.mkv

The scraper can find this.
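To make the "everything right of the match is removed" behaviour concrete, here is a minimal Python sketch of it. It is not Kodi's code, and `CLEAN_PATTERN` is a simplified stand-in I made up for illustration, not the actual default cleanstrings regex. It works on the name without the file extension.

```python
import re

# Simplified, hypothetical stand-in for a cleanstrings pattern (NOT Kodi's
# real default): a separator, a resolution tag, then a separator or the end.
CLEAN_PATTERN = re.compile(
    r"[ _,.()\[\]-]+(480p|720p|1080p|2160p)([ _,.()\[\]-]|$)",
    re.IGNORECASE,
)


def clean_name(stem: str) -> str:
    """Cut the name at the first match and drop everything to its right."""
    match = CLEAN_PATTERN.search(stem)
    return stem[: match.start()] if match else stem


# The release tags after the year are cut off.
print(clean_name("Tolkien.2019.1080p.BluRay.REMUX.AVC.DTS-HD.MA.5.1-FGT"))
# Nothing matches here, so the "01. " prefix survives untouched.
print(clean_name("01. Alien (1979)"))
```

Because the cut only ever removes the match and what follows it, a pattern that matched a leading prefix would throw away the title along with it, which is why this approach cannot strip prefixes.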

So, you could rename your movie files to use “suffixes” instead of prefixes, but then the files on disk won’t sort the way you want, which I guess is why you do it (since that’s why I do it).

Sorry I told you a way that you don’t like, but it’s the only automated way that doesn’t make you rename the files. The non-automated way is to manually scrape each file and pick the right movie from the dialog that you get when the scraper can’t find a match.

Hmm, I really hoped I misunderstood the cleanstrings thingy. It’s kind of strange that the kodi devs never had to deal with this. I mean it’s a lot easier to write one singular regex than to create possibly hundreds of NFO files. Oh, well. I guess it’s either that or removing all prefixes. :frowning:

Thank you though for clearing that up.

One more question (for now :wink: ):

Is the sort title valid within the set only, or globally? E.g. I’d use Alien A: Prometheus as the sort title, but I couldn’t do that if the sort title were global. By global I mean that if there were an option in Kodi to list all movies (flattened), one movie per line would be shown. (I don’t even know if one can do that in Kodi. I’m new to this. I’ve been accessing my media via a share only until now.)

You can save a bit of work with movies and just use a parsing nfo which is just a plain URL.

Right, but in that case the sort title is missing and the movies in the set might be jumbled.

You misunderstand. You make a text file that has the same name as the movie file but with an .nfo extension, and inside it is only a URL to TheMovieDB.

As for the other part, it sounds like you are trying to do something outside of movie sets. Is there a reason for this? Movie sets are automatic for most titles and can be modified in the GUI.

The sort title is global. The option to show/not show movie sets is in Settings->Media Settings->Videos.

What’s even worse is that although sorttitle is global, it still only seems to be used on movies in sets.

This sounds like a bug in Kodi.

I got that, but if I were to do that (only the URL in the NFO), Alien 3 would be sorted after Alien, instead of after Aliens.

You can go into a movie set, open the context menu on a movie, click Manage, and then edit the sort title.

It is much, much faster to bang out a .NFO file than to manually deal with all that is wrong with TMDB’s movie sets, especially if you already have .NFO files for other movies in the set. Copy the file, change a couple of lines, and you don’t have to do anything in the GUI.

I always include the IMDB URL at the end, because Kodi uses IMDB IDs in the id field. It stores both IMDB and TMDB IDs, but always puts the IMDB ID into the id tag.

I get that too, but using the UI for that stuff is even worse. At least for me. I tried that once with the web thingy so that I don’t have to use the remote to enter characters, but it didn’t take. But maybe that was because of the issue @nabsltd mentioned (sort title doesn’t work unless within a set).

I’ll just write a script that will create the NFO files for me. That’s fine.
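A minimal sketch of what such a script might look like in Python, under a few assumptions of mine: folder names follow the "<prefix>. <title> (<year>)" pattern from the Alien example, the video files are .mkv, and each NFO is written next to its video with the same base name. The names `NAME_RE`, `build_nfo`, `write_nfos` and the `sort_prefix` parameter are made up for illustration. The NFO layout copies the example earlier in the thread.

```python
import re
from pathlib import Path
from xml.sax.saxutils import escape

# Assumed naming scheme: "<prefix>. <title> (<year>)", e.g. "01. Alien (1979)".
NAME_RE = re.compile(
    r"^(?P<prefix>[0-9A-Za-z]+)\.\s+(?P<title>.+?)\s+\((?P<year>\d{4})\)$"
)


def build_nfo(folder_name: str, set_name: str, sort_prefix: str) -> str:
    """Return NFO text with title, sort title, set name and year."""
    m = NAME_RE.match(folder_name)
    if m is None:
        raise ValueError(f"unrecognised folder name: {folder_name!r}")
    title = m["title"]
    # The folder prefix becomes part of the sort title, e.g. "Alien A: Prometheus".
    sort_title = f"{sort_prefix} {m['prefix']}: {title}"
    return (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>\n'
        "<movie>\n"
        f"    <title>{escape(title)}</title>\n"
        f"    <sorttitle>{escape(sort_title)}</sorttitle>\n"
        "    <set>\n"
        f"        <name>{escape(set_name)}</name>\n"
        "    </set>\n"
        f"    <year>{m['year']}</year>\n"
        "</movie>\n"
    )


def write_nfos(set_dir: Path, set_name: str, sort_prefix: str) -> None:
    """Write one NFO per .mkv file in every movie folder below set_dir."""
    for movie_dir in sorted(p for p in set_dir.iterdir() if p.is_dir()):
        nfo_text = build_nfo(movie_dir.name, set_name, sort_prefix)
        for video in movie_dir.glob("*.mkv"):
            video.with_suffix(".nfo").write_text(nfo_text, encoding="utf-8")


print(build_nfo("A. Prometheus (2012)", "Alien (collection)", "Alien"))
```

For a whole set it would be called as something like `write_nfos(Path("/path/to/Alien Anthology"), "Alien (collection)", "Alien")`, where the path is a placeholder. Titles taken from folder names (such as "Alien 3") may not exactly match what the scraper returns, so an IMDB or TMDB URL can still be appended by hand where the match fails.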

It may now be fixed. Last I checked, I was testing if I could disable “Ignore Articles When Sorting” and set the sort title to what you would get by ignoring articles, and it did not work. Now, it seems to.

But, it is definitely global.

I think that depends on how you’re doing it. I get your point, but since I can auto-scrape without the NFO and have a keyboard handy, doing it in the GUI is actually less effort for me.

Kodi will happily create the .NFO files for you (export library, multiple files). Then right-click drag to copy, rename, double-click to open in a text editor, and change what you need.

Since I was already in that directory to copy the movie in the first place, it’s really quick.

Me too, except the movies with my prefixes. They are less than 10%, but if you have 2,000 movies, it’s still a lot of manual intervention. I’m gonna get some sleep now, but I will write the script tomorrow. No UI, no copy and paste, and no editing. Problem solved.

I might even be able to trigger it with the watchdog add-on.

Thanks guys! Have a great day or night!

This is actually a bad example, because the scraper will get the following names for the movies (shown sorted):

Alien
Alien: Resurrection
Aliens
Alien³

So, yeah Alien: Resurrection is sorted wrong, but the others are sorted correctly.
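As a rough illustration of that ordering, here is a small Python sketch. It uses plain code-point sorting, which is only an assumption standing in for whatever Kodi does internally, and the sort titles follow the "Alien 01: Alien" pattern from the NFO example above. The helper names are made up.

```python
# Titles as returned by the scraper, per the list above.
scraped = ["Aliens", "Alien³", "Alien: Resurrection", "Alien"]

# Sort titles following the "Alien NN: Title" pattern used in the NFO example.
sort_titles = {
    "Alien": "Alien 01: Alien",
    "Aliens": "Alien 02: Aliens",
    "Alien³": "Alien 03: Alien³",
    "Alien: Resurrection": "Alien 04: Alien: Resurrection",
}


def by_title(titles):
    """Order by the displayed title only (plain code-point comparison)."""
    return sorted(titles)


def by_sort_title(titles, mapping):
    """Order by the sort title, falling back to the title itself."""
    return sorted(titles, key=lambda t: mapping.get(t, t))


print(by_title(scraped))
# ['Alien', 'Alien: Resurrection', 'Aliens', 'Alien³']
print(by_sort_title(scraped, sort_titles))
# ['Alien', 'Aliens', 'Alien³', 'Alien: Resurrection']
```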

I learned a long time ago that it is really not a good idea to try and manually impose a folder structure on your movie/music library that Kodi doesn’t really understand. Use an external scraper tool that can both scrape metadata and rename files and folders to your system of choice at the same time. I use Tiny Media Manager but there are others. Then set Kodi/OSMC to use local info only. This has the added advantage of when your database goes south or you need to install a new box your library can be recreated in minutes by just pointing Kodi at the relevant root folder.

Note that this is the same as exporting the database as multiple files, regardless of the “local info only” setting.

If a scraper finds information in a .NFO file, it does not override that info with data from the Internet. This setup is useful for eventually catching additions to the data, like when the cast list is updated.