
More on MythTV File Fragmentation

October 9th, 2008

I posted last night about slow delete response time in MythTV. At some point, I mentioned that I would explain why file fragmentation could be an issue on a MythTV box. To be fair, this isn’t MythTV’s fault – it’s a problem that would affect all media centre computers, regardless of file system, OS or architecture.

This is all from information I learnt last night, so it's an amalgamation of various sources, some of which I forget.

The typical line about Linux file systems (particularly ext3) is that they do not normally need to be defragmented. From the reasons that were given, it does make sense that in a lot of instances that is correct, and they do not need to be defragged.

Basically, when a file is created, the file system drivers do not simply try to fill all available blocks from the start of the disk (like FAT does). That is to say, imagine two files – A and B – (size/number of blocks isn't important) which are separated by a space of 10 free blocks. Now imagine a third file (C) is written to disk – it is 15 blocks in size. If a FAT (or similar) file system is used, it will write the first 10 blocks of C in-between files A and B, and write the final 5 blocks after file B. A more intelligent file system (say ext3) will see the empty 10 blocks and ignore them; it will seek a space which is 15 blocks in size. Only if it cannot find a space of an appropriate size will it use the free spaces between existing files.
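To make that concrete, here is a toy sketch of the two strategies. This is a deliberately simplified model (free space as a list of extents), not how any real file system driver is implemented:

```python
# Toy model: free space is a list of (start, length) extents.
# We want to place a new 15-block file, as in the A/B/C example above.

def first_fit(free_extents, need):
    """FAT-style: fill holes from the start of the disk, splitting
    the request across extents if necessary."""
    placed = []
    for start, length in free_extents:
        if need == 0:
            break
        take = min(length, need)
        placed.append((start, take))
        need -= take
    return placed

def best_fit(free_extents, need):
    """ext3-style (roughly): prefer a single extent big enough for the
    whole file; only fall back to splitting when none exists."""
    candidates = [(s, l) for s, l in free_extents if l >= need]
    if candidates:
        start, _ = min(candidates, key=lambda e: e[1])  # tightest fit
        return [(start, need)]
    return first_fit(free_extents, need)  # no big-enough hole: fragment

# The 10-block hole between A and B, plus a big free region after B.
free = [(10, 10), (30, 1000)]
print(first_fit(free, 15))  # two fragments: [(10, 10), (30, 5)]
print(best_fit(free, 15))   # one contiguous run: [(30, 15)]
```

The first-fit version splits file C across two places on disk; the best-fit version keeps it contiguous at the cost of leaving the 10-block hole unused.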

As well as this, file systems such as ext3 will preallocate/reserve a few extra blocks around a file, to allow for future file growth. This again reduces the need to find new spaces, as there is usually enough room to accommodate the increase.

Something else to note is that (allegedly) ext3 is good at this, xfs is much better, and ReiserFS (v3) is crap at it (hence there is, apparently, a need to defragment ReiserFS partitions).

This is all very well and good, and I can understand where people are coming from when they say "You don't need to defragment Linux file systems". However, there is a problem with this, and it applies to media centres.

The problem is quite simple: Recordings on a media centre are large, and they grow at an unknown rate for long periods of time.

Think about it for a second: if you copy a Word document on to a partition, the chances are it's not going to grow too much, and if it does, it will happen in a quick burst. A media file being recorded from a live stream, on the other hand, will be drip-fed data over a period of time – an hour-long programme is not an unusual length.

This means that the preallocation of blocks isn't going to be enough – you are quickly going to overrun those buffer blocks. As more and more data is fed in, the file system driver is not going to perpetually move the recording file into free space – imagine copying an hour-long recording of approximately 4GB every time you overrun a few kilobytes of buffer blocks.
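A quick back-of-the-envelope illustrates how fast that overrun happens. The block size and reservation size below are illustrative assumptions, not measured from my box:

```python
# Hypothetical numbers: how long until a recording blows past a small
# block reservation?
BLOCK_SIZE = 4096               # bytes per block (a common ext3 default)
RESERVED_BLOCKS = 8             # assumed growth headroom around the file
BITRATE = 4 * 1024**3 / 3600    # ~4 GB over an hour, in bytes/second

reserve_bytes = RESERVED_BLOCKS * BLOCK_SIZE
seconds_until_overrun = reserve_bytes / BITRATE
print(round(seconds_until_overrun, 3))  # a few hundredths of a second
```

Even with generous assumptions, the reservation is exhausted almost instantly against an hour of sustained writes, so the file system is forced back to the allocator over and over for the same file.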

Couple this with the fact that you are not going to delete recordings in the order you recorded them – if you're like me, you delete them as you watch them, and maybe save a few for later watching. That behaviour is going to leave free-space holes all over the place, and I really don't know how the file system picks those holes up – maybe it will start to use them first, as initially your recording file is going to be small.

So far, this points to the idea that a media centre is more likely to need defragmenting than a typical desktop machine, or even a server. I would therefore expect to see more fragmentation on a media centre, and this could be part of my problem.

Applying a bit of lateral thinking, and a bit of background knowledge, I can also understand why this wouldn’t affect something like a database server. The reason is quite simple: for large database servers, the database engine will often preallocate large files to store the data in, rather than growing them organically. Therefore, the files are often big enough to soak up the growing and shrinking of a database and you don’t get the same fragmentation problem.

So, now here is the killer blow as well…. What if you’re recording two programmes at the same time?

I'm in this exact situation – I have two tuner cards in my MythTV box, so that I can watch one channel whilst I record another, or record two programmes at the same time. Each programme will have its own media file which will grow organically. Each time one file grows, it conflicts with the free space of the other recording, the two continually battling for free space. I can see it now in my head – the disk cluttered with alternating blocks from two files! Fragmentation hell…

I think I’ll have to defragment my drives…..

I’ll post on that later.
