The Socorro radix file system storage classes were originally written to be permanent crash storage. We eventually started using HBase instead of the file system. However, the file system storage lives on as temporary storage on the Socorro collectors. Even through the 2013 rewrite of the file system classes, the mindset that this was permanent storage persisted.
This creates a problem. The collectors
dump all the crashes into file system storage. The crash movers then
spool these temporarily stored crashes into HBase. As the crash
movers do their work, crashes are removed from file system storage.
For efficiency's sake when working with a resource that other
processes are reading and writing simultaneously, the file system
classes do not clean up empty directories. As the days progress,
this leaves large empty directory trees hanging around on the disk.
Eventually, this becomes a problem as the disk runs out of inodes.
We did not solve this problem in the 2013 FS rewrite. We couldn't come up with a good way to clean up the
old directories without slowing down the collectors or crashmovers, or
adding a separate cleaning process or cron job.
Here's a simple solution: recycling.
The file system classes could have a setting for the number of days
to keep old directories, say defaulting to one week. So in a week,
we've got these directories created:
primaryCrashStorage/20130501
primaryCrashStorage/20130502
primaryCrashStorage/20130503
primaryCrashStorage/20130504
primaryCrashStorage/20130505
primaryCrashStorage/20130506
primaryCrashStorage/20130507
On May 8th, we'd normally create a new directory called
primaryCrashStorage/20130508. All those older directories are
empty trees; the crashmover consumed their contents days earlier. I
suggest that the file system classes take the oldest and simply rename
it for the current day. The internal directory structure has already
been built, so storing crashes can begin immediately.
mv primaryCrashStorage/20130501 primaryCrashStorage/20130508
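Here is a minimal Python sketch of that rename step, just to make the idea concrete. The function name recycle_oldest_directory and its parameters are made up for illustration and are not part of the existing file system classes. It assumes the date directories sit directly under the storage root and are named YYYYMMDD, as in the listing above.

```python
import os
from datetime import date


def recycle_oldest_directory(root, today, number_of_days=7):
    """Return the path of today's date directory under root.

    Hypothetical helper: if at least number_of_days date directories
    already exist, the oldest one is renamed to today's name so its
    internal tree is reused. Otherwise a fresh directory is created.
    A number_of_days of 0 means "never recycle" (permanent storage).
    """
    today_path = os.path.join(root, today.strftime("%Y%m%d"))
    if os.path.isdir(today_path):
        return today_path  # already created or recycled today

    # YYYYMMDD names sort lexically in chronological order.
    dated = sorted(
        name for name in os.listdir(root)
        if len(name) == 8 and name.isdigit()
        and os.path.isdir(os.path.join(root, name))
    )

    if number_of_days and len(dated) >= number_of_days:
        # Equivalent of: mv root/<oldest> root/<today>
        os.rename(os.path.join(root, dated[0]), today_path)
    else:
        os.makedirs(today_path)
    return today_path
```

The sketch takes it on trust that the oldest directory really is empty, and it does nothing about several collector processes racing to do the rename at midnight. A real implementation would need to handle both.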
The number_of_days setting should be a bit larger than the longest
time that we could expect the crashmovers to be out of commission in a
crisis. If we want the file system to be permanent storage, set it to
MAXINT (or make a special case for 0).
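One way the retention rule could look, sketched in Python. The function name recyclable_directories is hypothetical, and the choice to treat 0 as "permanent" is just the special case floated above, not an existing Socorro setting. It picks candidates by the date in the directory name rather than by counting directories, which is an assumption of this sketch.

```python
from datetime import date, datetime, timedelta


def recyclable_directories(names, today, number_of_days):
    """Return the date directory names old enough to be recycled.

    Hypothetical helper: names are expected to look like YYYYMMDD.
    A number_of_days of 0 is the special case for permanent storage,
    so nothing is ever offered for recycling.
    """
    if number_of_days == 0:
        return []

    cutoff = today - timedelta(days=number_of_days)
    candidates = []
    for name in sorted(names):
        if len(name) != 8:
            continue  # not a date directory
        try:
            day = datetime.strptime(name, "%Y%m%d").date()
        except ValueError:
            continue  # eight characters, but not a valid date
        if day <= cutoff:
            candidates.append(name)
    return candidates
```

With the seven directories from the example and number_of_days set to 7, only 20130501 qualifies on May 8th. If the crashmovers were down for longer than number_of_days, a rule like this would hand back directories that still hold crashes, which is why the setting needs that safety margin.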
This system has two benefits. First, the inode problem goes away
without having to slow anything down or create another process.
Second, it actually speeds up the collector: after the first week, it
doesn't have to create radix directories any more because they've
already been created.
I've no idea why this idea didn't come
to me before – it seems to be nothing but win-win.