System: Identifying unused resources with inotifywait
This article relates mainly to websites, but the technique can be applied in other situations. The problem at hand is a website that has accumulated a large number of files over time, and the question is how to easily identify those which are no longer required.
Generating a list of files
The first step is to generate a list of filenames to be monitored. In this case we're limiting the process to files in the /images/ directory of a single website, but the file location and type are not really relevant.
To generate a list of image files, we run the following as root:
sudo ls -1 /var/www/example.net/images/*.* > ~/inotify-filelist.txt
You should check that the contents of this file are as expected.
All files in the list will be monitored by the daemon. To exclude certain files you can either delete the respective line, or prefix it with @.
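If the files are spread across subdirectories, or some have no extension, find may be a more convenient way to build the list than ls. The following is only a sketch and not part of the procedure above: the function name build_filelist and the throwaway demo directory are made up for illustration.

```shell
#!/bin/bash
# build_filelist DIR OUTFILE
# Write one path per regular file under DIR (absolute if DIR is absolute).
# Hypothetical helper, not part of inotify-tools.
build_filelist() {
    local dir=$1 outfile=$2
    find "$dir" -type f | sort > "$outfile"
}

# Demo on a temporary directory standing in for /var/www/example.net/images.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
mkdir -p "$demo/images/icons"
touch "$demo/images/new.png" "$demo/images/icons/spotify.png"

build_filelist "$demo/images" "$demo/filelist.txt"
cat "$demo/filelist.txt"
```

As with the ls version, check the resulting list before handing it to the daemon.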
Monitoring files with inotifywait
Next we create our script, which does the following:
- scan the events log for 'OPEN' events;
- prefix matching files in the file list with @;
- truncate the events log;
- kill any existing inotifywait daemon; and
- start a new inotifywait daemon.
~/inotify-monitor.sh:
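#!/bin/bash
# Mark files that were opened since the last run, then restart the watcher.
FILELIST=~/inotify-filelist.txt
EVENTSLOG=~/inotify-events.log
# Collect each file that has logged an OPEN event (assumes no spaces in paths).
OPENED=$(awk '$2 ~ /OPEN/ {print $1}' "$EVENTSLOG" | sort -u)
for file in $OPENED
do
    # \Q...\E makes perl treat the path literally; the final $ stops partial matches.
    FILE="$file" perl -pi -e 's~^\Q$ENV{FILE}\E$~\@$ENV{FILE}~' "$FILELIST"
done
# Truncate the log, then replace the running daemon with a fresh one.
> "$EVENTSLOG"
pkill inotifywait
inotifywait --daemon --fromfile "$FILELIST" --outfile "$EVENTSLOG" -e open
exit 0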
You will need to make the script executable, and initialise the events log:
sudo chmod 750 ~/inotify-monitor.sh
sudo touch ~/inotify-events.log
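and then it can be run using:
sudo ~/inotify-monitor.sh
The script expands ~ when it runs, so if your sudo configuration resets HOME to root's home directory it will look for the file list and events log there. In that case, replace ~ in the script with the absolute path to your home directory.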
You may need to restart your web server to flush its internal cache:
sudo apache2ctl configtest && sudo apache2ctl graceful
The bash script needs to be re-run periodically to record which files have been accessed and restart the daemon monitoring only the remaining files. More on that below.
Take care not to run grep, or any similar command, over the monitored files, as reading a file generates an 'OPEN' event for it. Some backup or security scripts have the same effect.
For this reason we advise running the script only manually and after checking that the events log hasn't been flooded.
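One way to make that check less of a judgement call is to compare the number of distinct files in the events log with the number of files still being watched. This is a rough sketch only: the function name log_looks_flooded and the default threshold of 80 percent are arbitrary assumptions, and the /srv/img paths are placeholders.

```shell
#!/bin/bash
# log_looks_flooded EVENTSLOG FILELIST [PERCENT]
# Succeeds (exit 0) when at least PERCENT% of the watched files have an OPEN event.
# Hypothetical helper; the default of 80 is an arbitrary assumption.
log_looks_flooded() {
    local events=$1 filelist=$2 percent=${3:-80}
    local watched opened
    # Watched files are the lines not yet prefixed with @.
    watched=$(grep -vc '^@' "$filelist")
    # Distinct paths with an OPEN event; the event name is the last field.
    opened=$(awk '$NF ~ /^OPEN/ { sub(/ [^ ]+$/, ""); print }' "$events" | sort -u | grep -c '')
    [ "$watched" -gt 0 ] && [ $(( opened * 100 / watched )) -ge "$percent" ]
}

demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
printf '%s\n' /srv/img/a.png /srv/img/b.png /srv/img/c.png /srv/img/d.png > "$demo/filelist.txt"

# One file opened out of four: ordinary traffic.
printf '%s\n' '/srv/img/a.png OPEN' > "$demo/events.log"
if log_looks_flooded "$demo/events.log" "$demo/filelist.txt"; then normal=flooded; else normal=ok; fi

# Every file opened once, as would happen after a grep over the directory.
sed 's/$/ OPEN/' "$demo/filelist.txt" > "$demo/events.log"
if log_looks_flooded "$demo/events.log" "$demo/filelist.txt"; then after_grep=flooded; else after_grep=ok; fi

echo "normal: $normal, after grep: $after_grep"
```

If the check reports a flood, truncate the events log by hand rather than running the monitor script, so that the unopened files are not marked as active.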
Processing the events log
Each line in ~/inotify-events.log records a file being opened:
sudo tail -f ~/inotify-events.log
...
/var/www/example.net/images/apple-podcasts-white.png OPEN
/var/www/example.net/images/overcast.png OPEN
/var/www/example.net/images/spotify.png OPEN
/var/www/example.net/images/google-podcasts.png OPEN
/var/www/example.net/images/new.png OPEN
...
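Before re-running the script it can be helpful to see how often each file was opened, as a single hit may deserve more scrutiny than hundreds. A small sketch, assuming the default two-column log format shown above; the function name summarise_opens and the sample log contents are made up for illustration.

```shell
#!/bin/bash
# summarise_opens EVENTSLOG
# Print "COUNT PATH" for each file with OPEN events, most-opened first.
summarise_opens() {
    awk '$NF ~ /^OPEN/ {
             sub(/ [^ ]+$/, "")   # strip the event field, keep the path
             count[$0]++
         }
         END { for (path in count) print count[path], path }' "$1" | sort -rn
}

# Sample log for demonstration only.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
printf '%s\n' \
    '/var/www/example.net/images/spotify.png OPEN' \
    '/var/www/example.net/images/new.png OPEN' \
    '/var/www/example.net/images/spotify.png OPEN' \
    '/var/www/example.net/images/spotify.png OPEN' > "$demo/events.log"

summary=$(summarise_opens "$demo/events.log")
echo "$summary"
```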
Running the script again now will extract the list of files that have been opened, and prefix them with @ in the file list:
...
/var/www/example.net/images/social-buttons.png
@/var/www/example.net/images/spotify.png
/var/www/example.net/images/stars-gold.png
/var/www/example.net/images/stars-grey.png
...
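To try the marking step on its own, without a running daemon, something like the following can be used. Note that this sketch swaps the per-file perl loop for a single awk pass over both files; the function name mark_opened and the sample data are assumptions for illustration, not part of the script above.

```shell
#!/bin/bash
# mark_opened EVENTSLOG FILELIST
# Prefix with @ every line of FILELIST whose path has an OPEN event in EVENTSLOG.
mark_opened() {
    local events=$1 filelist=$2 tmp
    tmp=$(mktemp) || return 1
    awk '
        FILENAME == ARGV[1] {            # reading the events log
            if ($NF ~ /^OPEN/) {
                sub(/ [^ ]+$/, "")       # strip the event field, keep the path
                opened[$0] = 1
            }
            next
        }
        ($0 in opened) { print "@" $0; next }   # reading the file list
        { print }
    ' "$events" "$filelist" > "$tmp" && cat "$tmp" > "$filelist"
    rm -f "$tmp"
}

# Sample data for demonstration only.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
printf '%s\n' \
    /var/www/example.net/images/social-buttons.png \
    /var/www/example.net/images/spotify.png \
    /var/www/example.net/images/stars-gold.png > "$demo/filelist.txt"
printf '%s\n' \
    '/var/www/example.net/images/spotify.png OPEN' \
    '/var/www/example.net/images/spotify.png OPEN' > "$demo/events.log"

mark_opened "$demo/events.log" "$demo/filelist.txt"
cat "$demo/filelist.txt"
```

Matching on the whole line means a path is only marked when it is identical to the one logged, and lines already prefixed with @ are left as they are.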
Over time, any 'active' files in the files list will be marked with @ allowing you to identify inactive files. They can then be moved to a separate location pending deletion (or restoration in the case of new 404 errors).
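Moving the inactive files aside could be done along these lines. This is a sketch under stated assumptions: the function name move_inactive and the holding directory are hypothetical, and the holding directory is flat, so files with the same name from different directories would collide.

```shell
#!/bin/bash
# move_inactive FILELIST HOLDING_DIR
# Move every file still unmarked in FILELIST (no @ prefix) into HOLDING_DIR.
move_inactive() {
    local filelist=$1 holding=$2 path
    mkdir -p "$holding"
    grep -v '^@' "$filelist" | while IFS= read -r path; do
        if [ -f "$path" ]; then
            mv -- "$path" "$holding/"
        fi
    done
}

# Demo on a temporary directory standing in for the website.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
mkdir -p "$demo/images"
touch "$demo/images/spotify.png" "$demo/images/stars-grey.png"
printf '%s\n' "@$demo/images/spotify.png" "$demo/images/stars-grey.png" > "$demo/filelist.txt"

move_inactive "$demo/filelist.txt" "$demo/holding"
ls "$demo/images" "$demo/holding"
```

Restoring a file after a new 404 error is then a matter of moving it back from the holding directory to its original location.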
Cleaning up
Once you've finished with the script all that remains is to kill the final daemon:
sudo pkill inotifywait
Note that the script as written supports only a single instance of the inotifywait daemon, as pkill is indiscriminate.