I implemented file-based caching on one of my projects to cache a large number of responses that are CPU- and memory-intensive to generate. The file cache worked like a charm at first, but soon I started to hit the storage capacity of the web server. So I wanted a script that could be configured as a cron job to check the configured directory at regular intervals and see whether it has reached a defined threshold; if so, it should delete stale files that are no longer in use.

My algorithm flow:

  • Check whether the directory has reached the threshold.
  • If it has, start deleting files that have not been accessed in the last 30 days.
  • If the directory is still above the threshold, decrement the access-time cutoff by 10 days and delete those files as well. (This loop continues until the cutoff reaches a configured minimum limit; I don't want to delete cache files that have been accessed recently.)
  • If the directory is still above the threshold, start deleting the oldest created files, again starting at 30 days and decrementing the cutoff by 10 days until a pre-configured minimum limit is reached.
  • If the directory is still above the threshold after all this, email a notification to a configured email address.
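The core of steps 2 and 3 can be sketched in Bash. This is only an illustrative sketch, not the script described below: the function names and the 10-day minimum are my own assumptions, and it relies on GNU `find` and `du`.

```shell
#!/bin/bash
# Sketch of the decrementing access-time loop (steps 2 and 3 above).
# Function names and MIN_DAYS are illustrative assumptions.

MIN_DAYS=10   # configured minimum limit (assumed value)

# Report the size of a directory in megabytes.
dir_size_mb() {
    du -sm "$1" | awk '{print $1}'
}

# purge_by_atime <dir> <start_days> <threshold_mb>
# Delete files not accessed for more than <start_days> days, stepping the
# cutoff down by 10 days until the directory fits under the threshold or
# the cutoff reaches MIN_DAYS.
purge_by_atime() {
    local dir=$1 days=$2 threshold=$3
    while [ "$(dir_size_mb "$dir")" -gt "$threshold" ] && [ "$days" -ge "$MIN_DAYS" ]; do
        find "$dir" -type f -atime +"$days" -delete
        days=$((days - 10))
    done
}
```

Note that `find -atime +N` matches files whose last access was more than N days ago, and that access times are only as reliable as the filesystem's mount options allow (a `noatime` mount disables them entirely).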

Implementation

The following script is designed to be run manually or configured as a cron job to monitor a specific directory. The script takes two arguments:

  • Directory path.
  • Threshold limit (in MB).

Usage:

  $ ./file_cache_clean.sh [directory_path] [threshold_limit_in_mb]
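The script body isn't reproduced here, so below is a minimal sketch of what a script matching that usage could look like. Treat it as an assumption-laden illustration rather than the original: the 10-day minimum, the notification address, and the use of `mail` for the final step are all guesses based on the algorithm above, and it assumes GNU `find` and `du`.

```shell
#!/bin/bash
# file_cache_clean.sh -- illustrative sketch, not the original script.
# Usage: ./file_cache_clean.sh [directory_path] [threshold_limit_in_mb]

MIN_DAYS=10                        # assumed lower limit for the age cutoff
NOTIFY_EMAIL="admin@example.com"   # assumed notification address

# Size of a directory in megabytes.
size_mb() { du -sm "$1" | awk '{print $1}'; }

# purge <dir> <threshold_mb> <age_test>
# Delete files older than a cutoff (by -atime or -ctime), starting at
# 30 days and stepping down by 10 until the threshold or MIN_DAYS is hit.
purge() {
    local dir=$1 threshold=$2 age_test=$3 days=30
    while [ "$(size_mb "$dir")" -gt "$threshold" ] && [ "$days" -ge "$MIN_DAYS" ]; do
        find "$dir" -type f "$age_test" +"$days" -delete
        days=$((days - 10))
    done
}

main() {
    local dir=$1 threshold=$2
    if [ ! -d "$dir" ] || [ -z "$threshold" ]; then
        echo "usage: $0 [directory_path] [threshold_limit_in_mb]" >&2
        return 1
    fi
    if [ "$(size_mb "$dir")" -gt "$threshold" ]; then
        purge "$dir" "$threshold" -atime   # steps 2-3: by last access time
        purge "$dir" "$threshold" -ctime   # step 4: by oldest (change) time
        if [ "$(size_mb "$dir")" -gt "$threshold" ]; then
            # step 5: threshold still exceeded -- notify by email
            echo "Cache $dir is still above ${threshold}MB after cleanup" |
                mail -s "file_cache_clean: threshold exceeded" "$NOTIFY_EMAIL"
        fi
    fi
}

# Run only when invoked with arguments, so the sketch can also be sourced.
if [ $# -ge 2 ]; then
    main "$@"
fi
```

One caveat on step 4: Unix filesystems don't record a true creation time, so `-ctime` (inode change time) is the usual stand-in for "oldest created" in scripts like this.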

Optional Step: Set up your crontab

To make this run every 12 hours, I added this to my crontab (using crontab -e):

0 0,12 * * * /home/user/scripts/file_cache_clean.sh /var/files/cache 1000
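By default, cron mails any output from the job to the local user; redirecting it to a log file instead keeps the diagnostics in one place. A common variant of the entry above (the log path is my own choice, not from the original setup):

```shell
0 0,12 * * * /home/user/scripts/file_cache_clean.sh /var/files/cache 1000 >> /var/log/file_cache_clean.log 2>&1
```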