Delete stale files by last access time and creation time (Bash)
I implemented file-based caching on one of my projects to cache a large number of responses that are CPU- and memory-intensive to generate. The file cache worked like a charm at first, but I soon started to hit the storage capacity of the web server. So I wanted a script, runnable as a cron job, that checks the configured directory at regular intervals to see whether it has reached a defined size threshold and, if so, deletes stale files that are no longer in use.
My algorithm flow:
- Check whether the directory has reached the threshold.
- If it has, start deleting files that have not been accessed in the last 30 days.
- If the directory is still above the threshold, decrease the access-time cutoff by 10 days and delete the files matching the new cutoff. (This loop continues until the cutoff reaches a configured minimum limit; I don't want to delete cache files that are still being accessed recently.)
- If the directory is still above the threshold, start deleting the oldest files by creation time, again starting at 30 days and decreasing the cutoff by 10 days until a pre-configured minimum limit is reached.
- If the directory is still above the threshold, email a notification to a configured email address. (A rough Bash sketch of this flow follows the list.)
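To make that flow concrete, here is a minimal Bash sketch of the cleanup loop. It is not the final script; the directory, threshold, 10-day step, minimum cutoff, and notification address are placeholder values chosen for illustration, and it relies on du for the size check and find's -atime/-ctime tests for the age cutoffs.

#!/bin/bash
# Sketch only: paths, threshold, cutoffs, and email address are illustrative.
CACHE_DIR="/var/files/cache"
THRESHOLD_MB=1000
MIN_DAYS=10                       # never delete files newer than this
NOTIFY="admin@example.com"        # hypothetical notification address

# Current size of the cache directory in MB.
dir_size_mb() {
    du -sm "$CACHE_DIR" | awk '{print $1}'
}

over_threshold() {
    [ "$(dir_size_mb)" -gt "$THRESHOLD_MB" ]
}

over_threshold || exit 0          # under the threshold, nothing to do

# Pass 1: delete files by last access time, cutoff 30 -> 20 -> 10 days.
for days in $(seq 30 -10 "$MIN_DAYS"); do
    over_threshold || exit 0
    find "$CACHE_DIR" -type f -atime +"$days" -delete
done

# Pass 2: delete files by ctime (change time, the closest portable stand-in
# for creation time), with the same decreasing cutoffs.
for days in $(seq 30 -10 "$MIN_DAYS"); do
    over_threshold || exit 0
    find "$CACHE_DIR" -type f -ctime +"$days" -delete
done

# Still over the threshold after both passes: send a notification email.
if over_threshold; then
    echo "$CACHE_DIR is still over ${THRESHOLD_MB}MB after cleanup" |
        mail -s "file_cache_clean: threshold still exceeded" "$NOTIFY"
fi

One caveat with this approach: -atime is only meaningful if the filesystem records access times (a volume mounted with noatime defeats the first pass), and ctime is strictly the inode change time rather than a true creation time.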
Implementation
The following script can be run manually or configured as a cron job to monitor a specific directory. The script takes 2 arguments:
- Directory path.
- Threshold limit (in MB)
Usage:
$ ./file_cache_clean.sh [directory_path] [threshold_limit_in_mb]
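As a rough sketch of the argument handling (variable names here are my own, not necessarily the ones used in the script), the two arguments could be read and validated at the top like this:

#!/bin/bash
# Sketch of argument handling; variable names are illustrative.
if [ $# -ne 2 ]; then
    echo "Usage: $0 [directory_path] [threshold_limit_in_mb]" >&2
    exit 1
fi

CACHE_DIR="$1"
THRESHOLD_MB="$2"

if [ ! -d "$CACHE_DIR" ]; then
    echo "Error: '$CACHE_DIR' is not a directory" >&2
    exit 1
fi

case "$THRESHOLD_MB" in
    ''|*[!0-9]*) echo "Error: threshold must be a whole number of MB" >&2; exit 1 ;;
esac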
Optional Step: Set up your crontab
To make this run every 12 hours, I added this to my crontab (using crontab -e):
0 0,12 * * * /home/user/scripts/file_cache_clean.sh /var/files/cache 1000
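Cron output is easy to lose unless local mail is configured, so it can also help to redirect the script's output to a log file for later inspection; the log path below is just an example:

0 0,12 * * * /home/user/scripts/file_cache_clean.sh /var/files/cache 1000 >> /home/user/file_cache_clean.log 2>&1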