Archiving files in HDFS after n days

By | 14th June 2021

In this blog post, we will see how to archive or delete files in HDFS once they are more than n days old. The number of days is a parameter, so the same approach works for any retention period.

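For example, suppose we need to monitor an HDFS folder and delete files once they are more than 7 days old. The script below goes through every subdirectory of the parent folder and removes each file whose modification date is more than 7 days before today.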

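#!/usr/bin/env bash
# Delete files older than retention_days from every subdirectory of /HDFS_PARENT_FOLDER/.
retention_days=7
today=$(date +%s)
hdfs dfs -ls /HDFS_PARENT_FOLDER/ | grep "^d" | while read -r _ _ _ _ _ _ _ filepath; do
  echo ""
  echo "Currently checking ${filepath} directory.."
  # Keep only regular files; this also drops the "Found N items" header line.
  hdfs dfs -ls "${filepath}" | grep "^-" | while read -r _ _ _ _ _ filedate _ hdfsfile; do
    # Field 6 of "hdfs dfs -ls" is the modification date (YYYY-MM-DD); GNU date is assumed.
    difference=$(( (today - $(date -d "${filedate}" +%s)) / (24*60*60) ))
    if [ "${difference}" -gt "${retention_days}" ]; then
      echo "File: ${hdfsfile} is ready to be deleted, Diff: ${difference} days"
      hdfs dfs -rm "${hdfsfile}"
      # To archive instead of delete, move the file (archive path is a placeholder):
      # hdfs dfs -mv "${hdfsfile}" /HDFS_ARCHIVE_FOLDER/
    fi
  done
done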
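Before deleting anything, it can help to check which files the age rule would select. The sketch below pulls that selection logic into a dry-run filter that reads `hdfs dfs -ls`-style lines on standard input and only prints the matching paths. It assumes GNU `date` and the default eight-column `ls` output; the function name `list_expired` is hypothetical. Dates are parsed with `-u` (UTC) so that a daylight-saving change does not shift the day count by an hour.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: prints the paths of regular files whose date is
# more than $1 days before the reference date $2 (YYYY-MM-DD).
# Input: "hdfs dfs -ls"-style lines on stdin. Nothing is deleted.
# Assumes GNU date ("-d" to parse, "-u" to use UTC for both dates).
list_expired() {
  local max_days="$1"
  local ref_epoch
  ref_epoch=$(date -u -d "$2" +%s)
  local perms _repl _owner _group _size filedate _time path age
  while read -r perms _repl _owner _group _size filedate _time path; do
    # Skip the "Found N items" header and directories: regular files start with "-".
    [[ "${perms}" == -* ]] || continue
    # Whole days between the file date and the reference date.
    age=$(( (ref_epoch - $(date -u -d "${filedate}" +%s)) / 86400 ))
    if (( age > max_days )); then
      echo "${path}"
    fi
  done
}
```

A typical call would be `hdfs dfs -ls /HDFS_PARENT_FOLDER/some_dir | list_expired 7 "$(date +%F)"`, where `some_dir` is a placeholder. Once the printed list looks right, the same comparison can drive the `hdfs dfs -rm` loop above.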