{"id":924,"date":"2021-06-14T13:59:19","date_gmt":"2021-06-14T08:29:19","guid":{"rendered":"https:\/\/techieshouts.com\/?p=924"},"modified":"2022-08-09T19:04:07","modified_gmt":"2022-08-09T13:34:07","slug":"archiving-files-in-hdfs-after-n-days","status":"publish","type":"post","link":"https:\/\/techieshouts.com\/home\/archiving-files-in-hdfs-after-n-days\/","title":{"rendered":"Archiving files in HDFS after n days"},"content":{"rendered":"\n<p>In this blog, we will see how to archive or delete files in HDFS once they are older than n days. The same approach works for any number of days.<\/p>\n\n\n\n<p>For example, let us say that we need to monitor the subdirectories of an HDFS folder and delete every file whose modification date is more than 7 days in the past. The script below walks through each subdirectory of <code>\/HDFS_PARENT_FOLDER<\/code>, reads each file's date from the <code>hdfs dfs -ls<\/code> output, and removes the file once it is too old.<\/p>\n\n\n\n<pre class=\"EnlighterJSRAW\" data-enlighter-language=\"shell\" data-enlighter-theme=\"\" data-enlighter-highlight=\"\" data-enlighter-linenumbers=\"\" data-enlighter-lineoffset=\"\" data-enlighter-title=\"\" data-enlighter-group=\"\"># Delete files more than 7 days old from every subdirectory of \/HDFS_PARENT_FOLDER\ndays=7\ntoday=$(date +'%s')\n# Keep only directory entries (permission string starts with \"d\")\nhdfs dfs -ls \/HDFS_PARENT_FOLDER\/ | grep \"^d\" | while read -r line ; do\n  # Field 8 of the hdfs dfs -ls output is the full path\n  filepath=$(echo \"${line}\" | awk '{print $8}')\n  echo \"\"\n  echo \"Currently checking ${filepath} directory..\"\n  # Keep only regular files (permission string starts with \"-\"); this also drops the \"Found N items\" header\n  hdfs dfs -ls \"${filepath}\" | grep \"^-\" | while read -r innerline ; do\n    hdfsfile=$(echo \"${innerline}\" | awk '{print $8}')\n    # Field 6 is the modification date (YYYY-MM-DD)\n    filedate=$(echo \"${innerline}\" | awk '{print $6}')\n    # Whole days between now and the file date (date -d requires GNU date)\n    difference=$(( ( ${today} - $(date -d \"${filedate}\" +%s) ) \/ ( 24*60*60 ) ))\n    if [ \"${difference}\" -gt \"${days}\" ]; then\n      echo \"File: ${hdfsfile} is ready to be deleted, Diff: ${difference} days\"\n      hdfs dfs -rm \"${hdfsfile}\"\n    fi\n  done\ndone<\/pre>\n\n\n\n<p>To archive the old files instead of deleting them, replace <code>hdfs dfs -rm<\/code> with <code>hdfs dfs -mv<\/code> and an archive directory of your choice. To use a different age limit, change the value of <code>days<\/code>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this blog, we will see how to archive or delete files in HDFS once they are older than n days. The same approach works for any number of days. For example, let us say that we need to monitor the subdirectories of an HDFS folder and delete every file whose modification date is more than 7 days in the past.<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[11,125,16,8],"tags":[130,128,129],"_links":{"self":[{"href":"https:\/\/techieshouts.com\/home\/wp-json\/wp\/v2\/posts\/924"}],"collection":[{"href":"https:\/\/techieshouts.com\/home\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/techieshouts.com\/home\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/techieshouts.com\/home\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/techieshouts.com\/home\/wp-json\/wp\/v2\/comments?post=924"}],"version-history":[{"count":1,"href":"https:\/\/techieshouts.com\/home\/wp-json\/wp\/v2\/posts\/924\/revisions"}],"predecessor-version":[{"id":925,"href":"https:\/\/techieshouts.com\/home\/wp-json\/wp\/v2\/posts\/924\/revisions\/925"}],"wp:attachment":[{"href":"https:\/\/techieshouts.com\/home\/wp-json\/wp\/v2\/media?parent=924"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/techieshouts.com\/home\/wp-json\/wp\/v2\/categories?post=924"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/techieshouts.com\/home\/wp-json\/wp\/v2\/tags?post=924"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}