There are different modes in which we can execute a Spark program. The mode can be chosen when running the spark-submit command against YARN in the Hadoop cluster. Mode 1 – Local: In this mode, both the driver and the executor program run on the same machine. Whatever logs are added to both the…
In this post, we will see how to establish a passwordless SSH connection between two Linux machines. This can be done by using public and private keys. Once the keys are set up, authentication is done with these keys instead of a password. Let us consider two Linux machines/servers, ServerA…
In this blog, we will see how to archive or delete a file in HDFS once it is more than n days old. The same check works for any number of days. For example, let us say that we need to monitor an HDFS folder and delete the files once they are more than 7 days old.
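The age check described above can be sketched in Python as follows. This is only an illustration, not the post's own script: it assumes the listing comes from `hdfs dfs -ls`, whose lines end with a date, a time, and a path, and the function name `files_older_than` and the sample paths are hypothetical. Actually deleting or archiving the selected paths (for example with `hdfs dfs -rm` or `hdfs dfs -mv`) is left out.

```python
from datetime import datetime, timedelta


def files_older_than(ls_lines, days, now=None):
    """Return paths of files last modified more than `days` days before `now`.

    `ls_lines` is assumed to hold lines in the `hdfs dfs -ls` format:
    permissions, replication, owner, group, size, date, time, path.
    """
    now = now or datetime.now()
    cutoff = now - timedelta(days=days)
    old_paths = []
    for line in ls_lines:
        # Split into at most 8 fields so a path containing spaces stays whole.
        parts = line.strip().split(None, 7)
        # Skip the "Found N items" header and directory entries.
        if len(parts) < 8 or parts[0].startswith("d"):
            continue
        modified = datetime.strptime(parts[5] + " " + parts[6], "%Y-%m-%d %H:%M")
        if modified < cutoff:
            old_paths.append(parts[7])
    return old_paths


# Hypothetical listing, used only to demonstrate the function.
sample_listing = [
    "Found 3 items",
    "-rw-r--r--   3 etl hadoop   1024 2024-01-01 10:00 /data/in/a.csv",
    "-rw-r--r--   3 etl hadoop   2048 2024-01-08 09:30 /data/in/b.csv",
    "drwxr-xr-x   - etl hadoop      0 2023-12-01 08:00 /data/in/archive",
]
old_files = files_older_than(sample_listing, days=7, now=datetime(2024, 1, 10))
print(old_files)  # ['/data/in/a.csv']
```

Passing `now` explicitly keeps the function easy to test; in a scheduled job it can be left out so the current time is used.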
In Python, we can merge two dictionary objects very easily. First, we create two dictionaries; then we merge them and sort the result based on the values. In Python 2.x, we can use the following code. Python 2.x: In the example below, the two dictionaries are created and the dictionary keys are then merged with the…
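As a rough illustration of the merge-and-sort idea, here is a minimal sketch. Note that it uses Python 3 syntax (dictionary unpacking, available from Python 3.5), not the Python 2.x code the post refers to, and the function name `merge_and_sort` and the sample data are made up for the example.

```python
def merge_and_sort(first, second):
    """Merge two dictionaries and return (key, value) pairs sorted by value.

    When a key appears in both dictionaries, the value from `second` wins.
    """
    merged = {**first, **second}
    return sorted(merged.items(), key=lambda item: item[1])


# Hypothetical sample data.
stock_a = {"apple": 3, "banana": 1}
stock_b = {"cherry": 2, "banana": 5}
pairs = merge_and_sort(stock_a, stock_b)
print(pairs)  # [('cherry', 2), ('apple', 3), ('banana', 5)]
```

The result is a list of pairs instead of a dictionary so that the sorted order is explicit.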
In the following code, we assign default values to a dictionary. First, we create the dictionary and assign values to the names. Then, we try to fetch a dictionary value based on its key. If there is no such key, we assign a default value.
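A minimal sketch of that lookup-with-default pattern, using only the built-in `dict` methods `get` and `setdefault`. The dictionary contents and variable names are assumptions made for the example, not the post's original code.

```python
# Hypothetical sample data: names mapped to values.
ages = {"alice": 30, "bob": 25}

# get() returns the default for a missing key without changing the dictionary.
carol_age = ages.get("carol", 0)
keys_after_get = sorted(ages)

# setdefault() returns the existing value, or stores and returns the default
# when the key is missing.
bob_age = ages.setdefault("bob", 0)
dave_age = ages.setdefault("dave", 18)

print(carol_age, bob_age, dave_age)  # 0 25 18
print(ages)  # {'alice': 30, 'bob': 25, 'dave': 18}
```

`get` suits read-only lookups, while `setdefault` fits the case where the default should also be stored for later use.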
Let us take a real-world example from a banking system. The bank wants to offer a loan only to active premium customers who are not on the defaulter’s list. For this, we use the following code in Python. Method: 1 Method: 2 In this example, if the bank gives a loan to the…
BTEQ is a powerful utility in Teradata for various reasons. You can write the data of a table into a file using the BTEQ export utility. You can also use BTEQ to execute conditional statements based on certain logic, and to execute all kinds of DML statements. In this post, we…
Looping through a list is a very useful and much-needed feature in every scripting and programming language. The most common looping method in the programming world is the for loop. In this blog, we will see how to iterate over strings separated by a delimiter in a shell script. For loop Syntax Let us see…
In the ETL world, there is always a need to process delimited files. While reading a delimited file can be done in any scripting or programming language, we will see how it can be done in a shell script using the cut command. Syntax cut -d "|" => This is to tell the…
Command line arguments are essential for any programming or scripting language to have control over the input parameters. Passing arguments to a shell script can be done in multiple ways. In this blog, we will see two ways of passing arguments to a shell script. Option 1 – Using getopts This method…