Category Archives: BigData

Primary key and its advantages in RDBMS

The Power of Primary Keys: Boosting Efficiency in RDBMS. Introduction: Relational Database Management Systems (RDBMS) are at the heart of many modern applications, powering everything from inventory systems to complex data analytics platforms. At the core of these systems lies the concept of a primary key, a critical element in ensuring data integrity and… Read More »
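To make the data-integrity point concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `inventory` table and its columns are made up for illustration; the only point is that the database itself rejects a second row with the same primary key value.

```python
import sqlite3


def demo_primary_key():
    """Return (rows, duplicate_rejected) for a tiny in-memory table."""
    conn = sqlite3.connect(":memory:")
    # item_id is the primary key, so every row must carry a unique value.
    # Table and column names are hypothetical, chosen only for this sketch.
    conn.execute(
        "CREATE TABLE inventory (item_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    conn.execute("INSERT INTO inventory (item_id, name) VALUES (1, 'bolt')")
    conn.execute("INSERT INTO inventory (item_id, name) VALUES (2, 'nut')")

    duplicate_rejected = False
    try:
        # Reusing item_id = 1 violates the primary key constraint.
        conn.execute("INSERT INTO inventory (item_id, name) VALUES (1, 'washer')")
    except sqlite3.IntegrityError:
        duplicate_rejected = True

    rows = conn.execute(
        "SELECT item_id, name FROM inventory ORDER BY item_id"
    ).fetchall()
    conn.close()
    return rows, duplicate_rejected


if __name__ == "__main__":
    print(demo_primary_key())
```

The duplicate insert fails with an `IntegrityError`, and the table still holds only the two original rows.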

What is a primary key?


Spark reading from Oracle

In the other post, we saw how to read a Hive table in Spark. In this post, we will see how to read data from Oracle. This will load the data from the Oracle table into a DataFrame. After that, we can perform any operation the program needs. We need to pass… Read More »
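As a rough illustration of the idea, the sketch below reads an Oracle table through Spark's generic JDBC data source. It is not the code from the full post: the host, port, service name, table, and credentials are placeholders, and the helper names `oracle_jdbc_options` and `read_oracle_table` are made up for this example. It also assumes the Oracle JDBC driver jar is available to Spark.

```python
def oracle_jdbc_options(host, port, service, table, user, password):
    """Build the option map for Spark's JDBC data source.

    All argument values are placeholders supplied by the caller.
    """
    return {
        # Oracle thin-driver URL in the host:port/service_name form
        "url": f"jdbc:oracle:thin:@//{host}:{port}/{service}",
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": "oracle.jdbc.OracleDriver",
    }


def read_oracle_table(spark, options):
    """Load the Oracle table described by `options` into a DataFrame.

    `spark` is an existing SparkSession created elsewhere.
    """
    return spark.read.format("jdbc").options(**options).load()
```

With a real `SparkSession`, `read_oracle_table(spark, oracle_jdbc_options(...))` returns a DataFrame that can be filtered, joined, or written out like any other.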

Spark reading from Hive table

In this post, we will see how to read data from a Hive table using Spark. Spark's in-memory computation makes data processing much faster than a classic MapReduce program. The above program gets the row count of a Hive table and prints it. Also read… Read More »
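A minimal sketch of such a row count in PySpark might look like the following. The function names and the `default.sales` table are hypothetical, and the session builder assumes PySpark is installed and a Hive metastore is reachable.

```python
def build_hive_session(app_name="hive-count"):
    """Create a SparkSession with Hive support enabled.

    Needs PySpark and a reachable Hive metastore; the import is done
    lazily so the rest of this file loads on machines without Spark.
    """
    from pyspark.sql import SparkSession

    return SparkSession.builder.appName(app_name).enableHiveSupport().getOrCreate()


def count_hive_table(spark, table_name):
    """Return the number of rows in a Hive table such as 'default.sales'."""
    return spark.table(table_name).count()
```

On a cluster, `print(count_hive_table(build_hive_session(), "default.sales"))` would print the row count of that (hypothetical) table.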

Spark execution modes

There are different modes in which we can execute a Spark program. The mode can be chosen when running the spark-submit command against YARN in the Hadoop cluster. Mode 1 – Local: In this mode, both the driver and the executor program run on the same machine. Whatever logs that are added to both the… Read More »
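The mode is selected through the `--master` and `--deploy-mode` options of `spark-submit`. The small helper below is only a sketch that assembles such a command line. The function name and `job.py` are made up, and the YARN values shown are standard `spark-submit` settings rather than necessarily the list used in the full post.

```python
def spark_submit_command(app_path, master="local[*]", deploy_mode=None):
    """Build a spark-submit argument list for the given master and deploy mode."""
    cmd = ["spark-submit", "--master", master]
    if deploy_mode is not None:
        # "client" keeps the driver on the submitting machine,
        # "cluster" runs the driver inside the cluster.
        cmd += ["--deploy-mode", deploy_mode]
    cmd.append(app_path)
    return cmd


if __name__ == "__main__":
    # Local mode: driver and executors share one machine.
    print(" ".join(spark_submit_command("job.py")))
    # YARN with the driver running inside the cluster.
    print(" ".join(spark_submit_command("job.py", "yarn", "cluster")))
```

The returned list can be passed straight to `subprocess.run` on a machine where `spark-submit` is on the `PATH`.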

Archiving files in HDFS after n days

In this post, we will see how to archive or delete a file in HDFS once it is more than n days old. The same approach works for any number of days. For example, let us say that we need to monitor an HDFS folder and delete files once they are more than 7 days old.
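One way to approach this, sketched below, is to parse the output of `hdfs dfs -ls` and pick out the files whose modification time is older than the cutoff. This is an illustration, not the script from the post: the function name `files_older_than` and the sample paths are hypothetical, and the parser assumes the usual eight-column listing format (permissions, replication, owner, group, size, date, time, path).

```python
from datetime import datetime, timedelta


def files_older_than(ls_output, days, now=None):
    """Return paths of regular files in `hdfs dfs -ls` output older than `days` days."""
    now = now or datetime.now()
    cutoff = now - timedelta(days=days)
    old = []
    for line in ls_output.splitlines():
        parts = line.split(None, 7)
        # Skip the "Found N items" header and directories (permissions start with 'd').
        if len(parts) < 8 or parts[0].startswith("d"):
            continue
        modified = datetime.strptime(f"{parts[5]} {parts[6]}", "%Y-%m-%d %H:%M")
        if modified < cutoff:
            old.append(parts[7])
    return old


if __name__ == "__main__":
    # Hypothetical listing used only to exercise the parser.
    sample = "\n".join([
        "Found 2 items",
        "-rw-r--r--   3 hdfs supergroup       1234 2024-01-01 10:30 /data/in/old.csv",
        "-rw-r--r--   3 hdfs supergroup       5678 2024-01-09 08:00 /data/in/new.csv",
    ])
    print(files_older_than(sample, 7, now=datetime(2024, 1, 10, 12, 0)))
```

The selected paths can then be handed to `hdfs dfs -rm` to delete them or `hdfs dfs -mv` to move them into an archive folder.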

Apache Hive architecture

One cannot avoid hearing the word “Hive” when it comes to distributed processing systems. In this article, we will see the Hive architecture and its components: What is Hive? What language does Hive use? History of Hive. Is Hive a database? What is the Hive metastore? What are Hive properties? Sample hive-site.xml file. Hive architecture… Read More »

PIG Installation steps

In the previous article, we saw how to install Apache Hive on an Ubuntu machine. Both articles assume that you have already installed the Hadoop framework on the machine. If not, please visit this post and install the Hadoop framework first. Pig is another component of the Hadoop ecosystem… Read More »

Hive Installation steps

In this post, we will see how to install Hive on your Ubuntu machine. Hive is a tool to query and process data stored in HDFS. Hive uses HQL (Hive Query Language) for processing data. It follows MySQL syntax, so people from a SQL background will find it easy to work with Hive. Let’s get into the… Read More »

HADOOP installation steps

In real-world deployments, Hadoop is installed on a network of machines to form a cluster. In this article, we will walk through installing Hadoop step by step on a single Ubuntu system. The post assumes that you already know what a NameNode, DataNode, HDFS, etc. are and… Read More »