Spark reading from Hive table

15th June 2021

In this post, we will see how to read data from a Hive table using Spark. Spark, with its in-memory computation, processes the data much faster than a classical MapReduce program.

from pyspark.sql import SparkSession

# Create a SparkSession with Hive support enabled, so that
# spark.sql() can query tables registered in the Hive metastore
appname = "Application name"
spark = SparkSession.builder.appName(appname).enableHiveSupport().getOrCreate()

# Query to count the rows of the Hive table
countsql = "select count(*) as tot_cnt from database.tablename"
print("Count SQL")
print("---------")
print(countsql)

# Run the query and pull the count value out of the first (and only) row
rec_cnt = spark.sql(countsql).first()[0]
print(rec_cnt)

The above program gets the row count of a Hive table and prints it.
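Counting is just one example; once the SparkSession has Hive support enabled, the whole table can be pulled into a DataFrame for further processing. Below is a minimal sketch along the same lines, reusing the placeholder table name database.tablename from above; spark.table() is equivalent to running "select * from database.tablename" through spark.sql().

# Load the full Hive table into a DataFrame
df = spark.table("database.tablename")

# Print the schema fetched from the Hive metastore
df.printSchema()

# Display the first 10 rows without truncating column values
df.show(10, truncate=False)

# The DataFrame can now be transformed further, for example:
# df.filter(df["colname"] > 100).select("colname").show()
# (colname is a hypothetical column, used only for illustration)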

Also read

  1. Spark execution modes
  2. Spark reading from Oracle