In this post, we will see how to read data from a Hive table using Spark. Spark's in-memory computation makes the data processing much faster than a classical MapReduce job.
from pyspark.sql import SparkSession

# Create a SparkSession with Hive support enabled so spark.sql can query Hive tables
appname = "Application name"
spark = SparkSession.builder.appName(appname).enableHiveSupport().getOrCreate()

# Build and print the count query against the Hive table
countsql = "select count(*) as tot_cnt from database.tablename"
print("Count SQL")
print("---------")
print(countsql)

# first() returns the first Row of the result; [0] extracts the tot_cnt value
rec_cnt = spark.sql(countsql).first()[0]
print(rec_cnt)
The above program gets the row count of a Hive table and prints it.