In this post, we will see how to read data from a Hive table using Spark. With its in-memory computation, Spark performs data processing much faster than a classical MapReduce program.
from pyspark.sql import SparkSession

appname = "Application name"

# Create a SparkSession with Hive support enabled
spark = SparkSession.builder \
    .appName(appname) \
    .enableHiveSupport() \
    .getOrCreate()

# Build and print the count query
countsql = "select count(*) as tot_cnt from database.tablename"
print("Count SQL")
print("---------")
print(countsql)

# Run the query and pull the count from the first column of the first row
rec_cnt = spark.sql(countsql).first()[0]
print(rec_cnt)
The above program gets the row count of a Hive table and prints it.
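Beyond counting rows, the same SparkSession can load the full table into a DataFrame for further processing. Here is a minimal sketch, reusing the placeholder table name database.tablename from the example above:

# Read the Hive table into a DataFrame (database.tablename is a placeholder)
df = spark.table("database.tablename")
# Equivalent: df = spark.sql("select * from database.tablename")

# Inspect the schema and preview a few rows
df.printSchema()
df.show(10)

Note that spark.table is lazy: no data is actually read until an action such as show() or count() is triggered.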